Re: [jr3] Synchronized sessions
Hi! +1 for all that Thomas said.

Andrey

From: Thomas Müller thomas.muel...@day.com
To: dev@jackrabbit.apache.org
Sent: Thu, 25 February, 2010 22:24:13
Subject: Re: [jr3] Synchronized sessions

Hi,

http://issues.apache.org/jira/browse/JCR-2443. Unfortunately this bug doesn't have a test case, and I didn't find a thread dump that shows what the problem was exactly, so I can't say what the problem was there.

Observation is definitely an area where synchronization can potentially lead to deadlocks. Maybe observation needs to use its own session(s) so that it can't block. This is not a new issue however: most writes are already synchronized (not all writes, however).

I'm hesitant to change synchronization in the current implementation: doing that would very likely lead to Java-level deadlocks. We need to make sure synchronization is always done on the same level, and in the same order. With the current implementation, that's challenging.

Of course performance and concurrency are very important. But the current approach (mutable data structures, with only some writes synchronized) is quite dangerous. Instead, immutable data structures should be used, at least for values and objects in the shared cache. Everything else should be properly synchronized if mutable, or - if that's too slow - the proper concurrent data structures should be used, for example ConcurrentHashMap, CopyOnWriteArrayList, or CopyOnWriteArraySet.

Regards,
Thomas
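Thomas's suggestion can be sketched in plain Java (class and method names here are invented for illustration; this is not actual Jackrabbit code): values in the shared cache are immutable, and the cache itself is a ConcurrentHashMap, so readers need no locking at all and per-entry updates stay atomic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Immutable value object: safe to share between sessions without locking.
final class NodeStateSnapshot {
    private final String path;
    private final long revision;

    NodeStateSnapshot(String path, long revision) {
        this.path = path;
        this.revision = revision;
    }

    String path() { return path; }
    long revision() { return revision; }

    // "Mutation" produces a new instance; the old one is never changed,
    // so a reader holding a reference can never observe a half-updated state.
    NodeStateSnapshot withRevision(long newRevision) {
        return new NodeStateSnapshot(path, newRevision);
    }
}

class SharedCache {
    // Concurrent map: readers never block, updates are atomic per entry.
    private final ConcurrentMap<String, NodeStateSnapshot> cache = new ConcurrentHashMap<>();

    NodeStateSnapshot get(String path) {
        return cache.get(path);
    }

    void put(NodeStateSnapshot state) {
        cache.put(state.path(), state);
    }

    // Atomic read-modify-write without any synchronized block.
    NodeStateSnapshot bumpRevision(String path) {
        return cache.computeIfPresent(path, (p, s) -> s.withRevision(s.revision() + 1));
    }
}
```

The key property is that no combination of concurrent get/put/bumpRevision calls can corrupt an individual snapshot, because snapshots are never modified in place.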
Re: [jr3] Synchronized sessions
Hi,

On Thu, Feb 25, 2010 at 10:24 PM, Thomas Müller thomas.muel...@day.com wrote:

http://issues.apache.org/jira/browse/JCR-2443. Unfortunately this bug doesn't have a test case. Also I didn't find a thread dump that shows what the problem was exactly. I can't say what was the problem there.

This problem was caused by observation delivery trying to synchronize on the receiving session while that session was waiting for an internal lock that the event source was still holding. I had a thread dump and could reproduce this behaviour in our CQ5 product (search for JCR-2443 in our issue tracker), where it occurred as a result of heavy concurrent writing and observation, but writing a standalone test case proved quite challenging.

The good thing about this is that without the synchronization we would sooner or later have seen cases where the state of the internal map got corrupted due to concurrent updates. Such problems would have been much more difficult to troubleshoot than the deadlock that clearly pointed to the incorrect locking order. I'm quite sure that some of the hairier Jackrabbit issues we see reported may in fact be caused by exactly such concurrent session access without proper synchronization.

BR,
Jukka Zitting
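The deadlock shape Jukka describes (two threads taking the same two locks in opposite order) can be illustrated with plain locks. This is a hypothetical sketch, not Jackrabbit's actual code: it shows the usual fix, a single global acquisition order for both code paths, which makes a lock-order cycle impossible.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration of the JCR-2443 deadlock shape: observation delivery
// synchronized on the receiving session while already holding an internal
// lock, and the session thread acquired the same two locks in the opposite
// order. Fixing both paths to use one acquisition order removes the cycle.
public class LockOrderDemo {
    private final ReentrantLock internalLock = new ReentrantLock();
    private final ReentrantLock sessionLock = new ReentrantLock();
    private int deliveredEvents = 0;

    // Both code paths take internalLock *before* sessionLock,
    // so no lock-order cycle (and hence no deadlock) is possible.
    public void deliverEvent() {
        internalLock.lock();
        try {
            sessionLock.lock();
            try {
                deliveredEvents++;
            } finally {
                sessionLock.unlock();
            }
        } finally {
            internalLock.unlock();
        }
    }

    public void sessionWrite(Runnable write) {
        internalLock.lock(); // same order as deliverEvent(), never reversed
        try {
            sessionLock.lock();
            try {
                write.run();
            } finally {
                sessionLock.unlock();
            }
        } finally {
            internalLock.unlock();
        }
    }

    public int deliveredEvents() { return deliveredEvents; }
}
```

The deadlocking variant would be `sessionWrite` taking `sessionLock` first; the cycle only exists when the two paths disagree on the order.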
[jira] Created: (JCR-2521) WorkspaceImporter throws exception
WorkspaceImporter throws exception
--
Key: JCR-2521
URL: https://issues.apache.org/jira/browse/JCR-2521
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Reporter: Tobias Bocanegra
Assignee: Tobias Bocanegra

session.getWorkspace().getImportContentHandler() throws java.lang.UnsupportedOperationException: "Workspace-Import of protected nodes: Not yet implement." Suggest issuing a warning instead of throwing.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2521) WorkspaceImporter throws exception
[ https://issues.apache.org/jira/browse/JCR-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Bocanegra resolved JCR-2521.
---
Resolution: Fixed
Fix Version/s: 2.1.0

Fixed as suggested.

WorkspaceImporter throws exception
--
Key: JCR-2521
URL: https://issues.apache.org/jira/browse/JCR-2521
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Reporter: Tobias Bocanegra
Assignee: Tobias Bocanegra
Fix For: 2.1.0

session.getWorkspace().getImportContentHandler() throws java.lang.UnsupportedOperationException: "Workspace-Import of protected nodes: Not yet implement." Suggest issuing a warning instead of throwing.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Created: (JCR-2522) unable to workspace import XML.
unable to workspace import XML.
---
Key: JCR-2522
URL: https://issues.apache.org/jira/browse/JCR-2522
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-jcr-server
Reporter: Tobias Bocanegra
Assignee: Tobias Bocanegra

Tika detects XML as application/xml, thus breaking org.apache.jackrabbit.server.io.XmlHandler, which only checks for text/xml.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Resolved: (JCR-2522) unable to workspace import XML.
[ https://issues.apache.org/jira/browse/JCR-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Bocanegra resolved JCR-2522.
---
Resolution: Fixed
Fix Version/s: 2.1.0

Fixed by also accepting the secondary MIME type application/xml.

unable to workspace import XML.
---
Key: JCR-2522
URL: https://issues.apache.org/jira/browse/JCR-2522
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-jcr-server
Reporter: Tobias Bocanegra
Assignee: Tobias Bocanegra
Fix For: 2.1.0

Tika detects XML as application/xml, thus breaking org.apache.jackrabbit.server.io.XmlHandler, which only checks for text/xml.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
Re: [jr3] EventJournal / who merges changes
Sorry for top posting, I am not certain where to put this request.

Currently adding child nodes is almost serialised, since it's not possible to merge concurrent changes in a single multi-valued property. *If* MVCC with abort on conflict is going to make this situation worse, then that IMHO would be a mistake. If however the probability of conflict when updating a multi-valued property is reduced, then that would be good (i.e. giving certain properties a different storage layout that avoids conflicts; I think you allude to this).

E.g. at the moment (JR 1.6), when adding users to our Jackrabbit (Sling) based system, we have to do this single-threaded to avoid conflicts, since even with 3 threads, conflicts are far too common. To reduce contention we put the new nodes in a sharded tree (e.g. .../ff/ff/ff/ff/user_node), but we still get lots of contention, estimated at 1 in 20 operations for the first 20K users, worse at the start. (BTW, the number of users ranges from 10K to 4M.)

Ian

On 25 Feb 2010, at 13:38, Thomas Müller wrote:

There are low-level merges and high-level merges. A low-level merge is problematic: it can result in unexpected behavior. I would even say the way Jackrabbit merges changes currently (by looking at the data itself, not at the operations) is problematic. Example: currently, orderBefore can not be done at the same time as addNode or another orderBefore. I'm not saying this is important, but it's one case that is complex.

Another example: let's say the low-level representation would split nodes if there are more than 1000 child nodes (add one layer of hidden internal nodes). That means adding a node to a list of 1000 nodes could cause a (b-tree-) split. If two sessions do that concurrently it will get messy. Session 1 will create new internal nodes, session 2 will create new internal nodes as well (but different ones), and merging the result will (probably) duplicate all 1000 nodes. Or worse.
The idea is to _not_ try to merge by looking at the data, but to merge by re-applying the operation. If saving the new data fails (by looking at the timestamp/version numbers), then refresh the data, and re-apply the operation (orderBefore, addNode, ...). This is relatively easy to implement, and works in more cases than what Jackrabbit can do now. Jackrabbit needs to keep the EventJournal anyway, so this will not use more memory.

This is not a new idea; it is how MVCC works (at least how I understand it). From http://en.wikipedia.org/wiki/Multiversion_concurrency_control - if a transaction [fails], the transaction ... is aborted and restarted.

Regards,
Thomas
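The refresh-and-reapply loop Thomas describes can be sketched with an optimistic compare-and-set over immutable state (a toy model, not Jackrabbit's actual API; the operation is modeled as a function over the state). A failed "save" does not merge data; it re-runs the operation against freshly read state.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of "merge by re-applying the operation": keep the operation
// itself (orderBefore, addNode, ...) and retry it against fresh state
// whenever the optimistic version check fails.
public class OptimisticRetry {
    // Versioned, immutable state: a commit atomically replaces the reference.
    private final AtomicReference<String> state = new AtomicReference<>("");

    public String apply(UnaryOperator<String> operation) {
        while (true) {
            String before = state.get();             // "refresh"
            String after = operation.apply(before);  // re-apply the operation
            if (state.compareAndSet(before, after)) {
                return after;                        // "save" succeeded
            }
            // conflict: another writer committed in between;
            // loop and re-apply the operation, not a data diff
        }
    }
}
```

Because the operation (not a snapshot of the resulting data) is retried, two concurrent `addNode` calls compose correctly instead of one overwriting the other.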
Re: [jr3] EventJournal / who merges changes
Hi Ian,

Could you describe your use case?

probability of conflict when updating a multivalued property is reduced

What methods do you call, and how should the conflict be resolved? Example: if you currently use the following code:

1) session1.getNode(test).setProperty(multi, new String[]{a, b},..);
2) session2.getNode(test).setProperty(multi, new String[]{d, e},..);
3) session1.save();
4) session2.save();

Then that would be a conflict. How would you resolve it? One option is to silently overwrite in line 4.

Regards,
Thomas
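The numbered example can be modeled in a few lines of plain Java (a toy model, not the JCR API): both sessions read the store at version 0, the save in line 3 bumps the version, so the save in line 4 either fails with a conflict or, with the "silently overwrite" option Thomas mentions, wins as the last writer.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the multi-valued property conflict: an optimistic version
// check decides whether the second save fails or silently overwrites.
public class ConflictModel {
    private final Map<String, String[]> store = new HashMap<>();
    private long version = 0;

    public synchronized long currentVersion() { return version; }

    public synchronized String[] get(String key) { return store.get(key); }

    // Returns false on conflict unless the caller opted to overwrite.
    public synchronized boolean save(long readVersion, String key,
                                     String[] value, boolean overwriteOnConflict) {
        if (readVersion != version && !overwriteOnConflict) {
            return false; // conflict: caller must refresh and retry
        }
        store.put(key, value);
        version++;
        return true;
    }
}
```

In this model "silently overwrite" simply means passing `overwriteOnConflict = true`, i.e. last writer wins; failing the save instead is what forces the caller into the refresh-and-reapply pattern.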
Re: [jr3] Synchronized sessions
Hi,

On Thu, Feb 25, 2010 at 19:14, Felix Meschberger fmesc...@gmail.com wrote:

Hi,

On 25.02.2010 17:55, Marcel Reutegger wrote:

Hi,

On Thu, Feb 25, 2010 at 15:49, Felix Meschberger fmesc...@gmail.com wrote:

Hi,

On 24.02.2010 21:19, Thomas Müller wrote:

Hi,

deadlocks

I think it's relatively simple to synchronize all methods on the session.

Yes, but this creates a big potential for deadlocks ... If we want to make sessions thread-safe, we should use proper implementations.

Yes, that's what I want to write: a proper implementation.

I disagree that this would be a proper implementation.

can you please elaborate what you think is a proper implementation in this context?

Just off the top of my head: using a better read-mostly guarding locking mechanism (i.e. readers don't block each other, writers need exclusive access [still not entirely safe]); not at the global method level, but more intelligently guarding the shared data; not using the Session object itself for locking.

that's pretty much what we do currently. we have read-write locks in various places. the problem with that approach is that the sequence of lock acquisition is very important. such fine-grained locking is very difficult to control and led to various deadlock situations that were hard to analyse and fix. IMO we should get rid of as many of those locks as we can and synchronize on a more coarse-grained level, which is easier to maintain and more predictable.

regards
marcel

Regards
Felix

regards
marcel

any concurrent use of the same session is unsupported. The disadvantage of this is that there is no way to enforce correct usage. In some cases, incorrect usage leads to data corruption. I believe data corruption is not acceptable, even if the user made a mistake.

Anything can go wrong -- and if people do the wrong things, well, fine, let them do ... And I don't say we should not make Session thread-safe. But if we set out to do it, we should do it right. And just synchronizing all methods is just not right.
Regards Felix
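Felix's "read-mostly" guarding can be sketched with a ReentrantReadWriteLock (illustrative only; this is not a claim about Jackrabbit's internals): the lock guards the shared data rather than the Session object itself, readers proceed in parallel, and writers get exclusive access. Marcel's caveat still applies: once several such locks exist, the acquisition order across them becomes the hard part.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly guarding: readers share the lock and never block each other;
// writers wait for all readers and then get exclusive access.
public class GuardedItemCache {
    private final Map<String, String> items = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String path) {
        lock.readLock().lock();
        try {
            return items.get(path);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String path, String state) {
        lock.writeLock().lock();
        try {
            items.put(path, state);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This is safe as long as the cache is the only thing the lock guards; deadlocks enter the picture only when a thread holds this lock while trying to acquire another one, which is exactly the ordering problem Marcel describes.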
Re: [jr3] Synchronized sessions
Hi,

On 26.02.2010 14:38, Marcel Reutegger wrote:

Hi,

On Thu, Feb 25, 2010 at 19:14, Felix Meschberger fmesc...@gmail.com wrote:

Hi,

On 25.02.2010 17:55, Marcel Reutegger wrote:

Hi,

On Thu, Feb 25, 2010 at 15:49, Felix Meschberger fmesc...@gmail.com wrote:

Hi,

On 24.02.2010 21:19, Thomas Müller wrote:

Hi,

deadlocks

I think it's relatively simple to synchronize all methods on the session.

Yes, but this creates a big potential for deadlocks ... If we want to make sessions thread-safe, we should use proper implementations.

Yes, that's what I want to write: a proper implementation.

I disagree that this would be a proper implementation.

can you please elaborate what you think is a proper implementation in this context?

Just off the top of my head: using a better read-mostly guarding locking mechanism (i.e. readers don't block each other, writers need exclusive access [still not entirely safe]); not at the global method level, but more intelligently guarding the shared data; not using the Session object itself for locking.

that's pretty much what we do currently. we have read-write locks in various places. the problem with that approach is that the sequence of lock acquisition is very important. such fine-grained locking is very difficult to control and led to various deadlock situations that were hard to analyse and fix. IMO we should get rid of as many of those locks as we can and synchronize on a more coarse-grained level, which is easier to maintain and more predictable.

The problem is that the code is very complicated, not to say convoluted, in places. Combined with multiple locking mechanisms (JVM locks and read-write locks), this creates much potential for deadlocks, right. But taking the sledgehammer of synchronizing all methods out of the box IMVHO is the wrong way to go.

Regards
Felix

regards
marcel

Regards
Felix

regards
marcel

any concurrent use of the same session is unsupported. The disadvantage of this is that there is no way to enforce correct usage. In some cases, incorrect usage leads to data corruption. I believe data corruption is not acceptable, even if the user made a mistake.

Anything can go wrong -- and if people do the wrong things, well, fine, let them do ... And I don't say we should not make Session thread-safe. But if we set out to do it, we should do it right. And just synchronizing all methods is just not right.

Regards
Felix
Re: [jr3] Synchronized sessions
Hi,

On Fri, Feb 26, 2010 at 6:36 PM, Felix Meschberger fmesc...@gmail.com wrote:

Consider two threads T1 and T2, each modifying data from the same session:

T1 makes some modifications
T2 makes some modifications
T1 saves the session (incl. both T1's and T2's modifications)
T2 makes some more modifications
T2 decides to roll back

At the end the content is inconsistent from the POV of T2, because some modifications have been persisted and some haven't.

This has nothing to do with synchronizing session access. If T2 wants a separate transient space, it should use a separate session. All we're trying to achieve here is to ensure internal consistency even when clients do something like the above (for whatever reason, intentional or not).

BR,
Jukka Zitting
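The T1/T2 interleaving can be made concrete with a toy model of a shared transient space (not the JCR API; class and method names are invented): the transient space belongs to the session, not to a thread, so save() persists every pending change regardless of which thread made it, and a later rollback can only discard what was not yet saved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a session's transient space shared by two threads.
public class TransientSpace {
    private final List<String> pending = new ArrayList<>();
    private final List<String> persisted = new ArrayList<>();

    public synchronized void modify(String change) {
        pending.add(change);
    }

    // Persists ALL pending changes - T1's and T2's alike.
    public synchronized void save() {
        persisted.addAll(pending);
        pending.clear();
    }

    // Can only discard changes that are still pending.
    public synchronized void rollback() {
        pending.clear();
    }

    public synchronized List<String> persisted() {
        return new ArrayList<>(persisted);
    }
}
```

Running Felix's sequence against this model shows exactly his complaint: after T1's save, T2's earlier modification is already persisted and T2's rollback cannot undo it. Jukka's point is that the synchronization above still has value, because without it the two lists themselves could be corrupted.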
Re: [jr3] Synchronized sessions
On Fri, Feb 26, 2010 at 6:11 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:

All we're trying to achieve here is ensure internal consistency even when clients do something like the above (for whatever reason, intentional or not).

A JDBC connection is not thread-safe. A JCR session works in a similar way, and I prefer to follow the same pattern. We should encourage developers to do the right thing instead of technically enabling bad design. A shared session is only useful for read access in some cases; where writes are involved, the session should not be shared.

-Guo
[jira] Updated: (JCR-2426) Deadlock in lucene (Jackrabbit 1.4.4)
[ https://issues.apache.org/jira/browse/JCR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio Martinez updated JCR-2426:
--
Attachment: deadlock_jackrabbit1.6.txt

Deadlock in lucene (Jackrabbit 1.4.4)
-
Key: JCR-2426
URL: https://issues.apache.org/jira/browse/JCR-2426
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: indexing
Affects Versions: core 1.4.4
Reporter: Antonio Martinez
Priority: Blocker
Attachments: deadlock_2nd_setup.txt, deadlock_jackrabbit1.6.txt, deadlock_summary.txt

We get a deadlock in the Lucene part of Jackrabbit (see deadlock_summary.txt). This issue has been observed in two different production setups running Jackrabbit 1.4.4 in cluster configuration.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
Re: [jr3] Synchronized sessions
Hi,

On 26.02.2010 19:11, Jukka Zitting wrote:

Hi,

On Fri, Feb 26, 2010 at 6:36 PM, Felix Meschberger fmesc...@gmail.com wrote:

Consider two threads T1 and T2, each modifying data from the same session:

T1 makes some modifications
T2 makes some modifications
T1 saves the session (incl. both T1's and T2's modifications)
T2 makes some more modifications
T2 decides to roll back

At the end the content is inconsistent from the POV of T2, because some modifications have been persisted and some haven't.

This has nothing to do with synchronizing session access. If T2 wants a separate transient space, it should use a separate session. All we're trying to achieve here is ensure internal consistency even when clients do something like the above (for whatever reason, intentional or not).

To what avail? Quoting Thomas Müller: "In some cases, incorrect usage leads to data corruption. I believe data corruption is not acceptable, even if the user made a mistake."

Now you say actual data consistency is not the goal, but internal consistency is. All that end-users (not the ones doing the coding) really care about is data consistency. They don't care for a distinction between internal and external consistency.

Plus: this is *not* about users like my grandmother, who never touched a computer in her life. This is about programmers who must adhere to a programming model and to API contracts. If they do not, it is their fault and they have to live with the consequences of doing the wrong thing.

If a Session can do better when used concurrently, fine. But not by synchronizing all methods. E.g.: how about taking note of the current thread when the transient space is first used by a thread making modifications? As soon as another thread tries to use the same transient space, an exception might be thrown. This way the transient space is owned until refresh or commit. This is IMHO super-simple, fast and safe.

The only thing to care about -- and find a solution for -- is that the Session might effectively become read-only if a thread starts modifying content and then abandons it without commit or refresh.

Regards
Felix
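Felix's suggestion could look roughly like this (a hypothetical sketch; no such class exists in Jackrabbit): the first modifying thread claims the transient space, any other thread gets an exception instead of silent corruption, and save() or refresh() releases the claim. The release point is also where the "abandoned owner" problem he mentions would need handling, e.g. via a timeout.

```java
// Sketch of thread ownership for a session's transient space: the first
// modifying thread becomes the owner; other threads are rejected until
// save() or refresh() releases the claim.
public class OwnedTransientSpace {
    private Thread owner; // guarded by this

    private synchronized void claimOwnership() {
        Thread current = Thread.currentThread();
        if (owner == null) {
            owner = current;            // first modifier claims the space
        } else if (owner != current) {
            throw new IllegalStateException(
                "Transient space already owned by " + owner.getName());
        }
    }

    public void modify(String change) {
        claimOwnership();
        // ... apply the change to the transient space ...
    }

    public synchronized boolean isOwned() {
        return owner != null;
    }

    // save() and refresh() release ownership. If the owner abandons the
    // space without calling either, the session stays effectively
    // read-only for other threads - the open problem Felix notes.
    public synchronized void save()    { owner = null; /* persist changes */ }
    public synchronized void refresh() { owner = null; /* discard changes */ }
}
```

The check is one uncontended monitor acquisition per modification, which matches Felix's "super-simple, fast and safe" characterization while leaving read methods untouched.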