Re: [jr3] Synchronized sessions

2010-02-26 Thread Andrey Adamovich
Hi!

+1 for all that Thomas said.

 Andrey 





From: Thomas Müller thomas.muel...@day.com
To: dev@jackrabbit.apache.org
Sent: Thu, 25 February, 2010 22:24:13
Subject: Re: [jr3] Synchronized sessions

Hi

> http://issues.apache.org/jira/browse/JCR-2443.

Unfortunately this bug doesn't have a test case. Also, I didn't find a
thread dump that shows exactly what the problem was, so I can't say
what went wrong there.

Observation is definitely an area where synchronization can
potentially lead to deadlocks. Maybe observation needs to use its own
session(s) so that it can't block. This is not a new issue, however:
most writes are already synchronized (though not all of them).

I'm hesitant to change synchronization with the current
implementation: doing that would very likely lead to Java level
deadlocks. We need to make sure synchronization is always done on the
same level, and in the same order. With the current implementation,
that's challenging.

Of course performance and concurrency are very important. But the
current approach (mutable data structures, some writes are
synchronized) is quite dangerous. Instead, immutable data structures
should be used, at least for values and objects in the shared cache.
Everything else should be properly synchronized if mutable, or - if
that's too slow - the proper data structures should be used, for
example ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet.
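
As an illustration of the two suggestions above (immutable values in a shared cache, concurrent collections for the mutable parts), here is a minimal sketch. The class and method names (SharedCacheSketch, Entry, touch) are hypothetical and are not Jackrabbit code; only ConcurrentHashMap and CopyOnWriteArrayList come from the message.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch, not Jackrabbit code.
final class SharedCacheSketch {

    /** Immutable cache entry: all fields are final, an update creates a new object. */
    static final class Entry {
        final String id;
        final long revision;

        Entry(String id, long revision) {
            this.id = id;
            this.revision = revision;
        }

        Entry nextRevision() {
            return new Entry(id, revision + 1);
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final List<String> touchedIds = new CopyOnWriteArrayList<>();

    /** Atomically creates the entry at revision 1 or replaces it with its next revision. */
    Entry touch(String id) {
        Entry updated = cache.merge(id, new Entry(id, 1), (old, fresh) -> old.nextRevision());
        touchedIds.add(id); // safe without external locking
        return updated;
    }

    Entry get(String id) {
        return cache.get(id);
    }

    List<String> touchedIds() {
        return touchedIds;
    }
}
```

Because an Entry never changes after construction, a reader that obtained one from the cache cannot observe a half-applied update; only the map slot is swapped.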

Regards,
Thomas



  

Re: [jr3] Synchronized sessions

2010-02-26 Thread Jukka Zitting
Hi,

On Thu, Feb 25, 2010 at 10:24 PM, Thomas Müller thomas.muel...@day.com wrote:
>> http://issues.apache.org/jira/browse/JCR-2443.
>
> Unfortunately this bug doesn't have a test case. Also, I didn't find a
> thread dump that shows exactly what the problem was, so I can't say
> what went wrong there.

This problem was caused by observation delivery trying to synchronize
on the receiving session while that session was waiting for an
internal lock that the event source was still holding. I had a thread
dump and could reproduce this behaviour in our CQ5 product (search for
JCR-2443 in our issue tracker) where it occurred as a result of heavy
concurrent writing and observation, but writing a standalone test case
proved quite challenging.

The good thing about this is that without the synchronization we would
sooner or later have seen cases where the state of the internal map
got corrupted due to concurrent updates. Such problems would have been
much more difficult to troubleshoot than the deadlock that clearly
pointed to the incorrect locking order. I'm quite sure that some of
the hairier Jackrabbit issues we see reported may in fact be caused by
exactly such concurrent session access without proper synchronization.
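
To illustrate the "incorrect locking order" point, here is a minimal sketch of acquiring two locks in one fixed global order, so that two threads can never each hold one lock while waiting for the other. The names (OrderedLockingSketch, Resource, rank) are made up for this example; this is not the change that was made for JCR-2443.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Jackrabbit code.
final class OrderedLockingSketch {

    static final class Resource {
        final int rank; // position in the global lock order
        final ReentrantLock lock = new ReentrantLock();
        int counter;

        Resource(int rank) {
            this.rank = rank;
        }
    }

    /** Locks both resources in ascending rank order, whatever the argument order is. */
    static void updateBoth(Resource a, Resource b) {
        Resource first = a.rank <= b.rank ? a : b;
        Resource second = first == a ? b : a;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                a.counter++;
                b.counter++;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Calling updateBoth(session, source) on one thread and updateBoth(source, session) on another is safe here, because both calls end up taking the lower-ranked lock first.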

BR,

Jukka Zitting


[jira] Created: (JCR-2521) WorkspaceImporter throws exception

2010-02-26 Thread Tobias Bocanegra (JIRA)
WorkspaceImporter throws exception
----------------------------------

                 Key: JCR-2521
                 URL: https://issues.apache.org/jira/browse/JCR-2521
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
            Reporter: Tobias Bocanegra
            Assignee: Tobias Bocanegra


session.getWorkspace().getImportContentHandler() throws
java.lang.UnsupportedOperationException: Workspace-Import of protected nodes:
Not yet implement.

Suggest issuing a warning instead of throwing.
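
The suggested behaviour (warn and skip instead of throwing) could look roughly like the sketch below. LenientImporterSketch and importNode are hypothetical names, and the logging uses java.util.logging purely to keep the example self-contained; the real WorkspaceImporter is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch, not the actual WorkspaceImporter.
final class LenientImporterSketch {

    private static final Logger LOG =
            Logger.getLogger(LenientImporterSketch.class.getName());

    private final List<String> imported = new ArrayList<>();

    /** Returns true if the node was imported, false if it was skipped with a warning. */
    boolean importNode(String name, boolean isProtected) {
        if (isProtected) {
            // Previously this case would have thrown UnsupportedOperationException.
            LOG.warning("Workspace import of protected nodes is not implemented, skipping: " + name);
            return false;
        }
        imported.add(name);
        return true;
    }

    List<String> imported() {
        return imported;
    }
}
```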

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (JCR-2521) WorkspaceImporter throws exception

2010-02-26 Thread Tobias Bocanegra (JIRA)

[ https://issues.apache.org/jira/browse/JCR-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Bocanegra resolved JCR-2521.
-----------------------------------

   Resolution: Fixed
Fix Version/s: 2.1.0

Fixed as suggested.

> WorkspaceImporter throws exception
> ----------------------------------
>
>                 Key: JCR-2521
>                 URL: https://issues.apache.org/jira/browse/JCR-2521
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core
>            Reporter: Tobias Bocanegra
>            Assignee: Tobias Bocanegra
>             Fix For: 2.1.0
>
>
> session.getWorkspace().getImportContentHandler() throws
> java.lang.UnsupportedOperationException: Workspace-Import of protected nodes:
> Not yet implement.
> Suggest issuing a warning instead of throwing.




[jira] Created: (JCR-2522) unable to workspace import XML.

2010-02-26 Thread Tobias Bocanegra (JIRA)
unable to workspace import XML.
-------------------------------

                 Key: JCR-2522
                 URL: https://issues.apache.org/jira/browse/JCR-2522
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-jcr-server
            Reporter: Tobias Bocanegra
            Assignee: Tobias Bocanegra


Tika detects XML as application/xml, which breaks
org.apache.jackrabbit.server.io.XmlHandler
because it only checks for text/xml.




[jira] Resolved: (JCR-2522) unable to workspace import XML.

2010-02-26 Thread Tobias Bocanegra (JIRA)

[ https://issues.apache.org/jira/browse/JCR-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Bocanegra resolved JCR-2522.
-----------------------------------

   Resolution: Fixed
Fix Version/s: 2.1.0

Fixed by accepting a secondary MIME type: application/xml
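
A check that accepts both MIME types might look like the sketch below. XmlMimeTypes and isXml are hypothetical names chosen for this example; this is not the actual XmlHandler code, and the parameter stripping (for values such as "; charset=UTF-8") is an extra assumption about what callers may pass in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the actual XmlHandler.
final class XmlMimeTypes {

    private static final Set<String> XML_TYPES =
            new HashSet<>(Arrays.asList("text/xml", "application/xml"));

    /** Returns true for text/xml and application/xml, ignoring case and parameters. */
    static boolean isXml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return XML_TYPES.contains(base.trim().toLowerCase(Locale.ROOT));
    }
}
```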

> unable to workspace import XML.
> -------------------------------
>
>                 Key: JCR-2522
>                 URL: https://issues.apache.org/jira/browse/JCR-2522
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-jcr-server
>            Reporter: Tobias Bocanegra
>            Assignee: Tobias Bocanegra
>             Fix For: 2.1.0
>
>
> Tika detects XML as application/xml, which breaks
> org.apache.jackrabbit.server.io.XmlHandler
> because it only checks for text/xml.




Re: [jr3] EventJournal / who merges changes

2010-02-26 Thread Ian Boston
Sorry for top posting, I am not certain where to put this request.

Currently adding child nodes is almost serialised since it's not possible to 
merge concurrent changes in a single multi-valued property.
*If* MVCC with abort on conflict is going to make this situation worse, then 
that IMHO would be a mistake.
If, however, the probability of conflict when updating a multi-valued property 
is reduced, then that would be good (i.e. giving certain properties a different 
storage layout that avoids conflicts; I think you allude to this).

E.g.:
At the moment (JR16), when adding users to our Jackrabbit (Sling) based system, 
we have to do this single-threaded to avoid conflicts, since even with 3 
threads, conflicts are far too common. To reduce contention we put the new nodes 
in a sharded tree (e.g. .../ff/ff/ff/ff/user_node), but we still get lots of 
contention, estimated at 1 in 20 operations for the first 20K users, and worse at 
the start. (BTW, the number of users ranges from 10K to 4M.)
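
For readers unfamiliar with the sharded layout mentioned above, a path of that shape can be derived from a hash of the user id, roughly as in this sketch. The choice of SHA-1, the class name ShardedPathSketch and the "/users" root in the usage note are assumptions made for the example; the message does not say how the shard segments are actually computed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; the hash function and names are assumptions.
final class ShardedPathSketch {

    /** Builds root/xx/xx/xx/xx/userId from the first four bytes of a hash of userId. */
    static String shardedPath(String root, String userId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        StringBuilder path = new StringBuilder(root);
        for (int i = 0; i < 4; i++) {
            // Two lowercase hex digits per shard level.
            path.append('/').append(String.format("%02x", digest[i] & 0xff));
        }
        return path.append('/').append(userId).toString();
    }
}
```

For example, shardedPath("/users", "alice") yields a path with four two-character hex segments between "/users" and "alice". Sharding spreads new nodes over many parents, but as noted above, the upper levels are still shared while the tree is sparse, which is where the early contention comes from.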

Ian
On 25 Feb 2010, at 13:38, Thomas Müller wrote:

> There are low-level and high-level merges. A low-level
> merge is problematic: it can result in unexpected behavior. I would
> even say the way Jackrabbit merges changes currently (by looking at
> the data itself, not at the operations) is problematic.
>
> Example: Currently, orderBefore cannot be done at the same time as
> addNode or another orderBefore. I'm not saying this is important, but
> it's one case that is complex. Another example: Let's say the low-level
> representation would split nodes if there are more than 1000
> child nodes (add one layer of hidden internal nodes). That means
> adding a node to a list of 1000 nodes could cause a (b-tree-) split.
> If two sessions do that concurrently it will get messy. Session 1 will
> create new internal nodes, session 2 will create new internal nodes as
> well (but different ones), and merging the result will (probably)
> duplicate all 1000 nodes. Or worse.
>
> The idea is to _not_ try to merge by looking at the data, but merge by
> re-applying the operation. If saving the new data fails (by looking at
> the timestamp/version numbers), then refresh the data, and re-apply
> the operation (orderBefore, addNode,...). This is relatively easy
> to implement, and works in more cases than what Jackrabbit can do now.
> Jackrabbit needs to keep the EventJournal anyway, so this will not
> use more memory.
>
> This is not a new idea, it is how MVCC works (at least how I
> understand it). From
> http://en.wikipedia.org/wiki/Multiversion_concurrency_control - "if a
> transaction [fails], the transaction ... is aborted and restarted."
>
> Regards,
> Thomas
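
The "refresh and re-apply the operation" idea quoted above can be sketched as an optimistic retry loop. This is only an illustration under simplifying assumptions: the whole child list is one immutable snapshot, and an atomic reference swap stands in for the timestamp/version check. ReapplyOnConflictSketch and its members are hypothetical names, not a proposed Jackrabbit API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch of "merge by re-applying the operation".
final class ReapplyOnConflictSketch {

    /** Immutable snapshot of a child list together with its version number. */
    static final class Snapshot {
        final long version;
        final List<String> children;

        Snapshot(long version, List<String> children) {
            this.version = version;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }
    }

    private final AtomicReference<Snapshot> head =
            new AtomicReference<>(new Snapshot(0, new ArrayList<>()));

    Snapshot read() {
        return head.get();
    }

    /**
     * Applies the operation to a private copy and tries to publish it.
     * If another writer published first, refreshes and re-applies the operation.
     * Returns the number of attempts that were needed.
     */
    int apply(UnaryOperator<List<String>> operation) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot base = head.get();                        // refresh
            List<String> working = new ArrayList<>(base.children);
            List<String> result = operation.apply(working);    // (re-)apply
            Snapshot next = new Snapshot(base.version + 1, result);
            if (head.compareAndSet(base, next)) {              // fails if base is stale
                return attempts;
            }
        }
    }
}
```

The operation (for example "add node x") is kept as code rather than as a data diff, so a conflicting save never has to merge two child lists; it simply runs the operation again on the refreshed list.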



Re: [jr3] EventJournal / who merges changes

2010-02-26 Thread Thomas Müller
Hi Ian,

Could you describe your use case?

> probability of conflict when updating a multivalued property is reduced

What methods do you call, and how should the conflict be resolved?
Example: if you currently use the following code:

1) session1.getNode("test").setProperty("multi", new String[]{"a", "b"},..);
2) session2.getNode("test").setProperty("multi", new String[]{"d", "e"},..);
3) session1.save();
4) session2.save();

Then that would be a conflict. How would you resolve it? One option is
to silently overwrite in line 4.
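
The two possible outcomes for line 4 (silently overwrite, or report a conflict) can be made concrete with a small model. This is a sketch only: MultiValuePropertySketch, Policy and the explicit base version are hypothetical and do not correspond to the JCR API, where the base state is tracked by the session.

```java
// Hypothetical model of a multi-valued property with a version counter.
final class MultiValuePropertySketch {

    enum Policy { OVERWRITE, FAIL_ON_CONFLICT }

    private String[] values = new String[0];
    private long version = 0;

    synchronized long version() {
        return version;
    }

    synchronized String[] values() {
        return values.clone();
    }

    /**
     * Saves values that were prepared against baseVersion.
     * Returns false if the base is stale and the policy forbids overwriting.
     */
    synchronized boolean save(long baseVersion, String[] newValues, Policy policy) {
        boolean stale = baseVersion != version;
        if (stale && policy == Policy.FAIL_ON_CONFLICT) {
            return false;
        }
        values = newValues.clone();
        version++;
        return true;
    }
}
```

With FAIL_ON_CONFLICT, the second save in the example above would be rejected and the caller would have to refresh and decide what to do; with OVERWRITE, the values from session2 replace those from session1 without notice.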

Regards,
Thomas


Re: [jr3] Synchronized sessions

2010-02-26 Thread Marcel Reutegger
Hi,

On Thu, Feb 25, 2010 at 19:14, Felix Meschberger fmesc...@gmail.com wrote:
> Hi,
>
> On 25.02.2010 17:55, Marcel Reutegger wrote:
>> Hi,
>>
>> On Thu, Feb 25, 2010 at 15:49, Felix Meschberger fmesc...@gmail.com wrote:
>>> Hi,
>>>
>>> On 24.02.2010 21:19, Thomas Müller wrote:
>>>> Hi,
>>>>
>>>>> deadlocks
>>>>
>>>> I think it's relatively simple to synchronize all methods on the session.
>>>
>>> Yes, but this creates a big potential for deadlocks ...
>>>
>>>>> If we want to make sessions thread-safe, we should use proper
>>>>> implementations.
>>>>
>>>> Yes, that's what I want to write: a proper implementation.
>>>
>>> I disagree that this would be a proper implementation.
>>
>> can you please elaborate what you think is a proper implementation in
>> this context?
>
> Just off-the-top-of-my-head: Using a better read-mostly guarding locking
> mechanism (i.e. readers don't block each other, writers need exclusive
> access [still not entirely safe]); not at the global method level, but
> more intelligently guarding the shared data; not using the Session
> object itself for locking

that's pretty much what we do currently. we have read-write locks in
various places. the problem with that approach is that the sequence of
lock acquisition is very important. such fine-grained locking is very
difficult to control and has led to various deadlock situations that were
hard to analyse and fix. IMO we should get rid of as many of those
locks as we can and synchronize on a more coarse-grained level, which
is easier to maintain and more predictable.
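
For reference, the read-write locking style under discussion (readers don't block each other, writers are exclusive, and the lock is a private object rather than the Session itself) looks roughly like the sketch below. GuardedStateSketch is a hypothetical class for illustration, not Jackrabbit code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a read-write lock guarding shared data.
final class GuardedStateSketch {

    // Private lock object: callers cannot accidentally synchronize on it.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> states = new HashMap<>();

    /** Readers share the read lock and do not block each other. */
    String get(String id) {
        lock.readLock().lock();
        try {
            return states.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writers take the write lock and get exclusive access. */
    void put(String id, String state) {
        lock.writeLock().lock();
        try {
            states.put(id, state);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

One such lock is easy to reason about. The difficulty described above appears once several objects each carry their own lock and a single operation has to take more than one of them, because every code path must then agree on the acquisition order.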

regards
 marcel

> Regards
> Felix
>
>> regards
>>  marcel
>>
>>>>> any concurrent use of the same session is unsupported.
>>>>
>>>> The disadvantage of this is that there is no way to enforce correct
>>>> usage. In some cases, incorrect usage leads to data corruption. I
>>>> believe data corruption is not acceptable, even if the user made a
>>>> mistake.
>>>
>>> Anything can go wrong -- and if people do the wrong things, well, fine,
>>> let them do ...
>>>
>>> And I don't say, we should not make Session thread-safe. But if we set
>>> out to do it, we should do it right. And just synchronizing all methods
>>> is just not right.
>>>
>>> Regards
>>> Felix







Re: [jr3] Synchronized sessions

2010-02-26 Thread Felix Meschberger
Hi,

On 26.02.2010 14:38, Marcel Reutegger wrote:
> Hi,
>
> On Thu, Feb 25, 2010 at 19:14, Felix Meschberger fmesc...@gmail.com wrote:
>> Hi,
>>
>> On 25.02.2010 17:55, Marcel Reutegger wrote:
>>> Hi,
>>>
>>> On Thu, Feb 25, 2010 at 15:49, Felix Meschberger fmesc...@gmail.com wrote:
>>>> Hi,
>>>>
>>>> On 24.02.2010 21:19, Thomas Müller wrote:
>>>>> Hi,
>>>>>
>>>>>> deadlocks
>>>>>
>>>>> I think it's relatively simple to synchronize all methods on the session.
>>>>
>>>> Yes, but this creates a big potential for deadlocks ...
>>>>
>>>>>> If we want to make sessions thread-safe, we should use proper
>>>>>> implementations.
>>>>>
>>>>> Yes, that's what I want to write: a proper implementation.
>>>>
>>>> I disagree that this would be a proper implementation.
>>>
>>> can you please elaborate what you think is a proper implementation in
>>> this context?
>>
>> Just off-the-top-of-my-head: Using a better read-mostly guarding locking
>> mechanism (i.e. readers don't block each other, writers need exclusive
>> access [still not entirely safe]); not at the global method level, but
>> more intelligently guarding the shared data; not using the Session
>> object itself for locking
>
> that's pretty much what we do currently. we have read-write locks in
> various places. the problem with that approach is that the sequence of
> lock acquisition is very important. such fine-grained locking is very
> difficult to control and has led to various deadlock situations that were
> hard to analyse and fix. IMO we should get rid of as many of those
> locks as we can and synchronize on a more coarse-grained level, which
> is easier to maintain and more predictable.

The problem is that the code is very complicated, not to say convoluted
in places. Combined with multiple locking mechanisms (JVM locks and
read-write locks) this creates much potential for deadlocks, right.

But to take the sledgehammer of synchronizing all methods out of the
box IMVHO is the wrong way to go.

Regards
Felix

 
> regards
>  marcel
>
>> Regards
>> Felix
>>
>>> regards
>>>  marcel
>>>
>>>>>> any concurrent use of the same session is unsupported.
>>>>>
>>>>> The disadvantage of this is that there is no way to enforce correct
>>>>> usage. In some cases, incorrect usage leads to data corruption. I
>>>>> believe data corruption is not acceptable, even if the user made a
>>>>> mistake.
>>>>
>>>> Anything can go wrong -- and if people do the wrong things, well, fine,
>>>> let them do ...
>>>>
>>>> And I don't say, we should not make Session thread-safe. But if we set
>>>> out to do it, we should do it right. And just synchronizing all methods
>>>> is just not right.
>>>>
>>>> Regards
>>>> Felix





 



Re: [jr3] Synchronized sessions

2010-02-26 Thread Jukka Zitting
Hi,

On Fri, Feb 26, 2010 at 6:36 PM, Felix Meschberger fmesc...@gmail.com wrote:
> Consider two threads T1 and T2 each modifying data from the same session:
>
>   T1 makes some modifications
>   T2 makes some modifications
>   T1 saves the session (incl. both T1's and T2's modifications)
>   T2 makes some more modifications
>   T2 decides to roll back
>
> At the end the content is inconsistent from the POV of T2 because some
> modifications have been persisted and some haven't.

This has nothing to do with synchronizing session access. If T2 wants
a separate transient space, it should use a separate session.

All we're trying to achieve here is ensure internal consistency even
when clients do something like the above (for whatever reason,
intentional or not).

BR,

Jukka Zitting


Re: [jr3] Synchronized sessions

2010-02-26 Thread Guo Du
On Fri, Feb 26, 2010 at 6:11 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
> All we're trying to achieve here is ensure internal consistency even
> when clients do something like the above (for whatever reason,
> intentional or not).

A JDBC connection is not thread safe.

A JCR session works in a similar way, and I would prefer to follow the
same pattern.

We should encourage developers to do the right thing instead of
technically encouraging bad design. A shared session is only useful
for read access in some cases; for writes, the session should not be
shared.

-Guo


[jira] Updated: (JCR-2426) Deadlock in lucene (Jackrabbit 1.4.4)

2010-02-26 Thread Antonio Martinez (JIRA)

[ https://issues.apache.org/jira/browse/JCR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio Martinez updated JCR-2426:
----------------------------------

    Attachment: deadlock_jackrabbit1.6.txt

> Deadlock in lucene (Jackrabbit 1.4.4)
> -------------------------------------
>
>                 Key: JCR-2426
>                 URL: https://issues.apache.org/jira/browse/JCR-2426
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: core 1.4.4
>            Reporter: Antonio Martinez
>            Priority: Blocker
>         Attachments: deadlock_2nd_setup.txt, deadlock_jackrabbit1.6.txt,
>                      deadlock_summary.txt
>
>
> We get a deadlock in the Lucene part of Jackrabbit (see deadlock_summary.txt).
> This issue has been observed in two different production setups running
> Jackrabbit 1.4.4 in a cluster configuration.




Re: [jr3] Synchronized sessions

2010-02-26 Thread Felix Meschberger
Hi,

On 26.02.2010 19:11, Jukka Zitting wrote:
> Hi,
>
> On Fri, Feb 26, 2010 at 6:36 PM, Felix Meschberger fmesc...@gmail.com wrote:
>> Consider two threads T1 and T2 each modifying data from the same session:
>>
>>   T1 makes some modifications
>>   T2 makes some modifications
>>   T1 saves the session (incl. both T1's and T2's modifications)
>>   T2 makes some more modifications
>>   T2 decides to roll back
>>
>> At the end the content is inconsistent from the POV of T2 because some
>> modifications have been persisted and some haven't.
>
> This has nothing to do with synchronizing session access. If T2 wants
> a separate transient space, it should use a separate session.
>
> All we're trying to achieve here is ensure internal consistency even
> when clients do something like the above (for whatever reason,
> intentional or not).

To what avail?

Quoting Thomas Müller:

> In some cases, incorrect usage leads to data corruption. I
> believe data corruption is not acceptable, even if the user made a
> mistake.

Now, you say, actual data consistency is not the goal, but internal
consistency is. All end-users (not the ones doing the coding) really
care about is data consistency. They don't care about a distinction
between internal and external consistency.

Plus: This is *not* about users like my grand-mother who never touched a
computer in her life. This is about programmers who must adhere to a
programming model and to API contracts. If they do not, it is their
fault and they have to live with the consequences of their doing the
wrong thing.

If a Session can do better when used concurrently, fine. But not with
synchronizing all methods.

E.g.: How about taking note of the current thread when the transient
space is first used by a thread making modifications? As soon as another
thread tries to use the same transient space, an exception might be
thrown. This way the transient space is owned by a thread until
refresh or commit.

This is IMHO super-simple, fast and safe.

The only thing to care about -- and find a solution -- is, that the
Session might effectively become read-only if a thread starts modifying
content and then abandons without commit or refresh.
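
A minimal sketch of this ownership idea, assuming the check is called before every modification and the release after every save or refresh. TransientSpaceOwnerSketch and its method names are hypothetical; how this would hook into the real Session implementation is left open.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of "the first modifying thread owns the transient space".
final class TransientSpaceOwnerSketch {

    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** To be called before each modification of the transient space. */
    void beforeModify() {
        Thread current = Thread.currentThread();
        if (owner.get() == current) {
            return; // already owned by this thread
        }
        if (!owner.compareAndSet(null, current)) {
            throw new IllegalStateException(
                    "transient space is in use by another thread");
        }
    }

    /** To be called after save or refresh; only the owner releases ownership. */
    void afterSaveOrRefresh() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

The check costs one atomic read on the fast path. It also shows the open problem mentioned above: if the owning thread never calls afterSaveOrRefresh(), every other thread keeps getting the exception.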

Regards
Felix