Re: [jr3] EventJournal / who merges changes

2010-02-27 Thread Stefan Guggisberg
On Sat, Feb 27, 2010 at 11:45 AM, Ian Boston  wrote:
>
> On 26 Feb 2010, at 12:43, Thomas Müller wrote:
>
>> Hi Ian,
>>
>> Could you describe your use case?
>
> Multiple threads adding child nodes to the same parent node, which IIUC are 
> stored as multivalue properties in 1.x

not quite, the parent-child relationship is currently (i.e. jr
1.x-2.0) stored as a list of name-uuid pairs in the parent state.

cheers
stefan

>
> For instance, more than 1 thread calling 
> UserManager.createUser(userId,shardPath(useId)) where shardPath(userId) 
> results in a subtree generated from the userId to reduce contention, although 
> we still see it.
>
>
>>
>>> probability of conflict when updating a multivalued property is reduced
>>
>> What methods do you call, and how should the conflict be resolved?
>> Example: if you currently use the following code:
>>
>> 1) session1.getNode("test").setProperty("multi", new String[]{"a", "b"},..);
>> 2) session2.getNode("test").setProperty("multi", new String[]{"d", "e"},..);
>> 3) session1.save();
>> 4) session2.save();
>>
>> Then that would be a conflict. How would you resolve it? One option is
>> to silently overwrite in line 4.
>
> yes although this is not really a big problem for us since the application 
> code can do something.
> In the past we have used a custom in JVM lock manager to control this type of 
> update where we know there is high concurrency. We avoided cluster wide JCR 
> locks since they generate extra cluster traffic at a latency that doesn't 
> really help the locking problem.
>
> To recap, the reason I said multivalue is because I thought child nodes are 
> essentially multivalue properties in 1.x,

that's not correct, see my previous comment.

cheers
stefan

> not because we see conflicts with setProperty.
>
> Ian
>
>
>>
>> Regards,
>> Thomas
>
>


Re: [jr3] EventJournal / who merges changes

2010-02-27 Thread Thomas Müller
Hi,

> Multiple threads adding child nodes to the same parent node

Yes, that's an important use case, and should not be a problem problem
for my proposed solution.

> For instance, more than 1 thread calling 
> UserManager.createUser(userId,shardPath(useId)) where shardPath(userId) 
> results in a subtree generated from the userId to reduce contention

If we support large child node lists (automatically split using hidden
inner nodes) then your application would get simpler.

> child nodes are essentially multivalue properties

You are right, internally child nodes are stored in a similar way currently.

Regards,
Thomas


Re: [jr3] EventJournal / who merges changes

2010-02-27 Thread Ian Boston

On 26 Feb 2010, at 12:43, Thomas Müller wrote:

> Hi Ian,
> 
> Could you describe your use case?

Multiple threads adding child nodes to the same parent node, which IIUC are 
stored as multivalue properties in 1.x 

For instance, more than 1 thread calling 
UserManager.createUser(userId,shardPath(useId)) where shardPath(userId) results 
in a subtree generated from the userId to reduce contention, although we still 
see it.


> 
>> probability of conflict when updating a multivalued property is reduced
> 
> What methods do you call, and how should the conflict be resolved?
> Example: if you currently use the following code:
> 
> 1) session1.getNode("test").setProperty("multi", new String[]{"a", "b"},..);
> 2) session2.getNode("test").setProperty("multi", new String[]{"d", "e"},..);
> 3) session1.save();
> 4) session2.save();
> 
> Then that would be a conflict. How would you resolve it? One option is
> to silently overwrite in line 4.

yes although this is not really a big problem for us since the application code 
can do something.
In the past we have used a custom in JVM lock manager to control this type of 
update where we know there is high concurrency. We avoided cluster wide JCR 
locks since they generate extra cluster traffic at a latency that doesn't 
really help the locking problem.

To recap, the reason I said multivalue is because I thought child nodes are 
essentially multivalue properties in 1.x, not because we see conflicts with 
setProperty.

Ian


> 
> Regards,
> Thomas



Re: [jr3] EventJournal / who merges changes

2010-02-26 Thread Thomas Müller
Hi Ian,

Could you describe your use case?

> probability of conflict when updating a multivalued property is reduced

What methods do you call, and how should the conflict be resolved?
Example: if you currently use the following code:

1) session1.getNode("test").setProperty("multi", new String[]{"a", "b"},..);
2) session2.getNode("test").setProperty("multi", new String[]{"d", "e"},..);
3) session1.save();
4) session2.save();

Then that would be a conflict. How would you resolve it? One option is
to silently overwrite in line 4.

Regards,
Thomas


Re: [jr3] EventJournal / who merges changes

2010-02-26 Thread Ian Boston
Sorry for top posting, I am not certain where to put this "request".

Currently adding child nodes is almost serialised since its not possible to 
merge concurrent changes in a single multi valued property.
*If* MVCC with abort on conflict is going to make this situation worse, then 
that IMHO would be a mistake.
If however the the probability of conflict when updating a multivalued property 
is reduced then that would be good. (ie giving certain properties a different 
storage layout that avoided conflicts, I think you elude to this)

eg
At the moment, (JR16) when adding users to our jackrabbit (Sling) based system, 
we have to do this single threaded to avoid conflicts, since even with 3 
threads, conflicts are far too common. To reduce contention put the new nodes 
sharded tree (eg .../ff/ff/ff/ff/user_node ), but we still get lots of 
contention, estimated at 1 in 20 operations for the first 20K users, worse at 
the start. (btw, num of users ranges 10K ->4M).

Ian
On 25 Feb 2010, at 13:38, Thomas Müller wrote:

> There are "low level merge" and "high level merge". A "low level
> merge" is problematic it can result in unexpected behavior. I would
> even say the way Jackrabbit merges changes currently (by looking at
> the data itself, not at the operations) is problematic.
> 
> Example: Currently, orderBefore can not be done at the same time as
> addNode or another orderBefore. I'm not saying this is important, but
> it's one case that is complex. Another example: Let's say the low
> level representation would split nodes if here are more than 1000
> child nodes (add one layer of hidden internal nodes). That means
> adding a node to a list of 1000 nodes could cause a (b-tree-) split.
> If two sessions do that concurrently it will get messy. Session 1 will
> create new internal nodes, session 2 will create new internal nodes as
> well (but different ones), and merging the result will (probably)
> duplicate all 1000 nodes. Or worse.
> 
> The idea is to _not_ try to merge by looking at the data, but merge by
> re-applying the operation. If saving the new data fails (by looking at
> the timestamp/version numbers), then refresh the data, and re-apply
> the operation ("orderBefore", "addNode",...). This is relatively easy
> to implement, and works in more cases than what Jackrabbit can do now.
> Jackrabbit anyway needs to keep the EventJournal, so this is will not
> use more memory.
> 
> This is not a new idea, it is how MVCC works (at least how I
> understand it). From
> http://en.wikipedia.org/wiki/Multiversion_concurrency_control  - "if a
> transaction [fails], the transaction ... is aborted and restarted."
> 
> Regards,
> Thomas



Re: [jr3] EventJournal / who merges changes

2010-02-25 Thread Thomas Müller
There are "low level merge" and "high level merge". A "low level
merge" is problematic it can result in unexpected behavior. I would
even say the way Jackrabbit merges changes currently (by looking at
the data itself, not at the operations) is problematic.

Example: Currently, orderBefore can not be done at the same time as
addNode or another orderBefore. I'm not saying this is important, but
it's one case that is complex. Another example: Let's say the low
level representation would split nodes if here are more than 1000
child nodes (add one layer of hidden internal nodes). That means
adding a node to a list of 1000 nodes could cause a (b-tree-) split.
If two sessions do that concurrently it will get messy. Session 1 will
create new internal nodes, session 2 will create new internal nodes as
well (but different ones), and merging the result will (probably)
duplicate all 1000 nodes. Or worse.

The idea is to _not_ try to merge by looking at the data, but merge by
re-applying the operation. If saving the new data fails (by looking at
the timestamp/version numbers), then refresh the data, and re-apply
the operation ("orderBefore", "addNode",...). This is relatively easy
to implement, and works in more cases than what Jackrabbit can do now.
Jackrabbit anyway needs to keep the EventJournal, so this is will not
use more memory.

This is not a new idea, it is how MVCC works (at least how I
understand it). From
http://en.wikipedia.org/wiki/Multiversion_concurrency_control  - "if a
transaction [fails], the transaction ... is aborted and restarted."

Regards,
Thomas


Re: [jr3] EventJournal / who merges changes

2010-02-25 Thread Michael Dürig

Alexander Klimetschek wrote:

On Thu, Feb 25, 2010 at 13:28, Michael Dürig  wrote:

I like this approach. In general I think merging is very difficult if not
impossible at all to get right and will always be based on some assumptions
(intended semantics). I think we should not put this burden onto the
micro-kernel but rather leave it to higher levels (or even end users) to do
the/some merging.


Doing so wouldn't prevent a specific micro-kernel implementation from
still trying to do a merge? Because I think this could still be
interesting when following eventual consistency models with
distributed cluster nodes.



Sure but I think we should restrict this to the cases where 1) the merge 
semantics is 'obivious' and 2) the performance hit is neglectable.


Michael


Re: [jr3] EventJournal / who merges changes

2010-02-25 Thread Alexander Klimetschek
On Thu, Feb 25, 2010 at 13:28, Michael Dürig  wrote:
> I like this approach. In general I think merging is very difficult if not
> impossible at all to get right and will always be based on some assumptions
> (intended semantics). I think we should not put this burden onto the
> micro-kernel but rather leave it to higher levels (or even end users) to do
> the/some merging.

Doing so wouldn't prevent a specific micro-kernel implementation from
still trying to do a merge? Because I think this could still be
interesting when following eventual consistency models with
distributed cluster nodes.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com


Re: [jr3] EventJournal / who merges changes

2010-02-25 Thread Michael Dürig


Thomas,


== Proposed Solution ==

When adding/changing/removing a property or node, the logical
operation should be recorded on a high level ("this node was added",
"this node was moved from here to there", "this property was added"),
first in memory, but when there are changes, it needs to be persisted
(possibly only temporarily).

When committing a transaction (usually Session.save()), the
micro-kernel tries to apply the changes. If there was a conflict, the
micro-kernel rejects the changes (it doesn't try to merge). The higher
level then has to deal with that. One way to deal with conflict
resolution is:

1) Reload the current persistent state (undo all changes, load the new data).

2) Replay the logical operations from the (in-memory or persisted) journal.

3) If that fails again, depending on a timeout, go to 1) or fail.


I like this approach. In general I think merging is very difficult if 
not impossible at all to get right and will always be based on some 
assumptions (intended semantics). I think we should not put this burden 
onto the micro-kernel but rather leave it to higher levels (or even end 
users) to do the/some merging.


Regarding 3) I'd include detailed error information on failure: which 
operations are part of the conflict and what constitutes the conflict. 
This makes it easier for the consuming API (end user) to resolve conflicts.



== API ==

Instead of the current API that requires the change log to be in
memory, I suggest to use iterators:


Maybe better iterables? More flexible if multiple playbacks are needed.

Michael