On 10/09/2014 03:40 PM, William Burns wrote:
> Actually, this is something I was hoping to get to in the near
> future.
>
> I already have to do https://issues.jboss.org/browse/ISPN-4358, which
> will require rewriting parts of the distributed entry iterator.  In
> doing so I was planning to break this out into a more generic
> framework where you could run a given operation by segment,
> guaranteeing it is run only once per entry.  On top of that I was
> thinking I could move M/R onto this framework so that it also becomes
> resilient to rehash events.
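
For what it's worth, here is roughly how I picture such a per-segment
operation (all names here are made up, this is just a sketch of the idea):

    // Hypothetical interface for the generic per-segment framework:
    // the coordinator invokes it on whichever node currently owns the
    // segment and retries per segment on rehash, so each entry is
    // processed exactly once.
    interface SegmentOperation<K, V, R> {
        // Runs on the segment's current primary owner.
        R apply(int segmentId, Map<K, V> segmentEntries);
    }
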
>
> Additional comments inline.
>
> On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard <emman...@hibernate.org> 
> wrote:
>> Pedro and I have been having discussions with the LEADS guys about their
>> experience with Map/Reduce, especially around stability during topology
>> changes.
>>
>> This ties into the .size() thread you guys have been exchanging on (I
>> could only read it partially).
>>
>> On the requirements, theirs is pretty straightforward and, I think,
>> expected by most users.
>> They are fine with inconsistencies for entries created/updated/deleted
>> between the start and the end of the M/R execution.
>
> There is no way we can fix this without adding a very strict isolation
> level like SERIALIZABLE.

Snapshot Isolation should be fine, but I don't want to get into a
discussion about it right now :)

>
>> They are *not* fine with seeing the same key/value several times for the
>> duration of the M/R execution. AFAIK this can happen when a topology
>> change occurs.
>
> This can happen if an entry was processed on one node and a rehash
> then migrates it to another node, which processes it again.
>
>>
>> Here is a proposal.
>> Why not run the M/R job per segment rather than per node?
>> The point is that segments are stable across topology changes. The M/R
>> tasks would then iterate over the keys in a given segment.
>>
>> The M/R request would send one task per segment to the node where that
>> segment is primary.
>
> This is exactly what the iterator does today, but it also watches for
> rehashes and sends the request to the new owner when a segment moves
> between nodes.
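
To make the dispatch part concrete, something along these lines
(assuming the ConsistentHash API we have today, e.g.
locatePrimaryOwnerForSegment; the surrounding names are made up):

    // Group segments by their current primary owner so we can send a
    // single task per node covering all of its primary segments.  On
    // a rehash, only the segments that moved need to be re-sent.
    ConsistentHash ch = distributionManager.getReadConsistentHash();
    Map<Address, Set<Integer>> segmentsByOwner =
          new HashMap<Address, Set<Integer>>();
    for (int segment = 0; segment < ch.getNumSegments(); segment++) {
       Address primary = ch.locatePrimaryOwnerForSegment(segment);
       Set<Integer> segments = segmentsByOwner.get(primary);
       if (segments == null) {
          segments = new HashSet<Integer>();
          segmentsByOwner.put(primary, segments);
       }
       segments.add(segment);
    }
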
>
>> (We can imagine interesting things like sending it to one of the backups
>> for workload optimization purposes, or sending it to both the primary and
>> the backups to compare results).
>> The M/R requester would be in an interesting situation. It could detect
>> that a segment's M/R never returns and trigger a new computation on a
>> node other than the one initially targeted.
>>
>> One tricky question around that is what happens when the M/R job stores
>> data in an intermediary state. We need some way to expose the user
>> indirectly to segments so that we can evict per-segment intermediary
>> caches in case of failure or retry.
>
> This is one place I was thinking would need special care when doing a
> conversion like this.
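
One way to hide segments from the user here might be to key the
intermediate values by (segment, key), so everything a failed or
retried segment produced can be dropped wholesale.  A rough sketch
(the class and its use are hypothetical):

    // Hypothetical key for intermediate M/R values: including the
    // segment id lets us evict a failed segment's partial results
    // without touching results from segments that completed.
    final class IntermediateKey<K> {
       final int segmentId; // segment that produced this value
       final K key;         // the user's intermediate key

       IntermediateKey(int segmentId, K key) {
          this.segmentId = segmentId;
          this.key = key;
       }
       // equals()/hashCode() over (segmentId, key) omitted for brevity
    }
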
>
>>
>> But before getting ahead of ourselves, what do you think of the general
>> idea? Even without a retry framework, this approach would be more stable
>> than our current per-node approach during topology changes and would
>> improve dependability.
>
> Doing it solely based on segments would remove the possibility of
> duplicates.  However, without a mechanism to send a new request on
> rehash, it would be possible to find only a subset of the values (if a
> segment is removed from a node while we are iterating over it).

True. I think the retry mechanism is the best approach. The other
alternative would be to implement a Map<K,V> getBySegment(int)
operation that goes remote if the segment is not local.
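
To illustrate the retry idea, the requester could look roughly like
this (all helper names here are hypothetical, it is just the shape of
the loop):

    // Keep the set of segments still missing a result and, whenever
    // the topology changes, re-send those segments to their new
    // primary owners.  A segment's result is only accepted once, so
    // retries cannot introduce duplicates.
    Set<Integer> pending = allSegments();            // hypothetical
    while (!pending.isEmpty()) {
       Map<Address, Set<Integer>> targets = groupByPrimaryOwner(pending);
       for (Map.Entry<Address, Set<Integer>> e : targets.entrySet()) {
          sendSegmentTask(e.getKey(), e.getValue()); // hypothetical RPC
       }
       // Blocks until results arrive or a rehash invalidates some of
       // the in-flight segments.
       pending.removeAll(completedSegments());
    }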

>
>>
>> Emmanuel
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
