On 10/09/2014 03:40 PM, William Burns wrote: > Actually this was something I was hoping to get to possibly in the near > future. > > I already have to do https://issues.jboss.org/browse/ISPN-4358 which > will require rewriting parts of the distributed entry iterator. In > doing so I was planning on breaking this out to a more generic > framework where you could run a given operation by segment > guaranteeing it was only ran once per entry. In doing so I was > thinking I could try to move M/R on top of this to allow it to also be > resilient to rehash events. > > Additional comments inline. > > On Thu, Oct 9, 2014 at 8:18 AM, Emmanuel Bernard <emman...@hibernate.org> > wrote: >> Pedro and I have been having discussions with the LEADS guys on their >> experience of Map / Reduce especially around stability during topology >> changes. >> >> This ties to the .size() thread you guys have been exchanging on (I only >> could read it partially). >> >> On the requirements, theirs is pretty straightforward and expected I think >> from most users. >> They are fine with inconsistencies with entries create/updated/deleted >> between the M/R start and the end. > > There is no way we can fix this without adding a very strict isolation > level like SERIALIZABLE.
Snapshot Isolation should be fine, but I don't wanna enter in discussion about it right now :) > >> They are *not* fine with seeing the same key/value several time for the >> duration of the M/R execution. This AFAIK can happen when a topology change >> occurs. > > This can happen if it was processed on one node and then rehash > migrates the entry to another and runs it there. > >> >> Here is a proposal. >> Why not run the M/R job not per node but rather per segment? >> The point is that segments are stable across topology changes. The M/R tasks >> would then be about iterating over the keys in a given segment. >> >> The M/R request would send the task per segments on each node where the >> segment is primary. > > This is exactly what the iterator does today but also watches for > rehashes to send the request to a new owner when the segment moves > between nodes. > >> (We can imagine interesting things like sending it to one of the backups for >> workload optimization purposes or sending it to both primary and backups and >> to comparisons). >> The M/R requester would be in an interesting situation. It could detect that >> a segment M/R never returns and trigger a new computation on another node >> than the one initially sent. >> >> One tricky question around that is when the M/R job store data in an >> intermediary state. We need some sort of way to expose the user indirectly >> to segments so that we can evict per segment intermediary caches in case of >> failure or retry. > > This was one place I was thinking I would need to take special care to > look into when doing a conversion like this. > >> >> But before getting ahead of ourselves, what do you thing of the general >> idea? Even without retry framework, this approach would be more stable than >> our current per node approach during topology changes and improve >> dependability. > > Doing it solely based on segment would remove the possibility of > having duplicates. However without a mechanism to send a new request > on rehash it would be possible to only find a subset of values (if a > segment is removed while iterating on it). true. I think the retry mechanism is the best approach. other alternative, would be to implement a Map<K,V> getBySegment(int) operations that goes remote if the segment is not local. > >> >> Emmanuel >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev@lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev@lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev