Right. I'm familiar with the map/reduce process and the proposed improvements.
This part of the blog threw me off: "as the map/reduce tasks now run in parallel over both the nodes in the cluster and within the same node (multiple threads)"

To me, it implies that there are now multiple map threads per node.

Further, I thought that the map/reduce 'working set' was limited to what was in memory. I did not realize that map/reduce would iterate over all of the data, both in memory and on disk. That is good to hear, though I'm curious whether it will apply to all cache stores (e.g. LevelDB) and how ISPN map/reduce handles a data set that is greater than the available memory. A lot of in-memory stores face this limitation when backed by on-disk stores.

If the data is retrieved one entry at a time, I don't see how multiple threads will help. However, if it is retrieved in bulk, I can see how it might. Not entirely sure.

Shane

----- Original Message -----
From: "Vladimir Blagojevic" <[email protected]>
To: "infinispan -Dev List" <[email protected]>
Cc: "Shane Johnson" <[email protected]>
Sent: Tuesday, September 17, 2013 10:32:39 AM
Subject: Re: [infinispan-dev] blog on new cache store API

Shane,

When a MapReduce command arrives on an Infinispan node, it is executed on a single thread, the one that carries the incoming message. I have done preliminary work on multithreaded execution [1] but I have not gotten around to completing it. The main idea is that the incoming thread submits a task to an executor, which in turn splits the map/reduce work across multiple threads and executes it. Once the work is completed, the incoming thread is given the result and returns the response.

I am not sure how Mircea implemented parallel iteration in stores, but it is definitely a different beast. I do agree with him that parallel reading from stores definitely helps: the thread mentioned above will spend much less time waiting on reads from the stores.

Hope it all makes more sense now!
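For illustration only, the hand-off described above (the incoming thread submits work to an executor, the work is split across several threads, and the incoming thread collects the result) could be sketched in plain Java roughly as follows. This is not the code from the branch and it uses no Infinispan APIs: the class name ParallelMapSketch, the helpers wordCount and mapSlice, and the word-count workload are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: stands in for the node-local part of a map/reduce task.
public class ParallelMapSketch {

    // Called on the "incoming" thread: splits the entries into slices,
    // runs the map phase for each slice on the executor, then reduces
    // the partial results on the calling thread.
    static Map<String, Integer> wordCount(List<String> entries, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (entries.size() + threads - 1) / threads);
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (int start = 0; start < entries.size(); start += chunk) {
                List<String> slice = entries.subList(start, Math.min(start + chunk, entries.size()));
                Callable<Map<String, Integer>> task = () -> mapSlice(slice);
                partials.add(executor.submit(task));
            }
            // The incoming thread blocks here until every slice is done.
            Map<String, Integer> result = new TreeMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                partial.get().forEach((word, count) -> result.merge(word, count, Integer::sum));
            }
            return result;
        } finally {
            executor.shutdown();
        }
    }

    // Map phase for one slice: counts the words in each entry of the slice.
    static Map<String, Integer> mapSlice(List<String> slice) {
        Map<String, Integer> counts = new HashMap<>();
        for (String entry : slice) {
            for (String word : entry.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        List<String> entries = Arrays.asList("a b", "b c", "c c", "a");
        System.out.println(wordCount(entries, 2)); // prints {a=2, b=2, c=3}
    }
}
```

In this sketch the entries are already in memory, so the worker threads never wait on I/O. With a store behind the cache, each slice would also have to be read from disk, which is where parallel reads would shorten the wait.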
Regards,
Vladimir

[1] https://github.com/vblagoje/infinispan/tree/t_2284

On 13-09-16 4:59 PM, Shane Johnson wrote:
> But there are now multiple map threads per node? Or, is there one map thread
> and multiple cache store threads? I'm not sure how a single map thread could
> benefit from multiple cache store threads.
>
> Shane
>
> ----- Original Message -----
> From: "Mircea Markus" <[email protected]>
> To: "infinispan -Dev List" <[email protected]>
> Sent: Monday, September 16, 2013 3:31:01 PM
> Subject: Re: [infinispan-dev] blog on new cache store API
>
>
> On Sep 16, 2013, at 7:11 PM, Shane Johnson <[email protected]> wrote:
>
>> "parallel iteration: it is now possible to iterate over entries in the store
>> with multiple threads in parallel. Map/Reduce tasks immediately benefit from
>> this, as the map/reduce tasks now run in parallel over both the nodes in
>> the cluster and within the same node (multiple threads)"
>>
>> Does this apply to entries in the cache as well (ISPN-2284)?
> no, only the entries in the store are iterated in parallel.
>
> Cheers,
