Regarding the complications pointed out for returning a remote iterator from takeMultiple:

(1) One can size the batch to best balance network bandwidth and latency.
That's currently done by a combination of the server-side takeMultipleLimit and the client-side maxEntries. If we used a remote iterator, I assume we'd include a takeMultipleBatchSize and retain the client-side maxEntries, so I don't see this as being substantially different.

(2) One can limit the time a collection of exclusive locks is held under a transaction by virtue of the timeout.

Hmm, why would this not be the case under a remote iterator? I would think the correct behavior would be to release locks after a timeout expires regardless of whether the return type was an iterator or a collection.

(3) Batching in this way allows multiple clients to remove and process entries in a more scalable fashion than with an unbounded (no entry limit) remote iterator.

Users would still be free to make multiple calls with small values of maxEntries if they so chose. They would also gain the ability to make an unbounded request, which is currently lacking outside of repeated calls.

(4) [gleaned from text] More bookkeeping is necessary.

Certainly. We'd also have to work out the precise semantics the iterator operates under and make them clear in the documentation.

(5) [gleaned from text] A remote iterator would certainly be less performant than a straight batch take.

This is the biggest concern, I think. As such, I'd be interested in seeing performance runs to back up the intuition; then, at least, we'd know precisely what trade-off we're talking about. The tests would need to cover both small batches and large, numbers both at and off multiples of the batch size/takeMultipleLimit, and runs with and without transactions.

jamesG

-----Original Message-----
From: "Dan Creswell" <[email protected]>
Sent: Wednesday, December 22, 2010 5:23am
To: [email protected]
Subject: Re: Space/outrigger suggestions (remote iterator vs. collection)

Hey,

So the below means you are indeed following my explanation, so to your question: yes, you could use a remote-iterator style of thing, but for take it's quite a heavyweight construct, especially once you have transactions in the way.

The core implementation itself is very similar to contents and would have, for the most part, similar performance. However, it'd certainly be less performant than a straight batch take. More of a concern, though, is the impact on other clients of the space implementation: lots of book-keeping, exclusive locks on entries, and long-running transactions that inflict delays on other clients, leading to poor scaling. Contents, by virtue of its read nature, is a little less painful performance-wise, and for a lot of applications you'd pass no transaction, which reduces the performance pain further.

So I'd say that batch take is probably a better trade-off than a take/remote-iterator combo because:

(1) One can size the batch to best balance network bandwidth and latency.

(2) One can limit the time a collection of exclusive locks is held under a transaction by virtue of the timeout.

(3) Batching in this way allows multiple clients to remove and process entries in a more scalable fashion than with an unbounded (no entry limit) remote iterator.

In essence one puts the control squarely with the user so they can get what they want, albeit at the price of some API asymmetry, as you correctly point out.

As an implementer, I could reduce my codebase a little if we did takes with a remote iterator, but being completely honest, not by enough that I'd support a spec change for that reason alone.

HTH,

Dan.
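For reference, the "repeated calls with small values of maxEntries" pattern discussed above can be sketched with a minimal mock. This is a hypothetical stand-in, not the real JavaSpace05 API (the actual take method also accepts templates, a Transaction, and a timeout, and the server caps each batch at takeMultipleLimit); the BatchTakeSpace interface and drain method here are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a space's batch take.
interface BatchTakeSpace {
    List<String> take(long maxEntries);
}

public class DrainExample {
    // Drain a space via repeated bounded batch takes: the client keeps
    // asking for at most batchSize entries until a call returns nothing.
    static List<String> drain(BatchTakeSpace space, long batchSize) {
        List<String> all = new ArrayList<>();
        while (true) {
            List<String> batch = space.take(batchSize);
            if (batch.isEmpty()) break; // space is (momentarily) empty
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        // In-memory mock space holding ten entries.
        List<String> backing = new ArrayList<>();
        for (int i = 0; i < 10; i++) backing.add("entry-" + i);

        BatchTakeSpace mock = max -> {
            List<String> out = new ArrayList<>();
            while (!backing.isEmpty() && out.size() < max) {
                out.add(backing.remove(0));
            }
            return out;
        };

        // Batches of 3, 3, 3, 1, then an empty batch ends the loop.
        List<String> taken = drain(mock, 3);
        System.out.println(taken.size());      // 10
        System.out.println(backing.isEmpty()); // true
    }
}
```

Note that, as the thread observes, each call here holds its locks only for the duration of one bounded batch, which is the control a remote iterator would have to reproduce through extra bookkeeping.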
