Regarding the complications pointed out for returning a remote iterator from 
takeMultiple:
--
(1) One can size the batch to best balance network bandwidth and
latency.
--

That's currently done by a combination of the server-side takeMultipleLimit 
and the client-side maxEntries.   If we used a remote iterator, I assume we'd 
include a takeMultipleBatchSize and retain the client-side maxEntries.   So I 
don't see this as being substantially different.
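To make the interplay of the two limits concrete, here's a minimal sketch.  The 
names mirror the discussion above; this is not outrigger code, and the real 
server-side cap is applied per response, not as a single function call:

```java
// Sketch only: the server caps each response at takeMultipleLimit, the
// client asks for at most maxEntries, so the effective size of any one
// batch is the smaller of the two.
class BatchSizing {
    static long effectiveBatch(long takeMultipleLimit, long maxEntries) {
        return Math.min(takeMultipleLimit, maxEntries);
    }
}
```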

--
(2) One can limit the time a collection of exclusive locks are held under a
transaction by virtue of the timeout.
--

Hmm, why would this not be the case under a remote iterator?   I would think 
that the correct behavior would be to release the locks after the timeout 
expires, regardless of whether the return type was an iterator or a collection.
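The behavior I have in mind can be sketched as a lock table with per-entry 
deadlines; expiry releases locks no matter how the taker is consuming results.  
All of the names here are illustrative stand-ins, not outrigger internals:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: locks taken under a transaction carry a deadline; once the
// deadline passes they are released, whether the client was holding a
// returned collection or still walking a remote iterator.
class TimedLocks {
    private final Map<String, Long> lockedUntil = new HashMap<>();

    void lock(String entryId, long now, long timeout) {
        lockedUntil.put(entryId, now + timeout);
    }

    // Release every lock whose deadline has passed.
    void expire(long now) {
        lockedUntil.values().removeIf(deadline -> deadline <= now);
    }

    boolean isLocked(String entryId, long now) {
        Long deadline = lockedUntil.get(entryId);
        return deadline != null && deadline > now;
    }
}
```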

--
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an (unbounded or no entry limit)
remote iterator.
--

Users would still be free to make multiple calls with small values for 
maxEntries if they so chose.   They would also gain the ability to make an 
unbounded request, which is currently impossible short of repeated calls.
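The repeated-calls pattern looks roughly like the loop below.  The Deque stands 
in for the space, and takeMultiple here is a local stand-in with the same shape 
as the batched take being discussed, not the actual JavaSpaces05 method:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: a client emulates an unbounded take by looping over bounded
// batch takes until a call comes back empty.
class RepeatedTakes {
    // Local stand-in for a bounded batch take against the space.
    static List<String> takeMultiple(Deque<String> space, int maxEntries) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxEntries && !space.isEmpty()) {
            batch.add(space.poll());
        }
        return batch;
    }

    static List<String> drain(Deque<String> space, int batchSize) {
        List<String> all = new ArrayList<>();
        while (true) {
            List<String> batch = takeMultiple(space, batchSize);
            if (batch.isEmpty()) break;   // nothing matched: stop looping
            all.addAll(batch);
        }
        return all;
    }
}
```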

--
(4) [gleaned from text] More bookkeeping is necessary.
--

Certainly.   Also, we'd have to work out the precise semantics that the 
iterator operates under and make them clear in the documentation.
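For the sake of discussion, a take iterator might look something like the 
following, loosely modeled on the MatchSet returned by JavaSpaces05.contents.  
None of these names exist in any spec; the open questions (lock lifetime, what 
next() does after the transaction ends or a lease lapses) are exactly what 
would need documenting:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical contract for a remote take iterator (illustrative only).
interface TakeSet<E> {
    E next();       // removes and returns the next match, or null when exhausted
    void cancel();  // release remaining locks without taking more entries
}

// Minimal in-memory stand-in, just to make the contract concrete.
class LocalTakeSet<E> implements TakeSet<E> {
    private final Deque<E> matches;
    private boolean cancelled = false;

    LocalTakeSet(List<E> initial) { this.matches = new ArrayDeque<>(initial); }

    public E next() {
        return cancelled ? null : matches.poll();   // poll() yields null when empty
    }

    public void cancel() {
        cancelled = true;
        matches.clear();   // drop (i.e. "unlock") whatever remains
    }
}
```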

--
(5) [gleaned from text] A remote iterator would certainly be less performant 
than a straight batch take.
--

This is the biggest concern, I think.   As such, I'd be interested in seeing 
performance runs to back up the intuition.   Then, at least, we'd know 
precisely what trade-off we're talking about.

The test would need to cover both small batches and large ones, both at 
multiples of the batch-size/takeMultipleLimit and at numbers off those 
multiples, with transactions and without.
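That matrix can be enumerated mechanically; the sketch below is purely 
illustrative (the particular multiples chosen are my own, not anything from 
the thread):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed test matrix: for each batch size, an entry count
// that is an exact multiple of it and one that falls off the multiple,
// each run with and without a transaction.
class PerfMatrix {
    static final class Case {
        final int batchSize, entryCount;
        final boolean txn;
        Case(int b, int n, boolean t) { batchSize = b; entryCount = n; txn = t; }
    }

    static List<Case> cases(int[] batchSizes) {
        List<Case> out = new ArrayList<>();
        for (int b : batchSizes) {
            for (int count : new int[] { 4 * b, 4 * b + 1 }) {  // on and off the multiple
                out.add(new Case(b, count, false));             // no transaction
                out.add(new Case(b, count, true));              // under a transaction
            }
        }
        return out;
    }
}
```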

jamesG

-----Original Message-----
From: "Dan Creswell" <[email protected]>
Sent: Wednesday, December 22, 2010 5:23am
To: [email protected]
Subject: Re: Space/outrigger suggestions (remote iterator vs. collection)

Hey,

So the below means you are indeed following my explanation, so to your
question:

Yes, you could use a remote iterator style of thing, but for take it's quite
a heavyweight construct, especially once you have transactions in the way.
The core implementation itself is very similar to contents and would have
for the most part similar performance. However, it'd certainly be less
performant than a straight batch take.

More of a concern, though, is the impact on other clients of the space
implementation: lots of book-keeping, many exclusive locks on entries, and
long-running transactions that inflict delays on other clients, leading to
poor scaling. Contents, by virtue of its read nature, is a little less
painful performance-wise, and for a lot of applications you'd pass no
transaction, which reduces the performance pain further.

So I'd say that batch take is probably a better tradeoff than a take/remote
iterator combo because:

(1) One can size the batch to best balance network bandwidth and
latency.
(2) One can limit the time a collection of exclusive locks are held under a
transaction by virtue of the timeout.
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an (unbounded or no entry limit)
remote iterator.

In essence one puts the control squarely with the user so they can get
what they want, albeit at the price of some API asymmetry, as you correctly
point out.

As an implementer, I could reduce my codebase a little if we did takes with
a remote iterator, but being completely honest, not by enough that I'd
support a spec change for that reason alone.

HTH,

Dan.
