Hi Kevin, Martin,

To add another discussion point, I've been writing a draft/proof-of-concept
of retrofitting the List interface onto ArrayDeque. This works over the raw
array; it doesn't use the fancier structures being discussed elsewhere on
this list that split huge arrays into arraylets or that provide O(1) insert
in the middle.

http://code.google.com/p/libjoe/source/browse/trunk/src/joe/collect/CircularArrayList.java

I'd be interested in any comments you have in the context of this
discussion. The code is not entirely ready yet; a few tests (6 of 789)
fail because of a corner case I haven't nailed down, but the idea is there
at least. I'd like to add array shrinking later, perhaps when the size
dips below capacity * 0.4, to avoid flickering up and down around the
threshold.

Tests show performance to be close to ArrayList for the O(1) operations.
Timings for indexed reads and writes showed no discernible difference
between the implementations the last time I ran the tests. I don't yet
understand why iterator add at index size/3 and size/2 runs about 30%
slower than ArrayList on smaller lists, nor the dodgy insert(5) numbers
for ArrayList; I'll look into this soon. Those operations that become O(1)
in a circular implementation (and that are implemented and tested here)
are faster than in ArrayList. Insert/remove in the middle are somewhat
faster than in ArrayList because we only have to copy at most half of the
elements, except when resizing the array.

Kevin, I don't fully understand your point about add not amortizing to O(1).
Certainly that's true for inserts away from the head or tail. Otherwise,
this implementation only moves array elements to the front during a resize,
which happens only O(log n) times over a sequence of n adds, maybe a little
more often once array shrinking is added. This is the same as ArrayList.
Are you just referring to the add-in-the-middle case?
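To make that invariant concrete, here's a minimal sketch (illustrative names only, not the actual CircularArrayList code) of the point above: elements move to the front of the array only inside resize, so addFirst/addLast stay amortized O(1), just like ArrayList's addLast:

```java
// Minimal circular-buffer sketch (hypothetical class, not CircularArrayList):
// head marks the logical front; elements wrap around the raw array and are
// straightened out to index 0 only when the array is grown.
class CircularSketch {
    private Object[] a = new Object[8];
    private int head, size;

    void addLast(Object e) {
        if (size == a.length) resize();
        a[(head + size++) % a.length] = e;
    }

    void addFirst(Object e) {
        if (size == a.length) resize();
        head = (head - 1 + a.length) % a.length;  // step back, wrapping
        a[head] = e;
        size++;
    }

    Object get(int i) {
        return a[(head + i) % a.length];
    }

    // The only place elements are copied to the front: O(n) work after
    // O(n) cheap adds, hence amortized O(1) per add.
    private void resize() {
        Object[] b = new Object[a.length * 2];
        for (int i = 0; i < size; i++)
            b[i] = a[(head + i) % a.length];
        a = b;
        head = 0;
    }

    int size() { return size; }
}
```

(The real class also needs the List index operations and the middle insert/remove copy logic; this only shows the resize invariant.)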

Some performance results below, code for these is in the repository above
too. This was the second run, after a warmup.

Thanks,
Joe

------------------------------------------------ CircularArrayList ------------------------------------------------
  size    add    get    set  iterAdd/3  iterAdd/2  insert(5)  removeRnd  removeMid  remove(0)
    10     20     67     70        125        102         90        240        191        138
   100     19     67     70        166        138         94        230        194        118
  1000     28     64     67        681        538         91        324        382        119
 10000     30     65     67       5884       4425         94       1296       2330        124
---------------------------------------------------- ArrayList ----------------------------------------------------
  size    add    get    set  iterAdd/3  iterAdd/2  insert(5)  removeRnd  removeMid  remove(0)
    10     23     68     70        100         69      32913        162        130        105
   100     20     67     70        129        104      21944        169        134        135
  1000     29     63     67        651        506       9602        364        333        526
 10000     30     63     66       5878       4414       9947       2312       2280       4437

2010/4/13 Kevin L. Stern <kevin.l.st...@gmail.com>

> Hi Martin,
>
> I had intended to address your request for absolute O(1) operations in the
> previous email.  The approach to achieving this suggested in
> [Brodnik99resizablearrays] is tantamount to making ArrayList operations
> absolute O(1) by keeping around an array of size (3/2)*n and filling it with
> a constant number of entries from the main array each time add is called.
> Although this distributes the work done during a resize across the n
> operations required to enter a resize-required state, it is at the expense
> of additional memory usage and slower add operations.  My thought is that
> this would be a fine approach for a real-time application that requires hard
> guarantees on performance but would be a liability in so many Java
> applications that do not require these hard guarantees.  I look forward to
> hearing your thoughts on the matter, though.
>
> Kevin
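(For concreteness, the deamortization Kevin describes can be sketched as below. The class and constant names are mine, and this sketch doubles capacity rather than using the paper's (3/2)*n factor, purely to keep the arithmetic simple; the idea is the same: a larger array is filled a constant number of elements per add, so no single add ever pays the full O(n) copy cost.)

```java
// Hedged sketch of incremental (worst-case O(1)) array growth, not an
// implementation from [Brodnik99resizablearrays]. While migrating, new
// elements are written to both arrays and two old elements are copied
// per add; the migration provably finishes before the old array fills.
class IncrementalArray {
    private static final int COPIES_PER_ADD = 2;
    private Object[] cur = new Object[8];
    private Object[] next;     // larger array being filled a little per add
    private int size, copied;  // copied = prefix of cur already moved to next

    void add(Object e) {
        cur[size] = e;
        if (next != null) next[size] = e;       // dual-write during migration
        size++;
        if (next == null && size > cur.length / 2)
            next = new Object[2 * cur.length];  // begin migration early
        if (next != null) {
            for (int i = 0; i < COPIES_PER_ADD && copied < size; i++) {
                next[copied] = cur[copied];
                copied++;
            }
            if (size == cur.length) {           // migration complete exactly here
                cur = next;
                next = null;
                copied = 0;
            }
        }
    }

    Object get(int i) { return cur[i]; }
    int size() { return size; }
}
```

This trades memory (two live arrays during migration) and a slightly slower add for the hard per-operation bound, which matches the cost/benefit analysis above.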
>
>
> On Tue, Apr 13, 2010 at 6:18 AM, Kevin L. Stern <kevin.l.st...@gmail.com>
> wrote:
>
>> Hi Martin,
>>
>> It's interesting to note that the old circular list trick will not suffice
>> to turn this data structure into a deque since we might be copying all n
>> elements back to the front = 0 position every n^(1/2) operations (add
>> wouldn't amortize to O(1)).  We could use the old two stacks trick (push
>> elements onto one stack, flip (the bottom) half (of) the elements to the
>> 'other' stack when the 'other' stack becomes empty), mentioned in
>> [Brodnik99resizablearrays], but I find this to be a bit CS 101.  In
>> [Brodnik99resizablearrays] the authors suggest a method for making all
>> blocks roughly the same size, allowing us to expand/shrink capacity at the
>> beginning or the end; this is the approach that I will take to create a
>> deque.
>>
>> The FAQ for the Sun Contributor Agreement Q3 (
>> http://www.sun.com/software/opensource/contributor_agreement.jsp#sa_3)
>> indicates that one should check with the project to determine where the SCA
>> should be sent.  Do you know where I would find this information?
>>
>> Kevin
>>
>> @MISC{Brodnik99resizablearrays,
>>     author = {Andrej Brodnik and Svante Carlsson and Erik D. Demaine and
>>               J. Ian Munro and Robert Sedgewick},
>>     title = {Resizable Arrays in Optimal Time and Space},
>>     year = {1999}
>> }
>>
>> On Sun, Apr 11, 2010 at 4:17 PM, Martin Buchholz <marti...@google.com>
>> wrote:
>>
>>> Hi Kevin,
>>>
>>> Thanks for your continuing work on this.
>>>
>>> I like the test results, and agree with your analysis.
>>> I'm especially happy that you're beating
>>> ArrayList at some operations.
>>>
>>> I'd like to see O(1) addition at the beginning,
>>> and for it to implement both List and Deque (I regret
>>> our not having done this with ArrayDeque).
>>>
>>> An additional property that would be nice to
>>> have (but don't try too hard)
>>> is to provide some kind of real-time
>>> guarantees on the cost of an individual operation,
>>> not just amortized time.  E.g. ArrayList.add
>>> is worst-case O(n), making it unsuitable for use
>>> in some real-time applications.
>>>
>>> I will help get your changes into the obvious
>>> software distributions.  I assume you're happy
>>> with having this class included in any of
>>> Doug Lea's jsr166, guava-libraries, or the JDK itself.
>>> You should sign a Sun contributor agreement,
>>> or whatever the Oracle equivalent is,
>>> if you have not done so yet.
>>>
>>> Doug Lea likes public domain,
>>> guava-libraries likes the Apache license.
>>>
>>> We should give various people a chance to give
>>> a thumbs up on the design of this class -
>>> Doug Lea, Josh Bloch.
>>>
>>> Martin
>>>
>>> On Sun, Apr 11, 2010 at 09:32, Kevin L. Stern <kevin.l.st...@gmail.com>
>>> wrote:
>>> > Hello Martin,
>>> >
>>> > I spent some time this weekend trying to bring out bugs in the
>>> > implementation; I believe the latest version to be in decent shape.
>>> > I have also gathered some data on the performance of ChunkedArrayList
>>> > over ArrayList using the latest 1.6 JDK, which I've included below
>>> > (note that the numbers represent the time spent performing the
>>> > specified operation with ChunkedArrayList over the time spent with
>>> > ArrayList, so 1.00 indicates equivalent performance, < 1.00 indicates
>>> > that ChunkedArrayList is less costly and > 1.00 indicates that
>>> > ArrayList is less costly). I've noticed relatively significant
>>> > variability in a few of the numbers when I switch hardware, though
>>> > these data do seem to represent rough performance expectations. For
>>> > my test I generated x elements and then timed the process of adding
>>> > them to ArrayList/ChunkedArrayList, then I performed a get operation
>>> > on each for indices 0 through x-1, and finally I used the iterator
>>> > mechanism to retrieve the first through xth element (of course, I
>>> > performed each of these operations multiple times, throwing away the
>>> > timing for the first few iterations to warm up the JVM).
>>> >
>>> > Regarding the question of whether or not this belongs in java.util,
>>> > I would suggest that if it is desirable from a GC point of view to
>>> > eliminate the large backing array from ArrayList, then your
>>> > suggestion of achieving this by way of a data structure that is both
>>> > time and space optimal is a particularly elegant solution, as it not
>>> > only guarantees that no backing array will be larger than sqrt(n)
>>> > elements but also provides dynamic shrinking behavior, has less
>>> > maximum memory overhead than ArrayList, and copies (asymptotically)
>>> > fewer elements during a resize than ArrayList. Of course, this data
>>> > structure does not do everything better than ArrayList; in
>>> > particular, indexed access is more costly, due to the required
>>> > decomposition of the index into backing array index and offset and
>>> > the additional memory indirection, and insertion-at-an-index is more
>>> > costly, due to the multiple array copies necessary to complete the
>>> > shift. That being said, I think that the additional cost of indexed
>>> > access is partially mitigated by the availability of iterator and
>>> > listIterator, whose implementations do not use the index
>>> > decomposition procedure, and the additional cost of
>>> > insertion-at-an-index is partially mitigated by the fact that
>>> > insertion-at-an-index is already an undesirable operation on
>>> > ArrayList due to its linear time complexity.
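(For flavor, the index-decomposition cost discussed above can be sketched with the simplest fixed-block variant. ChunkedArrayList's actual scheme uses variable block sizes per [Brodnik99resizablearrays]; the BlockedList name and its block size below are illustrative only.)

```java
import java.util.ArrayList;

// Hypothetical fixed-block list: no single large backing array, at the
// price of decomposing each index into (block, offset) plus one extra
// memory indirection versus ArrayList's single array access.
class BlockedList<E> {
    private static final int B = 1024;            // block size, a power of two
    private final ArrayList<E[]> blocks = new ArrayList<>();
    private int size;

    @SuppressWarnings("unchecked")
    void add(E e) {
        if (size % B == 0)                        // current block full: add a new small block
            blocks.add((E[]) new Object[B]);
        blocks.get(size / B)[size % B] = e;
        size++;
    }

    E get(int i) {
        // Decompose the index: shift for the block, mask for the offset
        // (i >> 10 == i / B and i & (B - 1) == i % B because B == 1024).
        return blocks.get(i >> 10)[i & (B - 1)];
    }

    int size() { return size; }
}
```

An iterator over such a structure can walk each block sequentially and skip the decomposition entirely, which is the mitigation mentioned above.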
>>> >
>>> > Kevin
>>> >
>>> > 1000000 elements:
>>> > Client JVM:
>>> > Add to ChunkedArrayList over ArrayList: 1.30
>>> > Indexed access ChunkedArrayList over ArrayList: 1.80
>>> > Iterator ChunkedArrayList over ArrayList: 0.52
>>> >
>>> > Server JVM:
>>> > Add to ChunkedArrayList over ArrayList: 0.81
>>> > Indexed access ChunkedArrayList over ArrayList: 2.87
>>> > Iterator ChunkedArrayList over ArrayList: 1.31
>>> >
>>> > 100000 elements:
>>> > Client JVM:
>>> > Add to ChunkedArrayList over ArrayList: 0.96
>>> > Indexed access ChunkedArrayList over ArrayList: 1.86
>>> > Iterator ChunkedArrayList over ArrayList: 0.48
>>> >
>>> > Server JVM:
>>> > Add to ChunkedArrayList over ArrayList: 0.96
>>> > Indexed access ChunkedArrayList over ArrayList: 1.89
>>> > Iterator ChunkedArrayList over ArrayList: 2.68
>>> >
>>> > 10000 elements:
>>> > Client JVM:
>>> > Add to ChunkedArrayList over ArrayList: 1.04
>>> > Indexed access ChunkedArrayList over ArrayList: 2.33
>>> > Iterator ChunkedArrayList over ArrayList: 0.53
>>> >
>>> > Server JVM:
>>> > Add to ChunkedArrayList over ArrayList: 0.97
>>> > Indexed access ChunkedArrayList over ArrayList: 2.45
>>> > Iterator ChunkedArrayList over ArrayList: 2.52
>>> >
>>> > 1000 elements:
>>> > Client JVM:
>>> > Add to ChunkedArrayList over ArrayList: 0.99
>>> > Indexed access ChunkedArrayList over ArrayList: 2.27
>>> > Iterator ChunkedArrayList over ArrayList: 0.54
>>> >
>>> > Server JVM:
>>> > Add to ChunkedArrayList over ArrayList: 0.84
>>> > Indexed access ChunkedArrayList over ArrayList: 1.23
>>> > Iterator ChunkedArrayList over ArrayList: 1.11
>>> >
>>> >
>>> > On Fri, Apr 9, 2010 at 7:42 PM, Martin Buchholz <marti...@google.com>
>>> wrote:
>>> >>
>>> >> My feeling on whether to support O(1) at both ends
>>> >> is that any flavor of this that ends up in the JDK eventually
>>> >> should really do this.  My idea is that we can
>>> >> wholeheartedly recommend this collection class
>>> >> for overall good behavior without any of the surprising
>>> >> performance traps of existing collection classes.
>>> >>
>>> >> But for the preliminary version, it makes sense to
>>> >> support only O(1) at one end, if it simplifies the
>>> >> implementation.  Random access will of course
>>> >> be worse than ArrayList, but by how much?
>>> >> We can do some benchmarking and look for
>>> >> micro-optimizations now.
>>> >>
>>> >> Kevin, what is your own personal feeling?
>>> >> Is the algorithm correct, and efficient enough?
>>> >> Do you think your new collection belongs in java.util?
>>> >>
>>> >> Martin
>>> >>
>>> >> On Sun, Apr 4, 2010 at 04:12, Kevin L. Stern <kevin.l.st...@gmail.com>
>>> >> wrote:
>>> >> > The data structure is available at the second link that I
>>> >> > originally provided (once again, it is
>>> >> > https://docs.google.com/Doc?docid=0Aabrz3MPBDdhZGdrbnEzejdfM2M3am5wM2Mz&hl=en).
>>> >> > This does not have O(1) time insertion at the front as yet, as it
>>> >> > was unclear to me whether or not it was agreed upon:
>>> >> > _________________
>>> >> > From: Osvaldo Doederlein <opin...@gmail.com>
>>> >> > Date: Mon, Mar 29, 2010 at 10:08 AM
>>> >> > Subject: Re: A List implementation backed by multiple small arrays
>>> >> > rather
>>> >> > than the traditional single large array.
>>> >> > To: Martin Buchholz <marti...@google.com>
>>> >> > Cc: "Kevin L. Stern" <kevin.l.st...@gmail.com>,
>>> >> > core-libs-dev@openjdk.java.net
>>> >> >
>>> >> > Initially, it would be good enough to replace only
>>> >> > java.util.ArrayList with minimal overhead. ArrayList does not
>>> >> > support efficient add-at-front or other enhancements of ArrayDeque;
>>> >> > but ArrayList is still a much more important and popular
>>> >> > collection, it's the primary "straight replacement for primitive
>>> >> > arrays" and I guess it should continue with that role.
>>> >> > _________________
>>> >> >
>>> >> > As a disclaimer, I'm still tinkering with this so I'll be updating
>>> the
>>> >> > document at the provided link as I find improvements.
>>> >> >
>>> >> > Thoughts?
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Kevin
>>> >> >
>>> >> > On Thu, Apr 1, 2010 at 10:28 PM, Martin Buchholz <marti...@google.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Hi Kevin,
>>> >> >>
>>> >> >> You're probably the only one on this list who has
>>> >> >> seriously read the paper.  It is not surprising that
>>> >> >> taking a research paper into production would
>>> >> >> discover bugs - the research never had to undergo
>>> >> >> rigorous testing.  (I like the Java culture of
>>> >> >> combining spec + implementation + test suite)
>>> >> >>
>>> >> >> I suggest you ask the authors directly about the bug.
>>> >> >> They would probably also be interested to hear
>>> >> >> about your implementation.
>>> >> >>
>>> >> >> Are you aware of Integer.numberOfLeadingZeros?
>>> >> >>
>>> >> >>
>>> >> >>
>>> http://download.java.net/jdk7/docs/api/java/lang/Integer.html#numberOfLeadingZeros(int)
>>> >> >>
>>> >> >> Martin
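(As a concrete note on Martin's suggestion above: floor(log2(r)) for positive r is the obvious one-liner below. The method name lg matches Kevin's snippet later in the thread; the class name is illustrative.)

```java
class Lg {
    // floor(log2(r)) for r > 0, without a loop, using the JDK intrinsic
    // Martin points to: the highest set bit is at position
    // 31 - numberOfLeadingZeros(r).
    static int lg(int r) {
        return 31 - Integer.numberOfLeadingZeros(r);
    }
}
```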
>>> >> >>
>>> >> >> On Wed, Mar 31, 2010 at 19:34, Kevin L. Stern <kevin.l.st...@gmail.com>
>>> >> >> wrote:
>>> >> >> > I'm almost convinced now that the paper is incorrect.  The code
>>> >> >> > below gives me the appropriate index into the index array and
>>> >> >> > the offset into the data block.  That being said, remember when
>>> >> >> > I mentioned that this will include a bit more work to access an
>>> >> >> > element than a simple bit shift and a bit mask?  Well, this is
>>> >> >> > more than a bit more - we'll be doing this each time an index is
>>> >> >> > requested.  I'll spend some time trying to twiddle the bits to
>>> >> >> > see if I can eliminate/combine some of the operations.
>>> >> >> >
>>> >> >> >         // lg(r) = floor(log2(r)); could be 31 - Integer.numberOfLeadingZeros(r)
>>> >> >> >         for (int r = 1; r < 33; r++) {
>>> >> >> >             int k = lg(r);
>>> >> >> >             int floorKO2 = k >> 1;
>>> >> >> >             int powFloorKO2 = (1 << floorKO2);
>>> >> >> >             int p = ((1 << floorKO2) - 1) << 1;
>>> >> >> >             int ceilKO2;
>>> >> >> >             if ((k & 1) == 1) {
>>> >> >> >                 ceilKO2 = floorKO2 + 1;
>>> >> >> >                 p += powFloorKO2;
>>> >> >> >             } else {
>>> >> >> >                 ceilKO2 = floorKO2;
>>> >> >> >             }
>>> >> >> >             int e = r & ((1 << ceilKO2) - 1);
>>> >> >> >             int b = (r >> ceilKO2) & (powFloorKO2 - 1);
>>> >> >> >
>>> >> >> >             System.out.println((r - 1) + " " + (p + b) + " " +
>>> e);
>>> >> >> >         }
>>> >> >> >
>>> >> >> > Kevin
>>> >> >> >
>>> >> >> > On Wed, Mar 31, 2010 at 7:08 PM, Kevin L. Stern
>>> >> >> > <kevin.l.st...@gmail.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> I realize that 2 * (2^(k/2) - 1) only works for even numbered
>>> >> >> >> superblocks; the odd numbered superblocks need an additional
>>> >> >> >> term added (the number of data blocks in SB_[k-1]) to jibe with
>>> >> >> >> my interpretation.  Anyhow, I also came across an alternative
>>> >> >> >> characterization of superblock in the paper which states that
>>> >> >> >> data blocks are grouped within a superblock when they are the
>>> >> >> >> same size - to me, though, that implies that my example
>>> >> >> >> structure below would be
>>> >> >> >>
>>> >> >> >> SB_0: [1]
>>> >> >> >> SB_1: [2][2][2]
>>> >> >> >> SB_2: [4][4][4][4][4][4]
>>> >> >> >>
>>> >> >> >> which seems to contradict my understanding of (1) below.  I
>>> >> >> >> must be reading this upside down.
>>> >> >> >>
>>> >> >> >> On Wed, Mar 31, 2010 at 6:36 PM, Kevin L. Stern
>>> >> >> >> <kevin.l.st...@gmail.com>
>>> >> >> >> wrote:
>>> >> >> >>>
>>> >> >> >>> What am I missing here?  In "Resizable arrays in optimal time
>>> >> >> >>> and space" the authors define their data structure with the
>>> >> >> >>> following property:
>>> >> >> >>>
>>> >> >> >>> (1)  "When superblock SB_k is fully allocated, it consists of
>>> >> >> >>> 2^(floor(k/2)) data blocks, each of size 2^(ceil(k/2))."
>>> >> >> >>>
>>> >> >> >>> Since the superblock is zero-based indexed, this implies the
>>> >> >> >>> following structure:
>>> >> >> >>>
>>> >> >> >>> SB_0: [1]
>>> >> >> >>> SB_1: [2]
>>> >> >> >>> SB_2: [2][2]
>>> >> >> >>> SB_3: [4][4]
>>> >> >> >>> SB_4: [4][4][4][4]
>>> >> >> >>> [...]
>>> >> >> >>>
>>> >> >> >>> Let's have a look at Algorithm 3, Locate(i), with i = 3:
>>> >> >> >>>
>>> >> >> >>> r = 100 (the binary expansion of i + 1)
>>> >> >> >>> k = |r| - 1 = 2
>>> >> >> >>> p = 2^k - 1 = 3
>>> >> >> >>>
>>> >> >> >>> What concerns me is their statement that p represents "the
>>> >> >> >>> number of data blocks in superblocks prior to SB_k."  There
>>> >> >> >>> are only two data blocks in superblocks prior to SB_2, not
>>> >> >> >>> three.  Given (1) above, unless I'm misinterpreting it, the
>>> >> >> >>> number of data blocks in superblocks prior to SB_k should be:
>>> >> >> >>>
>>> >> >> >>> 2 * Sum[i=0->k/2-1] 2^i = 2 * (2^(k/2) - 1)
>>> >> >> >>>
>>> >> >> >>> This, of course, seems to work out much better in my example
>>> >> >> >>> above, giving the correct answer to my interpretation of their
>>> >> >> >>> data structure, but I have a hard time believing that this is
>>> >> >> >>> their mistake rather than my misinterpretation.
>>> >> >> >>>
>>> >> >> >>> Thoughts?
>>> >> >> >>>
>>> >> >> >>> Kevin
>>> >> >> >>>
>>> >> >> >>> On Tue, Mar 30, 2010 at 5:20 PM, Martin Buchholz
>>> >> >> >>> <marti...@google.com>
>>> >> >> >>> wrote:
>>> >> >> >>>>
>>> >> >> >>>> On Tue, Mar 30, 2010 at 04:25, Kevin L. Stern
>>> >> >> >>>> <kevin.l.st...@gmail.com>
>>> >> >> >>>> wrote:
>>> >> >> >>>> > Hi Martin,
>>> >> >> >>>> >
>>> >> >> >>>> > Thanks much for your feedback.  The first approach that
>>> >> >> >>>> > comes to mind to implement O(1) time front as well as rear
>>> >> >> >>>> > insertion is to create a cyclic list structure with a
>>> >> >> >>>> > front/rear pointer - to insert at the front requires
>>> >> >> >>>> > decrementing the front pointer (modulo the size) and to
>>> >> >> >>>> > insert at the rear requires incrementing the rear pointer
>>> >> >> >>>> > (modulo the size).  We need to resize when the two pointers
>>> >> >> >>>> > bump into each other.  Could you explain more about your
>>> >> >> >>>> > suggestion of introducing an arraylet that is shared by the
>>> >> >> >>>> > front and the rear?
>>> >> >> >>>>
>>> >> >> >>>> It was a half-baked idea - I don't know if there's a way to
>>> >> >> >>>> turn it into something useful.  I was thinking of the
>>> >> >> >>>> ArrayDeque implementation, where all the elements live in a
>>> >> >> >>>> single array.
>>> >> >> >>>>
>>> >> >> >>>> >  It's not clear to me how that would help and/or be a
>>> >> >> >>>> > better approach than the cyclic list.  Anyhow, the paper
>>> >> >> >>>> > that you reference, "Resizable arrays in optimal time and
>>> >> >> >>>> > space", gives a deque, so if we take that approach then the
>>> >> >> >>>> > deque is specified.
>>> >> >> >>>>
>>> >> >> >>>> Technically, ArrayList also supports the Deque operations -
>>> >> >> >>>> just not efficiently.
>>> >> >> >>>
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
