Shelving is fine.

When it comes time to do this, I thikn that the rationale for the pending
stuff is that merging is roughly as expensive as a copy.  Thus, we should
be able to arrange that there really is a copy and the old arrays.  This
will allow any iterators to complete cleanly if the pending list isn't a
problem.  Presumably even the list can be made to work since it only has
append operations and the iterator can snapshot the length.

On Wed, Mar 27, 2013 at 10:12 PM, Dan Filimon
<dangeorge.fili...@gmail.com>wrote:

> I realized that in fact just the pending array is not enough. There is
> also the distances array that needs fixing.
> In fact, an entire new copy is needed if we're to be able to safely
> iterate and reindex.
>
> I'm shelving this for now. That ok?
>
> On Wed, Mar 27, 2013 at 4:35 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
> > Another option is to make the iterator take a reference to the array as
> it
> > exists and then during merging always create a new array.
> >
> > A second option is to just let the iterator get a bit confused (don't
> like
> > the smell there).
> >
> > On Tue, Mar 26, 2013 at 10:59 PM, Dan Filimon
> > <dangeorge.fili...@gmail.com>wrote:
> >
> >> Ted, everyone,
> >>
> >> I added the multiple pass BallKMeans at the end of the reducer and
> >> have uncovered a series of issues with the searchers.
> >>
> >> FastProjectionSearch stores "pending" vectors in a separate list from
> >> the main projections, the idea being that use arrays throughout rather
> >> than a tree multiset to reduce the overhead.
> >>
> >> Here's what's happening:
> >> - after having added a bunch of centroids to the searcher,
> >> - we iterate through each centroid and search for its closest neighbor,
> >> - when calling search(), reindex() is called internally that deals
> >> with pending entries that haven't been merged in some time. When
> >> reindex() mutates the data structures, the iterator we're using to go
> >> through the centroids becomes invalid and a
> >> ConcurrentModificationException is thrown.
> >>
> >> I'm not really sure what the best approach is to fix this. Here are
> >> some options:
> >> - expose reindex() and demand users to call it explicitly (which
> >> completely kills abstraction);
> >> - when instantiating an Iterator, create a helper ImmutableList that
> >> keeps track of the pending values for the duration of the iteration
> >> and when iterating go through these first and release the reference to
> >> the list as soon as possible;
> >> - come up with an alternative design and really think about how this
> >> searcher is useful.
> >>
> >> In particular, when running minimal tests (SearchQualityTest), this
> >> FPS is faster than ordinary ProjectionSearch but worse than LSH.
> >>
> >> Any thoughts?
> >>
>

Reply via email to