I agree with every point about the delays inherent in a distributed system,
and how any "list" call should be treated by clients as point-in-time.  And
I agree that the impact _should_ be minimal since diligent clients should
have error handling in these cases anyways.
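
A rough sketch of the kind of client-side error handling being discussed,
in Python.  The names list_collections, act, and CollectionNotFound are
hypothetical stand-ins for whatever client library a caller uses; none of
them are Solr or SolrJ APIs.

```python
from typing import Callable, Iterable, List, Tuple


class CollectionNotFound(Exception):
    """Hypothetical error for a listed collection that turns out to be unusable."""


def act_on_all(
    list_collections: Callable[[], Iterable[str]],
    act: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Run `act` on every listed collection, skipping ones that are missing."""
    done: List[str] = []
    skipped: List[str] = []
    for name in list_collections():
        try:
            act(name)
            done.append(name)
        except CollectionNotFound:
            # The listing is only a point-in-time view: the collection may be
            # partially created or partially deleted, so record it and move on.
            skipped.append(name)
    return done, skipped
```

The caller gets back both lists, so it can decide whether to retry the
skipped names or just log them.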

But it still feels off to me to have a "list" op output something that's
potentially incorrect even at the point in time it's produced.

Not a -1 or a veto, just my 2c.  If it's an outlier opinion, please ignore
it.

Best,

Jason

On Mon, Jan 29, 2024 at 2:23 PM David Smiley <dsmi...@apache.org> wrote:

> Yeah, I'm sympathetic to that viewpoint.  I was coming at this from
> Walter's angle -- clients must always be tolerant.  This mindset is
> important when working on scalable distributed systems.  But depending
> on clients being so tolerant leads to being less friendly --
> increasing the likelihood that they will have to deal with such
> errors.  Solr might even appear buggy to such a client/user.  Shrug.
>
> At work we've got this modification to add listAll to collection
> listing (thus can toggle the semantics) but for scalability reasons,
> we're finding we want this enabled everywhere, which raises the question
> of whether it should simply work this way to begin with.  I'm also motivated
> to contribute to Solr without adding complexity -- arguably listing
> collections shouldn't need any parameters.  But we could contribute it
> this way; okay?  And maybe make listAll's default be a system property
> so you can run Solr in this way.
>
> On Mon, Jan 29, 2024 at 1:42 PM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> >
> > Thanks for calling this out more explicitly; definitely worth discussing.
> >
> > > If a client/caller/user lists collections and then loops them to take
> > > some action on them, it needs to be tolerant of the collection not working;
> > > may seem to not exist.
> >
> > I'd go even a step further and say that users should always have
> > error-handling around their calls to Solr.
> >
> > But even so I'm leery of changing the semantics here.  I think the
> > assumption of most folks is that each entry returned by a "list" exists
> > fully, unless the response gives more granular info to augment that.  I'd
> > worry that returning partially-created or partially-deleted collections
> > would be confusing and unintuitive to most users.  (e.g. Imagine iterating
> > over a "list", getting a not-found error running some operation on one of
> > the entries, but still seeing the collection when you call "list" again to
> > double-check.)
> >
> > I understand the need for a more scalable API, or a way to detect orphaned
> > data in ZK.  But I'd personally rather not see us change the LIST semantics
> > to accomplish that.  If you need the ZK child nodes, is there maybe a
> > scalable way to invoke ZookeeperInfoHandler to get that information?
> >
> > Best,
> >
> > Jason
> >
> > On Fri, Jan 26, 2024 at 2:46 PM David Smiley <dsmi...@apache.org> wrote:
> >
> > > https://issues.apache.org/jira/browse/SOLR-16909
> > > > Collections LIST command should fetch ZK data, not cached state
> > >
> > > I want to get further input from folks that changing the semantics is
> > > okay.  If the change is applied, LIST will be much faster but it will
> > > return collections that have not yet been fully constructed or
> > > deleted.  If a client/caller/user lists collections and then loops
> > > them to take some action on them, it needs to be tolerant of the
> > > collection not working; may seem to not exist.  I argue callers should
> > > *already* behave in this way or it may be brittle to circumstances
> > > that are hard to reason about.  On the other hand, maybe this would
> > > increase the frequency of errors to existing clients that didn't
> > > encounter this in testing?  Shrug.  I could imagine ways to solve this
> > > but it would add some complexity and it's not clear it's worthwhile.
> > >
> > > A related aside: the method ClusterStatus.getCollectionsMap is not
> > > scalable for clusters with 10K+ collections because it loops over every
> > > collection to fetch the latest state from ZK, putting a massive load
> > > on ZK.  Our implementation of collection listing calls it, as do a
> > > number of places across Solr.  Some could be changed with relative
> > > ease; some are more thorny.  I'd love to rename this thing, putting
> > > "slow" in the name so that you think twice before calling it :-)
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > For additional commands, e-mail: dev-h...@solr.apache.org
> > >
> > >
>
