I agree with every point about the delays inherent in a distributed system, and how any "list" call should be treated by clients as point-in-time. And I agree that the impact _should_ be minimal since diligent clients should have error handling in these cases anyways.
But it still feels off to me to have a "list" op output something that's potentially incorrect even at the point in time it's produced. Not a -1 or a veto, just my 2c. If it's an outlier opinion, please ignore it.

Best,

Jason

On Mon, Jan 29, 2024 at 2:23 PM David Smiley <dsmi...@apache.org> wrote:

> Yeah, I'm sympathetic to that viewpoint. I was coming at this from
> Walter's -- clients must be tolerant always. This mindset is
> important when working on scalable distributed systems. But depending
> on clients being so tolerant leads to being less friendly --
> increasing the likelihood that they will have to deal with such
> errors. Solr might even appear buggy to such a client/user. Shrug.
>
> At work we've got this modification to add listAll to collection
> listing (thus can toggle the semantics) but for scalability reasons,
> we're finding we want this enabled everywhere, which begs the question
> if it should simply work this way to begin with. I'm also motivated
> to contribute to Solr without adding complexity -- arguably listing
> collections shouldn't need any parameters. But we could contribute it
> this way; okay? And maybe make listAll's default be a system property
> so you can run Solr in this way.
>
> On Mon, Jan 29, 2024 at 1:42 PM Jason Gerlowski <gerlowsk...@gmail.com>
> wrote:
>
> > Thanks for calling this out more explicitly; definitely worth
> > discussing.
> >
> > > If a client/caller/user lists collections and then loops them to take
> > > some action on them, it needs to be tolerant of the collection not
> > > working; may seem to not exist.
> >
> > I'd go even a step further and say that users should always have
> > error-handling around their calls to Solr.
> >
> > But even so I'm leery of changing the semantics here. I think the
> > assumption of most folks is that each entry returned by a "list" exists
> > fully, unless the response gives more granular info to augment that.
> > I'd worry that returning partially-created or partially-deleted
> > collections would be confusing and unintuitive to most users. (e.g.
> > Imagine iterating over a "list", getting a not-found error running
> > some operation on one of the entries, but still seeing the collection
> > when you call "list" again to double-check.)
> >
> > I understand the need for a more scalable API, or a way to detect
> > orphaned data in ZK. But I'd personally rather not see us change the
> > LIST semantics to accomplish that. If you need the ZK child nodes, is
> > there maybe a scalable way to invoke ZookeeperInfoHandler to get that
> > information?
> >
> > Best,
> >
> > Jason
> >
> > On Fri, Jan 26, 2024 at 2:46 PM David Smiley <dsmi...@apache.org> wrote:
> >
> > > https://issues.apache.org/jira/browse/SOLR-16909
> > > > Collections LIST command should fetch ZK data, not cached state
> > >
> > > I want to get further input from folks that changing the semantics is
> > > okay. If the change is applied, LIST will be much faster but it will
> > > return collections that have not yet been fully constructed or
> > > deleted. If a client/caller/user lists collections and then loops
> > > them to take some action on them, it needs to be tolerant of the
> > > collection not working; may seem to not exist. I argue callers should
> > > *already* behave in this way or it may be brittle to circumstances
> > > that are hard to reason about. On the other hand, maybe this would
> > > increase the frequency of errors to existing clients that didn't
> > > encounter this in testing? Shrug. I could imagine ways to solve this
> > > but it would add some complexity and it's not clear it's worthwhile.
> > >
> > > A related aside: the method ClusterStatus.getCollectionsMap is not
> > > scalable for clusters with 10K+ collections because it loops every
> > > collection to fetch the latest state from ZK, putting a massive load
> > > on ZK.
> > > Our implementation of collection listing calls it, as do a number
> > > of places across Solr. Some could be changed with relative
> > > ease; some are more thorny. I'd love to rename this thing, putting
> > > "slow" in the name so that you think twice before calling it :-)
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > For additional commands, e-mail: dev-h...@solr.apache.org
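
The trade-off debated in this thread can be illustrated with a small, self-contained Python sketch. It is not Solr code: `FakeCluster`, `list_fast`, `list_checked`, `reload`, and `CollectionNotFound` are hypothetical names invented for illustration, and the read counter is only a rough stand-in for load on ZK. The assumption is that a "fast" listing returns the ZK child names as-is, while a "checked" listing fetches state once per collection, and that a tolerant caller treats the listing as a point-in-time hint.

```python
class CollectionNotFound(Exception):
    """Raised when an operation targets a collection that is not usable."""


class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster; not a real Solr client."""

    def __init__(self, zk_children, ready):
        # zk_children: names present as child nodes, including collections
        # that are only partially created or partially deleted.
        self.zk_children = list(zk_children)
        # ready: names whose state is complete and usable.
        self.ready = set(ready)
        # Crude counter standing in for the number of reads against ZK.
        self.zk_reads = 0

    def list_fast(self):
        # One read: return every child name without checking its state.
        self.zk_reads += 1
        return list(self.zk_children)

    def list_checked(self):
        # One read for the children plus one state fetch per collection.
        self.zk_reads += 1
        names = []
        for name in self.zk_children:
            self.zk_reads += 1
            if name in self.ready:
                names.append(name)
        return names

    def reload(self, name):
        # Any per-collection action; fails for collections that are not ready.
        if name not in self.ready:
            raise CollectionNotFound(name)
        return f"reloaded {name}"


def reload_all(cluster):
    """Act on every listed collection, tolerating entries that no longer work."""
    done, skipped = [], []
    for name in cluster.list_fast():
        try:
            cluster.reload(name)
            done.append(name)
        except CollectionNotFound:
            # The listing was only a point-in-time hint; skip and continue.
            skipped.append(name)
    return done, skipped


cluster = FakeCluster(
    zk_children=["products", "logs_tmp", "users"],
    ready=["products", "users"],
)
done, skipped = reload_all(cluster)
print(done, skipped)
```

In this sketch the fast listing costs a single read no matter how many collections exist, at the price of returning `logs_tmp`, which the caller then has to skip. The checked listing hides that entry but costs one extra read per collection, which is the pattern described above as not scaling to 10K+ collections.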