If we find issues with larger limits, we could make the limit configurable, like we do for maxBooleanClauses. Maybe somebody wants to run with a 100 GB heap and do one query per second.
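[Editor's note: a configurable cap along the lines suggested above might look roughly like the sketch below. The `VectorLimits` class and its method names are hypothetical, not Lucene API; the shape is modeled loosely on the existing configurable clause limit (Solr's maxBooleanClauses / IndexSearcher's max clause count).]

```java
// Hypothetical sketch of a configurable vector-dimension cap.
// Class and method names are invented for illustration; this is not Lucene API.
public final class VectorLimits {
    // Conservative default; operators with large heaps could raise it.
    private static int maxVectorDimensions = 1024;

    public static int getMaxVectorDimensions() {
        return maxVectorDimensions;
    }

    public static void setMaxVectorDimensions(int max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive, got " + max);
        }
        maxVectorDimensions = max;
    }

    // As noted in the thread, the limit only validates: it raises an
    // exception and sizes no data structures.
    public static void checkDimension(int dimension) {
        if (dimension > maxVectorDimensions) {
            throw new IllegalArgumentException(
                "vector dimension " + dimension
                    + " exceeds configured limit " + maxVectorDimensions);
        }
    }

    private VectorLimits() {}
}
```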
Where I work (LexisNexis), we have high-value queries, but just not that many of them per second.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Apr 6, 2023, at 8:57 AM, Alessandro Benedetti <a.benede...@sease.io> wrote:
>
> To be clear, Robert, I agree with you on not bumping it to 2048 or some other insufficiently motivated constant.
>
> But I disagree on the performance perspective: I am absolutely in favour of working to improve the current performance, but I think that is disconnected from this limit.
>
> Not all users need billions of vectors, and maybe tomorrow a new chip is released that speeds up the processing 100x.
>
> The limit, as far as I know, is not used to initialise or optimise any data structure; it is only used to raise an exception.
>
> I don't see a big problem in allowing 10k-dimension vectors, for example, even though the majority of people won't be able to use such vectors because they are slow on an average computer. If we gain just one new user, that's better than zero. Or, if it's a reputation thing, then it's a completely different discussion, I guess.
>
> On Thu, 6 Apr 2023, 16:47 Robert Muir, <rcm...@gmail.com> wrote:
>> Well, I'm asking people to actually try testing with such high dimensions. Based on my own experience, I consider it unusable. It seems other folks may have run into trouble too. If the project committers can't even really use vectors with such high dimension counts, then it's not in an OK state for users, and we shouldn't bump the limit.
>>
>> I'm happy to discuss/compromise etc., but simply bumping the limit without addressing the underlying usability/scalability is a real no-go; it is not really solving anything, nor is it giving users any freedom or allowing them to do something they couldn't do before. Because if it still doesn't work, it still doesn't work.
>>
>> We all need to be on the same page, grounded in reality, not fantasy: if we set a limit of 1024 or 2048, you should actually be able to index vectors with that many dimensions, and it should actually work and scale.
>>
>> On Thu, Apr 6, 2023 at 11:38 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
>> >
>> > As I said earlier, a max limit limits usability. It's not forcing users with small vectors to pay the performance penalty of big vectors; it's literally preventing some users from using Lucene/Solr/Elasticsearch at all. As far as I know, the max limit is used to raise an exception; it's not used to initialise or optimise data structures (please correct me if I'm wrong).
>> >
>> > Improving the algorithm's performance is a separate discussion. I don't see how a usability parameter correlates with the fact that indexing billions of vectors, of whatever dimension, is slow.
>> >
>> > What about potential users who need a few high-dimensional vectors?
>> >
>> > As I said before, I am a big +1 for NOT just raising it blindly, but I believe we need to remove the limit or size it in a way that is not a problem for either users or internal data structure optimizations, if any.
>> >
>> > On Wed, 5 Apr 2023, 18:54 Robert Muir, <rcm...@gmail.com> wrote:
>> >>
>> >> I'd ask anyone voting +1 to raise this limit to at least try to index a few million vectors with 756 or 1024 dimensions, which is allowed today.
>> >>
>> >> IMO, based on how painful it is, the limit is already too high. I realize that will sound controversial, but please at least try it out!
>> >>
>> >> Voting +1 without at least doing this is really the "weak/unscientifically minded" approach.
>> >>
>> >> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner <michael.wech...@wyona.com> wrote:
>> >> >
>> >> > Thanks for your feedback!
>> >> >
>> >> > I agree that it should not crash.
>> >> >
>> >> > So far we did not experience crashes ourselves, but we did not index millions of vectors.
>> >> >
>> >> > I will try to reproduce the crash; maybe this will help us to move forward.
>> >> >
>> >> > Thanks
>> >> >
>> >> > Michael
>> >> >
>> >> > On 05.04.23 at 18:30, Dawid Weiss wrote:
>> >> > >> Can you describe your crash in more detail?
>> >> > > I can't. That experiment was a while ago and a quick test to see if I could index rather large-ish USPTO (patent office) data as vectors. Couldn't do it then.
>> >> > >
>> >> > >> How much RAM?
>> >> > > My indexing jobs run with rather smallish heaps to give space for I/O buffers. Think 4-8 GB at most. So yes, it could have been the problem. I recall segment merging grew slower and slower and then simply crashed. Lucene should work with low heap requirements, even if it slows down. Throwing RAM at the indexing/segment-merging problem is... I don't know, not elegant?
>> >> > >
>> >> > > Anyway. My main point was to remind folks about how Apache works: code is merged in when there are no vetoes. If Rob (or anybody else) remains unconvinced, he or she can block the change. (I didn't invent those rules.)
>> >> > >
>> >> > > D.
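[Editor's note: as rough context for the suggestion upthread to try indexing a few million 756- or 1024-dimension vectors before voting, the back-of-the-envelope arithmetic below is mine, not a figure from the thread. Raw float32 storage alone for five million 1024-dimension vectors is about 19 GiB, before any HNSW graph links, doc ids, or codec overhead.]

```java
// Back-of-the-envelope sizing for raw float32 vector storage.
// Illustrative only: ignores HNSW graph links, doc ids, and codec overhead.
public final class VectorSizing {

    /** Raw bytes needed for numVectors float32 vectors of the given dimensionality. */
    public static long rawBytes(long numVectors, int dims) {
        return numVectors * dims * Float.BYTES; // 4 bytes per float32 component
    }

    /** The same figure expressed in gibibytes. */
    public static double rawGib(long numVectors, int dims) {
        return rawBytes(numVectors, dims) / (1024.0 * 1024.0 * 1024.0);
    }

    private VectorSizing() {}
}
```

Against the 4-8 GB indexing heaps Dawid describes, data volumes of this order give a feel for why such experiments are painful, whatever the exact on-disk/in-heap split ends up being.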
>> >> > > ---------------------------------------------------------------------
>> >> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> > > For additional commands, e-mail: dev-h...@lucene.apache.org