To be clear, Robert: I agree with you that we should not bump the limit to 2048, or to any other insufficiently motivated constant.
But I disagree on the performance perspective: I am absolutely in favour of working to improve the current performance, but I think that work is disconnected from this limit. Not all users need billions of vectors, and maybe tomorrow a new chip is released that speeds up the processing 100x. As far as I know, the limit is not used to initialise or optimise any data structure; it is only used to raise an exception.

I don't see a big problem in allowing, say, 10k-dimensional vectors: most people won't be able to use them anyway, because they are too slow on an average computer, but if we gain even one new user, that is better than zero.

Or, if it's a reputation thing, then it's a completely different discussion, I guess.
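To make that concrete, here is a minimal sketch of the kind of check I am talking about (the class and constant names are hypothetical, this is not Lucene's actual code, so please correct me if the real check works differently): the limit acts purely as a validation gate that throws an exception, and nothing is allocated or sized from it.

    // Hypothetical sketch, not Lucene's real implementation.
    public final class VectorDimensionLimitSketch {

        /** Stand-in for the max-dimension constant under discussion. */
        public static final int MAX_DIMENSIONS = 1024;

        /** The entire role of the limit: reject oversized vectors up front. */
        public static float[] validate(float[] vector) {
            if (vector == null || vector.length == 0) {
                throw new IllegalArgumentException("vector must not be null or empty");
            }
            if (vector.length > MAX_DIMENSIONS) {
                // Validation only: no buffer, graph or other structure is sized from the constant.
                throw new IllegalArgumentException(
                    "vector has " + vector.length + " dimensions, max is " + MAX_DIMENSIONS);
            }
            return vector;
        }

        public static void main(String[] args) {
            // A 1536-dimensional vector (a common embedding size) is rejected purely by the
            // check above, not because anything was pre-allocated for 1024 dimensions.
            try {
                validate(new float[1536]);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }

If that is accurate, then raising or removing the constant changes which inputs are accepted, not how anything is laid out in memory.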
On Thu, 6 Apr 2023, 16:47 Robert Muir, <rcm...@gmail.com> wrote:

> Well, I'm asking ppl actually try to test using such high dimensions. Based on my own experience, I consider it unusable. It seems other folks may have run into trouble too. If the project committers can't even really use vectors with such high dimension counts, then its not in an OK state for users, and we shouldn't bump the limit.
>
> I'm happy to discuss/compromise etc, but simply bumping the limit without addressing the underlying usability/scalability is a real no-go, it is not really solving anything, nor is it giving users any freedom or allowing them to do something they couldnt do before. Because if it still doesnt work it still doesnt work.
>
> We all need to be on the same page, grounded in reality, not fantasy, where if we set a limit of 1024 or 2048, that you can actually index vectors with that many dimensions and it actually works and scales.
>
> On Thu, Apr 6, 2023 at 11:38 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> >
> > As I said earlier, a max limit limits usability.
> > It's not forcing users with small vectors to pay the performance penalty of big vectors, it's literally preventing some users to use Lucene/Solr/Elasticsearch at all.
> > As far as I know, the max limit is used to raise an exception, it's not used to initialise or optimise data structures (please correct me if I'm wrong).
> >
> > Improving the algorithm performance is a separate discussion.
> > I don't see a correlation with the fact that indexing billions of whatever dimensioned vector is slow with a usability parameter.
> >
> > What about potential users that need few high dimensional vectors?
> >
> > As I said before, I am a big +1 for NOT just raise it blindly, but I believe we need to remove the limit or size it in a way it's not a problem for both users and internal data structure optimizations, if any.
> >
> > On Wed, 5 Apr 2023, 18:54 Robert Muir, <rcm...@gmail.com> wrote:
> >>
> >> I'd ask anyone voting +1 to raise this limit to at least try to index a few million vectors with 756 or 1024, which is allowed today.
> >>
> >> IMO based on how painful it is, it seems the limit is already too high, I realize that will sound controversial but please at least try it out!
> >>
> >> voting +1 without at least doing this is really the "weak/unscientifically minded" approach.
> >>
> >> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner <michael.wech...@wyona.com> wrote:
> >> >
> >> > Thanks for your feedback!
> >> >
> >> > I agree, that it should not crash.
> >> >
> >> > So far we did not experience crashes ourselves, but we did not index millions of vectors.
> >> >
> >> > I will try to reproduce the crash, maybe this will help us to move forward.
> >> >
> >> > Thanks
> >> >
> >> > Michael
> >> >
> >> > On 05.04.23 at 18:30, Dawid Weiss wrote:
> >> > >> Can you describe your crash in more detail?
> >> > > I can't. That experiment was a while ago and a quick test to see if I could index rather large-ish USPTO (patent office) data as vectors. Couldn't do it then.
> >> > >
> >> > >> How much RAM?
> >> > > My indexing jobs run with rather smallish heaps to give space for I/O buffers. Think 4-8GB at most. So yes, it could have been the problem. I recall segment merging grew slower and slower and then simply crashed. Lucene should work with low heap requirements, even if it slows down. Throwing ram at the indexing/ segment merging problem is... I don't know - not elegant?
> >> > >
> >> > > Anyway. My main point was to remind folks about how Apache works - code is merged in when there are no vetoes. If Rob (or anybody else) remains unconvinced, he or she can block the change. (I didn't invent those rules).
> >> > >
> >> > > D.