As I said earlier, a max limit limits usability. It's not forcing users with small vectors to pay the performance penalty of big vectors; it's literally preventing some users from using Lucene/Solr/Elasticsearch at all. As far as I know, the max limit is only used to raise an exception; it's not used to initialise or optimise data structures (please correct me if I'm wrong).
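To make that concrete, here is a minimal sketch of where the limit bites. It assumes the Lucene 9.x API (KnnFloatVectorField and friends) and, as I understand it, a hard-coded cap of 1024; the exact point where validation fires is my assumption, so treat this as illustrative rather than authoritative:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class VectorDimLimitDemo {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      // 1536 dims, e.g. an OpenAI-style embedding -- above the assumed 1024 cap.
      float[] vector = new float[1536];
      Document doc = new Document();
      // On Lucene 9.x this is expected to throw IllegalArgumentException,
      // because 1536 exceeds the hard-coded maximum. The constant is pure
      // validation; no data structure is sized or tuned from it.
      doc.add(new KnnFloatVectorField("embedding", vector,
          VectorSimilarityFunction.COSINE));
      writer.addDocument(doc);
    }
  }
}

In other words, the user with a handful of 1536-dim vectors gets an exception up front, not a slow index.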
Improving the algorithm's performance is a separate discussion. I don't see how the fact that indexing billions of vectors, whatever their dimension, is slow relates to a usability parameter. What about potential users who need just a few high-dimensional vectors? As I said before, I am a big +1 for NOT just raising it blindly, but I believe we need to remove the limit or size it so that it's a problem neither for users nor for internal data structure optimizations, if any.

On Wed, 5 Apr 2023, 18:54 Robert Muir, <rcm...@gmail.com> wrote:

> I'd ask anyone voting +1 to raise this limit to at least try to index
> a few million vectors with 756 or 1024, which is allowed today.
>
> IMO based on how painful it is, it seems the limit is already too
> high. I realize that will sound controversial, but please at least try
> it out!
>
> Voting +1 without at least doing this is really the
> "weak/unscientifically minded" approach.
>
> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner
> <michael.wech...@wyona.com> wrote:
> >
> > Thanks for your feedback!
> >
> > I agree that it should not crash.
> >
> > So far we did not experience crashes ourselves, but we did not index
> > millions of vectors.
> >
> > I will try to reproduce the crash; maybe this will help us to move
> > forward.
> >
> > Thanks
> >
> > Michael
> >
> > On 05.04.23 at 18:30, Dawid Weiss wrote:
> > >> Can you describe your crash in more detail?
> > > I can't. That experiment was a while ago and a quick test to see if I
> > > could index rather large-ish USPTO (patent office) data as vectors.
> > > Couldn't do it then.
> > >
> > >> How much RAM?
> > > My indexing jobs run with rather smallish heaps to give space for I/O
> > > buffers. Think 4-8GB at most. So yes, it could have been the problem.
> > > I recall segment merging grew slower and slower and then simply
> > > crashed. Lucene should work with low heap requirements, even if it
> > > slows down. Throwing RAM at the indexing/segment-merging problem
> > > is... I don't know - not elegant?
> > >
> > > Anyway. My main point was to remind folks about how Apache works -
> > > code is merged in when there are no vetoes. If Rob (or anybody else)
> > > remains unconvinced, he or she can block the change. (I didn't invent
> > > those rules.)
> > >
> > > D.