Thanks to everyone involved so far!
I confirm that the proper subject should have been [POLL] rather than [VOTE],
apologies for the confusion.

We are in the middle of the poll and this is the summary so far (ordered by

Option 2-4: 9 votes
make the limit configurable, potentially moving the limit to the
appropriate place

Option 3: 4 votes
keep it as it is (1024) but move it lower level in HNSW-specific

Option 1: 0 votes
keep it as it is (1024)
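
For anyone who wants to picture the configurable variant (Option 2-4), here is a
minimal Java sketch. It only assumes the system property name and the 1024
default quoted in Option 4 of the original mail. The class name HnswLimits and
the method names maxDimensions and checkDimension are hypothetical and are not
part of Lucene. Where such a check would actually live is exactly what the poll
is about.

```java
// Hypothetical helper for illustration only; not Lucene code.
final class HnswLimits {
    // Property name and default taken from Option 4 of the poll.
    static final String MAX_DIMS_PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT_MAX_DIMS = 1024;

    private HnswLimits() {}

    // Reads the limit on every call; Integer.getInteger falls back to the
    // default when the property is unset or is not a valid integer.
    static int maxDimensions() {
        return Integer.getInteger(MAX_DIMS_PROPERTY, DEFAULT_MAX_DIMS);
    }

    // Rejects vector dimensions outside the range 1..maxDimensions().
    static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "], got " + dimension);
        }
    }

    public static void main(String[] args) {
        checkDimension(768); // passes with the default limit
        System.out.println("max dimensions = " + maxDimensions());
    }
}
```

With this shape an administrator could raise the limit at startup with
-Dlucene.hnsw.maxDimensions=2048, while the default stays at 1024.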

I've also seen many people responding in the mail thread without
indicating their preference.
I believe it would be very useful if everyone interested expresses their

Have a good day!
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: <>
LinkedIn <> | Twitter
<> | Youtube
<> | Github

On Thu, 18 May 2023 at 14:34, Nicholas Knize <> wrote:

> Difficult to keep up with this topic when it's spread across issues, PRs,
> and email lists. My poll response is option 3. -1 to option 2, I think the
> configuration should be moved to the HNSW specific implementation. At this
> point of technical maturity, it doesn't make sense (to me) to have the
> config be a global system property.
> Given the conversation fragmentation I'll ask here what I asked in my
> comment on the github issue
> <>.
> "Can anyone smart here post their benchmarks to substantiate their
> claims?"
> For as enthusiastic a topic as vector dimensionality is, it sure is
> discouraging there isn't empirical data to help make an informed decision
> around what the recommended limit should be. I've only seen broad benchmark
> claims like "We benchmarked a patched Lucene/Solr. We fully understand (we
> measured it :-P)" It sure would be useful to see these benchmarks! Not
> having them to help improve these arbitrary limits seems like a serious
> disservice to the Lucene/Solr user community. I think until trustworthy
> numbers are made available all we'll have is conjecture and opinions.
> IMHO, given Java's lag in SIMD Vector support I'd rather see equal energy
> put into Robert's Vector API Integration, Plan B
> <> proposal. I'm not trying
> to minimize the importance of adding a configuration to the HNSW
> dimensionality, I just think we have the requisite expertise on this
> project to fix the bigger performance issues that are a direct result of
> Java's bigger vector performance deficiencies.
> Nicholas Knize, Ph.D., GISP
> Principal Engineer - Search  |  Amazon
> Apache Lucene PMC Member and Committer
> On Thu, May 18, 2023 at 7:07 AM Michael Wechner <>
> wrote:
>> Am 18.05.23 um 12:22 schrieb Michael McCandless:
>> I love all the energy and passion going into debating all the ways to
>> poke at this limit, but please let's also spend some of this passion on
>> actually improving the scalability of our aKNN implementation!  E.g. Robert
>> opened an exciting "Plan B" (
>> ) to work around
>> OpenJDK's crazy slowness on enabling access to vectorized SIMD CPU
>> instructions (the Java Vector API, JEP 426:
>> ).  This could help postings and doc values performance too!
>> agreed, but I do not think the MAX_DIMENSIONS decision should depend on
>> this, because I think whatever improvements can be accomplished eventually,
>> very likely there will always be some limit.
>> Thanks
>> Michael
>> Mike McCandless
>> On Thu, May 18, 2023 at 5:24 AM Alessandro Benedetti <
>>> wrote:
>>> That's great and a good plan B, but let's try to focus this thread on
>>> collecting votes for a week (let's keep discussions on the nice PR opened
>>> by David or the discussion thread we have in the mailing list already) :)
>>> On Thu, 18 May 2023, 10:10 Ishan Chattopadhyaya, <
>>>> wrote:
>>>> That sounds promising, Michael. Can you share scripts/steps/code to
>>>> reproduce this?
>>>> On Thu, 18 May, 2023, 1:16 pm Michael Wechner, <
>>>>> wrote:
>>>>> I just implemented it and tested it with OpenAI's
>>>>> text-embedding-ada-002, which uses 1536 dimensions, and it works just
>>>>> fine :-)
>>>>> Thanks
>>>>> Michael
>>>>> Am 18.05.23 um 00:29 schrieb Michael Wechner:
>>>>> IIUC KnnVectorField is deprecated and one is supposed to use
>>>>> KnnFloatVectorField when using float as vector values, right?
>>>>> Am 17.05.23 um 16:41 schrieb Michael Sokolov:
>>>>> see
>>>>> On Wed, May 17, 2023 at 10:09 AM David Smiley <>
>>>>> wrote:
>>>>>> > easily be circumvented by a user
>>>>>> This is a revelation to me and others, if true.  Michael, please then
>>>>>> point to a test or code snippet that shows the Lucene user community what
>>>>>> they want to see so they are unblocked from their explorations of vector
>>>>>> search.
>>>>>> ~ David Smiley
>>>>>> Apache Lucene/Solr Search Developer
>>>>>> On Wed, May 17, 2023 at 7:51 AM Michael Sokolov <>
>>>>>> wrote:
>>>>>>> I think I've said before on this list we don't actually enforce the
>>>>>>> limit in any way that can't easily be circumvented by a user. The codec
>>>>>>> already supports any size vector - it doesn't impose any limit. The way
>>>>>>> the API is written you can *already today* create an index with max-int
>>>>>>> sized vectors and we are committed to supporting that going forward by
>>>>>>> our backwards compatibility policy as Robert points out. This wasn't
>>>>>>> intentional, I think, but it is the facts.
>>>>>>> Given that, I think this whole discussion is not really necessary.
>>>>>>> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <
>>>>>>>> wrote:
>>>>>>>> Hi all,
>>>>>>>> we have finalized all the options proposed by the community and we
>>>>>>>> are ready to vote for the preferred one and then proceed with the
>>>>>>>> implementation.
>>>>>>>> *Option 1*
>>>>>>>> Keep it as it is (dimension limit hardcoded to 1024)
>>>>>>>> *Motivation*:
>>>>>>>> We are close to improving on many fronts. Given the criticality of
>>>>>>>> Lucene in computing infrastructure and the concerns raised by one of
>>>>>>>> the most active stewards of the project, I think we should keep
>>>>>>>> working toward improving the feature as is and raise the limit only
>>>>>>>> after we can demonstrate improvement unambiguously.
>>>>>>>> *Option 2*
>>>>>>>> make the limit configurable, for example through a system property
>>>>>>>> *Motivation*:
>>>>>>>> The system administrator can enforce a limit that users need to
>>>>>>>> respect, in line with whatever the admin has decided is acceptable
>>>>>>>> for them.
>>>>>>>> The default can stay the current one.
>>>>>>>> This should open the doors for Apache Solr, Elasticsearch,
>>>>>>>> OpenSearch, and any sort of plugin development
>>>>>>>> *Option 3*
>>>>>>>> Move the max dimension limit lower level to a HNSW specific
>>>>>>>> implementation. Once there, this limit would not bind any other
>>>>>>>> potential vector engine alternative/evolution.
>>>>>>>> *Motivation:* There seem to be contradictory performance
>>>>>>>> interpretations about the current HNSW implementation. Some consider
>>>>>>>> its performance ok, some not, and it depends on the target data set
>>>>>>>> and use case. Increasing the max dimension limit where it is currently
>>>>>>>> (in top level FloatVectorValues) would not allow potential
>>>>>>>> alternatives (e.g. for other use-cases) to be based on a lower limit.
>>>>>>>> *Option 4*
>>>>>>>> Make it configurable and move it to an appropriate place.
>>>>>>>> In particular, a
>>>>>>>> simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024) should be
>>>>>>>> enough.
>>>>>>>> *Motivation*:
>>>>>>>> Both are good and not mutually exclusive and could happen in any
>>>>>>>> order.
>>>>>>>> Someone suggested to perfect what the _default_ limit should be,
>>>>>>>> but I've not seen an argument _against_ configurability.  Especially in
>>>>>>>> this way -- a toggle that doesn't bind Lucene's APIs in any way.
>>>>>>>> I'll keep this [VOTE] open for a week and then proceed to the
>>>>>>>> implementation.
>>>>>>>> --------------------------
>>>>>>>> *Alessandro Benedetti*
>>>>>>>> Director @ Sease Ltd.
>>>>>>>> *Apache Lucene/Solr Committer*
>>>>>>>> *Apache Solr PMC Member*
