+1 :-)
Thanks
Michael
On 18.10.22 at 19:52, Julie Tibshirani wrote:
Hi everyone,
We recently discovered a severe bug in the 9.4 release in the kNN
vectors format: https://github.com/apache/lucene/issues/11858.
Explaining the problem: when ingesting a lot of data, or when
performing a force merge, segments can grow large. The format
validation code accidentally uses an int instead of a long to compute
the data size, so it can fail on these large segments. When format
validation fails, the segment is essentially lost and unusable. For
some client systems like Elasticsearch, it can send the whole index
into a "failed" state, blocking further writes or searches.
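To illustrate the class of bug (a minimal sketch only, not the actual Lucene validation code): the class name, method names, and the assumption of 4-byte float components below are all hypothetical. The point is that multiplying int operands wraps around silently once the product exceeds Integer.MAX_VALUE, so a size check on a large segment compares against a wrong expected value.

```java
public class VectorDataSizeCheck {

    // Hypothetical buggy variant: the multiplication is carried out in
    // 32-bit int arithmetic and only widened to long afterwards, so it
    // wraps around for large segments.
    static long sizeWithIntMath(int numVectors, int dimension) {
        return numVectors * dimension * Float.BYTES;
    }

    // Hypothetical fixed variant: casting the first operand to long
    // promotes the whole expression to 64-bit arithmetic.
    static long sizeWithLongMath(int numVectors, int dimension) {
        return (long) numVectors * dimension * Float.BYTES;
    }

    public static void main(String[] args) {
        // Assumed example segment: 10 million vectors of 768 floats each.
        int numVectors = 10_000_000;
        int dimension = 768;

        long wrapped = sizeWithIntMath(numVectors, dimension);
        long correct = sizeWithLongMath(numVectors, dimension);

        System.out.println("int arithmetic:  " + wrapped);
        System.out.println("long arithmetic: " + correct);

        // A validation step comparing the wrapped value against the real
        // data length would reject a perfectly valid segment.
        System.out.println("validation would pass: " + (wrapped == correct));
    }
}
```

For small segments both variants agree, which is why a problem like this can go unnoticed until a segment grows past roughly 2 GB of vector data.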
I think this bug is sufficiently bad that we should perform a 9.4.1
release as soon as possible. The fix is just an update to the
read-side validation code; there won't be any effect on the data
format. This means it is safe to merge the fix into the existing 9.4
vectors format. The bug was introduced during the work to add
quantization (https://github.com/apache/lucene/pull/1054) and does not
affect versions before 9.4.
Let me know what you think! I could serve as release manager. (We
should also follow up with a plan to prevent this from happening in
the future -- maybe we need to regularly run larger-scale benchmarks?)
Julie