Re: call for 9.4.1 release (bug in vectors format)

Michael Wechner Tue, 18 Oct 2022 11:51:05 -0700

+1 :-)

Thanks


Michael

On 18.10.22 at 19:52, Julie Tibshirani wrote:

Hi everyone,
We recently discovered a severe bug in the 9.4 release in the kNN vectors format: https://github.com/apache/lucene/issues/11858. Explaining the problem: when ingesting a lot of data, or when performing a force merge, segments can grow large. The format validation code accidentally uses an int instead of a long to compute the data size, so it can fail on these large segments. When format validation fails, the segment is essentially lost and unusable. For some client systems like Elasticsearch, it can send the whole index into a "failed" state, blocking further writes or searches.
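For readers unfamiliar with this class of bug, here is a minimal sketch of how an int-typed size computation can go wrong on a large segment. It is not the actual Lucene validation code; the class name, method names, and the example segment dimensions are all hypothetical and chosen only for illustration.

```java
// Illustrative sketch only: NOT the Lucene source. All names are hypothetical.
public class VectorDataSizeSketch {

    // Buggy variant: the multiplication happens in 32-bit int arithmetic,
    // so the product can wrap around before it is widened to long.
    static long sizeWithIntMath(int numVectors, int dimension, int bytesPerValue) {
        int size = numVectors * dimension * bytesPerValue;
        return size;
    }

    // Fixed variant: casting the first operand to long makes the whole
    // multiplication happen in 64-bit arithmetic.
    static long sizeWithLongMath(int numVectors, int dimension, int bytesPerValue) {
        return (long) numVectors * dimension * bytesPerValue;
    }

    public static void main(String[] args) {
        // Assumed example: 3 million float vectors of dimension 256.
        int numVectors = 3_000_000;
        int dimension = 256;
        int bytesPerValue = Float.BYTES; // 4 bytes per float

        long buggy = sizeWithIntMath(numVectors, dimension, bytesPerValue);
        long fixed = sizeWithLongMath(numVectors, dimension, bytesPerValue);

        // The true size exceeds Integer.MAX_VALUE, so the int variant wraps
        // to a negative number and a comparison against the real file
        // length would fail.
        System.out.println("int math:  " + buggy);
        System.out.println("long math: " + fixed);
    }
}
```

In this sketch the true data size is about 3 GB, which does not fit in a signed 32-bit int, so a check comparing the int-computed size against the real data length would reject a perfectly valid segment.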
I think this bug is sufficiently bad that we should perform a 9.4.1 release as soon as possible. The fix is just an update to the read-side validation code; there won't be any effect on the data format. This means it is safe to merge the fix into the existing 9.4 vectors format. The bug was introduced during the work to add quantization (https://github.com/apache/lucene/pull/1054) and does not affect versions before 9.4.
Let me know what you think! I could serve as release manager. (We should also follow up with a plan to prevent this from happening in the future -- maybe we need to regularly run larger-scale benchmarks?)
Julie



