Hi everyone,

We recently discovered a severe bug in the 9.4 release in the kNN vectors
format: https://github.com/apache/lucene/issues/11858. To explain the
problem: when ingesting a lot of data, or when performing a force merge,
segments can grow large. The format validation code accidentally uses an
int instead of a long to compute the data size, so it can fail on these
large segments. When format validation fails, the segment is essentially
lost and unusable. For client systems like Elasticsearch, this can send
the whole index into a "failed" state, blocking further writes or searches.
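To illustrate the failure mode (a minimal sketch with hypothetical names, not
Lucene's actual validation code): multiplying three ints overflows 32 bits long
before the result is assigned to a long, so the computed data size is silently
wrong for segments over ~2 GB of vector data.

```java
public class VectorSizeOverflow {
    // Buggy variant: the multiplication happens entirely in int arithmetic,
    // wrapping around before the result is widened to long.
    static long brokenExpectedBytes(int numVectors, int dimension) {
        return numVectors * dimension * Float.BYTES;
    }

    // Fixed variant: widen to long *before* multiplying.
    static long fixedExpectedBytes(int numVectors, int dimension) {
        return (long) numVectors * dimension * Float.BYTES;
    }

    public static void main(String[] args) {
        int numVectors = 10_000_000; // 10M vectors, plausible after a force merge
        int dimension = 768;         // a common embedding dimension
        // True size is 30,720,000,000 bytes; the broken version wraps around
        // and reports a much smaller (wrong) value, so validation fails.
        System.out.println(brokenExpectedBytes(numVectors, dimension));
        System.out.println(fixedExpectedBytes(numVectors, dimension));
    }
}
```

Since only the validator's arithmetic is wrong, the bytes on disk are fine,
which is why the fix can ship without a format change.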

I think this bug is sufficiently bad that we should perform a 9.4.1 release
as soon as possible. The fix is just an update to the read-side validation
code; it has no effect on the data format. This means it is safe
to merge the fix into the existing 9.4 vectors format. The bug was
introduced during the work to add quantization (
https://github.com/apache/lucene/pull/1054) and does not affect versions
before 9.4.

Let me know what you think! I could serve as release manager. (We
should also follow up with a plan to prevent this from happening in the
future -- maybe we need to regularly run larger-scale benchmarks?)

Julie
