mayya-sharipova commented on code in PR #992:
URL: https://github.com/apache/lucene/pull/992#discussion_r921175386
##########
lucene/core/src/java/org/apache/lucene/index/VectorValuesWriter.java:
##########
@@ -26,233 +26,153 @@
 import org.apache.lucene.codecs.KnnVectorsWriter;
 import org.apache.lucene.search.DocIdSetIterator;
 import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.util.Accountable;
 import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.Bits;
 import org.apache.lucene.util.BytesRef;
-import org.apache.lucene.util.Counter;
 import org.apache.lucene.util.RamUsageEstimator;
 
 /**
- * Buffers up pending vector value(s) per doc, then flushes when segment flushes.
+ * Buffers up pending vector value(s) per doc, then flushes when segment flushes. Used for {@code
+ * SimpleTextKnnVectorsWriter} and for vectors writers before v 9.3 .
  *
  * @lucene.experimental
  */
-class VectorValuesWriter {
-
-  private final FieldInfo fieldInfo;
-  private final Counter iwBytesUsed;
-  private final List<float[]> vectors = new ArrayList<>();
-  private final DocsWithFieldSet docsWithField;
-
-  private int lastDocID = -1;
-
-  private long bytesUsed;
-
-  VectorValuesWriter(FieldInfo fieldInfo, Counter iwBytesUsed) {
-    this.fieldInfo = fieldInfo;
-    this.iwBytesUsed = iwBytesUsed;
-    this.docsWithField = new DocsWithFieldSet();
-    this.bytesUsed = docsWithField.ramBytesUsed();
-    if (iwBytesUsed != null) {
-      iwBytesUsed.addAndGet(bytesUsed);
+public abstract class VectorValuesWriter extends KnnVectorsWriter {

Review Comment:
@jtibshirani
> I was thinking we would rewrite SimpleTextKnnVectorsWriter to implement the new interface directly. For example it'd define its own KnnFieldVectorsWriter where addValue writes to the vectors data file directly.

I studied `SimpleTextKnnVectorsWriter` a bit more and realized we can't organize it this way. It needs to buffer the vectors of all documents for each field, and only then write each field's buffered vectors to the vectors data file, because that file is organized field by field rather than document by document (unlike, for example, stored fields).
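To illustrate the constraint, here is a minimal sketch, assuming a simplified text "data file" in which each field's vectors must appear as one contiguous block. `PerFieldVectorBufferSketch`, `addValue`, and `flush` are hypothetical names for illustration only; they are not Lucene's `KnnVectorsWriter` / `KnnFieldVectorsWriter` API or the real `SimpleTextKnnVectorsWriter` format. Values arrive document by document across fields, so they have to be buffered per field and written out only at flush time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's API): vectors arrive doc by doc, possibly for several
 * fields, but the output is laid out field by field, so each field's vectors are buffered.
 */
public class PerFieldVectorBufferSketch {

  /** One buffered value: the document it belongs to and a copy of its vector. */
  private static final class DocVector {
    final int docId;
    final float[] vector;

    DocVector(int docId, float[] vector) {
      this.docId = docId;
      this.vector = vector;
    }
  }

  // Insertion-ordered, so fields are flushed in the order they were first seen.
  private final Map<String, List<DocVector>> buffers = new LinkedHashMap<>();

  /** Called once per (document, field) pair while documents are being indexed. */
  public void addValue(String field, int docId, float[] vector) {
    // Copy the array, since a caller may reuse it for the next document.
    buffers.computeIfAbsent(field, f -> new ArrayList<>()).add(new DocVector(docId, vector.clone()));
  }

  /** Writes all buffered vectors, one contiguous block per field, then clears the buffers. */
  public String flush() {
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, List<DocVector>> entry : buffers.entrySet()) {
      out.append("field ").append(entry.getKey()).append('\n');
      for (DocVector dv : entry.getValue()) {
        out.append("  doc ").append(dv.docId).append(' ')
            .append(Arrays.toString(dv.vector)).append('\n');
      }
    }
    buffers.clear();
    return out.toString();
  }

  public static void main(String[] args) {
    PerFieldVectorBufferSketch writer = new PerFieldVectorBufferSketch();
    // Two documents, each carrying a value for two vector fields, arriving interleaved.
    writer.addValue("title_vec", 0, new float[] {1f, 2f});
    writer.addValue("body_vec", 0, new float[] {3f, 4f});
    writer.addValue("title_vec", 1, new float[] {5f, 6f});
    writer.addValue("body_vec", 1, new float[] {7f, 8f});
    // Prints both title_vec vectors first, then both body_vec vectors.
    System.out.print(writer.flush());
  }
}
```

With a single vector field there would be nothing to regroup, so each value could be appended to the output as soon as it arrives; with several fields, writing on arrival would interleave the blocks.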
If there were only a single vector field, we could write to the vectors data file directly, as you suggested. Because a segment can have several vector fields, we have to stick with `BufferingKnnVectorsWriter` for `SimpleTextKnnVectorsWriter`.