Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

via GitHub Fri, 08 Dec 2023 08:19:57 -0800


benwtrent commented on code in PR #12699:
URL: https://github.com/apache/lucene/pull/12699#discussion_r1420733099



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -1190,4 +1176,63 @@ public void seekExact(long ord) {
   public long ord() {
     throw new UnsupportedOperationException();
   }
+
+  static class OutputAccumulator extends DataInput {
+
+    BytesRef[] outputs = new BytesRef[16];
+    BytesRef current;
+    int num;
+    int outputIndex;
+    int index;
+
+    void push(BytesRef output) {
+      if (output != Lucene90BlockTreeTermsReader.NO_OUTPUT) {

Review Comment:
   We do have strict contracts here, and I can see that 
`ByteSequenceOutputs#add(BytesRef, BytesRef)` asserts that the length 
is > 0. 
   
   `readByte()` in `OutputAccumulator`, however, assumes that every pushed 
`BytesRef` has a length of at least 1. With a length of 0, we would read past 
the end of the ref and return bytes from the underlying `byte[]` that we 
shouldn't.
   
   IMO `OutputAccumulator` needs to be far more cautious than 
`ByteSequenceOutputs#add(BytesRef, BytesRef)`, because `OutputAccumulator` 
doesn't make copies and relies on the underlying byte arrays not changing.
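
   To illustrate the concern, here is a minimal, self-contained sketch of a 
guarded accumulator. It is not the PR's code: `Ref` is a hypothetical stand-in 
for `BytesRef`, and `GuardedAccumulator` and its fields are made up for the 
example. It only shows one possible way to skip zero-length refs in `push` so 
that `readByte()` can rely on every stored ref having at least one byte.

```java
import java.util.Arrays;

class GuardedAccumulatorSketch {

  /** Hypothetical stand-in for BytesRef: a view onto a shared byte[], no copy. */
  static final class Ref {
    final byte[] bytes;
    final int offset;
    final int length;

    Ref(byte[] bytes, int offset, int length) {
      this.bytes = bytes;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Illustrative accumulator that reads across several refs without copying them. */
  static final class GuardedAccumulator {
    private Ref[] outputs = new Ref[16];
    private int num;         // number of refs stored
    private int outputIndex; // ref currently being read
    private int index;       // position inside the current ref

    void push(Ref output) {
      // Guard: never store an empty ref, so readByte() may assume length >= 1.
      if (output == null || output.length == 0) {
        return;
      }
      if (num == outputs.length) {
        outputs = Arrays.copyOf(outputs, num * 2);
      }
      outputs[num++] = output;
    }

    byte readByte() {
      if (outputIndex >= num) {
        throw new IllegalStateException("read past the accumulated outputs");
      }
      Ref current = outputs[outputIndex];
      byte b = current.bytes[current.offset + index++];
      if (index == current.length) {
        // Current ref is exhausted: move on to the next one.
        outputIndex++;
        index = 0;
      }
      return b;
    }
  }

  public static void main(String[] args) {
    byte[] shared = {10, 20, 30};
    GuardedAccumulator acc = new GuardedAccumulator();
    acc.push(new Ref(shared, 0, 2)); // bytes 10, 20
    acc.push(new Ref(shared, 2, 0)); // empty: skipped by the guard
    acc.push(new Ref(shared, 2, 1)); // byte 30
    for (int i = 0; i < 3; i++) {
      System.out.println(acc.readByte());
    }
  }
}
```

   Without the length check in `push`, the empty ref in `main` would make 
`readByte()` return `shared[2]` on behalf of a ref that owns no bytes, and 
`index` would never equal that ref's length again, so the reader would keep 
walking the shared array.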



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


