Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-05-01 Thread via GitHub


Jackie-Jiang merged PR #12866:
URL: https://github.com/apache/pinot/pull/12866


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-05-01 Thread via GitHub


abhioncbr commented on PR #12866:
URL: https://github.com/apache/pinot/pull/12866#issuecomment-2088309617

   @Jackie-Jiang / @xiangfu0, Can you please review it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-26 Thread via GitHub


Jackie-Jiang commented on PR #12866:
URL: https://github.com/apache/pinot/pull/12866#issuecomment-2079819764

   We can use the same mechanism in #11857 to avoid this backward 
incompatibility by introducing a new suffix for the lucene99 index


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-24 Thread via GitHub


abhioncbr commented on PR #12866:
URL: https://github.com/apache/pinot/pull/12866#issuecomment-2076188853

   > I don't fully follow why did it break. Following the same logic in #11857 
should work. Basically when downgrading and Pinot doesn't understand the new 
format, it should try to generate the old format
   
   I think, in this case, it's failing because we are generating the [segment 
based on 
updated](https://github.com/apache/pinot/blob/99a41803305a39887805e1106f90d07e4b6af978/pinot-compatibility-verifier/src/main/java/org/apache/pinot/compat/SegmentOp.java#L183)
 Lucene library-based code using codec99, and for 
[uploading](https://github.com/apache/pinot/blob/99a41803305a39887805e1106f90d07e4b6af978/pinot-compatibility-verifier/src/main/java/org/apache/pinot/compat/SegmentOp.java#L222)
 using the old library based code.  So while, uploading we are getting this 
error
   ```bash
   2024/04/17 02:28:04.193 INFO [StartServiceManagerCommand] [Start a Pinot 
[SERVER]] Started Pinot [SERVER] instance [Server_10.1.0.40_8098] at 9.332s 
since launch
   2024/04/17 02:38:12.482 ERROR [LuceneTextIndexReader] 
[HelixTaskExecutor-message_handle_thread_4] Failed to instantiate Lucene text 
index reader for column textDim1, exception Could not load codec 'Lucene99'. 
Did you forget to add lucene-backward-codecs.jar?
   2024/04/17 02:38:12.482 ERROR [ImmutableSegmentLoader] 
[HelixTaskExecutor-message_handle_thread_4] Failed to load segment: 
FeatureTest1_Segment with SegmentDirectory
   java.lang.RuntimeException: java.lang.IllegalArgumentException: Could not 
load codec 'Lucene99'. Did you forget to add lucene-backward-codecs.jar?
at 
org.apache.pinot.segment.local.segment.index.readers.text.LuceneTextIndexReader.(LuceneTextIndexReader.java:101)
 
~[pinot-segment-local-1.2.0-compat-1713320271.jar:1.2.0-compat-1713320271-1d807df40160ec8525b5a33847d67a61fef2c54e]
   ```
   
   Do you think it's a valid reason? Also, I think generation of the segment in 
the compatibility test should also be based on the previous version itself. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-22 Thread via GitHub


Jackie-Jiang commented on PR #12866:
URL: https://github.com/apache/pinot/pull/12866#issuecomment-2071043920

   I don't fully follow why did it break. Following the same logic in #11857 
should work. Basically when downgrading and Pinot doesn't understand the new 
format, it should try to generate the old format


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-20 Thread via GitHub


abhioncbr commented on PR #12866:
URL: https://github.com/apache/pinot/pull/12866#issuecomment-2067871551

   I am seeing a similar discussion that happened last time while upgrading the 
Lucene library, [PR](https://github.com/apache/pinot/pull/11857). 
   
   Compatability tests are failing for failing to load `lucene99Codec` class. 
@xiangfu0 / @Jackie-Jiang please suggest.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


abhioncbr commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559826386


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/vector/lucene95/HnswVectorsFormat.java:
##
@@ -19,10 +19,10 @@
 package org.apache.pinot.segment.local.segment.creator.impl.vector.lucene95;
 
 import java.io.IOException;
+import org.apache.lucene.backward_codecs.lucene95.Lucene95HnswVectorsFormat;

Review Comment:
   
   > Does this mean after upgrading Lucece to 0.9.10:
   > 
   > 1. We can only write index with lucene99 based codecs?
   
   True, after upgrading Lucece to 0.9.10, we can only write an index based on 
lucene99.
   
   > 2. If a new generated index is served by an old server (running Lucene 
9.8.0), will it break?
   
   I am unsure, but I will say it will break if I have to guess. The only thing 
is this vector functionality (classes) got added in Pinot on Dec/23, so I am 
not sure how much it is getting used. Maybe @xiangfu0 can help us here with a 
better understanding. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


Jackie-Jiang commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559807956


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/vector/lucene95/HnswVectorsFormat.java:
##
@@ -19,10 +19,10 @@
 package org.apache.pinot.segment.local.segment.creator.impl.vector.lucene95;
 
 import java.io.IOException;
+import org.apache.lucene.backward_codecs.lucene95.Lucene95HnswVectorsFormat;

Review Comment:
   Does this mean after upgrading Lucece to 0.9.10:
   1. We can only write index with lucene99 based codecs?
   2. If a new generated index is served by an old server (running Lucene 
9.8.0), will it break?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


abhioncbr commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559767788


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/vector/lucene95/HnswVectorsFormat.java:
##
@@ -19,10 +19,10 @@
 package org.apache.pinot.segment.local.segment.creator.impl.vector.lucene95;
 
 import java.io.IOException;
+import org.apache.lucene.backward_codecs.lucene95.Lucene95HnswVectorsFormat;

Review Comment:
   @xiangfu0 / @Jackie-Jiang, this codec can not be used to write data since 
now it's in `backward_codec` package. unit test 
`LoaderTest.testVectorIndexLoad:766->constructSegmentWithVectorIndex:928` is 
failing with the following exception
   ```bash
   Caused by: java.lang.UnsupportedOperationException: Old codecs may only be 
used for reading
at 
org.apache.lucene.backward_codecs.lucene95.Lucene95HnswVectorsFormat.fieldsWriter(Lucene95HnswVectorsFormat.java:182)
   ```
   based on the 
[conversation](https://lists.apache.org/thread/fl9ds5ofxgmq2obzh0j5hgsmp4n6jx65),
 I found, is it okay if I use lucene99 based codecs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


codecov-commenter commented on PR #12866:
URL: https://github.com/apache/pinot/pull/12866#issuecomment-2047162999

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/12866?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   Attention: Patch coverage is `0%` with `8 lines` in your changes are missing 
coverage. Please review.
   > Project coverage is 0.00%. Comparing base 
[(`59551e4`)](https://app.codecov.io/gh/apache/pinot/commit/59551e45224f1535c4863fd577622b37366ccc97?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`4b80af1`)](https://app.codecov.io/gh/apache/pinot/pull/12866?dropdown=coverage&src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 258 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/pinot/pull/12866?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...ache/pinot/segment/local/utils/fst/FSTBuilder.java](https://app.codecov.io/gh/apache/pinot/pull/12866?src=pr&el=tree&filepath=pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Futils%2Ffst%2FFSTBuilder.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9mc3QvRlNUQnVpbGRlci5qYXZh)
 | 0.00% | [7 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12866?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   | 
[...al/segment/index/readers/LuceneFSTIndexReader.java](https://app.codecov.io/gh/apache/pinot/pull/12866?src=pr&el=tree&filepath=pinot-segment-local%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fpinot%2Fsegment%2Flocal%2Fsegment%2Findex%2Freaders%2FLuceneFSTIndexReader.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvTHVjZW5lRlNUSW5kZXhSZWFkZXIuamF2YQ==)
 | 0.00% | [1 Missing :warning: 
](https://app.codecov.io/gh/apache/pinot/pull/12866?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #12866   +/-   ##
   =
   - Coverage 61.75%0.00%   -61.76% 
   =
 Files  2436 2422   -14 
 Lines133233   132712  -521 
 Branches  2063620543   -93 
   =
   - Hits  822740-82274 
   - Misses44911   132712+87801 
   + Partials   60480 -6048 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[integration](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `0.00% <0.00%> (-0.01%)` | :arrow_down: |
   | 
[integration1](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[integration2](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `0.00% <0.00%> (ø)` | |
   | 
[java-11](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[java-21](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `0.00% <0.00%> (-61.63%)` | :arrow_down: |
   | 
[skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12866/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |

Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


abhioncbr commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559175866


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/vector/lucene95/HnswCodec.java:
##
@@ -18,15 +18,15 @@
  */
 package org.apache.pinot.segment.local.segment.creator.impl.vector.lucene95;
 
+import org.apache.lucene.backward_codecs.lucene90.Lucene90PostingsFormat;

Review Comment:
   Lucene has moved these codecs to `backward_codecs` package.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


abhioncbr commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559174589


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/LuceneFSTIndexReader.java:
##
@@ -52,7 +52,8 @@ public LuceneFSTIndexReader(PinotDataBuffer pinotDataBuffer)
 _dataBufferIndexInput = new PinotBufferIndexInput(_dataBuffer, 0L, 
_dataBuffer.size());
 
 _readFST =
-new FST(_dataBufferIndexInput, _dataBufferIndexInput, 
PositiveIntOutputs.getSingleton(), new OffHeapFSTStore());
+new FST<>(FST.readMetadata(_dataBufferIndexInput, 
PositiveIntOutputs.getSingleton()),

Review Comment:
   Introduced new static class 
[FSTMetadata](https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/util/fst/FST.FSTMetadata.html)
 in Lucene 9.9.0. This is getting used in the FST constructor.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


abhioncbr commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559174589


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/LuceneFSTIndexReader.java:
##
@@ -52,7 +52,8 @@ public LuceneFSTIndexReader(PinotDataBuffer pinotDataBuffer)
 _dataBufferIndexInput = new PinotBufferIndexInput(_dataBuffer, 0L, 
_dataBuffer.size());
 
 _readFST =
-new FST(_dataBufferIndexInput, _dataBufferIndexInput, 
PositiveIntOutputs.getSingleton(), new OffHeapFSTStore());
+new FST<>(FST.readMetadata(_dataBufferIndexInput, 
PositiveIntOutputs.getSingleton()),

Review Comment:
   Introduced new static class 
[FSTMetadata](https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/util/fst/FST.FSTMetadata.html)
 in lucene 9.9.0. This is getting used in FST constructor.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Upgrade lucene to 9.10.0 and compatibility changes to code. [pinot]

2024-04-10 Thread via GitHub


abhioncbr commented on code in PR #12866:
URL: https://github.com/apache/pinot/pull/12866#discussion_r1559170172


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/fst/FSTBuilder.java:
##
@@ -36,28 +36,31 @@
  */
 public class FSTBuilder {
   public static final Logger LOGGER = 
LoggerFactory.getLogger(FSTBuilder.class);
-  private final FSTCompiler _builder = new 
FSTCompiler<>(FST.INPUT_TYPE.BYTE4, PositiveIntOutputs.getSingleton());
+  private final FSTCompiler _fstCompiler =
+  (new FSTCompiler.Builder<>(FST.INPUT_TYPE.BYTE4, 
PositiveIntOutputs.getSingleton())).build();
   private final IntsRefBuilder _scratch = new IntsRefBuilder();
 
   public static FST buildFST(SortedMap input)
   throws IOException {
 PositiveIntOutputs fstOutput = PositiveIntOutputs.getSingleton();
-FSTCompiler fstCompiler = new FSTCompiler<>(FST.INPUT_TYPE.BYTE4, 
fstOutput);
+FSTCompiler.Builder fstCompilerBuilder = new 
FSTCompiler.Builder<>(FST.INPUT_TYPE.BYTE4, fstOutput);
+FSTCompiler fstCompiler = fstCompilerBuilder.build();
 
 IntsRefBuilder scratch = new IntsRefBuilder();
 for (Map.Entry entry : input.entrySet()) {
   fstCompiler.add(Util.toUTF16(entry.getKey(), scratch), 
entry.getValue().longValue());
 }
-return fstCompiler.compile();
+
+return FST.fromFSTReader(fstCompiler.compile(), 
fstCompiler.getFSTReader());

Review Comment:
   This 
[change](https://github.com/apache/lucene/commit/63d4ba938fe0a4a82e615dd66e716684016cfc32)
 in Lucene mentions about getting an FST object from the FSTCompiler. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org