Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


jpountz commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914167175

   Thanks for looking into this, can you explain what kind of queries perform 
better with this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Propagate topLevelScoringClause from QueryProfiler [lucene]

2024-01-29 Thread via GitHub


jpountz merged PR #13031:
URL: https://github.com/apache/lucene/pull/13031


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use short-circuit boolean expressions [lucene]

2024-01-29 Thread via GitHub


jpountz commented on PR #13042:
URL: https://github.com/apache/lucene/pull/13042#issuecomment-1914176269

   Thanks, eagerly evaluating all conditions looks unintended indeed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use short-circuit boolean expressions [lucene]

2024-01-29 Thread via GitHub


jpountz merged PR #13042:
URL: https://github.com/apache/lucene/pull/13042


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make use of null-checked variable [lucene]

2024-01-29 Thread via GitHub


jpountz merged PR #13040:
URL: https://github.com/apache/lucene/pull/13040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? [lucene]

2024-01-29 Thread via GitHub


jpountz closed issue #13022: What if we pick up segments in segment size's 
ascending order in TieredMergePolicy.doFindMerges?
URL: https://github.com/apache/lucene/issues/13022


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? [lucene]

2024-01-29 Thread via GitHub


jpountz commented on issue #13022:
URL: https://github.com/apache/lucene/issues/13022#issuecomment-1914190216

   > but the main reason may be doStall takes longer with data growing
   
   This sounds plausible. FWIW I don't think that anyone actually performed 
testing on a modern HDD, values greater than 1 may perform better.
   
   If you don't mind, I'm closing this issue since it looks like we don't want 
to change the iteration order.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-29 Thread via GitHub


jfreden commented on PR #13036:
URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914220139

   Output from luceneutil.
   
   **The count tasks:** 
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
CountAndHighMed   87.21  (1.6%)   86.66  
(2.3%)   -0.6% (  -4% -3%) 0.313
   CountAndHighHigh   23.33  (1.5%)   23.42  
(1.3%)0.4% (  -2% -3%) 0.396
CountOrHighHigh   42.71  (1.8%)   42.93  
(1.8%)0.5% (  -3% -4%) 0.382
 CountOrHighMed   42.46  (1.5%)   42.74  
(1.8%)0.7% (  -2% -4%) 0.204
   ```
   **Full output:** 
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
 OrHighMedDayTaxoFacets5.77  (4.1%)5.60  
(6.7%)   -3.1% ( -13% -8%) 0.081
 IntNRQ   95.26 (11.4%)   92.46 
(13.5%)   -2.9% ( -24% -   24%) 0.456
  BrowseDayOfYearSSDVFacets7.05 (13.7%)6.89  
(5.5%)   -2.3% ( -18% -   19%) 0.486
LowIntervalsOrdered   10.54  (6.6%)   10.37  
(6.9%)   -1.7% ( -14% -   12%) 0.434
MedIntervalsOrdered   11.90 (10.3%)   11.71 
(11.9%)   -1.6% ( -21% -   22%) 0.639
   HighIntervalsOrdered   11.92  (9.1%)   11.76 
(10.5%)   -1.3% ( -19% -   20%) 0.667
BrowseRandomLabelSSDVFacets6.16  (8.6%)6.09  
(7.8%)   -1.2% ( -16% -   16%) 0.648
   MedTermDayTaxoFacets   11.06  (3.5%)   10.96  
(2.9%)   -0.8% (  -6% -5%) 0.403
CountAndHighMed   87.21  (1.6%)   86.66  
(2.3%)   -0.6% (  -4% -3%) 0.313
   PKLookup  290.76  (2.3%)  290.12  
(1.7%)   -0.2% (  -4% -3%) 0.727
   AndHighHighDayTaxoFacets   30.24  (1.6%)   30.21  
(1.6%)   -0.1% (  -3% -3%) 0.863
 HighPhrase   23.56  (3.8%)   23.54  
(4.9%)   -0.1% (  -8% -8%) 0.951
   HighSloppyPhrase8.59  (2.6%)8.59  
(3.0%)   -0.1% (  -5% -5%) 0.929
AndHighMedDayTaxoFacets   51.89  (1.8%)   51.86  
(2.1%)   -0.1% (  -3% -3%) 0.916
  HighTermTitleSort  269.41  (3.2%)  269.30  
(2.2%)   -0.0% (  -5% -5%) 0.962
   Wildcard  116.25  (2.3%)  116.22  
(2.3%)   -0.0% (  -4% -4%) 0.969
   HighSpanNear   12.27  (1.5%)   12.27  
(1.2%)   -0.0% (  -2% -2%) 0.953
MedSpanNear5.92  (2.0%)5.92  
(0.8%)   -0.0% (  -2% -2%) 0.997
AndHighHigh   97.01  (4.2%)   97.08  
(2.8%)0.1% (  -6% -7%) 0.945
MedSloppyPhrase7.86  (2.0%)7.86  
(2.0%)0.1% (  -3% -4%) 0.875
  HighTermDayOfYearSort  351.66  (2.5%)  352.13  
(2.2%)0.1% (  -4% -4%) 0.855
   OrHighNotMed  537.02  (4.9%)  538.06  
(4.4%)0.2% (  -8% -9%) 0.895
 AndHighMed  103.19  (4.0%)  103.42  
(3.1%)0.2% (  -6% -7%) 0.843
 TermDTSort  246.04  (4.3%)  246.58  
(3.6%)0.2% (  -7% -8%) 0.858
  BrowseMonthSSDVFacets6.80  (6.4%)6.82  
(6.3%)0.2% ( -11% -   13%) 0.903
Prefix3  274.04  (1.2%)  274.81  
(1.5%)0.3% (  -2% -3%) 0.524
  OrHighNotHigh  633.25  (3.8%)  635.21  
(3.0%)0.3% (  -6% -7%) 0.773
LowSloppyPhrase   33.19  (2.3%)   33.30  
(2.1%)0.3% (  -4% -4%) 0.651
   CountAndHighHigh   23.33  (1.5%)   23.42  
(1.3%)0.4% (  -2% -3%) 0.396
LowSpanNear   43.51  (1.3%)   43.72  
(1.0%)0.5% (  -1% -2%) 0.193
   OrHighNotLow  556.09  (4.7%)  558.71  
(4.7%)0.5% (  -8% -   10%) 0.749
CountOrHighHigh   42.71  (1.8%)   42.93  
(1.8%)0.5% (  -3% -4%) 0.382
 Fuzzy1  100.31  (1.2%)  100.83  
(1.2%)0.5% (  -1% -2%) 0.179
Respell   60.42  (2.2%)   60.74  
(1.9%)0.5% (  -3% -4%) 0.422
  MedPhrase  401.96  (2.6%)  404.29  
(3.2%)0.6% (  -5% -6%) 0.535
 OrHighHigh   33.17 

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


mrkm4ntr commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914318228

   For this case. Suppose ScorerA, B, and C can return valid maxScore. If the 
ScorerD is dominant, larger minCompetitiveScore is set to WANDScorer. But 
ConjunctionScorer returns Infinity as maxScore and scaledMaxScore becomes 0. 
Then WANDScorer cannot skip docs well. Is my understanding correct?
   ```
   DisjunctionMaxScorer(scorers=[
 WANDScorer(scorers=[ConjunctionScorer(ScorerA, ScorerB), ScorerC]),
 ScorerD
   ], tieBreakerMultiplier=0.0)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-29 Thread via GitHub


jpountz commented on PR #13036:
URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914327764

   This is a great speedup on `CountOrHighMed`! Too bad it's not faster all the 
time, though I'm not too surprised as conjunctions have more overhead than 
disjunctions when all clauses have a high cost.
   
   As a first step, maybe we can have a simple heuristic to only enable this 
optimization when it's almost guaranteed to yield a speedup? I'm not sure what 
makes the most sense, maybe a threshold on the minimum count across both 
clauses, and only enabling the optimization below this threshold. You'll 
probably need to play with various disjunctions to figure out a threshold that 
works.
   
   One inefficiency that your PR introduces is that it requires more lookups in 
terms dictionaries. We could avoid this by caching the `TermState` for each 
term query, which you could do in your 
`rewriteTwoClauseDisjunctionWithTermsForCount()` utility method: if the 
TermQuery has a null `TermQuery#getTermStates()`, then you could rewrite it to 
a `TermQuery` that has a non-null `TermStates` object.
   
   And maybe as a follow-up we could look again into the old idea of using 
bitsets to evaluate dense conjunctions, just like `BooleanScorer` does for 
disjunctions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


jpountz commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914332845

   OK, I see, it's about conjunctions within disjunctions. Thanks for 
explaining.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


jpountz commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914335779

   The change looks good to me, can you add a CHANGES entry?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


mrkm4ntr commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914355141

   Thanks. 9.10 as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-29 Thread via GitHub


jfreden commented on PR #13036:
URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914370159

   Thanks for the review! 
   
   Great ideas! I will work on adding a simple heuristic and caching the 
`TermState`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


jpountz commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914443248

   Yes please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


s1monw opened a new pull request, #13046:
URL: https://github.com/apache/lucene/pull/13046

   After working on #12829 I was quite surprised about the complexity our BWC 
have and how ancient the appear compared to the rest of our test suite. I took 
a step back and tired to modernize the tests to leverage RandomizedRunner 
Parameterized Tests that allow us to have more structured and hopefully more 
extendible BWC tests in the future. This change doesn't add any new tests but 
tries to make the ones we have more structured and support growth down the 
road. 
   Basically, every index type got it's own Test class that doesn't require to 
loop over all the indices in each test. Each test case is run with all versions 
specified. Several sanity checks are applied in the base class to make 
individual tests smaller and much easier to read. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


s1monw commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1469519342


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestDVUpdateBackwardsCompatibility.java:
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.document.StringField;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.NoMergePolicy;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.analysis.MockAnalyzer;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.Version;
+
+public class TestDVUpdateBackwardsCompatibility extends 
BackwardsCompatibilityTestBase {
+
+  static final String INDEX_NAME = "dvupdates";
+  static final String SUFFIX = "";
+
+  public TestDVUpdateBackwardsCompatibility(Version version, String pattern) {
+super(version, pattern);
+  }
+
+  @ParametersFactory(argumentFormatting = "Lucene-Version:%1$s; Pattern: %2$s")
+  public static Iterable testVersionsFactory() {
+List params = new ArrayList<>();
+// NOCOMMIT: why are we only testing one version here?

Review Comment:
   maybe somebody has an answer to that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


s1monw commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1469520535


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexUpgradeBackwardsCompatibility.java:
##
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexUpgrader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.MergePolicy;
+import org.apache.lucene.index.SegmentCommitInfo;
+import org.apache.lucene.index.SegmentInfos;
+import org.apache.lucene.store.ByteBuffersDirectory;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FSDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.NIOFSDirectory;
+import org.apache.lucene.tests.analysis.MockAnalyzer;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.InfoStream;
+import org.apache.lucene.util.Version;
+
+public class TestIndexUpgradeBackwardsCompatibility extends 
BackwardsCompatibilityTestBase {
+
+  public TestIndexUpgradeBackwardsCompatibility(Version version, String 
pattern) {
+super(version, pattern);
+  }
+
+  @ParametersFactory(argumentFormatting = "Lucene-Version:%1$s; Pattern: %2$s")
+  public static Iterable testVersionsFactory() throws 
IllegalAccessException {
+Iterable allSupportedVersions =
+allVersion(
+TestBasicBackwardsCompatibility.INDEX_NAME,
+TestBasicBackwardsCompatibility.SUFFIX_CFS,
+TestBasicBackwardsCompatibility.SUFFIX_NO_CFS);
+return allSupportedVersions;
+  }
+
+  /** Randomizes the use of some of hte constructor variations */
+  static IndexUpgrader newIndexUpgrader(Directory dir) {
+final boolean streamType = random().nextBoolean();
+final int choice = TestUtil.nextInt(random(), 0, 2);
+switch (choice) {
+  case 0:
+return new IndexUpgrader(dir);
+  case 1:
+return new IndexUpgrader(dir, streamType ? null : 
InfoStream.NO_OUTPUT, false);
+  case 2:
+return new IndexUpgrader(dir, newIndexWriterConfig(null), false);
+  default:
+fail("case statement didn't get updated when random bounds changed");
+}
+return null; // never get here
+  }
+
+  public void testUpgradeOldIndex() throws Exception {
+Directory dir = newDirectory(directory);
+int indexCreatedVersion = 
SegmentInfos.readLatestCommit(dir).getIndexCreatedVersionMajor();
+newIndexUpgrader(dir).upgrade();
+
+checkAllSegmentsUpgraded(dir, indexCreatedVersion);
+dir.close();
+  }
+
+  @Override
+  protected void createIndex(Directory directory) throws IOException {
+if (indexPattern.equals(
+createPattern(
+TestBasicBackwardsCompatibility.INDEX_NAME,
+TestBasicBackwardsCompatibility.SUFFIX_CFS))) {
+  TestBasicBackwardsCompatibility.createIndex(directory, true, false);
+} else {
+  TestBasicBackwardsCompatibility.createIndex(directory, false, false);
+}
+  }
+
+  public void testUpgradeOldSingleSegmentIndexWithAdditions() throws Exception 
{
+// NOCOMMIT we use to have single segment indices but we stopped creating 
them at some point

Review Comment:
   maybe somebody has an answer to that, this puzzles me as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubsc

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-01-29 Thread via GitHub


benwtrent commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1914618652

   I have done this test back in Lucene 9.4, and we still end up every once in 
a while a graph where the mean number of connections hovers around `1` and 
whose connectedness is very low. It may be that these particular docs are 
tightly clustered, but even then, it seems weird that it should end up being so 
bad.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] AES Encrypted Directory [LUCENE-2228] [lucene]

2024-01-29 Thread via GitHub


osnatShomrony commented on issue #3304:
URL: https://github.com/apache/lucene/issues/3304#issuecomment-1914637568

   With the new requirement of PCI 4.0 that disk encryption cannot be the only 
protection for data at rest, this contribution becomes very crucial, is there 
any progress with this ?
   https://www.vikingcloud.com/blog/pci-dss-v4-are-you-using-disk-encryption
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


mrkm4ntr commented on PR #13043:
URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914637603

   Added, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Contribution: Codec for index-level encryption [LUCENE-6966] [lucene]

2024-01-29 Thread via GitHub


osnatShomrony commented on issue #8023:
URL: https://github.com/apache/lucene/issues/8023#issuecomment-1914638272

   With the new requirement of PCI 4.0 that disk encryption cannot be the only 
protection for data at rest, this contribution becomes very crucial, is there 
any progress with this ?
   https://www.vikingcloud.com/blog/pci-dss-v4-are-you-using-disk-encryption


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Directory based approach for index encryption [LUCENE-9379] [lucene]

2024-01-29 Thread via GitHub


osnatShomrony commented on issue #10419:
URL: https://github.com/apache/lucene/issues/10419#issuecomment-1914640685

   With the new requirement of PCI 4.0 that disk encryption cannot be the only 
protection for data at rest, this contribution becomes very crucial, is there 
any progress with this ?
   https://www.vikingcloud.com/blog/pci-dss-v4-are-you-using-disk-encryption


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


jpountz commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1469520708


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBinaryBackwardsCompatibility.java:
##
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexCommit;
+import org.apache.lucene.index.IndexFormatTooOldException;
+import org.apache.lucene.index.SegmentInfos;
+import org.apache.lucene.index.StandardDirectoryReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.store.BaseDirectoryWrapper;
+import org.apache.lucene.util.Version;
+
+public class TestBinaryBackwardsCompatibility extends 
BackwardsCompatibilityTestBase {
+
+  static final int MIN_BINARY_SUPPORTED_MAJOR = Version.MIN_SUPPORTED_MAJOR - 
1;
+  static final String INDEX_NAME = "unsupported";
+  static final String SUFFIX_CFS = "-cfs";
+  static final String SUFFIX_NO_CFS = "-nocfs";
+
+  public TestBinaryBackwardsCompatibility(Version version, String pattern) {
+super(version, pattern);
+  }
+
+  @ParametersFactory(argumentFormatting = "Lucene-Version:%1$s; Pattern: %2$s")
+  public static Iterable testVersionsFactory() {
+List params = new ArrayList<>();
+for (Version version : BINARY_SUPPORTED_VERSIONS) {
+  params.add(new Object[] {version, createPattern(INDEX_NAME, 
SUFFIX_CFS)});
+  params.add(new Object[] {version, createPattern(INDEX_NAME, 
SUFFIX_NO_CFS)});
+}
+return params;
+  }
+
+  @Override
+  void verifyUsesDefaultCodec(Directory dir, String name) throws IOException {
+// don't this will fail since the indices are not supportd

Review Comment:
   ```suggestion
   // don't this will fail since the indices are not supported
   ```



##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestEmptyIndexBackwardsCompatibility.java:
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.NoMergePolicy;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.analysis.MockAnalyzer;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.Version;
+
+public class TestEmptyIndexBackwardsCompatibility extends 
BackwardsCompatibilityTestBase {
+  static final String INDEX_NAME = "empty";
+  static final String SUFFIX = "";
+
+  public TestEmptyIndexBackwardsCompatibility(Version version, String pattern) 
{
+super(version, pattern);
+  }
+
+  @Override
+  protected void createIndex(Directory directory) throws IOException {
+IndexWriterConfig conf =
+new IndexWriterConfig(new MockAnalyzer(random()))
+.setUseCompoundFile(false)
+.setCodec(TestUtil.getDefaultCodec())
+.setMergePolicy(NoMergePolicy.INSTANCE);
+try (IndexWriter writer = new IndexWriter(directory, conf)) {
+  wr

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub


jpountz merged PR #13043:
URL: https://github.com/apache/lucene/pull/13043


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


sabi0 opened a new pull request, #13048:
URL: https://github.com/apache/lucene/pull/13048

   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Use String.indexOf(char) where possible [lucene]

2024-01-29 Thread via GitHub


sabi0 opened a new pull request, #13049:
URL: https://github.com/apache/lucene/pull/13049

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub


sabi0 opened a new pull request, #13052:
URL: https://github.com/apache/lucene/pull/13052

   When list size is greater `list.contains()` will be called for each set 
element.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


dweiss commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1915011660

   I like it. Whenever there is test repetition that can be driven by data, it 
should be driven by data. 
   
   The only downside to using ParametersFactory is that it's something that is 
implemented by the randomizedtesting framework (not JUnit itself) and people 
may be unfamiliar with it - perhaps a comment on the class (or the 
parameterized constructor) pointing at where the arguments come from would make 
it easier to grasp how this thing works?
   
   Plain JUnit alternatives do exist but they rely on a different test runner 
(Parameterized) - you wouldn't be able to extend from LuceneTestCase anymore 
(and you'd be outside of the randomization context). I don't see any other, 
more elegant, solutions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


dweiss commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1915017690

   Oh, one more problem is that you can't "see" the structure of tests before 
you actually run them (in an IDE). Don't know how much of an issue this is in 
practice but it's impossible to solve with JUnit4. Some IDEs may also have 
problems repeating a particular test if its name contains parameters and their 
names. But I don't think we should think in terms of IDE support - the final 
outcome and benefit for maintainability matters more.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub


mikemccand commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1469886161


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexUpgradeBackwardsCompatibility.java:
##
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexUpgrader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.MergePolicy;
+import org.apache.lucene.index.SegmentCommitInfo;
+import org.apache.lucene.index.SegmentInfos;
+import org.apache.lucene.store.ByteBuffersDirectory;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FSDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.NIOFSDirectory;
+import org.apache.lucene.tests.analysis.MockAnalyzer;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.InfoStream;
+import org.apache.lucene.util.Version;
+
+public class TestIndexUpgradeBackwardsCompatibility extends 
BackwardsCompatibilityTestBase {
+
+  public TestIndexUpgradeBackwardsCompatibility(Version version, String 
pattern) {
+super(version, pattern);
+  }
+
+  @ParametersFactory(argumentFormatting = "Lucene-Version:%1$s; Pattern: %2$s")
+  public static Iterable testVersionsFactory() throws 
IllegalAccessException {
+Iterable allSupportedVersions =
+allVersion(
+TestBasicBackwardsCompatibility.INDEX_NAME,
+TestBasicBackwardsCompatibility.SUFFIX_CFS,
+TestBasicBackwardsCompatibility.SUFFIX_NO_CFS);
+return allSupportedVersions;
+  }
+
+  /** Randomizes the use of some of hte constructor variations */
+  static IndexUpgrader newIndexUpgrader(Directory dir) {
+final boolean streamType = random().nextBoolean();
+final int choice = TestUtil.nextInt(random(), 0, 2);
+switch (choice) {
+  case 0:
+return new IndexUpgrader(dir);
+  case 1:
+return new IndexUpgrader(dir, streamType ? null : 
InfoStream.NO_OUTPUT, false);
+  case 2:
+return new IndexUpgrader(dir, newIndexWriterConfig(null), false);
+  default:
+fail("case statement didn't get updated when random bounds changed");
+}
+return null; // never get here
+  }
+
+  public void testUpgradeOldIndex() throws Exception {
+Directory dir = newDirectory(directory);
+int indexCreatedVersion = 
SegmentInfos.readLatestCommit(dir).getIndexCreatedVersionMajor();
+newIndexUpgrader(dir).upgrade();
+
+checkAllSegmentsUpgraded(dir, indexCreatedVersion);
+dir.close();
+  }
+
+  @Override
+  protected void createIndex(Directory directory) throws IOException {
+if (indexPattern.equals(
+createPattern(
+TestBasicBackwardsCompatibility.INDEX_NAME,
+TestBasicBackwardsCompatibility.SUFFIX_CFS))) {
+  TestBasicBackwardsCompatibility.createIndex(directory, true, false);
+} else {
+  TestBasicBackwardsCompatibility.createIndex(directory, false, false);
+}
+  }
+
+  public void testUpgradeOldSingleSegmentIndexWithAdditions() throws Exception 
{
+// NOCOMMIT we use to have single segment indices but we stopped creating 
them at some point

Review Comment:
   I don't know why we stopped!  We should start again?  But maybe as followon?



##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexUpgradeBackwardsCompatibility.java:
##
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The AS

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


uschindler commented on code in PR #13048:
URL: https://github.com/apache/lucene/pull/13048#discussion_r1469932148


##
lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java:
##
@@ -185,7 +185,7 @@ static VectorizationProvider lookup(boolean testMode) {
*/
   private static Optional lookupVectorModule() {
 return 
Optional.ofNullable(VectorizationProvider.class.getModule().getLayer())
-.orElse(ModuleLayer.boot())

Review Comment:
   This change is not necessary. The boot layer is a static constant in system 
class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13048:
URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915203166

   For your cleanup pull requests to be merged, please make sure to add a 
description to your issue, why those changes are useful.
   
   I agree to merge this (any some of your other PRs), but before doing that we 
would like to have a changes entry including your full name to give you credit.
   
   Uwe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2024-01-29 Thread via GitHub


jmazanec15 commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1915207682

   @kevindrosendahl This is really cool! I had a couple questions around 
product quantization implementation. I see in 
   
[VectorSandboxVamanaVectorsWriter](https://github.com/kevindrosendahl/lucene/blob/vamana2/lucene/core/src/java/org/apache/lucene/codecs/vectorsandbox/VectorSandboxVamanaVectorsWriter.java#L295),
 you create a new codebook for each segment. Do you know roughly how long it 
took to generate?  Also, how did you choose 6 iterations of k-Means? It seems 
from the results they look pretty promising - but just curious how you arrived 
there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


sabi0 commented on code in PR #13048:
URL: https://github.com/apache/lucene/pull/13048#discussion_r1470080459


##
lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java:
##
@@ -185,7 +185,7 @@ static VectorizationProvider lookup(boolean testMode) {
*/
   private static Optional lookupVectorModule() {
 return 
Optional.ofNullable(VectorizationProvider.class.getModule().getLayer())
-.orElse(ModuleLayer.boot())

Review Comment:
   Thank you for the insightful comments.
   I will remove the unwanted changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


sabi0 commented on PR #13048:
URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915392997

   I've added my name to the GitHub profile. And will review the PRs to 
describe the reasoning.
   
   -Dmitry


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-29 Thread via GitHub


sabi0 commented on code in PR #13039:
URL: https://github.com/apache/lucene/pull/13039#discussion_r1470102508


##
lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/processors/AnalyzerQueryNodeProcessor.java:
##
@@ -107,10 +105,9 @@ public QueryNode process(QueryNode queryTree) throws 
QueryNodeException {
   @Override
   protected QueryNode postProcessNode(QueryNode node) throws 
QueryNodeException {
 
-if (node instanceof TextableQueryNode
+if (node instanceof FieldQueryNode

Review Comment:
   'node' is cast below to `FieldQueryNode`. But the condition here was 
checking for a base type `TextableQueryNode`, excluding some sub-types. If a 
new `TextableQueryNode` sub-type was added the code below could run into a CCE.
   
   Having the `instanceof FieldQueryNode` condition here makes the type cast 
below safe. And also eliminates the need to exclude the `RegexpQueryNode` as it 
does not subclass `FieldQueryNode`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Clean up AnyQueryNode code [lucene]

2024-01-29 Thread via GitHub


sabi0 opened a new pull request, #13053:
URL: https://github.com/apache/lucene/pull/13053

   - removed redundant field initializers
   - fixed a typo in the field name: `minimumMatching_m_Elements`
   - removed redundant `instanceof` checks
   - replaced type casts with a pattern variable
   - changed `size() == 0` to `isEmpty()`
   - removed unnecessary explicit `toString()` calls


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13048:
URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915623346

   It would be nice to have a changes entry in `CHANGES.txt`. You can add it to 
9.10 section and I will cherry pick the changes in Lucene 9.x.
   
   As you have more open PRs with cleanups, it would be good to have a generic 
one and we lost all your PR numbers there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove concatenation in String.format() calls [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13038:
URL: https://github.com/apache/lucene/pull/13038#issuecomment-1915649179

   Thanks for this. The code also has safety problems. If the concatted text 
para contain percent signs it would break. So format strings should never ever 
be constructed with variables containing inches ked strings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


sabi0 commented on PR #13048:
URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915656350

   I hope I understood your suggestions with CHANGES.txt right. Please let me 
know if I should change something there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13048:
URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915682882

   Looks fine. Have you added all PR numbers that were merged today and the 
ones I approved already?  I will merge them soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove concatenation in String.format() calls [lucene]

2024-01-29 Thread via GitHub


uschindler merged PR #13038:
URL: https://github.com/apache/lucene/pull/13038


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use String.replace() instead of replaceAll() [lucene]

2024-01-29 Thread via GitHub


uschindler merged PR #13047:
URL: https://github.com/apache/lucene/pull/13047


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use String.indexOf(char) where possible [lucene]

2024-01-29 Thread via GitHub


uschindler merged PR #13049:
URL: https://github.com/apache/lucene/pull/13049


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` [lucene]

2024-01-29 Thread via GitHub


uschindler merged PR #13051:
URL: https://github.com/apache/lucene/pull/13051


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use String.isEmpty() instead of equals("") [lucene]

2024-01-29 Thread via GitHub


uschindler merged PR #13050:
URL: https://github.com/apache/lucene/pull/13050


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub


uschindler merged PR #13048:
URL: https://github.com/apache/lucene/pull/13048


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13051:
URL: https://github.com/apache/lucene/pull/13051#issuecomment-1915761017

   When backporting this to Lucene 9.x, it confliced because the sets have 
different contents in older version. I fixed this.
   
   Nevertheless, the static final constants should be unmodifiable sets, so we 
should possibly use Java 9+ `Set.of()` instead of `EnumSet.of()`(which is 
modifiable) or wrap the `EnumSet` with `Collections.unmodifiableSet()`.
   
   Could you open an issue about this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13052:
URL: https://github.com/apache/lucene/pull/13052#issuecomment-1915775381

   I am not sure if I like this code. Unfortunately the spec of 
`Set#removeAll()` is a bit stupid and should only use `argument.contains()` if 
it is a set (so almost constant contains is possible).
   
   I'd like to ask to open a bug in OpenJDK so this gets fixed at some point.
   
   Did you make a benchmarking if this affects Lucene at all?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-29 Thread via GitHub


uschindler commented on code in PR #13039:
URL: https://github.com/apache/lucene/pull/13039#discussion_r1470387185


##
lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java:
##
@@ -62,8 +62,7 @@ public boolean equals(Object other) {
   return true;
 }
 
-if (other instanceof PayloadAttribute) {
-  PayloadAttributeImpl o = (PayloadAttributeImpl) other;
+if (other instanceof PayloadAttributeImpl o) {

Review Comment:
   This looks like an unrelated change. There is a PR about this type of code 
change already. It can only be applied to main branch as it requires Java 17.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-29 Thread via GitHub


uschindler commented on code in PR #13039:
URL: https://github.com/apache/lucene/pull/13039#discussion_r1470388288


##
lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java:
##
@@ -62,8 +62,7 @@ public boolean equals(Object other) {
   return true;
 }
 
-if (other instanceof PayloadAttribute) {
-  PayloadAttributeImpl o = (PayloadAttributeImpl) other;
+if (other instanceof PayloadAttributeImpl o) {

Review Comment:
   See this PR: #12295



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] clean up smoketester GPG leaks [lucene]

2024-01-29 Thread via GitHub


github-actions[bot] commented on PR #12882:
URL: https://github.com/apache/lucene/pull/12882#issuecomment-1915807209

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-01-29 Thread via GitHub


github-actions[bot] commented on PR #12831:
URL: https://github.com/apache/lucene/pull/12831#issuecomment-1915807375

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13052:
URL: https://github.com/apache/lucene/pull/13052#issuecomment-1915822328

   I think we should apply this PR. When merging, Lucene always has a larger 
set before the merge so the new set set is always smaller. So 
`AbstractSet#removeAll` will always use the slow path.
   
   We should open a bug report in JDK, that the optimization in `AbstractSet` 
should only be used if the collection passed as argument is also a set. In all 
other cases it is for sure faster to iterate over the collection passed as 
parameter and remove from actual set. The JDK impl has also some bugs with 
TreeSet as the contains of a TreeSet may not be symmetric with contains of a 
List.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub


uschindler commented on PR #13052:
URL: https://github.com/apache/lucene/pull/13052#issuecomment-1915825022

   Maybe put a comment here describing why foreach is faster, especially 
because merging will always produce smaller sets so always triggering slow path.
   
   @jpountz what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org