[GitHub] [lucene] rmuir commented on issue #12023: Mechanism to interrupt long-running/resource intensive queries

2022-12-15 Thread GitBox


rmuir commented on issue #12023:
URL: https://github.com/apache/lucene/issues/12023#issuecomment-1353025015

   determinization has already been removed here. that is the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir closed issue #12023: Mechanism to interrupt long-running/resource intensive queries

2022-12-15 Thread GitBox


rmuir closed issue #12023: Mechanism to interrupt long-running/resource 
intensive queries
URL: https://github.com/apache/lucene/issues/12023


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] Bukhtawar opened a new issue, #12023: Mechanism to interrupt long-running/resource intensive queries

2022-12-15 Thread GitBox


Bukhtawar opened a new issue, #12023:
URL: https://github.com/apache/lucene/issues/12023

   ### Description
   
   As a part of https://github.com/opensearch-project/OpenSearch/issues/687 we 
detected that regex queries can run into tight loops for quite long. Below is 
the stack trace of the request for a wildcard query which consumed 100% CPU for 
an hour(although addressed in 
https://issues.apache.org/jira/browse/LUCENE-9981). 
   OpenSearch has a mechanism to cancel task on timeout and one other option 
that was we were deliberating was to send interrupts to the long running thread 
if the request is consuming more Memory/CPU or time, independent of query 
timeout. 
   The problem is not all costly executions in Lucene have a check on 
interrupts much like ExitableDirectoryReader#checkAndThrow
   
   ```
   private void checkAndThrow() {
 if (queryTimeout.shouldExit()) {
   throw new ExitingReaderException("The request took too long to 
iterate over point values. Timeout: "
   + queryTimeout.toString()
   + ", PointValues=" + in
   );
 } else if (Thread.interrupted()) {
   throw new ExitingReaderException("Interrupted while iterating over 
point values. PointValues=" + in);
 }
   }
   ```
   
   Sharing the stack trace for the request
   ```
   100.2% (500.8ms out of 500ms) cpu usage by thread 
'opensearch[917451917ca5731579187db45dd52853][search][T#5]'
4/10 snapshots sharing following 36 elements
  
app//org.apache.lucene.util.automaton.Operations.determinize(Operations.java:780)
  
app//org.apache.lucene.util.automaton.Operations.getCommonSuffixBytesRef(Operations.java:1155)
  
app//org.apache.lucene.util.automaton.CompiledAutomaton.(CompiledAutomaton.java:245)
  
app//org.apache.lucene.search.AutomatonQuery.(AutomatonQuery.java:110)
  
app//org.apache.lucene.search.AutomatonQuery.(AutomatonQuery.java:87)
  
app//org.apache.lucene.search.AutomatonQuery.(AutomatonQuery.java:71)
  
app//org.apache.lucene.search.WildcardQuery.(WildcardQuery.java:56)
  
app//org.opensearch.index.mapper.StringFieldType.wildcardQuery(StringFieldType.java:158)
  
app//org.opensearch.index.query.WildcardQueryBuilder.doToQuery(WildcardQueryBuilder.java:259)
  
app//org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:116)
  
app//org.opensearch.index.query.BoolQueryBuilder.addBooleanClauses(BoolQueryBuilder.java:337)
  
app//org.opensearch.index.query.BoolQueryBuilder.doToQuery(BoolQueryBuilder.java:321)
  
app//org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:116)
  
app//org.opensearch.index.query.QueryShardContext.lambda$toQuery$3(QueryShardContext.java:386)
  
app//org.opensearch.index.query.QueryShardContext$$Lambda$5010/0x000801d4d840.apply(Unknown
 Source)
  
app//org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:398)
  
app//org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:385)
  
app//org.opensearch.search.SearchService.parseSource(SearchService.java:903)
  
app//org.opensearch.search.SearchService.createContext(SearchService.java:740)
  
app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:442)
  
app//org.opensearch.search.SearchService.access$500(SearchService.java:155)
  
app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:415)
  
app//org.opensearch.search.SearchService$2$$Lambda$4889/0x000801ce9840.get(Unknown
 Source)
  
app//org.opensearch.search.SearchService$$Lambda$4891/0x000801ce9c40.get(Unknown
 Source)
  
app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:71)
  
app//org.opensearch.action.ActionRunnable$$Lambda$3741/0x0008011fa440.accept(Unknown
 Source)
  
app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:86)
  
app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
  
app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
  
app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
  
app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
  
app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:774)
  
app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
  
java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  
java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  

[GitHub] [lucene] rmuir commented on issue #12021: Large fields with large="true" can be truncated in v9+

2022-12-15 Thread GitBox


rmuir commented on issue #12021:
URL: https://github.com/apache/lucene/issues/12021#issuecomment-1352977757

   This looks like a bug in solr code (SolrDocumentFetcher) so I'd recommend 
opening a bug over at https://github.com/apache/solr


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase commented on a diff in pull request #12022: Fix flat polygons incorrectly containing intersecting geometries

2022-12-15 Thread GitBox


iverase commented on code in PR #12022:
URL: https://github.com/apache/lucene/pull/12022#discussion_r1049521314


##
lucene/CHANGES.txt:
##
@@ -68,6 +68,8 @@ Bug Fixes
 * LUCENE-10599: LogMergePolicy is more likely to keep merging segments until
   they reach the maximum merge size. (Adrien Grand)
 
+* GITHUB#12020: Fixes bug whereby very flat polygons can incorrectly contain 
intersecting geometries. (Craig Taverner)
+

Review Comment:
   Could you move this entry to the lucene 9.5 section? I am planning in 
backporting it so it will make life easier



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase commented on a diff in pull request #12022: Fix flat polygons incorrectly containing intersecting geometries

2022-12-15 Thread GitBox


iverase commented on code in PR #12022:
URL: https://github.com/apache/lucene/pull/12022#discussion_r1049521314


##
lucene/CHANGES.txt:
##
@@ -68,6 +68,8 @@ Bug Fixes
 * LUCENE-10599: LogMergePolicy is more likely to keep merging segments until
   they reach the maximum merge size. (Adrien Grand)
 
+* GITHUB#12020: Fixes bug whereby very flat polygons can incorrectly contain 
intersecting geometries. (Craig Taverner)
+

Review Comment:
   Could you move this entry to Lucene 9.5? I am planning in backporting it so 
it will make life easier



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] craigtaverner opened a new pull request, #12022: Fix flat polygons incorrectly containing intersecting geometries

2022-12-15 Thread GitBox


craigtaverner opened a new pull request, #12022:
URL: https://github.com/apache/lucene/pull/12022

   Fixes https://github.com/apache/lucene/issues/12020
   
   ### Description
   
   When performing a search using a shape geometry query of relation type 
`QueryRelation.CONTAINS`, it is possible to get a false positive when two 
geometries intersect, but neither actually contains the other. This happens if 
the indexed geometry is a polygon that is so flat that one of its triangles is 
simplified to a single line segment. The bug is that the line segment currently 
records whether it is part of the external polygon by taking that knowledge 
from first line segment of the triangle, not necessarily the part of the 
triangle being retained. This first line segment could be an internal line 
(part of the triangle, but not the polygon). The simplification code should 
instead take this knowledge from a line segment that is not being collapsed by 
the simplification. The consequence of this bug occur later during the contains 
search when the search query deduces that the geometry is contained because the 
polygon is not closed. The search does not realise it intersects an outer
  line of the polygon because that line is not correctly marked as outer.
   
   The fix is to select the correct flag from the surviving line segments 
instead of the first line segment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] nosvalds opened a new issue, #12021: Large fields with large="true" can be truncated in v9+

2022-12-15 Thread GitBox


nosvalds opened a new issue, #12021:
URL: https://github.com/apache/lucene/issues/12021

   ### Description
   
   ## Issue
   
   For fields using `large="true"`, large fields (which is what they are 
intended for) can be truncated in v9+ of Lucene.
   
   Example fieldtype definition:
   ```
   
   ```
   
   ## Cause
   Looks like this is a bug introduced along with 
[LUCENE-8805](https://issues.apache.org/jira/browse/LUCENE-8805) / 
https://github.com/apache/lucene/issues/9849:
   
   
[https://github.com/apache/lucene/blob/5a694ea26ff862ecc874ca798135073d300c2234/sol[…]r/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java](https://github.com/apache/lucene/blob/5a694ea26ff862ecc874ca798135073d300c2234/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L462-L465)
   
   Specifically with respect to "large" fields handling.
   
   The length in utf8 bytes will often be longer than the string length 
`value.length()`, hence the truncation.
   
   ## Fix
   
   The Fix would be:
   
   `bytesRef.length = bytesRef.bytes.length`
   
   ### Version and environment details
   
   - Solr v9.1.0
   - Lucene v9.3.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] craigtaverner opened a new issue, #12020: Very flat polygons give incorrect 'contains' result

2022-12-15 Thread GitBox


craigtaverner opened a new issue, #12020:
URL: https://github.com/apache/lucene/issues/12020

   ### Description
   
   When performing a search using a shape geometry query of relation type 
`QueryRelation.CONTAINS`, it is possible to get a false positive when two 
geometries intersect, but neither actually contains the other. This happens if 
the indexed geometry is a polygon that is so flat that one of its triangles is 
simplified to a single line segment. The bug is that the line segment currently 
records whether it is part of the external polygon by taking that knowledge 
from first line segment of the triangle, not necessarily the part of the 
triangle being retained. This first line segment could be an internal line 
(part of the triangle, but not the polygon). The simplification code should 
instead take this knowledge from a line segment that is not being collapsed by 
the simplification. The consequence of this bug occur later during the contains 
search when the search query deduces that the geometry is contained because the 
polygon is not closed. The search does not realise it intersects an outer
  line of the polygon because that line is not correctly marked as outer.
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-15 Thread GitBox


LuXugang commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162986


##
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() 
throws Exception {
 }
   }
 
+  public void testAggressiveMatchCount() throws IOException {

Review Comment:
   addressed in 
https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369
 and 
https://github.com/apache/lucene/pull/12017/commits/ae3f8d67ed75724ac2662c24025c85ae7008612a



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox


LuXugang commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1049163040


##
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() 
throws Exception {
 }
   }
 
+  public void testAggressiveMatchCount() throws IOException {
+Directory dir = newDirectory();
+IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
+Document doc = new Document();
+LongPoint longPoint = new LongPoint("long", 3L, 4L, 5L);
+doc.add(longPoint);
+StringField stringField = new StringField("string", "abc", Store.NO);
+doc.add(stringField);
+writer.addDocument(doc);
+longPoint.setLongValues(10L, 11L, 12L);
+stringField.setStringValue("xyz");
+writer.addDocument(doc);
+
+IndexReader reader = DirectoryReader.open(writer);
+writer.close();
+IndexSearcher searcher = new IndexSearcher(reader);
+
+long[] lower = new long[] {4L, 5L, 6L};
+long[] upper = new long[] {9L, 10L, 11L};
+Query unknownCountQuery = LongPoint.newRangeQuery("long", lower, upper);
+assert reader.leaves().size() == 1;
+assert searcher
+.createWeight(unknownCountQuery, ScoreMode.COMPLETE, 1f)
+.count(reader.leaves().get(0))
+== -1;
+
+Query query =
+new BooleanQuery.Builder()
+.add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+.add(unknownCountQuery, Occur.MUST_NOT)
+.add(new MatchAllDocsQuery(), Occur.MUST_NOT)
+.build();
+Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+// count of the first MUST_NOT clause is unknown, but the second MUST_NOT 
clause matches all
+// docs
+assertEquals(0, weight.count(reader.leaves().get(0)));
+
+query =
+new BooleanQuery.Builder()
+.add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+.add(unknownCountQuery, Occur.MUST_NOT)
+.add(new TermQuery(new Term("string", "abc")), Occur.MUST_NOT)
+.build();
+weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+// count of the first MUST_NOT clause is unknown, though the second 
MUST_NOT clause matche one
+// doc, we can't figure out the number of
+// docs
+assertEquals(-1, weight.count(reader.leaves().get(0)));
+
+// test pure disConjunction

Review Comment:
   addressed in 
https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox


LuXugang commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162892


##
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##
@@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws 
IOException {
   private int optCount(LeafReaderContext context, Occur occur) throws 
IOException {
 final int numDocs = context.reader().numDocs();
 int optCount = 0;
+boolean unknownCount = false;
 for (WeightedBooleanClause weightedClause : weightedClauses) {
   if (weightedClause.clause.getOccur() != occur) {
 continue;
   }
   int count = weightedClause.weight.count(context);
-  if (count == -1 || count == numDocs) {
-// If any of the clauses has a number of matches that is unknown, the 
number of matches of
-// the disjunction is unknown.
+  if (count == -1) {
+// If one clause has a number of matches that is unknown, let's be 
more aggressive to check
+// whether remain clauses could match all docs.
+unknownCount = true;
+continue;

Review Comment:
   addressed in  
https://github.com/apache/lucene/pull/12017/commits/dc13ae7bddc3ca272af7c45dd998258984469904



##
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##
@@ -492,7 +497,10 @@ private int optCount(LeafReaderContext context, Occur 
occur) throws IOException
 return -1;

Review Comment:
   Thanks @jpountz , you are right. addressed in  
https://github.com/apache/lucene/pull/12017/commits/477b2a4e1f95f408562fc1745979bf9815ffaa75



##
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() 
throws Exception {
 }
   }
 
+  public void testAggressiveMatchCount() throws IOException {

Review Comment:
   addressed in 
https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352328979

   > I will check it out and inspect the coverage report from .js tests and see 
if there are any holes. If i find them I will push more tests. I am just really 
paranoid about some of the slow things antlr4 will do, having fought thru these 
issues with @jdconrad once before.
   
   I am a bit afraid because those blobs with DFAs growed quite a lot in the 
patch. So yes, let's review how it behaves.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352322297

   I will check it out and inspect the coverage report from .js tests and see 
if there are any holes. If i find them I will push more tests. I am just really 
paranoid about some of the slow things antlr4 will do, having fought thru these 
issues with @jdconrad once before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049026102


##
lucene/expressions/src/java/module-info.java:
##
@@ -19,7 +19,7 @@
 module org.apache.lucene.expressions {

Review Comment:
   Ok. No problem. I just noticed name change. I think it's is just a named 
automodule with a manifest entry.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


reta commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352291207

   > sorry i didnt do the other tests, I gotta run for now. i can do them later 
tonight or tomorrow if you need but just wanted to prototype it out
   
   @rmuir tests have been migrated, thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


reta commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049019777


##
lucene/expressions/src/java/module-info.java:
##
@@ -19,7 +19,7 @@
 module org.apache.lucene.expressions {

Review Comment:
   ah ... it is still automatic :( although changed ...
   
   ```
   module-info.java:21: warning: requires directive for an automatic module
 requires org.antlr.antlr4.runtime;
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049016403


##
lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.expressions.js;
+
+/** Settings for expression compiler for javascript expressions. */
+final class JavascriptCompilerSettings {

Review Comment:
   I agree with Uwe, let's move to a boolean (actually only need one 
package-private ctor with the boolean, the way the new base CompilerTestCase 
class works). 
   
   If we have more options to this thing later, we can deal with the issue at 
that time. It is all package-private anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049015663


##
lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.expressions.js;
+
+/** Settings for expression compiler for javascript expressions. */
+final class JavascriptCompilerSettings {

Review Comment:
   Personally I hate mutable instances with getters and setters. 郎



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


reta commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049014391


##
lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.expressions.js;
+
+/** Settings for expression compiler for javascript expressions. */
+final class JavascriptCompilerSettings {

Review Comment:
   No, not really, but it could be useful to add more settings later on 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049010145


##
lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.expressions.js;
+
+/** Settings for expression compiler for javascript expressions. */
+final class JavascriptCompilerSettings {

Review Comment:
   Do we really need this additional class only holding one boolean? I think a 
simple additional PKG private compile method taking the boolean picky would 
have same effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on code in PR #12016:
URL: https://github.com/apache/lucene/pull/12016#discussion_r1049005864


##
lucene/expressions/src/java/module-info.java:
##
@@ -19,7 +19,7 @@
 module org.apache.lucene.expressions {

Review Comment:
   Can we now remove this suppression? I think all modules here are now real 
ones.



##
lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.expressions.js;
+
+/** Settings for expression compiler for javascript expressions. */
+final class JavascriptCompilerSettings {

Review Comment:
   Do we really need this additional class only holding one boolean? I think a 
simple additional PKG private ctor would have same effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352259233

   sorry i didnt do the other tests, I gotta run for now. i can do them later 
tonight or tomorrow if you need but just wanted to prototype it out


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


reta commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352258413

   > I pushed a proposal to your branch (only fixing one of the tests in .js to 
use it). If you are really against it, just revert the commit. But i think it 
keeps tests simpler.
   
   :+1: No objections, thanks a lot for helping out @rmuir , it keeps tests 
simpler indeed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352248407

   I pushed a proposal to your branch (only fixing one of the tests in .js to 
use it). If you are really against it, just revert the commit. But i think it 
keeps tests simpler.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352177919

   we don't need to parameterize the pickiness IMO, we can just turn it on in 
these tests. Thanks for getting this hooked up. I will look at your branch and 
try to play with it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


reta commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352084948

   @rmuir first pass over pickiness, all tests are now run w/o diagnostics 
listener (picky / not picky mode), the rough observations so far are 
encouraging, there are no significant changes in the test duration between 
these two modes (some random timings below):
   
   
![image](https://user-images.githubusercontent.com/509855/207701868-7a252ef2-20c4-4169-979c-4bb5c1f08f4e.png)
   
   I am moving further to compare the numbers against `main` and to conduct 
more precise measurements.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent opened a new pull request, #12019: Clean up vector backward-codecs

2022-12-14 Thread GitBox


benwtrent opened a new pull request, #12019:
URL: https://github.com/apache/lucene/pull/12019

   I noticed that the Lucene94 backward-codec was still allowing KNN fields 
writer. I have removed it, and updated the related tests similarly to other KNN 
backward-codec changes.
   
   There are various other house-keeping things that I noticed in the 
backward_codecs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox


benwtrent commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048856364


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
* @throws IllegalArgumentException if k is less than 1
*/
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the k nearest documents to the target vector according 
to the vectors in the
+   * given field. target vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   @msokolov you haven't missed anything. I am specifically talking about users 
providing `similarityThreshold` to the query. If they have calculating that 
they want a specific `cosine` or `dotProduct` similarity, they would then need 
to adjust that to match Lucene's scoring transformation.
   
   I think that `similarityThreshold` should mean vector similarities. We can 
transform it for the user to reflect the score that similarity represents 
(given vector encoding type and similarity function).
   
   
   An example here is `dotProduct`. The user knows they want `FLOAT32` vectors 
within a dotProduct of 0.7. With this API that ACTUALLY means they want to 
limit the scores to .85 (`(1 + dotProduct)/2`). How is the user supposed to 
know that?
   
   This seems really weird to me.
   
   This doesn't take into account the different scoring methods between vector 
types as well, which can get even more confusing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox


msokolov commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048846385


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
* @throws IllegalArgumentException if k is less than 1
*/
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the k nearest documents to the target vector according 
to the vectors in the
+   * given field. target vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   Well, the scores we are talking about here are at least always in [0, 1]. 
I'm not sure what you mean by the actual similarity of vectors. We used to have 
a two-step process where we would compute the similarity and then convert to a 
query score, but I think it's unified today and they are the same? Aren't the 
scores being thresholded here the output of VectorSimilarityFunction.compare? I 
may have missed something along the way?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API

2022-12-14 Thread GitBox


rmuir commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351946133

   I would tend to expect some organization like this:
   
   indexing:
   * ByteVectorField
   * FloatVectorField
   
   searching:
   * ByteVectorField.newQuery
   * FloatVectorField.newQuery
   
   codec api:
   * KnnVectorsReader.getByteVectors(String field)
   * KnnVectorsReader.getFloatVectors(String field)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on issue #11963: Improve vector quantization API

2022-12-14 Thread GitBox


benwtrent commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351900658

   OK, step one of this is done with the `KnnByteVectorQuery`. I am next 
approaching a new `KnnByteVectorField` and this will cause some refactoring to 
`LeafReader` or `VectorValues`.
   
   Right now, `VectorValues` assumes all vector values are `float[]` and 
"expands" the bytes, needlessly.
   
   There are many ways to approach this and I am honestly not sure the way to 
go here (being unfamiliar with typical API patterns).
   
   It seems there are a handful of seemingly valid options:
   
* 1. Add a method to `VectorValues` (which is returned via `LeafReader`) 
that explicitly returns `byteVectorValue()` and will throw if `vectorValue()` 
is called when the encoding is `BYTE` (and throw on `byteVectorValue()` if 
encoding is `FLOAT32`).
   * This doesn't seem like the best to me. Though we already require users 
to choose the correct methods based on the vector encoding.
* 2. Add a new `AbstractVectorValues` where `vectorValue()` returns `T`. 
`LeafReader` would be changed to return that instead of `VectorValues`. 
   * There don't seem to be precedent on using generics like this anywhere, 
and this breaks with how we handle DocValues. The main difference with vectors 
and doc values is that numeric doc values are all returned as Long, but really 
could be any number whose bytes FIT in a long...
* 3. Add a new `LeafReader#getByteVectorValues()` that returns a 
`ByteVectorValues` object. The main downside here is that we are not making it 
flexible for adding new vector kinds in the future. We will want to add boolean 
vectors & hamming distance in the future. These are important for image search.
* 4. Simply make `VectorValues` always return `BytesRef` for everything and 
provide similarity that knows the encoding and can decode the `float[]` if that 
is the encoding. This feels closer to what we do with doc values (taking 
`DoubleDocValuesField` & `DoubleValuesSource` as prior art with how that 
interaction is done with `NumericDocValues`).
   
   
   @jpountz @rmuir do y'all have opinions here? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] AlmostFamiliar commented on issue #9231: HyphenationCompoundWordTokenFilter creates overlapping tokens with onlyLongestMatch enabled [LUCENE-8183]

2022-12-14 Thread GitBox


AlmostFamiliar commented on issue #9231:
URL: https://github.com/apache/lucene/issues/9231#issuecomment-1351830837

   Are there still plans to merge this? 
   
   I would also be very interested in this feature. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox


benwtrent commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048764283


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
* @throws IllegalArgumentException if k is less than 1
*/
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the k nearest documents to the target vector according 
to the vectors in the
+   * given field. target vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   As it is, callers of this method need to know the inner nuances of how we 
calculate the score given the similarity. I would prefer them not having to 
know that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox


benwtrent commented on code in PR #11946:
URL: https://github.com/apache/lucene/pull/11946#discussion_r1048763428


##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) {
* @throws IllegalArgumentException if k is less than 1
*/
   public KnnVectorQuery(String field, float[] target, int k, Query filter) {
+this(field, target, k, Float.NEGATIVE_INFINITY, filter);
+  }
+
+  /**
+   * Find the k nearest documents to the target vector according 
to the vectors in the
+   * given field. target vector.
+   *
+   * @param field a field that has been indexed as a {@link KnnVectorField}.
+   * @param target the target of the search
+   * @param k the number of documents to find (the upper bound)
+   * @param similarityThreshold the minimum acceptable value of similarity

Review Comment:
   So, right now, `similarityThreshold` is really `scoreThreshold`. I would 
prefer it to be the actual similarity of the vectors and NOT how we translate 
it to scores (which for ByteVectors has some surprising behavior). 
   
   @msokolov What do you think here? 
   
   Its already tricky that "similarity implies score". But truly, similarity != 
score.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


jdconrad commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351772233

   > javacc?
   
   I was under the impression this was EOL? If it's still well supported I'm 
not familiar enough with the code generation to know if this would avoid the 
pitfalls ANTLR has.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw

2022-12-14 Thread GitBox


msokolov commented on PR #11946:
URL: https://github.com/apache/lucene/pull/11946#issuecomment-1351747886

   It seems there are conflicts due to a recent refactor of this query - would 
you mind merging from main and resolving those please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351743460

   > One other possibility would be to hand-roll the expression lexer/parser. 
This would get rid of the need for any additional dependencies and generated 
code. From what I can tell the API has been stable for a long time, so I don't 
think there would be a problem with maintenance. This would have the added 
benefit of likely improved performance as ANTLR isn't the fastest, and 
potentially better error messages.
   
   javacc?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox


uschindler commented on PR #12014:
URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351734347

   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox


rmuir commented on PR #12014:
URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351680479

   I looked at what e.g. glibc does here as a fallback out of curiousity, for 
floats it is very simple (using Dekker algorithm), but requires changing the FP 
rounding mode, which you cant do in java. For doubles it is more complicated 
but still no bigdecimal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


jdconrad commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351666949

   One other possibility would be to hand-roll the expression lexer/parser. 
This would get rid of the need for any additional dependencies and generated 
code. From what I can tell the API has been stable for a long time, so I don't 
think there would be a problem with maintenance. This would have the added 
benefit of likely improved performance as ANTLR isn't the fastest, and 
potentially better error messages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox


dweiss commented on PR #12014:
URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351660443

   I honestly don't know who can use this method without any provided cpuid 
check... We actually use fma in our code but do so by detecting the performance 
difference between a naive implementation on primitive types and Math.fma 
(during bootstrap). It's ugly like hell but the difference is so vast that it 
works. I'm not sure who'd even gain from using the bigdecimal-based 
implementation...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-14 Thread GitBox


benwtrent commented on PR #12014:
URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351560532

   Holy crap, creating `BigDecimal` and then multiplying & adding is crazy. 
This is a completely unacceptable fallback calculation for this method.
   
   +1 on banning its use in the code base.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)

2022-12-14 Thread GitBox


jpountz merged PR #12018:
URL: https://github.com/apache/lucene/pull/12018


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)

2022-12-14 Thread GitBox


benwtrent commented on PR #12018:
URL: https://github.com/apache/lucene/pull/12018#issuecomment-1351392230

   @jpountz Here is the backport.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351378582

   another way to say it: shading is a terrible, TERRIBLE idea and you only 
hear about it in java, because java is the only language with developers that 
are bad enough to consider it.
   
   Please, let's not talk about it ever again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351374308

   I feel that if we publish a shaded jar we become guilty of "contributing to 
the delinquency" of terrible supply chain management, by hiding third party 
dependencies and their versions from view. It causes problems for checkers and 
even humans if there were ever some issue with one of the dependencies (example 
log4j).  They might miss it entirely and get hacked, as an example. Then they 
will be mad at us for shading in the first place, even though its arguably 
their fault because they used a shaded jar. 
   
   Let's just not hand them the gun.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351368145

   > > Can we specify our dependencies in a different way (e.g. exact version) 
in our maven stuff so this won't happen? e.g. you can do this in python, and 
specify that you depend on antlr == x.y.z rather than just depend on antlr.
   > 
   > You can - I think what Uwe is describing is a problem for downstream 
projects where Lucene has antlr x.y.z and some other dependency has antlr a.b.c 
- then namespaces clash and the conflict is not easily resolved. Classloader 
separation is possible, of course, but it's hardly an easy alternative. :)
   > 
   > I personally don't mind shading artifacts but I do agree they are a 
pain... even tracking down which version a project is actually using is a 
problem then (because shaded artifacts don't manifest their versions as clearly 
as a maven dependency). Corporate environments will hate them for legal reasons 
(for reasons Rob mentioned).
   
   Hi,
   as said before this was just a suggestion from my experience with 
forbiddenapis with some artifacts like "ASM" and "ANTLR". They are always a 
pain. From the secruity perspective there are problems, but you can also see it 
like "code copied" - we can of course also do this, but that's a lot more 
hassle.
   
   What you can always do: Offser a shaded version without those 2 dependencies 
as a seüparate artifact. Often seen on maven as "uber" or "shaded" behind 
version number. If you want to use shaded version, you know consequences.
   
   Setting exact version in Maven POMs is not possible, unfortunately. Maven 
has some tricks (it wont silently upgrade across major versions, but bugfix 
releases are automatically applied). I don't know exacty how this is handles by 
Maven resolver.
   
   My idea would be: publish "lucene-expressions-shaded.x.y.z" in addition.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351366346

   Yes, java showed what a disaster it is around supply chains with the log4j 
vulnerability. Shading should not even be considered as an option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent opened a new pull request, #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)

2022-12-14 Thread GitBox


benwtrent opened a new pull request, #12018:
URL: https://github.com/apache/lucene/pull/12018

   Backport of #12004 
   
   This is the first commit of a much larger refactor. The overall goal is to 
separate the concerns of byte vectors and float vectors. Making their usage and 
APIs clearer for users. 
   
   This first step adds a new `KnnByteVectorQuery` and only allows it to be 
used against fields that have the `BYTE` encoding.
   
   Additionally, the original `KnnVectorQuery` can only be used against fields 
that have the `FLOAT32` encoding.
   
   this partially addresses: https://github.com/apache/lucene/issues/11963


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


dweiss commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351355396

   > Can we specify our dependencies in a different way (e.g. exact version) in 
our maven stuff so this won't happen? e.g. you can do this in python, and 
specify that you depend on antlr == x.y.z rather than just depend on antlr.
   
   You can - I think what Uwe is describing is a problem for downstream 
projects where Lucene has antlr x.y.z and some other dependency has antlr a.b.c 
- then namespaces clash and the conflict is not easily resolved. Classloader 
separation is possible, of course, but it's hardly an easy alternative. :)
   
   I personally don't mind shading artifacts but I do agree they are a pain... 
even tracking down which version a project is actually using is a problem then 
(because shaded artifacts don't manifest their versions as clearly as a maven 
dependency). Corporate environments will hate them for legal reasons (for 
reasons Rob mentioned).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351345609

   I also dont understand the issue where ppl think they can modify arbitrary 
versions of lucene dependencies.
   
   Can we specify our dependencies in a different way (e.g. exact version) in 
our maven stuff so this won't happen? e.g. you can do this in python, and 
specify that you depend on `antlr == x.y.z` rather than just depend on `antlr`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351332547

   personally i am against the shading. I think it is a huge antipattern, it 
hides third-party artifacts completely. think about someone trying to do 
security or license audit and they have "secret dependencies" hidden from view, 
as an example.
   
   I don't understand the issue myself. if you want to use different antlr 
version in two places just use two classloaders. thats what we did in 
elastic/opensearch.
   
   but like i said, i'm not opposed to upgrading IF concerns about the new 
version are taken care of.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox


benwtrent commented on PR #12004:
URL: https://github.com/apache/lucene/pull/12004#issuecomment-1351211685

   @uschindler 100%! I can get the backport PR done asap


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox


jpountz commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1048383541


##
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##
@@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws 
IOException {
   private int optCount(LeafReaderContext context, Occur occur) throws 
IOException {
 final int numDocs = context.reader().numDocs();
 int optCount = 0;
+boolean unknownCount = false;
 for (WeightedBooleanClause weightedClause : weightedClauses) {
   if (weightedClause.clause.getOccur() != occur) {
 continue;
   }
   int count = weightedClause.weight.count(context);
-  if (count == -1 || count == numDocs) {
-// If any of the clauses has a number of matches that is unknown, the 
number of matches of
-// the disjunction is unknown.
+  if (count == -1) {
+// If one clause has a number of matches that is unknown, let's be 
more aggressive to check
+// whether remain clauses could match all docs.
+unknownCount = true;
+continue;

Review Comment:
   use an `else if` instead of `continue` for consistency with how other cases 
are handled?



##
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##
@@ -492,7 +497,10 @@ private int optCount(LeafReaderContext context, Occur 
occur) throws IOException
 return -1;

Review Comment:
   Maybe this line should also set `unkwownCount = true` instead of returning 
`-1` right away?



##
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() 
throws Exception {
 }
   }
 
+  public void testAggressiveMatchCount() throws IOException {

Review Comment:
   Fold these tests into the existing `testDisjunctionMatchesCount` test?



##
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() 
throws Exception {
 }
   }
 
+  public void testAggressiveMatchCount() throws IOException {
+Directory dir = newDirectory();
+IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
+Document doc = new Document();
+LongPoint longPoint = new LongPoint("long", 3L, 4L, 5L);
+doc.add(longPoint);
+StringField stringField = new StringField("string", "abc", Store.NO);
+doc.add(stringField);
+writer.addDocument(doc);
+longPoint.setLongValues(10L, 11L, 12L);
+stringField.setStringValue("xyz");
+writer.addDocument(doc);
+
+IndexReader reader = DirectoryReader.open(writer);
+writer.close();
+IndexSearcher searcher = new IndexSearcher(reader);
+
+long[] lower = new long[] {4L, 5L, 6L};
+long[] upper = new long[] {9L, 10L, 11L};
+Query unknownCountQuery = LongPoint.newRangeQuery("long", lower, upper);
+assert reader.leaves().size() == 1;
+assert searcher
+.createWeight(unknownCountQuery, ScoreMode.COMPLETE, 1f)
+.count(reader.leaves().get(0))
+== -1;
+
+Query query =
+new BooleanQuery.Builder()
+.add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+.add(unknownCountQuery, Occur.MUST_NOT)
+.add(new MatchAllDocsQuery(), Occur.MUST_NOT)
+.build();
+Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+// count of the first MUST_NOT clause is unknown, but the second MUST_NOT 
clause matches all
+// docs
+assertEquals(0, weight.count(reader.leaves().get(0)));
+
+query =
+new BooleanQuery.Builder()
+.add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+.add(unknownCountQuery, Occur.MUST_NOT)
+.add(new TermQuery(new Term("string", "abc")), Occur.MUST_NOT)
+.build();
+weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+// count of the first MUST_NOT clause is unknown, though the second 
MUST_NOT clause matche one
+// doc, we can't figure out the number of
+// docs
+assertEquals(-1, weight.count(reader.leaves().get(0)));
+
+// test pure disConjunction

Review Comment:
   ```suggestion
   // test pure disjunction
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


dweiss commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351155914

   > With ASM this was hard to do but with Grade it is quite easy. Just add 
another configuration with those two dependencies and use the "Gradle Shadow 
Plugin" (https://imperceptiblethoughts.com/shadow/) to transform their package 
names into the Lucene namespace.
   
   Technically - easy to do. Question is whether we want to do it (I don't see 
any problems here).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang opened a new pull request, #12017: Aggressive `count` in BooleanWeight

2022-12-14 Thread GitBox


LuXugang opened a new pull request, #12017:
URL: https://github.com/apache/lucene/pull/12017

   When BooleanQuery is pure disjunction, if at least one clause could match 
all docs, then we could get right `count`  even though there was other clause 
whose `count` is  unknown. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] fcofdez commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-14 Thread GitBox


fcofdez commented on PR #11997:
URL: https://github.com/apache/lucene/pull/11997#issuecomment-1351000741

   @jpountz thanks for looking into this, unfortunately I didn't have time to 
go through the last round of review comments. The change makes sense to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField

2022-12-14 Thread GitBox


jpountz commented on PR #11997:
URL: https://github.com/apache/lucene/pull/11997#issuecomment-1350964750

   @fcofdez I hope you don't mind but I checked out your branch to look into 
the above issue and ended up pushing my changes. These new fields not only 
support indexing sorted numeric doc values, so that queries do not have to 
specify whether floats/doubles are indexed via their raw bits or as sortable 
ints/longs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox


uschindler commented on PR #12004:
URL: https://github.com/apache/lucene/pull/12004#issuecomment-1350722742

   > @benwtrent Would you mind helping with the backport by creating a PR? 
There are a few conflicts and I could use some help with resolving them.
   
   There are also Java 17 switch expressions instead of switch statements 
inside that need to be changed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox


jpountz commented on PR #12004:
URL: https://github.com/apache/lucene/pull/12004#issuecomment-1350702865

   @benwtrent Would you mind helping with the backport by creating a PR? There 
are a few conflicts and I could use some help with resolving them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries

2022-12-14 Thread GitBox


iverase commented on PR #12006:
URL: https://github.com/apache/lucene/pull/12006#issuecomment-1350678308

   > If it gives a good speedup, we could think about applying the same trick 
to other queries too, if we can do it without making things too ugly. For 
example LatLonPoint, IntPoint, etc currently all use byte[] comparison for 
their range/bounding-box queries.
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-14 Thread GitBox


uschindler commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350666129

   Hi, I agree with all Robert say. I would also like to make another 
suggestion (that could also be applied when this has been fixed). To me it 
looks like a bad decission of antlr, that the tool compiles code and creates 
some Java/Class files but at runtime it also needs some dependency JAR in 
exactly the correct version. This is a problem for a library like lucene that 
gets included into other projects. The same also applies for ASM, although this 
is no longer a big problem anymore, because ASM's APIs are now very stable and 
it does not matter if a project uses ASM 8 or ASM 9 if minimum requirements are 
guaranteed (so the user has more flexibility).
   
   In contrast javacc does not have the problem, because javacc or jflex 
generates all of the code and you don't need any runtime library to execute and 
access the AST.
   
   In Lucene both (ANTLR and ASM) are dependencies of one artifact: 
lucene-expressions, so it is only a real problem there. My suggestion would be: 
Let's shade the ANTLR (and possibly also ASM - until the JDK-included bytecode 
generator is out of incubation/preview) into the JAR file. With ASM this was 
hard to do but with Grade it is quite easy. Just add another configuration with 
those two dependencies and use the "Gradle Shadow Plugin" 
(https://imperceptiblethoughts.com/shadow/) to transform their package names 
into the Lucene namespace.
   
   I know some people don't like this, but I had exactly the same problems with 
ASM in the forbiddenapis plugin and shading ASM in there was the only way to 
work around different, old, incompatible versions of ASM inside Maven or 
Gradle's classpath, while forbiddenapis always needeing the newest one.
   
   If you like I could make a PR to demonstrate the shading for Lucene's Gradle 
build in Expressions. @dweiss : What do you think? Of course we should not use 
shading anywhere else, but ANTLR and ASM are the candidates that always bring 
problems. Their size is also small so the overhead is small, but you have a 
consistent package that is unlikely to break when other projects use different 
library versions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #12004: Move byte vector queries into new KnnByteVectorQuery

2022-12-14 Thread GitBox


jpountz merged PR #12004:
URL: https://github.com/apache/lucene/pull/12004


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gf2121 commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries

2022-12-14 Thread GitBox


gf2121 commented on PR #12006:
URL: https://github.com/apache/lucene/pull/12006#issuecomment-1350639546

   This brings a small jump for Distance Filter LatLonPoint task. 
http://people.apache.org/~mikemccand/geobench.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350455204

   Simple package-private `static` method to turn on the pickiness should do it.
   
   I don't want to see 100 new constructors/abstractions added with booleans, 
just because antlr made a bad decision. Our APIs should not have to suffer for 
it.
   
   Don't care what java developers or static analysis tools or whatever else 
wants to say about it: minimal abstractions here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


jdconrad commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350071474

   I agree with @rmuir that having an ambiguity check for tests similar to 
Painless would be great for expressions. I'm a bit surprised this change didn't 
require much additionally to the regeneration of the lexer/grammar. One other 
thing I'm curious about is if this suffers from the same issue that Painless 
did with multiple nested conditionals 
(https://github.com/elastic/elasticsearch/pull/52056). So that may be something 
else worth adding a test for. Thanks for looking into upgrading this @reta!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


reta commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349804323

   Thanks for encouraging @rmuir !  I will be working on the matter this week 
and share my findings, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349778653

   In general, sorry if i discouraged before, it is really just a frustrating 
situation
   
   If you get stuck, just leave the PR open. I will try to dig into this too, I 
have been through it before. I do agree we can't stay on old releases forever.
   
   But the crazy shit they did between antlr 3.x and 4.x caused me pain:
   * made the mistake of not spelunking thru antlr guts to figure out how to 
prevent performance traps
   * had users report performance bugs (in all cases it was some grammar tweak 
to fix)
   * fix these performance bugs in subsequent releases
   * get frustrated with the leniency and figure out how to make the shit picky 
again.
   
   So I don't wish that on anyone. Currently the expressions is great because 
it is simple and performs well: Users can represent needs in a simple 
javascript-like fashion and trust that it has same performance as writing 
custom java code.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349610775

   cc: @jdconrad who might remember a lot more about this than me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349606909

   here is coverage report using the current antlr. I guess i dont know why so 
much is missing here:
   
   
https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/618/jacoco/org.apache.lucene.expressions.js/
   
   But yeah, if we exercise the possibilities of grammar from tests, AND tests 
use picky mode, then build will fail if grammar is ambiguous. It is definitely 
a PITA compared to antlr 3 :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349604029

   As far as inspecting coverage, I suspect it is pretty good. But there is 
instructions in https://github.com/apache/lucene/blob/main/help/tests.txt on 
how to generate reports.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349602516

   @reta I remember doing this adds overhead, that's why it is a boolean there. 
so it really just needs to be something we do from tests. for example it could 
be a package-private setter or similar?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


reta commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349601503

   > Looks like this in painless: 
https://github.com/opensearch-project/OpenSearch/blob/04757607c5aead788b465c77cec6ef459720f625/modules/lang-painless/src/main/java/org/opensearch/painless/antlr/Walker.java#L224-L245
   
   Thanks a lot, @rmuir , I will take it from there


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349599904

   Looks like this in painless: 
https://github.com/opensearch-project/OpenSearch/blob/04757607c5aead788b465c77cec6ef459720f625/modules/lang-painless/src/main/java/org/opensearch/painless/antlr/Walker.java#L224-L245


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


rmuir commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349597878

   the only way i know to prevent the traps is to do like painless and "enable 
picky mode" which fails test instead of doing slow things. and to have 100% 
test coverage of grammar!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


reta commented on PR #12016:
URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349213966

   @rmuir @uschindler what kind of (performance? jmh?) testing would help to 
discard / prove that moving to 4.11.x makes / does not make sense. You have 
definitely seen traps in the past, I would very appreciate pointers / guidance 
to explore the possibility of any regressions, thank you in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] reta opened a new pull request, #12016: Upgrade ANTLR to version 4.11.1

2022-12-13 Thread GitBox


reta opened a new pull request, #12016:
URL: https://github.com/apache/lucene/pull/12016

   Signed-off-by: Andriy Redko 
   
   ### Description
   
   The Apache Lucene is using quite old version of ANTLR 4.5.1-1. By itself, it 
is not a showstopper, but more profound issue is that some ANTLR 3.x bits are 
used [1]. Since ANTLR 4.10.x (or even earlier), the compatibility layer with  
`3.x` release line has been dropped in `4.x` (see please [2]), which makes 
Apache Lucene impossile to be used with recent ANTLR 4.10.x+ releases [3]. The 
sample exception is below. 
   
   ```
  > java.lang.UnsupportedOperationException: 
java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not 
deserialize ATN with version 3 (expected 4).
  > at 
org.antlr.antlr4.runtime@4.11.1/org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:56)
  > at 
org.antlr.antlr4.runtime@4.11.1/org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:48)
  > at 
org.apache.lucene.expressions@10.0.0-SNAPSHOT/org.apache.lucene.expressions.js.JavascriptLexer.(JavascriptLexer.java:279)
   
   ```
   
   [1] 
https://github.com/apache/lucene/blob/main/lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptLexer.java#L189
   [2] 
https://github.com/antlr/antlr4/commit/c68e127a7cf14470565d6e6ae1eff06db3e56ea7
   [3] https://github.com/opensearch-project/OpenSearch/pull/4546
   
   Closes https://github.com/apache/lucene/issues/11788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir closed issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems

2022-12-13 Thread GitBox


rmuir closed issue #12012: Spotless runs before javac in `gradle check` which 
results in less-than-helpful errors on compilation problems
URL: https://github.com/apache/lucene/issues/12012


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2022-12-13 Thread GitBox


rmuir commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1348842317

   Yeah, I'm highly suspicious of this cache. I dug into this and found that 
previously stored fields (!) were used for this lookup. So no surprise there 
was a cache around it, since this is a slow approach!!!
   
   But @gautamworah96 fixed this, and these days BinaryDocValues are used 
instead: see #10490
   
   IMO there is no sense in caching BinaryDocValues lookups into the heap, it 
is already mmap'd and the operating system will do this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12015: Run spotless after javac (#12012)

2022-12-13 Thread GitBox


rmuir commented on PR #12015:
URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348790082

   Thanks again, it really helps make the backporting process less painful for 
me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #12015: Run spotless after javac (#12012)

2022-12-13 Thread GitBox


dweiss commented on PR #12015:
URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348772038

   Sorry - I see it now. I've pushed to my local fork at gh. Didn't notice it. 
I've just pushed to 9x on apache as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #12015: Run spotless after javac (#12012)

2022-12-13 Thread GitBox


dweiss commented on PR #12015:
URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348767456

   You have?! Well, that's strange - I've pushed it to 9x... Can you reproduce 
it with the head?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12015: Run spotless after javac (#12012)

2022-12-13 Thread GitBox


rmuir commented on PR #12015:
URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348669494

   @dweiss do you mind pushing to 9.x, otherwise i will do it for you. I just 
hit this again in 9.x :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir merged pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

2022-12-13 Thread GitBox


rmuir merged PR #11998:
URL: https://github.com/apache/lucene/pull/11998


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12015: Run spotless after javac (#12012)

2022-12-13 Thread GitBox


rmuir commented on PR #12015:
URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348488213

   thank you @dweiss 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase merged pull request #11988: Fix algorithm that chooses the bridge between a polygon and a hole

2022-12-13 Thread GitBox


iverase merged PR #11988:
URL: https://github.com/apache/lucene/pull/11988


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase closed issue #11986: Polygons failing to tessellate

2022-12-13 Thread GitBox


iverase closed issue #11986: Polygons failing to tessellate
URL: https://github.com/apache/lucene/issues/11986


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems

2022-12-12 Thread GitBox


dweiss commented on issue #12012:
URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347874082

   I've applied the patch to 9x and main, btw.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss merged pull request #12015: Run spotless after javac (#12012)

2022-12-12 Thread GitBox


dweiss merged PR #12015:
URL: https://github.com/apache/lucene/pull/12015


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems

2022-12-12 Thread GitBox


dweiss commented on issue #12012:
URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347872711

   I filed a PR #12015 . Indeed the 'core' formatting task seems to be named 
'spotlessJava' (spotless + format name), the check and apply are just attached 
to it in a fancy manner via some caching service.
   
   I get the same error output as you do. I believe it's still prone to some 
fuzziness because ecj linter and javac can run in arbitrary order. You can see 
it (on the SlowDirectory.java error example) when you compare the output of:
   ```
   gradlew -p lucene/facet compileTestJava ecjLint --max-workers 1
   gradlew -p lucene/facet ecjLint compileTestJava --max-workers 1
   ```
   
   I don't think it's a big issue though since both of these emit reasonably 
looking error messages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems

2022-12-12 Thread GitBox


dweiss commented on issue #12012:
URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347844975

   > I think we are not slowing down the spotlessJava task which is the one 
actually failing for me?
   
   Eh. I don't know what the relationships between those spotless tasks are. 
Let me dig.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir merged pull request #12010: Enable LongDoubleConversion error-prone check

2022-12-12 Thread GitBox


rmuir merged PR #12010:
URL: https://github.com/apache/lucene/pull/12010


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir closed issue #12009: Find (and fix) places where we treat a long as a double without an explicit cast

2022-12-12 Thread GitBox


rmuir closed issue #12009: Find (and fix) places where we treat a long as a 
double without an explicit cast
URL: https://github.com/apache/lucene/issues/12009


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-12 Thread GitBox


rmuir commented on PR #12014:
URL: https://github.com/apache/lucene/pull/12014#issuecomment-1347620183

   Yeah, I think if the fallback java code was 2x, 4x, or 8x slower (like you 
would expect from these intrinsics), we wouldn't be having this conversation :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on pull request #12014: Ban use of Math.fma across the entire codebase

2022-12-12 Thread GitBox


gsmiller commented on PR #12014:
URL: https://github.com/apache/lucene/pull/12014#issuecomment-1347617300

   +1, seems reasonable to me. We can always remove this ban in the future if 
there's a good reason, but seems reasonable to put this in place to prevent it 
sneaking in for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new pull request, #12014: Ban use of Math.fma across the entire codebase

2022-12-12 Thread GitBox


rmuir opened a new pull request, #12014:
URL: https://github.com/apache/lucene/pull/12014

   When FMA is not supported by the hardware, these methods fall back to 
`BigDecimal` usage [1] which causes them to be 2500x slower [2].
   
   While most hardware in the last 10 years may have the support, out of box 
both VirtualBox and QEMU don't pass thru FMA support (for the latter at least 
you can tweak it with e.g. -cpu host or similar to fix this). 
   
   This creates a terrible undocumented performance trap, see [3] for an 
example of a 30x slowdown of an entire application. In my experience, 
developers are often too far detached from the production reality, and that 
reality is: we're not deploying to macbook pros in production, instead we are 
almost all using virtualization: we can't afford such performance traps.
   
   Practically it would be an issue too: e.g. Policeman jenkins instance that 
runs our tests currently uses virtualbox. It would be bad for vector-search 
tests to suddenly get 30x slower.
   
   We can't safely use this method anywhere, as we don't have access to check 
CPUID or anything to see if it will be insanely slow or not. Let's ban it 
completely: I'm concerned it will sneak into our codebase otherwise... it 
almost happened before: #10718
   
   [1] [Math.java source 
code](https://github.com/openjdk/jdk/blob/5d4ea9b9549b762b7c207e5c2ee65bc51591433b/src/java.base/share/classes/java/lang/Math.java#L2364-L2366)
   [2] [Comment on JIRA issue for x86 intrinsic mentioning 2500x 
speedup](https://bugs.openjdk.org/browse/JDK-8154122?focusedCommentId=13995171=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13995171)
   [3] [VirtualBox bug for lack of FMA 
support](https://www.virtualbox.org/ticket/15471)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2022-12-12 Thread GitBox


uschindler commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347501489

   Please remove the cache. Totally useless. It hurts more because the escape 
analysis can't work.
   When tiered compilation stepped in, this is just a useless extra
   
   My rule: Never ever cache object instances to prevent object creation and 
help GC. This was needed in in Java 6 and java 7, but not anymore. It has 
opposite problem: it actually creates those objects on heap and stresses GC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()

2022-12-12 Thread GitBox


rmuir commented on PR #12013:
URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347439475

   Also (summoning my inner @uschindler), curious what the perf impact is if we 
simply remove these caches completely. Maybe they are not relevant to the 
performance anymore due to faceting changes (e.g. storing labels in docvalues 
rather than payloads) ? Maybe they are even hurting performance?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems

2022-12-12 Thread GitBox


rmuir commented on issue #12012:
URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347423692

   I ran this patch on the mac m1 and also got `BUILD SUCCESSFUL in 3m 33s`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



<    1   2   3   4   5   6   7   8   9   10   >