[GitHub] [lucene] rmuir commented on issue #12023: Mechanism to interrupt long-running/resource intensive queries
rmuir commented on issue #12023: URL: https://github.com/apache/lucene/issues/12023#issuecomment-1353025015 determinization has already been removed here. that is the problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir closed issue #12023: Mechanism to interrupt long-running/resource intensive queries
rmuir closed issue #12023: Mechanism to interrupt long-running/resource intensive queries URL: https://github.com/apache/lucene/issues/12023 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] Bukhtawar opened a new issue, #12023: Mechanism to interrupt long-running/resource intensive queries
Bukhtawar opened a new issue, #12023: URL: https://github.com/apache/lucene/issues/12023 ### Description As a part of https://github.com/opensearch-project/OpenSearch/issues/687 we detected that regex queries can run into tight loops for quite long. Below is the stack trace of the request for a wildcard query which consumed 100% CPU for an hour(although addressed in https://issues.apache.org/jira/browse/LUCENE-9981). OpenSearch has a mechanism to cancel task on timeout and one other option that was we were deliberating was to send interrupts to the long running thread if the request is consuming more Memory/CPU or time, independent of query timeout. The problem is not all costly executions in Lucene have a check on interrupts much like ExitableDirectoryReader#checkAndThrow ``` private void checkAndThrow() { if (queryTimeout.shouldExit()) { throw new ExitingReaderException("The request took too long to iterate over point values. Timeout: " + queryTimeout.toString() + ", PointValues=" + in ); } else if (Thread.interrupted()) { throw new ExitingReaderException("Interrupted while iterating over point values. PointValues=" + in); } } ``` Sharing the stack trace for the request ``` 100.2% (500.8ms out of 500ms) cpu usage by thread 'opensearch[917451917ca5731579187db45dd52853][search][T#5]' 4/10 snapshots sharing following 36 elements app//org.apache.lucene.util.automaton.Operations.determinize(Operations.java:780) app//org.apache.lucene.util.automaton.Operations.getCommonSuffixBytesRef(Operations.java:1155) app//org.apache.lucene.util.automaton.CompiledAutomaton.(CompiledAutomaton.java:245) app//org.apache.lucene.search.AutomatonQuery.(AutomatonQuery.java:110) app//org.apache.lucene.search.AutomatonQuery.(AutomatonQuery.java:87) app//org.apache.lucene.search.AutomatonQuery.(AutomatonQuery.java:71) app//org.apache.lucene.search.WildcardQuery.(WildcardQuery.java:56) app//org.opensearch.index.mapper.StringFieldType.wildcardQuery(StringFieldType.java:158) app//org.opensearch.index.query.WildcardQueryBuilder.doToQuery(WildcardQueryBuilder.java:259) app//org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:116) app//org.opensearch.index.query.BoolQueryBuilder.addBooleanClauses(BoolQueryBuilder.java:337) app//org.opensearch.index.query.BoolQueryBuilder.doToQuery(BoolQueryBuilder.java:321) app//org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:116) app//org.opensearch.index.query.QueryShardContext.lambda$toQuery$3(QueryShardContext.java:386) app//org.opensearch.index.query.QueryShardContext$$Lambda$5010/0x000801d4d840.apply(Unknown Source) app//org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:398) app//org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:385) app//org.opensearch.search.SearchService.parseSource(SearchService.java:903) app//org.opensearch.search.SearchService.createContext(SearchService.java:740) app//org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:442) app//org.opensearch.search.SearchService.access$500(SearchService.java:155) app//org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:415) app//org.opensearch.search.SearchService$2$$Lambda$4889/0x000801ce9840.get(Unknown Source) app//org.opensearch.search.SearchService$$Lambda$4891/0x000801ce9c40.get(Unknown Source) app//org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:71) app//org.opensearch.action.ActionRunnable$$Lambda$3741/0x0008011fa440.accept(Unknown Source) app//org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:86) app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) app//org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) app//org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57) app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:774) app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) java.base@11.0.16/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[GitHub] [lucene] rmuir commented on issue #12021: Large fields with large="true" can be truncated in v9+
rmuir commented on issue #12021: URL: https://github.com/apache/lucene/issues/12021#issuecomment-1352977757 This looks like a bug in solr code (SolrDocumentFetcher) so I'd recommend opening a bug over at https://github.com/apache/solr -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on a diff in pull request #12022: Fix flat polygons incorrectly containing intersecting geometries
iverase commented on code in PR #12022: URL: https://github.com/apache/lucene/pull/12022#discussion_r1049521314 ## lucene/CHANGES.txt: ## @@ -68,6 +68,8 @@ Bug Fixes * LUCENE-10599: LogMergePolicy is more likely to keep merging segments until they reach the maximum merge size. (Adrien Grand) +* GITHUB#12020: Fixes bug whereby very flat polygons can incorrectly contain intersecting geometries. (Craig Taverner) + Review Comment: Could you move this entry to the lucene 9.5 section? I am planning in backporting it so it will make life easier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on a diff in pull request #12022: Fix flat polygons incorrectly containing intersecting geometries
iverase commented on code in PR #12022: URL: https://github.com/apache/lucene/pull/12022#discussion_r1049521314 ## lucene/CHANGES.txt: ## @@ -68,6 +68,8 @@ Bug Fixes * LUCENE-10599: LogMergePolicy is more likely to keep merging segments until they reach the maximum merge size. (Adrien Grand) +* GITHUB#12020: Fixes bug whereby very flat polygons can incorrectly contain intersecting geometries. (Craig Taverner) + Review Comment: Could you move this entry to Lucene 9.5? I am planning in backporting it so it will make life easier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] craigtaverner opened a new pull request, #12022: Fix flat polygons incorrectly containing intersecting geometries
craigtaverner opened a new pull request, #12022: URL: https://github.com/apache/lucene/pull/12022 Fixes https://github.com/apache/lucene/issues/12020 ### Description When performing a search using a shape geometry query of relation type `QueryRelation.CONTAINS`, it is possible to get a false positive when two geometries intersect, but neither actually contains the other. This happens if the indexed geometry is a polygon that is so flat that one of its triangles is simplified to a single line segment. The bug is that the line segment currently records whether it is part of the external polygon by taking that knowledge from first line segment of the triangle, not necessarily the part of the triangle being retained. This first line segment could be an internal line (part of the triangle, but not the polygon). The simplification code should instead take this knowledge from a line segment that is not being collapsed by the simplification. The consequence of this bug occur later during the contains search when the search query deduces that the geometry is contained because the polygon is not closed. The search does not realise it intersects an outer line of the polygon because that line is not correctly marked as outer. The fix is to select the correct flag from the surviving line segments instead of the first line segment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] nosvalds opened a new issue, #12021: Large fields with large="true" can be truncated in v9+
nosvalds opened a new issue, #12021: URL: https://github.com/apache/lucene/issues/12021 ### Description ## Issue For fields using `large="true"`, large fields (which is what they are intended for) can be truncated in v9+ of Lucene. Example fieldtype definition: ``` ``` ## Cause Looks like this is a bug introduced along with [LUCENE-8805](https://issues.apache.org/jira/browse/LUCENE-8805) / https://github.com/apache/lucene/issues/9849: [https://github.com/apache/lucene/blob/5a694ea26ff862ecc874ca798135073d300c2234/sol[…]r/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java](https://github.com/apache/lucene/blob/5a694ea26ff862ecc874ca798135073d300c2234/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L462-L465) Specifically with respect to "large" fields handling. The length in utf8 bytes will often be longer than the string length `value.length()`, hence the truncation. ## Fix The Fix would be: `bytesRef.length = bytesRef.bytes.length` ### Version and environment details - Solr v9.1.0 - Lucene v9.3.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] craigtaverner opened a new issue, #12020: Very flat polygons give incorrect 'contains' result
craigtaverner opened a new issue, #12020: URL: https://github.com/apache/lucene/issues/12020 ### Description When performing a search using a shape geometry query of relation type `QueryRelation.CONTAINS`, it is possible to get a false positive when two geometries intersect, but neither actually contains the other. This happens if the indexed geometry is a polygon that is so flat that one of its triangles is simplified to a single line segment. The bug is that the line segment currently records whether it is part of the external polygon by taking that knowledge from first line segment of the triangle, not necessarily the part of the triangle being retained. This first line segment could be an internal line (part of the triangle, but not the polygon). The simplification code should instead take this knowledge from a line segment that is not being collapsed by the simplification. The consequence of this bug occur later during the contains search when the search query deduces that the geometry is contained because the polygon is not closed. The search does not realise it intersects an outer line of the polygon because that line is not correctly marked as outer. ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight
LuXugang commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162986 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception { } } + public void testAggressiveMatchCount() throws IOException { Review Comment: addressed in https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369 and https://github.com/apache/lucene/pull/12017/commits/ae3f8d67ed75724ac2662c24025c85ae7008612a -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight
LuXugang commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1049163040 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception { } } + public void testAggressiveMatchCount() throws IOException { +Directory dir = newDirectory(); +IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig()); +Document doc = new Document(); +LongPoint longPoint = new LongPoint("long", 3L, 4L, 5L); +doc.add(longPoint); +StringField stringField = new StringField("string", "abc", Store.NO); +doc.add(stringField); +writer.addDocument(doc); +longPoint.setLongValues(10L, 11L, 12L); +stringField.setStringValue("xyz"); +writer.addDocument(doc); + +IndexReader reader = DirectoryReader.open(writer); +writer.close(); +IndexSearcher searcher = new IndexSearcher(reader); + +long[] lower = new long[] {4L, 5L, 6L}; +long[] upper = new long[] {9L, 10L, 11L}; +Query unknownCountQuery = LongPoint.newRangeQuery("long", lower, upper); +assert reader.leaves().size() == 1; +assert searcher +.createWeight(unknownCountQuery, ScoreMode.COMPLETE, 1f) +.count(reader.leaves().get(0)) +== -1; + +Query query = +new BooleanQuery.Builder() +.add(new TermQuery(new Term("string", "xyz")), Occur.MUST) +.add(unknownCountQuery, Occur.MUST_NOT) +.add(new MatchAllDocsQuery(), Occur.MUST_NOT) +.build(); +Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f); +// count of the first MUST_NOT clause is unknown, but the second MUST_NOT clause matches all +// docs +assertEquals(0, weight.count(reader.leaves().get(0))); + +query = +new BooleanQuery.Builder() +.add(new TermQuery(new Term("string", "xyz")), Occur.MUST) +.add(unknownCountQuery, Occur.MUST_NOT) +.add(new TermQuery(new Term("string", "abc")), Occur.MUST_NOT) +.build(); +weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f); +// count of the first MUST_NOT clause is unknown, though the second MUST_NOT clause matche one +// doc, we can't figure out the number of +// docs +assertEquals(-1, weight.count(reader.leaves().get(0))); + +// test pure disConjunction Review Comment: addressed in https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight
LuXugang commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162892 ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws IOException { private int optCount(LeafReaderContext context, Occur occur) throws IOException { final int numDocs = context.reader().numDocs(); int optCount = 0; +boolean unknownCount = false; for (WeightedBooleanClause weightedClause : weightedClauses) { if (weightedClause.clause.getOccur() != occur) { continue; } int count = weightedClause.weight.count(context); - if (count == -1 || count == numDocs) { -// If any of the clauses has a number of matches that is unknown, the number of matches of -// the disjunction is unknown. + if (count == -1) { +// If one clause has a number of matches that is unknown, let's be more aggressive to check +// whether remain clauses could match all docs. +unknownCount = true; +continue; Review Comment: addressed in https://github.com/apache/lucene/pull/12017/commits/dc13ae7bddc3ca272af7c45dd998258984469904 ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -492,7 +497,10 @@ private int optCount(LeafReaderContext context, Occur occur) throws IOException return -1; Review Comment: Thanks @jpountz , you are right. addressed in https://github.com/apache/lucene/pull/12017/commits/477b2a4e1f95f408562fc1745979bf9815ffaa75 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception { } } + public void testAggressiveMatchCount() throws IOException { Review Comment: addressed in https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352328979 > I will check it out and inspect the coverage report from .js tests and see if there are any holes. If i find them I will push more tests. I am just really paranoid about some of the slow things antlr4 will do, having fought thru these issues with @jdconrad once before. I am a bit afraid because those blobs with DFAs growed quite a lot in the patch. So yes, let's review how it behaves. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352322297 I will check it out and inspect the coverage report from .js tests and see if there are any holes. If i find them I will push more tests. I am just really paranoid about some of the slow things antlr4 will do, having fought thru these issues with @jdconrad once before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049026102 ## lucene/expressions/src/java/module-info.java: ## @@ -19,7 +19,7 @@ module org.apache.lucene.expressions { Review Comment: Ok. No problem. I just noticed name change. I think it's is just a named automodule with a manifest entry. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352291207 > sorry i didnt do the other tests, I gotta run for now. i can do them later tonight or tomorrow if you need but just wanted to prototype it out @rmuir tests have been migrated, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049019777 ## lucene/expressions/src/java/module-info.java: ## @@ -19,7 +19,7 @@ module org.apache.lucene.expressions { Review Comment: ah ... it is still automatic :( although changed ... ``` module-info.java:21: warning: requires directive for an automatic module requires org.antlr.antlr4.runtime; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049016403 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.expressions.js; + +/** Settings for expression compiler for javascript expressions. */ +final class JavascriptCompilerSettings { Review Comment: I agree with Uwe, let's move to a boolean (actually only need one package-private ctor with the boolean, the way the new base CompilerTestCase class works). If we have more options to this thing later, we can deal with the issue at that time. It is all package-private anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049015663 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.expressions.js; + +/** Settings for expression compiler for javascript expressions. */ +final class JavascriptCompilerSettings { Review Comment: Personally I hate mutable instances with getters and setters. 郎 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049014391 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.expressions.js; + +/** Settings for expression compiler for javascript expressions. */ +final class JavascriptCompilerSettings { Review Comment: No, not really, but it could be useful to add more settings later on -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049010145 ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.expressions.js; + +/** Settings for expression compiler for javascript expressions. */ +final class JavascriptCompilerSettings { Review Comment: Do we really need this additional class only holding one boolean? I think a simple additional PKG private compile method taking the boolean picky would have same effect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on code in PR #12016: URL: https://github.com/apache/lucene/pull/12016#discussion_r1049005864 ## lucene/expressions/src/java/module-info.java: ## @@ -19,7 +19,7 @@ module org.apache.lucene.expressions { Review Comment: Can we now remove this suppression? I think all modules here are now real ones. ## lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompilerSettings.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.expressions.js; + +/** Settings for expression compiler for javascript expressions. */ +final class JavascriptCompilerSettings { Review Comment: Do we really need this additional class only holding one boolean? I think a simple additional PKG private ctor would have same effect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352259233 sorry i didnt do the other tests, I gotta run for now. i can do them later tonight or tomorrow if you need but just wanted to prototype it out -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352258413 > I pushed a proposal to your branch (only fixing one of the tests in .js to use it). If you are really against it, just revert the commit. But i think it keeps tests simpler. :+1: No objections, thanks a lot for helping out @rmuir , it keeps tests simpler indeed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352248407 I pushed a proposal to your branch (only fixing one of the tests in .js to use it). If you are really against it, just revert the commit. But i think it keeps tests simpler. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352177919 we don't need to parameterize the pickiness IMO, we can just turn it on in these tests. Thanks for getting this hooked up. I will look at your branch and try to play with it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1352084948 @rmuir first pass over pickiness, all tests are now run w/o diagnostics listener (picky / not picky mode), the rough observations so far are encouraging, there are no significant changes in the test duration between these two modes (some random timings below): ![image](https://user-images.githubusercontent.com/509855/207701868-7a252ef2-20c4-4169-979c-4bb5c1f08f4e.png) I am moving further to compare the numbers against `main` and to conduct more precise measurements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent opened a new pull request, #12019: Clean up vector backward-codecs
benwtrent opened a new pull request, #12019: URL: https://github.com/apache/lucene/pull/12019 I noticed that the Lucene94 backward-codec was still allowing KNN fields writer. I have removed it, and updated the related tests similarly to other KNN backward-codec changes. There are various other house-keeping things that I noticed in the backward_codecs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048856364 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @throws IllegalArgumentException if k is less than 1 */ public KnnVectorQuery(String field, float[] target, int k, Query filter) { +this(field, target, k, Float.NEGATIVE_INFINITY, filter); + } + + /** + * Find the k nearest documents to the target vector according to the vectors in the + * given field. target vector. + * + * @param field a field that has been indexed as a {@link KnnVectorField}. + * @param target the target of the search + * @param k the number of documents to find (the upper bound) + * @param similarityThreshold the minimum acceptable value of similarity Review Comment: @msokolov you haven't missed anything. I am specifically talking about users providing `similarityThreshold` to the query. If they have calculating that they want a specific `cosine` or `dotProduct` similarity, they would then need to adjust that to match Lucene's scoring transformation. I think that `similarityThreshold` should mean vector similarities. We can transform it for the user to reflect the score that similarity represents (given vector encoding type and similarity function). An example here is `dotProduct`. The user knows they want `FLOAT32` vectors within a dotProduct of 0.7. With this API that ACTUALLY means they want to limit the scores to .85 (`(1 + dotProduct)/2`). How is the user supposed to know that? This seems really weird to me. This doesn't take into account the different scoring methods between vector types as well, which can get even more confusing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #11946: add similarity threshold for hnsw
msokolov commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048846385 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @throws IllegalArgumentException if k is less than 1 */ public KnnVectorQuery(String field, float[] target, int k, Query filter) { +this(field, target, k, Float.NEGATIVE_INFINITY, filter); + } + + /** + * Find the k nearest documents to the target vector according to the vectors in the + * given field. target vector. + * + * @param field a field that has been indexed as a {@link KnnVectorField}. + * @param target the target of the search + * @param k the number of documents to find (the upper bound) + * @param similarityThreshold the minimum acceptable value of similarity Review Comment: Well, the scores we are talking about here are at least always in [0, 1]. I'm not sure what you mean by the actual similarity of vectors. We used to have a two-step process where we would compute the similarity and then convert to a query score, but I think it's unified today and they are the same? Aren't the scores being thresholded here the output of VectorSimilarityFunction.compare? I may have missed something along the way? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API
rmuir commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351946133 I would tend to expect some organization like this: indexing: * ByteVectorField * FloatVectorField searching: * ByteVectorField.newQuery * FloatVectorField.newQuery codec api: * KnnVectorsReader.getByteVectors(String field) * KnnVectorsReader.getFloatVectors(String field) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on issue #11963: Improve vector quantization API
benwtrent commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1351900658 OK, step one of this is done with the `KnnByteVectorQuery`. I am next approaching a new `KnnByteVectorField` and this will cause some refactoring to `LeafReader` or `VectorValues`. Right now, `VectorValues` assumes all vector values are `float[]` and "expands" the bytes, needlessly. There are many ways to approach this and I am honestly not sure the way to go here (being unfamiliar with typical API patterns). It seems there are a handful of seemingly valid options: * 1. Add a method to `VectorValues` (which is returned via `LeafReader`) that explicitly returns `byteVectorValue()` and will throw if `vectorValue()` is called when the encoding is `BYTE` (and throw on `byteVectorValue()` if encoding is `FLOAT32`). * This doesn't seem like the best to me. Though we already require users to choose the correct methods based on the vector encoding. * 2. Add a new `AbstractVectorValues` where `vectorValue()` returns `T`. `LeafReader` would be changed to return that instead of `VectorValues`. * There don't seem to be precedent on using generics like this anywhere, and this breaks with how we handle DocValues. The main difference with vectors and doc values is that numeric doc values are all returned as Long, but really could be any number whose bytes FIT in a long... * 3. Add a new `LeafReader#getByteVectorValues()` that returns a `ByteVectorValues` object. The main downside here is that we are not making it flexible for adding new vector kinds in the future. We will want to add boolean vectors & hamming distance in the future. These are important for image search. * 4. Simply make `VectorValues` always return `BytesRef` for everything and provide similarity that knows the encoding and can decode the `float[]` if that is the encoding. This feels closer to what we do with doc values (taking `DoubleDocValuesField` & `DoubleValuesSource` as prior art with how that interaction is done with `NumericDocValues`). @jpountz @rmuir do y'all have opinions here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] AlmostFamiliar commented on issue #9231: HyphenationCompoundWordTokenFilter creates overlapping tokens with onlyLongestMatch enabled [LUCENE-8183]
AlmostFamiliar commented on issue #9231: URL: https://github.com/apache/lucene/issues/9231#issuecomment-1351830837 Are there still plans to merge this? I would also be very interested in this feature. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048764283 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @throws IllegalArgumentException if k is less than 1 */ public KnnVectorQuery(String field, float[] target, int k, Query filter) { +this(field, target, k, Float.NEGATIVE_INFINITY, filter); + } + + /** + * Find the k nearest documents to the target vector according to the vectors in the + * given field. target vector. + * + * @param field a field that has been indexed as a {@link KnnVectorField}. + * @param target the target of the search + * @param k the number of documents to find (the upper bound) + * @param similarityThreshold the minimum acceptable value of similarity Review Comment: As it is, callers of this method need to know the inner nuances of how we calculate the score given the similarity. I would prefer them not having to know that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #11946: add similarity threshold for hnsw
benwtrent commented on code in PR #11946: URL: https://github.com/apache/lucene/pull/11946#discussion_r1048763428 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -76,12 +91,29 @@ public KnnVectorQuery(String field, float[] target, int k) { * @throws IllegalArgumentException if k is less than 1 */ public KnnVectorQuery(String field, float[] target, int k, Query filter) { +this(field, target, k, Float.NEGATIVE_INFINITY, filter); + } + + /** + * Find the k nearest documents to the target vector according to the vectors in the + * given field. target vector. + * + * @param field a field that has been indexed as a {@link KnnVectorField}. + * @param target the target of the search + * @param k the number of documents to find (the upper bound) + * @param similarityThreshold the minimum acceptable value of similarity Review Comment: So, right now, `similarityThreshold` is really `scoreThreshold`. I would prefer it to be the actual similarity of the vectors and NOT how we translate it to scores (which for ByteVectors has some surprising behavior). @msokolov What do you think here? Its already tricky that "similarity implies score". But truly, similarity != score. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1
jdconrad commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351772233 > javacc? I was under the impression this was EOL? If it's still well supported I'm not familiar enough with the code generation to know if this would avoid the pitfalls ANTLR has. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #11946: add similarity threshold for hnsw
msokolov commented on PR #11946: URL: https://github.com/apache/lucene/pull/11946#issuecomment-1351747886 It seems there are conflicts due to a recent refactor of this query - would you mind merging from main and resolving those please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351743460 > One other possibility would be to hand-roll the expression lexer/parser. This would get rid of the need for any additional dependencies and generated code. From what I can tell the API has been stable for a long time, so I don't think there would be a problem with maintenance. This would have the added benefit of likely improved performance as ANTLR isn't the fastest, and potentially better error messages. javacc? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12014: Ban use of Math.fma across the entire codebase
uschindler commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351734347 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12014: Ban use of Math.fma across the entire codebase
rmuir commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351680479 I looked at what e.g. glibc does here as a fallback out of curiousity, for floats it is very simple (using Dekker algorithm), but requires changing the FP rounding mode, which you cant do in java. For doubles it is more complicated but still no bigdecimal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1
jdconrad commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351666949 One other possibility would be to hand-roll the expression lexer/parser. This would get rid of the need for any additional dependencies and generated code. From what I can tell the API has been stable for a long time, so I don't think there would be a problem with maintenance. This would have the added benefit of likely improved performance as ANTLR isn't the fastest, and potentially better error messages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #12014: Ban use of Math.fma across the entire codebase
dweiss commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351660443 I honestly don't know who can use this method without any provided cpuid check... We actually use fma in our code but do so by detecting the performance difference between a naive implementation on primitive types and Math.fma (during bootstrap). It's ugly like hell but the difference is so vast that it works. I'm not sure who'd even gain from using the bigdecimal-based implementation... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12014: Ban use of Math.fma across the entire codebase
benwtrent commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1351560532 Holy crap, creating `BigDecimal` and then multiplying & adding is crazy. This is a completely unacceptable fallback calculation for this method. +1 on banning its use in the code base. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)
jpountz merged PR #12018: URL: https://github.com/apache/lucene/pull/12018 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)
benwtrent commented on PR #12018: URL: https://github.com/apache/lucene/pull/12018#issuecomment-1351392230 @jpountz Here is the backport. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351378582 another way to say it: shading is a terrible, TERRIBLE idea and you only hear about it in java, because java is the only language with developers that are bad enough to consider it. Please, let's not talk about it ever again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351374308 I feel that if we publish a shaded jar we become guilty of "contributing to the delinquency" of terrible supply chain management, by hiding third party dependencies and their versions from view. It causes problems for checkers and even humans if there were ever some issue with one of the dependencies (example log4j). They might miss it entirely and get hacked, as an example. Then they will be mad at us for shading in the first place, even though its arguably their fault because they used a shaded jar. Let's just not hand them the gun. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351368145 > > Can we specify our dependencies in a different way (e.g. exact version) in our maven stuff so this won't happen? e.g. you can do this in python, and specify that you depend on antlr == x.y.z rather than just depend on antlr. > > You can - I think what Uwe is describing is a problem for downstream projects where Lucene has antlr x.y.z and some other dependency has antlr a.b.c - then namespaces clash and the conflict is not easily resolved. Classloader separation is possible, of course, but it's hardly an easy alternative. :) > > I personally don't mind shading artifacts but I do agree they are a pain... even tracking down which version a project is actually using is a problem then (because shaded artifacts don't manifest their versions as clearly as a maven dependency). Corporate environments will hate them for legal reasons (for reasons Rob mentioned). Hi, as said before this was just a suggestion from my experience with forbiddenapis with some artifacts like "ASM" and "ANTLR". They are always a pain. From the secruity perspective there are problems, but you can also see it like "code copied" - we can of course also do this, but that's a lot more hassle. What you can always do: Offser a shaded version without those 2 dependencies as a seüparate artifact. Often seen on maven as "uber" or "shaded" behind version number. If you want to use shaded version, you know consequences. Setting exact version in Maven POMs is not possible, unfortunately. Maven has some tricks (it wont silently upgrade across major versions, but bugfix releases are automatically applied). I don't know exacty how this is handles by Maven resolver. My idea would be: publish "lucene-expressions-shaded.x.y.z" in addition. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351366346 Yes, java showed what a disaster it is around supply chains with the log4j vulnerability. Shading should not even be considered as an option. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent opened a new pull request, #12018: Move byte vector queries into new KnnByteVectorQuery (#12004)
benwtrent opened a new pull request, #12018: URL: https://github.com/apache/lucene/pull/12018 Backport of #12004 This is the first commit of a much larger refactor. The overall goal is to separate the concerns of byte vectors and float vectors. Making their usage and APIs clearer for users. This first step adds a new `KnnByteVectorQuery` and only allows it to be used against fields that have the `BYTE` encoding. Additionally, the original `KnnVectorQuery` can only be used against fields that have the `FLOAT32` encoding. this partially addresses: https://github.com/apache/lucene/issues/11963 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #12016: Upgrade ANTLR to version 4.11.1
dweiss commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351355396 > Can we specify our dependencies in a different way (e.g. exact version) in our maven stuff so this won't happen? e.g. you can do this in python, and specify that you depend on antlr == x.y.z rather than just depend on antlr. You can - I think what Uwe is describing is a problem for downstream projects where Lucene has antlr x.y.z and some other dependency has antlr a.b.c - then namespaces clash and the conflict is not easily resolved. Classloader separation is possible, of course, but it's hardly an easy alternative. :) I personally don't mind shading artifacts but I do agree they are a pain... even tracking down which version a project is actually using is a problem then (because shaded artifacts don't manifest their versions as clearly as a maven dependency). Corporate environments will hate them for legal reasons (for reasons Rob mentioned). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351345609 I also dont understand the issue where ppl think they can modify arbitrary versions of lucene dependencies. Can we specify our dependencies in a different way (e.g. exact version) in our maven stuff so this won't happen? e.g. you can do this in python, and specify that you depend on `antlr == x.y.z` rather than just depend on `antlr`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351332547 personally i am against the shading. I think it is a huge antipattern, it hides third-party artifacts completely. think about someone trying to do security or license audit and they have "secret dependencies" hidden from view, as an example. I don't understand the issue myself. if you want to use different antlr version in two places just use two classloaders. thats what we did in elastic/opensearch. but like i said, i'm not opposed to upgrading IF concerns about the new version are taken care of. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery
benwtrent commented on PR #12004: URL: https://github.com/apache/lucene/pull/12004#issuecomment-1351211685 @uschindler 100%! I can get the backport PR done asap -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight
jpountz commented on code in PR #12017: URL: https://github.com/apache/lucene/pull/12017#discussion_r1048383541 ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws IOException { private int optCount(LeafReaderContext context, Occur occur) throws IOException { final int numDocs = context.reader().numDocs(); int optCount = 0; +boolean unknownCount = false; for (WeightedBooleanClause weightedClause : weightedClauses) { if (weightedClause.clause.getOccur() != occur) { continue; } int count = weightedClause.weight.count(context); - if (count == -1 || count == numDocs) { -// If any of the clauses has a number of matches that is unknown, the number of matches of -// the disjunction is unknown. + if (count == -1) { +// If one clause has a number of matches that is unknown, let's be more aggressive to check +// whether remain clauses could match all docs. +unknownCount = true; +continue; Review Comment: use an `else if` instead of `continue` for consistency with how other cases are handled? ## lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java: ## @@ -492,7 +497,10 @@ private int optCount(LeafReaderContext context, Occur occur) throws IOException return -1; Review Comment: Maybe this line should also set `unkwownCount = true` instead of returning `-1` right away? ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception { } } + public void testAggressiveMatchCount() throws IOException { Review Comment: Fold these tests into the existing `testDisjunctionMatchesCount` test? ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception { } } + public void testAggressiveMatchCount() throws IOException { +Directory dir = newDirectory(); +IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig()); +Document doc = new Document(); +LongPoint longPoint = new LongPoint("long", 3L, 4L, 5L); +doc.add(longPoint); +StringField stringField = new StringField("string", "abc", Store.NO); +doc.add(stringField); +writer.addDocument(doc); +longPoint.setLongValues(10L, 11L, 12L); +stringField.setStringValue("xyz"); +writer.addDocument(doc); + +IndexReader reader = DirectoryReader.open(writer); +writer.close(); +IndexSearcher searcher = new IndexSearcher(reader); + +long[] lower = new long[] {4L, 5L, 6L}; +long[] upper = new long[] {9L, 10L, 11L}; +Query unknownCountQuery = LongPoint.newRangeQuery("long", lower, upper); +assert reader.leaves().size() == 1; +assert searcher +.createWeight(unknownCountQuery, ScoreMode.COMPLETE, 1f) +.count(reader.leaves().get(0)) +== -1; + +Query query = +new BooleanQuery.Builder() +.add(new TermQuery(new Term("string", "xyz")), Occur.MUST) +.add(unknownCountQuery, Occur.MUST_NOT) +.add(new MatchAllDocsQuery(), Occur.MUST_NOT) +.build(); +Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f); +// count of the first MUST_NOT clause is unknown, but the second MUST_NOT clause matches all +// docs +assertEquals(0, weight.count(reader.leaves().get(0))); + +query = +new BooleanQuery.Builder() +.add(new TermQuery(new Term("string", "xyz")), Occur.MUST) +.add(unknownCountQuery, Occur.MUST_NOT) +.add(new TermQuery(new Term("string", "abc")), Occur.MUST_NOT) +.build(); +weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f); +// count of the first MUST_NOT clause is unknown, though the second MUST_NOT clause matche one +// doc, we can't figure out the number of +// docs +assertEquals(-1, weight.count(reader.leaves().get(0))); + +// test pure disConjunction Review Comment: ```suggestion // test pure disjunction ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #12016: Upgrade ANTLR to version 4.11.1
dweiss commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1351155914 > With ASM this was hard to do but with Grade it is quite easy. Just add another configuration with those two dependencies and use the "Gradle Shadow Plugin" (https://imperceptiblethoughts.com/shadow/) to transform their package names into the Lucene namespace. Technically - easy to do. Question is whether we want to do it (I don't see any problems here). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] LuXugang opened a new pull request, #12017: Aggressive `count` in BooleanWeight
LuXugang opened a new pull request, #12017: URL: https://github.com/apache/lucene/pull/12017 When BooleanQuery is pure disjunction, if at least one clause could match all docs, then we could get right `count` even though there was other clause whose `count` is unknown. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] fcofdez commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField
fcofdez commented on PR #11997: URL: https://github.com/apache/lucene/pull/11997#issuecomment-1351000741 @jpountz thanks for looking into this, unfortunately I didn't have time to go through the last round of review comments. The change makes sense to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #11997: Add IntField, LongField, FloatField and DoubleField
jpountz commented on PR #11997: URL: https://github.com/apache/lucene/pull/11997#issuecomment-1350964750 @fcofdez I hope you don't mind but I checked out your branch to look into the above issue and ended up pushing my changes. These new fields not only support indexing sorted numeric doc values, so that queries do not have to specify whether floats/doubles are indexed via their raw bits or as sortable ints/longs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery
uschindler commented on PR #12004: URL: https://github.com/apache/lucene/pull/12004#issuecomment-1350722742 > @benwtrent Would you mind helping with the backport by creating a PR? There are a few conflicts and I could use some help with resolving them. There are also Java 17 switch expressions instead of switch statements inside that need to be changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #12004: Move byte vector queries into new KnnByteVectorQuery
jpountz commented on PR #12004: URL: https://github.com/apache/lucene/pull/12004#issuecomment-1350702865 @benwtrent Would you mind helping with the backport by creating a PR? There are a few conflicts and I could use some help with resolving them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
iverase commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1350678308 > If it gives a good speedup, we could think about applying the same trick to other queries too, if we can do it without making things too ugly. For example LatLonPoint, IntPoint, etc currently all use byte[] comparison for their range/bounding-box queries. +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12016: Upgrade ANTLR to version 4.11.1
uschindler commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350666129 Hi, I agree with all Robert say. I would also like to make another suggestion (that could also be applied when this has been fixed). To me it looks like a bad decission of antlr, that the tool compiles code and creates some Java/Class files but at runtime it also needs some dependency JAR in exactly the correct version. This is a problem for a library like lucene that gets included into other projects. The same also applies for ASM, although this is no longer a big problem anymore, because ASM's APIs are now very stable and it does not matter if a project uses ASM 8 or ASM 9 if minimum requirements are guaranteed (so the user has more flexibility). In contrast javacc does not have the problem, because javacc or jflex generates all of the code and you don't need any runtime library to execute and access the AST. In Lucene both (ANTLR and ASM) are dependencies of one artifact: lucene-expressions, so it is only a real problem there. My suggestion would be: Let's shade the ANTLR (and possibly also ASM - until the JDK-included bytecode generator is out of incubation/preview) into the JAR file. With ASM this was hard to do but with Grade it is quite easy. Just add another configuration with those two dependencies and use the "Gradle Shadow Plugin" (https://imperceptiblethoughts.com/shadow/) to transform their package names into the Lucene namespace. I know some people don't like this, but I had exactly the same problems with ASM in the forbiddenapis plugin and shading ASM in there was the only way to work around different, old, incompatible versions of ASM inside Maven or Gradle's classpath, while forbiddenapis always needeing the newest one. If you like I could make a PR to demonstrate the shading for Lucene's Gradle build in Expressions. @dweiss : What do you think? Of course we should not use shading anywhere else, but ANTLR and ASM are the candidates that always bring problems. Their size is also small so the overhead is small, but you have a consistent package that is unlikely to break when other projects use different library versions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #12004: Move byte vector queries into new KnnByteVectorQuery
jpountz merged PR #12004: URL: https://github.com/apache/lucene/pull/12004 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 commented on pull request #12006: Do int compare instead of ArrayUtil#compareUnsigned4 in LatlonPointQueries
gf2121 commented on PR #12006: URL: https://github.com/apache/lucene/pull/12006#issuecomment-1350639546 This brings a small jump for Distance Filter LatLonPoint task. http://people.apache.org/~mikemccand/geobench.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350455204 Simple package-private `static` method to turn on the pickiness should do it. I don't want to see 100 new constructors/abstractions added with booleans, just because antlr made a bad decision. Our APIs should not have to suffer for it. Don't care what java developers or static analysis tools or whatever else wants to say about it: minimal abstractions here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jdconrad commented on pull request #12016: Upgrade ANTLR to version 4.11.1
jdconrad commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1350071474 I agree with @rmuir that having an ambiguity check for tests similar to Painless would be great for expressions. I'm a bit surprised this change didn't require much additionally to the regeneration of the lexer/grammar. One other thing I'm curious about is if this suffers from the same issue that Painless did with multiple nested conditionals (https://github.com/elastic/elasticsearch/pull/52056). So that may be something else worth adding a test for. Thanks for looking into upgrading this @reta! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349804323 Thanks for encouraging @rmuir ! I will be working on the matter this week and share my findings, thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349778653 In general, sorry if i discouraged before, it is really just a frustrating situation If you get stuck, just leave the PR open. I will try to dig into this too, I have been through it before. I do agree we can't stay on old releases forever. But the crazy shit they did between antlr 3.x and 4.x caused me pain: * made the mistake of not spelunking thru antlr guts to figure out how to prevent performance traps * had users report performance bugs (in all cases it was some grammar tweak to fix) * fix these performance bugs in subsequent releases * get frustrated with the leniency and figure out how to make the shit picky again. So I don't wish that on anyone. Currently the expressions is great because it is simple and performs well: Users can represent needs in a simple javascript-like fashion and trust that it has same performance as writing custom java code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349610775 cc: @jdconrad who might remember a lot more about this than me -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349606909 here is coverage report using the current antlr. I guess i dont know why so much is missing here: https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/618/jacoco/org.apache.lucene.expressions.js/ But yeah, if we exercise the possibilities of grammar from tests, AND tests use picky mode, then build will fail if grammar is ambiguous. It is definitely a PITA compared to antlr 3 :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349604029 As far as inspecting coverage, I suspect it is pretty good. But there is instructions in https://github.com/apache/lucene/blob/main/help/tests.txt on how to generate reports. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349602516 @reta I remember doing this adds overhead, that's why it is a boolean there. so it really just needs to be something we do from tests. for example it could be a package-private setter or similar? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349601503 > Looks like this in painless: https://github.com/opensearch-project/OpenSearch/blob/04757607c5aead788b465c77cec6ef459720f625/modules/lang-painless/src/main/java/org/opensearch/painless/antlr/Walker.java#L224-L245 Thanks a lot, @rmuir , I will take it from there -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349599904 Looks like this in painless: https://github.com/opensearch-project/OpenSearch/blob/04757607c5aead788b465c77cec6ef459720f625/modules/lang-painless/src/main/java/org/opensearch/painless/antlr/Walker.java#L224-L245 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12016: Upgrade ANTLR to version 4.11.1
rmuir commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349597878 the only way i know to prevent the traps is to do like painless and "enable picky mode" which fails test instead of doing slow things. and to have 100% test coverage of grammar! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta commented on pull request #12016: Upgrade ANTLR to version 4.11.1
reta commented on PR #12016: URL: https://github.com/apache/lucene/pull/12016#issuecomment-1349213966 @rmuir @uschindler what kind of (performance? jmh?) testing would help to discard / prove that moving to 4.11.x makes / does not make sense. You have definitely seen traps in the past, I would very appreciate pointers / guidance to explore the possibility of any regressions, thank you in advance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] reta opened a new pull request, #12016: Upgrade ANTLR to version 4.11.1
reta opened a new pull request, #12016: URL: https://github.com/apache/lucene/pull/12016 Signed-off-by: Andriy Redko ### Description The Apache Lucene is using quite old version of ANTLR 4.5.1-1. By itself, it is not a showstopper, but more profound issue is that some ANTLR 3.x bits are used [1]. Since ANTLR 4.10.x (or even earlier), the compatibility layer with `3.x` release line has been dropped in `4.x` (see please [2]), which makes Apache Lucene impossile to be used with recent ANTLR 4.10.x+ releases [3]. The sample exception is below. ``` > java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with version 3 (expected 4). > at org.antlr.antlr4.runtime@4.11.1/org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:56) > at org.antlr.antlr4.runtime@4.11.1/org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:48) > at org.apache.lucene.expressions@10.0.0-SNAPSHOT/org.apache.lucene.expressions.js.JavascriptLexer.(JavascriptLexer.java:279) ``` [1] https://github.com/apache/lucene/blob/main/lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptLexer.java#L189 [2] https://github.com/antlr/antlr4/commit/c68e127a7cf14470565d6e6ae1eff06db3e56ea7 [3] https://github.com/opensearch-project/OpenSearch/pull/4546 Closes https://github.com/apache/lucene/issues/11788 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir closed issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems
rmuir closed issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems URL: https://github.com/apache/lucene/issues/12012 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()
rmuir commented on PR #12013: URL: https://github.com/apache/lucene/pull/12013#issuecomment-1348842317 Yeah, I'm highly suspicious of this cache. I dug into this and found that previously stored fields (!) were used for this lookup. So no surprise there was a cache around it, since this is a slow approach!!! But @gautamworah96 fixed this, and these days BinaryDocValues are used instead: see #10490 IMO there is no sense in caching BinaryDocValues lookups into the heap, it is already mmap'd and the operating system will do this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12015: Run spotless after javac (#12012)
rmuir commented on PR #12015: URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348790082 Thanks again, it really helps make the backporting process less painful for me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #12015: Run spotless after javac (#12012)
dweiss commented on PR #12015: URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348772038 Sorry - I see it now. I've pushed to my local fork at gh. Didn't notice it. I've just pushed to 9x on apache as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #12015: Run spotless after javac (#12012)
dweiss commented on PR #12015: URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348767456 You have?! Well, that's strange - I've pushed it to 9x... Can you reproduce it with the head? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12015: Run spotless after javac (#12012)
rmuir commented on PR #12015: URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348669494 @dweiss do you mind pushing to 9.x, otherwise i will do it for you. I just hit this again in 9.x :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir merged pull request #11998: Migrate away from per-segment-per-threadlocals on SegmentReader
rmuir merged PR #11998: URL: https://github.com/apache/lucene/pull/11998 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12015: Run spotless after javac (#12012)
rmuir commented on PR #12015: URL: https://github.com/apache/lucene/pull/12015#issuecomment-1348488213 thank you @dweiss -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase merged pull request #11988: Fix algorithm that chooses the bridge between a polygon and a hole
iverase merged PR #11988: URL: https://github.com/apache/lucene/pull/11988 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase closed issue #11986: Polygons failing to tessellate
iverase closed issue #11986: Polygons failing to tessellate URL: https://github.com/apache/lucene/issues/11986 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems
dweiss commented on issue #12012: URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347874082 I've applied the patch to 9x and main, btw. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss merged pull request #12015: Run spotless after javac (#12012)
dweiss merged PR #12015: URL: https://github.com/apache/lucene/pull/12015 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems
dweiss commented on issue #12012: URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347872711 I filed a PR #12015 . Indeed the 'core' formatting task seems to be named 'spotlessJava' (spotless + format name), the check and apply are just attached to it in a fancy manner via some caching service. I get the same error output as you do. I believe it's still prone to some fuzziness because ecj linter and javac can run in arbitrary order. You can see it (on the SlowDirectory.java error example) when you compare the output of: ``` gradlew -p lucene/facet compileTestJava ecjLint --max-workers 1 gradlew -p lucene/facet ecjLint compileTestJava --max-workers 1 ``` I don't think it's a big issue though since both of these emit reasonably looking error messages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems
dweiss commented on issue #12012: URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347844975 > I think we are not slowing down the spotlessJava task which is the one actually failing for me? Eh. I don't know what the relationships between those spotless tasks are. Let me dig. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir merged pull request #12010: Enable LongDoubleConversion error-prone check
rmuir merged PR #12010: URL: https://github.com/apache/lucene/pull/12010 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir closed issue #12009: Find (and fix) places where we treat a long as a double without an explicit cast
rmuir closed issue #12009: Find (and fix) places where we treat a long as a double without an explicit cast URL: https://github.com/apache/lucene/issues/12009 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12014: Ban use of Math.fma across the entire codebase
rmuir commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1347620183 Yeah, I think if the fallback java code was 2x, 4x, or 8x slower (like you would expect from these intrinsics), we wouldn't be having this conversation :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #12014: Ban use of Math.fma across the entire codebase
gsmiller commented on PR #12014: URL: https://github.com/apache/lucene/pull/12014#issuecomment-1347617300 +1, seems reasonable to me. We can always remove this ban in the future if there's a good reason, but seems reasonable to put this in place to prevent it sneaking in for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir opened a new pull request, #12014: Ban use of Math.fma across the entire codebase
rmuir opened a new pull request, #12014: URL: https://github.com/apache/lucene/pull/12014 When FMA is not supported by the hardware, these methods fall back to `BigDecimal` usage [1] which causes them to be 2500x slower [2]. While most hardware in the last 10 years may have the support, out of box both VirtualBox and QEMU don't pass thru FMA support (for the latter at least you can tweak it with e.g. -cpu host or similar to fix this). This creates a terrible undocumented performance trap, see [3] for an example of a 30x slowdown of an entire application. In my experience, developers are often too far detached from the production reality, and that reality is: we're not deploying to macbook pros in production, instead we are almost all using virtualization: we can't afford such performance traps. Practically it would be an issue too: e.g. Policeman jenkins instance that runs our tests currently uses virtualbox. It would be bad for vector-search tests to suddenly get 30x slower. We can't safely use this method anywhere, as we don't have access to check CPUID or anything to see if it will be insanely slow or not. Let's ban it completely: I'm concerned it will sneak into our codebase otherwise... it almost happened before: #10718 [1] [Math.java source code](https://github.com/openjdk/jdk/blob/5d4ea9b9549b762b7c207e5c2ee65bc51591433b/src/java.base/share/classes/java/lang/Math.java#L2364-L2366) [2] [Comment on JIRA issue for x86 intrinsic mentioning 2500x speedup](https://bugs.openjdk.org/browse/JDK-8154122?focusedCommentId=13995171=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13995171) [3] [VirtualBox bug for lack of FMA support](https://www.virtualbox.org/ticket/15471) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()
uschindler commented on PR #12013: URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347501489 Please remove the cache. Totally useless. It hurts more because the escape analysis can't work. When tiered compilation stepped in, this is just a useless extra My rule: Never ever cache object instances to prevent object creation and help GC. This was needed in in Java 6 and java 7, but not anymore. It has opposite problem: it actually creates those objects on heap and stresses GC. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12013: Clear thread local values on UTF8TaxonomyWriterCache.close()
rmuir commented on PR #12013: URL: https://github.com/apache/lucene/pull/12013#issuecomment-1347439475 Also (summoning my inner @uschindler), curious what the perf impact is if we simply remove these caches completely. Maybe they are not relevant to the performance anymore due to faceting changes (e.g. storing labels in docvalues rather than payloads) ? Maybe they are even hurting performance? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #12012: Spotless runs before javac in `gradle check` which results in less-than-helpful errors on compilation problems
rmuir commented on issue #12012: URL: https://github.com/apache/lucene/issues/12012#issuecomment-1347423692 I ran this patch on the mac m1 and also got `BUILD SUCCESSFUL in 3m 33s`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org