Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-02-01 Thread via GitHub
github-actions[bot] commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1922531005 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. [lucene]

2024-02-01 Thread via GitHub
github-actions[bot] commented on PR #12938: URL: https://github.com/apache/lucene/pull/12938#issuecomment-1922531062 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix DV update files referenced by merge will be deleted by concurrent flush [lucene]

2024-02-01 Thread via GitHub
github-actions[bot] commented on PR #13017: URL: https://github.com/apache/lucene/pull/13017#issuecomment-1922530938 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-02-01 Thread via GitHub
benwtrent commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1475060677 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,24 +83,32 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOExce

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-02-01 Thread via GitHub
benwtrent commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1475057976 ## lucene/core/src/java/org/apache/lucene/search/knn/KnnCollectorManager.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-02-01 Thread via GitHub
mayya-sharipova commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1474964972 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,24 +83,32 @@ public Query rewrite(IndexSearcher indexSearcher) throws

Re: [PR] Backport SOLR-14765 to branch_8_11 [lucene-solr]

2024-02-01 Thread via GitHub
HoustonPutman commented on PR #2682: URL: https://github.com/apache/lucene-solr/pull/2682#issuecomment-1921919670 Hey @risdenk are you still planning on getting this in? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Fix knn vector visit limit fence post error [lucene]

2024-02-01 Thread via GitHub
jpountz commented on PR #13058: URL: https://github.com/apache/lucene/pull/13058#issuecomment-1921879126 Yeah, it's annoying, but I agree that adding more APIs to fix this is overkill. Can you update your change to keep cost = cardinality, and only do the +1 when calling `approximateSearch(

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-02-01 Thread via GitHub
jimczi commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1474802862 ## lucene/core/src/java/org/apache/lucene/index/LeafReader.java: ## @@ -236,27 +235,24 @@ public final PostingsEnum postings(Term term) throws IOException { *

Re: [I] Contributing a deep-learning, BERT-based analyzer [lucene]

2024-02-01 Thread via GitHub
benwtrent commented on issue #13065: URL: https://github.com/apache/lucene/issues/13065#issuecomment-1921775426 For the analyzer, are you meaning something that tokenizes into an embedding? Or just creates the tokens (wordpiece + dictionary)?

Re: [PR] Fix normalization in TeluguAnalyzer [lucene]

2024-02-01 Thread via GitHub
jpountz merged PR #13059: URL: https://github.com/apache/lucene/pull/13059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[I] Contributing a deep-learning, BERT-based analyzer [lucene]

2024-02-01 Thread via GitHub
lmessinger opened a new issue, #13065: URL: https://github.com/apache/lucene/issues/13065 ### Description Hi, We are building an open-source custom Hebrew/Arabic analyzer (lemmatizer and stopwords), based on a BERT model. We'd like to contribute this to this repository. How ca

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-02-01 Thread via GitHub
jpountz merged PR #13036: URL: https://github.com/apache/lucene/pull/13036

Re: [PR] Fix knn vector visit limit fence post error [lucene]

2024-02-01 Thread via GitHub
benwtrent commented on PR #13058: URL: https://github.com/apache/lucene/pull/13058#issuecomment-1921169639 > "exactly visitLimit hits have been collected but I collected all that I needed" and "exactly visitLimit hits have been collected but I would need to collect more to be done"?

Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]

2024-02-01 Thread via GitHub
benwtrent commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1921160057 > What was the decision behind adding these candidates `outOfOrder`? Speed, once we know things are sorted, we know they have been checked for diversity. But with any optimi

Re: [PR] Fix broken loop in TestDocumentsWriterStallControl.assertState() [lucene]

2024-02-01 Thread via GitHub
s1monw merged PR #13062: URL: https://github.com/apache/lucene/pull/13062

Re: [I] TestDocumentsWriterStallControl.assertState() does not do what it appears it would [lucene]

2024-02-01 Thread via GitHub
s1monw closed issue #13061: TestDocumentsWriterStallControl.assertState() does not do what it appears it would URL: https://github.com/apache/lucene/issues/13061

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-02-01 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1921135758 @nitirajrathore very interesting results. This sort of indicates to me that no matter the heuristic, we just need a second pass over the graph to ensure connectedness and fix it u

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-02-01 Thread via GitHub
jfreden commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1920986116 Thank you @jpountz ! I've pushed changes to the tests, added the comment and also added an entry to `CHANGES.txt`.

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-02-01 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1920943319 Hi @benwtrent, I left Amazon but I was able to run some tests with open dataset and also with Amazon dataset before leaving. I cannot share whole lot of detail about

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-02-01 Thread via GitHub
jpountz commented on code in PR #13036: URL: https://github.com/apache/lucene/pull/13036#discussion_r1474129857 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -962,6 +962,118 @@ public void testDisjunctionMatchesCount() throws IOException { d

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-02-01 Thread via GitHub
jfreden commented on code in PR #13036: URL: https://github.com/apache/lucene/pull/13036#discussion_r1474087852 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -962,6 +962,46 @@ public void testDisjunctionMatchesCount() throws IOException { di

Re: [PR] Fix knn vector visit limit fence post error [lucene]

2024-02-01 Thread via GitHub
jpountz commented on PR #13058: URL: https://github.com/apache/lucene/pull/13058#issuecomment-1920761327 Hmm, sorry it still doesn't feel completely right... It feels like the issue is that the collector doesn't distinguish between "exactly visitLimit hits have been collected but I collecte