[jira] [Comment Edited] (LUCENE-8213) Cache costly subqueries asynchronously
[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943317#comment-16943317 ] Atri Sharma edited comment on LUCENE-8213 at 10/3/19 4:18 AM: -- Thanks [~hossman] Interestingly, all but one test failures are coming from LatLon queries -- is there anything special about them? was (Author: atris): Thanks [~hossman] Interestingly, all but one test failures are coming from LatLong queries -- is there anything special about them? > Cache costly subqueries asynchronously > -- > > Key: LUCENE-8213 > URL: https://issues.apache.org/jira/browse/LUCENE-8213 > Project: Lucene - Core > Issue Type: Improvement > Components: core/query/scoring >Affects Versions: 7.2.1 >Reporter: Amir Hadadi >Priority: Minor > Labels: performance > Attachments: > 0001-Reproduce-across-segment-caching-of-same-query.patch, > thetaphi_Lucene-Solr-master-Linux_24839.log.txt > > Time Spent: 13h > Remaining Estimate: 0h > > IndexOrDocValuesQuery allows to combine costly range queries with a selective > lead iterator in an optimized way. However, the range query at some point > gets cached by a querying thread in LRUQueryCache, which negates the > optimization of IndexOrDocValuesQuery for that specific query. > It would be nice to see an asynchronous caching implementation in such cases, > so that queries involving IndexOrDocValuesQuery would have consistent > performance characteristics. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
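For context, the optimization that caching negates works by holding two equivalent queries and choosing the cheaper access path per segment based on the cost of the lead iterator. The sketch below is a self-contained illustration of that decision only; the class, method names, and the cost ratio are assumptions for illustration, not Lucene's actual code.

```java
// Illustrative sketch of the IndexOrDocValuesQuery trade-off: the same
// predicate is available both as a points/index query and as a doc-values
// query, and the cheaper access path is picked from the lead iterator's cost.
// All names and the ratio below are assumptions, not Lucene internals.
public class IndexOrDocValuesSketch {

    /** Assumed cost of iterating all matches via the points index. */
    static long indexQueryCost(long rangeMatchCount) {
        return rangeMatchCount; // proportional to the number of matching docs
    }

    /**
     * Decide whether to verify candidates with doc values instead of
     * iterating the index query: when a selective lead iterator produces few
     * candidates, per-candidate random-access doc-values checks are cheaper
     * than materializing the full range match set.
     */
    static boolean useDocValues(long leadCost, long rangeMatchCount) {
        // Illustrative heuristic: doc-values verification costs one lookup
        // per lead candidate, index iteration visits every range match.
        return leadCost < indexQueryCost(rangeMatchCount) / 8;
    }

    public static void main(String[] args) {
        // Selective lead (100 candidates) vs. broad range (1M matches):
        System.out.println(useDocValues(100, 1_000_000));     // prints "true"
        // A broad lead makes iterating the index query the better plan:
        System.out.println(useDocValues(500_000, 1_000_000)); // prints "false"
    }
}
```

Once the range query gets cached as a bit set by LRUQueryCache, this per-segment choice disappears, which is the inconsistency the issue describes.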
[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously
[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943317#comment-16943317 ] Atri Sharma commented on LUCENE-8213: - Thanks [~hossman] Interestingly, all but one test failures are coming from LatLong queries -- is there anything special about them?
[GitHub] [lucene-solr] dsmiley commented on issue #853: SOLR-13737 Moving SolrCloud on the README with some cues.
dsmiley commented on issue #853: SOLR-13737 Moving SolrCloud on the README with some cues. URL: https://github.com/apache/lucene-solr/pull/853#issuecomment-537766519 Hey could you update the README and try again? Also replace "distributed" with "clustered" per recent conversation on the dev list (CC @chatman ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (LUCENE-8989) IndexSearcher Should Handle Rejection of Concurrent Task
[ https://issues.apache.org/jira/browse/LUCENE-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943296#comment-16943296 ] Atri Sharma commented on LUCENE-8989: - bq. Hmm, that is a good point – maybe we should not handle this exception and let it throw? I would still argue that the default case should be to handle the exception and let the query execute gracefully. The case where the executor has been erratically shut down is a good point, but for edge cases like this, we should probably override IndexSearcher's default behaviour. If it helps, we could potentially introduce a toggle which is enabled by default and indicates that IndexSearcher will perform the graceful handling; if the user wishes the error to propagate, they would call a new API method to disable the toggle. > IndexSearcher Should Handle Rejection of Concurrent Task > > > Key: LUCENE-8989 > URL: https://issues.apache.org/jira/browse/LUCENE-8989 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As discussed in [https://github.com/apache/lucene-solr/pull/815,] > IndexSearcher should handle the case when the executor rejects the execution > of a task (unavailability of threads?).
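The toggle being proposed could look roughly like the following self-contained sketch: on RejectedExecutionException the task degrades to the caller thread by default, and an opt-out lets the exception propagate. Class and method names here are hypothetical, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical sketch of the proposed behaviour (not Lucene's actual API):
// if the executor rejects a per-slice task, run it on the caller thread so
// the search still completes, unless graceful handling was disabled.
public class GracefulSearchSketch {
    private boolean handleRejectionsGracefully = true; // proposed default

    /** Hypothetical opt-out: let RejectedExecutionException propagate. */
    public void setHandleRejectionsGracefully(boolean v) {
        this.handleRejectionsGracefully = v;
    }

    /** Submit each slice task, degrading to sequential execution on rejection. */
    public List<Integer> runSlices(Executor executor, List<FutureTask<Integer>> tasks)
            throws Exception {
        for (FutureTask<Integer> task : tasks) {
            try {
                executor.execute(task);
            } catch (RejectedExecutionException e) {
                if (!handleRejectionsGracefully) {
                    throw e; // propagate, per the opt-out discussed above
                }
                task.run(); // graceful degradation: execute on caller thread
            }
        }
        List<Integer> results = new ArrayList<>();
        for (FutureTask<Integer> task : tasks) {
            results.add(task.get()); // all tasks have run by this point
        }
        return results;
    }
}
```

With the toggle enabled (the proposed default), even an executor that rejects every submission still yields complete results, just without parallelism.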
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943283#comment-16943283 ] Ben Manes commented on SOLR-8241: - Thanks [~ab], [~dsmiley], [~elyograg]! > Evaluate W-TinyLfu cache > > > Key: SOLR-8241 > URL: https://issues.apache.org/jira/browse/SOLR-8241 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Ben Manes >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Attachments: EvictionBenchmark.png, GetPutBenchmark.png, > SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, > SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, > solr_caffeine.patch.gz, solr_jmh_results.json > > > SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). > The discussions seem to indicate that the higher hit rate (vs LRU) is offset > by the slower performance of the implementation. An original goal appeared to > be to introduce ARC, a patented algorithm that uses ghost entries to retain > history information. > My analysis of Window TinyLfu indicates that it may be a better option. It > uses a frequency sketch to compactly estimate an entry's popularity. It uses > LRU to capture recency and operate in O(1) time. When using available > academic traces the policy provides a near optimal hit rate regardless of the > workload. > I'm getting ready to release the policy in Caffeine, which Solr already has a > dependency on. But, the code is fairly straightforward and a port into Solr's > caches instead is a pragmatic alternative. More interesting is what the > impact would be in Solr's workloads and feedback on the policy's design. > https://github.com/ben-manes/caffeine/wiki/Efficiency
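The "frequency sketch" at the heart of W-TinyLfu is a count-min-style structure. The toy version below only illustrates the core idea; Caffeine's real sketch uses 4-bit saturating counters with periodic halving ("aging"), which this sketch omits.

```java
import java.util.Random;

// Toy count-min sketch in the spirit of TinyLFU's popularity estimation.
// Simplified for illustration: Caffeine's production sketch uses compact
// 4-bit counters and periodic aging, neither of which is shown here.
public class FrequencySketch {
    private final int[][] counters; // depth rows of width counters
    private final int[] seeds;      // one hash seed per row
    private final int width;

    public FrequencySketch(int depth, int width) {
        this.counters = new int[depth][width];
        this.seeds = new int[depth];
        this.width = width;
        Random r = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < depth; i++) seeds[i] = r.nextInt();
    }

    private int index(int row, Object key) {
        int h = key.hashCode() ^ seeds[row];
        h ^= h >>> 16; // spread the high bits into the low bits
        return Math.floorMod(h, width);
    }

    /** Record one access to {@code key}. */
    public void increment(Object key) {
        for (int row = 0; row < counters.length; row++) {
            counters[row][index(row, key)]++;
        }
    }

    /** Estimated access count: never underestimates, may overestimate. */
    public int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < counters.length; row++) {
            min = Math.min(min, counters[row][index(row, key)]);
        }
        return min;
    }
}
```

Admission then becomes a frequency comparison: a candidate evicted from the LRU window is admitted to the main region only if its estimated frequency beats the would-be victim's.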
[jira] [Updated] (LUCENE-8991) disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399
[ https://issues.apache.org/jira/browse/LUCENE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated LUCENE-8991: --- Fix Version/s: 8.3 master (9.0) Assignee: Chris M. Hostetter Resolution: Fixed Status: Resolved (was: Patch Available) > disable java.util.HashMap assertions to avoid spurious vailures due to > JDK-8205399 > -- > > Key: LUCENE-8991 > URL: https://issues.apache.org/jira/browse/LUCENE-8991 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Labels: Java10, Java11 > Fix For: master (9.0), 8.3 > > Attachments: LUCENE-8991.patch, LUCENE-8991.patch > > > An incredibly common class of jenkins failure (at least in Solr tests) stems > from triggering assertion failures in java.util.HashMap -- evidently > triggering bug JDK-8205399, first introduced in java-10, and fixed in > java-12, but has never been backported to any java-10 or java-11 bug fix > release... >https://bugs.openjdk.java.net/browse/JDK-8205399 > SOLR-13653 tracks how this bug can affect Solr users, but I think it would > make sense to disable java.util.HashMap in our build system to reduce the > confusing failures when users/jenkins runs tests, since there is nothing we > can do to work around this when testing with java-11 (or java-10 on branch_8x)
[jira] [Commented] (LUCENE-8991) disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399
[ https://issues.apache.org/jira/browse/LUCENE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943255#comment-16943255 ] ASF subversion and git services commented on LUCENE-8991: - Commit 60b9ec0866e6223afb269fa377203f731cca2973 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=60b9ec0 ] LUCENE-8991: disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399 (cherry picked from commit 10da07a396777e3e7cfb091c5dec826b6df11284)
[GitHub] [lucene-solr] KoenDG opened a new pull request #919: LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll().
KoenDG opened a new pull request #919: LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). URL: https://github.com/apache/lucene-solr/pull/919 https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-8994 If you have actual serious issues to attend to, there is no need to bother with this PR; it is code cleanup, not features or fixes. A small and unimportant PR: some code cleanup, and perhaps in some cases a small performance gain. I would understand if issue were taken with readability in some cases. I could change those cases to look something like this, if it made it more readable:

```
new ArrayList<>(
    nameOfCollection.getMethod(someExtraVars)
);
```

Frankly, it already was equally unreadable in such cases as:

```
resources.addAll(Accountables.namedAccountables("field", fields));
```

Depends on what the reviewer wants, I suppose.
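The cleanup the PR applies is the following transformation; variable names below are illustrative, not taken from the actual diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// The two equivalent styles discussed in the PR.
public class ListConstructorExample {
    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");

        // Before: empty constructor followed by addAll().
        List<String> before = new ArrayList<>();
        before.addAll(source);

        // After: pass the collection to the constructor, which can also
        // size the backing array in a single step.
        List<String> after = new ArrayList<>(source);

        System.out.println(before.equals(after)); // prints "true"
    }
}
```

The small performance gain mentioned comes from avoiding a possible grow-and-copy of the backing array that an empty list plus addAll() can incur.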
[GitHub] [lucene-solr] KoenDG commented on issue #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2
KoenDG commented on issue #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2 URL: https://github.com/apache/lucene-solr/pull/881#issuecomment-537724000 Updated as requested. Also, the initial changes were not manual; they were automated with the IntelliJ IDE.
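For reference, the entrySet cleanup the PR title describes is this pattern; the map and variable names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// The cleanup pattern from the PR: iterate entrySet() instead of keySet(),
// avoiding one extra map lookup per key.
public class EntrySetExample {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);

        // Before: keySet() iteration pays a get() for every key.
        int sumBefore = 0;
        for (String key : counts.keySet()) {
            sumBefore += counts.get(key);
        }

        // After: entrySet() yields key and value together.
        int sumAfter = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sumAfter += e.getValue();
        }

        System.out.println(sumBefore == sumAfter); // prints "true"
    }
}
```

This is exactly the rewrite an IDE inspection can apply mechanically, which matches the comment about the changes being automated.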
[jira] [Commented] (LUCENE-8991) disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399
[ https://issues.apache.org/jira/browse/LUCENE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943222#comment-16943222 ] ASF subversion and git services commented on LUCENE-8991: - Commit 10da07a396777e3e7cfb091c5dec826b6df11284 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=10da07a ] LUCENE-8991: disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399
[jira] [Reopened] (LUCENE-8213) Cache costly subqueries asynchronously
[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter reopened LUCENE-8213:
[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously
[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943217#comment-16943217 ] Chris M. Hostetter commented on LUCENE-8213: FWIW: I just attached thetaphi_Lucene-Solr-master-Linux_24839.log.txt, a jenkins log that shows two other failures that seem to be related to this issue... {noformat} Checking out Revision 3c399bb696073bdb30f278309410a50effabd0e7 (refs/remotes/origin/master) ... [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestLatLonDocValuesQueries -Dtests.method=testAllLatEqual -Dtests.seed=8D56B48917EDB35F -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=zh-Hans -Dtests.timezone=America/Bahia -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] FAILURE 2.67s J0 | TestLatLonDocValuesQueries.testAllLatEqual <<< [junit4]> Throwable #1: java.lang.AssertionError: wrong hit (first of possibly more): [junit4]> FAIL: id=235 should match but did not [junit4]> query=point:polygons([[0.0, 1.401298464324817E-45] [35.0, 1.401298464324817E-45] [35.0, 180.0] [0.0, 180.0] [0.0, 1.401298464324817E-45] ]) docID=227 [junit4]> lat=0.0 lon=41.82071185670793 [junit4]> deleted?=false polygon=[0.0, 1.401298464324817E-45] [35.0, 1.401298464324817E-45] [35.0, 180.0] [0.0, 180.0] [0.0, 1.401298464324817E-45] ... 
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheSortRandom -Dtests.method=testRandomStringSort -Dtests.seed=F7E42E6905E37945 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=khq-ML -Dtests.timezone=America/Chihuahua -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.15s J1 | TestFieldCacheSortRandom.testRandomStringSort <<< [junit4]> Throwable #1: java.lang.AssertionError: expected:<[65 76 65 64 6d 68 76 68 75 77 61 71 64 63 65 63 7a 79 77 77 63 69 71 62 76 70 7a 62 6d 66 75 67 61]> but was:<[66 6c 75 6e 78 63 67 66 63 77 7a 6b 69 6d 7a 77 62 6c 71 61 79 61 67 71 6f 69 66 71 64 6a 66]> {noformat} both of those seeds seem to fail reliably against GIT:3c399bb6960 (the version checked out by the jenkins run) but both seem to start passing reliably as of GIT:302cd09b4ce (when this issue was reverted) I should point out however in case it's helpful: when the failures reproduce, the specifics of the failures are always different -- suggesting that parallel thread execution is affecting the results, since the randomization of the index data should be deterministic based on the seed.
[jira] [Updated] (LUCENE-8213) Cache costly subqueries asynchronously
[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated LUCENE-8213: --- Attachment: thetaphi_Lucene-Solr-master-Linux_24839.log.txt
[jira] [Updated] (SOLR-13797) SolrResourceLoader produces inconsistent results when given bad arguments
[ https://issues.apache.org/jira/browse/SOLR-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated SOLR-13797: - Fix Version/s: master (9.0) Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review Anshum. Fixed the annotation and pushed. > SolrResourceLoader produces inconsistent results when given bad arguments > - > > Key: SOLR-13797 > URL: https://issues.apache.org/jira/browse/SOLR-13797 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.7.2, 8.2 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13797.v1.patch, SOLR-13797.v2.patch > > > SolrResourceLoader will attempt to do some magic to infer what the user > wanted when loading TokenFilter and Tokenizer classes. However, this can end > up putting the wrong class in the cache such that the request succeeds the > first time but fails subsequent times. It should either succeed or fail > consistently on every call. > This can be triggered in a variety of ways, but the simplest is maybe by > specifying the wrong element type in an indexing chain. Consider the field > type definition: > {code:xml} > > > > maxGramSize="2"/> > > > {code} > If loaded by itself (e.g. docker container for standalone validation) then > the schema will pass and collection will succeed, with Solr actually figuring > out that it needs an {{NGramTokenFilterFactory}}. However, if this is loaded > on a cluster with other collections where the {{NGramTokenizerFactory}} has > been loaded correctly then we get {{ClassCastException}}. Or if this > collection is loaded first then others using the Tokenizer will fail instead. > I'd argue that succeeding on both calls is the better approach because it > does what the user likely wants instead of what the user explicitly asks for, > and creates a nicer user experience that is marginally less pedantic. 
[jira] [Commented] (SOLR-13797) SolrResourceLoader produces inconsistent results when given bad arguments
[ https://issues.apache.org/jira/browse/SOLR-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943203#comment-16943203 ] ASF subversion and git services commented on SOLR-13797: Commit 2d3baf6e8fc1d86f13e2e0f62eba97bb5ec9afca in lucene-solr's branch refs/heads/master from Mike Drob [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d3baf6 ] SOLR-13797 SolrResourceLoader no longer caches bad results when asked for wrong type
[jira] [Commented] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage
[ https://issues.apache.org/jira/browse/SOLR-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943196#comment-16943196 ] Lucene/Solr QA commented on SOLR-13812: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 58s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 50s{color} | {color:green} core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 34s{color} | {color:green} test-framework in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}101m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13812 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982002/SOLR-13812.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / a57ec14 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/564/testReport/ | | modules | C: solr/core solr/test-framework U: solr | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/564/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test > coverage > > > Key: SOLR-13812 > URL: https://issues.apache.org/jira/browse/SOLR-13812 > Project: Solr > Issue Type: Test >Reporter: Diego Ceccarelli >Assignee: Christine Poerschke >Priority: Minor > Attachments: SOLR-13812.patch > > > In > https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d > on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 > [~diegoceccarelli] proposes to add javadocs and uneven length parameter > rejection for the {{SolrTestCaseJ4.params(String...)}} method. > This ticket proposes to do that plus to also add basic test coverage for the > method, separately from the unrelated SOLR-11831 changes. 
[jira] [Created] (LUCENE-8998) OverviewImplTest.testIsOptimized reproducible failure
Chris M. Hostetter created LUCENE-8998: -- Summary: OverviewImplTest.testIsOptimized reproducible failure Key: LUCENE-8998 URL: https://issues.apache.org/jira/browse/LUCENE-8998 Project: Lucene - Core Issue Type: Bug Components: luke Reporter: Chris M. Hostetter Assignee: Tomoko Uchida The following seed reproduces reliably for me on master... (NOTE: the {{ERROR StatusLogger}} messages include the one about the AccessControlException occur even with other seeds when the test passes) {noformat} [mkdir] Created dir: /home/hossman/lucene/alt_dev/lucene/build/luke/test [junit4:pickseed] Seed property 'tests.seed' already defined: 9123DD19C50D658 [mkdir] Created dir: /home/hossman/lucene/alt_dev/lucene/build/luke/test/temp [junit4] says cześć! Master seed: 9123DD19C50D658 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(8576@localhost). [junit4] Suite: org.apache.lucene.luke.models.overview.OverviewImplTest [junit4] 2> ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. 
See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2 [junit4] 2> ERROR StatusLogger Could not reconfigure JMX [junit4] 2> java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "createMBeanServer") [junit4] 2>at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) [junit4] 2>at java.base/java.security.AccessController.checkPermission(AccessController.java:897) [junit4] 2>at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:322) [junit4] 2>at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:479) [junit4] 2>at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140) [junit4] 2>at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:559) [junit4] 2>at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:620) [junit4] 2>at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:637) [junit4] 2>at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:231) [junit4] 2>at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153) [junit4] 2>at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45) [junit4] 2>at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194) [junit4] 2>at org.apache.logging.log4j.LogManager.getLogger(LogManager.java:581) [junit4] 2>at org.apache.lucene.luke.util.LoggerFactory.getLogger(LoggerFactory.java:70) [junit4] 2>at org.apache.lucene.luke.models.util.IndexUtils.(IndexUtils.java:62) [junit4] 2>at org.apache.lucene.luke.models.LukeModel.(LukeModel.java:60) [junit4] 2>at org.apache.lucene.luke.models.overview.OverviewImpl.(OverviewImpl.java:50) [junit4] 2>at 
org.apache.lucene.luke.models.overview.OverviewImplTest.testIsOptimized(OverviewImplTest.java:77) [junit4] 2>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4] 2>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [junit4] 2>at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit4] 2>at java.base/java.lang.reflect.Method.invoke(Method.java:566) [junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) [junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) [junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) [junit4] 2>at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) [junit4] 2>at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) [junit4] 2>at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) [junit4] 2>at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) [junit4] 2>at or
[GitHub] [lucene-solr] jpountz commented on issue #905: LUCENE-8990: Add estimateDocCount(visitor) method to PointValues
jpountz commented on issue #905: LUCENE-8990: Add estimateDocCount(visitor) method to PointValues URL: https://github.com/apache/lucene-solr/pull/905#issuecomment-537680322 @colings86 pointed me to https://math.stackexchange.com/questions/1175295/urn-problem-probability-of-drawing-balls-of-k-unique-colors which seems to have the answer to the problem we are trying to solve, if we can make the assumption that all docs have about the same number of values, which is likely not always the case but still probably a fair assumption for this kind of heuristic? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
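The urn model linked above can be turned into a closed-form estimate. Below is a minimal sketch, assuming (as the comment itself hedges) that every document carries roughly the same number of values; the class and method signature are ours for illustration, not Lucene's actual implementation:

```java
// Hypothetical sketch, not Lucene's actual estimateDocCount implementation.
public final class DocCountEstimator {
  /**
   * Urn-model estimate: with matchingPoints of totalPoints matching, and each
   * of docCount documents holding about totalPoints/docCount values, a given
   * document contributes no matching point with probability roughly
   * (1 - matchingPoints/totalPoints)^(totalPoints/docCount).
   */
  public static long estimateDocCount(long matchingPoints, long totalPoints, long docCount) {
    if (matchingPoints == 0 || totalPoints == 0 || docCount == 0) {
      return 0;
    }
    double valuesPerDoc = (double) totalPoints / docCount; // uniformity assumption
    double pMiss = Math.pow(1.0 - (double) matchingPoints / totalPoints, valuesPerDoc);
    return Math.round(docCount * (1.0 - pMiss));
  }
}
```

Note the limiting behavior: for single-valued documents (totalPoints == docCount) the estimate degenerates to matchingPoints, i.e. a few matching points are assumed to belong to distinct documents.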
[GitHub] [lucene-solr] jpountz edited a comment on issue #905: LUCENE-8990: Add estimateDocCount(visitor) method to PointValues
jpountz edited a comment on issue #905: LUCENE-8990: Add estimateDocCount(visitor) method to PointValues URL: https://github.com/apache/lucene-solr/pull/905#issuecomment-537456383 I was thinking more along the lines that there should be a better formula, one that takes into account the fact that when you match few points, it's more likely that the points all belong to different documents than to the same documents.
[jira] [Comment Edited] (LUCENE-8993) Change Maven POM repository URLs to https
[ https://issues.apache.org/jira/browse/LUCENE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943150#comment-16943150 ] Uwe Schindler edited comment on LUCENE-8993 at 10/2/19 8:41 PM: After deleting the local Maven repos on Jenkins, I had to fix this issue again: https://issues.apache.org/jira/browse/LUCENE-2562?focusedCommentId=16816237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16816237 It looks like the Ant/Ivy build no longer works on a completely clean infrastructure (no maven repo, no ivy cache). was (Author: thetaphi): After deleting the Maven repo, I had to fix this issue again: https://issues.apache.org/jira/browse/LUCENE-2562?focusedCommentId=16816237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16816237 > Change Maven POM repository URLs to https > - > > Key: LUCENE-8993 > URL: https://issues.apache.org/jira/browse/LUCENE-8993 > Project: Lucene - Core > Issue Type: Task > Components: general/build >Affects Versions: 7.7.2, 8.2, 8.1.1 >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.3 > > Attachments: LUCENE-8993.patch > > > After fixing LUCENE-8807 I figured out today, that Lucene's build system uses > HTTPS URLs everywhere. But the POMs deployed to Maven central still use http > (I assumed that those are inherited from the ANT build). > This will fix it for later versions by changing the POM templates. Hopefully > this will not happen in Gradle! > [~markrmil...@gmail.com]: Can you make sure that the new Gradle build uses > HTTPS for all hard configured repositories (like Cloudera)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8993) Change Maven POM repository URLs to https
[ https://issues.apache.org/jira/browse/LUCENE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943150#comment-16943150 ] Uwe Schindler commented on LUCENE-8993: --- After deleting the Maven repo, I had to fix this issue again: https://issues.apache.org/jira/browse/LUCENE-2562?focusedCommentId=16816237&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16816237 > Change Maven POM repository URLs to https > - > > Key: LUCENE-8993 > URL: https://issues.apache.org/jira/browse/LUCENE-8993 > Project: Lucene - Core > Issue Type: Task > Components: general/build >Affects Versions: 7.7.2, 8.2, 8.1.1 >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.3 > > Attachments: LUCENE-8993.patch > > > After fixing LUCENE-8807 I figured out today, that Lucene's build system uses > HTTPS URLs everywhere. But the POMs deployed to Maven central still use http > (I assumed that those are inherited from the ANT build). > This will fix it for later versions by changing the POM templates. Hopefully > this will not happen in Gradle! > [~markrmil...@gmail.com]: Can you make sure that the new Gradle build uses > HTTPS for all hard configured repositories (like Cloudera)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion and ineffective caching
[ https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943149#comment-16943149 ] David Wayne Smiley commented on SOLR-13790: --- Interesting; your proposal makes sense. Thanks [~ab]. > LRUStatsCache size explosion and ineffective caching > > > Key: SOLR-13790 > URL: https://issues.apache.org/jira/browse/SOLR-13790 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.7.2, 8.2, 8.3 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Critical > Fix For: 7.7.3, 8.3 > > Attachments: SOLR-13790.patch, SOLR-13790.patch > > > On a sizeable cluster with multi-shard multi-replica collections, when > {{LRUStatsCache}} was in use we encountered excessive memory usage, which > consequently led to severe performance problems. > On a closer examination of the heapdumps it became apparent that when > {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of > {{FastLRUCache}} using the passed {{shard}} argument - however, the value of > this argument is not a simple shard name but instead it's a randomly ordered > list of ALL replica URLs for this shard. > As a result, due to the combinatoric number of possible keys, over time the > map in {{LRUStatsCache.perShardTemStats}} grew to contain ~2 mln entries... > The fix seems to be simply to extract the shard name and cache using this > name instead of the full string value of the {{shard}} parameter. Existing > unit tests also need much improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
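The key-explosion mechanism described above is easy to reproduce in isolation. Here is a hedged sketch (names are ours, not Solr's actual code): when the cache key is a randomly ordered '|'-separated replica-URL list, every permutation creates a fresh entry, whereas a canonicalized key — here simply sorting the URLs; the actual fix extracts the shard name — collapses them into one:

```java
import java.util.Arrays;

// Illustrative only: shows why keying by a randomly ordered replica-URL list
// explodes combinatorially, and how a canonical key collapses the entries.
public final class ShardKeyDemo {
  /** Canonicalize a '|'-separated URL list by sorting its elements. */
  public static String canonicalKey(String shardParam) {
    String[] urls = shardParam.split("\\|");
    Arrays.sort(urls); // order no longer depends on how replicas were listed
    return String.join("|", urls);
  }
}
```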
[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.
thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted. URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330756503 ## File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java ## @@ -138,7 +138,7 @@ protected List fieldsWithDefaultValue = new ArrayList<>(); protected Collection requiredFields = new HashSet<>(); - protected volatile DynamicField[] dynamicFields; Review comment: > @sarowe why was this volatile? It's fishy to see this as the only volatile field. I was looking for any double-checked-locking pattern or lazy init and found nothing. As you mentioned, it is the only one, so I removed it; it makes no sense from my point of view.
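For context on what removing `volatile` gives up: under the Java memory model, a plain field written by one thread is not guaranteed visible to another thread without some happens-before edge. A minimal, self-contained illustration (a general JMM sketch, not a claim about how `dynamicFields` is actually published):

```java
// General JMM illustration: the volatile write to `ready` happens-before the
// reader's volatile read, which makes the preceding plain write to `payload`
// visible. Drop `volatile` and the reader may spin forever or read a stale 0.
public final class VolatileDemo {
  static volatile boolean ready;
  static int payload;

  public static int readAfterPublish() {
    final int[] seen = new int[1];
    Thread reader = new Thread(() -> {
      while (!ready) {
        Thread.onSpinWait();
      }
      seen[0] = payload; // guaranteed to observe 42 via the happens-before edge
    });
    reader.start();
    payload = 42;
    ready = true; // volatile store: publishes `payload` as well
    try {
      reader.join(); // join() gives the main thread visibility of seen[0]
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen[0];
  }
}
```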
[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.
thomaswoeckinger commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted. URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330755828 ## File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java ## @@ -1556,48 +1547,46 @@ SimpleOrderedMap getProperties(SchemaField sf) { } } } -if (null != dynamicCopyFields) { Review comment: > Is this big diff just a re-indent after removing the wrapping null check? Yes just removed the null check.
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943141#comment-16943141 ] David Wayne Smiley commented on SOLR-8241: -- Woohoo! Thanks [~ab] and for your extreme persistence [~ben.manes]. Better late than never. I'd hope to see this as the default in solr configs in 9.0. > Evaluate W-TinyLfu cache > > > Key: SOLR-8241 > URL: https://issues.apache.org/jira/browse/SOLR-8241 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Ben Manes >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Attachments: EvictionBenchmark.png, GetPutBenchmark.png, > SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, > SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, > solr_caffeine.patch.gz, solr_jmh_results.json > > > SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). > The discussions seem to indicate that the higher hit rate (vs LRU) is offset > by the slower performance of the implementation. An original goal appeared to > be to introduce ARC, a patented algorithm that uses ghost entries to retain > history information. > My analysis of Window TinyLfu indicates that it may be a better option. It > uses a frequency sketch to compactly estimate an entry's popularity. It uses > LRU to capture recency and operate in O(1) time. When using available > academic traces the policy provides a near optimal hit rate regardless of the > workload. > I'm getting ready to release the policy in Caffeine, which Solr already has a > dependency on. But, the code is fairly straightforward and a port into Solr's > caches instead is a pragmatic alternative. More interesting is what the > impact would be in Solr's workloads and feedback on the policy's design. 
> https://github.com/ben-manes/caffeine/wiki/Efficiency
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted. URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330750867 ## File path: solr/core/src/java/org/apache/solr/schema/ManagedIndexSchema.java ## @@ -81,7 +81,7 @@ /** Solr-managed schema - non-user-editable, but can be mutable via internal and external REST API requests. */ public final class ManagedIndexSchema extends IndexSchema { - private boolean isMutable = false; + private final boolean isMutable; Review comment: nice
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted. URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330751954 ## File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java ## @@ -1556,48 +1547,46 @@ SimpleOrderedMap getProperties(SchemaField sf) { } } } -if (null != dynamicCopyFields) { Review comment: Is this big diff just a re-indent after removing the wrapping null check?
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted. URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330751354 ## File path: solr/core/src/test/org/apache/solr/rest/schema/TestBulkSchemaAPI.java ## @@ -184,8 +188,6 @@ public void testAnalyzerClass() throws Exception { response = restTestHarness.post("/schema", json(addFieldTypeAnalyzerWithClass + suffix)); map = (Map) fromJSONString(response); assertNull(response, map.get("error")); - - restTestHarness.checkAdminResponseStatus("/admin/cores?wt=xml&action=RELOAD&core=" + coreName, "0"); Review comment: I'm glad you remembered
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted.
dsmiley commented on a change in pull request #902: SOLR-13795: Reload solr core after schema is persisted. URL: https://github.com/apache/lucene-solr/pull/902#discussion_r330752451 ## File path: solr/core/src/java/org/apache/solr/schema/IndexSchema.java ## @@ -138,7 +138,7 @@ protected List fieldsWithDefaultValue = new ArrayList<>(); protected Collection requiredFields = new HashSet<>(); - protected volatile DynamicField[] dynamicFields; Review comment: @sarowe why was this volatile? It's fishy to see this as the only volatile field.
[jira] [Commented] (LUCENE-8989) IndexSearcher Should Handle Rejection of Concurrent Task
[ https://issues.apache.org/jira/browse/LUCENE-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943135#comment-16943135 ] Michael McCandless commented on LUCENE-8989: {quote}Suppose the executor has been shut down and is rejecting all requests for new work. In that case an Exception here would help the caller to understand they are doing something unusual instead of burying the error case and attempting to handle it. {quote} Hmm, that is a good point – maybe we should not handle this exception and let it throw? > IndexSearcher Should Handle Rejection of Concurrent Task > > > Key: LUCENE-8989 > URL: https://issues.apache.org/jira/browse/LUCENE-8989 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As discussed in [https://github.com/apache/lucene-solr/pull/815,] > IndexSearcher should handle the case when the executor rejects the execution > of a task (unavailability of threads?). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
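The two policies under discussion can be sketched side by side. This is a hedged illustration (not IndexSearcher's actual code; names are ours): either swallow the rejection and run the task on the caller thread, or let `RejectedExecutionException` propagate so callers learn the executor is shut down:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

// Illustrative policy sketch: caller-runs fallback vs. propagating rejection.
public final class RejectionPolicyDemo {
  /** Fallback policy: degrade to running on the caller thread when rejected. */
  public static String submitWithFallback(ExecutorService executor, Runnable task) {
    try {
      executor.execute(task);
      return "pooled";
    } catch (RejectedExecutionException e) {
      task.run(); // "buries" the error case, as the comment above warns
      return "caller-ran";
    }
  }

  /** Propagating policy: the caller sees the shutdown instead of a silent fallback. */
  public static void submitOrThrow(ExecutorService executor, Runnable task) {
    executor.execute(task); // RejectedExecutionException escapes to the caller
  }
}
```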
[GitHub] [lucene-solr] thomaswoeckinger commented on issue #665: Fixes SOLR-13539
thomaswoeckinger commented on issue #665: Fixes SOLR-13539 URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537648923 The precommit check is still toggling.
[jira] [Commented] (SOLR-13722) Package Management APIs
[ https://issues.apache.org/jira/browse/SOLR-13722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943087#comment-16943087 ] David Wayne Smiley commented on SOLR-13722: --- I'm confused on the status here. It's in master but not 8x; was that deliberate? This is not an implied "ask" to merge to 8x as it ought to be reviewed any way. > Package Management APIs > --- > > Key: SOLR-13722 > URL: https://issues.apache.org/jira/browse/SOLR-13722 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Labels: package > > This ticket totally eliminates the need for an external service to host the > jars. So a url will no longer be required. An external URL leads to > unreliability because the service may go offline or it can be DDoSed if/when > too many requests are sent to them > > > Add a jar to cluster as follows > {code:java} > curl -X POST -H 'Content-Type: application/octet-stream' --data-binary > @myjar.jar http://localhost:8983/api/cluster/filestore/package?name=myjar.jar > {code} > This does the following operations > * Upload this jar to all the live nodes in the system > * The name of the file is the {{sha256-}} of the file/payload > * The store is agnostic of the content of the file/payload > h2. How it works? > A blob that is POSTed to the {{/api/cluster/blob}} end point is persisted > locally & all nodes are instructed to download it from this node or from any > other available node. If a node comes up later, it can query other nodes in > the system and download the blobs as required > h2. 
{{add}} package command > {code:java} > curl -X POST -H 'Content-type:application/json' --data-binary '{ > "add": { >"name": "my-package" , > "file":{"id" : "", "sig" : ""} > }}' http://localhost:8983/api/cluster/package > {code} > ] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
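The file id referenced by the {{add}} command above is derived from the payload itself. Here is a minimal sketch of computing such a content-addressed id — the SHA-256 of the uploaded bytes, hex-encoded; whether Solr's file store uses exactly this encoding is an assumption of the sketch:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: derive a content-addressed id (SHA-256, hex) from an uploaded payload.
public final class FileId {
  public static String sha256Hex(byte[] payload) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(payload);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is mandated by the JCA spec", e);
    }
  }
}
```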
[jira] [Commented] (SOLR-13813) Shared storage online split support
[ https://issues.apache.org/jira/browse/SOLR-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943070#comment-16943070 ] Yonik Seeley commented on SOLR-13813: - The other notable thing is that I *thought* the test was solid when sharedStorage=false, but then I looped the test overnight and also got a failure. I'm going to extract a version of this test that doesn't depend on SOLR-13101 and see if it's still reproducible. > Shared storage online split support > --- > > Key: SOLR-13813 > URL: https://issues.apache.org/jira/browse/SOLR-13813 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The strategy for online shard splitting is the same as that for normal (non > SHARED shards.) > During a split, the leader will forward updates to sub-shard leaders, those > updates will be buffered by the transaction log while the split is in > progress, and then the buffered updates are replayed. > One change that was added was to push the local index to blob store after > buffered updates are applied (but before it is marked as ACTIVE): > See > https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663 > This issue is about adding tests and ensuring that online shard splitting > (while updates are flowing) works reliably. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] magibney edited a comment on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
magibney edited a comment on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-537538822 Here's a thought: what if we provided a boolean configuration option like `assumeExternalUnicodeNormalization`. Many of these transforms work on NFD input, and produce NFC output, but they generally are configured defensively (not assuming input to be NFD, and not assuming that output will be externally converted to NFC). This is understandable, but results in the odd situation (for example) that an analysis component like "ICUTransformFilter(Cyrillic-Latin)" would have NFC output, but _only_ for characters whose input representation matched the top-level Cyrillic-Latin filter (which is pretty restrictive). Input characters that didn't match the top-level filter would be untouched by any component of the underlying CompoundTransliterator. So if you want fully unicode-normalized output (and in the context of an analysis chain, most do), you have to separately apply post-transform NFC normalization anyway. At best, this ends up doing some redundant work; but for the performance case we're considering here, there are particular implications. NFC, as a trailing transformation step, is both _very_ common and _very_ active -- active in the sense that it will in many common contexts block output waiting for combining diacritics for literally almost every character. If we know we're externally applying unicode normalization over the entire output, skipping baked-in post-NFC for every transform component avoids redundant work, but more importantly avoids a common case that's virtually guaranteed to result in a substantial amount of partial transliteration, rollback, etc. I think this can be done relatively cleanly using Transliterator getElements(), toRules(false), and createFromRules(...). I'd be curious to know what you think, @msokolov, and perhaps @rmuir? 
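The "externally applied unicode normalization" step described above is cheap to do once over the whole output. Below is a dependency-free sketch using the JDK's `java.text.Normalizer` rather than ICU4J (purely to keep the example self-contained): a single NFC pass leaves already-composed spans alone and composes the NFD spans that a filtered transform left untouched:

```java
import java.text.Normalizer;

// Sketch of the post-transform normalization step: one NFC pass over the
// entire transform output, instead of per-component baked-in NFC.
public final class ExternalNfcDemo {
  public static String externalNfc(String transformOutput) {
    return Normalizer.normalize(transformOutput, Normalizer.Form.NFC);
  }
}
```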
[jira] [Commented] (SOLR-13813) Shared storage online split support
[ https://issues.apache.org/jira/browse/SOLR-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943057#comment-16943057 ] Yonik Seeley commented on SOLR-13813: - This PR ( https://github.com/apache/lucene-solr/pull/918 ) adds a simple test. It usually fails for shared storage :-( . Example: {code} java.lang.AssertionError: Expected :50 Actual :49 {code} And I normally see background exceptions like the following during the run: {code} Error occured pulling shard=shard1_1 collection=livesplit1 from shared store java.lang.Exception: Local Directory content /private/var/folders/_f/2q_bxy9d0kz45_rk451rds3n9r_x9g/T/solr.store.blob.SharedStorageSplitTest_E4343DDDB931B9AE-001/tempDir-001/node1/./livesplit1_shard1_1_replica_s4/data/index/ has changed since Blob pull started. Aborting pull. at org.apache.solr.store.blob.util.BlobStoreUtils.syncLocalCoreWithSharedStore(BlobStoreUtils.java:128) at org.apache.solr.update.processor.DistributedZkUpdateProcessor.readFromSharedStoreIfNecessary(DistributedZkUpdateProcessor.java:1096) at org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:202) at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:200) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2609) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:816) {code} Best guess is that this is caused by lack of concurrency support that needs to still be addressed in the blob puller/pusher code. 
> Shared storage online split support > --- > > Key: SOLR-13813 > URL: https://issues.apache.org/jira/browse/SOLR-13813 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The strategy for online shard splitting is the same as that for normal (non > SHARED shards.) > During a split, the leader will forward updates to sub-shard leaders, those > updates will be buffered by the transaction log while the split is in > progress, and then the buffered updates are replayed. > One change that was added was to push the local index to blob store after > buffered updates are applied (but before it is marked as ACTIVE): > See > https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663 > This issue is about adding tests and ensuring that online shard splitting > (while updates are flowing) works reliably. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] yonik opened a new pull request #918: SOLR-13813: SHARED: add basic test for online shard splitting
yonik opened a new pull request #918: SOLR-13813: SHARED: add basic test for online shard splitting URL: https://github.com/apache/lucene-solr/pull/918
[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943041#comment-16943041 ] Andrzej Bialecki commented on SOLR-8241: Updated patch: * reduced contention in stats counting by using LongAdder-s instead of AtomicLong-s. * added option to set maxRamMB limit (it's an either or with the maxSize limit). I'm not sure I did the right thing when changing the value of this option - basically, if the existing cache was not weighted the {{setMaxRamMB}} rebuilds the cache, instead of just changing the policy limits. * added unit test for testing the limit changes on a live cache. If this patch looks more or less ok I'll add the RefGuide changes and commit it shortly (hopefully in time for 8.3 :) ) > Evaluate W-TinyLfu cache > > > Key: SOLR-8241 > URL: https://issues.apache.org/jira/browse/SOLR-8241 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Ben Manes >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Attachments: EvictionBenchmark.png, GetPutBenchmark.png, > SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, > SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, > solr_caffeine.patch.gz, solr_jmh_results.json > > > SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). > The discussions seem to indicate that the higher hit rate (vs LRU) is offset > by the slower performance of the implementation. An original goal appeared to > be to introduce ARC, a patented algorithm that uses ghost entries to retain > history information. > My analysis of Window TinyLfu indicates that it may be a better option. It > uses a frequency sketch to compactly estimate an entry's popularity. It uses > LRU to capture recency and operate in O(1) time. When using available > academic traces the policy provides a near optimal hit rate regardless of the > workload. 
> I'm getting ready to release the policy in Caffeine, which Solr already has a > dependency on. But, the code is fairly straightforward and a port into Solr's > caches instead is a pragmatic alternative. More interesting is what the > impact would be in Solr's workloads and feedback on the policy's design. > https://github.com/ben-manes/caffeine/wiki/Efficiency -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
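The LongAdder-for-AtomicLong swap mentioned in the updated patch is a standard contention fix: `LongAdder` stripes increments across per-thread cells and only sums them on read. A minimal sketch of stats counters in that style (names are ours, not the patch's):

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch: striped hit/miss counters; increments avoid a single contended CAS,
// and the (rare) read path pays the cost of summing the cells.
public final class CacheStatsDemo {
  private final LongAdder hits = new LongAdder();
  private final LongAdder misses = new LongAdder();

  public void recordHit() { hits.increment(); }
  public void recordMiss() { misses.increment(); }
  public long hitCount() { return hits.sum(); }
  public long missCount() { return misses.sum(); }
}
```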
[jira] [Updated] (SOLR-8241) Evaluate W-TinyLfu cache
[ https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-8241: --- Attachment: SOLR-8241.patch > Evaluate W-TinyLfu cache > > > Key: SOLR-8241 > URL: https://issues.apache.org/jira/browse/SOLR-8241 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Ben Manes >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0) > > Attachments: EvictionBenchmark.png, GetPutBenchmark.png, > SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, SOLR-8241.patch, > SOLR-8241.patch, SOLR-8241.patch, caffeine-benchmark.txt, proposal.patch, > solr_caffeine.patch.gz, solr_jmh_results.json > > > SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). > The discussions seem to indicate that the higher hit rate (vs LRU) is offset > by the slower performance of the implementation. An original goal appeared to > be to introduce ARC, a patented algorithm that uses ghost entries to retain > history information. > My analysis of Window TinyLfu indicates that it may be a better option. It > uses a frequency sketch to compactly estimate an entry's popularity. It uses > LRU to capture recency and operate in O(1) time. When using available > academic traces the policy provides a near optimal hit rate regardless of the > workload. > I'm getting ready to release the policy in Caffeine, which Solr already has a > dependency on. But, the code is fairly straightforward and a port into Solr's > caches instead is a pragmatic alternative. More interesting is what the > impact would be in Solr's workloads and feedback on the policy's design. > https://github.com/ben-manes/caffeine/wiki/Efficiency -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8993) Change Maven POM repository URLs to https
[ https://issues.apache.org/jira/browse/LUCENE-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943037#comment-16943037 ] Mark Miller commented on LUCENE-8993: - FYI, I’ve done this, though I’ll check stuff like this again before the merge to master. > Change Maven POM repository URLs to https > - > > Key: LUCENE-8993 > URL: https://issues.apache.org/jira/browse/LUCENE-8993 > Project: Lucene - Core > Issue Type: Task > Components: general/build >Affects Versions: 7.7.2, 8.2, 8.1.1 >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.3 > > Attachments: LUCENE-8993.patch > > > After fixing LUCENE-8807 I figured out today, that Lucene's build system uses > HTTPS URLs everywhere. But the POMs deployed to Maven central still use http > (I assumed that those are inherited from the ANT build). > This will fix it for later versions by changing the POM templates. Hopefully > this will not happen in Gradle! > [~markrmil...@gmail.com]: Can you make sure that the new Gradle build uses > HTTPS for all hard configured repositories (like Cloudera)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13797) SolrResourceLoader produces inconsistent results when given bad arguments
[ https://issues.apache.org/jira/browse/SOLR-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943033#comment-16943033 ] Anshum Gupta commented on SOLR-13797: - [~mdrob] - LGTM. Just a minor question/suggestion. Can you also annotate clearCache w/ @VisibleForTesting ? > SolrResourceLoader produces inconsistent results when given bad arguments > - > > Key: SOLR-13797 > URL: https://issues.apache.org/jira/browse/SOLR-13797 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.7.2, 8.2 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Attachments: SOLR-13797.v1.patch, SOLR-13797.v2.patch > > > SolrResourceLoader will attempt to do some magic to infer what the user > wanted when loading TokenFilter and Tokenizer classes. However, this can end > up putting the wrong class in the cache such that the request succeeds the > first time but fails subsequent times. It should either succeed or fail > consistently on every call. > This can be triggered in a variety of ways, but the simplest is maybe by > specifying the wrong element type in an indexing chain. Consider the field > type definition: > {code:xml} > > > > maxGramSize="2"/> > > > {code} > If loaded by itself (e.g. docker container for standalone validation) then > the schema will pass and collection will succeed, with Solr actually figuring > out that it needs an {{NGramTokenFilterFactory}}. However, if this is loaded > on a cluster with other collections where the {{NGramTokenizerFactory}} has > been loaded correctly then we get {{ClassCastException}}. Or if this > collection is loaded first then others using the Tokenizer will fail instead. > I'd argue that succeeding on both calls is the better approach because it > does what the user likely wants instead of what the user explicitly asks for, > and creates a nicer user experience that is marginally less pedantic. 
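For context on the failure mode described in SOLR-13797 above, here is a toy model — hypothetical names, not Solr's actual {{SolrResourceLoader}} code — of how a shared short-name-to-class cache can make the first lookup succeed and a later lookup fail with {{ClassCastException}}, depending purely on load order:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins for the two factory hierarchies that share a short name.
interface TokenizerFactory {}
interface TokenFilterFactory {}
class NGramTokenizerFactory implements TokenizerFactory {}
class NGramFilterFactory implements TokenFilterFactory {}

/** Toy loader: caches whichever class was resolved first for a short name. */
final class ToyResourceLoader {
    private final Map<String, Class<?>> cache = new HashMap<>();

    // "resolver" stands in for the magic inference of the intended class.
    <T> Class<? extends T> findClass(String shortName, Class<T> expected,
                                     Map<String, Class<?>> resolver) {
        Class<?> found = cache.computeIfAbsent(shortName, resolver::get);
        // Succeeds on first resolution, but a later call with a different
        // "expected" type hits the poisoned cache entry and throws.
        return found.asSubclass(expected);
    }
}
```

The fix direction discussed in the ticket is to make this behave the same on every call — either by keying the cache on the expected type as well, or by failing consistently up front.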
[jira] [Created] (SOLR-13813) Shared storage online split support
Yonik Seeley created SOLR-13813: --- Summary: Shared storage online split support Key: SOLR-13813 URL: https://issues.apache.org/jira/browse/SOLR-13813 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley The strategy for online shard splitting is the same as that for normal (non-SHARED) shards. During a split, the leader will forward updates to sub-shard leaders, those updates will be buffered by the transaction log while the split is in progress, and then the buffered updates are replayed. One change was added to push the local index to the blob store after buffered updates are applied (but before the sub-shard is marked as ACTIVE): See https://github.com/apache/lucene-solr/commit/fe17c813f5fe6773c0527f639b9e5c598b98c7d4#diff-081b7c2242d674bb175b41b6afc21663 This issue is about adding tests and ensuring that online shard splitting (while updates are flowing) works reliably.
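The ordering described above — buffer incoming updates during the split, replay them, push the local index to the blob store, and only then mark the sub-shard ACTIVE — can be sketched as a toy state machine (illustrative names, not Solr's actual classes):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy sub-shard illustrating the online-split ordering; not Solr code. */
final class SubShard {
    final List<String> index = new ArrayList<>();
    final Deque<String> txBuffer = new ArrayDeque<>();
    String state = "CONSTRUCTION";

    /** Updates forwarded by the parent leader during the split are buffered. */
    void receive(String update) {
        if (state.equals("ACTIVE")) {
            index.add(update);       // normal path once the split is done
        } else {
            txBuffer.add(update);    // buffered by the transaction log
        }
    }

    /** Replay buffered updates, push to blob store, then mark ACTIVE. */
    void finishSplit(List<String> blobStore) {
        while (!txBuffer.isEmpty()) {
            index.add(txBuffer.poll());  // replay buffered updates
        }
        blobStore.addAll(index);         // push local index to blob store
        state = "ACTIVE";                // only now visible for normal traffic
    }
}
```

The key invariant the tests in this issue need to exercise is that no update forwarded while the split is in progress can be lost: everything received before ACTIVE must end up in both the local index and the blob store.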
[GitHub] [lucene-solr] magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-537601926 Tried this out, and the performance gain was indeed significant. Comparing apples to apples here: [charFilterPerformanceTest2.txt](https://github.com/apache/lucene-solr/files/3682491/charFilterPerformanceTest2.txt) Results (again, very quick and dirty): ``` [junit4] Suite: org.apache.lucene.analysis.icu.TestICUTransformCharFilter [junit4] 2> tokenCount: 100, elapsed: 20244 [junit4] OK 20.6s | TestICUTransformCharFilter.testFilterPerformanceChar [junit4] 2> tokenCount: 100, elapsed: 3040 [junit4] OK 3.05s | TestICUTransformCharFilter.testFilterPerformanceToken [junit4] 2> tokenCount: 100, elapsed: 4339 [junit4] OK 4.36s | TestICUTransformCharFilter.testFilterPerformanceModifiedChar ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes
[ https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-13811: -- Attachment: hoss_local_failure_after_refactoring.log.txt apache_Lucene-Solr-NightlyTests-8.x_221.log.txt Status: Open (was: Open) As noted by gitbot, I've committed some refactoring to help clean this up and isolate the problematic test logic. I'm attaching two files: * {{apache_Lucene-Solr-NightlyTests-8.x_221.log.txt}} - showing an example of how the problem has manifested in jenkins builds _prior_ to the refactoring I've just committed. * {{hoss_local_failure_after_refactoring.log.txt}} - showing how the newly refactored {{testRapidStopStartStopWithPropChange()}} can fail, demonstrating the same problem in isolation. Note that {{testRapidStopStartStopWithPropChange()}} does not fail deterministically – the behavior is dependent on the timing of when exactly {{NodeLostTrigger}} fires _after_ the node is restarted, but before it is stopped again. Perhaps there is a way to "pause" the triggers to increase the odds of this happening? ... not sure. (It also seems to fail much more often in the Hdfs version of the test ... I'm not sure if that's because the MOVEREPLICA logic works faster/slower than in the non-HDFS situation? ... I actually haven't been able to trigger the failure with the refactoring in place) [~ab] : can you please take a look at this and chime in with whether you think the current code in {{testRapidStopStartStopWithPropChange()}} is something that should pass reliably given the way the code is designed to work? ... if so please update the jira summary/description to make it clear what the underlying bug is; if not we should go ahead and: delete this test method, reclassify this issue as a "Test" task, and resolve as "DONE". 
> possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest > refactoring / fixes > > > Key: SOLR-13811 > URL: https://issues.apache.org/jira/browse/SOLR-13811 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > Attachments: apache_Lucene-Solr-NightlyTests-8.x_221.log.txt, > hoss_local_failure_after_refactoring.log.txt > > > I've noticed a pattern of failure behavior in jenkins runs of > {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass > {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which > indicates either: > # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a > situation where the current impl of {{NodeLostTrigger}} isn't smart enough to > handle > # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't. > The test failure is currently somewhat finicky to reproduce, and depends on a > node being stoped, restarted, and stopped again – while an affected > collection is changed from {{autoAddReplicas=false}} to > {{autoAddReplicas=true}} before the second "stop" > Regardless of which of the 2 above is true: the test itself is somewhat > convoluted. It creates a sequence of events (some randomized, some static) > and asserting specific outcomes after each – but the timing of scheduled > triggers like {{NodeLostTrigger}} , and the interplay of things like "pick a > random node to shutdown" with a subsequent "explicitly shut down node2" (even > if it was the node randomly shut down earlier) is confusing. > I'm creating this issue to track two tightly dependent objectives: > # refactoring this test to: > ** better isolate the specific things it's trying to test in individual test > methods. > ** have a singular test method that triggers the specific sequence of events > that is currently problematic (ideally in such a way that it reliably fails). 
> # AwaitsFix this new test method until someone with a better understanding > of the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is > faulty or the code being tested is faulty.
[jira] [Commented] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes
[ https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943010#comment-16943010 ] ASF subversion and git services commented on SOLR-13811: Commit 18bf61504fbd9d8becff1a572642b4207dc7d54c in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=18bf615 ] SOLR-13811: Refactor AutoAddReplicasIntegrationTest to isolate problematic situation into an AwaitsFix test method (cherry picked from commit a57ec148e52507104fdf0f99381d2b485fa846fc) > possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest > refactoring / fixes > > > Key: SOLR-13811 > URL: https://issues.apache.org/jira/browse/SOLR-13811 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > > I've noticed a pattern of failure behavior in jenkins runs of > {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass > {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which > indicates either: > # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a > situation where the current impl of {{NodeLostTrigger}} isn't smart enough to > handle > # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't. > The test failure is currently somewhat finicky to reproduce, and depends on a > node being stoped, restarted, and stopped again – while an affected > collection is changed from {{autoAddReplicas=false}} to > {{autoAddReplicas=true}} before the second "stop" > Regardless of which of the 2 above is true: the test itself is somewhat > convoluted. 
It creates a sequence of events (some randomized, some static) > and asserting specific outcomes after each – but the timing of scheduled > triggers like {{NodeLostTrigger}} , and the interplay of things like "pick a > random node to shutdown" with a subsequent "explicitly shut down node2" (even > if it was the node randomly shut down earlier) is confusing. > I'm creating this issue to track two tightly dependent objectives: > # refactoring this test to: > ** better isolate the specific things it's trying to test in individual test > methods. > ** have a singular test method that triggers the specific sequence of events > that is currently problematic (ideally in such a way that it reliably fails). > # AwaitsFix this new test method until someone with a better understand of > the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is > faulty or the code being tested is faulty. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cbuescher commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection
cbuescher commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330672796 ## File path: lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java ## @@ -283,17 +299,25 @@ public int compare(Pair o1, Pair o2) { * * If a filter is applied, the queue size is increased by * half the number of live documents. + * + * If the collector can reject documents upon collecting, the queue size is + * increased by half the number of live documents again. + * * * The maximum queue size is {@link #MAX_TOP_N_QUEUE_SIZE} */ - private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled) { + private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled, Review comment: I considered this as well first, but then moved to the two independent flags. Will revert that part. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on issue #300: SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas Patch)
cpoerschke commented on issue #300: SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas Patch) URL: https://github.com/apache/lucene-solr/pull/300#issuecomment-537594398 Thanks @diegoceccarelli for the updates and for splitting the unrelated `maxScore` change out into a separate PR! I've just opened https://issues.apache.org/jira/browse/SOLR-13812 for the also unrelated (though admittedly smaller and less controversional) `SolrTestCaseJ4` change and with added test coverage for that too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage
[ https://issues.apache.org/jira/browse/SOLR-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-13812: --- Status: Patch Available (was: Open) > SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test > coverage > > > Key: SOLR-13812 > URL: https://issues.apache.org/jira/browse/SOLR-13812 > Project: Solr > Issue Type: Test >Reporter: Diego Ceccarelli >Assignee: Christine Poerschke >Priority: Minor > Attachments: SOLR-13812.patch > > > In > https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d > on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 > [~diegoceccarelli] proposes to add javadocs and uneven length parameter > rejection for the {{SolrTestCaseJ4.params(String...)}} method. > This ticket proposes to do that plus to also add basic test coverage for the > method, separately from the unrelated SOLR-11831 changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage
[ https://issues.apache.org/jira/browse/SOLR-13812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-13812: --- Attachment: SOLR-13812.patch > SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test > coverage > > > Key: SOLR-13812 > URL: https://issues.apache.org/jira/browse/SOLR-13812 > Project: Solr > Issue Type: Test >Reporter: Diego Ceccarelli >Assignee: Christine Poerschke >Priority: Minor > Attachments: SOLR-13812.patch > > > In > https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d > on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 > [~diegoceccarelli] proposes to add javadocs and uneven length parameter > rejection for the {{SolrTestCaseJ4.params(String...)}} method. > This ticket proposes to do that plus to also add basic test coverage for the > method, separately from the unrelated SOLR-11831 changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-13812) SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage
Christine Poerschke created SOLR-13812: -- Summary: SolrTestCaseJ4.params(String...) javadocs, uneven rejection, basic test coverage Key: SOLR-13812 URL: https://issues.apache.org/jira/browse/SOLR-13812 Project: Solr Issue Type: Test Reporter: Diego Ceccarelli Assignee: Christine Poerschke In https://github.com/apache/lucene-solr/commit/4fedd7bd77219223cb09a660a3e2ce0e89c26eea#diff-21d4224105244d0fb50fe7e586a8495d on https://github.com/apache/lucene-solr/pull/300 for SOLR-11831 [~diegoceccarelli] proposes to add javadocs and uneven length parameter rejection for the {{SolrTestCaseJ4.params(String...)}} method. This ticket proposes to do that plus to also add basic test coverage for the method, separately from the unrelated SOLR-11831 changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
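The proposed behavior above is easy to pin down: since the varargs are consumed as alternating key/value pairs, an odd-length argument list has no sensible interpretation and should be rejected eagerly. A minimal self-contained sketch of that shape — assumed for illustration; the actual {{SolrTestCaseJ4.params(String...)}} returns a {{ModifiableSolrParams}}, not a {{Map}}:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a params(String...) helper that rejects uneven argument counts. */
final class Params {
    static Map<String, String> params(String... kv) {
        // Varargs are key/value pairs, so an odd count is a caller bug:
        // fail fast with a clear message instead of silently dropping a key.
        if (kv.length % 2 != 0) {
            throw new IllegalArgumentException(
                "expected an even number of key/value arguments, got " + kv.length);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            out.put(kv[i], kv[i + 1]);
        }
        return out;
    }
}
```

A basic test for the helper then only needs two cases: a well-formed pair list round-trips, and an odd-length call throws.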
[jira] [Commented] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes
[ https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942993#comment-16942993 ] ASF subversion and git services commented on SOLR-13811: Commit a57ec148e52507104fdf0f99381d2b485fa846fc in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a57ec14 ] SOLR-13811: Refactor AutoAddReplicasIntegrationTest to isolate problematic situation into an AwaitsFix test method > possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest > refactoring / fixes > > > Key: SOLR-13811 > URL: https://issues.apache.org/jira/browse/SOLR-13811 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > > I've noticed a pattern of failure behavior in jenkins runs of > {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass > {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which > indicates either: > # the test is too contrived, and expects {{autoAddReplicas}} to kick in in a > situation where the current impl of {{NodeLostTrigger}} isn't smart enough to > handle > # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't. > The test failure is currently somewhat finicky to reproduce, and depends on a > node being stoped, restarted, and stopped again – while an affected > collection is changed from {{autoAddReplicas=false}} to > {{autoAddReplicas=true}} before the second "stop" > Regardless of which of the 2 above is true: the test itself is somewhat > convoluted. 
It creates a sequence of events (some randomized, some static) > and asserting specific outcomes after each – but the timing of scheduled > triggers like {{NodeLostTrigger}} , and the interplay of things like "pick a > random node to shutdown" with a subsequent "explicitly shut down node2" (even > if it was the node randomly shut down earlier) is confusing. > I'm creating this issue to track two tightly dependent objectives: > # refactoring this test to: > ** better isolate the specific things it's trying to test in individual test > methods. > ** have a singular test method that triggers the specific sequence of events > that is currently problematic (ideally in such a way that it reliably fails). > # AwaitsFix this new test method until someone with a better understand of > the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is > faulty or the code being tested is faulty. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942991#comment-16942991 ] ASF subversion and git services commented on SOLR-13101: Commit 8a34ce0257cd48ad2c65a94ace2d9d3e8d102f60 in lucene-solr's branch refs/heads/jira/SOLR-13101 from Yonik Seeley [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8a34ce0 ] SOLR-13101: fix test compilation > Shared storage support in SolrCloud > --- > > Key: SOLR-13101 > URL: https://issues.apache.org/jira/browse/SOLR-13101 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Yonik Seeley >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Solr should have first-class support for shared storage (blob/object stores > like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, > etc). > The key component will likely be a new replica type for shared storage. It > would have many of the benefits of the current "pull" replicas (not indexing > on all replicas, all shards identical with no shards getting out-of-sync, > etc), but would have additional benefits: > - Any shard could become leader (the blob store always has the index) > - Better elasticity scaling down >- durability not linked to number of replicas... a single replica could be > common for write workloads >- could drop to 0 replicas for a shard when not needed (blob store always > has index) > - Allow for higher performance write workloads by skipping the transaction > log >- don't pay for what you don't need >- a commit will be necessary to flush to stable storage (blob store) > - A lot of the complexity and failure modes go away > An additional component is a Directory implementation that will work well with > blob stores. We probably want one that treats local disk as a cache since > the latency to remote storage is so large. 
I think there are still some > "locking" issues to be solved here (ensuring that more than one writer to the > same index won't corrupt it). This should probably be pulled out into a > different JIRA issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] yonik merged pull request #917: SOLR-13101: fix test compilation
yonik merged pull request #917: SOLR-13101: fix test compilation URL: https://github.com/apache/lucene-solr/pull/917 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection
jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330667847 ## File path: lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestPrefixCompletionQuery.java ## @@ -253,6 +263,126 @@ public void testDocFiltering() throws Exception { iw.close(); } + /** + * Test that the correct amount of documents are collected if using a collector that also rejects documents. + */ + public void testCollectorThatRejects() throws Exception { +// use synonym analyzer to have multiple paths to same suggested document. This mock adds "dog" as synonym for "dogs" +Analyzer analyzer = new MockSynonymAnalyzer(); +RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwcWithSuggestField(analyzer, "suggest_field")); +List expectedResults = new ArrayList(); + +for (int docCount = 10; docCount > 0; docCount--) { + Document document = new Document(); + String value = "ab" + docCount + " dogs"; + document.add(new SuggestField("suggest_field", value, docCount)); + expectedResults.add(new Entry(value, docCount)); + iw.addDocument(document); +} + +if (rarely()) { + iw.commit(); +} + +DirectoryReader reader = iw.getReader(); +SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader); + +PrefixCompletionQuery query = new PrefixCompletionQuery(analyzer, new Term("suggest_field", "ab")); +int topN = 5; + +// use a TopSuggestDocsCollector that rejects results with duplicate docIds +TopSuggestDocsCollector collector = new TopSuggestDocsCollector(topN, false) { + + private Set seenDocIds = new HashSet<>(); + + @Override + public boolean collect(int docID, CharSequence key, CharSequence context, float score) throws IOException { + int globalDocId = docID + docBase; + boolean collected = false; + if (seenDocIds.contains(globalDocId) == false) { + super.collect(docID, key, context, score); + seenDocIds.add(globalDocId); + collected = 
true; + } + return collected; + } + + @Override + protected boolean canReject() { +return true; + } +}; + +indexSearcher.suggest(query, collector); +TopSuggestDocs suggestions = collector.get(); +assertSuggestions(suggestions, expectedResults.subList(0, topN).toArray(new Entry[0])); +assertTrue(suggestions.isComplete()); + +reader.close(); +iw.close(); + } + + /** + * A large scale tests where the collector rejects based on docIds + */ + public void testCollectorWithManyRejects() throws Exception { +Analyzer analyzer = new MockAnalyzer(random()); +RandomIndexWriter iw = new RandomIndexWriter(random(), dir, iwcWithSuggestField(analyzer, "suggest_field")); +Set acceptedDocs = new HashSet<>(); +List expectedResults = new ArrayList(); + +for (int docCount = 0; docCount < 1; docCount++) { + Document document = new Document(); + String value = "ab" + RandomStrings.randomAsciiAlphanumOfLength(random(), 10) +"_" + docCount; + document.add(new SuggestField("suggest_field", value, docCount)); + if (random().nextDouble() > 0.75) { Review comment: the maximum queue size is `5000` so we should ensure that we don't reject more than this number if we want to ensure that the search is complete. If you change the live docs to contain at least `5000` docs, this test should work fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection
jimczi commented on a change in pull request #913: LUCENE-8995: TopSuggestDocsCollector#collect should be able to signal rejection URL: https://github.com/apache/lucene-solr/pull/913#discussion_r330659818 ## File path: lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java ## @@ -283,17 +299,25 @@ public int compare(Pair o1, Pair o2) { * * If a filter is applied, the queue size is increased by * half the number of live documents. + * + * If the collector can reject documents upon collecting, the queue size is + * increased by half the number of live documents again. + * * * The maximum queue size is {@link #MAX_TOP_N_QUEUE_SIZE} */ - private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled) { + private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled, Review comment: I am not sure we need to differentiate the case where there is a filter and when the collector can reject. It's the same thing, we don't know the number of rejections beforehand so just adding `(numDocs/2)` once should be enough. So we can maybe just merge the two boolean and applies the heuristic once ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
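The queue-size heuristic being debated in the two comments above can be paraphrased in a few lines. This is a simplified reconstruction for illustration — the constants, the `liveDocsRatio` handling, and the exact formula are assumptions, not Lucene's verbatim `getMaxTopNSearcherQueueSize`:

```java
/** Simplified sketch of the searcher-queue sizing heuristic under discussion. */
final class QueueSizeSketch {
    // Hard cap on the queue, mirroring the MAX_TOP_N_QUEUE_SIZE idea.
    static final int MAX_TOP_N_QUEUE_SIZE = 5000;

    static int maxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio,
                                        boolean filterEnabled, boolean collectorCanReject) {
        long size = topN;
        // Scale up for deleted docs: fewer live docs means more paths explored.
        size = (long) Math.ceil(size / liveDocsRatio);
        if (filterEnabled) {
            size += numDocs / 2;      // a filter may reject many candidates
        }
        if (collectorCanReject) {
            size += numDocs / 2;      // collector-side rejections, counted again
        }
        return (int) Math.min(size, MAX_TOP_N_QUEUE_SIZE);
    }
}
```

jimczi's point is visible here: both booleans model "an unknown number of candidates gets rejected", so adding `numDocs / 2` twice is arguably redundant — merging the two flags and applying the bump once would give the same safety margin with a smaller queue.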
[jira] [Created] (SOLR-13811) possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes
Chris M. Hostetter created SOLR-13811: - Summary: possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes Key: SOLR-13811 URL: https://issues.apache.org/jira/browse/SOLR-13811 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter

I've noticed a pattern of failure behavior in jenkins runs of {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which indicates either:
# the test is too contrived, and expects {{autoAddReplicas}} to kick in in a situation where the current impl of {{NodeLostTrigger}} isn't smart enough to handle
# {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.

The test failure is currently somewhat finicky to reproduce, and depends on a node being stopped, restarted, and stopped again – while an affected collection is changed from {{autoAddReplicas=false}} to {{autoAddReplicas=true}} before the second "stop".

Regardless of which of the 2 above is true: the test itself is somewhat convoluted. It creates a sequence of events (some randomized, some static) and asserts specific outcomes after each – but the timing of scheduled triggers like {{NodeLostTrigger}}, and the interplay of things like "pick a random node to shutdown" with a subsequent "explicitly shut down node2" (even if it was the node randomly shut down earlier) is confusing.

I'm creating this issue to track two tightly dependent objectives:
# refactoring this test to:
** better isolate the specific things it's trying to test in individual test methods.
** have a singular test method that triggers the specific sequence of events that is currently problematic (ideally in such a way that it reliably fails).
# AwaitsFix this new test method until someone with a better understanding of the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is faulty or the code being tested is faulty. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] yonik opened a new pull request #917: SOLR-13101: fix test compilation
yonik opened a new pull request #917: SOLR-13101: fix test compilation URL: https://github.com/apache/lucene-solr/pull/917 Looks like the merge when the original PR was put up broke test compilation. Here's the simple fix.
[GitHub] [lucene-solr] thomaswoeckinger commented on issue #665: Fixes SOLR-13539
thomaswoeckinger commented on issue #665: Fixes SOLR-13539 URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537570032 Pushed again without ignored binary test
[GitHub] [lucene-solr] thomaswoeckinger edited a comment on issue #665: Fixes SOLR-13539
thomaswoeckinger edited a comment on issue #665: Fixes SOLR-13539 URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537560022 > Hi Thomas. I'd really like to get this (#665) in for 8.3, but right now it's bundled to your change in #883. I can't merge this without #883. > > And I haven't had the time I need to understand how the different pieces of #883 (response formats, field types, binary content) fit together and what the best design is there. > > So what I'm going to do is try to unbundle the two PRs, or at least flip the ordering. I'll take what you've uploaded to #665 here and either comment out or remove entirely the tests that hit this binary-XML issue. They can be added back in #883, once binary-xml works for these field types. > > There might still be time to get #883 in, as I have a chunk of time now that I didn't before. But even if we don't get to it for 8.3, it won't prevent #665 from getting in. No problem, I can comment out the binary test, rebase and push again. Just a second!
[GitHub] [lucene-solr] gerlowskija commented on issue #665: Fixes SOLR-13539
gerlowskija commented on issue #665: Fixes SOLR-13539 URL: https://github.com/apache/lucene-solr/pull/665#issuecomment-537556502 Hi Thomas. I'd really like to get this (#665) in for 8.3, but right now it's bundled to your change in #883. I can't merge this without #883. And I haven't had the time I need to understand how the different pieces of #883 (response formats, field types, binary content) fit together and what the best design is there. So what I'm going to do is try to unbundle the two PRs, or at least flip the ordering. I'll take what you've uploaded to #665 here and either comment out or remove entirely the tests that hit this binary-XML issue. They can be added back in #883, once binary-xml works for these field types. There might still be time to get #883 in, as I have a chunk of time now that I didn't before. But even if we don't get to it for 8.3, it won't prevent #665 from getting in.
[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion and ineffective caching
[ https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942908#comment-16942908 ] Andrzej Bialecki commented on SOLR-13790: - Oh, and until the staleness issue is fixed I would recommend using only {{ExactStatsCache}} - other implementations can only make matters worse, both in terms of memory use and scoring inaccuracies. > LRUStatsCache size explosion and ineffective caching > > > Key: SOLR-13790 > URL: https://issues.apache.org/jira/browse/SOLR-13790 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.7.2, 8.2, 8.3 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Critical > Fix For: 7.7.3, 8.3 > > Attachments: SOLR-13790.patch, SOLR-13790.patch > > > On a sizeable cluster with multi-shard multi-replica collections, when > {{LRUStatsCache}} was in use we encountered excessive memory usage, which > consequently led to severe performance problems. > On a closer examination of the heapdumps it became apparent that when > {{LRUStatsCache.addToPerShardTermStats}} is called it creates instances of > {{FastLRUCache}} using the passed {{shard}} argument - however, the value of > this argument is not a simple shard name but instead it's a randomly ordered > list of ALL replica URLs for this shard. > As a result, due to the combinatoric number of possible keys, over time the > map in {{LRUStatsCache.perShardTermStats}} grew to contain ~2 million entries... > The fix seems to be simply to extract the shard name and cache using this > name instead of the full string value of the {{shard}} parameter. Existing > unit tests also need much improvement.
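The key-explosion problem described above can be sketched as follows. This is a hypothetical helper, not the actual patch: the real fix extracts the plain shard name, whereas here the `|`-separated replica-URL list is simply sorted so that every ordering of the same replica set maps to a single cache entry.

```java
// Sketch only: canonicalize the randomly ordered replica-URL list before using
// it as a cache key. Without this, each distinct ordering of N replica URLs
// becomes its own key, giving a combinatorial number of cache entries per shard.
import java.util.Arrays;

final class ShardKey {
  static String canonical(String shard) {
    // e.g. "http://h2/solr/c_shard1_replica_n2/|http://h1/solr/c_shard1_replica_n1/"
    String[] replicas = shard.split("\\|");
    Arrays.sort(replicas); // stable order regardless of input order
    return String.join("|", replicas);
  }
}
```

Either canonicalization or the shard-name extraction described in the issue keeps the per-shard map bounded by the number of shards rather than by the number of replica-list permutations.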
[jira] [Commented] (SOLR-13790) LRUStatsCache size explosion and ineffective caching
[ https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942904#comment-16942904 ] Andrzej Bialecki commented on SOLR-13790: - Upon further examination it looks like {{ExactSharedStatsCache}} and {{LRUStatsCache}} have a problem with staleness - they don't track updates in the shards so they have no way of knowing when to refresh the stats. As a result the global stats may be even more wrong than if we used just local stats - imagine a scenario where there's heavy indexing activity that adds a lot of terms and postings. In this scenario local stats from the local shard would reflect this growth, albeit partially, but the stale global stats would not.

Another issue is with the purported optimization in {{LRUStatsCache}} and {{ExactSharedStatsCache}} - the claimed advantage of these caches is that they help to avoid unnecessary fetching of stats from shards. Only they don't ... as explained in my previous comment, both of these implementations always send ShardRequest-s to fetch the stats, thus adding one more round-trip to every query. Since the stats are fetched on every request at least there was no problem with staleness ;) but the "caching" aspect was completely false - per-shard stats were being fetched on every request, and on every request new global stats would be built and sent out. I plan to address these issues separately, the current patch is already large.

Updated patch with the following additional changes:
* the biggest change is that StatsCache instances are now tied to SolrIndexSearcher and its life-cycle and not to SolrCore - this helps to at least mitigate the problem of staleness and also the problem of unbounded memory consumption of {{ExactSharedStatsCache}}. The downside is that after every commit the cache needs to be re-populated.
* more optimization and safety in StatsUtil serialization code
* fixed a bug in {{DebugComponent}} where only local stats would be used for explanations - this threw me off for a while, as I relied on explanations to explain the details of scoring :)
* added more substance to SolrCloud unit tests

All tests are passing. If there are no objections I'd like to commit this shortly.
[jira] [Commented] (SOLR-13764) Parse Interval Query from JSON API
[ https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942895#comment-16942895 ] Mikhail Khludnev commented on SOLR-13764: - I'm sorry. Need to switch to something different. > Parse Interval Query from JSON API > -- > > Key: SOLR-13764 > URL: https://issues.apache.org/jira/browse/SOLR-13764 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Mikhail Khludnev >Priority: Minor > > h2. Context > Lucene has Intervals query LUCENE-8196. Note: these are a kind of healthy > man's Spans/Phrases. Note: It's not about ranges nor facets. > h2. Problem > There's no way to search by IntervalQuery via JSON Query DSL. > h2. Suggestion > * Create classic QParser \{{ {!interval df=text_content}a_json_param}}, i.e. > one can combine a few such refs in {{json.query.bool}} > * It accepts just the name of a JSON param, nothing like this happens yet. > * This param carries plain JSON which is accessible via {{req.getJSON()}} > please examine > https://cwiki.apache.org/confluence/display/SOLR/SOLR-13764+Discussion+-+Interval+Queries+in+JSON > for the syntax proposal. > h2. Challenges > * I have no idea about a particular JSON DSL for these queries; the Lucene API > seems easily JSON-able. Proposals are welcome. > * Another awkward thing is combining analysis and the low-level query API, e.g. > what if one requests a term for one word and analysis yields two tokens, and vice > versa, requesting a phrase might end up with a single token stream. > * Putting JSON into the Jira ticket description > h2. Q: Why don't.. > .. put the intervals DSL right into {{json.query}}, avoiding these odd param > refs? > A: It requires heavy lifting for {{JsonQueryConverter}}, which is streamlined > for handling good old HTTP parametrized queries.
[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API
[ https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-13764: Fix Version/s: (was: 8.3)
[jira] [Updated] (SOLR-13764) Parse Interval Query from JSON API
[ https://issues.apache.org/jira/browse/SOLR-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-13764: Priority: Minor (was: Blocker)
[jira] [Comment Edited] (SOLR-13101) Shared storage support in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942890#comment-16942890 ] Yonik Seeley edited comment on SOLR-13101 at 10/2/19 3:10 PM: -- bq. How far do you think is it complete? Do you forsee a lot of more work going in here? Or, do you suggest we start reviewing it and attempt to merge it soon (in a week or so?). I think it's got a bit more to go. It would be nice if the behavior matched normal solr semantics a little closer... would be easier to get better test coverage by reusing existing tests and changing the replica type. Some things off the top of my head: - a commit doesn't cause latest changes to be visible on replicas (a query on a non-leader replica actually causes an async pull from blob of the latest index) - there are currently some concurrency issues with index pushing - I *think* one still needs to specify a commit to get a push to blob... this needs to be implicit (commit=true,openSearcher=false) for data durability by default I need to dig into the code in general more... as you can see from the commits on the branch, this work was all done by my colleagues, not me. But we're working on encouraging more open development! was (Author: ysee...@gmail.com): bq. How far do you think is it complete? Do you forsee a lot of more work going in here? Or, do you suggest we start reviewing it and attempt to merge it soon (in a week or so?). I think it's got a bit more to go. It would be nice if the behavior matched normal solr semantics a little closer... would be easier to get better test coverage by reusing existing tests and changing the replica type. Some things off the top of my head: - a commit doesn't cause latest changes to be visible on replicas (a query on a non-leader replica actually causes an async pull from blob of the latest index) - there are currently some concurrency issues with index pushing - I I need to dig into the code in general more... 
as you can see from the commits on the branch, this work was all done by my colleagues, not me. But we're working on encouraging more open development! > Shared storage support in SolrCloud > --- > > Key: SOLR-13101 > URL: https://issues.apache.org/jira/browse/SOLR-13101 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Yonik Seeley >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > Solr should have first-class support for shared storage (blob/object stores > like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, > etc). > The key component will likely be a new replica type for shared storage. It > would have many of the benefits of the current "pull" replicas (not indexing > on all replicas, all shards identical with no shards getting out-of-sync, > etc), but would have additional benefits: > - Any shard could become leader (the blob store always has the index) > - Better elasticity scaling down >- durability not linked to number of replicas.. a single replica could be > common for write workloads >- could drop to 0 replicas for a shard when not needed (blob store always > has index) > - Allow for higher performance write workloads by skipping the transaction > log >- don't pay for what you don't need >- a commit will be necessary to flush to stable storage (blob store) > - A lot of the complexity and failure modes go away > An additional component is a Directory implementation that will work well with > blob stores. We probably want one that treats local disk as a cache since > the latency to remote storage is so large. I think there are still some > "locking" issues to be solved here (ensuring that more than one writer to the > same index won't corrupt it). This should probably be pulled out into a > different JIRA issue.
[GitHub] [lucene-solr] magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-537538822 Here's a thought: what if we provided a boolean configuration option like `assumeExternalUnicodeNormalization`. Many of these transforms work on NFD input and produce NFC output, but they are generally configured defensively (not assuming input to be NFD, and not assuming that output will be externally converted to NFC).

This is understandable, but results in the odd situation (for example) that an analysis component like "ICUTransformFilter(Cyrillic-Latin)" would have NFC output, but _only_ for characters whose input representation matched the top-level Cyrillic-Latin filter (which is pretty restrictive). Input characters that didn't match the top-level filter would be untouched by any component of the underlying CompoundTransliterator. So if you want fully unicode-normalized output (and in the context of an analysis chain, most do), you have to separately apply post-transform normalization anyway.

At best, this ends up doing some redundant work; but for the performance case we're considering here, there are particular implications. NFC, as a trailing transformation step, is both _very_ common and _very_ active -- active in the sense that it will in many common contexts block output waiting for combining diacritics for literally almost every character. If we know we're externally applying unicode normalization over the entire output, skipping baked-in post-NFC for every transform component avoids redundant work, but more importantly avoids a common case that's virtually guaranteed to result in a substantial amount of partial transliteration, rollback, etc.

I think this can be done relatively cleanly using Transliterator getElements(), toRules(false), and createFromRules(...). I'd be curious to know what you think, @msokolov, and perhaps @rmuir?
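To illustrate why a single external normalization pass makes per-component baked-in NFC redundant, here is a minimal sketch using the JDK's `java.text.Normalizer` as a stand-in for ICU normalization (the ICU plumbing via `getElements()`/`toRules(false)`/`createFromRules(...)` mentioned above is not reproduced, and the class/method names are illustrative):

```java
// Sketch of the "assumeExternalUnicodeNormalization" idea: one trailing NFC pass
// over the entire transform output normalizes both the spans a transform composed
// (already NFC) and the spans it left untouched (possibly still NFD), and the
// pass is idempotent -- so additional baked-in NFC inside each compound
// transliterator element would only repeat work.
import java.text.Normalizer;

final class ExternalNfc {
  static String apply(String transformOutput) {
    // Single external pass applied once, after the whole transform chain.
    return Normalizer.normalize(transformOutput, Normalizer.Form.NFC);
  }
}
```

For example, a string that is half precomposed ("é" as U+00E9) and half decomposed ("e" followed by combining acute U+0301) comes out uniformly NFC after the one external pass, regardless of what each transform component did to its own slice of the input.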
[jira] [Updated] (SOLR-13790) LRUStatsCache size explosion and ineffective caching
[ https://issues.apache.org/jira/browse/SOLR-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-13790: Attachment: SOLR-13790.patch
[GitHub] [lucene-solr] atris commented on issue #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on issue #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#issuecomment-537530015 @jpountz Updated, please see and let me know your thoughts
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942871#comment-16942871 ] ASF subversion and git services commented on SOLR-13105: Commit 97c516b9ba805717033e20ba7deaee5006cb5bce in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97c516b ] SOLR-13105: Update regression header toc 3 > A visual guide to Solr Math Expressions and Streaming Expressions > - > > Key: SOLR-13105 > URL: https://issues.apache.org/jira/browse/SOLR-13105 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot > 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, > Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 > AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png > > > Visualization is now a fundamental element of Solr Streaming Expressions and > Math Expressions. This ticket will create a visual guide to Solr Math > Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* > visualization examples. > It will also cover using the JDBC expression to *analyze* and *visualize* > results from any JDBC compliant data source. > Intro from the guide: > {code:java} > Streaming Expressions exposes the capabilities of Solr Cloud as composable > functions. These functions provide a system for searching, transforming, > analyzing and visualizing data stored in Solr Cloud collections. > At a high level there are four main capabilities that will be explored in the > documentation: > * Searching, sampling and aggregating results from Solr. > * Transforming result sets after they are retrieved from Solr. > * Analyzing and modeling result sets using probability and statistics and > machine learning libraries. 
> * Visualizing result sets, aggregations and statistical models of the data. > {code} > > A few sample visualizations are attached to the ticket.
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942869#comment-16942869 ] ASF subversion and git services commented on SOLR-13105: Commit 4ed2ac4a9f94079494d388e44165e1d8ce9d511a in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4ed2ac4 ] SOLR-13105: Update regression header toc 2
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942868#comment-16942868 ] ASF subversion and git services commented on SOLR-13105: Commit c9455b27be50a4bf04b50ad597cd0a412743be33 in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c9455b2 ] SOLR-13105: Update regression header toc 1
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942866#comment-16942866 ] ASF subversion and git services commented on SOLR-13105: Commit f1d6b5efc946163fa1e554387a5ced4767e41332 in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f1d6b5e ] SOLR-13105: Add simulations header toc 1
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942862#comment-16942862 ] ASF subversion and git services commented on SOLR-13105: Commit a0878f9325b8557c533d621e4cfeb09ea245891d in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a0878f9 ] SOLR-13105: Add simulations header toc
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330584691 ## File path: lucene/core/src/java/org/apache/lucene/search/QueryCache.java ## @@ -33,4 +35,10 @@ */ Weight doCache(Weight weight, QueryCachingPolicy policy); + /** + * Same as above, but allows passing in an Executor to perform caching + * asynchronously + */ + Weight doCache(Weight weight, QueryCachingPolicy policy, Executor executor); Review comment: We cannot remove this method since it is declared in `QueryCache` -- made it directly delegate to the new overload
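The resolution discussed here -- keeping the legacy two-argument `doCache` and having it delegate to the executor-aware overload -- can be sketched as a default method. This is a simplified standalone sketch with hypothetical types standing in for `Weight` and `QueryCachingPolicy`, not the actual Lucene interface:

```java
import java.util.concurrent.Executor;

// Simplified sketch: the legacy two-argument doCache delegates to the
// executor-aware overload with a null executor, where null means
// "cache synchronously in the current thread".
interface SimpleQueryCache {
  String doCache(String weight, String policy, Executor executor);

  default String doCache(String weight, String policy) {
    return doCache(weight, policy, null);   // null => synchronous caching
  }
}

class DelegatingCacheDemo implements SimpleQueryCache {
  @Override
  public String doCache(String weight, String policy, Executor executor) {
    // Report which mode was chosen so the delegation is observable.
    return weight + "/" + policy + "/" + (executor == null ? "sync" : "async");
  }
}
```

This keeps binary compatibility for existing callers of the two-argument form while routing all behavior through one implementation point.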
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330582970 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -88,13 +93,36 @@ * @lucene.experimental */ public class LRUQueryCache implements QueryCache, Accountable { + /** Act as key for the inflight queries map */ + private static class MapKey { +private final Query query; +private final IndexReader.CacheKey cacheKey; + +public MapKey(Query query, IndexReader.CacheKey cacheKey) { + this.query = query; + this.cacheKey = cacheKey; +} + +public Query getQuery() { + return query; +} + +public IndexReader.CacheKey getCacheKey() { + return cacheKey; +} + } private final int maxSize; private final long maxRamBytesUsed; private final Predicate leavesToCache; // maps queries that are contained in the cache to a singleton so that this // cache does not store several copies of the same query private final Map uniqueQueries; + // Marks the inflight queries that are being asynchronously loaded into the cache + // This is used to ensure that multiple threads do not trigger loading + // of the same query in the same cache. We use a set because it is an invariant that + // the entries of this data structure be unique. + private final Map inFlightAsyncLoadQueries = new HashMap<>(); Review comment: Moved to set
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330581040 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -389,6 +431,7 @@ public void clear() { cache.clear(); // Note that this also clears the uniqueQueries map since mostRecentlyUsedQueries is the uniqueQueries.keySet view: mostRecentlyUsedQueries.clear(); + inFlightAsyncLoadQueries.clear(); Review comment: Removed
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330580984 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -368,10 +400,20 @@ public void clearQuery(Query query) { onEviction(singleton); } } finally { + removeQuery(query); Review comment: Removed, thanks
[GitHub] [lucene-solr] jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2
jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2 URL: https://github.com/apache/lucene-solr/pull/881#discussion_r330579248 ## File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/utils/Config.java ## @@ -403,15 +404,15 @@ public String getColsValuesForValsByRound(int roundNum) { return ""; } StringBuilder sb = new StringBuilder(); -for (final String name : colForValByRound.keySet()) { - String colName = colForValByRound.get(name); +for (final Map.Entry entry : colForValByRound.entrySet()) { + String colName = entry.getValue(); String template = " " + colName; if (roundNum < 0) { // just append blanks sb.append(Format.formatPaddLeft("-", template)); } else { // append actual values, for that round -Object a = valByRound.get(name); +Object a = valByRound.get(entry.getKey()); Review comment: can you extract a variable?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2
jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2 URL: https://github.com/apache/lucene-solr/pull/881#discussion_r330578921 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/query/QueryAutoStopWordAnalyzer.java ## @@ -200,10 +200,10 @@ protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComp */ public Term[] getStopWords() { List allStopWords = new ArrayList<>(); -for (String fieldName : stopWordsPerField.keySet()) { - Set stopWords = stopWordsPerField.get(fieldName); +for (Map.Entry> entry : stopWordsPerField.entrySet()) { + Set stopWords = entry.getValue(); for (String text : stopWords) { -allStopWords.add(new Term(fieldName, text)); +allStopWords.add(new Term(entry.getKey(), text)); Review comment: can you extract a variable to improve readability?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2
jpountz commented on a change in pull request #881: LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. - part 2 URL: https://github.com/apache/lucene-solr/pull/881#discussion_r330579524 ## File path: lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java ## @@ -506,10 +506,10 @@ public Query rewrite(IndexReader reader) throws IOException { @Override public void visit(QueryVisitor visitor) { -for (BooleanClause.Occur occur : clauseSets.keySet()) { - if (clauseSets.get(occur).size() > 0) { -QueryVisitor v = visitor.getSubVisitor(occur, this); -for (Query q : clauseSets.get(occur)) { +for (Map.Entry> entry : clauseSets.entrySet()) { + if (entry.getValue().size() > 0) { +QueryVisitor v = visitor.getSubVisitor(entry.getKey(), this); +for (Query q : entry.getValue()) { Review comment: can you extract variables?
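The pattern jpountz asks for in these three reviews can be shown with a plain map: iterate over `entrySet()` so each entry is looked up once (instead of a `keySet()` walk plus repeated `get()` calls), and extract the key and value into named locals so the loop body stays readable. A small standalone sketch, not the Lucene code itself:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration of the review feedback: entrySet() iteration with the key
// and value extracted into named locals, rather than entry.getKey()/
// entry.getValue() scattered through the loop body.
public class EntrySetDemo {
  public static List<String> render(Map<String, Integer> counts) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      String term = entry.getKey();     // extracted once, reused below
      int count = entry.getValue();
      out.add(term + "=" + count);
    }
    return out;
  }

  // Small driver so the behavior is observable in isolation.
  public static List<String> demo() {
    Map<String, Integer> counts = new LinkedHashMap<>();
    counts.put("solr", 2);
    counts.put("lucene", 1);
    return render(counts);
  }
}
```

Beyond readability, `entrySet()` avoids one hash lookup per iteration compared to `keySet()` + `get()`, which is the motivation for the LUCENE-8979 cleanup.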
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330579304 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -813,8 +918,23 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException { if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + boolean cacheSynchronously = executor == null; + // If asynchronous caching is requested, perform the same and return + // the uncached iterator + if (cacheSynchronously == false) { +cacheSynchronously = cacheAsynchronously(context, cacheHelper); + +// If async caching failed, we will perform synchronous caching +// hence do not return the uncached value here +if (cacheSynchronously == false) { Review comment: Not necessarily -- cacheAsynchronously() might have failed, in which case it will return true and this code path will trigger synchronous caching
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330578984 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +821,24 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + boolean cacheSynchronously = executor == null; + + // If asynchronous caching is requested, perform the same and return + // the uncached iterator + if (cacheSynchronously == false) { +cacheSynchronously = cacheAsynchronously(context, cacheHelper); + +// If async caching failed, synchronous caching will +// be performed, hence do not return the uncached value +if (cacheSynchronously == false) { + return in.scorerSupplier(context); +} + } + + if (cacheSynchronously) { Review comment: We need this to be checked even after async caching since async caching might have failed, in which case we will have to perform synchronous caching
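The control flow defended in these two comments -- prefer async caching, but fall back to synchronous caching when there is no executor or the executor rejects the task -- can be sketched in isolation. Hypothetical names and a string return value for observability; this is not the actual LRUQueryCache code:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Sketch of the fallback logic under discussion: try to hand the cache load
// to the executor; if there is no executor or it rejects the task, cache
// synchronously in the caller's thread instead of returning the uncached path.
public class CacheFallback {

  static boolean tryCacheAsynchronously(Executor executor, Runnable load) {
    try {
      executor.execute(load);
      return true;                      // async caching was scheduled
    } catch (RejectedExecutionException e) {
      return false;                     // caller must fall back to sync caching
    }
  }

  public static String cache(Executor executor, Runnable load) {
    boolean cacheSynchronously = executor == null;
    if (!cacheSynchronously) {
      // If async scheduling failed, do not skip caching entirely:
      // fall through to the synchronous path.
      cacheSynchronously = !tryCacheAsynchronously(executor, load);
    }
    if (cacheSynchronously) {
      load.run();                       // synchronous fallback
      return "sync";
    }
    return "async";
  }
}
```

The key property is that a `RejectedExecutionException` flips the flag back to synchronous mode, so the query is still cached on every path rather than silently dropped.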
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330577294 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -88,13 +93,36 @@ * @lucene.experimental */ public class LRUQueryCache implements QueryCache, Accountable { + /** Act as key for the inflight queries map */ + private static class MapKey { +private final Query query; +private final IndexReader.CacheKey cacheKey; + +public MapKey(Query query, IndexReader.CacheKey cacheKey) { + this.query = query; + this.cacheKey = cacheKey; +} + +public Query getQuery() { + return query; +} + +public IndexReader.CacheKey getCacheKey() { + return cacheKey; +} + } Review comment: Oops, don't know how it did not make it into this commit, let me check that right away
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330575572 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -88,13 +93,36 @@ * @lucene.experimental */ public class LRUQueryCache implements QueryCache, Accountable { + /** Act as key for the inflight queries map */ + private static class MapKey { +private final Query query; +private final IndexReader.CacheKey cacheKey; + +public MapKey(Query query, IndexReader.CacheKey cacheKey) { + this.query = query; + this.cacheKey = cacheKey; +} + +public Query getQuery() { + return query; +} + +public IndexReader.CacheKey getCacheKey() { + return cacheKey; +} + } private final int maxSize; private final long maxRamBytesUsed; private final Predicate leavesToCache; // maps queries that are contained in the cache to a singleton so that this // cache does not store several copies of the same query private final Map uniqueQueries; + // Marks the inflight queries that are being asynchronously loaded into the cache + // This is used to ensure that multiple threads do not trigger loading + // of the same query in the same cache. We use a set because it is an invariant that + // the entries of this data structure be unique. + private final Map inFlightAsyncLoadQueries = new HashMap<>(); Review comment: I originally used a Set, but moved it to a map specifically to enable tests for the double caching case.
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942851#comment-16942851 ] ASF subversion and git services commented on SOLR-13105: Commit 9060aee4d8e8cd4b14846dc9990d650e390fdb09 in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9060aee ] SOLR-13105: Update machine learning docs 11
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330550479 ## File path: lucene/core/src/java/org/apache/lucene/search/QueryCache.java ## @@ -33,4 +35,10 @@ */ Weight doCache(Weight weight, QueryCachingPolicy policy); + /** + * Same as above, but allows passing in an Executor to perform caching + * asynchronously + */ + Weight doCache(Weight weight, QueryCachingPolicy policy, Executor executor); Review comment: Let's remove the other doCache and only have this one, with a `null` executor signaling that things should get cached in the current thread? This is an automated message from the Apache Git Service.
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330554876 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -88,13 +93,36 @@ * @lucene.experimental */ public class LRUQueryCache implements QueryCache, Accountable { + /** Act as key for the inflight queries map */ + private static class MapKey { +private final Query query; +private final IndexReader.CacheKey cacheKey; + +public MapKey(Query query, IndexReader.CacheKey cacheKey) { + this.query = query; + this.cacheKey = cacheKey; +} + +public Query getQuery() { + return query; +} + +public IndexReader.CacheKey getCacheKey() { + return cacheKey; +} + } Review comment: We need equals/hashCode, or this will never prevent the caching of the same query multiple times.
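The point of this review comment is that a composite key without `equals`/`hashCode` inherits identity semantics from `Object`, so two `MapKey` instances built from the same query and reader never collide in `putIfAbsent`, and the duplicate-caching guard is a no-op. A standalone sketch of the fixed key, with `String` fields standing in for `Query` and `IndexReader.CacheKey`:

```java
import java.util.Objects;

// Sketch of the composite key with value semantics: equals/hashCode are
// derived from both fields, so keys built from the same (query, reader)
// pair compare equal and a putIfAbsent on an in-flight map can detect
// that the query is already being cached.
public class MapKey {
  private final String query;      // stands in for org.apache.lucene.search.Query
  private final String cacheKey;   // stands in for IndexReader.CacheKey

  public MapKey(String query, String cacheKey) {
    this.query = query;
    this.cacheKey = cacheKey;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MapKey)) return false;
    MapKey other = (MapKey) o;
    return query.equals(other.query) && cacheKey.equals(other.cacheKey);
  }

  @Override
  public int hashCode() {
    return Objects.hash(query, cacheKey);
  }
}
```

Without these overrides, `new MapKey(q, k).equals(new MapKey(q, k))` is false, which is exactly the failure mode the reviewer describes.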
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330564331

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -832,5 +952,47 @@
       return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, ScoreMode.COMPLETE_NO_SCORES, disi));
     }
 
+    // Perform a cache load asynchronously
+    // @return true if synchronous caching is needed, false otherwise
+    private boolean cacheAsynchronously(LeafReaderContext context, IndexReader.CacheHelper cacheHelper) {
+      /*
+       * If the current query is already being asynchronously cached,
+       * do not trigger another cache operation
+       */
+      Object returnValue = inFlightAsyncLoadQueries.putIfAbsent(new MapKey(in.getQuery(),
+          cacheHelper.getKey()), cacheHelper.getKey());
+
+      assert returnValue == null || returnValue == cacheHelper.getKey();
+
+      if (returnValue != null) {
+        return false;
+      }
+
+      FutureTask<Void> task = new FutureTask<>(() -> {
+        DocIdSet localDocIdSet = cache(context);
+        putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+        // Remove the key from inflight -- the key is loaded now
+        Object retValue = inFlightAsyncLoadQueries.remove(new MapKey(in.getQuery(), cacheHelper.getKey()));
+
+        // The query should have been present in the inflight queries set before
+        // we actually loaded it -- hence the removal of the key should be successful
+        assert retValue != null;
+
+        if (countDownLatch != null) {
+          countDownLatch.countDown();
+        }
+
+        return null;
+      });
+      try {
+        executor.execute(task);
+      } catch (RejectedExecutionException e) {
+        // Trigger synchronous caching
+        return true;
+      }

Review comment: Same here, we need to remove from inFlightAsyncLoadQueries on every code path.
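The path the reviewer flags can be sketched with stand-in types (String keys instead of Lucene's MapKey; illustrative, not the Lucene code): when the executor rejects the task, the in-flight marker that was just added must be removed before falling back to synchronous caching, otherwise that query stays marked in-flight forever and can never be cached:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Stand-in for the in-flight bookkeeping around executor.execute().
class RejectionDemo {
    final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** @return true if the caller should fall back to synchronous caching */
    boolean submit(String key, Executor executor, Runnable task) {
        if (inFlight.add(key) == false) {
            return false; // another thread is already loading this key
        }
        try {
            executor.execute(task);
        } catch (RejectedExecutionException e) {
            inFlight.remove(key); // release the marker on this code path too
            return true;
        }
        return false;
    }
}
```

The task body must do its own cleanup on completion as well; this sketch covers only the rejection path.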
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330553473

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -389,6 +431,7 @@
   public void clear() {
     cache.clear();
     // Note that this also clears the uniqueQueries map since mostRecentlyUsedQueries is the uniqueQueries.keySet view:
     mostRecentlyUsedQueries.clear();
+    inFlightAsyncLoadQueries.clear();

Review comment: same here, I don't think it's correct?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330546874

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -813,8 +918,23 @@
   public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
       if (docIdSet == null) {
         if (policy.shouldCache(in.getQuery())) {
-          docIdSet = cache(context);
-          putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+          boolean cacheSynchronously = executor == null;
+          // If asynchronous caching is requested, perform the same and return
+          // the uncached iterator
+          if (cacheSynchronously == false) {
+            cacheSynchronously = cacheAsynchronously(context, cacheHelper);
+
+            // If async caching failed, we will perform synchronous caching
+            // hence do not return the uncached value here
+            if (cacheSynchronously == false) {

Review comment: cacheSynchronously is necessarily false already?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330572346

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -448,13 +491,48 @@
 void assertConsistent() {
     }
   }
 
+  // pkg-private for testing
+  void setCountDownLatch(CountDownLatch latch) {

Review comment: do you think we could avoid setting a latch here, and maybe instead calling countDown from a subclass' onDocIdSetCache?
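The alternative the reviewer proposes -- the test counting down from an overridden callback rather than the cache holding a latch -- can be sketched with a stand-in cache class (illustrative; `onDocIdSetCache` is modeled on LRUQueryCache's callback of the same name, but the surrounding code here is hypothetical):

```java
import java.util.concurrent.CountDownLatch;

// Stand-in cache with a protected hook, mirroring how a test could
// subclass the cache and override the hook, instead of wiring a
// CountDownLatch field into the production class.
class DemoCache {
    void cacheEntry(String key) {
        // ... store the entry, then notify ...
        onDocIdSetCache(key);
    }

    protected void onDocIdSetCache(String key) {
        // no-op hook for subclasses
    }
}

class LatchedCache extends DemoCache {
    final CountDownLatch latch;

    LatchedCache(int expectedEntries) {
        this.latch = new CountDownLatch(expectedEntries);
    }

    @Override
    protected void onDocIdSetCache(String key) {
        latch.countDown(); // the test can await() until all loads finish
    }
}
```

This keeps the synchronization concern entirely inside the test subclass, so no pkg-private setter is needed.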
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330552818

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -368,10 +400,20 @@
 public void clearQuery(Query query) {
         onEviction(singleton);
       }
     } finally {
+      removeQuery(query);

Review comment: I don't think this is correct? The fact that we are removing entries for a query doesn't cancel the loading of cache entries?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330549857

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -832,5 +952,47 @@
       return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, ScoreMode.COMPLETE_NO_SCORES, disi));
     }
 
+    // Perform a cache load asynchronously
+    // @return true if synchronous caching is needed, false otherwise
+    private boolean cacheAsynchronously(LeafReaderContext context, IndexReader.CacheHelper cacheHelper) {
+      /*
+       * If the current query is already being asynchronously cached,
+       * do not trigger another cache operation
+       */
+      Object returnValue = inFlightAsyncLoadQueries.putIfAbsent(new MapKey(in.getQuery(),
+          cacheHelper.getKey()), cacheHelper.getKey());
+
+      assert returnValue == null || returnValue == cacheHelper.getKey();
+
+      if (returnValue != null) {
+        return false;
+      }
+
+      FutureTask<Void> task = new FutureTask<>(() -> {
+        DocIdSet localDocIdSet = cache(context);
+        putIfAbsent(in.getQuery(), localDocIdSet, cacheHelper);
+
+        // Remove the key from inflight -- the key is loaded now
+        Object retValue = inFlightAsyncLoadQueries.remove(new MapKey(in.getQuery(), cacheHelper.getKey()));

Review comment: We should probably put it in a finally block to make sure it runs even in case of exceptions in above calls. Otherwise we'd have a memory leak.
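The finally block the reviewer asks for can be sketched with stand-in types (String keys, a Runnable standing in for the cache(context) call; illustrative, not the Lucene code). If the load throws, the key still leaves the in-flight set, so the entry neither leaks nor blocks a later caching attempt:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the task body: the removal from the in-flight set
// belongs in a finally block so it runs on every path, including
// when the cache load throws.
class LoaderDemo {
    final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    void load(String key, Runnable cacheLoad) {
        try {
            cacheLoad.run();      // may throw
        } finally {
            inFlight.remove(key); // always release the in-flight marker
        }
    }
}
```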
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330546325

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -732,8 +821,24 @@
   public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOException {
       if (docIdSet == null) {
         if (policy.shouldCache(in.getQuery())) {
-          docIdSet = cache(context);
-          putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+          boolean cacheSynchronously = executor == null;
+
+          // If asynchronous caching is requested, perform the same and return
+          // the uncached iterator
+          if (cacheSynchronously == false) {
+            cacheSynchronously = cacheAsynchronously(context, cacheHelper);
+
+            // If async caching failed, synchronous caching will
+            // be performed, hence do not return the uncached value
+            if (cacheSynchronously == false) {
+              return in.scorerSupplier(context);
+            }
+          }
+
+          if (cacheSynchronously) {

Review comment: make it an `else`?
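The restructuring the reviewer hints at -- folding the second test into an else rather than re-checking a flag -- can be sketched with stand-in types (illustrative names, not the Lucene code). The decision collapses to one if/else: either async caching was started and the uncached scorer is returned, or the load happens inline:

```java
import java.util.concurrent.Executor;

// Stand-in for the scorerSupplier decision: the synchronous path is the
// else branch, instead of testing a flag that was assigned just above.
class ScorerFlowDemo {
    static String supply(Executor executor, boolean asyncStarted) {
        if (executor != null && asyncStarted) {
            // async caching is in flight: serve the uncached scorer now
            return "uncached";
        } else {
            // no executor, or the async submit was rejected: cache inline
            return "cached";
        }
    }
}
```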
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#discussion_r330545241

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
## @@ -88,13 +93,36 @@
  * @lucene.experimental
  */
 public class LRUQueryCache implements QueryCache, Accountable {
+  /** Act as key for the inflight queries map */
+  private static class MapKey {
+    private final Query query;
+    private final IndexReader.CacheKey cacheKey;
+
+    public MapKey(Query query, IndexReader.CacheKey cacheKey) {
+      this.query = query;
+      this.cacheKey = cacheKey;
+    }
+
+    public Query getQuery() {
+      return query;
+    }
+
+    public IndexReader.CacheKey getCacheKey() {
+      return cacheKey;
+    }
+  }
 
   private final int maxSize;
   private final long maxRamBytesUsed;
   private final Predicate<LeafReaderContext> leavesToCache;
   // maps queries that are contained in the cache to a singleton so that this
   // cache does not store several copies of the same query
   private final Map<Query, Query> uniqueQueries;
+  // Marks the inflight queries that are being asynchronously loaded into the cache
+  // This is used to ensure that multiple threads do not trigger loading
+  // of the same query in the same cache. We use a set because it is an invariant that
+  // the entries of this data structure be unique.
+  private final Map<MapKey, IndexReader.CacheKey> inFlightAsyncLoadQueries = new HashMap<>();

Review comment: use a set instead? I see you added a couple of assertions on return values, but they don't seem to add more value than what we could get with a set?

This is an automated message from the Apache Git Service.
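The Set the reviewer proposes can be sketched with stand-in String keys (illustrative, not the Lucene code). `Set.add` already reports whether the key was newly inserted, which is everything the putIfAbsent-plus-assertion pattern on the Map was providing:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the in-flight bookkeeping as a concurrent Set: add()
// returning false means another thread is already loading this key.
class InFlightSetDemo {
    final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    boolean markInFlight(String key) {
        return inFlight.add(key); // false: load already in progress
    }

    void markLoaded(String key) {
        inFlight.remove(key);
    }
}
```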
[GitHub] [lucene-solr] jpountz closed pull request #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jpountz closed pull request #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884
[GitHub] [lucene-solr] jpountz commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jpountz commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-537483307

Closing now that @dsmiley merged this change.
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942791#comment-16942791 ]

ASF subversion and git services commented on SOLR-13105:

Commit 4481e5ba9f94f1182cc228653d722bc09689a7df in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4481e5b ]
SOLR-13105: Update machine learning docs 10

> A visual guide to Solr Math Expressions and Streaming Expressions
> -----------------------------------------------------------------
>
>                 Key: SOLR-13105
>                 URL: https://issues.apache.org/jira/browse/SOLR-13105
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>         Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png
>
> Visualization is now a fundamental element of Solr Streaming Expressions and Math Expressions. This ticket will create a visual guide to Solr Math Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* visualization examples.
> It will also cover using the JDBC expression to *analyze* and *visualize* results from any JDBC compliant data source.
> Intro from the guide:
> {code:java}
> Streaming Expressions exposes the capabilities of Solr Cloud as composable functions. These functions provide a system for searching, transforming, analyzing and visualizing data stored in Solr Cloud collections.
> At a high level there are four main capabilities that will be explored in the documentation:
> * Searching, sampling and aggregating results from Solr.
> * Transforming result sets after they are retrieved from Solr.
> * Analyzing and modeling result sets using probability and statistics and machine learning libraries.
> * Visualizing result sets, aggregations and statistical models of the data.
> {code}
>
> A few sample visualizations are attached to the ticket.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)