[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153233#comment-17153233 ] Ishan Chattopadhyaya commented on SOLR-14616: - bq. Ishan Chattopadhyaya I put one thing I found by running "gradlew check" up on the PR, basically there's still a reference in the ref guide "major changes from 5 to 6" that references other pages in the CDCR ref guide. Oops, I'll take a look. Thanks for the review. bq. I also have a repeatable failure on this test; why, I haven't a clue, as I haven't looked at it at all: ./gradlew :solr:core:test --tests "org.apache.solr.cloud.DistribDocExpirationUpdateProcessorTest" -Ptests.seed=E5E0398C331D4F6B -Ptests.file.encoding=US-ASCII I'll take a look! :-) > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153129#comment-17153129 ] Erick Erickson commented on SOLR-14616: --- I'm also reluctant to jerk something out from under users, but in this case: where's the evidence that CDCR is useful in the field? I've seen a couple of clients manage, with great difficulty, to get it running. Sort of. With lots of custom monitoring in place. With infinite TLOG growth, disk full issues. Index corruption due to disk full. And on and on. Personally, I think we do people a favor by not letting them even try to go down this road in 9.x. If they're desperate for a solution, one option is to stay on 8.x until we put something in place in 9.?. If we do. I'm not even sure Solr should have anything built in to even try to maintain cross-DC synchronization. If we do leave it in 9.0, we're effectively saying to users "There's this thing called CDCR, good luck. Don't ask us for help 'cause we aren't going to fix anything". Not a pretty message either. What's the "least worst" option here? [~ichattopadhyaya] I put one thing I found by running "gradlew check" up on the PR, basically there's still a reference in the ref guide "major changes from 5 to 6" that references other pages in the CDCR ref guide. I also have a repeatable failure on this test; why, I haven't a clue, as I haven't looked at it at all: ./gradlew :solr:core:test --tests "org.apache.solr.cloud.DistribDocExpirationUpdateProcessorTest" -Ptests.seed=E5E0398C331D4F6B -Ptests.file.encoding=US-ASCII > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Commented] (SOLR-14635) improve ThreadDumpHandler to show more info related to locks
[ https://issues.apache.org/jira/browse/SOLR-14635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153128#comment-17153128 ] Lucene/Solr QA commented on SOLR-14635: ---
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 13s{color} | {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 8s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | solr.handler.TestContainerPlugin |
| | solr.util.TestCircuitBreaker |
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14635 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007231/SOLR-14635.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.15.0-108-generic #109-Ubuntu SMP Fri Jun 19 11:33:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / d3f4b21deb0 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| unit | https://builds.apache.org/job/PreCommit-SOLR-Build/776/artifact/out/patch-unit-solr_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/776/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/776/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated. > improve ThreadDumpHandler to show more info related to locks > > > Key: SOLR-14635 > URL: https://issues.apache.org/jira/browse/SOLR-14635 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14635.patch > > > Having recently spent some time trying to use ThreadDumpHandler to diagnose a > "lock leak" I realized there are quite a few bits of info available from the > ThreadMXBean/ThreadInfo data structures that are not included in the response, > and I think we should add them: > * switch from {{findMonitorDeadlockedThreads()}} to > {{findDeadlockedThreads()}} to also detect deadlocks from ownable > synchronizers (ie: ReentrantLocks) > * for each thread: > ** in addition to outputting the current {{getLockName()}} when a thread is > blocked/waiting, return info about the lock owner when available. > *** there's already dead code checking this and then throwing away the info > ** return the list of all locks (both monitors and ownable synchronizers) > held by each thread
[jira] [Updated] (LUCENE-9423) Leaking FileChannel in NIOFSDirectory#openInput
[ https://issues.apache.org/jira/browse/LUCENE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen updated LUCENE-9423: Status: Patch Available (was: Open) > Leaking FileChannel in NIOFSDirectory#openInput > --- > > Key: LUCENE-9423 > URL: https://issues.apache.org/jira/browse/LUCENE-9423 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Affects Versions: master (9.0), 8.7 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > If we fail to get the > [size|https://github.com/apache/lucene-solr/blob/82692e76e054d3e6938034e96a4e9632bd9f7a70/lucene/core/src/java/org/apache/lucene/store/NIOFSDirectory.java#L107] > of a file in the constructor of NIOFSIndexInput, then we will leak a > FileChannel opened in NIOFSDirectory#openInput. This bug was discovered by a > test failure in > [Elasticsearch|https://github.com/elastic/elasticsearch/issues/39585#issuecomment-654995186].
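For readers following the issue, a minimal self-contained sketch of the failure mode and the conventional guard. This is illustrative only, not the patch in PR #1658; the method below is invented for the example.

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class OpenInputSketch {
  // If channel.size() throws after FileChannel.open() succeeded, the channel
  // must be closed before the exception propagates, or it leaks.
  static long openAndSize(Path path) throws IOException {
    FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
    boolean success = false;
    try {
      long size = channel.size(); // the call that can fail (NIOFSDirectory.java#L107)
      success = true;
      return size;
    } finally {
      if (success == false) {
        channel.close(); // without this, the channel leaks on exception
      }
    }
  }
}
{code}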
[GitHub] [lucene-solr] dnhatn opened a new pull request #1658: LUCENE-9423: Handle exception in NIOFSDirectory#openInput
dnhatn opened a new pull request #1658: URL: https://github.com/apache/lucene-solr/pull/1658 If we fail to get the size of a file in the constructor of NIOFSIndexInput, then we will leak a FileChannel opened in NIOFSDirectory#openInput.
[jira] [Updated] (LUCENE-9423) Leaking FileChannel in NIOFSDirectory#openInput
[ https://issues.apache.org/jira/browse/LUCENE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen updated LUCENE-9423: Component/s: core/store Affects Version/s: 8.7 master (9.0) > Leaking FileChannel in NIOFSDirectory#openInput > --- > > Key: LUCENE-9423 > URL: https://issues.apache.org/jira/browse/LUCENE-9423 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Affects Versions: master (9.0), 8.7 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > > If we fail to get the > [size|https://github.com/apache/lucene-solr/blob/82692e76e054d3e6938034e96a4e9632bd9f7a70/lucene/core/src/java/org/apache/lucene/store/NIOFSDirectory.java#L107] > of a file in the constructor of NIOFSIndexInput, then we will leak a > FileChannel opened in NIOFSDirectory#openInput. This bug was discovered by a > test failure in > [Elasticsearch|https://github.com/elastic/elasticsearch/issues/39585#issuecomment-654995186].
[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153079#comment-17153079 ] Ishan Chattopadhyaya commented on SOLR-14616: - It is totally broken. I've heard reports of the source cluster going down due to instabilities in the target cluster. I'm +1 on removing it even in 8.7. bq. That way we maintain the back-compat I don't think we should hold our breath on having a comparable solution by 9.0. If it lands in 9.0, that's fine. But CDCR in its current form must not just be discouraged but also disallowed. bq. That will also give us time to come up with alternate options. We have plenty of time to come up with alternatives right now. If such an alternative is built now, that is great. If not, we shouldn't hold back. > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Updated] (LUCENE-9423) Leaking FileChannel in NIOFSDirectory#openInput
[ https://issues.apache.org/jira/browse/LUCENE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen updated LUCENE-9423: Description: If we fail to get the [size|https://github.com/apache/lucene-solr/blob/82692e76e054d3e6938034e96a4e9632bd9f7a70/lucene/core/src/java/org/apache/lucene/store/NIOFSDirectory.java#L107] of a file in the constructor of NIOFSIndexInput, then we will leak a FileChannel opened in NIOFSDirectory#openInput. This bug was discovered by a test failure in [Elasticsearch|https://github.com/elastic/elasticsearch/issues/39585#issuecomment-654995186]. (was: If we fail to get the [size|https://github.com/apache/lucene-solr/blob/82692e76e054d3e6938034e96a4e9632bd9f7a70/lucene/core/src/java/org/apache/lucene/store/NIOFSDirectory.java#L107] of a file in the constructor of NIOFSIndexInput, then we will leak a FileChannel opened in NIOFSDirectory#openInput.) > Leaking FileChannel in NIOFSDirectory#openInput > --- > > Key: LUCENE-9423 > URL: https://issues.apache.org/jira/browse/LUCENE-9423 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > > If we fail to get the > [size|https://github.com/apache/lucene-solr/blob/82692e76e054d3e6938034e96a4e9632bd9f7a70/lucene/core/src/java/org/apache/lucene/store/NIOFSDirectory.java#L107] > of a file in the constructor of NIOFSIndexInput, then we will leak a > FileChannel opened in NIOFSDirectory#openInput. This bug was discovered by a > test failure in > [Elasticsearch|https://github.com/elastic/elasticsearch/issues/39585#issuecomment-654995186].
[jira] [Created] (LUCENE-9423) Leaking FileChannel in NIOFSDirectory#openInput
Nhat Nguyen created LUCENE-9423: --- Summary: Leaking FileChannel in NIOFSDirectory#openInput Key: LUCENE-9423 URL: https://issues.apache.org/jira/browse/LUCENE-9423 Project: Lucene - Core Issue Type: Bug Reporter: Nhat Nguyen Assignee: Nhat Nguyen If we fail to get the [size|https://github.com/apache/lucene-solr/blob/82692e76e054d3e6938034e96a4e9632bd9f7a70/lucene/core/src/java/org/apache/lucene/store/NIOFSDirectory.java#L107] of a file in the constructor of NIOFSIndexInput, then we will leak a FileChannel opened in NIOFSDirectory#openInput.
[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153076#comment-17153076 ] Anshum Gupta commented on SOLR-14616: - We can always remove the code in 9.1. That way we maintain the back-compat, allow people to upgrade to 9.0 without losing any features, and then remove the already deprecated features in the new release. That will also give us time to come up with alternate options. > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153073#comment-17153073 ] Ishan Chattopadhyaya commented on SOLR-14616: - There is plenty of time. I think there will be two more 8.x releases. CDCR is a totally broken solution. Whether or not an alternative exists, we cannot carry the burden of CDCR for another full major release. bq. I've never used this CDCR approach, but saw questions and answers about it in the mailing lists. It is good that we deprecated it now, and now we can just tell people not to use it. > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153063#comment-17153063 ] Anshum Gupta commented on SOLR-14616: - +1 [~tflobbe] I don't think we should rush into removing stuff even if it's broken. There are ways people are currently using it, and this is too short a notice for them to move off and find another way of solving this problem. Whatever alternative we come up with should be allowed to bake, and not be released for mainstream consumption right away. Removing this would not allow us to do that. > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Commented] (SOLR-14616) Remove CDCR from 9.0
[ https://issues.apache.org/jira/browse/SOLR-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153053#comment-17153053 ] Houston Putman commented on SOLR-14616: --- I agree with Tomas. I think it's very rushed to deprecate something in one of the last minor releases and remove it in the next major version. The point of deprecation is to give people a warning and time to move to other solutions, and I don't think this provides adequate time. Also if we wait to remove until 10, we can hopefully have a replacement solution in place before removing the functionality. > Remove CDCR from 9.0 > > > Key: SOLR-14616 > URL: https://issues.apache.org/jira/browse/SOLR-14616 > Project: Solr > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h > > This was deprecated in SOLR-14022 and should be removed in 9.0.
[jira] [Resolved] (LUCENE-9280) Add ability to skip non-competitive documents on field sort
[ https://issues.apache.org/jira/browse/LUCENE-9280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova resolved LUCENE-9280. - Fix Version/s: master (9.0) Resolution: Fixed > Add ability to skip non-competitive documents on field sort > > > Key: LUCENE-9280 > URL: https://issues.apache.org/jira/browse/LUCENE-9280 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Priority: Minor > Fix For: master (9.0) > > Time Spent: 18h 20m > Remaining Estimate: 0h > > Today collectors, once they collect enough docs, can instruct scorers to > update their iterators to skip non-competitive documents. This is applicable > only when we need top docs by _score. > It would be nice to also have the ability to skip non-competitive docs when we > need top docs sorted by fields other than _score.
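As background on what "skipping non-competitive documents" means, here is a toy, self-contained illustration. Every name in it is invented; the real Lucene mechanism lives in collector and scorer internals rather than a standalone loop.

{code:java}
import java.util.PriorityQueue;

public final class SkipSketch {
  // Hypothetical illustration only: keep the top-n largest sort values.
  // Once the queue is full, 'bound' is the worst still-competitive value;
  // a smarter iterator could skip whole ranges of docs that can't beat it.
  static long[] topN(long[] sortValues, int n) {
    PriorityQueue<Long> queue = new PriorityQueue<>(); // min-heap of kept values
    long bound = Long.MIN_VALUE;
    for (long v : sortValues) {
      if (queue.size() == n && v <= bound) {
        continue; // non-competitive: this is the work being skipped
      }
      queue.offer(v);
      if (queue.size() > n) {
        queue.poll(); // evict the worst entry
      }
      bound = queue.peek(); // current worst value still in the top n
    }
    return queue.stream().mapToLong(Long::longValue).sorted().toArray();
  }
}
{code}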
[jira] [Commented] (LUCENE-9280) Add ability to skip non-competitive documents on field sort
[ https://issues.apache.org/jira/browse/LUCENE-9280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153052#comment-17153052 ] Mayya Sharipova commented on LUCENE-9280: - [~mikemccand] Thanks for checking it. I will resolve the issue. We have a separate [issue|https://issues.apache.org/jira/browse/LUCENE-9384] for backporting to 8.x, as it requires introducing a new parameter for SortField. > Add ability to skip non-competitive documents on field sort > > > Key: LUCENE-9280 > URL: https://issues.apache.org/jira/browse/LUCENE-9280 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Priority: Minor > Time Spent: 18h 20m > Remaining Estimate: 0h > > Today collectors, once they collect enough docs, can instruct scorers to > update their iterators to skip non-competitive documents. This is applicable > only when we need top docs by _score. > It would be nice to also have the ability to skip non-competitive docs when we > need top docs sorted by fields other than _score.
[jira] [Commented] (SOLR-14021) Deprecate HDFS support from 8x
[ https://issues.apache.org/jira/browse/SOLR-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152979#comment-17152979 ] Ishan Chattopadhyaya commented on SOLR-14021: - Thanks for chiming in, [~gezapeti]. So far, [~atris] and you have expressed interest in supporting the efforts towards a packaged version of HDFS support. I offer my full co-operation towards that effort. As a first step, we need to identify the gaps in the current package manager that we need to plug. It would be great if Cloudera could manage the package going forward. I'm +1 to defer removal from 9.0 until package manager based HDFS support looks possible. > Deprecate HDFS support from 8x > -- > > Key: SOLR-14021 > URL: https://issues.apache.org/jira/browse/SOLR-14021 > Project: Solr > Issue Type: Improvement > Components: hdfs >Reporter: Joel Bernstein >Assignee: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.6 > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket is to deprecate HDFS support from 8x. > There appears to be growing consensus among committers that it's time to > start removing features so committers can have a manageable system to > maintain. HDFS has come up a number of times as needing to be removed. The > HDFS tests have not been maintained over the years and fail frequently. We > need to start removing features that no one cares about enough to even > maintain the tests.
[jira] [Created] (SOLR-14636) Provide a reference implementation for SolrCloud that is stable and fast.
Mark Robert Miller created SOLR-14636: - Summary: Provide a reference implementation for SolrCloud that is stable and fast. Key: SOLR-14636 URL: https://issues.apache.org/jira/browse/SOLR-14636 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Reporter: Mark Robert Miller Assignee: Mark Robert Miller SolrCloud powers critical infrastructure and needs the ability to run quickly with stability. This reference implementation will allow for this.
[jira] [Commented] (SOLR-14021) Deprecate HDFS support from 8x
[ https://issues.apache.org/jira/browse/SOLR-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152971#comment-17152971 ] Gézapeti commented on SOLR-14021: - We at Cloudera are relying heavily on HDFS support. We understand that it's a burden to support and we're happy to help with it. We're not happy about the deprecation, but the amount of dependencies pulled in by HDFS is outrageous. I'd ask not to remove it until there is proper package manager support for the functions it provides or needs. > Deprecate HDFS support from 8x > -- > > Key: SOLR-14021 > URL: https://issues.apache.org/jira/browse/SOLR-14021 > Project: Solr > Issue Type: Improvement > Components: hdfs >Reporter: Joel Bernstein >Assignee: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.6 > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket is to deprecate HDFS support from 8x. > There appears to be growing consensus among committers that it's time to > start removing features so committers can have a manageable system to > maintain. HDFS has come up a number of times as needing to be removed. The > HDFS tests have not been maintained over the years and fail frequently. We > need to start removing features that no one cares about enough to even > maintain the tests.
[jira] [Updated] (SOLR-14635) improve ThreadDumpHandler to show more info related to locks
[ https://issues.apache.org/jira/browse/SOLR-14635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14635: -- Attachment: SOLR-14635.patch Assignee: Chris M. Hostetter Status: Open (was: Open) patch with test > improve ThreadDumpHandler to show more info related to locks > > > Key: SOLR-14635 > URL: https://issues.apache.org/jira/browse/SOLR-14635 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14635.patch > > > Having recently spent some time trying to use ThreadDumpHandler to diagnose a > "lock leak" I realized there are quite a few bits of info available from the > ThreadMXBean/ThreadInfo data structures that are not included in the response, > and I think we should add them: > * switch from {{findMonitorDeadlockedThreads()}} to > {{findDeadlockedThreads()}} to also detect deadlocks from ownable > synchronizers (ie: ReentrantLocks) > * for each thread: > ** in addition to outputting the current {{getLockName()}} when a thread is > blocked/waiting, return info about the lock owner when available. > *** there's already dead code checking this and then throwing away the info > ** return the list of all locks (both monitors and ownable synchronizers) > held by each thread
[jira] [Updated] (SOLR-14635) improve ThreadDumpHandler to show more info related to locks
[ https://issues.apache.org/jira/browse/SOLR-14635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14635: -- Status: Patch Available (was: Open) > improve ThreadDumpHandler to show more info related to locks > > > Key: SOLR-14635 > URL: https://issues.apache.org/jira/browse/SOLR-14635 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14635.patch > > > Having recently spent some time trying to use ThreadDumpHandler to diagnose a > "lock leak" I realized there are quite a few bits of info available from the > ThreadMXBean/ThreadInfo data structures that are not included in the response, > and I think we should add them: > * switch from {{findMonitorDeadlockedThreads()}} to > {{findDeadlockedThreads()}} to also detect deadlocks from ownable > synchronizers (ie: ReentrantLocks) > * for each thread: > ** in addition to outputting the current {{getLockName()}} when a thread is > blocked/waiting, return info about the lock owner when available. > *** there's already dead code checking this and then throwing away the info > ** return the list of all locks (both monitors and ownable synchronizers) > held by each thread
[jira] [Created] (SOLR-14635) improve ThreadDumpHandler to show more info related to locks
Chris M. Hostetter created SOLR-14635: - Summary: improve ThreadDumpHandler to show more info related to locks Key: SOLR-14635 URL: https://issues.apache.org/jira/browse/SOLR-14635 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter Having recently spent some time trying to use ThreadDumpHandler to diagnose a "lock leak" I realized there are quite a few bits of info available from the ThreadMXBean/ThreadInfo data structures that are not included in the response, and I think we should add them: * switch from {{findMonitorDeadlockedThreads()}} to {{findDeadlockedThreads()}} to also detect deadlocks from ownable synchronizers (ie: ReentrantLocks) * for each thread: ** in addition to outputting the current {{getLockName()}} when a thread is blocked/waiting, return info about the lock owner when available. *** there's already dead code checking this and then throwing away the info ** return the list of all locks (both monitors and ownable synchronizers) held by each thread
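For context, a minimal standalone sketch (not the attached patch) of the standard java.lang.management calls the issue refers to; {{findDeadlockedThreads()}} and the lock/owner accessors below are the JDK APIs named above:

{code:java}
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class LockDumpSketch {
  public static void main(String[] args) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // covers ownable synchronizers (e.g. ReentrantLock), not just monitors
    long[] deadlocked = bean.findDeadlockedThreads();
    System.out.println("deadlocked: " + (deadlocked == null ? 0 : deadlocked.length));
    // true, true => also report locked monitors and locked synchronizers
    for (ThreadInfo ti : bean.getThreadInfo(bean.getAllThreadIds(), true, true)) {
      if (ti == null) continue;
      LockInfo waitingOn = ti.getLockInfo();
      if (waitingOn != null) {
        // the owner info the issue says is currently checked and then discarded
        System.out.printf("%s waiting on %s held by %s (id=%d)%n",
            ti.getThreadName(), waitingOn, ti.getLockOwnerName(), ti.getLockOwnerId());
      }
      for (MonitorInfo m : ti.getLockedMonitors()) {
        System.out.printf("%s holds monitor %s%n", ti.getThreadName(), m);
      }
      for (LockInfo s : ti.getLockedSynchronizers()) {
        System.out.printf("%s holds synchronizer %s%n", ti.getThreadName(), s);
      }
    }
  }
}
{code}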
[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness
[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152953#comment-17152953 ] Chris M. Hostetter commented on SOLR-13132: --- bq. I'll try to pull something together that encapsulates these variables in an illustrative way: comparing with filterCache sufficient to hold all terms, filterCache disabled, filterCache thrashing (probably worst-case, and most realistic), and for different cardinality fields and DocSet domains. Yeah .. we don't need to go crazy, but it would be nice to double check that as the branch stands now: * using sweep on a field whose cardinality is much greater than the filterCache size *IS* _faster_ than master. * using sweep on a field whose cardinality is smaller than the filterCache size *IS NOT* _significantly slower_ than master. ** even if it is: that's not a deal breaker, we then just need to *** check that in this case: disabling sweep *IS* _as fast_ as master *** document the trade off of disabling sweep Speaking of docs: I skimmed the state of the branch and the ref-guide edits look out of date from some of the code changes? (mentions {{disable_sweep_collection}} and {{cacheDf}}) ... can you please fix? > Improve JSON "terms" facet performance when sorted by relatedness > -- > > Key: SOLR-13132 > URL: https://issues.apache.org/jira/browse/SOLR-13132 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 7.4, master (9.0) >Reporter: Michael Gibney >Priority: Major > Attachments: SOLR-13132-with-cache-01.patch, > SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate > {{relatedness}} for every term. > The current implementation uses a standard uninverted approach (either > {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain > base docSet, and then uses that initial pass as a pre-filter for a > second-pass, inverted approach of fetching docSets for each relevant term > (i.e., {{count > minCount}}?) and calculating intersection size of those sets > with the domain base docSet. > Over high-cardinality fields, the overhead of per-term docSet creation and > set intersection operations increases request latency to the point where > relatedness sort may not be usable in practice (for my use case, even after > applying the patch for SOLR-13108, for a field with ~220k unique terms per > core, QTime for high-cardinality domain docSets were, e.g.: cardinality > 1816684=9000ms, cardinality 5032902=18000ms). > The attached patch brings the above example QTimes down to a manageable > ~300ms and ~250ms respectively. The approach calculates uninverted facet > counts over domain base, foreground, and background docSets in parallel in a > single pass. This allows us to take advantage of the efficiencies built into > the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids > the per-term docSet creation and set intersection overhead.
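For readers who haven't used the feature being benchmarked, a request shaped like the one below is what triggers a per-term relatedness computation. The field name and the fore/back queries are made up; the {{relatedness($fore,$back)}} syntax follows the JSON Facet API documentation.

{code:json}
{
  "query": "*:*",
  "params": { "fore": "inStock:true", "back": "*:*" },
  "facet": {
    "categories": {
      "type": "terms",
      "field": "cat",
      "sort": { "r1": "desc" },
      "facet": { "r1": "relatedness($fore,$back)" }
    }
  }
}
{code}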
[GitHub] [lucene-solr] ErickErickson commented on pull request #1648: SOLR-14616: Remove CDCR from Solr 9.x
ErickErickson commented on pull request #1648: URL: https://github.com/apache/lucene-solr/pull/1648#issuecomment-654988979 Relative link points at a dest file that doesn't exist: cross-data-center-replication-cdcr.html#cross-data-center-replication-cdcr ... source: file:/Users/Erick/apache/solrVersions/playspace/solr/solr-ref-guide/build/bare-bones-html/major-changes-from-solr-5-to-solr-6.html Maybe a message there that, instead of linking to the CDCR page, says it's been deprecated and perhaps is not recommended?
[jira] [Commented] (SOLR-10814) Solr RuleBasedAuthorization config doesn't work seamlessly with kerberos authentication
[ https://issues.apache.org/jira/browse/SOLR-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152884#comment-17152884 ] ASF subversion and git services commented on SOLR-10814: Commit d3f4b21deb0056098e9e888a6b9d72e0bf2d0834 in lucene-solr's branch refs/heads/master from Mike Drob [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d3f4b21 ] SOLR-10814 Add short-name feature to RuleBasedAuthz plugin Additional-Author: Hrishikesh Gadre > Solr RuleBasedAuthorization config doesn't work seamlessly with kerberos > authentication > --- > > Key: SOLR-10814 > URL: https://issues.apache.org/jira/browse/SOLR-10814 > Project: Solr > Issue Type: Bug >Affects Versions: 6.2 >Reporter: Hrishikesh Gadre >Priority: Major > Attachments: SOLR-10814.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Solr allows configuring roles to control user access to the system. This is > accomplished through rule-based permission definitions which are assigned to > users. > The authorization framework in Solr passes the information about the request > (to be authorized) using an instance of the AuthorizationContext class. Currently > the only way to extract the authenticated user is via the getUserPrincipal() method > which returns an instance of the java.security.Principal class. The > RuleBasedAuthorizationPlugin implementation invokes the getName() method on the > Principal instance to fetch the list of associated roles. > https://github.com/apache/lucene-solr/blob/2271e73e763b17f971731f6f69d6ffe46c40b944/solr/core/src/java/org/apache/solr/security/RuleBasedAuthorizationPlugin.java#L156 > In the case of the basic authentication mechanism, the principal is the userName, > hence it works fine. But in the case of Kerberos authentication, the user > principal also contains the REALM information, e.g. instead of foo, it would > return f...@example.com. This means if the user changes the authentication > mechanism, he would also need to change the user-role mapping in the > authorization section to use f...@example.com instead of foo. This is not > good from a usability perspective.
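A stripped-down illustration of the mismatch the commit addresses. The helper below is invented for the example; the committed change wires an equivalent short-name mapping into the RuleBasedAuthorization plugin's configuration rather than a free-standing utility.

{code:java}
import java.security.Principal;

public final class ShortNameSketch {
  // With Kerberos, Principal#getName() returns something like "foo@EXAMPLE.COM",
  // so role rules keyed on the short name "foo" never match until the realm
  // is stripped off.
  static String shortName(Principal principal) {
    String name = principal.getName();
    int at = name.indexOf('@');
    return at < 0 ? name : name.substring(0, at); // strip the realm suffix
  }
}
{code}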
[jira] [Assigned] (SOLR-14608) Faster sorting for the /export handler
[ https://issues.apache.org/jira/browse/SOLR-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned SOLR-14608: --- Assignee: Andrzej Bialecki > Faster sorting for the /export handler > -- > > Key: SOLR-14608 > URL: https://issues.apache.org/jira/browse/SOLR-14608 > Project: Solr > Issue Type: New Feature > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Joel Bernstein >Assignee: Andrzej Bialecki >Priority: Major > > The largest cost of the export handler is the sorting. This ticket will > implement an improved algorithm for sorting that should greatly increase > overall throughput for the export handler. > *The current algorithm is as follows:* > Collect a bitset of matching docs. Iterate over that bitset, materialize > the top level ordinals for the sort fields in the document and add them to a > priority queue of size 30,000. Then export the top 30,000 docs, turn off the > bits in the bit set and iterate again until all docs are sorted and sent. > There are two performance bottlenecks with this approach: > 1) Materializing the top level ordinals adds a huge amount of overhead to the > sorting process. > 2) The size of the priority queue, 30,000, adds significant overhead to sorting > operations. > *The new algorithm:* > Has a top level *merge sort iterator* that wraps segment level iterators that > perform segment level priority queue sorts. > *Segment level:* > The segment level docset will be iterated and the segment level ordinals for > the sort fields will be materialized and added to a segment level priority > queue. As the segment level iterator pops docs from the priority queue, the > top level ordinals for the sort fields are materialized. Because the top > level ordinals are materialized AFTER the sort, they only need to be looked > up when the segment level ordinal changes. This takes advantage of the sort > to limit the lookups into the top level ordinal structures. This also > eliminates redundant lookups of top level ordinals that occur during the > multiple passes over the matching docset. > The segment level priority queues can be kept smaller than 30,000 to improve > performance of the sorting operations because the overall batch size will > still be 30,000 or greater when all the segment priority queue sizes are > added up. This allows for batch sizes much larger than 30,000 without using a > single large priority queue. The increased batch size means fewer iterations > over the matching docset and the decreased priority queue size means faster > sorting operations. > *Top level:* > A top level iterator does a merge sort over the segment level iterators by > comparing the top level ordinals materialized when the segment level docs are > popped from the segment level priority queues. This requires no extra memory > and will be very performant.
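A toy sketch of the two-level shape described above. Every type and method name here is invented for illustration; the real implementation works over Lucene docsets and ordinal maps rather than these stand-ins.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public final class MergeSortSketch {
  // Invented stand-in for a per-segment iterator whose docs already come out
  // in segment sort order from a small segment-level priority queue.
  interface SegmentDocs {
    boolean advance();   // move to the next doc in segment sort order
    long globalOrd();    // top-level ordinal, looked up lazily after the segment sort
    int doc();
  }

  static void export(List<SegmentDocs> segments) {
    // top-level merge sort: order segments by the global ordinal of their current doc
    PriorityQueue<SegmentDocs> merge =
        new PriorityQueue<>(Comparator.comparingLong(SegmentDocs::globalOrd));
    for (SegmentDocs seg : segments) {
      if (seg.advance()) merge.add(seg);
    }
    while (!merge.isEmpty()) {
      SegmentDocs top = merge.poll();
      System.out.println("emit doc " + top.doc()); // docs emerge in global sort order
      if (top.advance()) merge.add(top);           // reinsert with its next doc
    }
  }
}
{code}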
[jira] [Commented] (LUCENE-9280) Add ability to skip non-competitive documents on field sort
[ https://issues.apache.org/jira/browse/LUCENE-9280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152843#comment-17152843 ] Michael McCandless commented on LUCENE-9280: Can this be resolved? Will we backport to 8.x? > Add ability to skip non-competitive documents on field sort > > > Key: LUCENE-9280 > URL: https://issues.apache.org/jira/browse/LUCENE-9280 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Priority: Minor > Time Spent: 18h 20m > Remaining Estimate: 0h > > Today collectors, once they collect enough docs, can instruct scorers to > update their iterators to skip non-competitive documents. This is applicable > only when we need top docs by _score. > It would be nice to also have the ability to skip non-competitive docs when we > need top docs sorted by fields other than _score.
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1541: RegExp - add case insensitive matching option
jpountz commented on a change in pull request #1541: URL: https://github.com/apache/lucene-solr/pull/1541#discussion_r450944716 ## File path: lucene/core/src/java/org/apache/lucene/search/RegexpQuery.java ## @@ -96,16 +96,46 @@ public RegexpQuery(Term term, int flags, int maxDeterminizedStates) { * Constructs a query for terms matching term. * * @param term regular expression. - * @param flags optional RegExp features from {@link RegExp} + * @param maxDeterminizedStates maximum number of states that compiling the + * @param syntax_flags optional RegExp syntax features from {@link RegExp} + * automaton for the regexp can result in. Set higher to allow more complex + * queries and lower to prevent memory exhaustion. + * @param match_flags boolean 'or' of match behavior options such as case insensitivity + */ + public RegexpQuery(Term term, int maxDeterminizedStates, int syntax_flags, int match_flags) { +this(term, defaultProvider, maxDeterminizedStates, syntax_flags, match_flags); + } + + /** + * Constructs a query for terms matching term. + * + * @param term regular expression. + * @param syntax_flags optional RegExp features from {@link RegExp} * @param provider custom AutomatonProvider for named automata * @param maxDeterminizedStates maximum number of states that compiling the * automaton for the regexp can result in. Set higher to allow more complex * queries and lower to prevent memory exhaustion. */ - public RegexpQuery(Term term, int flags, AutomatonProvider provider, + public RegexpQuery(Term term, int syntax_flags, AutomatonProvider provider, int maxDeterminizedStates) { +this(term, provider, maxDeterminizedStates, syntax_flags, 0); + } + + /** + * Constructs a query for terms matching term. + * + * @param term regular expression. + * @param syntax_flags optional RegExp features from {@link RegExp} + * @param provider custom AutomatonProvider for named automata + * @param maxDeterminizedStates maximum number of states that compiling the + * automaton for the regexp can result in. Set higher to allow more complex + * queries and lower to prevent memory exhaustion. + * @param match_flags boolean 'or' of match behavior options such as case insensitivity + */ + public RegexpQuery(Term term, AutomatonProvider provider, + int maxDeterminizedStates, int syntax_flags, int match_flags) { Review comment: In my opinion, all constructors should always take parameters in the same order. The current longest constructor does `RegexpQuery(Term term, int syntaxFlags, AutomatonProvider provider, int maxDeterminizedStates)`, so I think that this one should be `RegexpQuery(Term term, int syntaxFlags, int matchFlags, AutomatonProvider provider, int maxDeterminizedStates)`. ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java ## @@ -499,10 +508,29 @@ public RegExp(String s) throws IllegalArgumentException { * regular expression */ public RegExp(String s, int syntax_flags) throws IllegalArgumentException { +this(s, syntax_flags, 0); + } + /** + * Constructs new RegExp from a string. 
+ * + * @param s regexp string + * @param syntax_flags boolean 'or' of optional syntax constructs to be + * enabled + * @param match_flags boolean 'or' of match behavior options such as case insensitivity + * @exception IllegalArgumentException if an error occurred while parsing the + * regular expression + */ + public RegExp(String s, int syntax_flags, int match_flags) throws IllegalArgumentException { +// (for BWC reasons we don't validate invalid bits, just trim instead) +syntax_flags = syntax_flags & 0xff; Review comment: I don't think we need to maintain bw compat for this; is there any test that fails if you remove this line? ## File path: lucene/core/src/java/org/apache/lucene/search/RegexpQuery.java ## @@ -96,16 +96,46 @@ public RegexpQuery(Term term, int flags, int maxDeterminizedStates) { * Constructs a query for terms matching term. * * @param term regular expression. - * @param flags optional RegExp features from {@link RegExp} + * @param maxDeterminizedStates maximum number of states that compiling the + * @param syntax_flags optional RegExp syntax features from {@link RegExp} + * automaton for the regexp can result in. Set higher to allow more complex + * queries and lower to prevent memory exhaustion. + * @param match_flags boolean 'or' of match behavior options such as case insensitivity + */ + public RegexpQuery(Term term, int maxDeterminizedStates, int syntax_flags, int match_flags) { Review comment: I'd keep maxDeterminizedStates last so that all constructors take parameters in the same order.
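Going only by the constructors visible in this diff, usage of the new match flags would look roughly as follows. The constant name ASCII_CASE_INSENSITIVE is an assumption about the rest of the PR, and the parameter order is exactly what is under discussion above.

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RegexpQuery;
import org.apache.lucene.util.automaton.RegExp;

public final class CaseInsensitiveRegexpSketch {
  // RegExp(String, int syntax_flags, int match_flags), as added by the diff above
  static RegExp parse() {
    return new RegExp("foo.*bar", RegExp.ALL, RegExp.ASCII_CASE_INSENSITIVE);
  }

  // RegexpQuery(Term, int maxDeterminizedStates, int syntax_flags, int match_flags)
  // in the ordering shown in the diff (the review suggests reordering it)
  static Query query() {
    return new RegexpQuery(new Term("body", "foo.*bar"), 10000, RegExp.ALL,
        RegExp.ASCII_CASE_INSENSITIVE);
  }
}
{code}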
[jira] [Commented] (SOLR-14462) Autoscaling placement wrong with concurrent collection creations
[ https://issues.apache.org/jira/browse/SOLR-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152822#comment-17152822 ] ASF subversion and git services commented on SOLR-14462: Commit e65631e026c5bb5c8eeb6fd1351bf798c0c6985c in lucene-solr's branch refs/heads/branch_8x from Ilan Ginzburg [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e65631e ] SOLR-14462: adjust test so less sessions are used even if test runs slowly. fix synchronization issue. (#1657) cherry picked from 06b1f3e86694b35365fd569a0581b1f6fc2cadb3 > Autoscaling placement wrong with concurrent collection creations > > > Key: SOLR-14462 > URL: https://issues.apache.org/jira/browse/SOLR-14462 > Project: Solr > Issue Type: Bug > Components: AutoScaling >Affects Versions: master (9.0), 8.6 >Reporter: Ilan Ginzburg >Assignee: Ilanm >Priority: Major > Fix For: 8.6 > > Attachments: PolicyHelperNewLogs.txt, policylogs.txt > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Under concurrent collection creation, wrong Autoscaling placement decisions > can lead to severely unbalanced clusters. > Sequential creation of the same collections is handled correctly and the > cluster is balanced. > *TL;DR;* under high load, the way sessions that cache future changes to > Zookeeper are managed causes placement decisions of multiple concurrent > Collection API calls to ignore each other, be based on identical “initial” > cluster state, possibly leading to identical placement decisions and as a > consequence cluster imbalance. > *Some context first* for those less familiar with how Autoscaling deals with > cluster state change: a PolicyHelper.Session is created with a snapshot of > the Zookeeper cluster state and is used to track already decided but not yet > persisted to Zookeeper cluster state changes so that Collection API commands > can make the right placement decisions. > A Collection API command either uses an existing cached Session (that > includes changes computed by previous command(s)) or creates a new Session > initialized from the Zookeeper cluster state (i.e. with only state changes > already persisted). > When a Collection API command requires a Session - and one is needed for any > cluster state update computation - if one exists but is currently in use, the > command can wait up to 10 seconds. If the session becomes available, it is > reused. Otherwise, a new one is created. > The Session lifecycle is as follows: it is created in COMPUTING state by a > Collection API command and is initialized with a snapshot of cluster state > from Zookeeper (does not require a Zookeeper read, this is running on > Overseer that maintains a cache of cluster state). The command has exclusive > access to the Session and can change the state of the Session. When the > command is done changing the Session, the Session is “returned” and its state > changes to EXECUTING while the command continues to run to persist the state > to Zookeeper and interact with the nodes, but no longer interacts with the > Session. Another command can then grab a Session in EXECUTING state, change > its state to COMPUTING to compute new changes taking into account previous > changes. When all commands having used the session have completed their work, > the session is “released” and destroyed (at this stage, Zookeeper contains > all the state changes that were computed using that Session). > The issue arises when multiple Collection API commands are executed at once.
> A first Session is created and commands start using it one by one. In a > simple 1 shard 1 replica collection creation test run with 100 parallel > Collection API requests (see debug logs from PolicyHelper in file > policy.logs), this Session update phase (Session in COMPUTING status in > SessionWrapper) takes about 250-300ms (MacBook Pro). > This means that about 40 commands can run by using in turn the same Session > (45 in the sample run). The commands that have been waiting for too long time > out after 10 seconds, more or less all at the same time (at the rate at which > they have been received by the OverseerCollectionMessageHandler, approx one > per 100ms in the sample run) and most/all independently decide to create a > new Session. These new Sessions are based on Zookeeper state, they might or > might not include some of the changes from the first 40 commands (depending > on if these commands got their changes written to Zookeeper by the time of > the 10 seconds timeout, a few might have made it, see below). > These new Sessions (54 sessions in addition to the initial one) are based on > more or less the same state, so all remaining commands are making placement > decisions that do not take into account each other.
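To make the lifecycle in the description easier to follow, here is a paraphrase in code. The names and structure are invented; Solr's actual PolicyHelper/SessionWrapper machinery differs in detail.

{code:java}
public final class SessionLifecycleSketch {
  enum State { COMPUTING, EXECUTING, RELEASED }

  // Invented paraphrase of the lifecycle described above.
  static final class Session {
    State state = State.COMPUTING; // created from a ZK cluster-state snapshot
    int users = 1;

    // the command finished computing placements; others may now reuse the session
    synchronized void returned() { state = State.EXECUTING; }

    // another command grabs the session to compute on top of previous changes
    synchronized boolean tryReuse() {
      if (state != State.EXECUTING) return false; // in use: wait up to 10s, then create a new Session
      state = State.COMPUTING;
      users++;
      return true;
    }

    // last command done persisting to ZK: destroy the session
    synchronized void release() {
      if (--users == 0) state = State.RELEASED;
    }
  }
}
{code}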
[GitHub] [lucene-solr] murblanc merged pull request #1657: SOLR-14462: adjust test so less sessions are used even if test runs s…
murblanc merged pull request #1657: URL: https://github.com/apache/lucene-solr/pull/1657
[GitHub] [lucene-solr] murblanc opened a new pull request #1657: SOLR-14462: adjust test so less sessions are used even if test runs s…
murblanc opened a new pull request #1657: URL: https://github.com/apache/lucene-solr/pull/1657 adjust test so less sessions are used even if test runs slowly. fix synchronization issue. cherry picked from 06b1f3e86694b35365fd569a0581b1f6fc2cadb3
[jira] [Commented] (SOLR-14462) Autoscaling placement wrong with concurrent collection creations
[ https://issues.apache.org/jira/browse/SOLR-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152812#comment-17152812 ] ASF subversion and git services commented on SOLR-14462: Commit 06b1f3e86694b35365fd569a0581b1f6fc2cadb3 in lucene-solr's branch refs/heads/master from Ilan Ginzburg [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=06b1f3e ] SOLR-14462: adjust test so less sessions are used even if test runs slowly. fix synchronization issue. (#1656) > Autoscaling placement wrong with concurrent collection creations > > > Key: SOLR-14462 > URL: https://issues.apache.org/jira/browse/SOLR-14462 > Project: Solr > Issue Type: Bug > Components: AutoScaling >Affects Versions: master (9.0), 8.6 >Reporter: Ilan Ginzburg >Assignee: Ilanm >Priority: Major > Fix For: 8.6 > > Attachments: PolicyHelperNewLogs.txt, policylogs.txt > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Under concurrent collection creation, wrong Autoscaling placement decisions > can lead to severely unbalanced clusters. > Sequential creation of the same collections is handled correctly and the > cluster is balanced. > *TL;DR;* under high load, the way sessions that cache future changes to > Zookeeper are managed causes placement decisions of multiple concurrent > Collection API calls to ignore each other, be based on identical “initial” > cluster state, possibly leading to identical placement decisions and as a > consequence cluster imbalance. > *Some context first* for those less familiar with how Autoscaling deals with > cluster state change: a PolicyHelper.Session is created with a snapshot of > the Zookeeper cluster state and is used to track already decided but not yet > persisted to Zookeeper cluster state changes so that Collection API commands > can make the right placement decisions. > A Collection API command either uses an existing cached Session (that > includes changes computed by previous command(s)) or creates a new Session > initialized from the Zookeeper cluster state (i.e. with only state changes > already persisted). > When a Collection API command requires a Session - and one is needed for any > cluster state update computation - if one exists but is currently in use, the > command can wait up to 10 seconds. If the session becomes available, it is > reused. Otherwise, a new one is created. > The Session lifecycle is as follows: it is created in COMPUTING state by a > Collection API command and is initialized with a snapshot of cluster state > from Zookeeper (does not require a Zookeeper read, this is running on > Overseer that maintains a cache of cluster state). The command has exclusive > access to the Session and can change the state of the Session. When the > command is done changing the Session, the Session is “returned” and its state > changes to EXECUTING while the command continues to run to persist the state > to Zookeeper and interact with the nodes, but no longer interacts with the > Session. Another command can then grab a Session in EXECUTING state, change > its state to COMPUTING to compute new changes taking into account previous > changes. When all commands having used the session have completed their work, > the session is “released” and destroyed (at this stage, Zookeeper contains > all the state changes that were computed using that Session). > The issue arises when multiple Collection API commands are executed at once. > A first Session is created and commands start using it one by one.
In a > simple 1 shard 1 replica collection creation test run with 100 parallel > Collection API requests (see debug logs from PolicyHelper in file > policy.logs), this Session update phase (Session in COMPUTING status in > SessionWrapper) takes about 250-300ms (MacBook Pro). > This means that about 40 commands can run by using in turn the same Session > (45 in the sample run). The commands that have been waiting for too long time > out after 10 seconds, more or less all at the same time (at the rate at which > they have been received by the OverseerCollectionMessageHandler, approx one > per 100ms in the sample run) and most/all independently decide to create a > new Session. These new Sessions are based on Zookeeper state, they might or > might not include some of the changes from the first 40 commands (depending > on if these commands got their changes written to Zookeeper by the time of > the 10 seconds timeout, a few might have made it, see below). > These new Sessions (54 sessions in addition to the initial one) are based on > more or less the same state, so all remaining commands are making placement > decisions that do not take into account each other.
[GitHub] [lucene-solr] murblanc merged pull request #1656: SOLR-14462: test fix
murblanc merged pull request #1656: URL: https://github.com/apache/lucene-solr/pull/1656
[GitHub] [lucene-solr] murblanc opened a new pull request #1656: SOLR-14462: test fix
murblanc opened a new pull request #1656: URL: https://github.com/apache/lucene-solr/pull/1656 adjust test so less sessions are used even if test runs slowly. fix synchronization issue.
[jira] [Updated] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-14634: Priority: Minor (was: Blocker)
[jira] [Resolved] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya resolved SOLR-14634. - Fix Version/s: 8.7 Resolution: Fixed
[jira] [Commented] (SOLR-14537) Improve performance of ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-14537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152798#comment-17152798 ] ASF subversion and git services commented on SOLR-14537: Commit f19057f5e55ba99da8344e1f8175714009e99e1b in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f19057f ] SOLR-14537: Fix inner class visibility, reduce diffs with branch_8x. > Improve performance of ExportWriter > --- > > Key: SOLR-14537 > URL: https://issues.apache.org/jira/browse/SOLR-14537 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: Export Writer >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.7 > > Time Spent: 20m > Remaining Estimate: 0h > > Retrieving, sorting and writing out documents in {{ExportWriter}} are three aspects of the /export handler that can be further optimized. > SOLR-14470 introduced some level of caching in {{StringValue}}. Further options for caching and speedups should be explored. > Currently the sort/retrieve and write operations are done sequentially, but they could be parallelized, considering that they block on different channels - the first is index reading & CPU bound, the other is bound by the receiving end because it uses blocking IO. The sorting and retrieving of values could be done in parallel with the operation of writing out the current batch of results. > One possible approach here would be to use "double buffering", where one buffered batch that is ready (already sorted and retrieved) is being written out while the other batch is being prepared in a background thread; when both are done, the buffers are swapped. This wouldn't complicate the current code too much, but it should instantly give up to 2x higher throughput.
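A minimal sketch of the double-buffering idea from that description, with invented names (this is not the real ExportWriter API): batch N is written out while batch N+1 is prepared on a single background thread, and the Future.get() call acts as the swap point.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DoubleBufferedExport {
    private final ExecutorService prep = Executors.newSingleThreadExecutor();
    private int cursor = 0;
    private static final int TOTAL = 100, BATCH = 30;

    // Stands in for the index-reading / CPU-bound sort+retrieve step.
    private List<Integer> fetchAndSort() {
        List<Integer> batch = new ArrayList<>();
        for (int i = 0; i < BATCH && cursor < TOTAL; i++) batch.add(cursor++);
        return batch;
    }

    // Stands in for the blocking-IO write to the receiving client.
    private void write(List<Integer> batch) {
        System.out.println("writing " + batch.size() + " docs");
    }

    public void export() throws Exception {
        List<Integer> current = fetchAndSort();
        while (!current.isEmpty()) {
            Future<List<Integer>> next = prep.submit(this::fetchAndSort); // prepare ahead
            write(current);       // blocks on the receiver; overlaps with preparation
            current = next.get(); // swap: the prepared batch becomes current
        }
        prep.shutdown();
    }

    public static void main(String[] args) throws Exception {
        new DoubleBufferedExport().export();
    }
}
{code}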
[jira] [Updated] (SOLR-14301) Remove external commons-codec usage in gradle validateJarChecksums
[ https://issues.apache.org/jira/browse/SOLR-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Salamon updated SOLR-14301: -- Component/s: Build > Remove external commons-codec usage in gradle validateJarChecksums > -- > > Key: SOLR-14301 > URL: https://issues.apache.org/jira/browse/SOLR-14301 > Project: Solr > Issue Type: Improvement > Components: Build >Reporter: Andras Salamon >Priority: Minor > Attachments: SOLR-14301-01.patch > > > Right now gradle calculates SHA-1 checksums using an external {{commons-codec}} library. We can calculate SHA-1 using Java 8 classes; no need for commons-codec here.
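For illustration, a JDK-only SHA-1 checksum along the lines the issue suggests: java.security.MessageDigest ships with Java, so only the hex encoding (the part commons-codec was being used for) needs a few lines of our own. The class name and CLI usage here are invented, not taken from the patch.

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Sha1Checksum {
    // Computes the SHA-1 digest of a file and renders it as lowercase hex.
    public static String sha1Hex(Path file) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1Hex(Paths.get(args[0])));
    }
}
{code}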
[jira] [Assigned] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-14634: - Assignee: Noble Paul
[jira] [Commented] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152742#comment-17152742 ] ASF subversion and git services commented on SOLR-14634: Commit 5ae0f600afaa2bb435ae6c502fcc646a9a1eb6ca in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5ae0f60 ] SOLR-14634: Limit the HTTP security headers to "/solr" end point (#1655)
[jira] [Commented] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152740#comment-17152740 ] ASF subversion and git services commented on SOLR-14634: Commit 5154b6008f54c9d096f5efe9ae347492c23dd780 in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5154b60 ] SOLR-14634: Limit the HTTP security headers to "/solr" end point (#1655)
[GitHub] [lucene-solr] noblepaul merged pull request #1655: SOLR-14634: Limit the HTTP security headers to "/solr" end point
noblepaul merged pull request #1655: URL: https://github.com/apache/lucene-solr/pull/1655
[GitHub] [lucene-solr] chatman commented on pull request #1655: SOLR-14634: Limit the HTTP security headers to "/solr" end point
chatman commented on pull request #1655: URL: https://github.com/apache/lucene-solr/pull/1655#issuecomment-654848585 +1, LGTM.
[jira] [Commented] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152736#comment-17152736 ] Ishan Chattopadhyaya commented on SOLR-14634: - +1, there should be no impact to security as the UI is hosted at /solr/*.
[GitHub] [lucene-solr] noblepaul opened a new pull request #1655: SOLR-14634: Limit the HTTP security headers to "/solr" end point
noblepaul opened a new pull request #1655: URL: https://github.com/apache/lucene-solr/pull/1655
[jira] [Updated] (SOLR-14634) Limit the HTTP security headers to /solr end point
[ https://issues.apache.org/jira/browse/SOLR-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-14634: -- Summary: Limit the HTTP security headers to /solr end point (was: Limit the HTTP security header to /solr end point)
[jira] [Created] (SOLR-14634) Limit the HTTP security header to /solr end point
Noble Paul created SOLR-14634: - Summary: Limit the HTTP security header to /solr end point Key: SOLR-14634 URL: https://issues.apache.org/jira/browse/SOLR-14634 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 8.6 Reporter: Noble Paul Ideally the CSP headers and other security headers are only required for web components such as html/js etc. There should be no need to send them for a {{json}} or {{javabin}} response; it is unnecessary data. The problem is that our web UI content paths are not easy to differentiate from other paths, but the v2 APIs do not need to pay that price, and that can easily be achieved by adding a pattern to the rules.
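The committed change lives in Solr's Jetty configuration, but the "pattern" idea the description mentions can be sketched standalone (a hedged illustration, not the actual patch): Jetty's rewrite module provides HeaderPatternRule, which adds a response header only on URIs matching a pattern, so /solr/* gets the security headers while /api (v2) responses do not. The port, header value, and handler wiring below are illustrative.

{code:java}
import org.eclipse.jetty.rewrite.handler.HeaderPatternRule;
import org.eclipse.jetty.rewrite.handler.RewriteHandler;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.DefaultHandler;

public class ScopedSecurityHeaders {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8983);

        // Header is added only for requests whose URI matches the pattern.
        HeaderPatternRule csp = new HeaderPatternRule();
        csp.setPattern("/solr/*");   // UI paths only; /api stays header-free
        csp.setName("Content-Security-Policy");
        csp.setValue("default-src 'none'; style-src 'self'; script-src 'self'");

        RewriteHandler rewrite = new RewriteHandler();
        rewrite.addRule(csp);
        rewrite.setHandler(new DefaultHandler()); // a real setup wraps Solr's handler

        server.setHandler(rewrite);
        server.start();
        server.join();
    }
}
{code}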
[jira] [Commented] (SOLR-14066) Deprecate DIH and migrate to a community supported package
[ https://issues.apache.org/jira/browse/SOLR-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152650#comment-17152650 ] Krzysztof Debski commented on SOLR-14066: - DIH is deprecated, but I could not find the recommended way of importing from a DB now. I think you should tell users what they should use instead. I use DIH to: # Populate initial data into Solr. We are adding Solr to existing systems and replacing DB search with Solr search, but we need to send data that is already in the DB to Solr. # Populate the Solrs in the backup data center with data from the DB when we switch between primary and backup data centers (we replicate data at the DB level). We have a few million entities that we need to load into Solr within a 5-minute window. > Deprecate DIH and migrate to a community supported package > -- > > Key: SOLR-14066 > URL: https://issues.apache.org/jira/browse/SOLR-14066 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.6 > > Attachments: image-2019-12-14-19-58-39-314.png > > Time Spent: 50m > Remaining Estimate: 0h > > DIH doesn't need to remain inside Solr anymore. Plan is to deprecate DIH in 8.6, remove from 9.0. A community supported version of DIH (which can be used with Solr's package manager) can be found here https://github.com/rohitbemax/dataimporthandler.
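The issue does not name an official replacement, but for the bulk-load use case this comment describes, a common DIH-free pattern is plain JDBC plus SolrJ with batched adds. A hedged sketch only; the JDBC URL, table, columns, batch size, and collection name are all made up.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DbToSolrLoader {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection("jdbc:postgresql://dbhost/app");
             HttpSolrClient solr = new HttpSolrClient.Builder(
                     "http://localhost:8983/solr/entities").build();
             Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name FROM entities")) {

            List<SolrInputDocument> batch = new ArrayList<>();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("name", rs.getString("name"));
                batch.add(doc);
                if (batch.size() == 1000) {   // batched adds, no per-doc commits
                    solr.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) solr.add(batch);
            solr.commit();                    // one commit at the end of the load
        }
    }
}
{code}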
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152629#comment-17152629 ] Noble Paul commented on SOLR-14354: --- {quote}HttpShardHandler still sends requests in an async manner, so no change from the caller's view.{quote} I get it. But what's the point of making an async request and waiting for the response if the semantics of the request are synchronous? We are just adding the overhead of an async call. {quote}If some place wants to send its request in a sync manner, it should use Http2SolrClient instead. And yes, we should fix those places.{quote} Any code that makes a synchronous request should NEVER use the HttpShardHandler. We should probably tackle that in a separate ticket. Glad to know that we are on the same page.
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152611#comment-17152611 ] Cao Manh Dat commented on SOLR-14354: - Hi [~noble.paul], your comment is true, but here are a few things: * In the past, HttpShardHandler already sent requests async (spinning up a new thread and calling the request in a sync manner) to prevent blocking the caller thread. After this commit, HttpShardHandler still sends requests in an async manner, so no change from the caller's view. * The difference is that in the past, HttpShardHandler decided based on the size of the {{urls}} input whether to use LBClient or HttpClient; now we only use LBClient regardless of the case. I think that makes things clearer (see my comments on the PR for the reasons). * I think HttpShardHandler should only be used for sending multiple independent requests. If some place wants to send its request in a sync manner, it should use Http2SolrClient instead. And yes, we should fix those places.
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152540#comment-17152540 ] Noble Paul commented on SOLR-14354: --- A lot of requests made by HttpShardHandler are synchronous requests. Why do we need to do them asynchronously? Let's simplify them as synchronous requests.
> HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Major > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png > > Time Spent: 4h > Remaining Estimate: 0h >
> h2. 1. Current approach (problem) of Solr
> Below is a diagram describing how a request is currently handled.
> !image-2020-03-23-10-04-08-399.png!
> The main thread that handles a search request will submit n requests (n equals the number of shards) to an executor. Each request corresponds to a thread; after sending its request, that thread does basically nothing but wait for the response from the other side. The thread is swapped out and the CPU handles another thread (this is a context switch: the CPU saves the context of the current thread and switches to another one). When some (not all) of the data comes back, the thread is woken up to parse it, then waits again until more data arrives. So there is a lot of context switching, which is an inefficient use of threads. Basically we want fewer threads, with most of them busy all the time, because neither threads nor context switches are free. That is the main idea behind constructs like executors.
> h2. 2. Async calls with Jetty HttpClient
> Jetty HttpClient offers an async API like this:
> {code:java}
> httpClient.newRequest("http://domain.com/path")
>     // Add request hooks
>     .onRequestQueued(request -> { ... })
>     .onRequestBegin(request -> { ... })
>     // Add response hooks
>     .onResponseBegin(response -> { ... })
>     .onResponseHeaders(response -> { ... })
>     .onResponseContent((response, buffer) -> { ... })
>     .send(result -> { ... });
> {code}
> Therefore, after calling {{send()}} the thread returns immediately without blocking. When the client receives the headers from the other side, it calls the {{onHeaders()}} listeners. When the client receives some {{byte[]}} (not the whole response), it calls the {{onContent(buffer)}} listeners. When everything is finished, it calls the {{onComplete}} listeners. One important thing to notice is that all listeners should finish quickly: if a listener blocks, no further data for that request is handled until the listener finishes.
> h2. 3. Solution 1: Send requests async but spin up one thread per response
> Jetty HttpClient already provides several listeners; one of them is InputStreamResponseListener. This is how it is used:
> {code:java}
> InputStreamResponseListener listener = new InputStreamResponseListener();
> client.newRequest(...).send(listener);
> // Wait for the response headers to arrive
> Response response = listener.get(5, TimeUnit.SECONDS);
> if (response.getStatus() == 200) {
>     // Obtain the input stream on the response content
>     try (InputStream input = listener.getInputStream()) {
>         // Read the response content
>     }
> }
> {code}
> In this case, there will be 2 threads:
> * one thread reading the response content from the InputStream
> * one thread (a short-lived task) feeding content to the above InputStream whenever some byte[] is available. Note that if this thread is unable to feed data into the InputStream, it will wait.
> Using this, the model of HttpShardHandler can be written as something like this:
> {code:java}
> handler.sendReq(req, (is) -> {
>     executor.submit(() -> {
>         try (is) {
>             // Read the content from InputStream
>         }
>     })
> })
> {code}
> The first diagram then changes into this:
> !image-2020-03-23-10-09-10-221.png!
> Notice that although "sending req to shard1" is wide in the diagram, it won't take a long time, since sending a request is a very quick operation. With this approach, handling threads won't be spun up until the first bytes come back. Notice also that we still have active threads waiting for more data from the InputStream.
> h2. 4. Solution 2: Buffer the data and handle it inside Jetty's threads
> Jetty has another listener called BufferingResponseListener. This is how it is used:
> {code:java}
> client.newRequest(...).send(new BufferingResponseListener() {
>     @Override
>     public void onComplete(Result result) {
>         // the fully buffered response content is available here
>     }
> });
> {code}
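To make Solution 1 concrete, here is a self-contained version of the InputStreamResponseListener pattern quoted above, assuming the Jetty 9 client APIs in use by Solr at the time; the target URL is illustrative.

{code:java}
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Response;
import org.eclipse.jetty.client.util.InputStreamResponseListener;

public class AsyncStreamingRequest {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        client.start();
        try {
            InputStreamResponseListener listener = new InputStreamResponseListener();
            // send() returns immediately; no caller thread blocks on the wire.
            client.newRequest("http://localhost:8983/solr/admin/info/system")
                  .send(listener);

            // Blocks only until the response headers arrive, not the full body.
            Response response = listener.get(5, TimeUnit.SECONDS);
            if (response.getStatus() == 200) {
                // A separate (short-lived) Jetty task feeds this stream as
                // bytes arrive; this thread consumes and would parse them.
                try (InputStream input = listener.getInputStream()) {
                    byte[] buf = new byte[8192];
                    int n, total = 0;
                    while ((n = input.read(buf)) != -1) total += n;
                    System.out.println("read " + total + " bytes");
                }
            }
        } finally {
            client.stop();
        }
    }
}
{code}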
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending req is a very quick operation. With this operation, handling > threads won’t be spin up until first bytes are sent back. Notice that in this > approach we still have active threads waiting for more data from InputStream > h2. 4. Solution 2: Buffering data and handle it inside jetty’s thread. > Jetty have another listener called BufferingResponseListener. This is how it > is get used > {code:java} > client.newRequest(...).send(new BufferingResponseListener() { > publi