[jira] [Updated] (SOLR-13511) For SearchHandler, expose "new ResponseBuilder()" to allow override
[ https://issues.apache.org/jira/browse/SOLR-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-13511: - Attachment: SOLR-13511.patch > For SearchHandler, expose "new ResponseBuilder()" to allow override > --- > > Key: SOLR-13511 > URL: https://issues.apache.org/jira/browse/SOLR-13511 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: search > Reporter: Ramsey Haddad > Priority: Trivial > Labels: easyfix > Attachments: SOLR-13511.patch > > > This change is all we want upstream. To use this from our plugins, we intend to: Extend ResponseBuilder to have additional state (we think others might want to as well). Use an extended SearchHandler that simply creates our ResponseBuilder instead of the standard one. We also extend QueryComponent to apply our extra behavior when it sees our ResponseBuilder instead of the standard one. We then change config to use our SearchHandler for the requestHandler with name="/select" and our QueryComponent for the searchComponent with name="query". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
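The override pattern described above can be sketched in miniature. This is an illustrative model only: the class and method names mimic Solr's but none of this is the actual SearchHandler API, and the factory-method name is a guess at the kind of hook the patch adds. The idea is simply that the handler creates its ResponseBuilder through a protected method, so a subclass can substitute an extended builder carrying extra state.

```java
// Minimal model of the proposed change (names illustrative, not Solr's API).
public class FactoryOverrideSketch {
    public static class ResponseBuilder { }

    public static class SearchHandler {
        // The change the issue asks for: create the builder behind an
        // overridable factory method instead of an inline "new ResponseBuilder()".
        protected ResponseBuilder newResponseBuilder() {
            return new ResponseBuilder();
        }
        public ResponseBuilder handleRequest() {
            return newResponseBuilder(); // components then act on the builder
        }
    }

    // Plugin side: an extended builder carrying additional state...
    public static class MyResponseBuilder extends ResponseBuilder {
        public String extraState = "custom";
    }

    // ...and a handler that creates it instead of the standard one.
    public static class MySearchHandler extends SearchHandler {
        @Override
        protected ResponseBuilder newResponseBuilder() {
            return new MyResponseBuilder();
        }
    }
}
```

A custom QueryComponent would then check `rb instanceof MyResponseBuilder` to decide whether to apply the extra behavior, matching the plugin plan described in the issue.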
[jira] [Created] (SOLR-13511) For SearchHandler, expose "new ResponseBuilder()" to allow override
Ramsey Haddad created SOLR-13511: Summary: For SearchHandler, expose "new ResponseBuilder()" to allow override Key: SOLR-13511 URL: https://issues.apache.org/jira/browse/SOLR-13511 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: search Reporter: Ramsey Haddad This change is all we want upstream. To use this from our plugins, we intend to: Extend ResponseBuilder to have additional state (we think others might want to as well). Use an extended SearchHandler that simply creates our ResponseBuilder instead of the standard one. We also extend QueryComponent to apply our extra behavior when it sees our ResponseBuilder instead of the standard one. We then change config to use our SearchHandler for the requestHandler with name="/select" and our QueryComponent for the searchComponent with name="query".
Intervals vs Span guidance
We are building the customizations/extensions we need on Solr/Lucene 7.7, 8.0, or later. We are unclear on whether/when to use Intervals vs. Spans. We know that Intervals is still maturing (new functionality arrived in 8.0 and is probably ongoing for a while). But what is the overall intention/guidance? "If you need X, then use Spans." "If you need Y, then use Intervals." "After the year 20xy, we expect everyone to be using Intervals."?? Any opinions are valued. Thanks, Ramsey.
[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack
[ https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216636#comment-16216636 ] Ramsey Haddad edited comment on SOLR-11179 at 10/24/17 10:10 AM: - OK. And here's a refined patch that also adds {{jstack}} to the windows {{solr.cmd}} file. was (Author: rwhaddad): OK. And here's is refined patch that also adds {{jstack}} to the windows {{solr.cmd}} file. > Ability to dump jstack > -- > > Key: SOLR-11179 > URL: https://issues.apache.org/jira/browse/SOLR-11179 > Project: Solr > Issue Type: New Feature > Components: scripts and tools >Reporter: Ramsey Haddad >Priority: Minor > Attachments: SOLR-11179.patch, SOLR-11179.patch, SOLR-11179.patch > > > Add a "jstack" command to the "bin/solr" script to ease capture of jstacks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11179) Ability to dump jstack
[ https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-11179: - Attachment: SOLR-11179.patch OK. And here's a refined patch that also adds "jstack" to the Windows solr.cmd file. > Ability to dump jstack > -- > > Key: SOLR-11179 > URL: https://issues.apache.org/jira/browse/SOLR-11179 > Project: Solr > Issue Type: New Feature > Components: scripts and tools > Reporter: Ramsey Haddad > Priority: Minor > Attachments: SOLR-11179.patch, SOLR-11179.patch, SOLR-11179.patch > > > Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.
[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack
[ https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130055#comment-16130055 ] Ramsey Haddad edited comment on SOLR-11179 at 8/17/17 7:44 AM: --- Here is a patch that incorporates some of your suggestions. * We don't build/run Solr on Windows. I'd be happy to include changes for {{solr.cmd}} from someone in a position to test them. * Yes, I have added a {{-o}} flag as suggested. * Yes, with the new design, if no output file is specified via {{-o}}, then the output will now go to {{stdout}}. * Yes, {{jstack}} needs to be run on the same box, as do many of the other commands, including the stars of this {{bin/solr}} script: {{start}} and {{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}. * Yes, documentation added to the Solr Ref Guide. was (Author: rwhaddad): Here is a patch that incorporates some of your suggestions. * We don't build/run Solr on Windows. I'd be happy to include changes for {{solr.cmd}} from someone in a position to test them. * Yes, I have added a {{-o}} flag as suggested. * Yes, with the new design, if no output file is specified via {{-o}}, then the output will now go to {{stdout}}. * Yes, {{jstack}} needs to be run on the same box, as do many of the other commands, including the stars of this {{bin/solr}} script: {{start}} and {{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}. * Yes, documentation added to the Solr Ref Guide. > Ability to dump jstack > -- > > Key: SOLR-11179 > URL: https://issues.apache.org/jira/browse/SOLR-11179 > Project: Solr > Issue Type: New Feature > Components: scripts and tools > Reporter: Ramsey Haddad > Priority: Minor > Attachments: SOLR-11179.patch, SOLR-11179.patch > > > Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.
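Putting the design points from the comment above together (a {{-o}} flag, stdout by default, {{jstack}} resolved from {{$\{JAVA_HOME\}/bin}}, run on the same box), a subcommand might look roughly like this. This is a hypothetical bash sketch, not the attached patch: the function name, flag handling, and path resolution are all illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bin/solr "jstack" subcommand (not the actual patch).

solr_jstack() {
  # Usage: solr_jstack <pid> [-o <file>]
  local pid="$1"; shift
  local outfile="" opt OPTIND=1
  while getopts "o:" opt "$@"; do
    case "$opt" in
      o) outfile="$OPTARG" ;;
      *) return 1 ;;
    esac
  done
  # jstack must run on the same box as the Solr process; when JAVA_HOME is
  # set, the binary ships alongside java in $JAVA_HOME/bin.
  local jstack_bin="${JAVA_HOME:+$JAVA_HOME/bin/}jstack"
  if [ -n "$outfile" ]; then
    "$jstack_bin" "$pid" > "$outfile"   # -o given: write the dump to a file
  else
    "$jstack_bin" "$pid"                # default: dump to stdout
  fi
}
```

Resetting `OPTIND` inside the function matters if the script parses options more than once per shell invocation, as bin/solr does for its other subcommands.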
[jira] [Updated] (SOLR-11179) Ability to dump jstack
[ https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-11179: - Attachment: SOLR-11179.patch Here is a patch that incorporates some of your suggestions. * We don't build/run Solr on Windows. I'd be happy to include changes for solr.cmd from someone in a position to test them. * Yes, I have added a {{-o}} flag as suggested. * Yes, with the new design, if no output file is specified via {{-o}}, then the output will now go to stdout. * Yes, {{jstack}} needs to be run on the same box, as do many of the other commands, including the stars of this {{bin/solr}} script: {{start}} and {{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}. * Yes, documentation added to the Solr Ref Guide. > Ability to dump jstack > -- > > Key: SOLR-11179 > URL: https://issues.apache.org/jira/browse/SOLR-11179 > Project: Solr > Issue Type: New Feature > Components: scripts and tools > Reporter: Ramsey Haddad > Priority: Minor > Attachments: SOLR-11179.patch, SOLR-11179.patch > > > Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.
[jira] [Updated] (SOLR-11179) Ability to dump jstack
[ https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-11179: - Attachment: SOLR-11179.patch Here is the proposed addition. > Ability to dump jstack > -- > > Key: SOLR-11179 > URL: https://issues.apache.org/jira/browse/SOLR-11179 > Project: Solr > Issue Type: New Feature > Components: scripts and tools > Reporter: Ramsey Haddad >Priority: Minor > Attachments: SOLR-11179.patch > > > Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.
[jira] [Created] (SOLR-11179) Ability to dump jstack
Ramsey Haddad created SOLR-11179: Summary: Ability to dump jstack Key: SOLR-11179 URL: https://issues.apache.org/jira/browse/SOLR-11179 Project: Solr Issue Type: New Feature Components: scripts and tools Reporter: Ramsey Haddad Priority: Minor Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: SOLR-10962.patch Here is the Config API patch, updated because of the int=>Long change in SOLR-11052. > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) > Reporter: Ramsey Haddad > Priority: Minor > Attachments: SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch, > SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Updated] (SOLR-11052) reserveCommitDuration from Integer to Long
[ https://issues.apache.org/jira/browse/SOLR-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-11052: - Attachment: SOLR-11052.patch Small fix. > reserveCommitDuration from Integer to Long > -- > > Key: SOLR-11052 > URL: https://issues.apache.org/jira/browse/SOLR-11052 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: replication (java) > Reporter: Ramsey Haddad > Priority: Trivial > Attachments: SOLR-11052.patch > > > reserveCommitDuration gets created as a Long and then stored as an Integer. > It is used as a Long and hence gets reconverted back from Integer to Long. > Let's just leave it as a Long the whole time.
[jira] [Updated] (SOLR-11052) reserveCommitDuration from Integer to Long
[ https://issues.apache.org/jira/browse/SOLR-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-11052: - Security: (was: Public) > reserveCommitDuration from Integer to Long > -- > > Key: SOLR-11052 > URL: https://issues.apache.org/jira/browse/SOLR-11052 > Project: Solr > Issue Type: Improvement > Components: replication (java) > Reporter: Ramsey Haddad > Priority: Trivial > Attachments: SOLR-11052.patch > > > reserveCommitDuration gets created as a Long and then stored as an Integer. > It is used as a Long and hence gets reconverted back from Integer to Long. > Let's just leave it as a Long the whole time.
[jira] [Created] (SOLR-11052) reserveCommitDuration from Integer to Long
Ramsey Haddad created SOLR-11052: Summary: reserveCommitDuration from Integer to Long Key: SOLR-11052 URL: https://issues.apache.org/jira/browse/SOLR-11052 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: replication (java) Reporter: Ramsey Haddad Priority: Trivial reserveCommitDuration gets created as a Long and then stored as an Integer. It is used as a Long and hence gets reconverted back from Integer to Long. Let's just leave it as a Long the whole time.
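The round trip the description complains about can be illustrated in isolation. The class and method names below are made up, not Solr's actual code; the sketch also shows why staying with long avoids a silent truncation hazard for durations above Integer.MAX_VALUE milliseconds.

```java
// Illustrative sketch of the Long -> Integer -> Long round trip (names are
// not Solr's actual fields or methods).
public class ReserveDurationSketch {
    // Before: parse as long, narrow to int for storage, widen back on use.
    static long viaInt(String configured) {
        long parsed = Long.parseLong(configured);
        int stored = (int) parsed;      // narrowing conversion: truncates silently
        return stored;                  // widened back to long at the call site
    }

    // After: keep the value as a long the whole time.
    static long viaLong(String configured) {
        return Long.parseLong(configured);
    }
}
```

For a value like "3000000000" (about 35 days in milliseconds, above Integer.MAX_VALUE of 2147483647), the int detour corrupts the number while the all-long path preserves it.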
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: SOLR-10962.patch Here is what this looks like using the Config API. We initially tried to convert all the controls for ReplicationHandler, but it became too much of a mess. * partly messy because the args passed on to IndexFetcher can come from two places * partly because of the work to put in backward-compatibility warnings So we only changed what we need at the moment. We ended up partitioning the Info structure work between SolrConfig and *Handler in a different way than UpdateHandler does, because: * we wanted to still allow the legacy default "00:00:10" behavior * we wanted to keep various ReplicationHandler details local to that class Also, since the internals work in milliseconds, we thought it simpler to expose that unit to the user. > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) > Reporter: Ramsey Haddad > Priority: Minor > Attachments: SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch, > SOLR-10962.patch, SOLR-10962.patch > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. 
> https://lucene.apache.org/solr/guide/6_6/index-replication.html
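For context, the pre-existing master-mode knob this issue generalizes is the ReplicationHandler's commitReserveDuration setting, which looks roughly like this in solrconfig.xml. The hh:mm:ss format and the "00:00:10" legacy default are from the classic master/slave replication config; the SolrCloud-mode setting the patch adds (exposed in milliseconds via the Config API, per the comment above) is not shown here, since its exact property name is specific to the attached patch.

```xml
<!-- Legacy (master/slave) replication config: reserve committed index files
     for 10 seconds so an in-flight fetch does not see them deleted between
     computing the fetch manifest and transferring the files. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>
```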
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: SOLR-10962.patch This patch takes [~cpoerschke]'s patch and adds [~hossman]'s suggestion. I will look into [~shalinmangar]'s suggestion within the next week. > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) >Reporter: Ramsey Haddad >Priority: Minor > Attachments: SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch, > SOLR-10962.patch > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Comment Edited] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066745#comment-16066745 ] Ramsey Haddad edited comment on SOLR-10962 at 6/28/17 3:48 PM: --- While I was initially trying to mimic the old structure, I agree that it is better to move to what Christine suggests. Here is the fixed patch. was (Author: rwhaddad): While I was initially trying to mimic the old structure, I agree that is better to move to what Christine suggests. Here is the fixed patch. > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) >Reporter: Ramsey Haddad >Priority: Minor > Attachments: SOLR-10962.patch, SOLR-10962.patch > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: SOLR-10962.patch While I was initially trying to mimic the old structure, I agree that it is better to move to what Christine suggests. Here is the fixed patch. > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) > Reporter: Ramsey Haddad > Priority: Minor > Attachments: SOLR-10962.patch, SOLR-10962.patch > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: SOLR-10962.patch > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) > Reporter: Ramsey Haddad >Priority: Minor > Attachments: SOLR-10962.patch > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: (was: patch.SOLR-10962) > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) > Reporter: Ramsey Haddad >Priority: Minor > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10962: - Attachment: patch.SOLR-10962 > replicationHandler's reserveCommitDuration configurable in SolrCloud mode > - > > Key: SOLR-10962 > URL: https://issues.apache.org/jira/browse/SOLR-10962 > Project: Solr > Issue Type: New Feature > Components: replication (java) > Reporter: Ramsey Haddad >Priority: Minor > Attachments: patch.SOLR-10962 > > > With SolrCloud mode, when doing replication via IndexFetcher, we occasionally > see the Fetch fail and then get restarted from scratch in cases where an > Index file is deleted after fetch manifest is computed and before the fetch > actually transfers the file. The risk of this happening can be reduced with a > higher value of reserveCommitDuration. However, the current configuration > only allows this value to be adjusted for "master" mode. This change allows > the value to also be changed when using "SolrCloud" mode. > https://lucene.apache.org/solr/guide/6_6/index-replication.html
[jira] [Created] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode
Ramsey Haddad created SOLR-10962: Summary: replicationHandler's reserveCommitDuration configurable in SolrCloud mode Key: SOLR-10962 URL: https://issues.apache.org/jira/browse/SOLR-10962 Project: Solr Issue Type: New Feature Components: replication (java) Reporter: Ramsey Haddad Priority: Minor With SolrCloud mode, when doing replication via IndexFetcher, we occasionally see the Fetch fail and then get restarted from scratch in cases where an Index file is deleted after fetch manifest is computed and before the fetch actually transfers the file. The risk of this happening can be reduced with a higher value of reserveCommitDuration. However, the current configuration only allows this value to be adjusted for "master" mode. This change allows the value to also be changed when using "SolrCloud" mode. https://lucene.apache.org/solr/guide/6_6/index-replication.html
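For context, the reservation window being discussed is the existing master-mode knob on the replication handler in solrconfig.xml; a sketch only (the value is illustrative, and whether the patch reads this exact element in SolrCloud mode is an assumption):

```xml
<!-- solrconfig.xml sketch: commitReserveDuration already exists for
     master/slave replication; the attached patch aims to have the same
     setting honored when IndexFetcher runs in SolrCloud mode. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- hh:mm:ss -- how long fetched commit points are reserved against
       deletion while a replica is still transferring files -->
  <str name="commitReserveDuration">00:00:30</str>
</requestHandler>
```

A larger reservation narrows the window in which an index file can be deleted between manifest computation and file transfer, at the cost of keeping old commit points around longer.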
[jira] [Commented] (SOLR-5127) Allow multiple wildcards in hl.fl
[ https://issues.apache.org/jira/browse/SOLR-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924077#comment-15924077 ] Ramsey Haddad commented on SOLR-5127: - This problem still exists in the code. But, the patch is fairly old and might need a minor tweak? Any reason to not have this fix? > Allow multiple wildcards in hl.fl > - > > Key: SOLR-5127 > URL: https://issues.apache.org/jira/browse/SOLR-5127 > Project: Solr > Issue Type: New Feature > Components: highlighter >Affects Versions: 3.6.1, 4.4 >Reporter: Sven-S. Porst > Attachments: highlight-wildcards.patch > > > When a wildcard is present in the hl.fl field, the field is not split up at > commas/spaces into components. As a consequence settings like > hl.fl=*_highlight,*_data do not work. > Splitting the string first and evaluating wildcards on each component > afterwards would be more powerful and consistent with the documentation.
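The proposed behavior ("split first, then evaluate wildcards per component") can be sketched as follows. This is not the actual highlighter code; the class, helper, and field names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class HlFlExpansion {
    // Split hl.fl at commas/whitespace first, then expand each wildcard
    // component against the known field names -- so patterns like
    // "*_highlight,*_data" match as users expect.
    static List<String> expand(String hlFl, List<String> fields) {
        List<String> result = new ArrayList<>();
        for (String part : hlFl.split("[,\\s]+")) {
            if (part.isEmpty()) continue;
            if (part.contains("*")) {
                // Translate the glob into a regex: quote everything, then
                // let each '*' escape the quoting as '.*'.
                String regex = ("\\Q" + part + "\\E").replace("*", "\\E.*\\Q");
                for (String f : fields) {
                    if (f.matches(regex)) result.add(f);
                }
            } else {
                result.add(part); // plain field name, no expansion needed
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fields =
            List.of("title_highlight", "body_highlight", "meta_data");
        // prints [title_highlight, body_highlight, meta_data]
        System.out.println(expand("*_highlight,*_data", fields));
    }
}
```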
[jira] [Commented] (SOLR-10112) Prevent DBQs from getting reordered
[ https://issues.apache.org/jira/browse/SOLR-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882638#comment-15882638 ] Ramsey Haddad commented on SOLR-10112: -- We don't do many DBQs and we only use them to garbage collect stories that are older than 10 days -- so, the types of race problems you are worried about are not relevant to our specific use of DBQs. But, still, I'm curious: do you see "Reordered DBQs detected" messages during regular use? We only see them as a side effect of replaying operations during a PeerSync. Do you see them outside of PeerSyncs? > Prevent DBQs from getting reordered > --- > > Key: SOLR-10112 > URL: https://issues.apache.org/jira/browse/SOLR-10112 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > > Reordered DBQs are problematic for various reasons. We might be able to > prevent DBQs from getting re-ordered by making sure, at the leader, that all > updates before a DBQ have been written successfully on the replicas, and > block all updates after the DBQ until the DBQ is written successfully at the > replicas.
[jira] [Updated] (SOLR-10173) Enable extension/customization of HttpShardHandler by increasing visibility
[ https://issues.apache.org/jira/browse/SOLR-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10173: - Summary: Enable extension/customization of HttpShardHandler by increasing visibility (was: Enable extension/customization of HttpShardHandler by increasing visability) > Enable extension/customization of HttpShardHandler by increasing visibility > --- > > Key: SOLR-10173 > URL: https://issues.apache.org/jira/browse/SOLR-10173 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Ramsey Haddad >Priority: Minor > Attachments: solr-10173.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Increase visibility of 2 elements of HttpShardHandlerFactory from "private" > to "protected" to facilitate extension of the class. Make > ReplicaListTransformer "public" to enable implementation of the interface in > custom classes.
[jira] [Updated] (SOLR-10173) Enable extension/customization of HttpShardHandler by increasing visibility
[ https://issues.apache.org/jira/browse/SOLR-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-10173: - Attachment: solr-10173.patch > Enable extension/customization of HttpShardHandler by increasing visibility > --- > > Key: SOLR-10173 > URL: https://issues.apache.org/jira/browse/SOLR-10173 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Ramsey Haddad >Priority: Minor > Attachments: solr-10173.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Increase visibility of 2 elements of HttpShardHandlerFactory from "private" > to "protected" to facilitate extension of the class. Make > ReplicaListTransformer "public" to enable implementation of the interface in > custom classes.
[jira] [Created] (SOLR-10173) Enable extension/customization of HttpShardHandler by increasing visibility
Ramsey Haddad created SOLR-10173: Summary: Enable extension/customization of HttpShardHandler by increasing visibility Key: SOLR-10173 URL: https://issues.apache.org/jira/browse/SOLR-10173 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Ramsey Haddad Priority: Minor Increase visibility of 2 elements of HttpShardHandlerFactory from "private" to "protected" to facilitate extension of the class. Make ReplicaListTransformer "public" to enable implementation of the interface in custom classes.
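A toy illustration of why the widened visibility matters (class and method names here are invented stand-ins, not Solr's actual HttpShardHandlerFactory/ReplicaListTransformer API): once a hook is protected rather than private, a plugin can change replica ordering by subclassing instead of forking the factory:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for a factory whose internals used to be private.
class BaseShardFactory {
    // Protected hook: subclasses may reorder the replica list.
    protected List<String> transformReplicas(List<String> urls) {
        return urls; // default: keep the given order
    }

    public String pickFirst(List<String> urls) {
        List<String> ordered = transformReplicas(urls);
        return ordered.isEmpty() ? null : ordered.get(0);
    }
}

// A custom plugin class: only possible because the hook is not private.
class PreferLastShardFactory extends BaseShardFactory {
    @Override
    protected List<String> transformReplicas(List<String> urls) {
        List<String> copy = new ArrayList<>(urls); // don't mutate caller's list
        Collections.reverse(copy);
        return copy;
    }
}

public class ShardFactoryDemo {
    public static void main(String[] args) {
        List<String> replicas = List.of("replica1", "replica2", "replica3");
        // prints replica3
        System.out.println(new PreferLastShardFactory().pickFirst(replicas));
    }
}
```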
[jira] [Updated] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership
[ https://issues.apache.org/jira/browse/SOLR-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-8760: Description: When we are doing rolling restarts of our Solr servers, we are sometimes hitting painfully long times without a shard leader. What happens is that a new leader is elected, but first needs to fully sync old updates before it assumes the leadership role and accepts new updates. The syncing process is taking unusually long because of an interaction between having one of our hourly garbage collection DBQs in the update logs and the replaying of old ADDs. If there is a single DBQ, and 1000 older ADDs that are getting replayed, then the DBQ is replayed 1000 times, instead of once. This itself may be hard to fix. But, the thing that is easier to fix is that most of the ADDs getting replayed shouldn't need to get replayed in the first place, since they are older than ourLowThreshold. The problem can be fixed by eliminating or by modifying the way that the "completeList" term is used to affect the PeerSync lists.

We propose two alternatives to fix this:

FixA: Based on my possibly incomplete understanding of PeerSync, the completeList term should be eliminated. If updates older than ourLowThreshold need to be replayed, then aren't all the prerequisites for PeerSync violated, and hence shouldn't we fall back to SnapPull? (My gut suspects that a later bug fix to PeerSync fixed whatever issue completeList was trying to deal with.)

FixB: The patch that added the completeList term mentions that it is needed for the replay of some DELETEs. Well, if that is true and we do need to replay some DELETEs older than ourLowThreshold, then there is still no need to replay any ADDs older than ourLowThreshold, right?

was: When we are doing rolling restarts of our Solr servers, we are sometimes hitting painfully long times without a shard leader.
What happens is that a new leader is elected, but first needs to fully sync old updates before it assumes the leadership role and accepts new updates. The syncing process is taking unusually long because of an interaction between having one of our hourly garbage collection DBQs in the update logs and the replaying of old ADDs. If there is a single DBQ, and 1000 older ADDs that are getting replayed, then the DBQ is replayed 1000 times, instead of once. This itself may be hard to fix. But, the thing that is easier to fix is that most of the ADDs getting replayed shouldn't need to get replayed in the first place, since they are older than ourLowThreshold. The problem can be fixed by eliminating or by modifying the way that the "completeList" term is used to affect the PeerSync lists. We propose two alternatives to fix this: FixA: Based on my possibly incomplete understanding of PeerSync, the completeList term should be eliminated. If updates older than ourLowThreshold need to be replayed, then aren't all the prerequisites for PeerSync violated, and hence shouldn't we fall back to SnapPull? (My gut suspects that a later bug fix to PeerSync fixed whatever issue completeList was trying to deal with.) FixB: The patch that added the ourLowThreshold term mentions that it is needed for the replay of some DELETEs. Well, if that is true and we do need to replay some DELETEs older than ourLowThreshold, then there is still no need to replay any ADDs older than ourLowThreshold, right? > PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to > stall new leadership > > > Key: SOLR-8760 > URL: https://issues.apache.org/jira/browse/SOLR-8760 > Project: Solr > Issue Type: Bug >Reporter: Ramsey Haddad >Priority: Minor > Attachments: solr-8760-fixA.patch, solr-8760-fixB.patch > > > When we are doing rolling restarts of our Solr servers, we are sometimes > hitting painfully long times without a shard leader.
What happens is that a > new leader is elected, but first needs to fully sync old updates before it > assumes the leadership role and accepts new updates. The syncing process is > taking unusually long because of an interaction between having one of our > hourly garbage collection DBQs in the update logs and the replaying of old > ADDs. If there is a single DBQ, and 1000 older ADDs that are getting > replayed, then the DBQ is replayed 1000 times, instead of once. This itself > may be hard to fix. But, the thing that is easier to fix is that most of the > ADDs getting replayed shouldn't need to get replayed in the first place, > since they are older than ourLowThreshold. > The problem can be fixed by eliminating or by modifying the way
[jira] [Commented] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership
[ https://issues.apache.org/jira/browse/SOLR-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171913#comment-15171913 ] Ramsey Haddad commented on SOLR-8760: - More details about the conditions leading up to this problem are in: http://mail-archives.apache.org/mod_mbox/lucene-dev/201602.mbox/%3ccac2x+z3at7ileypotx3xzrp5qysklaatgm-xtjn1a8zpxus...@mail.gmail.com%3E > PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to > stall new leadership > > > Key: SOLR-8760 > URL: https://issues.apache.org/jira/browse/SOLR-8760 > Project: Solr > Issue Type: Bug >Reporter: Ramsey Haddad >Priority: Minor > Attachments: solr-8760-fixA.patch, solr-8760-fixB.patch > > > When we are doing rolling restarts of our Solr servers, we are sometimes > hitting painfully long times without a shard leader. What happens is that a > new leader is elected, but first needs to fully sync old updates before it > assumes the leadership role and accepts new updates. The syncing process is > taking unusually long because of an interaction between having one of our > hourly garbage collection DBQs in the update logs and the replaying of old > ADDs. If there is a single DBQ, and 1000 older ADDs that are getting > replayed, then the DBQ is replayed 1000 times, instead of once. This itself > may be hard to fix. But, the thing that is easier to fix is that most of the > ADDs getting replayed shouldn't need to get replayed in the first place, > since they are older than ourLowThreshold. > The problem can be fixed by eliminating or by modifying the way that the > "completeList" term is used to affect the PeerSync lists. > We propose two alternatives to fix this: > FixA: Based on my possibly incomplete understanding of PeerSync, the > completeList term should be eliminated. If updates older than ourLowThreshold > need to be replayed, then aren't all the prerequisites for PeerSync violated > and hence shouldn't we fall back to SnapPull?
(My gut suspects that a later bug > fix to PeerSync fixed whatever issue completeList was trying to deal with.) > FixB: The patch that added the ourLowThreshold term mentions that it is > needed for the replay of some DELETEs. Well, if that is true and we do need > to replay some DELETEs older than ourLowThreshold, then there is still no > need to replay any ADDs older than ourLowThreshold, right?
[jira] [Updated] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership
[ https://issues.apache.org/jira/browse/SOLR-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-8760: Attachment: solr-8760-fixB.patch solr-8760-fixA.patch > PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to > stall new leadership > > > Key: SOLR-8760 > URL: https://issues.apache.org/jira/browse/SOLR-8760 > Project: Solr > Issue Type: Bug > Reporter: Ramsey Haddad >Priority: Minor > Attachments: solr-8760-fixA.patch, solr-8760-fixB.patch > > > When we are doing rolling restarts of our Solr servers, we are sometimes > hitting painfully long times without a shard leader. What happens is that a > new leader is elected, but first needs to fully sync old updates before it > assumes the leadership role and accepts new updates. The syncing process is > taking unusually long because of an interaction between having one of our > hourly garbage collection DBQs in the update logs and the replaying of old > ADDs. If there is a single DBQ, and 1000 older ADDs that are getting > replayed, then the DBQ is replayed 1000 times, instead of once. This itself > may be hard to fix. But, the thing that is easier to fix is that most of the > ADDs getting replayed shouldn't need to get replayed in the first place, > since they are older than ourLowThreshold. > The problem can be fixed by eliminating or by modifying the way that the > "completeList" term is used to affect the PeerSync lists. > We propose two alternatives to fix this: > FixA: Based on my possibly incomplete understanding of PeerSync, the > completeList term should be eliminated. If updates older than ourLowThreshold > need to be replayed, then aren't all the prerequisites for PeerSync violated > and hence shouldn't we fall back to SnapPull? (My gut suspects that a later bug > fix to PeerSync fixed whatever issue completeList was trying to deal with.)
> FixB: The patch that added the ourLowThreshold term mentions that it is > needed for the replay of some DELETEs. Well, if that is true and we do need > to replay some DELETEs older than ourLowThreshold, then there is still no > need to replay any ADDs older than ourLowThreshold, right?
[jira] [Created] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership
Ramsey Haddad created SOLR-8760: --- Summary: PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership Key: SOLR-8760 URL: https://issues.apache.org/jira/browse/SOLR-8760 Project: Solr Issue Type: Bug Reporter: Ramsey Haddad Priority: Minor When we are doing rolling restarts of our Solr servers, we are sometimes hitting painfully long times without a shard leader. What happens is that a new leader is elected, but first needs to fully sync old updates before it assumes the leadership role and accepts new updates. The syncing process is taking unusually long because of an interaction between having one of our hourly garbage collection DBQs in the update logs and the replaying of old ADDs. If there is a single DBQ, and 1000 older ADDs that are getting replayed, then the DBQ is replayed 1000 times, instead of once. This itself may be hard to fix. But, the thing that is easier to fix is that most of the ADDs getting replayed shouldn't need to get replayed in the first place, since they are older than ourLowThreshold. The problem can be fixed by eliminating or by modifying the way that the "completeList" term is used to affect the PeerSync lists. We propose two alternatives to fix this: FixA: Based on my possibly incomplete understanding of PeerSync, the completeList term should be eliminated. If updates older than ourLowThreshold need to be replayed, then aren't all the prerequisites for PeerSync violated, and hence shouldn't we fall back to SnapPull? (My gut suspects that a later bug fix to PeerSync fixed whatever issue completeList was trying to deal with.) FixB: The patch that added the ourLowThreshold term mentions that it is needed for the replay of some DELETEs. Well, if that is true and we do need to replay some DELETEs older than ourLowThreshold, then there is still no need to replay any ADDs older than ourLowThreshold, right?
Re: PeerSync.java: why "completeList" in handleVersions()?
My co-worker, Christine Poerschke, pointed out that the "completeList" term was added in a change described as "restore old deletes via tlog so peersync won't reorder". If the goal was only the replay of deletes older than ourLowThreshold, then keeping that goal doesn't need to interfere with the performance fix we want. The code could be changed to:

    if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;
    if (completeList && 0 < otherVersion && otherVersion < ourLowThreshold) continue;

On Thu, Feb 25, 2016 at 3:24 PM, Ramsey Haddad wrote:
> Does "!completeList" do anything necessary in the line:
>
> if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;
>
> I think the line should simply be:
>
> if (Math.abs(otherVersion) < ourLowThreshold) break;
>
> -
> The inclusion of "!completeList" in this conditional would seem to
> only cause some minor performance penalty: replaying a bunch of ADDs
> that the syncing replica already has ADDed.
>
> BUT: in our set-up this is causing a noticeable problem. In
> particular, we use a large value of nUpdates and we have an hourly DBQ
> for garbage collection. If we do rolling restarts of our replicas,
> then the second restart can leave us leaderless for a long span of
> time.
>
> This happens as follows:
> * Replica1 is leader. Replica1 goes down.
> * Leadership goes to Replica2. It resyncs with all replicas except Replica1.
> * Replica1 returns and resyncs.
> * Replica2 is leader. Replica2 goes down.
> * Leadership goes to Replica3. It resyncs with all replicas except Replica2.
>
> At this point, Replica1 has a longer updatelog (less trimmed -- more
> old updates) than the other replicas. We will refer to these as the
> "ancient" updates.
> Replica3 does a getVersion from Replica1 and Replica4 and receives
> replies from them. The ancient updates will not be contained in
> ourUpdateSet.
While the ancient updates are older than
> ourLowThreshold, the check is skipped because of the "completeList"
> term that makes no sense to me. So Replica3 replays the ancient ADDs.
> Say that 1000 of these ADDs are older than a DBQ in Replica3's update
> log? Then the DBQ gets replayed 1000 times ... once after each ADD is
> replayed. Fixing the replay mechanism to only replay the DBQ once
> looks hard because of the code structure. However, these ADDs (and
> hence the DBQ) shouldn't have even been replayed at all!
>
> After the leader Replica3 is synced, it asks Replica1 and Replica4 to
> sync to it. The ancient ADDs have now been merged back onto Replica3's
> update log and so when Replica4 is syncing with Replica3, then
> Replica4 also ends up replaying the ancient ADDs and replaying the DBQ
> 1000 times.
>
> Only when all of this finally completes can Replica3 finally perform
> its role as leader and accept new updates.
PeerSync.java: why "completeList" in handleVersions()?
Does "!completeList" do anything necessary in the line:

    if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;

I think the line should simply be:

    if (Math.abs(otherVersion) < ourLowThreshold) break;

-
The inclusion of "!completeList" in this conditional would seem to only cause some minor performance penalty: replaying a bunch of ADDs that the syncing replica already has ADDed.

BUT: in our set-up this is causing a noticeable problem. In particular, we use a large value of nUpdates and we have an hourly DBQ for garbage collection. If we do rolling restarts of our replicas, then the second restart can leave us leaderless for a long span of time.

This happens as follows:
* Replica1 is leader. Replica1 goes down.
* Leadership goes to Replica2. It resyncs with all replicas except Replica1.
* Replica1 returns and resyncs.
* Replica2 is leader. Replica2 goes down.
* Leadership goes to Replica3. It resyncs with all replicas except Replica2.

At this point, Replica1 has a longer updatelog (less trimmed -- more old updates) than the other replicas. We will refer to these as the "ancient" updates. Replica3 does a getVersion from Replica1 and Replica4 and receives replies from them. The ancient updates will not be contained in ourUpdateSet. While the ancient updates are older than ourLowThreshold, the check is skipped because of the "completeList" term that makes no sense to me. So Replica3 replays the ancient ADDs. Say that 1000 of these ADDs are older than a DBQ in Replica3's update log? Then the DBQ gets replayed 1000 times ... once after each ADD is replayed. Fixing the replay mechanism to only replay the DBQ once looks hard because of the code structure. However, these ADDs (and hence the DBQ) shouldn't have even been replayed at all!

After the leader Replica3 is synced, it asks Replica1 and Replica4 to sync to it.
The ancient ADDs have now been merged back onto Replica3's update log and so when Replica4 is syncing with Replica3, then Replica4 also ends up replaying the ancient ADDs and replaying the DBQ 1000 times.

Only when all of this finally completes can Replica3 finally perform its role as leader and accept new updates.
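The proposed pair of conditionals can be modeled in isolation. This is a toy model of the filtering in handleVersions(), not the real PeerSync code: positive versions stand for ADDs, negative for DELETEs, and the list is assumed sorted by decreasing absolute version, as in PeerSync:

```java
import java.util.ArrayList;
import java.util.List;

public class PeerSyncFilterModel {
    // Toy model of the proposed filtering: in completeList mode, old DELETEs
    // (negative versions below the threshold) are still replayed, but old
    // ADDs are skipped -- which is all the performance fix asks for.
    static List<Long> versionsToRequest(List<Long> otherVersions,
                                        long ourLowThreshold,
                                        boolean completeList) {
        List<Long> requested = new ArrayList<>();
        for (long v : otherVersions) {
            // Without completeList: cut off everything older than the threshold.
            if (!completeList && Math.abs(v) < ourLowThreshold) break;
            // Proposed second line: skip old ADDs even when completeList is set.
            if (completeList && 0 < v && v < ourLowThreshold) continue;
            requested.add(v);
        }
        return requested;
    }

    public static void main(String[] args) {
        List<Long> other = List.of(50L, -40L, 30L, -20L, 10L);
        // With completeList: old DELETE (-20) kept, old ADD (10) skipped.
        System.out.println(versionsToRequest(other, 25L, true));   // [50, -40, 30, -20]
        // Without completeList: stop at the first update below the threshold.
        System.out.println(versionsToRequest(other, 25L, false));  // [50, -40, 30]
    }
}
```

The point of the model: the "ancient" ADDs described above never make it into the request list, while the DELETEs that completeList was introduced to preserve still do.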
[jira] [Updated] (SOLR-8656) PeerSync should use same nUpdates everywhere
[ https://issues.apache.org/jira/browse/SOLR-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramsey Haddad updated SOLR-8656: Attachment: solr-8656.patch > PeerSync should use same nUpdates everywhere > > > Key: SOLR-8656 > URL: https://issues.apache.org/jira/browse/SOLR-8656 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: Trunk, 5.4.1 > Reporter: Ramsey Haddad >Priority: Minor > Attachments: solr-8656.patch > > > PeerSync requests information on the most recent nUpdates updates from > another instance to determine whether PeerSync can succeed. The value of > nUpdates can be customized in solrconfig.xml: > UpdateHandler.UpdateLog.NumRecordsToKeep. > PeerSync can be initiated in a number of different paths. One path to start > PeerSync (leader-initiated sync) is incorrectly still using a hard-coded > value of nUpdates=100. > This change fixes the leader-initiated-sync code path to also pick up > nUpdates from the customized/configured value.
[jira] [Created] (SOLR-8656) PeerSync should use same nUpdates everywhere
Ramsey Haddad created SOLR-8656: --- Summary: PeerSync should use same nUpdates everywhere Key: SOLR-8656 URL: https://issues.apache.org/jira/browse/SOLR-8656 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.4.1, Trunk Reporter: Ramsey Haddad Priority: Minor PeerSync requests information on the most recent nUpdates updates from another instance to determine whether PeerSync can succeed. The value of nUpdates can be customized in solrconfig.xml: UpdateHandler.UpdateLog.NumRecordsToKeep. PeerSync can be initiated in a number of different paths. One path to start PeerSync (leader-initiated sync) is incorrectly still using a hard-coded value of nUpdates=100. This change fixes the leader-initiated-sync code path to also pick up nUpdates from the customized/configured value.
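The configurable window mentioned above lives in the update log section of solrconfig.xml; a minimal sketch (the value 10000 is illustrative, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <!-- Keep more records in the transaction log so PeerSync has a wider
         window in which it can succeed; with this fix, leader-initiated
         sync honors the setting too instead of the hard-coded nUpdates=100. -->
    <int name="numRecordsToKeep">10000</int>
  </updateLog>
</updateHandler>
```

Raising numRecordsToKeep trades tlog disk space for a better chance that a restarted replica can PeerSync rather than fall back to full replication.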