[jira] [Comment Edited] (LUCENE-8263) Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more aggressive merging
[ https://issues.apache.org/jira/browse/LUCENE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545788#comment-16545788 ] Marc Morissette edited comment on LUCENE-8263 at 7/16/18 10:02 PM: --- {quote}I've gone back and forth on this. Now that optimize and forceMerge respect maxSegmentSize I've been thinking that those operations would suffice for those real-world edge cases. forceMergeDeletes (expungeDeletes) has a maximum percent of deletes allowed per segment for instance that must be between 0 and 100. 0 is roughly equivalent to forceMerge/optimize at this point. And will not create any segments over maxSegmentSizeMB. {quote} I hadn't considered using forceMergeDeletes to address these edge cases but the more I think about it, the more I like it. Consider me convinced. My only remaining concern with forceMergeDeletes as it is currently designed (and if I'm reading the code correctly) is that if enough segments somehow end up having a delete % above forceMergeDeletesPctAllowed, then it is possible for it to use a lot of disk space. Perhaps we could find a way to configure an upper limit on the number of merges that forceMergeDeletes can perform per call? When configured this way, each forceMergeDeletes could only claim a maximum amount of disk space before returning. Repeated calls would be necessary to "clean" an entire index but if each one were accompanied by a soft commit, then the amount of free disk space required to perform the entire operation would be more predictable.
> Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more > aggressive merging > > > Key: LUCENE-8263 > URL: https://issues.apache.org/jira/browse/LUCENE-8263 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-8263.patch > > > Spinoff of LUCENE-7976 to keep the two issues separate. > The current TMP allows up to 50% deleted docs, which can be wasteful on large > indexes. This parameter will do more aggressive merging of segments with > deleted documents when the _total_ percentage of deleted docs in the entire > index exceeds it. > Setting this to 50% should approximate current behavior. Setting it to 20% > caused the first cut at this to increase I/O roughly 10%. Setting it to 10% > caused about a 50% increase in I/O. > I was conflating the two issues, so I'll change 7976 and comment out the bits > that reference this new parameter. After it's checked in we can bring this > back. That should be less work than reconstructing this later.
> Among the questions to be answered: > 1> what should the default be? I propose 20% as it results in significantly > less space wasted and helps control heap usage for a modest increase in I/O. > 2> what should the floor be? I propose 10% with _strong_ documentation > warnings about not setting it below 20%. > 3> should there be two parameters? I think this was discussed somewhat in > 7976. The first cut at this used this number for two purposes: > 3a> the total percentage of deleted docs index-wide to trip this trigger > 3b> the percentage of an _individual_ segment that had to be deleted if the > segment was over maxSegmentSize/2 bytes in order to be eligible for merging. > Empirically, using the same percentage for both caused the
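The per-call merge budget floated in the comment above can be sketched in plain Java. This is purely illustrative: TieredMergePolicy has no maxMergesPerCall knob today, and the segment model and helper names below are hypothetical, not Lucene APIs.

```java
import java.util.List;

// Hypothetical sketch of the proposal above: cap how many merges a single
// forceMergeDeletes pass may schedule, bounding transient disk use per call.
// Lucene exposes no such maxMergesPerCall setting; this models the idea only.
public class MergeBudgetSketch {
    static class Segment {
        final long sizeBytes;
        final double deletePct;
        Segment(long sizeBytes, double deletePct) {
            this.sizeBytes = sizeBytes;
            this.deletePct = deletePct;
        }
    }

    // Select segments whose delete % exceeds pctAllowed, up to maxMergesPerCall,
    // and return the worst-case transient disk space the pass could claim
    // (each rewritten segment coexists on disk with its source until the merge commits).
    static long transientBytesForPass(List<Segment> segments, double pctAllowed, int maxMergesPerCall) {
        long transientBytes = 0;
        int merges = 0;
        for (Segment s : segments) {
            if (merges >= maxMergesPerCall) {
                break; // budget exhausted; a later call (after a commit) continues the cleanup
            }
            if (s.deletePct > pctAllowed) {
                transientBytes += s.sizeBytes; // old segment is retained until the merge completes
                merges++;
            }
        }
        return transientBytes;
    }
}
```

With such a cap, repeated calls interleaved with commits would clean the index incrementally, keeping the peak free-disk requirement per call predictable, which is the point being argued above.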
[jira] [Commented] (LUCENE-8263) Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more aggressive merging
[ https://issues.apache.org/jira/browse/LUCENE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545788#comment-16545788 ] Marc Morissette commented on LUCENE-8263: - {quote}I've gone back and forth on this. Now that optimize and forceMerge respect maxSegmentSize I've been thinking that those operations would suffice for those real-world edge cases. forceMergeDeletes (expungeDeletes) has a maximum percent of deletes allowed per segment for instance that must be between 0 and 100. 0 is roughly equivalent to forceMerge/optimize at this point. And will not create any segments over maxSegmentSizeMB. {quote} I hadn't considered using forceMergeDeletes to address these edge cases but the more I think about it, the more I like it. Consider me convinced. My only remaining concern with forceMergeDeletes as it is currently designed (and if I'm reading the code correctly) is that if enough segments somehow end up having a delete % above forceMergeDeletesPctAllowed, then it is possible for it to use a lot of disk space. Perhaps we could find a way to configure an upper limit on the number of merges that forceMergeDeletes can perform per call? When configured this way, each forceMergeDeletes could only claim a maximum amount of disk space before returning. Repeated calls would be necessary to "clean" an entire index but if each one were accompanied by a soft commit, then the amount of free disk space required to perform the entire operation would be more predictable.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8263) Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more aggressive merging
[ https://issues.apache.org/jira/browse/LUCENE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544392#comment-16544392 ] Marc Morissette commented on LUCENE-8263: - {quote}the above simulations suggest around 2.1x more merging with 10% of allowed deletes but I wouldn't be surprised that it could be much worse in practice in production under certain conditions.{quote} I understand why you would rather not give users another way to shoot themselves in the foot but I think you may underestimate how diverse and idiosyncratic some use cases can get. There are many real-world situations where a setting lower than 20% might be very appropriate:
* Super large indexes that are not updated often, i.e. where size is way more important than IO
* Indexes where large documents are updated more often than small documents, which skews TieredMergePolicy's estimate of delete%
* Query-heavy, update-light indexes where update IO is a tiny fraction of query IO
Users who will be looking to alter deletesPctAllowed will presumably be doing so because the default is inappropriate for their use case. I feel that 20-50% might be too narrow a range for some significant percentage of these use cases. I think documenting the danger of setting too low a value and letting users do their own experiments is the better course of action.
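The skew mentioned above, where the merge policy's delete percentage understates the actual disk waste, is easy to quantify. The sketch below is illustrative only (made-up numbers, hypothetical helper names); it contrasts the document-count delete percentage a merge policy reasons about with the byte-weighted waste on disk.

```java
// Illustrative sketch of the skew described above: TieredMergePolicy reasons
// about deleted *documents*, so when large documents turn over more often than
// small ones, byte-level waste can far exceed the reported delete percentage.
// The two-number model and all values here are assumptions for illustration.
public class DeleteSkewSketch {
    // Percentage of documents deleted (roughly what the merge policy sees).
    static double docDeletePct(long liveDocs, long deletedDocs) {
        return 100.0 * deletedDocs / (liveDocs + deletedDocs);
    }

    // Percentage of bytes held by deleted documents (the actual disk waste).
    static double byteDeletePct(long liveBytes, long deletedBytes) {
        return 100.0 * deletedBytes / (liveBytes + deletedBytes);
    }
}
```

For example, 20 deleted documents out of 100 reads as 20% deletions, but if each deleted document is 50x the size of a live one, over 90% of the bytes on disk are dead, which is the "2x the optimized size at ~20% deletions" situation described in the comments below.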
[jira] [Comment Edited] (LUCENE-8263) Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more aggressive merging
[ https://issues.apache.org/jira/browse/LUCENE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543616#comment-16543616 ] Marc Morissette edited comment on LUCENE-8263 at 7/13/18 7:37 PM: -- I would like to argue against a 20% floor. Some indexes contain documents of wildly different sizes with the larger documents experiencing much higher turnover. I have seen indexes with around 20% deletions that were more than 2x their optimized size because of this phenomenon. In such situations, deletesPctAllowed around 10-15% would make a lot of sense. I say keep the floor at 10%. Or maybe simply issue a warning instead?
[jira] [Commented] (LUCENE-8263) Add indexPctDeletedTarget as a parameter to TieredMergePolicy to control more aggressive merging
[ https://issues.apache.org/jira/browse/LUCENE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543616#comment-16543616 ] Marc Morissette commented on LUCENE-8263: - I would like to argue against a 20% floor. Some indexes contain documents of wildly different sizes with the larger documents experiencing much higher turnover. I have seen indexes with around 20% deletions that were more than 2x their optimized size because of this phenomenon. In such situations, deletesPctAllowed around 10-15% would make a lot of sense. I say keep the floor at 10%.
[jira] [Updated] (SOLR-12550) ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize
[ https://issues.apache.org/jira/browse/SOLR-12550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Morissette updated SOLR-12550: --- Description: We're in a situation where we need to optimize some of our collections. These optimizations are done with waitSearcher=true as a simple throttling mechanism to prevent too many collections from being optimized at once. We're seeing these optimize commands return without error after 10 minutes but well before the end of the operation. Our Solr logs show errors with socketTimeout stack traces. Setting distribUpdateSoTimeout to a higher value has no effect. See the links section for my patch. It turns out that ConcurrentUpdateSolrClient delegates commit and optimize commands to a private HttpSolrClient but fails to pass along its builder's timeouts to that client. Environment: [~elyograg] I am going to assume you didn't see that a patch with a unit test is attached to this bug (It's in the links section. It looks like GitHub has stopped adding comments when a new pull request is detected). Also, maybe I wasn't clear in my description but we don't use ConcurrentUpdateSolrClient in our client code. The issue is in SolrCloud itself where timeouts may occur in the ConcurrentUpdateSolrClient Solr uses to relay commit and optimize commands to its shards.
> ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize > > > Key: SOLR-12550 > URL: https://issues.apache.org/jira/browse/SOLR-12550 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Marc Morissette >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Updated] (SOLR-12550) ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize
[ https://issues.apache.org/jira/browse/SOLR-12550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Morissette updated SOLR-12550: --- Environment: (was: [~elyograg] I am going to assume you didn't see that a patch with a unit test is attached to this bug (It's in the links section. It looks like GitHub has stopped adding comments when a new pull request is detected). Also, maybe I wasn't clear in my description but we don't use ConcurrentUpdateSolrClient in our client code. The issue is in SolrCloud itself where timeouts may occur in the ConcurrentUpdateSolrClient Solr uses to relay commit and optimize commands to its shards.) Description: We're in a situation where we need to optimize some of our collections. These optimizations are done with waitSearcher=true as a simple throttling mechanism to prevent too many collections from being optimized at once. We're seeing these optimize commands return without error after 10 minutes but well before the end of the operation. Our Solr logs show errors with socketTimeout stack traces. Setting distribUpdateSoTimeout to a higher value has no effect. See the links section for my patch. It turns out that ConcurrentUpdateSolrClient delegates commit and optimize commands to a private HttpSolrClient but fails to pass along its builder's timeouts to that client. A patch is attached in the links section.
[jira] [Commented] (SOLR-12550) ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize
[ https://issues.apache.org/jira/browse/SOLR-12550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542261#comment-16542261 ] Marc Morissette commented on SOLR-12550: By the way, I have not investigated why the optimize() command returns without an error despite the fact that it did not complete normally.
[jira] [Created] (SOLR-12550) ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize
Marc Morissette created SOLR-12550: -- Summary: ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize Key: SOLR-12550 URL: https://issues.apache.org/jira/browse/SOLR-12550 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Marc Morissette We're in a situation where we need to optimize some of our collections. These optimizations are done with waitSearcher=true as a simple throttling mechanism to prevent too many collections from being optimized at once. We're seeing these optimize commands return without error after 10 minutes but well before the end of the operation. Our Solr logs show errors with socketTimeout stack traces. Setting distribUpdateSoTimeout to a higher value has no effect. It turns out that ConcurrentUpdateSolrClient delegates commit and optimize commands to a private HttpSolrClient but fails to pass along its builder's timeouts to that client.
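The bug pattern described in this report can be modeled in a few lines of plain Java. This is a deliberately simplified sketch of "builder timeouts not propagated to an internal delegate client"; the class and field names are illustrative, not the actual SolrJ sources.

```java
// Simplified model of the reported bug: a client that delegates some requests
// (here, commit/optimize) to an internally-built HTTP client must copy its
// builder's timeouts onto that delegate, or the delegate silently keeps its
// defaults. Names and default values below are assumptions for illustration.
public class TimeoutPropagationSketch {
    static class InnerHttpClient {
        int connectTimeoutMs = 15_000;  // hypothetical library default
        int socketTimeoutMs = 600_000;  // hypothetical 10-minute default
    }

    static class ConcurrentClient {
        final InnerHttpClient delegate;

        ConcurrentClient(int connectTimeoutMs, int socketTimeoutMs) {
            delegate = new InnerHttpClient();
            // The essence of the fix: propagate the configured timeouts to the
            // delegate used for commit/optimize. Without these two lines, a
            // long-running optimize would hit the delegate's default socket
            // timeout regardless of what the caller configured.
            delegate.connectTimeoutMs = connectTimeoutMs;
            delegate.socketTimeoutMs = socketTimeoutMs;
        }
    }
}
```

This matches the reported symptom: raising distribUpdateSoTimeout had no effect because the value never reached the inner client that actually held the socket.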
[jira] [Commented] (LUCENE-8365) ArrayIndexOutOfBoundsException in UnifiedHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517729#comment-16517729 ] Marc Morissette commented on LUCENE-8365: - The fix is on GitHub. > ArrayIndexOutOfBoundsException in UnifiedHighlighter > > > Key: LUCENE-8365 > URL: https://issues.apache.org/jira/browse/LUCENE-8365 > Project: Lucene - Core > Issue Type: Bug > Components: modules/highlighter >Affects Versions: 7.3.1 >Reporter: Marc Morissette >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We see ArrayIndexOutOfBoundsExceptions coming out of the UnifiedHighlighter > in our production logs from time to time: > {code} > java.lang.ArrayIndexOutOfBoundsException > at java.base/java.lang.System.arraycopy(Native Method) > at > org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386) > at > org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341) > at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121) > at > org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149) > at > org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171) > at > org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120) > at > org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261) > ... > {code} > It turns out that there is an "off by one" error in the UnifiedHighlighter's > code that, as far as I can tell, is only triggered when two nested > SpanNearQueries contain the same term. > The resulting behaviour depends on the content of the highlighted document. > Either some highlighted terms go missing, or an > ArrayIndexOutOfBoundsException is thrown.
[jira] [Updated] (LUCENE-8365) ArrayIndexOutOfBoundsException in UnifiedHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Morissette updated LUCENE-8365: Description: We see ArrayIndexOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time: {code} java.lang.ArrayIndexOutOfBoundsException at java.base/java.lang.System.arraycopy(Native Method) at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386) at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341) at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121) at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149) at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171) at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120) at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261) ... {code} It turns out that there is an "off by one" error in the UnifiedHighlighter's code that, as far as I can tell, is only triggered when two nested SpanNearQueries contain the same term. The resulting behaviour depends on the content of the highlighted document. Either, some highlighted terms go missing or an ArrayIndexOutOfBoundsException is thrown. 
was: We see an ArrayOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time: {code} java.lang.ArrayIndexOutOfBoundsException at java.base/java.lang.System.arraycopy(Native Method) at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386) at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341) at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121) at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149) at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171) at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120) at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261) ... {code} It turns out that there is an "off by one" error in UnifiedHighlighter code that, as far as I can tell, is currently only invoked when two nested SpanNearQueries contain the same term. The behaviour depends on the highlighted document. In most cases, some terms will fail to be highlighted. In others, an Exception is thrown. 
[jira] [Created] (LUCENE-8365) ArrayIndexOutOfBoundsException in UnifiedHighlighter
Marc Morissette created LUCENE-8365: --- Summary: ArrayIndexOutOfBoundsException in UnifiedHighlighter Key: LUCENE-8365 URL: https://issues.apache.org/jira/browse/LUCENE-8365 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 7.3.1 Reporter: Marc Morissette We see ArrayIndexOutOfBoundsExceptions coming out of the UnifiedHighlighter in our production logs from time to time: {code} java.lang.ArrayIndexOutOfBoundsException at java.base/java.lang.System.arraycopy(Native Method) at org.apache.lucene.search.uhighlight.PhraseHelper$SpanCollectedOffsetsEnum.add(PhraseHelper.java:386) at org.apache.lucene.search.uhighlight.PhraseHelper$OffsetSpanCollector.collectLeaf(PhraseHelper.java:341) at org.apache.lucene.search.spans.TermSpans.collect(TermSpans.java:121) at org.apache.lucene.search.spans.NearSpansOrdered.collect(NearSpansOrdered.java:149) at org.apache.lucene.search.spans.NearSpansUnordered.collect(NearSpansUnordered.java:171) at org.apache.lucene.search.spans.FilterSpans.collect(FilterSpans.java:120) at org.apache.lucene.search.uhighlight.PhraseHelper.createOffsetsEnumsForSpans(PhraseHelper.java:261) ... {code} It turns out that there is an "off by one" error in the UnifiedHighlighter code that, as far as I can tell, is currently only triggered when two nested SpanNearQueries contain the same term. The behaviour depends on the highlighted document. In most cases, some terms will fail to be highlighted. In others, an ArrayIndexOutOfBoundsException is thrown.
[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427982#comment-16427982 ] Marc Morissette commented on LUCENE-7976: - [~erickerickson] Thanks for tackling this. Regarding singleton merges: if I read your code correctly and am right about how Lucene works, I think that, on a large enough collection, your patch could generate ~50% more reads/writes when re-indexing the whole collection: * I think new documents are typically flushed once and merged 2-3 times before ending up in a large segment. * With a 20% delete threshold, old documents would, on average, be singleton merged 4 times before being expunged vs only one merge at a 50% delete threshold. In LaTeX notation: {code:java} 20% deleted docs threshold: \sum_{n=1}^\infty (1 - 0.2)^n = (1 / (1 - (1 - 0.2))) - 1 = 4 50% deleted docs threshold: \sum_{n=1}^\infty (1 - 0.5)^n = (1 / (1 - (1 - 0.5))) - 1 = 1{code} On the odd chance that my math bears any resemblance to reality, I would suggest that you disable singleton merges when the short term deletion rate of a segment is above a certain threshold (say 0.5% per hour). This should prevent performance degradations during heavy re-indexation while maintaining the desired behaviour on seldom updated indexes. > Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of > very large segments > - > > Key: LUCENE-7976 > URL: https://issues.apache.org/jira/browse/LUCENE-7976 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-7976.patch, LUCENE-7976.patch > > > We're seeing situations "in the wild" where there are very large indexes (on > disk) handled quite easily in a single Lucene index. This is particularly > true as features like docValues move data into MMapDirectory space.
The > current TMP algorithm allows on the order of 50% deleted documents as per a > dev list conversation with Mike McCandless (and his blog here: > https://www.elastic.co/blog/lucenes-handling-of-deleted-documents). > Especially in the current era of very large indexes in aggregate, (think many > TB) solutions like "you need to distribute your collection over more shards" > become very costly. Additionally, the tempting "optimize" button exacerbates > the issue since once you form, say, a 100G segment (by > optimizing/forceMerging) it is not eligible for merging until 97.5G of the > docs in it are deleted (current default 5G max segment size). > The proposal here would be to add a new parameter to TMP, something like > (no, that's not serious name, suggestions > welcome) which would default to 100 (or the same behavior we have now). > So if I set this parameter to, say, 20%, and the max segment size stays at > 5G, the following would happen when segments were selected for merging: > > any segment with > 20% deleted documents would be merged or rewritten NO > > MATTER HOW LARGE. There are two cases, > >> the segment has < 5G "live" docs. In that case it would be merged with > >> smaller segments to bring the resulting segment up to 5G. If no smaller > >> segments exist, it would just be rewritten > >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). > >> It would be rewritten into a single segment removing all deleted docs no > >> matter how big it is to start. The 100G example above would be rewritten > >> to an 80G segment for instance. > Of course this would lead to potentially much more I/O which is why the > default would be the same behavior we see now. As it stands now, though, > there's no way to recover from an optimize/forceMerge except to re-index from > scratch. We routinely see 200G-300G Lucene indexes at this point "in the > wild" with 10s of shards replicated 3 or more times. 
And that doesn't even > include having these over HDFS. > Alternatives welcome! Something like the above seems minimally invasive. A > new merge policy is certainly an alternative.
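The geometric-series estimate in the comment above can be sanity-checked numerically. This is a minimal sketch; the class and method names are illustrative, not part of Lucene:

```java
// Hypothetical sanity check of the geometric-series estimate above.
// With a deleted-docs threshold t, a still-live document is rewritten
// by another singleton merge with probability (1 - t) each round, so
// the expected number of extra rewrites before its segment is expunged
// is sum_{n>=1} (1 - t)^n = 1/t - 1.
public class SingletonMergeEstimate {

    static double expectedRewrites(double t) {
        return 1.0 / t - 1.0;
    }

    public static void main(String[] args) {
        System.out.println(expectedRewrites(0.2)); // 20% threshold: ~4 rewrites
        System.out.println(expectedRewrites(0.5)); // 50% threshold: ~1 rewrite
    }
}
```

This matches the 4-vs-1 ratio in the comment, which is the basis of the ~50% extra I/O claim once the 2-3 ordinary merges per document are added in.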
[jira] [Commented] (SOLR-11508) Make coreRootDirectory configurable via an environment variable (SOLR_CORE_HOME)
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277849#comment-16277849 ] Marc Morissette commented on SOLR-11508: [~elyograg] This is an interesting idea but I'm not sure how this solves the problem. It would be nice if Solr could start without solr.xml but it would condemn cloud mode users to choose between sticking to the default settings or mixing their configuration and data. It's either that or we would need to externalize every configuration parameter available in solr.xml (and there are a lot). > Make coreRootDirectory configurable via an environment variable > (SOLR_CORE_HOME) > > > Key: SOLR-11508 > URL: https://issues.apache.org/jira/browse/SOLR-11508 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Marc Morissette > > (Heavily edited) > Since Solr 7, it is possible to store Solr cores in separate disk locations > using solr.data.home (see SOLR-6671). This is very useful when running Solr > in Docker where data must be stored in a directory which is independent from > the rest of the container. > While this works well in standalone mode, it doesn't in Cloud mode as the > core.properties automatically created by Solr are still stored in > coreRootDirectory and cores created that way disappear when the Solr Docker > container is redeployed. > The solution is to configure coreRootDirectory to an empty directory that can > be mounted outside the Docker container. > The incoming patch makes this easier to do by allowing coreRootDirectory to > be configured via a solr.core.home system property and SOLR_CORE_HOME > environment variable.
[jira] [Commented] (SOLR-11508) Make coreRootDirectory configurable via an environment variable (SOLR_CORE_HOME)
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277362#comment-16277362 ] Marc Morissette commented on SOLR-11508: I've started work on a patch that adds the ability to set coreRootDirectory via an environment variable and command line option: https://github.com/morissm/lucene-solr/commit/95cbd1410fb4bdf97fd9ffec8737117a7931054d I'm starting to have second thoughts though. Solr already has a steep learning curve and I'm loath to add yet another option if there is a way to avoid it. What if core.properties files were stored in SOLR_DATA_HOME only when Solr is in cloud mode? Unless I'm mistaken, all configuration is stored in Zookeeper in cloud mode, so that is the only file that matters. As I've argued earlier, core.properties files in cloud mode are mostly an implementation detail and belong with the data. The only issue would be how to handle the transition for people who have set SOLR_DATA_HOME in cloud mode pre-7.2. I've thought of several automated ways to handle the transition, but none seems easy to accomplish without risking some unintended behaviour. Comments?
[jira] [Updated] (SOLR-11508) Make coreRootDirectory configurable via an environment variable (SOLR_CORE_HOME)
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Morissette updated SOLR-11508: --- Description: (Heavily edited) Since Solr 7, it is possible to store Solr cores in separate disk locations using solr.data.home (see SOLR-6671). This is very useful when running Solr in Docker where data must be stored in a directory which is independent from the rest of the container. While this works well in standalone mode, it doesn't in Cloud mode as the core.properties automatically created by Solr are still stored in coreRootDirectory and cores created that way disappear when the Solr Docker container is redeployed. The solution is to configure coreRootDirectory to an empty directory that can be mounted outside the Docker container. The incoming patch makes this easier to do by allowing coreRootDirectory to be configured via a solr.core.home system property and SOLR_CORE_HOME environment variable. was: Since Solr 7, it is possible to store Solr cores in separate disk locations using solr.data.home (see SOLR-6671). This is very useful where running Solr in Docker where data must be stored in a directory which is independent from the rest of the container. Unfortunately, while core data is stored in {{$\{solr.data.home}/$\{core.name}/index/...}}, core.properties is stored in {{$\{solr.solr.home}/$\{core.name}/core.properties}}. Reading SOLR-6671 comments, I think this was the expected behaviour but I don't think it is the correct one. In addition to being inelegant and counterintuitive, this has the drawback of stripping a core of its metadata and breaking core discovery when a Solr installation is redeployed, whether in Docker or not. core.properties is mostly metadata and although it contains some configuration, this configuration is specific to the core it accompanies. I believe it should be stored in solr.data.home, with the rest of the data it describes. 
Summary: Make coreRootDirectory configurable via an environment variable (SOLR_CORE_HOME) (was: core.properties should be stored $solr.data.home/$core.name)
[jira] [Commented] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274784#comment-16274784 ] Marc Morissette commented on SOLR-11508: [~elyograg], unfortunately what you propose is not really compatible with Docker. In Docker, configuration remains part of the image and users customize that configuration by either extending base images, mapping configuration files during deployment or configuring environment variables. Data must go in a separate directory, ideally one that can be empty without adverse effects. SOLR_HOME is thus not a good solution because it contains configsets and solr.xml. SOLR_DATA_HOME is a good solution for people who use Solr in standalone mode and I will readily admit my patch addresses this use case poorly. I did not completely understand this variable's purpose at first and thought it was somehow "wrong" but it's not. I'm not arguing for any change to it anymore. In Cloud mode however, we deal with collections; cores are more of an implementation detail. In Cloud mode, I'd argue individual core.properties files are closer to segment descriptors in their purpose, which is why it makes more sense to keep them with the rest of the data. This is why I believe coreRootDirectory is the best way to separate configuration from data in Cloud mode. To summarize, after reading everyone's viewpoint, I believe all 3 configuration variables are necessary as they address different use cases. [~dsmiley] and I are simply arguing for an easier way to configure coreRootDirectory. If no one sees an objection to that, I'll change the description of this bug as it's getting pretty stale and I'll find some time to work on a new patch to address that.
[jira] [Commented] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273899#comment-16273899 ] Marc Morissette commented on SOLR-11508: [~dsmiley] I was thinking the same thing. What should the environment variable be called? * SOLR_CORE_HOME fits well with SOLR_HOME and SOLR_DATA_HOME * SOLR_CORE_ROOT_DIRECTORY is most similar to coreRootDirectory. I think I like SOLR_CORE_HOME a little better. What should the behaviour be if coreRootDirectory is already defined in solr.xml? Should the environment variable override solr.xml or vice versa? I guess environment variables/command line parameters usually override configuration files?
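The precedence question raised above (does the environment variable beat solr.xml?) could be resolved along the usual lines: system property first, then environment variable, then the solr.xml value. A rough sketch; the class name, the {{solr.core.home}} property and the {{SOLR_CORE_HOME}} variable mirror the proposal in the comments and are assumptions, not existing Solr APIs:

```java
// Illustrative precedence sketch for resolving coreRootDirectory:
// -D system property > environment variable > solr.xml > fallback.
// Names (solr.core.home, SOLR_CORE_HOME) are hypothetical here.
public class CoreRootResolver {

    static String resolveCoreRoot(String solrXmlValue, String fallback) {
        String prop = System.getProperty("solr.core.home");
        if (prop != null) return prop;                 // command line wins
        String env = System.getenv("SOLR_CORE_HOME");
        if (env != null) return env;                   // then the environment
        return solrXmlValue != null ? solrXmlValue : fallback; // then solr.xml
    }

    public static void main(String[] args) {
        // With neither the property nor the variable set,
        // the solr.xml value wins over the fallback.
        System.out.println(resolveCoreRoot("/var/solr/cores", "/var/solr/data"));
    }
}
```

This is the "environment variables/command line parameters usually override configuration files" convention the comment leans toward.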
[jira] [Comment Edited] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273577#comment-16273577 ] Marc Morissette edited comment on SOLR-11508 at 11/30/17 10:47 PM: --- As to Erick's question, I believe: * solr.solr.home contains the server-wide config i.e. solr.xml and the configsets. * coreRootDirectory is where core discovery happens. It contains the core.properties files and conf directories. Defaults to solr.solr.home. * solr.data.home is where core data is stored. It's a directory structure that is completely parallel to the one that contains the core.properties (see Core Discovery documentation). Defaults to coreRootDirectory. The issue here is that the doc says: {quote} -t Sets the solr.data.home system property, where Solr will store data (index). If not set, Solr uses solr.solr.home for config and data.{quote} The doc suggests that the core config will be stored in the directory indicated by -t. It's currently not the case but I think it should be. coreRootDirectory has been there for a long time because it makes sense for people to want to store their cores away from their server configuration (1). solr.data.home addresses what I think might be a less popular requirement: to store core config away from core data (2). The problem is that since 7.0, the command line options and defaults now make it quite easy to think you're addressing need (1) when you're in reality configuring for need (2). was (Author: marc.morissette): As to Erick's question, I believe: * solr.solr.home contains the server config i.e. solr.xml and the configsets * coreRootDirectory is where core discovery happens. It contains the core.properties files and conf directories. Defaults to solr.solr.home. * solr.data.home is where the core data is stored. It's a directory structure that is parallel to the one that contains the core.properties. Defaults to coreRootDirectory. 
[jira] [Commented] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273577#comment-16273577 ] Marc Morissette commented on SOLR-11508: As to Erick's question, I believe: * solr.solr.home contains the server config i.e. solr.xml and the configsets * coreRootDirectory is where core discovery happens. It contains the core.properties files and conf directories. Defaults to solr.solr.home. * solr.data.home is where the core data is stored. It's a directory structure that is parallel to the one that contains the core.properties. Defaults to coreRootDirectory.
[jira] [Commented] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273556#comment-16273556 ] Marc Morissette commented on SOLR-11508: I think there might be a way to minimize problems with existing Solr installations. Instead of changing coreRootDirectory's default behaviour, the vanilla solr.xml could be modified to contain {{<str name="coreRootDirectory">$\{solr.data.home:}</str>}}. Users with existing installations that have used the service installation scripts would typically remain on the old solr.xml. I'd venture that the subset of users who define SOLR_DATA_HOME and use the default SOLR_HOME and default solr.xml is probably quite small.
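The solr.xml tweak suggested above might look like the following fragment. This is a sketch only: the element form follows the standard solr.xml convention of top-level {{<str>}} settings, and its exact placement is assumed, not a committed change:

```xml
<!-- Sketch: default coreRootDirectory to solr.data.home when it is set,
     falling back to the usual default when the property is empty. -->
<solr>
  <str name="coreRootDirectory">${solr.data.home:}</str>
</solr>
```

Because only the shipped vanilla solr.xml would carry this line, installations with a customized solr.xml would keep their current behaviour, which is the backward-compatibility argument made in the comment.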
[jira] [Commented] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273530#comment-16273530 ] Marc Morissette commented on SOLR-11508: I've created this bug because a lot of documentation (including the command-line help) indicates that SOLR_DATA_HOME is how you store your data outside the installation. That's true but quite misleading, because a lot of what is needed to load that data remains in coreRootDirectory. core.properties and the conf directory are not just config but metadata. If you delete a core's directory, you would expect the metadata to follow. If you download a new version of Solr and point it to your solr.data.home, you would expect Solr to be able to load your cores without breaking a sweat. Cores are databases and their individual configuration should lie with them, not with the server (except for configsets). Now, I understand why this makes less sense to veterans who have known Solr for a long time, but please understand how unintuitive this feels to SolrCloud users and less experienced users. My patch does not add or remove any feature. You can still configure different values for SOLR_DATA_HOME and coreRootDirectory. I've simply changed the defaults to something I consider more intuitive (God knows Solr could use a little more of that). Yes, changing the default could break some installations (those that have defined SOLR_DATA_HOME but not coreRootDirectory), but that is why I've added the release note. I feel this is acceptable as long as it makes Solr easier to use. Believe me, I'm not the first one to be tripped up by this issue.
[jira] [Created] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
Marc Morissette created SOLR-11508: -- Summary: core.properties should be stored $solr.data.home/$core.name Key: SOLR-11508 URL: https://issues.apache.org/jira/browse/SOLR-11508 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Marc Morissette Since Solr 7, it is possible to store Solr cores in separate disk locations using solr.data.home (see SOLR-6671). This is very useful when running Solr in Docker, where data must be stored in a directory that is independent of the rest of the container. Unfortunately, while core data is stored in {{$\{solr.data.home}/$\{core.name}/index/...}}, core.properties is stored in {{$\{solr.solr.home}/$\{core.name}/core.properties}}. Reading the SOLR-6671 comments, I think this was the expected behaviour, but I don't think it is the correct one. In addition to being inelegant and counterintuitive, this has the drawback of stripping a core of its metadata and breaking core discovery when a Solr installation is redeployed, whether in Docker or not. core.properties is mostly metadata and, although it contains some configuration, this configuration is specific to the core it accompanies. I believe it should be stored in solr.data.home, with the rest of the data it describes.
[jira] [Commented] (SOLR-11508) core.properties should be stored $solr.data.home/$core.name
[ https://issues.apache.org/jira/browse/SOLR-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209794#comment-16209794 ] Marc Morissette commented on SOLR-11508: Are there any objections before I begin work on a patch?
[jira] [Issue Comment Deleted] (SOLR-11399) UnifiedHighlighter ignores hl.fragsize value if hl.bs.type=SEPARATOR
[ https://issues.apache.org/jira/browse/SOLR-11399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Morissette updated SOLR-11399: --- Comment: was deleted (was: I've created a pull request that fixes this issue: https://github.com/apache/lucene-solr/pull/253) > UnifiedHighlighter ignores hl.fragsize value if hl.bs.type=SEPARATOR > > > Key: SOLR-11399 > URL: https://issues.apache.org/jira/browse/SOLR-11399 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: highlighter >Reporter: Marc Morissette > > The UnifiedHighlighter always acts as if hl.fragsize=-1 when > hl.bs.type=SEPARATOR.
[jira] [Commented] (SOLR-11399) UnifiedHighlighter ignores hl.fragsize value if hl.bs.type=SEPARATOR
[ https://issues.apache.org/jira/browse/SOLR-11399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179869#comment-16179869 ] Marc Morissette commented on SOLR-11399: I've created a pull request that fixes this issue: https://github.com/apache/lucene-solr/pull/253
[jira] [Created] (SOLR-11399) UnifiedHighlighter ignores hl.fragsize value if hl.bs.type=SEPARATOR
Marc Morissette created SOLR-11399: -- Summary: UnifiedHighlighter ignores hl.fragsize value if hl.bs.type=SEPARATOR Key: SOLR-11399 URL: https://issues.apache.org/jira/browse/SOLR-11399 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: highlighter Reporter: Marc Morissette The UnifiedHighlighter always acts as if hl.fragsize=-1 when hl.bs.type=SEPARATOR.
[jira] [Commented] (SOLR-10059) In SolrCloud, every fq added via is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900371#comment-15900371 ] Marc Morissette commented on SOLR-10059: [~hossman] It might be that existing parameters are not descriptive enough to handle every use case. We could add a new parameter to CommonParams: "handler.chain" or "distrib.call.stack" or something similar. It would be a comma delimited list of all the handlers that were involved in a distributed operation and that have forwarded their parameters to the current RequestHandler. A handler would be identified by Collection or Core Name followed by /RequestHandler. e.g. distrib.call.stack=MyCollection/MyHandler,MyCollection2/MyHandler2,... RequestHandlerBase could use this parameter to determine whether defaults, appends and initParams were already applied by the same handler up the chain. It would not handle the case of appends in initParams that apply to different handlers in the same call chain but I would assume this rarely occurs in practice. I'd rather not add more parameters to Solr given how messy the current parameter namespace already is but I don't see a better solution. What do you think? > In SolrCloud, every fq added via is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 6.4.0 >Reporter: Marc Morissette > Labels: performance > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. 
Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > {code}
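The distrib.call.stack check proposed in the comment above could be sketched roughly as follows. All class, method, and parameter names here are invented for illustration; this is not Solr's actual API, only a sketch of the suggested mechanism:

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed distrib.call.stack check: a handler
// looks for its own "collection/handler" id in the comma-delimited list of
// handlers that already processed this distributed request.
public class CallStackCheck {
    // Returns true if "collection/handler" already appears in the call-stack
    // parameter, meaning an upstream handler already applied its
    // defaults/appends and they should not be applied again.
    static boolean alreadyApplied(String callStack, String collection, String handler) {
        if (callStack == null || callStack.isEmpty()) {
            return false;
        }
        String id = collection + "/" + handler;
        return Arrays.asList(callStack.split(",")).contains(id);
    }

    public static void main(String[] args) {
        String stack = "MyCollection/MyHandler,MyCollection2/MyHandler2";
        System.out.println(alreadyApplied(stack, "MyCollection", "MyHandler"));   // true
        System.out.println(alreadyApplied(stack, "MyCollection3", "MyHandler"));  // false
    }
}
```

A check like this, consulted before re-applying appends, would sidestep the double-fq problem without requiring shards to strip parameters from the forwarded request.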
[jira] (SOLR-10059) In SolrCloud, every fq added via is computed twice.
Marc Morissette updated an issue: Solr / SOLR-10059, In SolrCloud, every fq added via is computed twice. Change By: Marc Morissette. Updated description: While researching another issue, I noticed that parameters appended to a query via SearchHandler's are added to the query twice in SolrCloud: once on the aggregator and again on the shard. The FacetComponent corrects this automatically by removing duplicates. Field queries added in this fashion are however computed twice and that hinders performance on filter queries that aren't simple bitsets such as those produced by the CollapsingQueryParser. To reproduce the issue, simply test this handler on a large enough collection, then replace "appends" with "defaults". You'll notice significant performance improvements. {code}{!collapse field=routingKey hint=top_fc}{code}
[jira] (SOLR-10059) In SolrCloud, every fq added via is computed twice.
Marc Morissette commented on SOLR-10059 Re: In SolrCloud, every fq added via is computed twice. I am willing to work on a patch but I'd like some guidance. I see two ways to solve this: (1) Eliminate duplicate filter queries. Other parameters might however suffer from the same duplication issue so it seems like too narrow a solution. (2) Disable RequestHandler "appends" when ShardParams.IS_SHARD is true. This seems like the better solution since the appended parameters should already have been added to the query by the aggregating node. I don't know if there are some corner cases that I haven't considered though. I'd appreciate some feedback.
[jira] (SOLR-10059) In SolrCloud, every fq added via is computed twice.
Marc Morissette created an issue Solr / SOLR-10059 In SolrCloud, every fq added via is computed twice. Issue Type: Bug Affects Versions: 6.4.0 Assignee: Unassigned Components: SolrCloud Created: 31/Jan/17 04:30 Labels: performance Priority: Major Reporter: Marc Morissette Security Level: Public (Default Security Level. Issues are Public) While researching another issue, I noticed that parameters appended to a query via SearchHandler's are added to the query twice in SolrCloud: once on the aggregator and again on the shard. The FacetComponent corrects this automatically by removing duplicates. Field queries added in this
[jira] [Commented] (LUCENE-7431) Allow negative pre/post values in SpanNotQuery
[ https://issues.apache.org/jira/browse/LUCENE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648417#comment-15648417 ] Marc Morissette commented on LUCENE-7431: - Thanks David! > Allow negative pre/post values in SpanNotQuery > -- > > Key: LUCENE-7431 > URL: https://issues.apache.org/jira/browse/LUCENE-7431 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Marc Morissette >Assignee: David Smiley >Priority: Minor > Fix For: 6.4 > > Attachments: LUCENE-7431.patch > > > I need to be able to specify a certain range of allowed overlap between the > include and exclude parameters of SpanNotQuery. > Since this behaviour is the inverse of the behaviour implemented by the pre > and post constructor arguments, I suggest that this be implemented with > negative pre and post values. > Patch incoming.
[jira] [Comment Edited] (LUCENE-7431) Allow negative pre/post values in SpanNotQuery
[ https://issues.apache.org/jira/browse/LUCENE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552928#comment-15552928 ] Marc Morissette edited comment on LUCENE-7431 at 10/6/16 7:30 PM: -- Can I get a review of this patch please? It's rather small and includes tests. was (Author: marc.morissette): Can I get a review of this patch please? It's rather small and code complete.
[jira] [Commented] (LUCENE-7431) Allow negative pre/post values in SpanNotQuery
[ https://issues.apache.org/jira/browse/LUCENE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552928#comment-15552928 ] Marc Morissette commented on LUCENE-7431: - Can I get a review of this patch please? It's rather small and code complete.
[jira] [Updated] (LUCENE-7431) Allow negative pre/post values in SpanNotQuery
[ https://issues.apache.org/jira/browse/LUCENE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Morissette updated LUCENE-7431: Attachment: LUCENE-7431.patch
[jira] [Created] (LUCENE-7431) Allow negative pre/post values in SpanNotQuery
Marc Morissette created LUCENE-7431: --- Summary: Allow negative pre/post values in SpanNotQuery Key: LUCENE-7431 URL: https://issues.apache.org/jira/browse/LUCENE-7431 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Marc Morissette Priority: Minor I need to be able to specify a certain range of allowed overlap between the include and exclude parameters of SpanNotQuery. Since this behaviour is the inverse of the behaviour implemented by the pre and post constructor arguments, I suggest that this be implemented with negative pre and post values. Patch incoming.