[jira] [Updated] (SOLR-13727) V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" instead of using regex pattern
[ https://issues.apache.org/jira/browse/SOLR-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-13727:
--------------------------------
    Fix Version/s: 8.3
                   master (9.0)
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" instead of using regex pattern
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13727
>                 URL: https://issues.apache.org/jira/browse/SOLR-13727
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: clients - java, v2 API
>    Affects Versions: 8.2
>            Reporter: Megan Carey
>            Priority: Major
>              Labels: easyfix, patch
>             Fix For: master (9.0), 8.3
>         Attachments: SOLR-13727.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When the HttpSolrClient is formatting a V2Request, it needs to change the
> endpoint from the default "/solr/..." to "/api/...". It does so by simply
> calling String.replace, which replaces the first instance of "/solr" in the
> URL with "/api".
>
> In the case where the host's address starts with "solr" and the HTTP protocol
> is prepended, this call corrupts the address for the request. For example:
> if baseUrl is "http://solr-host.com:8983/solr", this call will change it to
> "http:/api-host.com:8983/solr".
>
> We should use a regex pattern to ensure that we're replacing the correct
> portion of the URL.
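For illustration, the difference between replacing the first literal "/solr" and using an anchored pattern comes down to a couple of lines (a sketch of the failure mode only; the anchored pattern shown is one possible fix, not necessarily the committed patch):

{code}
public class V2UrlRewriteSketch {
  public static void main(String[] args) {
    String baseUrl = "http://solr-host.com:8983/solr";

    // Buggy: rewrites the first "/solr" anywhere in the URL, which here is
    // part of the host name "//solr-host.com".
    String buggy = baseUrl.replaceFirst("/solr", "/api");
    System.out.println(buggy);  // http:/api-host.com:8983/solr

    // Anchored: only rewrite "/solr" when it is the trailing path segment.
    String fixed = baseUrl.replaceFirst("/solr$", "/api");
    System.out.println(fixed);  // http://solr-host.com:8983/api
  }
}
{code}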
[jira] [Commented] (SOLR-13727) V2Requests: HttpSolrClient replaces first instance of "/solr" with "/api" instead of using regex pattern
[ https://issues.apache.org/jira/browse/SOLR-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921677#comment-16921677 ]

Yonik Seeley commented on SOLR-13727:
--------------------------------------

Changes look good to me! I'll commit soon unless anyone else sees an issue with this approach.
[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
[ https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907581#comment-16907581 ]

Yonik Seeley commented on SOLR-13695:
--------------------------------------

Was the SPLITSHARD asynchronous? I'm wondering if maybe the DELETESHARD happened before the SPLITSHARD completed.

> SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-13695
>                 URL: https://issues.apache.org/jira/browse/SOLR-13695
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Critical
>
> One of my clients experienced data loss with the following sequence of operations:
> 1) SPLITSHARD with method as "link".
> 2) DELETESHARD of the parent (inactive) shard.
> 3) Query for documents in the subshards; it seems like both subshards have 0 documents.
>
> Proposing a fix (after offline discussion with [~noble.paul]) based on
> running FORCEMERGE after SPLITSHARD (such that segments are rewritten), and
> not letting DELETESHARD delete the data directory until the FORCEMERGE
> operations finish.
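For reference on the async path: a SPLITSHARD submitted with an async id returns immediately, and completion has to be checked via REQUESTSTATUS before it is safe to DELETESHARD the parent. A minimal SolrJ sketch (the collection and shard names are illustrative, and in practice the status call would be polled until it reports completion):

{code}
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class AsyncSplitSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // Submit SPLITSHARD asynchronously; this returns a request id
      // immediately while the split keeps running on the cluster.
      String requestId = CollectionAdminRequest.splitShard("testColl")
          .setShardName("shard1")
          .processAsync(client);

      // The parent shard must not be deleted until REQUESTSTATUS reports
      // the split as completed.
      System.out.println(CollectionAdminRequest.requestStatus(requestId).process(client));
    }
  }
}
{code}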
[jira] [Commented] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903418#comment-16903418 ]

Yonik Seeley commented on SOLR-13399:
--------------------------------------

Ah, yep... splitByPrefix definitely should not be defaulting to true! It ended up normally doing nothing (since id_prefix was normally not populated), but that changed when the last commit, which uses the indexed "id" field, was added. I'll fix the default to be false.

> compositeId support for shard splitting
> ----------------------------------------
>
>                 Key: SOLR-13399
>                 URL: https://issues.apache.org/jira/browse/SOLR-13399
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SOLR-13399.patch, SOLR-13399.patch, SOLR-13399_testfix.patch, SOLR-13399_useId.patch, ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt
>
> Shard splitting does not currently have a way to automatically take into
> account the actual distribution (number of documents) in each hash bucket
> created by using compositeId hashing.
>
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD*
> command that would look at the number of docs sharing each compositeId prefix
> and use that to create roughly equal sized buckets by document count rather
> than just assuming an equal distribution across the entire hash range.
>
> Like normal shard splitting, we should bias against splitting within hash
> buckets unless necessary (since that leads to larger query fanout). Perhaps
> this warrants a parameter that would control how much of a size mismatch is
> tolerable before resorting to splitting within a bucket:
> *allowedSizeDifference*?
>
> To more quickly calculate the number of docs in each bucket, we could index
> the prefix in a different field. Iterating over the terms for this field
> would quickly give us the number of docs in each (i.e. Lucene keeps track of
> the doc count for each term already). Perhaps the implementation could be a
> flag on the *id* field... something like *indexPrefixes* and poly-fields that
> would cause the indexing to be automatically done and alleviate having to
> pass in an additional field during indexing and during the call to
> *SPLITSHARD*. This whole part is an optimization though and could be split
> off into its own issue if desired.
[jira] [Commented] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903395#comment-16903395 ]

Yonik Seeley commented on SOLR-13399:
--------------------------------------

Weird... I don't know how that commit could have caused a failure in ShardSplitTest, but I'll investigate.
[jira] [Updated] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-13399:
--------------------------------
    Attachment: SOLR-13399_useId.patch
        Status: Reopened  (was: Reopened)

Here's an enhancement that uses the "id" field for histogram generation if there is nothing found in the "id_prefix" field.
[jira] [Updated] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-13399:
--------------------------------
    Attachment: SOLR-13399_testfix.patch
        Status: Reopened  (was: Reopened)

Attaching patch to fix the test bug by explicitly forcing the number of bits in the test when using tri-level ids "foo/16!bar!doc1"
[jira] [Commented] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895494#comment-16895494 ]

Yonik Seeley commented on SOLR-13399:
--------------------------------------

OK, figured out the issue... It turns out that if you have "foo!", then "foo!bar!" will normally not nest under it. The number of bits used for the first part of the hash is dynamic, depending on the number of levels in the composite hash ID.

That's unfortunate for a number of reasons. It also breaks the initial bi-level hash guarantee that you could just add a prefix to any document id without any escaping (i.e. if your ID happens to contain "!", it can cause the document hash to fall outside of the parent hash prefix).

It looks like this is working as designed (according to SOLR-5320), but it was certainly surprising, since it prevents hash routing from working out-of-the-box in conjunction with tri-level ids without explicitly specifying the bits with the "/" notation.
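To illustrate why the levels don't nest, here is a self-contained sketch of the bit-allocation scheme only (hash() is a stand-in for the MurmurHash3 that Solr actually uses, and the masks are simplified relative to the real CompositeIdRouter):

{code}
public class CompositeIdBitsSketch {
  // Stand-in hash; Solr actually uses MurmurHash3.
  static int hash(String s) {
    return s.hashCode();
  }

  // Bi-level "a!doc": top 16 bits come from hash(a).
  // Tri-level "a!b!doc": top 8 bits from hash(a), next 8 bits from hash(b).
  static int route(String id) {
    String[] parts = id.split("!");
    if (parts.length >= 3) {
      return (hash(parts[0]) & 0xFF000000)
          | ((hash(parts[1]) >>> 8) & 0x00FF0000)
          | (hash(parts[2]) >>> 16);
    } else if (parts.length == 2) {
      return (hash(parts[0]) & 0xFFFF0000)
          | (hash(parts[1]) >>> 16);
    }
    return hash(id);
  }

  public static void main(String[] args) {
    // Only 8 bits of hash("foo") survive in the tri-level id, so its hash
    // generally falls outside the 16-bit "foo!" range of the bi-level id.
    System.out.printf("foo!doc1     -> %08x%n", route("foo!doc1"));
    System.out.printf("foo!bar!doc1 -> %08x%n", route("foo!bar!doc1"));
  }
}
{code}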
[jira] [Commented] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892321#comment-16892321 ]

Yonik Seeley commented on SOLR-13399:
--------------------------------------

Thanks for the heads up, I'll investigate.

bq. Also: it's really not cool to be adding new end user features/params w/o at least adding a one line summary of the new param to the relevant ref-guide page.

Sure, I had planned on doing so before 8.3 (unless you mean we've generally moved to doing the docs as part of the initial commit? If so, I missed that.)
[jira] [Commented] (SOLR-11266) V2 API returning wrong content-type
[ https://issues.apache.org/jira/browse/SOLR-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889155#comment-16889155 ]

Yonik Seeley commented on SOLR-11266:
--------------------------------------

> One can't say that we are serving valid JSON

Perhaps not a valid HTTP JSON response, but a valid text response containing valid JSON. It was deliberate and still standards-conforming, and it is no longer needed. For more context: some of our previous tutorials embedded hyperlinks that users were supposed to click on and see results in their browsers (which resulted in a very poor experience when a browser couldn't handle the content-type by default).

> V2 API returning wrong content-type
> ------------------------------------
>
>                 Key: SOLR-11266
>                 URL: https://issues.apache.org/jira/browse/SOLR-11266
>             Project: Solr
>          Issue Type: Bug
>          Components: v2 API
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: SOLR-11266.patch
>
> The content-type of the returned value is wrong in many places. It should
> return "application/json", but instead returns "text/plain". Here's an example:
> {code}
> [ishan@t430 ~] $ curl -v "http://localhost:8983/api/collections/products/select?q=*:*&rows=0"
> *   Trying 127.0.0.1...
> * TCP_NODELAY set
> * Connected to localhost (127.0.0.1) port 8983 (#0)
> > GET /api/collections/products/select?q=*:*&rows=0 HTTP/1.1
> > Host: localhost:8983
> > User-Agent: curl/7.51.0
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < Content-Type: text/plain;charset=utf-8
> < Content-Length: 184
> <
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":1,
>     "params":{
>       "q":"*:*",
>       "rows":"0"}},
>   "response":{"numFound":260,"start":0,"docs":[]
>   }}
> * Curl_http_done: called premature == 0
> * Connection #0 to host localhost left intact
> {code}
[jira] [Commented] (SOLR-11266) V2 API returning wrong content-type
[ https://issues.apache.org/jira/browse/SOLR-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889096#comment-16889096 ]

Yonik Seeley commented on SOLR-11266:
--------------------------------------

> I'm not of the opinion that users looking at a response in a browser are our main target audience.

More than someone trying to write a context-free Solr client, I'd say ;)

I think most people wanted application/json out of a misguided sense of correctness (it's not incorrect to have JSON-formatted text in a plain-text HTTP response, and I disagree that this issue should be categorized as a bug), although one can argue that application/json is *more* appropriate given that it's more specific.

That said, I just tried out the current versions of Chrome, Safari, and Firefox, and they all now work when application/json is used, so I'm fine with using "application/json" by default going forward. When this was previously decided, it was the case that no major browsers supported that content-type.
[jira] [Resolved] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-13399.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 8.3
[jira] [Assigned] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley reassigned SOLR-13399:
-----------------------------------
    Assignee: Yonik Seeley
[jira] [Commented] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888048#comment-16888048 ]

Yonik Seeley commented on SOLR-13399:
--------------------------------------

Final patch attached; I plan on committing soon. Some implementation notes (an invocation example follows the list):
- this only takes into account 2-level prefix keys, not tri-level yet (that can be a followup JIRA)
- we currently only split into 2 ranges (again, this can be extended in a followup JIRA)
- if "id_prefix" has no values/data, then no "ranges" split recommendation is returned and the split proceeds as if splitByPrefix had not been specified
- in the future we could use the "id" field as a slower fallback
- a split within a prefix is only done if there are not multiple prefix buckets in the shard (i.e. no allowedSizeDifference implemented in this issue)
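An invocation of the new parameter might look like the following (a sketch using SolrJ's generic request so the parameter is passed straight through to the Collections API; the collection and shard names are illustrative):

{code}
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SplitByPrefixSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      ModifiableSolrParams p = new ModifiableSolrParams();
      p.set("action", "SPLITSHARD");
      p.set("collection", "testColl");
      p.set("shard", "shard1");
      p.set("splitByPrefix", "true");  // the new param from this issue
      System.out.println(
          client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", p)));
    }
  }
}
{code}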
[jira] [Updated] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-13399:
--------------------------------
    Attachment: SOLR-13399.patch
[jira] [Comment Edited] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882321#comment-16882321 ]

Yonik Seeley edited comment on SOLR-13399 at 7/10/19 6:43 PM:
---------------------------------------------------------------

Here's a draft patch (no tests yet) for feedback.

This adds a parameter "splitByPrefix" to SPLITSHARD. When the overseer sees this parameter, it sends an additional SPLIT request with the "getRanges" parameter set. This causes SPLIT (SplitOp.java) to calculate the ranges based on the prefix field "id_prefix" and return the recommended split string in the response in the "ranges" parameter. SPLITSHARD in the overseer then proceeds as if that ranges string had been passed in by the user.

"id_prefix" is currently populated via a copyField in the schema:

{code}
<copyField source="id" dest="id_prefix"/>
{code}

The prefix field is currently always "id_prefix" (convention / implicit). Not sure if it adds value to make it configurable via a "field" parameter on the SPLITSHARD command.
[jira] [Updated] (SOLR-13399) compositeId support for shard splitting
[ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-13399:
--------------------------------
    Attachment: SOLR-13399.patch
        Status: Open  (was: Open)
[jira] [Commented] (SOLR-13350) Explore collector managers for multi-threaded search
[ https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840549#comment-16840549 ]

Yonik Seeley commented on SOLR-13350:
--------------------------------------

In general, it seems like an executor for parallel searches would be more useful at the CoreContainer level. If the executor is per-searcher, then picking a pool size high enough for good concurrency on a single core means that one would get way too many threads if one has tons of cores per node (not that unusual).

We should also audit all Weight classes in Solr for thread safety (if it hasn't been done yet). Relying on existing tests to catch stuff like that won't work that well for catching race conditions.

> Explore collector managers for multi-threaded search
> -----------------------------------------------------
>
>                 Key: SOLR-13350
>                 URL: https://issues.apache.org/jira/browse/SOLR-13350
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: SOLR-13350.patch, SOLR-13350.patch, SOLR-13350.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an
> index in series. However, using CollectorManagers, segments can be searched
> concurrently, resulting in reduced latency. Opening this issue to explore the
> effectiveness of using CollectorManagers in SolrIndexSearcher from a latency
> and throughput perspective.
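For context, Lucene already supports this pattern at the IndexSearcher level: constructing the searcher with an Executor makes a CollectorManager-based search collect leaf slices concurrently, then reduce the per-slice collectors. A minimal sketch (the index path is illustrative, and the hit-counting manager is just the simplest possible example):

{code}
import java.nio.file.Paths;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.lucene.store.FSDirectory;

public class ParallelSearchSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
      // With an executor, search(query, manager) collects each leaf slice
      // on a separate thread, then reduces the per-slice collectors.
      IndexSearcher searcher = new IndexSearcher(reader, executor);
      int totalHits = searcher.search(new MatchAllDocsQuery(),
          new CollectorManager<TotalHitCountCollector, Integer>() {
            @Override
            public TotalHitCountCollector newCollector() {
              return new TotalHitCountCollector();
            }
            @Override
            public Integer reduce(Collection<TotalHitCountCollector> collectors) {
              int total = 0;
              for (TotalHitCountCollector c : collectors) total += c.getTotalHits();
              return total;
            }
          });
      System.out.println("hits: " + totalHits);
    } finally {
      executor.shutdown();
    }
  }
}
{code}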
[jira] [Commented] (SOLR-13437) fork noggit code to Solr
[ https://issues.apache.org/jira/browse/SOLR-13437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839566#comment-16839566 ]

Yonik Seeley commented on SOLR-13437:
--------------------------------------

I'm fine with forking... I haven't had a chance to do anything with noggit recently. It might make things easier to keep the same namespace, though (for anyone in Solr who uses the noggit APIs directly).

> fork noggit code to Solr
> -------------------------
>
>                 Key: SOLR-13437
>                 URL: https://issues.apache.org/jira/browse/SOLR-13437
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrJ
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We rely on noggit for all our JSON encoding/decoding needs. The main project
> is not actively maintained. We cannot easily switch to another parser because
> it may cause backward incompatibility; we have advertised the ability to use
> flexible JSON, and we also use noggit internally in many classes.
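For context on "uses the noggit APIs directly": Solr code and tests commonly parse JSON via noggit's public classes, which is why keeping the org.noggit package name would reduce churn. A minimal sketch (the input string is illustrative):

{code}
import java.util.Map;
import org.noggit.JSONParser;
import org.noggit.ObjectBuilder;

public class NoggitUsageSketch {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    // Code like this exists throughout Solr; forking noggit into a different
    // package would require touching every such import.
    String json = "{\"id\":\"doc1\",\"popularity\":42}";
    Map<String, Object> map = (Map<String, Object>) ObjectBuilder.fromJSON(json);
    System.out.println(map.get("id"));

    // The lower-level streaming parser is also used directly in places.
    JSONParser parser = new JSONParser(json);
    System.out.println(parser.nextEvent() == JSONParser.OBJECT_START);
  }
}
{code}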
[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit
[ https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839437#comment-16839437 ]

Yonik Seeley commented on LUCENE-8753:
---------------------------------------

Thanks Bruno, awesome stuff! A single FST for multiple fields is an important optimization.

> New PostingFormat - UniformSplit
> ---------------------------------
>
>                 Key: LUCENE-8753
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8753
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 8.0
>            Reporter: Bruno Roustant
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: Uniform Split Technique.pdf, luceneutil.benchmark.txt
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a proposal to add a new PostingsFormat called "UniformSplit" with 4 objectives:
> - Clear design and simple code.
> - Easily extensible, for both the logic and the index format.
> - Light memory usage with a very compact FST.
> - Focus on efficient TermQuery, PhraseQuery and PrefixQuery performance.
>
> (The attached pdf explains the technique visually and in more detail.)
>
> The principle is to split the list of terms into blocks and use an FST to
> access the block, but not as a prefix trie; rather with a seek-floor pattern.
> For the selection of the blocks, there is a target average block size (number
> of terms), with an allowed delta variation (10%) to compare the terms and
> select the one with the minimal distinguishing prefix.
> There are also several optimizations inside the block to make it more compact
> and to speed up the loading/scanning.
>
> The performance obtained is interesting with the luceneutil benchmark,
> comparing UniformSplit with BlockTree. Find it in the first comment, and also
> attached for better formatting. Although the precise percentages vary between
> runs, there are three main points:
> - TermQuery and PhraseQuery are improved.
> - PrefixQuery and WildcardQuery are ok.
> - Fuzzy queries are clearly less performant, because BlockTree is so
> optimized for them.
>
> Compared to BlockTree, FST size is reduced by 15%, and segment writing time
> is reduced by 20%. So this PostingsFormat scales to lots of docs, as
> BlockTree does.
>
> This initial version passes all Lucene tests. Use "ant test
> -Dtests.codec=UniformSplitTesting" to test with this PostingsFormat.
>
> Subjectively, we think we have fulfilled our goal of code simplicity. And we
> have already exercised this PostingsFormat's extensibility to create a
> different flavor for our own use-case.
>
> Contributors: Juan Camilo Rodriguez Duran, Bruno Roustant, David Smiley
[jira] [Commented] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method
[ https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835597#comment-16835597 ]

Yonik Seeley commented on LUCENE-8796:
---------------------------------------

Hmmm, that looks like it's searching the whole space each time instead of starting at the current point? Presumably this:
{code}
while (bound < length && docs[bound] < target) {
{code}
should be something like this:
{code}
while (i + bound < length && docs[i + bound] < target) {
{code}
And the bounds of the following binary search should be adjusted to match as well.

> Use exponential search in IntArrayDocIdSet advance method
> -----------------------------------------------------------
>
>                 Key: LUCENE-8796
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8796
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Luca Cavanna
>            Priority: Minor
>
> Chatting with [~jpountz], he suggested improving IntArrayDocIdSet by making
> its advance method use exponential search instead of binary search. This
> should help the performance of queries that include conjunctions: given that
> ConjunctionDISI uses leapfrog, it advances through doc ids in small steps,
> hence exponential search should on average be faster when advancing, compared
> to binary search.
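Putting that suggestion together, an advance() over a sorted int[] that gallops from the current position before binary searching might look like this (a sketch of the technique only, not the actual IntArrayDocIdSet patch; the field names are illustrative):

{code}
// Sketch of a DocIdSetIterator-style advance() using exponential search
// that starts at the current position i rather than re-searching the array.
public class ExponentialAdvanceSketch {
  private final int[] docs;   // sorted doc ids
  private final int length;
  private int i;              // current position

  public ExponentialAdvanceSketch(int[] docs, int length) {
    this.docs = docs;
    this.length = length;
  }

  public int advance(int target) {
    // Grow the bound exponentially from the current position...
    int bound = 1;
    while (i + bound < length && docs[i + bound] < target) {
      bound <<= 1;
    }
    // ...then binary search within (i + bound/2, min(i + bound, length - 1)].
    int lo = i + (bound >> 1) + 1;
    int hi = Math.min(i + bound, length - 1);
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    i = lo;
    return i < length ? docs[i] : Integer.MAX_VALUE; // i.e. NO_MORE_DOCS
  }
}
{code}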
[jira] [Commented] (SOLR-13320) add a param ignoreVersionConflicts=true to updates to not overwrite existing docs
[ https://issues.apache.org/jira/browse/SOLR-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833460#comment-16833460 ]

Yonik Seeley commented on SOLR-13320:
--------------------------------------

Hmmm, when I read "ignoreVersionConflicts" I assumed the wrong behavior... go ahead and add even if there is a version conflict. We aren't really ignoring it, but rather continuing on to the next update/doc in the batch after it happens? I'm not sure I can think of a better name though... thinking along the lines of [~gus_heck], maybe something like "continueOnVersionConflict" (or "continueOnError" for the general case)?

> add a param ignoreVersionConflicts=true to updates to not overwrite existing docs
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-13320
>                 URL: https://issues.apache.org/jira/browse/SOLR-13320
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>         Attachments: SOLR-13320.patch, SOLR-13320.patch
>
> Updates should have an option to ignore duplicate documents and drop them if
> an option {{ignoreDuplicates=true}} is specified
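For background on what a "conflict" means here: with Solr's optimistic concurrency, setting {{_version_=-1}} on a document means "add only if it does not already exist", so the second add below fails the whole batch today; the param under discussion would let the rest of the batch proceed. A sketch (the collection name is illustrative, and the parameter name is exactly what this issue is still debating):

{code}
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class VersionConflictSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/testColl").build()) {
      UpdateRequest req = new UpdateRequest();
      for (String id : new String[] {"doc1", "doc1", "doc2"}) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("_version_", -1L);  // insert-only: conflicts if id already exists
        req.add(doc);
      }
      // Hypothetical param from this issue: drop conflicting docs and continue
      // with the remainder of the batch instead of failing the whole request.
      req.setParam("ignoreVersionConflicts", "true");
      req.process(client);
    }
  }
}
{code}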
[jira] [Updated] (SOLR-13431) Efficient updates with shared storage
[ https://issues.apache.org/jira/browse/SOLR-13431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-13431:
--------------------------------
    Description:

h2. Background & problem statement:

With shared storage support, data durability is handled by the storage layer (e.g. S3 or HDFS) and replicas are not needed for durability. This changes the nature of how a single update (say, adding a document) must be handled. The local transaction log does not help... a node can go down and never come back. The implication is that *a commit must be done for any updates to be considered durable.*

The problem is also more complex than just batching updates and adding a commit at the end of a batch. Consider indexing documents A,B,C,D followed by a commit:
1) documents A,B sent to leader1 and indexed
2) leader1 fails, leader2 is elected
3) documents C,D sent to leader2 and indexed
4) commit

After this sequence of events, documents A,B are actually lost because a commit was not done on leader1 before it failed.

Adding a commit for every single update would fix the problem of data loss, but would obviously be too expensive (and each commit will be more expensive). We can still do batches if we *disable transparent failover* for a batch.
- all updates in a batch (for a specific shard) should be indexed on the *same leader*... any change in leadership should result in a failure at the low level instead of any transparent failover or forwarding.
- in the event of a failure, *all updates since the last commit must be replayed* (we can't just retry the failure itself), or the failure will need to be bubbled up to a higher layer to retry from the beginning.

h2. Indexing scenario 1: CSV upload

If SolrCloud is loading a large CSV file, the receiving Solr node will forward updates to the correct leaders. This happens in the DistributedUpdateProcessor via SolrCmdDistributor, which ends up using a ConcurrentUpdateHttp2SolrClient subclass. Fixing this scenario for shared storage in the simplest way would entail adding a commit to every update, which would be way too slow. The forward-to-replica use case here is quite different from forward-to-correct-leader (the latter has the current Solr node acting much more like an external client). To simplify development, we may want to separate these cases and continue using the existing code for forward-to-replica.

h2. Indexing scenario 2: SolrJ bulk indexing

In this scenario, a client is trying to do a large amount of indexing and can use batches or streaming. For this scenario, we could just require that a commit be added for each batch and then fail a batch on any leader change. This is problematic for a couple of reasons:
- larger batches add latency to build, hurting throughput
- doesn't scale well - as a collection grows, the number of shards grows and the chance that any shard leader goes down (or the shard is split) goes up. Requiring that the entire batch (all shards) be replayed when this happens is wasteful and gets worse with collection growth.

h2. Proposed Solution: a SolrJ cloud-aware streaming client

- something like ConcurrentUpdateHttp2SolrClient that can stream and knows about the cloud layout
- track when the last commit happened for each shard leader
- buffer updates per-shard since the last commit happened
-- doesn't have to be exact... assume idempotent updates here, so overlap is fine
-- buffering would also be triggered by the replica type of the collection (so this class could be used for both shared storage and normal NRT replicas)
- a parameter would be passed that would disallow any forwarding (since we're handling buffering/failover at this level)
- on a failure because of a leader going down or loss of leadership, wait until a new leader has been elected and then replay updates since the last commit
- insert commits where necessary to prevent buffers from growing too large
-- inserted commits should be able to proceed in parallel... we shouldn't need to block and wait for a commit before resuming to send documents to that leader.
-- it would be nice if there was a way we could get notified if a commit happened via some other mechanism (like an autoCommit being triggered)
--- assuming we can't get this, perhaps we should pass a flag that disables triggering auto-commits for these batch updates?
- handle splits (not only can a shard leader change, but a shard could split... buffered updates may need to be re-slotted)
- need to handle a leader "bounce" like a change in leadership (assuming we're skipping using the transaction log)
- multi-threaded - all updates to a leader, regardless of thread, are managed as a single update stream
-- this perhaps provides a way to coalesce incremental/realtime updates
- OPTIONAL: ability to have multiple channels to a single leader?
-- we would need to avoid reordering updates to the same ID
-- an alternative to attempting to create more parallelism-per-shard on the client side is to do it on the server side.
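To make the replay-buffer idea concrete, the per-shard bookkeeping might look roughly like this (a design sketch with hypothetical names only; it omits leader discovery, shard splits, and the threading concerns called out above):

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical sketch of a cloud-aware streaming client's buffers: updates
// sent to a shard since its last commit are retained so they can be replayed
// if that shard's leader changes before the next commit succeeds.
public class ShardBufferSketch {
  private final Map<String, List<SolrInputDocument>> uncommitted = new HashMap<>();
  private final int maxBuffered = 10_000; // insert a commit before buffers grow too large

  public void send(String shard, SolrInputDocument doc) {
    List<SolrInputDocument> buf = uncommitted.computeIfAbsent(shard, s -> new ArrayList<>());
    buf.add(doc);
    // ... stream doc to the shard's current leader, with forwarding disallowed ...
    if (buf.size() >= maxBuffered) {
      commit(shard);
    }
  }

  // On a successful commit, everything buffered for the shard is durable.
  public void commit(String shard) {
    // ... issue a commit to the shard leader ...
    List<SolrInputDocument> buf = uncommitted.get(shard);
    if (buf != null) {
      buf.clear();
    }
  }

  // On leader failure/change: wait for the new leader, then replay the buffer
  // (updates are assumed idempotent, so overlap with already-indexed docs is fine).
  public void onLeaderChange(String shard) {
    for (SolrInputDocument doc : uncommitted.getOrDefault(shard, List.of())) {
      // ... re-send doc to the newly elected leader ...
    }
  }
}
{code}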
[jira] [Created] (SOLR-13431) Efficient updates with shared storage
Yonik Seeley created SOLR-13431: --- Summary: Efficient updates with shared storage Key: SOLR-13431 URL: https://issues.apache.org/jira/browse/SOLR-13431 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Reporter: Yonik Seeley h2. Background & problem statement: With shared storage support, data durability is handled by the storage layer (e.g. S3 or HDFS) and replicas are not needed for durability. This changes the nature of how a single update (say adding a document) must be handled. The local transaction log does not help... a node can go down and never come back. The implication is that *a commit must be done for any updates to be considered durable.* The problem is also more complex than just batching updates and adding a commit at the end of a batch. Consider indexing documents A,B,C,D followed by a commit: 1) documents A,B sent to leader1 and indexed 2) leader1 fails, leader2 is elected 3) documents C,D sent to leader2 and indexed 4) commit After this sequence of events, documents A,B are actually lost because a commit was not done on leader1 before it failed. Adding a commit for every single update would fix the problem of data loss, but would obviously be too expensive (and each commit will be more expensive). We can still do batches if we *disable transparent failover* for a batch. - all updates in a batch (for a specific shard) should be indexed on the *same leader*... any change in leadership should result in a failure at the low level instead of any transparent failover or forwarding. - in the event of a failure, *all updates since the last commit must be replayed* (we can't just retry the failure itself), or the failure will need to be bubbled up to a higher layer to retry from the beginning. h2. Indexing scenario 1: CSV upload If SolrCloud is loading a large CSV file, the receiving Solr node will forward updates to the correct leaders. This happens in the DistributedUpdateProcessor via SolrCmdDistributor, which ends up using a ConcurrentUpdateHttp2SolrClient subclass. h2. Indexing scenario 2: SolrJ bulk indexing In this scenario, a client is trying to do a large amount of indexing and can use batches or streaming. For this scenario, we could just require that a commit be added for each batch and then fail a batch on any leader change. This is problematic for a couple of reasons: - larger batches take longer to build, adding latency and hurting throughput - doesn't scale well - as a collection grows, the number of shards grows and the chance that any shard leader goes down (or the shard is split) goes up. Requiring that the entire batch (all shards) be replayed when this happens is wasteful and gets worse with collection growth. h2. Proposed Solution: a SolrJ cloud-aware streaming client - something like ConcurrentUpdateHttp2SolrClient that can stream and know about cloud layout - track when the last commit happened for each shard leader - buffer updates per-shard since the last commit happened -- doesn't have to be exact... 
assume idempotent updates here, so overlap is fine -- buffering would also be triggered by the replica type of the collection (so this class could be used for both shared storage and normal NRT replicas) - a parameter would be passed that would disallow any forwarding (since we're handling buffering/failover at this level) - on a failure because of a leader going down or loss of leadership, wait until a new leader has been elected and then replay updates since the last commit - insert commits where necessary to prevent buffers from growing too large -- inserted commits should be able to proceed in parallel... we shouldn't need to block and wait for a commit before resuming to send documents to that leader. -- it would be nice if there was a way we could get notified if a commit happened via some other mechanism (like an autoCommit being triggered) --- assuming we can't get this, perhaps we should pass a flag that disables triggering auto-commits for these batch updates? - handle splits (not only can a shard leader change, but a shard could split... buffered updates may need to be re-slotted) - need to handle a leader "bounce" like a change in leadership (assuming we're skipping using the transaction log) - multi-threaded - all updates to a leader regardless of thread are managed as a single update stream -- this perhaps provides a way to coalesce incremental/realtime updates - OPTIONAL: ability to have multiple channels to a single leader? -- we would need to avoid reordering updates to the same ID -- an alternative to attempting to create more parallelism-per-shard on the client side is to do it on the server side. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13405) Support 1 or 0 replicas per shard
[ https://issues.apache.org/jira/browse/SOLR-13405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818312#comment-16818312 ] Yonik Seeley commented on SOLR-13405: - 0 replica support thoughts: The idea of bringing up another replica if 1 replica seems down can naturally be extended to include 0 replica support. The idea can be recast as requesting a new replica on demand if all existing replicas (including 0) seem down to a client. One area where this is a little different is the indexing side... there would need to be code in the indexing paths that recognize 0 replicas configured and bring one up on demand. After a certain period of inactivity, we'd want to return to 0 replicas. This could probably be split off into a different JIRA. > Support 1 or 0 replicas per shard > - > > Key: SOLR-13405 > URL: https://issues.apache.org/jira/browse/SOLR-13405 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Yonik Seeley >Priority: Major > > When multiple replicas per shard are not needed for data durability (because > of shared storage support on HDFS or S3, etc), other cluster configurations > suddenly make sense like allowing 1 or even 0 replicas per shard (primarily > to lower costs.) > One big issue with a single replica per shard is that zookeeper (and thus the > overseer) waits for a session timeout before marking the node as down. > Instead of queries having to wait this long (~30 sec), if a SolrJ query > client detects that a node died, it can ask the overseer to quickly bring up > another replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13405) Support 1 or 0 replicas per shard
[ https://issues.apache.org/jira/browse/SOLR-13405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818162#comment-16818162 ] Yonik Seeley commented on SOLR-13405: - Some design considerations / thoughts: - the node/replica should not be marked down in ZK based on client detection... it should only cause a temporary new replica to be quickly brought up for querying. - this will have no effect on who is the leader... hence this only helps query side (which is normally much more latency sensitive). - overseer should dedup requests since multiple clients detecting a node going down will all request new replicas. -- to aid in this deduplication, client should include in its request which replica it detected as down - Node vs Core (replica) down detection? To lessen the impact of false down detection, and to speed completion of the current query, only request new replicas for the shards that are being queried (as opposed to all shards on the node that went down) - Return to normal state - at some point, we should return to the normal number of replicas. Use autoscale framework for this? > Support 1 or 0 replicas per shard > - > > Key: SOLR-13405 > URL: https://issues.apache.org/jira/browse/SOLR-13405 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Yonik Seeley >Priority: Major > > When multiple replicas per shard are not needed for data durability (because > of shared storage support on HDFS or S3, etc), other cluster configurations > suddenly make sense like allowing 1 or even 0 replicas per shard (primarily > to lower costs.) > One big issue with a single replica per shard is that zookeeper (and thus the > overseer) waits for a session timeout before marking the node as down. > Instead of queries having to wait this long (~30 sec), if a SolrJ query > client detects that a node died, it can ask the overseer to quickly bring up > another replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
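A tiny sketch of the deduplication point above, assuming the client's request names the replica it saw as down (all names hypothetical):
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical overseer-side dedup for "bring up a temporary replica" requests:
// many clients may report the same dead replica at nearly the same time.
class TempReplicaRequests {
  private final Set<String> pending = ConcurrentHashMap.newKeySet();

  void request(String downReplica) {
    if (pending.add(downReplica)) {
      // first report wins: kick off an ADDREPLICA for the affected shard
    }
    // duplicate reports are ignored until the temporary replica is active
  }

  void onReplicaActive(String downReplica) {
    pending.remove(downReplica);
  }
}
{code}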
[jira] [Created] (SOLR-13405) Support 1 or 0 replicas per shard
Yonik Seeley created SOLR-13405: --- Summary: Support 1 or 0 replicas per shard Key: SOLR-13405 URL: https://issues.apache.org/jira/browse/SOLR-13405 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Reporter: Yonik Seeley When multiple replicas per shard are not needed for data durability (because of shared storage support on HDFS or S3, etc), other cluster configurations suddenly make sense like allowing 1 or even 0 replicas per shard (primarily to lower costs.) One big issue with a single replica per shard is that zookeeper (and thus the overseer) waits for a session timeout before marking the node as down. Instead of queries having to wait this long (~30 sec), if a SolrJ query client detects that a node died, it can ask the overseer to quickly bring up another replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-13399) compositeId support for shard splitting
Yonik Seeley created SOLR-13399: --- Summary: compositeId support for shard splitting Key: SOLR-13399 URL: https://issues.apache.org/jira/browse/SOLR-13399 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Reporter: Yonik Seeley Shard splitting does not currently have a way to automatically take into account the actual distribution (number of documents) in each hash bucket created by using compositeId hashing. We should probably add a parameter *splitByPrefix* to the *SPLITSHARD* command that would look at the number of docs sharing each compositeId prefix and use that to create roughly equal-sized buckets by document count rather than just assuming an equal distribution across the entire hash range. Like normal shard splitting, we should bias against splitting within hash buckets unless necessary (since that leads to larger query fanout.) Perhaps this warrants a parameter that would control how much of a size mismatch is tolerable before resorting to splitting within a bucket. *allowedSizeDifference*? To more quickly calculate the number of docs in each bucket, we could index the prefix in a different field. Iterating over the terms for this field would quickly give us the number of docs in each (i.e. Lucene keeps track of the doc count for each term already.) Perhaps the implementation could be a flag on the *id* field... something like *indexPrefixes* and poly-fields that would cause the indexing to be automatically done and alleviate having to pass in an additional field during indexing and during the call to *SPLITSHARD*. This whole part is an optimization though and could be split off into its own issue if desired. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
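The prefix-counting optimization described above could look roughly like the following sketch, assuming a hypothetical indexed "id_prefix" field that holds just the compositeId prefix of each document (MultiTerms here stands in for whatever term iteration the real implementation would use):
{code:java}
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiTerms;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Sketch: docs-per-prefix by walking the terms of a hypothetical "id_prefix"
// field. Lucene already maintains a docFreq per term, so no docs are scanned.
class PrefixCounter {
  static Map<String, Integer> countPerPrefix(IndexReader reader) throws IOException {
    Map<String, Integer> counts = new LinkedHashMap<>();
    Terms terms = MultiTerms.getTerms(reader, "id_prefix");
    if (terms != null) {
      TermsEnum te = terms.iterator();
      for (BytesRef term = te.next(); term != null; term = te.next()) {
        // docFreq includes deleted docs, which is fine for a sizing estimate
        counts.put(term.utf8ToString(), te.docFreq());
      }
    }
    return counts;
  }
}
{code}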
[jira] [Commented] (SOLR-13272) Interval facet support for JSON faceting
[ https://issues.apache.org/jira/browse/SOLR-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814833#comment-16814833 ] Yonik Seeley commented on SOLR-13272: - bq. why it's a separate type and not just optional property to type:range? I agree it would probably be nicer to just have it as part of a range facet... that way other range parameters like "other", "include", etc could be (eventually) supported / reused. > Interval facet support for JSON faceting > > > Key: SOLR-13272 > URL: https://issues.apache.org/jira/browse/SOLR-13272 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Apoorv Bhawsar >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Interval facet is supported in classical facet component but has no support > in json facet requests. > In cases of block join and aggregations, this would be helpful > Assuming request format - > {code:java} > json.facet={pubyear:{type : interval,field : > pubyear_i,intervals:[{key:"2000-2200",value:"[2000,2200]"}]}} > {code} > > PR https://github.com/apache/lucene-solr/pull/597 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
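Purely to illustrate the suggestion, the request from the issue might instead be expressed under type:range with a hypothetical ranges parameter (speculative syntax, not something that exists yet):
{code:java}
json.facet={pubyear:{type : range, field : pubyear_i,
                     ranges : [{key:"2000-2200", range:"[2000,2200]"}]}}
{code}
That shape would leave room for the existing range parameters ("other", "include", etc.) to eventually apply to explicit ranges as well.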
[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11
[ https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814705#comment-16814705 ] Yonik Seeley commented on LUCENE-8738: -- bq. I think the Observable/Observer is uncritical. I agree. Pluggable transient core cache is super-expert level (almost more like internals) and if anyone actually uses it they can adapt when upgrading. I did a quick scan of the related changes and they look fine. > Bump minimum Java version requirement to 11 > --- > > Key: LUCENE-8738 > URL: https://issues.apache.org/jira/browse/LUCENE-8738 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Adrien Grand >Priority: Minor > Labels: Java11 > Fix For: master (9.0) > > > See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13323) Remove org.apache.solr.internal.csv.writer.CSVWriter (and related classes)
[ https://issues.apache.org/jira/browse/SOLR-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800307#comment-16800307 ] Yonik Seeley commented on SOLR-13323: - bq. Is there any reason to believe from it's past history (of which I know nothing) A quick history is that Solr needed a non-official commons-csv release and so the source was copied (but apparently all of the source and not just what was needed.) No deprecations are necessary for removal. > Remove org.apache.solr.internal.csv.writer.CSVWriter (and related classes) > -- > > Key: SOLR-13323 > URL: https://issues.apache.org/jira/browse/SOLR-13323 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Gus Heck >Priority: Minor > > This class appears to only be used in the test for itself. It's also easily > confused with org.apache.solr.response.CSVWriter > I propose we remove this class entirely. Is there any reason to believe from > it's past history (of which I know nothing) that it might be depended upon by > outside code and require a deprecation cycle? > Presently it contains a System.out.println and a eclipse generated catch > block that precommit won't like if we enable checking for System.out.println, > which is why this ticket is a sub-task. If we do need to deprecate it then I > propose we remove the print and simply re-throw the exception as a > RuntimeException -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6237) An option to have only leaders write and replicas read when using a shared file system with SolrCloud.
[ https://issues.apache.org/jira/browse/SOLR-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792202#comment-16792202 ] Yonik Seeley commented on SOLR-6237: bq. Thanks for the pointers Yonik! Based on the linked presentation, there is a working prototype in place at SalesForce. Is there a way I can help in the implementation or testing? The code/impl referenced in the presentation is only for Solr stand-alone (not SolrCloud.) Hopefully we'll have something (rough proof-of-concept stuff) to share in the coming weeks though. In the meantime feel free to share your thoughts on the linked issues. > An option to have only leaders write and replicas read when using a shared > file system with SolrCloud. > -- > > Key: SOLR-6237 > URL: https://issues.apache.org/jira/browse/SOLR-6237 > Project: Solr > Issue Type: New Feature > Components: hdfs, SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > Attachments: 0001-unified.patch, SOLR-6237.patch, Unified Replication > Design.pdf > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6237) An option to have only leaders write and replicas read when using a shared file system with SolrCloud.
[ https://issues.apache.org/jira/browse/SOLR-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788704#comment-16788704 ] Yonik Seeley commented on SOLR-6237: Hi Peter, I opened SOLR-13101 and SOLR-13102 recently... I had lost track of this issue until you commented on it yesterday. > An option to have only leaders write and replicas read when using a shared > file system with SolrCloud. > -- > > Key: SOLR-6237 > URL: https://issues.apache.org/jira/browse/SOLR-6237 > Project: Solr > Issue Type: New Feature > Components: hdfs, SolrCloud >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > Attachments: 0001-unified.patch, SOLR-6237.patch, Unified Replication > Design.pdf > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9682) Ability to specify a query with a parameter name (in facet filter)
[ https://issues.apache.org/jira/browse/SOLR-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759171#comment-16759171 ] Yonik Seeley commented on SOLR-9682: > What if someone makes a typo when attempting to filter out some explicit > content? If someone adds a filter and it doesn't work, the filter (and how it's specified via param) will be the first thing they look at (hence a typo should be easy to debug). Removing a feature to allow detection of one very specific typo doesn't seem like a good trade-off in this specific scenario. It's a common scenario to want to apply a filter only if one is provided. It makes it easier to have a request that doesn't have to be modified as much based on the absence/presence of other parameters. Also, "Multi-valued parameters should be supported." was part of the objective. So the parameter refers to a list of filters... and allowing "0 or more" for a list is more flexible than "you're not allowed to have a zero-length list". > Ability to specify a query with a parameter name (in facet filter) > -- > > Key: SOLR-9682 > URL: https://issues.apache.org/jira/browse/SOLR-9682 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 6.4, 7.0 > > Attachments: SOLR-9682.patch > > > Currently, "filter" only supports query strings (examples at > http://yonik.com/solr-json-request-api/ ) > It would be nice to be able to reference a param that would be parsed as a > lucene/solr query. Multi-valued parameters should be supported. > We should keep in mind (and leave room for) a future "JSON Query Syntax" and > choose labels appropriately. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
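For reference, the usage pattern being defended looks roughly like this ("goodContent" is a hypothetical request parameter holding zero or more filter queries):
{code:java}
// JSON request body; the referenced parameter may be multi-valued or absent.
// e.g. ...&goodContent=rating:safe&goodContent=-category:adult
{
  query : "*:*",
  filter : [ { param : "goodContent" } ]
}
{code}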
[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754041#comment-16754041 ] Yonik Seeley commented on SOLR-13101: - Thinking about how to kick this off... At the most basic level, looking at the HDFS layout scheme we see this ("test" is the name of the collection):
{code}
local_file_system://.../node1/test_shard1_replica_n1/core.properties
hdfs://.../data/test/core_node2/data/
{code}
And core.properties looks like:
{code}
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
{code}
It seems like the most basic desirable change would be to the naming scheme for collections with shared storage. Instead of .../<collection>/<core_node>/data it should be .../<collection>/<shard>/data since there is only one canonical index per shard. > Shared storage support in SolrCloud > --- > > Key: SOLR-13101 > URL: https://issues.apache.org/jira/browse/SOLR-13101 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Yonik Seeley >Priority: Major > > Solr should have first-class support for shared storage (blob/object stores > like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, > etc). > The key component will likely be a new replica type for shared storage. It > would have many of the benefits of the current "pull" replicas (not indexing > on all replicas, all shards identical with no shards getting out-of-sync, > etc), but would have additional benefits: > - Any shard could become leader (the blob store always has the index) > - Better elasticity scaling down >- durability not linked to number of replicas... a single replica could be > common for write workloads >- could drop to 0 replicas for a shard when not needed (blob store always > has index) > - Allow for higher performance write workloads by skipping the transaction > log >- don't pay for what you don't need >- a commit will be necessary to flush to stable storage (blob store) > - A lot of the complexity and failure modes go away > An additional component is a Directory implementation that will work well with > blob stores. We probably want one that treats local disk as a cache since > the latency to remote storage is so large. I think there are still some > "locking" issues to be solved here (ensuring that more than one writer to the > same index won't corrupt it). This should probably be pulled out into a > different JIRA issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
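Spelled out with the example collection above (paths reconstructed for illustration):
{code}
hdfs://.../data/test/core_node2/data/   <- current: one index directory per core/replica
hdfs://.../data/test/shard1/data/       <- proposed: one canonical index directory per shard
{code}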
[jira] [Commented] (SOLR-13165) enabling docValues on a tdate field and searching on the field is very slow
[ https://issues.apache.org/jira/browse/SOLR-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751303#comment-16751303 ] Yonik Seeley commented on SOLR-13165: - Are you sure that the field was indexed both times? As long as the tdate field is indexed, that index should be used for queries, regardless of whether it has docValues. > enabling docValues on a tdate field and searching on the field is very slow > --- > > Key: SOLR-13165 > URL: https://issues.apache.org/jira/browse/SOLR-13165 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Sheeba Dhanaraj >Priority: Major > > When we enable docValues on a tdate field and search on the field, response > time is very slow. When we remove docValues from the field, performance is > significantly improved. Is this by design? Should we not enable docValues for > tdate fields -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13156) Limiting field facet with certain terms via {!terms} not taking into account sorting
[ https://issues.apache.org/jira/browse/SOLR-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749886#comment-16749886 ] Yonik Seeley commented on SOLR-13156: - Interesting. IIRC, this wasn't a public API, and was only used internally for facet refinement (hence no need for sorting.) It looks like at some point it got documented as a public API, so I guess it is now. > Limiting field facet with certain terms via {!terms} not taking into account > sorting > > > Key: SOLR-13156 > URL: https://issues.apache.org/jira/browse/SOLR-13156 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Konstantin Perikov >Priority: Major > > When I'm limiting facet keys with \{!terms}, it doesn't take into > account sorting. > First query not limiting the facet keys: > {{facet.field=title&facet.sort=count&facet=on&q=*:*}} > Response as expected: > {{"facet_counts":\{ "facet_queries":{}, "facet_fields":\{ "title":[ > "book2",3, "book1",2, "book3",1]}, "facet_ranges":{}, "facet_intervals":{}, > "facet_heatmaps":{} > > When doing it with limiting: > {{facet.field=\{!terms=Book3,Book2,Book1}title&facet.sort=count&facet=on&q=*:*}} > I'm getting the exact order of how I list terms: > {{"facet_counts":\{ "facet_queries":{}, "facet_fields":\{ "title":[ > "Book3",1, "Book2",3, "Book1",2]}, "facet_ranges":{}, "facet_intervals":{}, > "facet_heatmaps":{} > I've looked at the code, and it's clearly an issue there: > org.apache.solr.request.SimpleFacets#getListedTermCounts
> {code:java}
> for (String term : terms) {
>   int count = searcher.numDocs(ft.getFieldQuery(null, sf, term), parsed.docs);
>   res.add(term, count);
> }
> {code}
> It just iterates over the terms and doesn't do any sorting at all. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
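A minimal sketch of the kind of fix the report implies, using the same variables as the method quoted above: collect the per-term counts first, then apply the requested count sort before adding them to the response (hypothetical, not a patch):
{code:java}
// Variables (searcher, ft, sf, parsed, res, terms) as in
// SimpleFacets#getListedTermCounts quoted above; java.util.* imports assumed.
List<Map.Entry<String, Integer>> entries = new ArrayList<>();
for (String term : terms) {
  int count = searcher.numDocs(ft.getFieldQuery(null, sf, term), parsed.docs);
  entries.add(new AbstractMap.SimpleImmutableEntry<>(term, count));
}
if (sortByCount) { // i.e. facet.sort=count was requested
  entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
}
for (Map.Entry<String, Integer> e : entries) {
  res.add(e.getKey(), e.getValue());
}
{code}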
[jira] [Updated] (SOLR-13102) Shared storage Directory implementation
[ https://issues.apache.org/jira/browse/SOLR-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-13102: Description: We need a general strategy (and probably a general base class) that can work with shared storage and not corrupt indexes from multiple writers. One strategy that is used on local disk is to use locks. This doesn't extend well to remote / shared filesystems when the locking is not tied into the object store itself since a process can lose the lock (a long GC or whatever) and then immediately try to write a file and there is no way to stop it. An alternate strategy ditches the use of locks and simply avoids overwriting files by some algorithmic mechanism. One of my colleagues outlined one way to do this: https://www.youtube.com/watch?v=UeTFpNeJ1Fo That strategy uses random looking filenames and then writes a "core.metadata" file that maps between the random names and the original names. The problem is then reduced to overwriting "core.metadata" when you lose the lock. One way to fix this is to version "core.metadata". Since the new leader election code was implemented, each shard has a monotonically increasing "leader term", and we can use that as part of the filename. When a reader goes to open an index, it can use the latest file from the directory listing, or even use the term obtained from ZK if we can't trust the directory listing to be up to date. Additionally, we don't need random filenames to avoid collisions... a simple unique prefix or suffix would work fine (such as the leader term again) was: We need a general strategy (and probably a general base class) that can work with shared storage and not corrupt indexes from multiple writers. One strategy that is used on local disk is to use locks. This doesn't extend well to remote / shared filesystems when the locking is not tied into the object store itself since a process can lose the lock (a long GC or whatever) and then immediately try to write a file and there is no way to stop it. An alternate strategy ditches the use of locks and simply avoids overwriting files by some algorithmic mechanism. One of my colleagues outlined one way to do this: https://www.youtube.com/watch?v=UeTFpNeJ1Fo That strategy uses random looking filenames and then writes a "core.metadata" file that maps between the random names and the original names. The problem is then reduced to overwriting "core.metadata" when you lose the lock. One way to fix this is to version "core.metadata". Since the new leader election code was implemented, each shard has a monotonically increasing "leader term", and we can use that as part of the filename. When a reader goes to open an index, it can use the latest file from the directory listing, or even use the term obtained from ZK if we can't trust the directory listing to be up to date. > Shared storage Directory implementation > --- > > Key: SOLR-13102 > URL: https://issues.apache.org/jira/browse/SOLR-13102 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Yonik Seeley >Priority: Major > > We need a general strategy (and probably a general base class) that can work > with shared storage and not corrupt indexes from multiple writers. > One strategy that is used on local disk is to use locks. 
This doesn't extend > well to remote / shared filesystems when the locking is not tied into the > object store itself since a process can lose the lock (a long GC or whatever) > and then immediately try to write a file and there is no way to stop it. > An alternate strategy ditches the use of locks and simply avoids overwriting > files by some algorithmic mechanism. > One of my colleagues outlined one way to do this: > https://www.youtube.com/watch?v=UeTFpNeJ1Fo > That strategy uses random looking filenames and then writes a "core.metadata" > file that maps between the random names and the original names. The problem > is then reduced to overwriting "core.metadata" when you lose the lock. One > way to fix this is to version "core.metadata". Since the new leader election > code was implemented, each shard has a monotonically increasing "leader term", > and we can use that as part of the filename. When a reader goes to open an > index, it can use the latest file from the directory listing, or even use the > term obtained from ZK if we can't trust the directory listing to be up to > date. Additionally, we don't need random filenames to avoid collisions... a > simple unique prefix or suffix would work fine (such as the leader term again) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-13102) Shared storage Directory implementation
Yonik Seeley created SOLR-13102: --- Summary: Shared storage Directory implementation Key: SOLR-13102 URL: https://issues.apache.org/jira/browse/SOLR-13102 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Reporter: Yonik Seeley We need a general strategy (and probably a general base class) that can work with shared storage and not corrupt indexes from multiple writers. One strategy that is used on local disk is to use locks. This doesn't extend well to remote / shared filesystems when the locking is not tied into the object store itself since a process can lose the lock (a long GC or whatever) and then immediately try to write a file and there is no way to stop it. An alternate strategy ditches the use of locks and simply avoids overwriting files by some algorithmic mechanism. One of my colleagues outlined one way to do this: https://www.youtube.com/watch?v=UeTFpNeJ1Fo That strategy uses random looking filenames and then writes a "core.metadata" file that maps between the random names and the original names. The problem is then reduced to overwriting "core.metadata" when you lose the lock. One way to fix this is to version "core.metadata". Since the new leader election code was implemented, each shard has a monotonically increasing "leader term", and we can use that as part of the filename. When a reader goes to open an index, it can use the latest file from the directory listing, or even use the term obtained from ZK if we can't trust the directory listing to be up to date. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
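A sketch of the reader-side selection rule implied above, assuming a hypothetical naming convention of core.metadata.<leaderTerm>:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Sketch: one metadata file is written per leader term (core.metadata.17,
// core.metadata.18, ...); a reader picks the file with the highest term,
// optionally cross-checked against the leader term fetched from ZooKeeper.
class MetadataSelector {
  static final String PREFIX = "core.metadata.";

  static Optional<String> latest(String[] directoryListing) {
    return Arrays.stream(directoryListing)
        .filter(name -> name.startsWith(PREFIX))
        .max(Comparator.comparingLong(
            (String name) -> Long.parseLong(name.substring(PREFIX.length()))));
  }
}
{code}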
[jira] [Created] (SOLR-13101) Shared storage support in SolrCloud
Yonik Seeley created SOLR-13101: --- Summary: Shared storage support in SolrCloud Key: SOLR-13101 URL: https://issues.apache.org/jira/browse/SOLR-13101 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Yonik Seeley Solr should have first-class support for shared storage (blob/object stores like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, etc). The key component will likely be a new replica type for shared storage. It would have many of the benefits of the current "pull" replicas (not indexing on all replicas, all shards identical with no shards getting out-of-sync, etc), but would have additional benefits: - Any shard could become leader (the blob store always has the index) - Better elasticity scaling down - durability not linked to number of replicas... a single replica could be common for write workloads - could drop to 0 replicas for a shard when not needed (blob store always has index) - Allow for higher performance write workloads by skipping the transaction log - don't pay for what you don't need - a commit will be necessary to flush to stable storage (blob store) - A lot of the complexity and failure modes go away An additional component is a Directory implementation that will work well with blob stores. We probably want one that treats local disk as a cache since the latency to remote storage is so large. I think there are still some "locking" issues to be solved here (ensuring that more than one writer to the same index won't corrupt it). This should probably be pulled out into a different JIRA issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13040) Harden TestSQLHandler.
[ https://issues.apache.org/jira/browse/SOLR-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719139#comment-16719139 ] Yonik Seeley commented on SOLR-13040: - It's pretty strange... that error message "can not sort on a field..." is from a schema check and has nothing to do with what is in the index. I tried looping the test overnight but couldn't reproduce it. If I were to guess, it might be an issue in the test framework occasionally picking up the wrong schema or something? > Harden TestSQLHandler. > -- > > Key: SOLR-13040 > URL: https://issues.apache.org/jira/browse/SOLR-13040 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Mark Miller >Assignee: Joel Bernstein >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8374) Reduce reads for sparse DocValues
[ https://issues.apache.org/jira/browse/LUCENE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708972#comment-16708972 ] Yonik Seeley commented on LUCENE-8374: -- bq. as for turning on optionally, then it was part of my first patch as a static global switch That sounds like a good compromise... just make it expert/experimental so it can be removed later. One nice thing about search-time is that it doesn't introduce any index format back compat issues - it can be evolved or removed partially or entirely when the index format improves. > Reduce reads for sparse DocValues > - > > Key: LUCENE-8374 > URL: https://issues.apache.org/jira/browse/LUCENE-8374 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Affects Versions: 7.5, master (8.0) >Reporter: Toke Eskildsen >Priority: Major > Labels: performance > Attachments: LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, > LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, LUCENE-8374.patch, > LUCENE-8374_branch_7_3.patch, LUCENE-8374_branch_7_3.patch.20181005, > LUCENE-8374_branch_7_4.patch, LUCENE-8374_branch_7_5.patch, > LUCENE-8374_part_1.patch, LUCENE-8374_part_2.patch, LUCENE-8374_part_3.patch, > LUCENE-8374_part_4.patch, entire_index_logs.txt, > image-2018-10-24-07-30-06-663.png, image-2018-10-24-07-30-56-962.png, > single_vehicle_logs.txt, > start-2018-10-24-1_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png, > > start-2018-10-24_snapshot___Users_tim_Snapshots__-_YourKit_Java_Profiler_2017_02-b75_-_64-bit.png > > > The {{Lucene70DocValuesProducer}} has the internal classes > {{SparseNumericDocValues}} and {{BaseSortedSetDocValues}} (sparse code path), > which again uses {{IndexedDISI}} to handle the docID -> value-ordinal lookup. > The value-ordinal is the index of the docID assuming an abstract tightly > packed monotonically increasing list of docIDs: If the docIDs with > corresponding values are {{[0, 4, 1432]}}, their value-ordinals will be {{[0, > 1, 2]}}. > h2. Outer blocks > The lookup structure of {{IndexedDISI}} consists of blocks of 2^16 values > (65536), where each block can be either {{ALL}}, {{DENSE}} (2^12 to 2^16 > values) or {{SPARSE}} (< 2^12 values ~= 6%). Consequently blocks vary quite a > lot in size and ordinal resolving strategy. > When a sparse Numeric DocValue is needed, the code first locates the block > containing the wanted docID flag. It does so by iterating blocks one-by-one > until it reaches the needed one, where each iteration requires a lookup in > the underlying {{IndexSlice}}. For a common memory mapped index, this > translates to either a cached request or a read operation. If a segment has > 6M documents, worst-case is 91 lookups. In our web archive, our segments has > ~300M values: A worst-case of 4577 lookups! > One obvious solution is to use a lookup-table for blocks: A long[]-array with > an entry for each block. For 6M documents, that is < 1KB and would allow for > direct jumping (a single lookup) in all instances. Unfortunately this > lookup-table cannot be generated upfront when the writing of values is purely > streaming. It can be appended to the end of the stream before it is closed, > but without knowing the position of the lookup-table the reader cannot seek > to it. > One strategy for creating such a lookup-table would be to generate it during > reads and cache it for next lookup. 
This does not fit directly into how > {{IndexedDISI}} currently works (it is created anew for each invocation), but > could probably be added with a little work. An advantage to this is that this > does not change the underlying format and thus could be used with existing > indexes. > h2. The lookup structure inside each block > If {{ALL}} of the 2^16 values are defined, the structure is empty and the > ordinal is simply the requested docID with some modulo and multiply math. > Nothing to improve there. > If the block is {{DENSE}} (2^12 to 2^16 values are defined), a bitmap is used > and the number of set bits up to the wanted index (the docID modulo the block > origo) are counted. That bitmap is a long[1024], meaning that worst case is > to lookup and count all set bits for 1024 longs! > One known solution to this is to use a [rank structure|https://en.wikipedia.org/wiki/Succinct_data_structure]. I > [implemented it|https://github.com/tokee/lucene-solr/blob/solr5894/solr/core/src/java/org/apache/solr/search/sparse/count/plane/RankCache.java] > for a related project and with that (), the rank-overhead for a {{DENSE}} > block would be long[32] and would ensure a maximum of 9 lookups. It is not > trivial to build the rank-structure and caching it (assuming all blocks are
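As an aside, a toy version of the rank idea described above (coarser than the patch's long[32] structure, purely illustrative): keep a running popcount per group of words so a DENSE lookup only counts a few longs instead of up to 1024.
{code:java}
// Toy rank structure over a DENSE block's long[1024] bitmap: rank[g] holds the
// number of set bits in all words before group g (groups of 32 longs here).
class ToyRank {
  final long[] bits = new long[1024];
  final int[] rank = new int[32];

  void buildRank() {
    int sum = 0;
    for (int g = 0; g < 32; g++) {
      rank[g] = sum;
      for (int w = g * 32; w < (g + 1) * 32; w++) sum += Long.bitCount(bits[w]);
    }
  }

  /** Number of set bits in positions [0, index). */
  int rankOf(int index) {
    int word = index >>> 6;
    int r = rank[word >>> 5];                       // jump to the word's group
    for (int w = (word >>> 5) << 5; w < word; w++)  // count the group's earlier words
      r += Long.bitCount(bits[w]);
    return r + Long.bitCount(bits[word] & ((1L << (index & 63)) - 1));
  }
}
{code}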
[jira] [Commented] (SOLR-12839) add a 'resort' option to JSON faceting
[ https://issues.apache.org/jira/browse/SOLR-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705178#comment-16705178 ] Yonik Seeley commented on SOLR-12839: - Yeah, I think this is OK - my main objection was going to be the name "approximate" which highly suggests that an estimate is fine. "prelim_sort" seems fine. > add a 'resort' option to JSON faceting > -- > > Key: SOLR-12839 > URL: https://issues.apache.org/jira/browse/SOLR-12839 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Hoss Man >Assignee: Hoss Man >Priority: Major > Attachments: SOLR-12839.patch, SOLR-12839.patch, SOLR-12839.patch, > SOLR-12839.patch > > As discussed in SOLR-9480 ... > bq. Similar to how the {{rerank}} request param allows people to collect & > score documents using a "cheap" query, and then re-score the top N using a > more expensive query, I think it would be handy if JSON Facets supported a > {{resort}} option that could be used on any FacetRequestSorted instance right > alongside the {{sort}} param, using the same JSON syntax, so that clients > could have Solr internally sort all the facet buckets by something simple > (like count) and then "Re-Sort" the top N=limit (or maybe > N=limit+overrequest?) using a more expensive function like skg() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-13024) ValueSourceAugmenter - avoid creating new FunctionValues per doc
[ https://issues.apache.org/jira/browse/SOLR-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-13024: Summary: ValueSourceAugmenter - avoid creating new FunctionValues per doc (was: ValueSourceAugmenter ) > ValueSourceAugmenter - avoid creating new FunctionValues per doc > - > > Key: SOLR-13024 > URL: https://issues.apache.org/jira/browse/SOLR-13024 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Affects Versions: 7.0 >Reporter: Yonik Seeley >Priority: Major > > The cutover to iterators in LUCENE-7407 caused ValueSourceAugmenter (which > handles functions in the "fl" param alongside other fields) to re-retrieve > FunctionValues for every document. > Caching could cut that in half, but we should really retrieve a window at a > time in order for best performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13024) ValueSourceAugmenter
[ https://issues.apache.org/jira/browse/SOLR-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704054#comment-16704054 ] Yonik Seeley commented on SOLR-13024: - The change from LUCENE-7407:
{code}
git show f7aa200d40 ./solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
commit f7aa200d406dbd05a35d6116198302d90b92cb29
Author: Mike McCandless
Date:   Wed Sep 21 09:41:41 2016 -0400

    LUCENE-7407: switch doc values usage to an iterator API, based on DocIdSetIterator, instead of random access, freeing codecs for future

diff --git a/solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java b/solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
index 9edf826e2c..c37dd80bfb 100644
--- a/solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
+++ b/solr/core/src/java/org/apache/solr/response/transform/ValueSourceAugmenter.java
@@ -65,7 +65,6 @@ public class ValueSourceAugmenter extends DocTransformer
     try {
       searcher = context.getSearcher();
       readerContexts = searcher.getIndexReader().leaves();
-      docValuesArr = new FunctionValues[readerContexts.size()];
       fcontext = ValueSource.newContext(searcher);
       this.valueSource.createWeight(fcontext, searcher);
     } catch (IOException e) {
@@ -76,7 +75,6 @@ public class ValueSourceAugmenter extends DocTransformer
   Map fcontext;
   SolrIndexSearcher searcher;
   List readerContexts;
-  FunctionValues docValuesArr[];

   @Override
   public void transform(SolrDocument doc, int docid, float score) {
@@ -87,11 +85,7 @@ public class ValueSourceAugmenter extends DocTransformer
     // TODO: calculate this stuff just once across diff functions
     int idx = ReaderUtil.subIndex(docid, readerContexts);
     LeafReaderContext rcontext = readerContexts.get(idx);
-    FunctionValues values = docValuesArr[idx];
-    if (values == null) {
-      docValuesArr[idx] = values = valueSource.getValues(fcontext, rcontext);
-    }
-
+    FunctionValues values = valueSource.getValues(fcontext, rcontext);
     int localId = docid - rcontext.docBase;
     setValue(doc,values.objectVal(localId));
{code}
> ValueSourceAugmenter > - > > Key: SOLR-13024 > URL: https://issues.apache.org/jira/browse/SOLR-13024 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Affects Versions: 7.0 >Reporter: Yonik Seeley >Priority: Major > > The cutover to iterators in LUCENE-7407 caused ValueSourceAugmenter (which > handles functions in the "fl" param alongside other fields) to re-retrieve > FunctionValues for every document. > Caching could cut that in half, but we should really retrieve a window at a > time in order for best performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
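In other words, the minimal fix direction is to restore the lazy per-segment cache that the diff removed (sketch only; this is safe as long as transformed docids ascend within a segment, and the windowed retrieval from the issue description would go further):
{code:java}
// Sketch: re-introduce the per-leaf FunctionValues cache inside transform().
FunctionValues values = docValuesArr[idx];
if (values == null) {
  docValuesArr[idx] = values = valueSource.getValues(fcontext, rcontext);
}
{code}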
[jira] [Created] (SOLR-13024) ValueSourceAugmenter
Yonik Seeley created SOLR-13024: --- Summary: ValueSourceAugmenter Key: SOLR-13024 URL: https://issues.apache.org/jira/browse/SOLR-13024 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: search Affects Versions: 7.0 Reporter: Yonik Seeley The cutover to iterators in LUCENE-7407 caused ValueSourceAugmenter (which handles functions in the "fl" param alongside other fields) to re-retrieve FunctionValues for every document. Caching could cut that in half, but we should really retrieve a window at a time in order for best performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13013) Change export to extract DocValues in docID order
[ https://issues.apache.org/jira/browse/SOLR-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699102#comment-16699102 ] Yonik Seeley commented on SOLR-13013: - bq. Are you thinking about making something generic? Maybe a bulk request wrapper for doc values, that temporarily re-sorts internally? Yep. Something that collects out-of-order docids along with other value sources that should be internally retrieved mostly in-order. It shouldn't slow up this issue though. I just bring it up to get it on other people's radar (it's been on my TODO list for years...) and because it's related to this issue. > Change export to extract DocValues in docID order > - > > Key: SOLR-13013 > URL: https://issues.apache.org/jira/browse/SOLR-13013 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Export Writer >Affects Versions: 7.5, master (8.0) >Reporter: Toke Eskildsen >Priority: Major > Fix For: master (8.0) > > Attachments: SOLR-13013_proof_of_concept.patch, > SOLR-13013_proof_of_concept.patch > > > The streaming export writer uses a sliding window of 30,000 documents for > paging through the result set in a given sort order. Each time a window has > been calculated, the values for the export fields are retrieved from the > underlying DocValues structures in document sort order and delivered. > The iterative DocValues API introduced in Lucene/Solr 7 does not support > random access. The current export implementation bypasses this by creating a > new DocValues-iterator for each individual value to retrieve. This slows down > export as the iterator has to seek to the given docID from start for each > value. The slowdown scales with shard size (see LUCENE-8374 for details). An > alternative is to extract the DocValues in docID-order, with re-use of > DocValues-iterators. The idea is as follows: > # Change the FieldWriters for export to re-use the DocValues-iterators if > subsequent requests are for docIDs higher than the previous ones > # Calculate the sliding window of SortDocs as usual > # Take a note of the order of the SortDocs in the sliding window > # Re-sort the SortDocs in docID-order > # Extract the DocValues to a temporary on-heap structure > # Re-sort the extracted values to the original sliding window order > # Deliver the values > One big difference from the current export code is of course the need to hold > the whole sliding window scaled result set in memory. This might well be a > showstopper as there is no real limit to how large this partial result set > can be. Maybe such an optimization could be requested explicitly if the user > knows that there is enough memory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13013) Change export to extract DocValues in docID order
[ https://issues.apache.org/jira/browse/SOLR-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698258#comment-16698258 ] Yonik Seeley commented on SOLR-13013: - Great results! Retrieving results in order in batches has also been a TODO for augmenters (specifically, the ability to retrieve function query results alongside field results) since they were added to Solr, because some function queries need to be accessed in order to be efficient. With the changes to iterators for docvalues, and the ability to retrieve stored fields using document values, this becomes even more important. > Change export to extract DocValues in docID order > - > > Key: SOLR-13013 > URL: https://issues.apache.org/jira/browse/SOLR-13013 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Export Writer >Affects Versions: 7.5, master (8.0) >Reporter: Toke Eskildsen >Priority: Major > Fix For: master (8.0) > > Attachments: SOLR-13013_proof_of_concept.patch > > > The streaming export writer uses a sliding window of 30,000 documents for > paging through the result set in a given sort order. Each time a window has > been calculated, the values for the export fields are retrieved from the > underlying DocValues structures in document sort order and delivered. > The iterative DocValues API introduced in Lucene/Solr 7 does not support > random access. The current export implementation bypasses this by creating a > new DocValues-iterator for each individual value to retrieve. This slows down > export as the iterator has to seek to the given docID from start for each > value. The slowdown scales with shard size (see LUCENE-8374 for details). An > alternative is to extract the DocValues in docID-order, with re-use of > DocValues-iterators. The idea is as follows: > # Change the FieldWriters for export to re-use the DocValues-iterators if > subsequent requests are for docIDs higher than the previous ones > # Calculate the sliding window of SortDocs as usual > # Take a note of the order of the SortDocs in the sliding window > # Re-sort the SortDocs in docID-order > # Extract the DocValues to a temporary on-heap structure > # Re-sort the extracted values to the original sliding window order > # Deliver the values > One big difference from the current export code is of course the need to hold > the whole sliding window scaled result set in memory. This might well be a > showstopper as there is no real limit to how large this partial result set > can be. Maybe such an optimization could be requested explicitly if the user > knows that there is enough memory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
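The re-sort trick in the list above, in sketch form (ValueFetcher is a hypothetical stand-in for a docID-ordered DocValues read):
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Sketch: fetch values for a sliding window in docID order (so DocValues
// iterators can be reused), then hand them back in the window's sort order.
class WindowExtractor {
  interface ValueFetcher { Object fetch(int docId); }   // hypothetical

  static Object[] extract(int[] window /* docIDs in sort order */, ValueFetcher fetcher) {
    Integer[] perm = IntStream.range(0, window.length).boxed().toArray(Integer[]::new);
    Arrays.sort(perm, Comparator.comparingInt(i -> window[i]));  // ascending docID
    Object[] values = new Object[window.length];
    for (Integer i : perm) {
      values[i] = fetcher.fetch(window[i]);  // forward-only access is safe now
    }
    return values;  // index i = value for the window's i-th doc in sort order
  }
}
{code}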
[jira] [Commented] (SOLR-12074) Add numeric typed equivalents to StrField
[ https://issues.apache.org/jira/browse/SOLR-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691097#comment-16691097 ] Yonik Seeley commented on SOLR-12074: - bq. It'd be nifty if PointField could additionally have a Terms index for these full-precision terms instead of requiring a separate field in the schema. +1, it's important for it to be the same field in the schema for both usability, and so that solr knows how to optimize single-valued lookups. If we could turn back time, I'd argue for keeping "indexed=true" in the schema to mean normal full-text index, and then use another name for the BKD structure (rangeIndexed=true? pointIndexed=true?), but I guess that ship has sailed. So what should the name of the new flag for the schema be? valueIndexed? termIndexed? > Add numeric typed equivalents to StrField > - > > Key: SOLR-12074 > URL: https://issues.apache.org/jira/browse/SOLR-12074 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Schema and Analysis >Reporter: David Smiley >Priority: Major > Labels: newdev, numeric-tries-to-points > > There ought to be numeric typed equivalents to StrField in the schema. The > TrieField types can be configured to do this with precisionStep=0, but the > TrieFields are deprecated and slated for removal in 8.0. PointFields may be > adequate for some use cases but, unlike TrieField, it's not as efficient for > simple field:value lookup queries. They probably should use the same > internal sortable full-precision term format that TrieField uses (details > currently in {{LegacyNumericUtils}} (which are used by the deprecated Trie > fields). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12632) Completely remove Trie fields
[ https://issues.apache.org/jira/browse/SOLR-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686838#comment-16686838 ] Yonik Seeley commented on SOLR-12632: - If docValues are enabled, hopefully current point fields aren't slower for things like statistics. But I could see them being slower for faceting (which uses single-value lookups for things like refinement, or calculating the domain for a sub-facet). > Completely remove Trie fields > - > > Key: SOLR-12632 > URL: https://issues.apache.org/jira/browse/SOLR-12632 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Steve Rowe >Priority: Blocker > Labels: numeric-tries-to-points > Fix For: master (8.0) > > > Trie fields were deprecated in Solr 7.0. We should remove them completely > before we release Solr 8.0. > Unresolved points-related issues: > [https://jira.apache.org/jira/issues/?jql=project=SOLR+AND+labels=numeric-tries-to-points+AND+resolution=unresolved] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12632) Completely remove Trie fields
[ https://issues.apache.org/jira/browse/SOLR-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686768#comment-16686768 ] Yonik Seeley commented on SOLR-12632: - The performance hit seems more important than exactly when deprecated functionality is removed. We should have a superior single numeric field that is better at both range queries and single value matches before we remove the existing field (trie) that can do both well. > Completely remove Trie fields > - > > Key: SOLR-12632 > URL: https://issues.apache.org/jira/browse/SOLR-12632 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Steve Rowe >Priority: Blocker > Labels: numeric-tries-to-points > Fix For: master (8.0) > > > Trie fields were deprecated in Solr 7.0. We should remove them completely > before we release Solr 8.0. > Unresolved points-related issues: > [https://jira.apache.org/jira/issues/?jql=project=SOLR+AND+labels=numeric-tries-to-points+AND+resolution=unresolved] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12638) Support atomic updates of nested/child documents for nested-enabled schema
[ https://issues.apache.org/jira/browse/SOLR-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661581#comment-16661581 ] Yonik Seeley commented on SOLR-12638: - Somewhat related: perhaps it should be best practice to include the parent document id in the child document id (with a "!" separator). Things should then just work for anyone following this convention with the default compositeId router. For example, "id:mybook!myreview". The ability to specify _route_ explicitly should always be there, of course. > Support atomic updates of nested/child documents for nested-enabled schema > -- > > Key: SOLR-12638 > URL: https://issues.apache.org/jira/browse/SOLR-12638 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: mosh >Priority: Major > Attachments: SOLR-12638-delete-old-block-no-commit.patch, > SOLR-12638-nocommit.patch > > Time Spent: 4h 50m > Remaining Estimate: 0h > > I have been toying with the thought of using this transformer in conjunction > with NestedUpdateProcessor and AtomicUpdate to allow Solr to completely > re-index the entire nested structure. This is just a thought, I am still > thinking about implementation details. Hopefully I will be able to post a > more concrete proposal soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
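A short SolrJ sketch of the suggested convention; the collection name is made up and an already-configured SolrClient is assumed:

{code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ChildIdConvention {
  // child ids embed the parent id with "!" so the default compositeId router
  // keeps the whole block on one shard without an explicit _route_ param
  static void indexBookWithReview(SolrClient client) throws Exception {
    SolrInputDocument book = new SolrInputDocument();
    book.addField("id", "mybook");

    SolrInputDocument review = new SolrInputDocument();
    review.addField("id", "mybook!myreview");
    book.addChildDocument(review);

    client.add("books", book);
    client.commit("books");
  }
}
{code}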
[jira] [Commented] (LUCENE-7996) Should we require positive scores?
[ https://issues.apache.org/jira/browse/LUCENE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649548#comment-16649548 ] Yonik Seeley commented on LUCENE-7996: -- Ah, I see. Thanks for the pointer! > Should we require positive scores? > -- > > Key: LUCENE-7996 > URL: https://issues.apache.org/jira/browse/LUCENE-7996 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: master (8.0) > > Attachments: LUCENE-7996.patch, LUCENE-7996.patch, LUCENE-7996.patch > > > Having worked on MAXSCORE recently, things would be simpler if we required > that scores are positive. Practically, this would mean > - forbidding/fixing similarities that may produce negative scores (we have > some of them) > - forbidding things like negative boosts > So I'd be curious to have opinions whether this would be a sane requirement > or whether we need to be able to cope with negative scores eg. because some > similarities that we want to support produce negative scores by design. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12711) Count dominating child field values
[ https://issues.apache.org/jira/browse/SOLR-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649537#comment-16649537 ] Yonik Seeley commented on SOLR-12711: - Could think of it like a block limit, I guess. One way to specify would be a sort and a limit (i.e. you could select the 3 latest child documents). This could also be extended beyond blocks to buckets/domains. > Count dominating child field values > --- > > Key: SOLR-12711 > URL: https://issues.apache.org/jira/browse/SOLR-12711 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Mikhail Khludnev >Priority: Major > > h2. Context > {{uniqueBlock(_root_)}}, which was introduced in SOLR-8998, allows counting > child field facet hits grouped by parent, i.e. hitting every parent only once. > h2. Problem > How to count only the dominating child field value, i.e. if a product has 5 Red > SKUs and 2 Blue, it contributes {{Red(1)}}, {{Blue(0)}}. > h2. Suggestion > Introduce {{dominatingBlock(_root_)}}, which aggregates hits per parent, > chooses the dominating one and increments only it. > h2. Further Work > Judge the dominating value not by the number of child hits, but by a given function > value, e.g. pick the most popular, best-selling, or a random child field value as > dominating. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
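Purely hypothetical syntax for the sort-plus-limit-per-block idea: {{blockChildren}} is a real JSON Facet domain option, but {{blockSort}} and {{blockLimit}} are invented here only to illustrate the shape such a request might take (selecting the 3 latest child documents per parent):

{code}
{
  "colors": {
    "type": "terms",
    "field": "sku_color",
    "domain": {
      "blockChildren": "type:product",
      "blockSort": "date desc",
      "blockLimit": 3
    }
  }
}
{code}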
[jira] [Commented] (LUCENE-7996) Should we require positive scores?
[ https://issues.apache.org/jira/browse/LUCENE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649521#comment-16649521 ] Yonik Seeley commented on LUCENE-7996: -- bq. If we don't require non-negative scores, then we would need some way for scorers to tell whether they may produce negative scores I assumed we already had logic to disable the optimizations for certain scorers. For example, isn't it true that if I embed an arbitrary function query today (even one with all positive scores), these optimizations are already disabled? > Should we require positive scores? > -- > > Key: LUCENE-7996 > URL: https://issues.apache.org/jira/browse/LUCENE-7996 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: master (8.0) > > Attachments: LUCENE-7996.patch, LUCENE-7996.patch, LUCENE-7996.patch > > > Having worked on MAXSCORE recently, things would be simpler if we required > that scores are positive. Practically, this would mean > - forbidding/fixing similarities that may produce negative scores (we have > some of them) > - forbidding things like negative boosts > So I'd be curious to have opinions whether this would be a sane requirement > or whether we need to be able to cope with negative scores eg. because some > similarities that we want to support produce negative scores by design. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12839) add a 'resort' option to JSON faceting
[ https://issues.apache.org/jira/browse/SOLR-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649387#comment-16649387 ] Yonik Seeley commented on SOLR-12839: - bq. if "foo desc" is the primary sort, and "bar asc" is the tiebreaker, then what is being resorted on? "foo desc, bar asc 50" was an example of a single sort with tiebreak and a limit (no resort). If one wanted a single-string version, ";" would be the divider. For example, adding a resort with a tiebreak: "foo desc, bar asc 50; baz desc, qux asc 10" bq. why/how/when would it make sense to resort multiple times? If there are use cases for starting with N sorted things and reducing that to K with a different sort, then it's just sort of recursive. Why would there be use cases for one resort and not two resorts? One use case that comes to mind is stock screens I've seen that consist of multiple sorting and "take top N" steps. Example: Sort by current dividend yield and take the top 100, then sort those by low PE and take the top 50, then sort those by total return 1 year and take the top 10. bq. or how it could work practically given the 2-phase nature of distributed facet refinement. Hmm, good point. Over the long term I'd always imagined the number of phases could be variable, so it's more of a current implementation detail (albeit a very major one). It would currently kill the usefulness in distributed though. Anyway, we don't have to worry about multiple resorts now as long as we can unambiguously upgrade if desired later (i.e. whatever the resort spec looks like, if we can unambiguously wrap an array around it later and specify multiple of them, then we're good). > add a 'resort' option to JSON faceting > -- > > Key: SOLR-12839 > URL: https://issues.apache.org/jira/browse/SOLR-12839 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Hoss Man >Assignee: Hoss Man >Priority: Major > Attachments: SOLR-12839.patch, SOLR-12839.patch > > > As discussed in SOLR-9480 ... > bq. Similar to how the {{rerank}} request param allows people to collect & > score documents using a "cheap" query, and then re-score the top N using a > more expensive query, I think it would be handy if JSON Facets supported a > {{resort}} option that could be used on any FacetRequestSorted instance right > alongside the {{sort}} param, using the same JSON syntax, so that clients > could have Solr internally sort all the facet buckets by something simple > (like count) and then "Re-Sort" the top N=limit (or maybe > N=limit+overrequest?) using a more expensive function like skg() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
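To make the multi-step idea concrete, the stock-screen example might look like this under the brainstormed (unimplemented) array syntax; every field name here is illustrative:

{code}
{
  "top_stocks": {
    "type": "terms",
    "field": "ticker",
    "sort": "div_yield desc",
    "limit": 100,
    "resort": ["pe asc 50", "total_return_1yr desc 10"]
  }
}
{code}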
[jira] [Commented] (SOLR-12839) add a 'resort' option to JSON faceting
[ https://issues.apache.org/jira/browse/SOLR-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649244#comment-16649244 ] Yonik Seeley commented on SOLR-12839: - We should perhaps think about how to extend to N sorts instead of 2. Also keeping in mind that sort should be able to have tiebreaks someday. Brainstorming syntax: Maybe just append a number to our existing sort syntax, so we would get something like "foo desc, bar asc 50" (bar would be a tiebreak in this case). So two resorts in a row could be "field1 asc 100; field2 desc 10", or a slightly more decomposed array: ["field1 asc 100","field2 desc 10"]. Or, given that this is just an extension of the sort syntax, it could even just go in the "sort" param itself and not bother with "resort": sort:"count desc 5" could be a synonym for sort:"count desc",limit:5. It's late and my slides for Activate aren't done, so take it for what it's worth ;-) > add a 'resort' option to JSON faceting > -- > > Key: SOLR-12839 > URL: https://issues.apache.org/jira/browse/SOLR-12839 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Hoss Man >Assignee: Hoss Man >Priority: Major > Attachments: SOLR-12839.patch, SOLR-12839.patch > > > As discussed in SOLR-9480 ... > bq. Similar to how the {{rerank}} request param allows people to collect & > score documents using a "cheap" query, and then re-score the top N using a > more expensive query, I think it would be handy if JSON Facets supported a > {{resort}} option that could be used on any FacetRequestSorted instance right > alongside the {{sort}} param, using the same JSON syntax, so that clients > could have Solr internally sort all the facet buckets by something simple > (like count) and then "Re-Sort" the top N=limit (or maybe > N=limit+overrequest?) using a more expensive function like skg() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649083#comment-16649083 ] Yonik Seeley commented on SOLR-12325: - Yep, this should be pretty easy to do, following the same type of strategy as uniqueBlock. I wish we had named parameters for the function parser already... then we could use uniqueBlock(parents=type:product) and avoid another function name. > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Mikhail Khludnev >Priority: Major > > It might be a faster twin for {{uniqueBlock(\_root_)}}. Please utilise the built-in > query parsing method; don't invent your own. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
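For illustration, the proposed aggregation next to the existing one: uniqueBlock(_root_) is shipped (SOLR-8998), while uniqueBlockQuery is the proposal of this issue and not yet implemented:

{code}
{
  "colors": {
    "type": "terms",
    "field": "sku_color",
    "facet": {
      "productCount": "uniqueBlock(_root_)",
      "productCountByQuery": "uniqueBlockQuery(type:product)"
    }
  }
}
{code}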
[jira] [Commented] (LUCENE-7996) Should we require positive scores?
[ https://issues.apache.org/jira/browse/LUCENE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645147#comment-16645147 ] Yonik Seeley commented on LUCENE-7996: -- bq. WAND and other optimizations were the reason why I opened this issue and moved it forward I understand why we wouldn't want to produce negative scores by default, as that would complicate or prevent such optimizations by default. What I don't understand is what we gain by prohibiting negative scores across the board. We can only do these optimizations in certain cases anyway, so we don't gain anything by prohibiting a function query (for example) from producing negative values. This would seem to limit the use cases without any corresponding gain in optimization opportunities. > Should we require positive scores? > -- > > Key: LUCENE-7996 > URL: https://issues.apache.org/jira/browse/LUCENE-7996 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: master (8.0) > > Attachments: LUCENE-7996.patch, LUCENE-7996.patch, LUCENE-7996.patch > > > Having worked on MAXSCORE recently, things would be simpler if we required > that scores are positive. Practically, this would mean > - forbidding/fixing similarities that may produce negative scores (we have > some of them) > - forbidding things like negative boosts > So I'd be curious to have opinions whether this would be a sane requirement > or whether we need to be able to cope with negative scores eg. because some > similarities that we want to support produce negative scores by design. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7996) Should we require positive scores?
[ https://issues.apache.org/jira/browse/LUCENE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641096#comment-16641096 ] Yonik Seeley commented on LUCENE-7996: -- {quote}Agreed some users are going to be annoyed by the impact of this change. I wouldn't have considered it if it wasn't a requirement to get speedups in the order of what we are observing on LUCENE-4100 and LUCENE-7993. {quote} But maxscore/impact optimizations can only be used in certain circumstances anyway, right? Given that we need fallback to score-all for things that aren't supported, falling back rather than prohibiting negative scores would avoid the back compat breaks. > Should we require positive scores? > -- > > Key: LUCENE-7996 > URL: https://issues.apache.org/jira/browse/LUCENE-7996 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: master (8.0) > > Attachments: LUCENE-7996.patch, LUCENE-7996.patch, LUCENE-7996.patch > > > Having worked on MAXSCORE recently, things would be simpler if we required > that scores are positive. Practically, this would mean > - forbidding/fixing similarities that may produce negative scores (we have > some of them) > - forbidding things like negative boosts > So I'd be curious to have opinions whether this would be a sane requirement > or whether we need to be able to cope with negative scores eg. because some > similarities that we want to support produce negative scores by design. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12820) Auto pick method:dvhash based on thresholds
[ https://issues.apache.org/jira/browse/SOLR-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636921#comment-16636921 ] Yonik Seeley commented on SOLR-12820: - bq. // Trying to find the cardinality for the matchingDocs would be expensive. The heuristic I had in mind would just use the cardinality of the whole field in conjunction with fcontext.base.size(). For example, if one is faceting on US states (50 values) you're pretty much always going to want to use the array approach. Comparing to maxDoc isn't too meaningful here. Even though it may not be implemented yet, we should also keep multi-valued fields in mind when thinking about the API access/control for this. > Auto pick method:dvhash based on thresholds > --- > > Key: SOLR-12820 > URL: https://issues.apache.org/jira/browse/SOLR-12820 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Varun Thacker >Priority: Major > > I worked with two users last week where explicitly using method:dvhash > improved the faceting speeds drastically. > The common theme in both the use-cases was: One collection hosting data for > multiple users. We always filter documents for one user (thereby limiting > the number of documents drastically) and then perform a complex nested > JSON facet. > Both use-cases fit perfectly the criterion that [~yo...@apache.org] > mentioned on SOLR-9142 > {quote}faceting on a string field with a high cardinality compared to its > domain is less efficient than it could be. > {quote} > And DVHASH was the perfect optimization for these use-cases. > We are using the facet stream expression in one of the use-cases which > doesn't expose the method param. We could expose the method param to facet > stream but I feel the better approach to solve this problem would be to > address this TODO in the code within the JSON Facet Module > {code:java} > if (mincount > 0 && prefix == null && (ntype != null || method == > FacetMethod.DVHASH)) { > // TODO can we auto-pick for strings when term cardinality is much > greater than DocSet cardinality? > // or if we don't know cardinality but DocSet size is very small > return new FacetFieldProcessorByHashDV(fcontext, this, sf);{code} > I thought about this a little and this was the approach I am thinking > currently to tackle this problem > {code:java} > int matchingDocs = fcontext.base.size(); > int totalDocs = fcontext.searcher.getIndexReader().maxDoc(); > //if matchingDocs is close to the totalDocs then we aren't filtering many > documents. > //that means the array approach would probably be better than the dvhash > approach > //Trying to find the cardinality for the matchingDocs would be expensive. > //Also for totalDocs we don't have a global cardinality present at index time > but we have a per segment cardinality > //So using the number of matches as an alternate heuristic would do the job > here?{code} > Any thoughts if this approach makes sense? It could be I'm thinking of this > approach just because both the users I worked with last week fell in this > category. > > cc [~dsmiley] [~joel.bernstein] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
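A hedged sketch of the heuristic described in the comment: compare the field's global cardinality to the filtered domain size rather than to maxDoc. The threshold values and names are assumptions, not a committed design:

{code:java}
// Illustrative only; thresholds are made up.
public class DvhashPick {
  static boolean useDvhash(long fieldCardinality, int matchingDocs) {
    // a low-cardinality field (e.g. US states, 50 values) should keep the
    // array method no matter how small the filtered domain is
    if (fieldCardinality < 1024) {
      return false;
    }
    // hash wins when the field's cardinality dwarfs the number of matches
    return fieldCardinality > 16L * matchingDocs;
  }
}
{code}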
[jira] [Commented] (SOLR-8335) HdfsLockFactory does not allow core to come up after a node was killed
[ https://issues.apache.org/jira/browse/SOLR-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624023#comment-16624023 ] Yonik Seeley commented on SOLR-8335: - OK, so for this attached patch, it looks like keeping the lock requires touching it periodically (like a lease). I'm not enough of an expert on HDFS intricacies to know if this is the best approach, but this patch has gone a year w/ no feedback. Anyone have anything to add on whether this is the right approach or not? It's probably best not to introduce new dependencies (hamcrest) along with a patch unless they are really necessary though. > HdfsLockFactory does not allow core to come up after a node was killed > -- > > Key: SOLR-8335 > URL: https://issues.apache.org/jira/browse/SOLR-8335 > Project: Solr > Issue Type: Bug >Affects Versions: 5.0, 5.1, 5.2, 5.2.1, 5.3, 5.3.1 >Reporter: Varun Thacker >Assignee: Mark Miller >Priority: Major > Attachments: SOLR-8335.patch > > > When using HdfsLockFactory, if a node gets killed instead of a graceful > shutdown the write.lock file remains in HDFS. The next time you start the > node the core doesn't load up because of LockObtainFailedException. > I was able to reproduce this in all 5.x versions of Solr. The problem wasn't > there when I tested it in 4.10.4. > Steps to reproduce this on 5.x: > 1. Create directory in HDFS: {{bin/hdfs dfs -mkdir /solr}} > 2. Start Solr: {{bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory > -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://localhost:9000/solr > -Dsolr.updatelog=hdfs://localhost:9000/solr}} > 3. Create core: {{./bin/solr create -c test -n data_driven}} > 4. Kill Solr > 5. The lock file is there in HDFS and is called {{write.lock}} > 6. Start Solr again and you get a stack trace like this: > {code} > 2015-11-23 13:28:04.287 ERROR (coreLoadExecutor-6-thread-1) [ x:test] > o.a.s.c.CoreContainer Error creating core [test]: Index locked for write for > core 'test'. Solr now longer supports forceful unlocking via > 'unlockOnStartup'. Please verify locks manually! > org.apache.solr.common.SolrException: Index locked for write for core 'test'. > Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please > verify locks manually! > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked > for write for core 'test'. Solr now longer supports forceful unlocking via > 'unlockOnStartup'. Please verify locks manually! > at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) > ... 
9 more > 2015-11-23 13:28:04.289 ERROR (coreContainerWorkExecutor-2-thread-1) [ ] > o.a.s.c.CoreContainer Error waiting for SolrCore to be created > java.util.concurrent.ExecutionException: > org.apache.solr.common.SolrException: Unable to create core [test] > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:472) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.solr.common.SolrException: Unable to create core [test] > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:737) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434) > ... 5 more > Caused by: org
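An assumed-shape sketch of the lease approach the patch appears to take; the real patch's classes and timing surely differ, and this only shows the touch-periodically idea:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HdfsLockLeaseSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // The holder refreshes the lock file's timestamp well inside the lease
  // window; a node starting up may break a lock whose timestamp is stale.
  void keepLockAlive(Runnable touchLockFile, long leaseMillis) {
    scheduler.scheduleAtFixedRate(touchLockFile, 0, leaseMillis / 3,
        TimeUnit.MILLISECONDS);
  }

  void release() {
    scheduler.shutdownNow();
  }
}
{code}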
[jira] [Commented] (LUCENE-8511) MultiFields.getIndexedFields can be optimized to not use getMergedFieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622786#comment-16622786 ] Yonik Seeley commented on LUCENE-8511: -- Looks good, +1 to avoiding getMergedFieldInfos() here! > MultiFields.getIndexedFields can be optimized to not use getMergedFieldInfos > > > Key: LUCENE-8511 > URL: https://issues.apache.org/jira/browse/LUCENE-8511 > Project: Lucene - Core > Issue Type: Improvement >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-8511.patch, LUCENE-8511.patch > > > MultiFields.getIndexedFields calls getMergedFieldInfos. But > getMergedFieldInfos is kinda heavy, doing all sorts of stuff that > getIndexedFields doesn't care about. It can simply loop the leaf readers and > collect the results into a Set. Java 8 streams should make easy work of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
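The optimization being approved is roughly the following (a sketch matching the issue description; the committed patch may differ in detail):

{code:java}
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexReader;

public class IndexedFields {
  // loop the leaves directly instead of building merged FieldInfos
  static Set<String> getIndexedFields(IndexReader reader) {
    return reader.leaves().stream()
        .flatMap(ctx -> StreamSupport.stream(
            ctx.reader().getFieldInfos().spliterator(), false))
        .filter(fi -> fi.getIndexOptions() != IndexOptions.NONE)
        .map(fi -> fi.name)
        .collect(Collectors.toSet());
  }
}
{code}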
[jira] [Commented] (SOLR-11836) Use -1 in bucketSizeLimit to get all facets, analogous to the JSON facet API
[ https://issues.apache.org/jira/browse/SOLR-11836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611275#comment-16611275 ] Yonik Seeley commented on SOLR-11836: - limit:-1 should work fine for JSON Facets. bq. Also when I sent -1 directly to the JSON facet API it didn't return results. I'll need to dig into why. Perhaps other code in the middle (i.e. before it gets to the JSON Facet code) manipulates that value and messes it up? TestJsonFacets randomly specifies limit:-1 so this should be well tested too: https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/search/facet/TestJsonFacets.java#L935 > Use -1 in bucketSizeLimit to get all facets, analogous to the JSON facet API > > > Key: SOLR-11836 > URL: https://issues.apache.org/jira/browse/SOLR-11836 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Alfonso Muñoz-Pomer Fuentes >Priority: Major > Labels: facet, streaming > Attachments: SOLR-11836.patch > > > Currently, to retrieve all buckets using the streaming expressions facet > function, the {{bucketSizeLimit}} parameter must have a high enough value so > that all results will be included. Compare this with the JSON facet API, > where you can use {{"limit": -1}} to achieve this. It would help if such a > possibility existed. > [Issue 11236|https://issues.apache.org/jira/browse/SOLR-11236] is related. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
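For reference, the JSON Facet request shape that a {{bucketSizeLimit}} of -1 would map to (standard JSON Facet syntax; the field name is an example):

{code}
{
  "categories": {
    "type": "terms",
    "field": "cat",
    "limit": -1
  }
}
{code}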
[jira] [Commented] (SOLR-11598) Export Writer needs to support more than 4 Sort fields - Say 10, ideally it should not be bound at all, but 4 seems to really short sell the StreamRollup capabilities.
[ https://issues.apache.org/jira/browse/SOLR-11598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546851#comment-16546851 ] Yonik Seeley commented on SOLR-11598: - In general, we shouldn't have limits at all on stuff like this. If the performance degradation and memory use is linear, there is no trap waiting to bite someone (except for the arbitrary limit itself). > Export Writer needs to support more than 4 Sort fields - Say 10, ideally it > should not be bound at all, but 4 seems to really short sell the StreamRollup > capabilities. > --- > > Key: SOLR-11598 > URL: https://issues.apache.org/jira/browse/SOLR-11598 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Affects Versions: 6.6.1, 7.0 >Reporter: Aroop >Assignee: Varun Thacker >Priority: Major > Labels: patch > Attachments: SOLR-11598-6_6-streamtests, SOLR-11598-6_6.patch, > SOLR-11598-master.patch, SOLR-11598.patch, SOLR-11598.patch, > SOLR-11598.patch, SOLR-11598.patch, SOLR-11598.patch, SOLR-11598.patch, > streaming-export reports.xlsx > > > I am a user of Streaming and I am currently trying to use rollups on an 10 > dimensional document. > I am unable to get correct results on this query as I am bounded by the > limitation of the export handler which supports only 4 sort fields. > I do not see why this needs to be the case, as it could very well be 10 or 20. > My current needs would be satisfied with 10, but one would want to ask why > can't it be any decent integer n, beyond which we know performance degrades, > but even then it should be caveat emptor. > [~varunthacker] > Code Link: > https://github.com/apache/lucene-solr/blob/19db1df81a18e6eb2cce5be973bf2305d606a9f8/solr/core/src/java/org/apache/solr/handler/ExportWriter.java#L455 > Error > null:java.io.IOException: A max of 4 sorts can be specified > at > org.apache.solr.handler.ExportWriter.getSortDoc(ExportWriter.java:452) > at org.apache.solr.handler.ExportWriter.writeDocs(ExportWriter.java:228) > at > org.apache.solr.handler.ExportWriter.lambda$null$1(ExportWriter.java:219) > at > org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:664) > at > org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:333) > at > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:223) > at org.apache.solr.common.util.JavaBinCodec$1.put(JavaBinCodec.java:394) > at > org.apache.solr.handler.ExportWriter.lambda$null$2(ExportWriter.java:219) > at > org.apache.solr.common.util.JavaBinCodec.writeMap(JavaBinCodec.java:437) > at > org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:354) > at > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:223) > at org.apache.solr.common.util.JavaBinCodec$1.put(JavaBinCodec.java:394) > at > org.apache.solr.handler.ExportWriter.lambda$write$3(ExportWriter.java:217) > at > org.apache.solr.common.util.JavaBinCodec.writeMap(JavaBinCodec.java:437) > at org.apache.solr.handler.ExportWriter.write(ExportWriter.java:215) > at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2601) > at > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49) > at > org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361) > at > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.
[jira] [Commented] (SOLR-12343) JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537904#comment-16537904 ] Yonik Seeley commented on SOLR-12343: - Looks good, thanks for tracking that down! > JSON Field Facet refinement can return incorrect counts/stats for sorted > buckets > > > Key: SOLR-12343 > URL: https://issues.apache.org/jira/browse/SOLR-12343 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Hoss Man >Assignee: Yonik Seeley >Priority: Major > Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, > SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch > > > The way JSON Facet's simple refinement "re-sorts" buckets after refinement > can cause _refined_ buckets to be "bumped out" of the topN based on the > refined counts/stats depending on the sort - causing _unrefined_ buckets > originally discounted in phase#2 to bubble up into the topN and be returned > to clients *with inaccurate counts/stats* > The simplest way to demonstrate this bug (in some data sets) is with a > {{sort: 'count asc'}} facet: > * assume shard1 returns termX & termY in phase#1 because they have very low > shard1 counts > ** but *not* returned at all by shard2, because these terms both have very > high shard2 counts. > * Assume termX has a slightly lower shard1 count than termY, such that: > ** termX "makes the cut" for the limit=N topN buckets > ** termY does not make the cut, and is the "N+1" known bucket at the end of > phase#1 > * termX then gets included in the phase#2 refinement request against shard2 > ** termX now has a much higher _known_ total count than termY > ** the coordinator now sorts termX "worse" in the sorted list of buckets > than termY > ** which causes termY to bubble up into the topN > * termY is ultimately included in the final result _with incomplete > count/stat/sub-facet data_ instead of termX > ** this is all independent of the possibility that termY may actually have a > significantly higher total count than termX across the entire collection > ** the key problem is that all/most of the other terms returned to the > client have counts/stats that are the cumulation of all shards, but termY > only has the contributions from shard1 > Important Notes: > * This scenario can happen regardless of the amount of overrequest used. > Additional overrequest just increases the number of "extra" terms needed in > the index with "better" sort values than termX & termY in shard2 > * {{sort: 'count asc'}} is not just an exceptional/pathological case: > ** any function sort where additional data provided by shards during refinement > can cause a bucket to "sort worse" can also cause this problem. > ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) > asc|desc}} , etc... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
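For reference, a request of the shape this issue is about (standard JSON Facet syntax; field names are examples): a sorted, refined terms facet whose buckets can be re-ranked by refinement:

{code}
{
  "categories": {
    "type": "terms",
    "field": "cat",
    "sort": "count asc",
    "limit": 5,
    "refine": true
  }
}
{code}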
[jira] [Commented] (SOLR-12343) JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536095#comment-16536095 ] Yonik Seeley commented on SOLR-12343: - I'm occasionally getting a failure in testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN. I haven't tried digging into it yet, though. > JSON Field Facet refinement can return incorrect counts/stats for sorted > buckets > > > Key: SOLR-12343 > URL: https://issues.apache.org/jira/browse/SOLR-12343 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Hoss Man >Assignee: Yonik Seeley >Priority: Major > Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, > SOLR-12343.patch, SOLR-12343.patch > > > The way JSON Facet's simple refinement "re-sorts" buckets after refinement > can cause _refined_ buckets to be "bumped out" of the topN based on the > refined counts/stats depending on the sort - causing _unrefined_ buckets > originally discounted in phase#2 to bubble up into the topN and be returned > to clients *with inaccurate counts/stats* > The simplest way to demonstrate this bug (in some data sets) is with a > {{sort: 'count asc'}} facet: > * assume shard1 returns termX & termY in phase#1 because they have very low > shard1 counts > ** but *not* returned at all by shard2, because these terms both have very > high shard2 counts. > * Assume termX has a slightly lower shard1 count than termY, such that: > ** termX "makes the cut" for the limit=N topN buckets > ** termY does not make the cut, and is the "N+1" known bucket at the end of > phase#1 > * termX then gets included in the phase#2 refinement request against shard2 > ** termX now has a much higher _known_ total count than termY > ** the coordinator now sorts termX "worse" in the sorted list of buckets > than termY > ** which causes termY to bubble up into the topN > * termY is ultimately included in the final result _with incomplete > count/stat/sub-facet data_ instead of termX > ** this is all independent of the possibility that termY may actually have a > significantly higher total count than termX across the entire collection > ** the key problem is that all/most of the other terms returned to the > client have counts/stats that are the cumulation of all shards, but termY > only has the contributions from shard1 > Important Notes: > * This scenario can happen regardless of the amount of overrequest used. > Additional overrequest just increases the number of "extra" terms needed in > the index with "better" sort values than termX & termY in shard2 > * {{sort: 'count asc'}} is not just an exceptional/pathological case: > ** any function sort where additional data provided by shards during refinement > can cause a bucket to "sort worse" can also cause this problem. > ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) > asc|desc}} , etc... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12343) JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534377#comment-16534377 ] Yonik Seeley commented on SOLR-12343: - bq. it will stop returning the facet range "other" buckets completely since currently no code refines them at all Hmmm, so the patch I attached seems like it would only remove incomplete buckets in field facets under "other" buckets (i.e. if they don't actually need refining to be complete, they won't be removed by the current patch). But this could still be worse in some cases (missing vs incomplete when refinement is requested), so I agree this can wait until SOLR-12516 is done. > JSON Field Facet refinement can return incorrect counts/stats for sorted > buckets > > > Key: SOLR-12343 > URL: https://issues.apache.org/jira/browse/SOLR-12343 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Hoss Man >Assignee: Yonik Seeley >Priority: Major > Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, > SOLR-12343.patch > > > The way JSON Facet's simple refinement "re-sorts" buckets after refinement > can cause _refined_ buckets to be "bumped out" of the topN based on the > refined counts/stats depending on the sort - causing _unrefined_ buckets > originally discounted in phase#2 to bubble up into the topN and be returned > to clients *with inaccurate counts/stats* > The simplest way to demonstrate this bug (in some data sets) is with a > {{sort: 'count asc'}} facet: > * assume shard1 returns termX & termY in phase#1 because they have very low > shard1 counts > ** but *not* returned at all by shard2, because these terms both have very > high shard2 counts. > * Assume termX has a slightly lower shard1 count than termY, such that: > ** termX "makes the cut" for the limit=N topN buckets > ** termY does not make the cut, and is the "N+1" known bucket at the end of > phase#1 > * termX then gets included in the phase#2 refinement request against shard2 > ** termX now has a much higher _known_ total count than termY > ** the coordinator now sorts termX "worse" in the sorted list of buckets > than termY > ** which causes termY to bubble up into the topN > * termY is ultimately included in the final result _with incomplete > count/stat/sub-facet data_ instead of termX > ** this is all independent of the possibility that termY may actually have a > significantly higher total count than termX across the entire collection > ** the key problem is that all/most of the other terms returned to the > client have counts/stats that are the cumulation of all shards, but termY > only has the contributions from shard1 > Important Notes: > * This scenario can happen regardless of the amount of overrequest used. > Additional overrequest just increases the number of "extra" terms needed in > the index with "better" sort values than termX & termY in shard2 > * {{sort: 'count asc'}} is not just an exceptional/pathological case: > ** any function sort where additional data provided by shards during refinement > can cause a bucket to "sort worse" can also cause this problem. > ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) > asc|desc}} , etc... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-12343) JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-12343: --- Assignee: Yonik Seeley > JSON Field Facet refinement can return incorrect counts/stats for sorted > buckets > > > Key: SOLR-12343 > URL: https://issues.apache.org/jira/browse/SOLR-12343 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Hoss Man >Assignee: Yonik Seeley >Priority: Major > Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch > > > The way JSON Facet's simple refinement "re-sorts" buckets after refinement > can cause _refined_ buckets to be "bumped out" of the topN based on the > refined counts/stats depending on the sort - causing _unrefined_ buckets > originally discounted in phase#2 to bubble up into the topN and be returned > to clients *with inaccurate counts/stats* > The simplest way to demonstrate this bug (in some data sets) is with a > {{sort: 'count asc'}} facet: > * assume shard1 returns termX & termY in phase#1 because they have very low > shard1 counts > ** but *not* returned at all by shard2, because these terms both have very > high shard2 counts. > * Assume termX has a slightly lower shard1 count than termY, such that: > ** termX "makes the cut" for the limit=N topN buckets > ** termY does not make the cut, and is the "N+1" known bucket at the end of > phase#1 > * termX then gets included in the phase#2 refinement request against shard2 > ** termX now has a much higher _known_ total count than termY > ** the coordinator now sorts termX "worse" in the sorted list of buckets > than termY > ** which causes termY to bubble up into the topN > * termY is ultimately included in the final result _with incomplete > count/stat/sub-facet data_ instead of termX > ** this is all independent of the possibility that termY may actually have a > significantly higher total count than termX across the entire collection > ** the key problem is that all/most of the other terms returned to the > client have counts/stats that are the cumulation of all shards, but termY > only has the contributions from shard1 > Important Notes: > * This scenario can happen regardless of the amount of overrequest used. > Additional overrequest just increases the number of "extra" terms needed in > the index with "better" sort values than termX & termY in shard2 > * {{sort: 'count asc'}} is not just an exceptional/pathological case: > ** any function sort where additional data provided by shards during refinement > can cause a bucket to "sort worse" can also cause this problem. > ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) > asc|desc}} , etc... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-12533) Collection creation fails if metrics are called during core creation
[ https://issues.apache.org/jira/browse/SOLR-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-12533. - Resolution: Fixed Fix Version/s: 7.5 > Collection creation fails if metrics are called during core creation > > > Key: SOLR-12533 > URL: https://issues.apache.org/jira/browse/SOLR-12533 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.0 >Reporter: Peter Cseh >Priority: Major > Fix For: 7.5 > > Time Spent: 10m > Remaining Estimate: 0h > > There is a race condition in SolrCore's constructor: > - the metrics.indexSize call implicitly creates a data/index folder for that > core > - if the data/index folder exists, no segments file will be created > - the searcher won't start up if there is no segments file in the data/index > folder > This is probably the root cause for SOLR-2130 and SOLR-2801 as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8378) Add DocIdSetIterator.range method
[ https://issues.apache.org/jira/browse/LUCENE-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532010#comment-16532010 ] Yonik Seeley edited comment on LUCENE-8378 at 7/3/18 10:18 PM: --- I assume it's a bug that minDoc is always returned? edit: oops, sorry, I missed the "static" in the method signature. I thought this was providing a slice of another iterator for a minute. was (Author: ysee...@gmail.com): I assume it's a bug that minDoc is always returned? > Add DocIdSetIterator.range method > - > > Key: LUCENE-8378 > URL: https://issues.apache.org/jira/browse/LUCENE-8378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Major > Attachments: LUCENE-8378.patch, LUCENE-8378.patch > > > We already have {{DocIdSetIterator.all}} and {{DocIdSetIterator.empty}} but > I'd like to also add a {{range}} method to match a specified range of docids. > E.g. this can be useful if you sort your index by a key, and then create a > custom query to match documents by values for that key, or by range > (LUCENE-7714). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8378) Add DocIdSetIterator.range method
[ https://issues.apache.org/jira/browse/LUCENE-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532010#comment-16532010 ] Yonik Seeley commented on LUCENE-8378: -- I assume it's a bug that minDoc is always returned? > Add DocIdSetIterator.range method > - > > Key: LUCENE-8378 > URL: https://issues.apache.org/jira/browse/LUCENE-8378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Major > Attachments: LUCENE-8378.patch, LUCENE-8378.patch > > > We already have {{DocIdSetIterator.all}} and {{DocIdSetIterator.empty}} but > I'd like to also add a {{range}} method to match a specified range of docids. > E.g. this can be useful if you sort your index by a key, and then create a > custom query to match documents by values for that key, or by range > (LUCENE-7714). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12343) JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530657#comment-16530657 ] Yonik Seeley commented on SOLR-12343: - I think some of what I just worked on for SOLR-12326 is related to (or can be used by) this issue. FacetRequestSortedMerger now has a "BitSet shardHasMoreBuckets" to help deal with the fact that complete buckets do not need participation from every shard. That info in conjunction with Context.sawShard should be enough to tell if a bucket is already "complete". For every bucket that isn't complete, we can either refine it, or drop it. > JSON Field Facet refinement can return incorrect counts/stats for sorted > buckets > > > Key: SOLR-12343 > URL: https://issues.apache.org/jira/browse/SOLR-12343 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Hoss Man >Priority: Major > Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch > > > The way JSON Facet's simple refinement "re-sorts" buckets after refinement > can cause _refined_ buckets to be "bumped out" of the topN based on the > refined counts/stats depending on the sort - causing _unrefined_ buckets > originally discounted in phase#2 to bubble up into the topN and be returned > to clients *with inaccurate counts/stats* > The simplest way to demonstrate this bug (in some data sets) is with a > {{sort: 'count asc'}} facet: > * assume shard1 returns termX & termY in phase#1 because they have very low > shard1 counts > ** but *not* returned at all by shard2, because these terms both have very > high shard2 counts. > * Assume termX has a slightly lower shard1 count than termY, such that: > ** termX "makes the cut" for the limit=N topN buckets > ** termY does not make the cut, and is the "N+1" known bucket at the end of > phase#1 > * termX then gets included in the phase#2 refinement request against shard2 > ** termX now has a much higher _known_ total count than termY > ** the coordinator now sorts termX "worse" in the sorted list of buckets > than termY > ** which causes termY to bubble up into the topN > * termY is ultimately included in the final result _with incomplete > count/stat/sub-facet data_ instead of termX > ** this is all independent of the possibility that termY may actually have a > significantly higher total count than termX across the entire collection > ** the key problem is that all/most of the other terms returned to the > client have counts/stats that are the cumulation of all shards, but termY > only has the contributions from shard1 > Important Notes: > * This scenario can happen regardless of the amount of overrequest used. > Additional overrequest just increases the number of "extra" terms needed in > the index with "better" sort values than termX & termY in shard2 > * {{sort: 'count asc'}} is not just an exceptional/pathological case: > ** any function sort where additional data provided by shards during refinement > can cause a bucket to "sort worse" can also cause this problem. > ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) > asc|desc}} , etc... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
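A sketch of the completeness test described here, under assumed shapes for the two bitsets (the actual FacetRequestSortedMerger fields may differ):

{code:java}
import java.util.BitSet;

public class BucketCompleteness {
  // A bucket is complete if every shard either contributed to it or is known
  // to have returned all of its buckets (no "more" left behind).
  static boolean isComplete(BitSet sawShardForBucket,
                            BitSet shardHasMoreBuckets, int numShards) {
    for (int s = 0; s < numShards; s++) {
      if (!sawShardForBucket.get(s) && shardHasMoreBuckets.get(s)) {
        return false;
      }
    }
    return true;
  }
}
{code}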
[jira] [Commented] (SOLR-12533) Collection creation fails if metrics are called during core creation
[ https://issues.apache.org/jira/browse/SOLR-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530356#comment-16530356 ] Yonik Seeley commented on SOLR-12533: - These changes look good to me. I plan on committing after unit tests finish running. > Collection creation fails if metrics are called during core creation > > > Key: SOLR-12533 > URL: https://issues.apache.org/jira/browse/SOLR-12533 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.0 >Reporter: Peter Cseh >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > There is a race condition in SolrCore's constructor: > - the metrics.indexSize call implicitly creates a data/index folder for that > core > - if the data/index folder exists, no segments file will be created > - the searcher won't start up if there is no segments file in the data/index > folder > This is probably the root cause for SOLR-2130 and SOLR-2801 as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-12326) Unnecessary refinement requests
[ https://issues.apache.org/jira/browse/SOLR-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-12326. - Resolution: Fixed Fix Version/s: 7.5 > Unnecessary refinement requests > --- > > Key: SOLR-12326 > URL: https://issues.apache.org/jira/browse/SOLR-12326 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.5 > > Attachments: SOLR-12326.patch, SOLR-12326.patch > > > TestJsonFacets.testStatsDistrib() appears to result in more refinement > requests than would otherwise be expected. Those tests were developed before > refinement was implemented and hence do not need refinement to generate > correct results due to limited numbers of buckets. This should be detectable > by refinement code in the majority of cases to prevent extra work from being > done. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-12326) Unnecessary refinement requests
[ https://issues.apache.org/jira/browse/SOLR-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-12326: --- Assignee: Yonik Seeley Attachment: SOLR-12326.patch Draft patch attached. TestJsonFacetRefinement still fails, I assume because not all field faceting implementations return "more" yet. More tests to be added as well. > Unnecessary refinement requests > --- > > Key: SOLR-12326 > URL: https://issues.apache.org/jira/browse/SOLR-12326 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Attachments: SOLR-12326.patch > > > TestJsonFacets.testStatsDistrib() appears to result in more refinement > requests than would otherwise be expected. Those tests were developed before > refinement was implemented and hence do not need refinement to generate > correct results due to limited numbers of buckets. This should be detectable > by refinement code in the majority of cases to prevent extra work from being > done. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12326) Unnecessary refinement requests
[ https://issues.apache.org/jira/browse/SOLR-12326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519493#comment-16519493 ] Yonik Seeley commented on SOLR-12326: - One part of the solution is for the request merger to know if a shard has more buckets. If it knows the exact amount of over-request used, then it can figure it out. This is a little more fragile though, and I could envision future optimizations that dynamically change the amount of over-request based on things like heuristics, field statistics on that shard, and results of previous requests. For that reason, I'm planning on just passing back more:true for field facets that have more values. > Unnecessary refinement requests > --- > > Key: SOLR-12326 > URL: https://issues.apache.org/jira/browse/SOLR-12326 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Reporter: Yonik Seeley >Priority: Major > > TestJsonFacets.testStatsDistrib() appears to result in more refinement > requests than would otherwise be expected. Those tests were developed before > refinement was implemented and hence do not need refinement to generate > correct results due to limited numbers of buckets. This should be detectable > by refinement code in the majority of cases to prevent extra work from being > done. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
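For illustration only, a shard's phase#1 facet response could carry such a flag alongside its buckets; the shape below is a sketch, not the committed wire format:
{code}
"myfield": {
  "buckets": [
    { "val": "termA", "count": 120 },
    { "val": "termB", "count": 95 }
  ],
  "more": true
}
{code}
The merger can then treat a missing bucket as final for any shard that did not report more:true, regardless of how much over-request was actually used.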
[jira] [Commented] (SOLR-11216) Make PeerSync more robust
[ https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514244#comment-16514244 ] Yonik Seeley commented on SOLR-11216: - {quote} SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams()); request is not safely closed, is this intentional? won't this break the reference count mechanism? {quote} Yeah, it does look like it should be closed. A SolrQueryRequest grabs a searcher reference on-demand, so that may be why it isn't causing an issue with any tests (the commit command doesn't grab a searcher reference with the provided request). It should be fixed anyway though. > Make PeerSync more robust > - > > Key: SOLR-11216 > URL: https://issues.apache.org/jira/browse/SOLR-11216 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Priority: Major > Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch > > > First of all, I will change the issue's title to a better name when I have one. > When digging into SOLR-10126, I found a case that can make peerSync fail: > * leader and replica receive updates 1 to 4 > * replica stops > * replica misses updates 5, 6 > * replica starts recovery > ## replica buffers updates 7, 8 > ## replica requests versions from the leader > ## at the same time the leader receives update 9, so it will return updates 1 > to 9 (for the versions request) when the replica gets recent versions ( so it will be > 1,2,3,4,5,6,7,8,9 ) > ## replica does peersync and requests updates 5, 6, 9 from the leader > ## replica applies updates 5, 6, 9. Its index does not have updates 7, 8, and > maxVersionSpecified for the fingerprint is 9, therefore the fingerprint comparison will > fail > My question here is why the replica requests update 9 (step 6) while it knows that > updates with lower versions ( updates 7, 8 ) are in its buffering tlog. Should > we request only updates lower than the lowest update in its buffering > tlog ( < 7 )? > Someone may ask: what if the replica never receives update 9? In that case, > the leader will put the replica into the LIR state, so the replica will run the recovery > process again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
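A minimal sketch of the fix discussed in the comment above, using try/finally rather than try-with-resources in case SolrQueryRequest does not extend AutoCloseable on this branch; this is an illustration, not the committed patch:
{code:java}
SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
try {
  // ... issue the commit command against req ...
} finally {
  req.close();  // releases any searcher reference the request grabbed on demand
}
{code}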
[jira] [Commented] (LUCENE-8359) Extend ToParentBlockJoinQuery with 'minimum matched children' functionality
[ https://issues.apache.org/jira/browse/LUCENE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513889#comment-16513889 ] Yonik Seeley commented on LUCENE-8359: -- I haven't had a chance to look at the patch, but +1 for the idea of adding the high level functionality! > Extend ToParentBlockJoinQuery with 'minimum matched children' functionality > > > Key: LUCENE-8359 > URL: https://issues.apache.org/jira/browse/LUCENE-8359 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Andrey Kudryavtsev >Priority: Minor > Labels: lucene > Attachments: LUCENE-8359 > > > I have hierarchical data in the index and requirements like 'match a parent only if > at least {{n}} of its children were matched'. > I used to solve this with a combination of Lucene / Solr tricks like 'frange' > filtering on the sum of the matched children's scores, so it's doable out of the box > with some effort right now. But it could also be solved by extending > {{ToParentBlockJoinQuery}} with a new numeric parameter; I tried to do > that in the attached patch. > Not sure if this should be in the main branch; just putting it here in case someone > has similar problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511238#comment-16511238 ] Yonik Seeley commented on SOLR-9685: bq. I'm confused about what's happening here as this was resolved again without the docs being updated I had reopened the issue to fix the bug that was found (not for the docs), and resolved again after the fix was committed. > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685-doc.patch, > SOLR-9685.patch, SOLR-9685.patch, SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-9685. Resolution: Fixed OK, I also modified a test to cover the {"#tag":{"lucene" case. Right now, excludeTags only works on top-level filters, so for now I think we can only test that the syntax works on these sub-queries. > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685-doc.patch, > SOLR-9685.patch, SOLR-9685.patch, SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509660#comment-16509660 ] Yonik Seeley commented on SOLR-9685: Attached draft patch to fix the issue of tagged queries on sub-parsers. > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685.patch, SOLR-9685.patch, > SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-9685: --- Attachment: SOLR-9685.patch > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685.patch, SOLR-9685.patch, > SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11779) Basic long-term collection of aggregated metrics
[ https://issues.apache.org/jira/browse/SOLR-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509609#comment-16509609 ] Yonik Seeley commented on SOLR-11779: - I'd consider it a minor bug for our default configs to be logging exceptions when nothing is wrong. I'd suggest that this should not be a WARN level message (and it definitely shouldn't log an exception). The text of the log message could be changed to remove the word Error as well, since it's not an error case. Perhaps "No .system collection, keeping metrics history in memory" > Basic long-term collection of aggregated metrics > > > Key: SOLR-11779 > URL: https://issues.apache.org/jira/browse/SOLR-11779 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 7.3, master (8.0) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-11779.patch, SOLR-11779.patch, SOLR-11779.patch, > SOLR-11779.patch, c1.png, c2.png, core.json, d1.png, d2.png, d3.png, > jvm-list.json, jvm-string.json, jvm.json, o1.png, u1.png > > > Tracking the key metrics over time is very helpful in understanding the > cluster and user behavior. > Currently even basic metrics tracking requires setting up an external system > and either polling {{/admin/metrics}} or using {{SolrMetricReporter}}-s. The > advantage of this setup is that these external tools usually provide a lot of > sophisticated functionality. The downside is that they don't ship out of the > box with Solr and require additional admin effort to set up. > Solr could collect some of the key metrics and keep their historical values > in a round-robin database (eg. using RRD4j) to keep the size of the historic > data constant (eg. ~64kB per metric), while at the same time providing useful > out-of-the-box insights into basic system behavior over time. This data could > be persisted to the {{.system}} collection as blobs, and it could also be > presented in the Admin UI as graphs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
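A minimal sketch of the suggested change, assuming the SLF4J-style logger already used in MetricsHistoryHandler:
{code:java}
// INFO rather than WARN, no stack trace, and no "Error" wording,
// since a missing .system collection is an expected condition.
log.info("No .system collection, keeping metrics history in memory");
{code}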
[jira] [Commented] (SOLR-11779) Basic long-term collection of aggregated metrics
[ https://issues.apache.org/jira/browse/SOLR-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509136#comment-16509136 ] Yonik Seeley commented on SOLR-11779: - I don't know if it's this issue or a related issue, but all basic tests as well as "bin/solr start" now throw the following exception:
{code}
2018-06-12 03:45:57.146 WARN (main) [ ] o.a.s.h.a.MetricsHistoryHandler Error querying .system collection, keeping metrics history in memory
org.apache.solr.common.SolrException: No such core: .system
 at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:161) ~[solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[solr-solrj-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:12]
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942) ~[solr-solrj-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:12]
 at org.apache.solr.handler.admin.MetricsHistoryHandler.checkSystemCollection(MetricsHistoryHandler.java:282) [solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.apache.solr.handler.admin.MetricsHistoryHandler.<init>(MetricsHistoryHandler.java:235) [solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.apache.solr.core.CoreContainer.createMetricsHistoryHandler(CoreContainer.java:780) [solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:578) [solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:252) [solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:172) [solr-core-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT 7773bf67643a152e1d12bed253345a40ef14f0e9 - yonik - 2018-06-11 20:14:07]
 at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:741) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:374) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1497) [jetty-webapp-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1459) [jetty-webapp-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:785) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:287) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:545) [jetty-webapp-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:192) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:505) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:151) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:453) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64) [jetty-deploy-9.4.10.v20180503.jar:9.4.10.v20180503]
 at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610) [jetty-util-9.4.
[jira] [Commented] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509052#comment-16509052 ] Yonik Seeley commented on SOLR-9685: Stepping through with the debugger, it looks like this is the type of local-params string being built: {code} {!bool should={!tag=MYTAG}id:1 should=$_tt0 } {code} So we need to use variables for parameters here as well. > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685.patch, SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
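For illustration, the fix would have the generated local-params refer to every sub-query through a parameter rather than inlining it; the parameter name _tt1 below is hypothetical, following the existing _tt0 convention:
{code}
{!bool should=$_tt1 should=$_tt0}
_tt1={!tag=MYTAG}id:1
{code}
With both sub-queries passed as parameters, no escaping of the nested local-params syntax is needed.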
[jira] [Commented] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509002#comment-16509002 ] Yonik Seeley commented on SOLR-9685: Here's one of the simplest examples of a query that fails to parse: {code} curl http://localhost:8983/solr/techproducts/query -d ' { query:{bool:{ must:{"#TOP" : "text:memory"} }} }' {code} {code} { "responseHeader":{ "status":400, "QTime":8, "params":{ "json":" {\n query:{bool:{\nmust:{\"#TOP\" : \"text:memory\"}\n }}\n}"}}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.search.SyntaxError"], "msg":"org.apache.solr.search.SyntaxError: Missing end to unquoted value starting at 6 str='{!tag=TOP'", "code":400}} {code} > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685.patch, SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reopened SOLR-9685: > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685.patch, SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9685) tag a query in JSON syntax
[ https://issues.apache.org/jira/browse/SOLR-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508753#comment-16508753 ] Yonik Seeley commented on SOLR-9685: Looks like escaping bugs when producing the local-params variant from the JSON one. If possible, this should be fixed for 7.4. > tag a query in JSON syntax > -- > > Key: SOLR-9685 > URL: https://issues.apache.org/jira/browse/SOLR-9685 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module, JSON Request API >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4, master (8.0) > > Attachments: SOLR-9685-doc.patch, SOLR-9685.patch, SOLR-9685.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > There should be a way to tag a query/filter in JSON syntax. > Perhaps these two forms could be equivalent: > {code} > "{!tag=COLOR}color:blue" > { tagged : { COLOR : "color:blue" } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans
[ https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499542#comment-16499542 ] Yonik Seeley commented on SOLR-5211: If we check for \_root\_ in the index, everything could be backward compatible (and avoid the need for a schema update + reindex). If parent-child docs are being used, then updates could use 2 update terms (one for id and one for \_root\_) > updating parent as childless makes old children orphans > --- > > Key: SOLR-5211 > URL: https://issues.apache.org/jira/browse/SOLR-5211 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.5, 6.0 >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Attachments: SOLR-5211.patch, SOLR-5211.patch > > > If I have a parent with children in the index, I can send an update omitting > the children; as a result the old children become orphaned. > I suppose the separate \_root_ field causes much trouble. I propose to extend the > notion of uniqueKey and let it span across blocks, which makes updates > unambiguous. > WDYT? Would you like to see a test that proves this issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
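A rough Lucene-level sketch of the two-term idea in the comment above (an illustration, not the committed patch; IndexWriter.deleteDocuments accepts multiple terms):
{code:java}
// Delete any prior version of the document by both keys, so that an update
// sent without children also removes old children sharing the same _root_.
writer.deleteDocuments(new Term("id", id), new Term("_root_", id));
writer.addDocuments(docs);  // re-add the new (possibly childless) block
{code}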
[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
[ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499147#comment-16499147 ] Yonik Seeley commented on SOLR-12366: - Nice catch, this stuff has been broken forever! Looking back, I think not enough was exposed to be able to work per-segment, so Lucene's MultiReader.isDeleted(int doc) did a binary search each time. Once we gained the ability to operate per-segment, some code wasn't converted. {quote}IMO some callers of SolrIndexSearcher.getSlowAtomicReader should change to use MultiFields to avoid the temptation to have a LeafReader that has many slow methods. {quote} MultiFields has slow methods as well, and if you look at the histories, many places used MultiFields.getDeletedDocs even before (and were replaced with the equivalent?) For example, commit 6ffc159b40 changed getFirstMatch to use MultiFields.getDeletedDocs (which may not have been a bug since it probably was equivalent at the time?) Anyway, I think perhaps we should throw an exception for any place in SlowCompositeReaderWrapper that exposes code that does a binary search. We don't need a full Reader implementation here I think. A variable name change for "SolrIndexSearcher.leafReader" would really be welcome too... it's a bad name. We've been bit by the naming before as well: SOLR-9592 > Avoid SlowAtomicReader.getLiveDocs -- it's slow > --- > > Key: SOLR-12366 > URL: https://issues.apache.org/jira/browse/SOLR-12366 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Fix For: 7.4 > > Attachments: SOLR-12366.patch, SOLR-12366.patch, SOLR-12366.patch, > SOLR-12366.patch > > > SlowAtomicReader is of course slow, and it's getLiveDocs (based on MultiBits) > is slow as it uses a binary search for each lookup. There are various places > in Solr that use SolrIndexSearcher.getSlowAtomicReader and then get the > liveDocs. Most of these places ought to work with SolrIndexSearcher's > getLiveDocs method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
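For contrast with the top-level binary search discussed above, the per-segment pattern the issue moves toward looks roughly like this (a sketch, not the committed code):
{code:java}
// Each LeafReader exposes its own liveDocs bitset, so checking deletions is a
// direct bit lookup per segment instead of a binary search per top-level docid.
for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
  Bits liveDocs = ctx.reader().getLiveDocs();  // null means no deletions in this segment
  for (int doc = 0; doc < ctx.reader().maxDoc(); doc++) {
    if (liveDocs == null || liveDocs.get(doc)) {
      int globalDoc = ctx.docBase + doc;  // top-level docid, if one is needed
      // ... process the live document ...
    }
  }
}
{code}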
[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans
[ https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499072#comment-16499072 ] Yonik Seeley commented on SOLR-5211: bq. Or maybe your comment is how do we handle an existing index before this rule existed? More as an alternative direction that would not require the rule (that every document have root), only those with children (as is done today). We constantly get dinged on usability because of things that require static configuration, and this is yet another (one that would even require reindexing) > updating parent as childless makes old children orphans > --- > > Key: SOLR-5211 > URL: https://issues.apache.org/jira/browse/SOLR-5211 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.5, 6.0 >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Attachments: SOLR-5211.patch > > > If I have a parent with children in the index, I can send an update omitting > the children; as a result the old children become orphaned. > I suppose the separate \_root_ field causes much trouble. I propose to extend the > notion of uniqueKey and let it span across blocks, which makes updates > unambiguous. > WDYT? Would you like to see a test that proves this issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5211) updating parent as childless makes old children orphans
[ https://issues.apache.org/jira/browse/SOLR-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498627#comment-16498627 ] Yonik Seeley commented on SOLR-5211: It should be relatively trivial to know if the \_root\_ field exists in the index (i.e. when any parent/child groups exist) and do the right thing based on that. > updating parent as childless makes old children orphans > --- > > Key: SOLR-5211 > URL: https://issues.apache.org/jira/browse/SOLR-5211 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.5, 6.0 >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Attachments: SOLR-5211.patch > > > If I have a parent with children in the index, I can send an update omitting > the children; as a result the old children become orphaned. > I suppose the separate \_root_ field causes much trouble. I propose to extend the > notion of uniqueKey and let it span across blocks, which makes updates > unambiguous. > WDYT? Would you like to see a test that proves this issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
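A minimal sketch of that check, assuming the Lucene 7.x MultiFields API (not the committed patch):
{code:java}
// If no parent/child blocks were ever indexed, the _root_ field exists in no
// segment, and plain delete-by-id update semantics can be kept unchanged.
FieldInfos infos = MultiFields.getMergedFieldInfos(searcher.getIndexReader());
boolean hasBlockJoinDocs = infos.fieldInfo("_root_") != null;
{code}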
[jira] [Commented] (SOLR-12374) Add SolrCore.withSearcher(lambda accepting SolrIndexSearcher)
[ https://issues.apache.org/jira/browse/SOLR-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495231#comment-16495231 ] Yonik Seeley commented on SOLR-12374: - The CHANGES for 7.4 has: * SOLR-12374: SnapShooter.getIndexCommit can forget to decref the searcher; though it's not clear in practice when. (David Smiley) But it's missing on the master branch... > Add SolrCore.withSearcher(lambda accepting SolrIndexSearcher) > - > > Key: SOLR-12374 > URL: https://issues.apache.org/jira/browse/SOLR-12374 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Fix For: 7.4 > > Attachments: SOLR-12374.patch > > > I propose adding the following to SolrCore: > {code:java} > /** >* Executes the lambda with the {@link SolrIndexSearcher}. This is more > convenient than using >* {@link #getSearcher()} since there is no ref-counting business to worry > about. >* Example: >* >* IndexReader reader = > h.getCore().withSearcher(SolrIndexSearcher::getIndexReader); >* >*/ > @SuppressWarnings("unchecked") > public <R> R withSearcher(Function<SolrIndexSearcher, R> lambda) { > final RefCounted<SolrIndexSearcher> refCounted = getSearcher(); > try { > return lambda.apply(refCounted.get()); > } finally { > refCounted.decref(); > } > } > {code} > This is a nice tight convenience method, avoiding the clumsy RefCounted API > which is easy to use incorrectly by accident – see > https://issues.apache.org/jira/browse/SOLR-11616?focusedCommentId=16477719&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16477719 > My only (small) concern is that, because it's easy to make the lambda short > (see the one-liner example above), the object you return that you're interested in > (say an IndexReader) could potentially become invalid if the SolrIndexSearcher > closes. But I think/hope that's normally impossible based on when getSearcher() > is used? I could at least add a warning to the docs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-12417) velocity response writer v.json should enforce valid function name
[ https://issues.apache.org/jira/browse/SOLR-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-12417: --- Assignee: Yonik Seeley > velocity response writer v.json should enforce valid function name > -- > > Key: SOLR-12417 > URL: https://issues.apache.org/jira/browse/SOLR-12417 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Environment: VelocityResponseWriter should enforce that v.json > parameter is just a function name >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Major > Attachments: SOLR-12417.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12417) velocity response writer v.json should enforce valid function name
[ https://issues.apache.org/jira/browse/SOLR-12417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-12417: Attachment: SOLR-12417.patch > velocity response writer v.json should enforce valid function name > -- > > Key: SOLR-12417 > URL: https://issues.apache.org/jira/browse/SOLR-12417 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Environment: VelocityResponseWriter should enforce that v.json > parameter is just a function name >Reporter: Yonik Seeley >Priority: Major > Attachments: SOLR-12417.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12417) velocity response writer v.json should enforce valid function name
Yonik Seeley created SOLR-12417: --- Summary: velocity response writer v.json should enforce valid function name Key: SOLR-12417 URL: https://issues.apache.org/jira/browse/SOLR-12417 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Environment: VelocityResponseWriter should enforce that v.json parameter is just a function name Reporter: Yonik Seeley -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-12328) Adding graph json facet domain change
[ https://issues.apache.org/jira/browse/SOLR-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-12328. - Resolution: Fixed Fix Version/s: 7.4 > Adding graph json facet domain change > - > > Key: SOLR-12328 > URL: https://issues.apache.org/jira/browse/SOLR-12328 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Affects Versions: 7.3 >Reporter: Daniel Meehl >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4 > > Attachments: SOLR-12328.patch > > > Json facets now support join queries via domain change. I've made a > relatively small enhancement to add graph to the mix. I'll attach a patch for > your viewing. I'm hoping this can be merged into solr proper. Please let me > know if there are any problems/changes/requirements. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12328) Adding graph json facet domain change
[ https://issues.apache.org/jira/browse/SOLR-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492204#comment-16492204 ] Yonik Seeley commented on SOLR-12328: - I fixed up the null traversal filter noted, consolidated the tests, and committed. Thanks! > Adding graph json facet domain change > - > > Key: SOLR-12328 > URL: https://issues.apache.org/jira/browse/SOLR-12328 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Facet Module >Affects Versions: 7.3 >Reporter: Daniel Meehl >Assignee: Yonik Seeley >Priority: Major > Fix For: 7.4 > > Attachments: SOLR-12328.patch > > > Json facets now support join queries via domain change. I've made a > relatively small enhancement to add graph to the mix. I'll attach a patch for > your viewing. I'm hoping this can be merged into solr proper. Please let me > know if there are any problems/changes/requirements. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org