[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-08-13 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906103#comment-16906103
 ] 

mosh commented on SOLR-13125:
-

{quote}What you would need to do is somehow influence the futures that solr is 
waiting on to return early and empty once your request has been filled up from 
the most recent collections. {quote}

After further investigation it seems like currently HttpShardHandler#take:288 
is designed to wait(block) for all shard requests.
{code:java}private ShardResponse take(boolean bailOnError) {

while (pending.size() > 0) {
  try {
Future future = completionService.take();
pending.remove(future);
ShardResponse rsp = future.get();
if (bailOnError && rsp.getException() != null) return rsp; // if 
exception, return immediately
// add response to the response list... we do this after the take() and
// not after the completion of "call" so we know when the last response
// for a request was received.  Otherwise we might return the same
// request more than once.
rsp.getShardRequest().responses.add(rsp);
if (rsp.getShardRequest().responses.size() == 
rsp.getShardRequest().actualShards.length) {
  return rsp;
}
  } catch (InterruptedException e) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
  } catch (ExecutionException e) {
// should be impossible... the problem with catching the exception
// at this level is we don't know what ShardRequest it applied to
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, 
"Impossible Exception",e);
  }
}
return null;
  }{code}
This seems counter intuitive to our goal of "short-circuiting" shard requests, 
once sufficient documents have been returned.
HttpShardHandler also seems out of scope for SearchComponents, at least using 
the current design, in which SearchComponents are coupled to a SearchHandler.

[~gus_heck], WDYT?
Do you have any idea of how this should be designed without significant code 
changes?
Bear in mind, this discussion only concerns optimizing TRA queries, and could 
be made abstract in due time (resembling the way RoutedAliases were 
implemented).

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, 
> SOLR-13125.patch, SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-07-16 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886733#comment-16886733
 ] 

Lucene/Solr QA commented on SOLR-13125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
22s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 22s{color} 
| {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} Release audit (RAT) {color} | {color:red}  
1m 22s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} Check forbidden APIs {color} | {color:red} 
 1m 22s{color} | {color:red} core in the patch failed. {color} |
| {color:red}-1{color} | {color:red} Validate source patterns {color} | 
{color:red}  1m 22s{color} | {color:red} core in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 24s{color} 
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  3m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12956490/SOLR-13125.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon 
Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 19c78dd |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | LTS |
| compile | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/artifact/out/patch-compile-solr_core.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/artifact/out/patch-compile-solr_core.txt
 |
| Release audit (RAT) | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/artifact/out/patch-compile-solr_core.txt
 |
| Check forbidden APIs | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/artifact/out/patch-compile-solr_core.txt
 |
| Validate source patterns | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/artifact/out/patch-compile-solr_core.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/493/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, 
> SOLR-13125.patch, SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or excee

[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-07-16 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886151#comment-16886151
 ] 

Gus Heck commented on SOLR-13125:
-

The idea behind this patch is interesting. Unless I misunderstand the intent, 
the idea is to short circuit the response collection when the TRA collection 
names tell us that further responses will all return docs that are too far down 
the result list to ever be included. Unfortunately I don't think this patch 
does that. Issues I see:
 * This patch overrides finishStage() instead of handleResponses, which means 
that by the time your logic runs all responses have been received.
 * I don't see logic to handle values for the start parameter
 * Also not sure I like the tests checking debug messages rather than actual 
code behavior. That could get out of sync.

In any case, it's unclear to me if this can be handled in a search component 
without core changes, even if you override handleResponses() instead, you can't 
stop SearchHandler from looping and attempting to take() the results of every 
request that was sent (unless you throw an exception, but that wont be good). 
What you would need to do is somehow influence the futures that solr is waiting 
on to return early and empty once your request has been filled up from the most 
recent collections. (see 
org/apache/solr/handler/component/HttpShardHandler.java:281). Baring that, you 
could perhaps find a way to empty the pending queue, but that means you still 
have to wait for at least one uninteresting request to complete. The futures 
themselves would be waiting on the 
org/apache/solr/handler/component/HttpShardHandler.java:201. call to 
makeLoadBalancedRequest(), so I think this optimization requires the addition 
of an explicit short-circuit enabling hook. Possibly this could be a new method 
for SearchComponents to override, but we need to think some about how that 
would play with assumptions of existing code some. 

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, 
> SOLR-13125.patch, SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-27 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753619#comment-16753619
 ] 

Lucene/Solr QA commented on SOLR-13125:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
38s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  2m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  2m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  2m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 77m  
5s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12956490/SOLR-13125.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 8e69d12 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_191 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/272/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/272/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, 
> SOLR-13125.patch, SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-24 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751783#comment-16751783
 ] 

Lucene/Solr QA commented on SOLR-13125:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m 33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 
38s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12956132/SOLR-13125.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon 
Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / c317119 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/267/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/267/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, 
> SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-

[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-24 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750904#comment-16750904
 ] 

mosh commented on SOLR-13125:
-

Uploaded a new patch which replaces the use of Collections#sort with 
ReverseListIterator, for better performance.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, 
> SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-17 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744974#comment-16744974
 ] 

Lucene/Solr QA commented on SOLR-13125:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 
19s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12955049/SOLR-13125.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon 
Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / a16f083 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/262/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/262/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

--

[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-15 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743706#comment-16743706
 ] 

mosh commented on SOLR-13125:
-

Added a patch which introduces a new query component.
The new query component only runs in cloud mode.
It does the following, in this particular order, and falls back in case any of 
the conditions are not met:

  _Stage Start:_
 # Checks whether an alias is queried.
 # Checks whether the alias is a TRA.
 # Checks whether sort was specified and is sorted by router.field(field TRA 
uses to route docs)

  _Finish Execute Query:_
 # gets all collections in TRA, sorted by date(descending or ascending, depends 
on the sort specified in query).
 # Count responses by collection, and add to whitelist.
 # If limit was not reached, drop requests from shards not in the whitelist.

The patch also contains a new test suite TRAQueryComponentTest,
which tests the behavior of this new SearchComponent.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-14 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742062#comment-16742062
 ] 

mosh commented on SOLR-13125:
-

Just added a no commit patch, which is up to date with the linked PR.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
> Attachments: SOLR-13125-no-commit.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-08 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737089#comment-16737089
 ] 

mosh commented on SOLR-13125:
-

{quote}However, we could optimize knowing which shards to fetch docs from for 
the top-X if we know how many docs matched the query in the first phase{quote}
That sounds like it could save a lot of needless fetching in a large cluster.
How would you tackle this?
I was thinking a new SearchComponent could suffice.
WDYT?

Something I noticed while skimming through Sorj; currently aliases are resolved 
in the client, but no indication of router.name is sent.
Perhaps this should be changed so SolrJ requests are easier to interpret by 
SolrCloud nodes, eliminating the need to check Zookeeper for _router.name_.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

2019-01-07 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735803#comment-16735803
 ] 

David Smiley commented on SOLR-13125:
-

This does seem like a common case for TRAs – sort by time.  The code that does 
distributed search might _somehow_ look at the shards it resolves and 
understand there is some ordering.  Outright skipping some later shards changes 
the semantics (response would have different numFound) and thus it would no 
longer be an optimization.  However, we could optimize knowing which shards to 
fetch docs from for the top-X if we know how many docs matched the query in the 
first phase – and no longer return the doc IDs in that phase as is done now.  
Just an idea.

> Optimize Queries when sorting by router.field
> -
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Minor
>
> We are currently testing TRA using Solr 7.7, having >300 shards in the alias, 
> with much growth in the coming months.
> The "hot" data(in our case, more recent) will be stored on stronger 
> nodes(SSD, more RAM, etc).
> A proposal of optimizing queries sorted by router.field(the field which TRA 
> uses to route the data to the correct collection) has emerged.
> Perhaps, in queries which are sorted by router.field, Solr could be smart 
> enough to wait for the more recent collections, and in case the limit was 
> reached cancel other queries(or just not block and wait for the results)?
> For example:
> When querying a TRA which with a filter on a different field than 
> router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, solr will issue queries for all the collections in the 
> alias.
> But to optimize this particular type of query, Solr could wait for the most 
> recent collection in the TRA, see whether the result set matches or exceeds 
> the limit. If so, the query could be returned to the user without waiting for 
> the rest of the shards. If not, the issuing node will block until the second 
> query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging, querying each collection and only 
> skipping to the next once there are no more results in the specified 
> collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org