[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-21 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r474794540



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -206,7 +206,7 @@ The default value of this parameter is blank, which causes 
no extra "explain inf
 
 == timeAllowed Parameter
 
-This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.
+This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<>, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.

Review comment:
   I added a notice to `omitHeader`, (WAND == segmentTerminateEarly?)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-21 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r474794540



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -206,7 +206,7 @@ The default value of this parameter is blank, which causes 
no extra "explain inf
 
 == timeAllowed Parameter
 
-This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.
+This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<>, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.

Review comment:
   I added a notice to `omitHeader`

##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -252,7 +252,7 @@ The default value of this parameter is `false`.
 
 This parameter may be set to either `true` or `false`.
 
-If set to `true`, this parameter excludes the header from the returned 
results. The header contains information about the request, such as the time it 
took to complete. The default value for this parameter is `false`.
+If set to `true`, this parameter excludes the header from the returned 
results. The header contains information about the request, such as the time it 
took to complete. The default value for this parameter is `false`. When using 
parameters such as 
<>, and 
<>,
 which can lead to partial results, it is advisable to keep the keep the 
header, so that the `partialResults` flag can be checked, and values such as 
`numFound`, `nextCursorMark`, <> counts, and 
result <> can be 
interpreted in the context of partial results.

Review comment:
   https://user-images.githubusercontent.com/685141/90911861-98c03b80-e3a7-11ea-910a-d1f9d8d266f6.png";>
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-21 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r474780615



##
File path: solr/core/src/test/org/apache/solr/CursorPagingTest.java
##
@@ -499,6 +492,61 @@ public void testSimple() throws Exception {
   ));
   }
 
+  /**
+   * test that timeAllowed parameter can be used with cursors
+   * uses DelayingSearchComponent in solrconfig-deeppaging.xml
+   */
+  public void testTimeAllowed() throws Exception {
+String wontExceedTimeout = "1";
+int numDocs = 100;
+// Create a bunch of docs, inspired by createIndex in 
ExitableDirectoryReaderTest
+for (int i = 0; i < numDocs; i++) {
+  assertU(adoc("id", Integer.toString(i), "name", "a" + i + " b" + i + " 
c" + i + " d"+i + " e" + i));

Review comment:
   I ended up keeping my own docs generation routine, because I wanted to 
do a little more with it:
   1. more than 100 docs [Bram's 
patch](https://issues.apache.org/jira/secure/attachment/13010039/SOLR-14413-bram.patch)
 used 1000, so I decided to use the same.
   1. generate an expected set of ids at the same time, so that I could confirm 
all results were paged through
   1. randomize the id insert order
   
   I took your advice about simplifying the name field.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-21 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r474778543



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -206,7 +206,7 @@ The default value of this parameter is blank, which causes 
no extra "explain inf
 
 == timeAllowed Parameter
 
-This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.
+This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`. 
When using `timeAllowed` in combination with 
<>, and the 
`partialResults` flag is present, `cursorMark` can match `nextCursorMark` even 
if there may be more results.

Review comment:
   
![image](https://user-images.githubusercontent.com/685141/90909168-644a8080-e3a3-11ea-8d32-53e3e4ae2564.png)
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-21 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r47487



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -206,7 +206,7 @@ The default value of this parameter is blank, which causes 
no extra "explain inf
 
 == timeAllowed Parameter
 
-This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.
+This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`. 
When using `timeAllowed` in combination with 
<>, and the 
`partialResults` flag is present, `cursorMark` can match `nextCursorMark` even 
if there may be more results.

Review comment:
   
![image](https://user-images.githubusercontent.com/685141/90909033-32d1b500-e3a3-11ea-8ed4-2fb09a463248.png)
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-19 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r473305818



##
File path: solr/solr-ref-guide/src/common-query-parameters.adoc
##
@@ -206,7 +206,7 @@ The default value of this parameter is blank, which causes 
no extra "explain inf
 
 == timeAllowed Parameter
 
-This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.
+This parameter specifies the amount of time, in milliseconds, allowed for a 
search to complete. If this time expires before the search is complete, any 
partial results will be returned, but values such as `numFound`, 
<>, 
<> counts, and result 
<> may not be accurate for 
the entire result set. In case of expiration, if `omitHeader` isn't set to 
`true` the response header contains a special flag called `partialResults`.

Review comment:
   I was just adding to the existing list there.  I'll pull it out into its 
own sentence.
   
   In my opinion, `timeAllowed+cursorMark+omitHeader` should be allowed, and 
SOLR shouldn't be second-guessing the caller's request:
   
   1. The combination `shards.tolerant+cursorMark` has the same issue, and 
`shards.tolerant+cursorMark+omitHeader` is allowed.
   2. There may be a reasonable scenario that simply doesn't call for it, even 
if it is advisable to keep the header.  Let's say you are implementing a 
product catalog, and:
   - For whatever reason, you don't need the header, you rely on the http 
status code
   - You want to use cursorMarks to allow people to scroll through your 
fine list of shoes or whatever
   - You want to time-bound that resultset and protect your infrastructure 
from long-running queries
   - It just simply isn't mission critical for that every user to see every 
shoe.  If some rare expensive query or latency-producing event occurs, and 
truncates your shoe list, it is okay (you probably wouldn't bother retrying if 
you knew partialResults were true anyway).
   
   My feeling is also a bit of a reaction to being annoyed about SolrCloud 
preventing us from creating collections without replicas, because we had 
legitimate reasons for standing up the replicas ourselves (thankfully, there is 
a workaround).
   
   
   Then again, I say this as someone who doesn't have to answer Solr customer 
questions, so my words are just words.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-19 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r473302488



##
File path: solr/core/src/test/org/apache/solr/CursorPagingTest.java
##
@@ -499,6 +492,61 @@ public void testSimple() throws Exception {
   ));
   }
 
+  /**
+   * test that timeAllowed parameter can be used with cursors
+   * uses DelayingSearchComponent in solrconfig-deeppaging.xml
+   */
+  public void testTimeAllowed() throws Exception {
+String wontExceedTimeout = "1";
+int numDocs = 100;
+// Create a bunch of docs, inspired by createIndex in 
ExitableDirectoryReaderTest
+for (int i = 0; i < numDocs; i++) {
+  assertU(adoc("id", Integer.toString(i), "name", "a" + i + " b" + i + " 
c" + i + " d"+i + " e" + i));
+  if (random().nextInt(numDocs) == 0) {
+assertU(commit());  // sometimes make multiple segments
+  }
+}
+assertU(commit());
+
+String cursorMark;
+SolrParams params = null;
+
+// start by triggering partial results - set the timeAllowed
+// lower than the sleep used in DelayingSearchComponent
+// NB: this only seems to work reliably if results havent
+// been returned yet.
+SolrParams partialParams = params("q", "name:a*",
+"fl", "id",
+"sort", "id desc",
+"rows", "2",
+"sleep", "10",
+"timeAllowed", "1");
+
+cursorMark = CURSOR_MARK_START;
+
+// assertCursor will confirm the nextCursorMark was set
+String partialCursorMark = assertCursor(req(partialParams, 
CURSOR_MARK_PARAM, cursorMark)
+, "/responseHeader/partialResults==true]"
+);
+
+// we can continue on, paging as normal
+cursorMark = partialCursorMark;
+params = params("q", "*:*",
+"fl", "id",
+"sort", "id desc",
+"rows", "2",
+"timeAllowed", wontExceedTimeout);
+cursorMark = assertCursor(req(params, CURSOR_MARK_PARAM, cursorMark)
+, "/response/numFound==100"
+);
+assertNotEquals(cursorMark, partialCursorMark);
+
+cursorMark = assertCursor(req(params, CURSOR_MARK_PARAM, cursorMark)

Review comment:
   good idea, maybe I'll run the cursor to the end, adding up the returned 
docs along the way, making sure we got them all, which would verify the cursor 
behavior, taking inspiration from Bram's patch: 
https://issues.apache.org/jira/secure/attachment/13010039/SOLR-14413-bram.patch





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] slackhappy commented on a change in pull request #1436: SOLR-14413: allow timeAllowed and cursorMark parameters

2020-08-19 Thread GitBox


slackhappy commented on a change in pull request #1436:
URL: https://github.com/apache/lucene-solr/pull/1436#discussion_r473300877



##
File path: solr/core/src/test/org/apache/solr/CursorPagingTest.java
##
@@ -499,6 +492,61 @@ public void testSimple() throws Exception {
   ));
   }
 
+  /**
+   * test that timeAllowed parameter can be used with cursors
+   * uses DelayingSearchComponent in solrconfig-deeppaging.xml
+   */
+  public void testTimeAllowed() throws Exception {
+String wontExceedTimeout = "1";
+int numDocs = 100;
+// Create a bunch of docs, inspired by createIndex in 
ExitableDirectoryReaderTest
+for (int i = 0; i < numDocs; i++) {
+  assertU(adoc("id", Integer.toString(i), "name", "a" + i + " b" + i + " 
c" + i + " d"+i + " e" + i));

Review comment:
   great point, I'll use ExitableDirectoryReaderTest.createIndex directly.  
In my testing, it seemed to be useful to use the higher complexity prefix 
search to have a better guarantee of triggering partialResults with 
timeAllowed.  I suspect that is what ExitableDirectoryReaderTest to use that 
strategy, but to your point, `"a" + i` would be sufficient.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org