[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479326#comment-13479326
 ] 

stack commented on HBASE-5974:
------------------------------

Thanks for reminding me about this important issue Anoop.   Lets get it into 
trunk at least (Should be a blocker on 0.96).

Minor: Do you foresee the sequencing being used other than in scanners (Its 
implemented in ScannerCallable only at moment)?  If NOT, would 
OutOfOrderNextException or OutOfOrderScannerNextException be a better name than 
CallSequenceOutOfOrderException?  If you do this, would suggest changing param 
names from callSeq to nextSeq?  The former is generic.

Is CallSequenceOutOfOrderException an exception that will bubble up into the 
application or is the client its intended audience?  If so, should we annotate 
it @InterfaceAudience.Private as Abortable is for instance?

{code}
+              if ((cause == null || (!(cause instanceof 
NotServingRegionException)
+                  && !(cause instanceof RegionServerStoppedException)))
+                  && !(e instanceof CallSequenceOutOfOrderException)) {
{code}

Would it make sense testing ! DoNotRetryIOException rather than calling out the 
two exceptions above?  Would that be too broad?  Otherwise, wondering if these 
two exceptions should inherit from a common base class given they are getting 
this special treatment.  Not important.  Just a thought.

Check your comments.  You seem to be saying 'scan' when you mean 'next'.

For example, this comment:

{code}
+            // increment the callSeq which will be getting used for the next 
time scan() call to
+            // the RS.In case of a timeout this increment should not happen so 
that the next
+            // trial also will be done with the same callSeq.
{code}

Regards the above comment, is there a sentence missing on what happens if we go 
back w/ the same callseq?  Maybe have another go at this whole comment... it 
could be a bit clearer.  Thanks.

Committing this set of comments.... more to come.




                
> Scanner retry behavior with RPC timeout on next() seems incorrect
> -----------------------------------------------------------------
>
>                 Key: HBASE-5974
>                 URL: https://issues.apache.org/jira/browse/HBASE-5974
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver
>    Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
>            Reporter: Todd Lipcon
>            Assignee: Anoop Sam John
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 5974_94-V4.patch, 5974_trunk.patch, 5974_trunk-V2.patch, 
> HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch, HBASE-5974_94-V3.patch
>
>
> I'm seeing the following behavior:
> - set RPC timeout to a short value
> - call next() for some batch of rows, big enough so the client times out 
> before the result is returned
> - the HConnectionManager stuff will retry the next() call to the same server. 
> At this point, one of two things can happen: 1) the previous next() call will 
> still be processing, in which case you get a LeaseException, because it was 
> removed from the map during the processing, or 2) the next() call will 
> succeed but skip the prior batch of rows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to