[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271197#comment-13271197
 ] 

Jan Lukavsky commented on HBASE-5757:
-------------------------------------

Hi Jon,

I'm not sure, but IMO the purpose of DoNotRetryIOException is to instruct the 
HTable client not to retry the request. In TableInputFormat we are working at a 
higher level, so retrying is OK. DNRIOEx serves to set such errors apart from 
exceptions that might be caused by, for instance, region reassignment, and that 
might disappear if the request is resent (possibly after dropping the cached 
region location and querying .META. again). UnknownScannerException, on the 
other hand, will not 'disappear' if the *same* request is sent by the HTable 
client. But in the InputFormat we can restart the scanner, so we will not send 
the same request, and hence it can succeed.

Retrying the request just once and then giving up avoids infinite cycles, and 
one retry mostly suffices, because a typical cause of UnknownScannerException 
or LeaseException is a too-slow Mapper (there could be other causes, like 
scanning for a too-sparse column, but those will not be solved by this 
issue :)). It is possible to lower scanner caching, but this might be 
inefficient (e.g. when the caching is just fine 99.99% of the time, and then 
there are some strange records that take the Mapper longer to process). 
Lowering the caching globally just because of these few records doesn't sound 
like the 'correct' solution.
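
To illustrate the retry-once idea, here is a minimal, self-contained sketch. 
It is not the attached patch and does not use the HBase API: RowScanner, 
ScannerFactory, RestartingReader and readAll are hypothetical names made up 
for this example, and rows are plain strings. It assumes a new scanner can be 
opened starting at the last row that was successfully returned, and that this 
row must then be skipped because it was already delivered.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RestartingReader {
    /** Hypothetical stand-in for a scanner; returns null when exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Hypothetical factory: opens a scanner at startRow (inclusive), or at the first row if null. */
    interface ScannerFactory {
        RowScanner open(String startRow) throws IOException;
    }

    private final ScannerFactory factory;
    private RowScanner scanner;
    private String lastRow;   // last row handed to the caller
    private int restarts;     // how often the scanner had to be reopened

    RestartingReader(ScannerFactory factory) throws IOException {
        this.factory = factory;
        this.scanner = factory.open(null);
    }

    String next() throws IOException {
        String row;
        try {
            row = scanner.next();
        } catch (IOException e) {
            // Retry exactly once with a *new* scanner, so the request differs
            // from the one that failed. A second failure propagates.
            restarts++;
            scanner = factory.open(lastRow);
            if (lastRow != null) {
                scanner.next(); // skip the row that was already delivered
            }
            row = scanner.next();
        }
        if (row != null) {
            lastRow = row;
        }
        return row;
    }

    int restarts() {
        return restarts;
    }

    /** Demo over in-memory rows; the first scanner fails once after failAfter rows. */
    static List<String> readAll(List<String> rows, int failAfter) throws IOException {
        boolean[] failed = {false};
        ScannerFactory factory = startRow -> {
            int[] pos = {startRow == null ? 0 : rows.indexOf(startRow)};
            int[] served = {0};
            return () -> {
                if (!failed[0] && served[0] == failAfter) {
                    failed[0] = true;
                    throw new IOException("simulated scanner lease expiry");
                }
                served[0]++;
                return pos[0] < rows.size() ? rows.get(pos[0]++) : null;
            };
        };
        RestartingReader reader = new RestartingReader(factory);
        List<String> out = new ArrayList<>();
        for (String row = reader.next(); row != null; row = reader.next()) {
            out.add(row);
        }
        return out;
    }
}
```

The restarts counter is only there to show that the caller could be told how 
often a restart happened; whether and how to expose it is a separate question.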


                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch
>
>
> Prior to HBASE-4196, IOExceptions thrown from the scanner were handled 
> differently in the mapred and mapreduce APIs. The patch to HBASE-4196 unified 
> this handling so that if an exception is caught, a reconnect is attempted 
> (without bothering the mapred client). After that, HBASE-4269 changed this 
> behavior back, but in both the mapred and mapreduce APIs. The question is, is 
> there any reason not to handle all errors that the input format can handle? In 
> other words, why not try to reissue the request after *any* IOException? I see 
> the following disadvantages of the current approach:
>  * the client may see exceptions like LeaseException and 
> ScannerTimeoutException if it fails to process all fetched data within the 
> timeout
>  * to avoid ScannerTimeoutException the client must raise 
> hbase.regionserver.lease.period
>  * timeouts for tasks are already configured in mapred.task.timeout, so this 
> seems to me a bit redundant, because typically one needs to update both of 
> these parameters
>  * I don't see any possibility to get rid of LeaseException (this is 
> configured on server side)
> I think all of these issues would be gone if the DoNotRetryIOException were 
> not rethrown. -On the other hand, handling errors in the InputFormat has the 
> disadvantage that it may hide some inefficiency from the user. E.g. if I have 
> a very big scanner.caching, and I manage to process only a few rows within 
> the timeout, I will end up with a single row being fetched many times (and 
> will not be explicitly notified about this). Could we solve this problem by 
> adding some counter to the InputFormat?-

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
