[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280157#comment-13280157
 ] 

Zhihong Yu commented on HBASE-5757:
-----------------------------------

@Jan:
Neither patch applies to trunk as of today.
Can you attach a patch for trunk and name it accordingly?

Thanks
                
> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>         Attachments: HBASE-5757.patch, HBASE-5757.patch
>
>
> Prior to HBASE-4196, IOExceptions thrown from the scanner were handled
> differently in the mapred and mapreduce APIs. The patch for HBASE-4196 unified
> this handling so that, when an exception is caught, a reconnect is attempted
> (without bothering the mapred client). After that, HBASE-4269 changed this
> behavior back, but in both the mapred and mapreduce APIs. The question is: is
> there any reason not to handle all errors that the input format can handle? In
> other words, why not try to reissue the request after *any* IOException? I see
> the following disadvantages of the current approach:
>  * the client may see exceptions such as LeaseException and
> ScannerTimeoutException if it fails to process all fetched data within the
> timeout
>  * to avoid ScannerTimeoutException, the client must raise
> hbase.regionserver.lease.period
>  * the timeout for tasks is already configured in mapred.task.timeout, so this
> seems a bit redundant to me, because typically one needs to update both of
> these parameters
>  * I don't see any way to get rid of LeaseException (this is
> configured on the server side)
> I think all of these issues would go away if the DoNotRetryIOException were
> not rethrown. -On the other hand, handling errors in the InputFormat has the
> disadvantage that it may hide some inefficiency from the user. E.g., if I have
> a very large scanner.caching and manage to process only a few rows within the
> timeout, I will end up with a single row being fetched many times (and will
> not be explicitly notified about this). Could we solve this problem by adding
> a counter to the InputFormat?-
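
For illustration only, the idea in the description (reissue the read after any IOException and count how often that happens) might look roughly like the sketch below. It makes several assumptions: `RetryingReader`, `RowSource`, `maxRetries`, and the `restarts` counter are hypothetical names invented for this example, not HBase classes or the attached patch. A real record reader would presumably reopen the scanner from the last successfully returned row instead of using an integer position.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these types are HBase classes. */
class RetryingReader {

    /** Stand-in for a scanner: returns the row at a position, or null at the end. */
    interface RowSource {
        String read(int position) throws IOException;
    }

    private final RowSource source;
    private final int maxRetries;
    private int position = 0;   // index of the next row to hand to the caller
    private long restarts = 0;  // number of reads that had to be reissued

    RetryingReader(RowSource source, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.source = source;
        this.maxRetries = maxRetries;
    }

    /** Returns the next row, or null when the source is exhausted. */
    String next() throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                // Always re-read from the last position that was not yet returned,
                // so a failed attempt never skips or duplicates a row for the caller.
                String row = source.read(position);
                if (row != null) {
                    position++;
                }
                return row;
            } catch (IOException e) {
                last = e;
                if (attempt < maxRetries) {
                    restarts++;  // make the hidden retry visible, e.g. as a job counter
                }
            }
        }
        throw last;  // retries exhausted: surface the last error to the caller
    }

    long getRestarts() {
        return restarts;
    }
}
```

Exposing something like `getRestarts()` as a job counter would be one way to keep the retries from silently hiding an oversized scanner.caching setting, which is the concern raised in the struck-through part of the description.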

