[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280622#comment-13280622 ]
Hudson commented on HBASE-5757:
-------------------------------

Integrated in HBase-0.92 #415 (See [https://builds.apache.org/job/HBase-0.92/415/])
    HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341205)

     Result = FAILURE
jmhsieh :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java


> TableInputFormat should handle as many errors as possible
> ---------------------------------------------------------
>
>                 Key: HBASE-5757
>                 URL: https://issues.apache.org/jira/browse/HBASE-5757
>             Project: HBase
>          Issue Type: Bug
>          Components: mapred, mapreduce
>    Affects Versions: 0.90.6
>            Reporter: Jan Lukavsky
>            Assignee: Jan Lukavsky
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch,
> HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch
>
>
> Prior to HBASE-4196 there was different handling of IOExceptions thrown from
> scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this
> handling so that if exception is caught a reconnect is attempted (without
> bothering the mapred client). After that, HBASE-4269 changed this behavior
> back, but in both mapred and mapreduce APIs. The question is, is there any
> reason not to handle all errors that the input format can handle? In other
> words, why not try to reissue the request after *any* IOException?
> I see the following disadvantages of the current approach:
>  * the client may see exceptions like LeaseException and
>    ScannerTimeoutException if he fails to process all fetched data within
>    the timeout
>  * to avoid ScannerTimeoutException the client must raise
>    hbase.regionserver.lease.period
>  * timeouts for tasks are already configured in mapred.task.timeout, so this
>    seems to me a bit redundant, because typically one needs to update both
>    of these parameters
>  * I don't see any possibility to get rid of LeaseException (this is
>    configured on the server side)
> I think all of these issues would be gone if the DoNotRetryIOException were
> not rethrown.
> -On the other hand, handling errors in InputFormat has the disadvantage that
> it may hide some inefficiency from the user. E.g. if I have a very big
> scanner.caching, and I manage to process only a few rows within the timeout,
> I will end up with a single row being fetched many times (and will not be
> explicitly notified about this). Could we solve this problem by adding some
> counter to the InputFormat?-

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
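
For illustration only, the retry behaviour asked for in the quoted description (reopen the scanner after *any* IOException and continue from the last row that was delivered) might look roughly like the sketch below. It is not the code from the attached patches and does not use HBase classes: RowScanner, ScannerFactory, RetryingReader, flakyTable, readAll and the maxRestarts limit are all hypothetical names invented for this example. The restarts field stands in for the counter suggested at the end of the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types are HBase classes.
class RetryingScanSketch {

    /** Stand-in for a scanner; next() returns null when the scan is exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Stand-in for "open a scanner at startRow (inclusive)"; null means table start. */
    interface ScannerFactory {
        RowScanner open(String startRow) throws IOException;
    }

    /** Reads rows and reopens the scanner after any IOException. */
    static final class RetryingReader {
        private final ScannerFactory factory;
        private final int maxRestarts;    // assumed safety limit, not from the issue
        private RowScanner scanner;       // null means "must be (re)opened"
        private String lastSuccessfulRow; // last row handed to the caller
        private int restarts;             // how often the scan had to be restarted

        RetryingReader(ScannerFactory factory, int maxRestarts) {
            this.factory = factory;
            this.maxRestarts = maxRestarts;
        }

        String next() throws IOException {
            while (true) {
                try {
                    if (scanner == null) {
                        scanner = factory.open(lastSuccessfulRow);
                        if (lastSuccessfulRow != null) {
                            scanner.next(); // skip the row that was already delivered
                        }
                    }
                    String row = scanner.next();
                    if (row != null) {
                        lastSuccessfulRow = row;
                    }
                    return row;
                } catch (IOException e) {
                    if (restarts >= maxRestarts) {
                        throw e; // give up and let the task fail
                    }
                    restarts++;
                    scanner = null; // force a reopen on the next loop iteration
                }
            }
        }

        int restarts() {
            return restarts;
        }
    }

    /** Fake table whose first scanner fails once, on its failOnCall-th next() call. */
    static ScannerFactory flakyTable(List<String> rows, int failOnCall) {
        boolean[] failed = {false};
        return startRow -> {
            int[] pos = {startRow == null ? 0 : rows.indexOf(startRow)};
            int[] calls = {0};
            return () -> {
                calls[0]++;
                if (!failed[0] && calls[0] == failOnCall) {
                    failed[0] = true;
                    throw new IOException("simulated scanner timeout");
                }
                return pos[0] < rows.size() ? rows.get(pos[0]++) : null;
            };
        };
    }

    /** Drains the reader into a list. */
    static List<String> readAll(RetryingReader reader) throws IOException {
        List<String> out = new ArrayList<>();
        for (String row = reader.next(); row != null; row = reader.next()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        RetryingReader reader =
            new RetryingReader(flakyTable(Arrays.asList("r1", "r2", "r3", "r4"), 3), 5);
        System.out.println(readAll(reader) + " restarts=" + reader.restarts());
    }
}
```

In this sketch the caller never sees the simulated timeout. The reader reopens at the last delivered row, skips it, and carries on, so each row is delivered once. Exposing the restart count is one way the hidden inefficiency mentioned in the description could be made visible, for example by reporting it as a job counter.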