[ https://issues.apache.org/jira/browse/HBASE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896236#comment-13896236 ]
stack commented on HBASE-10452: ------------------------------- Thanks for looking into this [~d.yuan] All looks good to me caveat what Andy says -- just abort if you can't release the lock (something is really wrong if you can't release a lock). > Potential bugs in exception handlers > ------------------------------------ > > Key: HBASE-10452 > URL: https://issues.apache.org/jira/browse/HBASE-10452 > Project: HBase > Issue Type: Bug > Components: Client, master, regionserver, util > Affects Versions: 0.96.1 > Reporter: Ding Yuan > Attachments: HBase-10452-trunk-v2.patch, HBase-10452-trunk.patch > > > Hi HBase developers, > We are a group of researchers on software reliability. Recently we did a > study and found that majority of the most severe failures in HBase are caused > by bugs in exception handling logic -- that it is hard to anticipate all the > possible real-world error scenarios. Therefore we built a simple checking > tool that automatically detects some bug patterns that have caused some very > severe real-world failures. I am reporting some of the results here. Any > feedback is much appreciated! > Ding > ========================= > Case 1: > Line: 134, File: > "org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java" > {noformat} > protected void releaseTableLock() { > if (this.tableLock != null) { > try { > this.tableLock.release(); > } catch (IOException ex) { > LOG.warn("Could not release the table lock", ex); > //TODO: if we get here, and not abort RS, this lock will never be > released > } > } > {noformat} > The lock is not released if the exception occurs, causing potential deadlock > or starvation. > Similar code pattern can be found at: > Line: 135, File: "org/apache/hadoop/hbase/regionserver/SplitRequest.java" > ========================================== > ========================= > Case 2: > Line: 252, File: > "org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java" > {noformat} > try { > Field fEnd = SequenceFile.Reader.class.getDeclaredField("end"); > fEnd.setAccessible(true); > end = fEnd.getLong(this.reader); > } catch(Exception e) { /* reflection fail. keep going */ } > {noformat} > The caught Exception seems to be too general. > While reflection-related errors might be harmless, the try block can throw > other exceptions including "SecurityException", "IllegalAccessException", > etc. Currently > all those exceptions are ignored. Maybe > the safe way is to ignore the specific reflection-related errors while > logging and > handling other types of unexpected exceptions. > ========================================== > ========================= > Case 3: > Line: 148, File: "org/apache/hadoop/hbase/HBaseConfiguration.java" > {noformat} > try { > if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) { > isShowConf = true; > } > } catch (Exception e) { > } > {noformat} > Similar to the previous case, the exception handling is too general. While > ClassNotFound error might be the normal case and ignored, Class.forName can > also throw other exceptions (e.g., LinkageError) under some unexpected and > rare error cases. If that happens, the error will be lost. So maybe change it > to below: > {noformat} > try { > if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) { > isShowConf = true; > } > } catch (LinkageError e) { > LOG.warn(".."); > // handle linkage error > } catch (ExceptionInInitializerError e) { > LOG.warn(".."); > // handle Initializer error > } catch (ClassNotFoundException e) { > LOG.debug(".."); > // ignore > } > {noformat} > ========================================== > ========================= > Case 4: > Line: 163, File: "org/apache/hadoop/hbase/client/Get.java" > {noformat} > public Get setTimeStamp(long timestamp) { > try { > tr = new TimeRange(timestamp, timestamp+1); > } catch(IOException e) { > // Will never happen > } > return this; > } > {noformat} > Even if the IOException never happens right now, is it possible to happen in > the future due to code change? > At least there should be a log message. The current behavior is dangerous > since if the exception ever happens > in any unexpected scenario, it will be silently swallowed. > Similar code pattern can be found at: > Line: 300, File: "org/apache/hadoop/hbase/client/Scan.java" > ========================================== > ========================= > Case 5: > Line: 207, File: "org/apache/hadoop/hbase/util/JVM.java" > {noformat} > if (input != null){ > try { > input.close(); > } catch (IOException ignored) { > } > } > {noformat} > Any exception encountered in close is completely ignored, not even logged. > In particular, the same exception scenario was handled differently in other > methods in the same file: > Line: 154, same file > {noformat} > if (in != null){ > try { > in.close(); > } catch (IOException e) { > LOG.warn("Not able to close the InputStream", e); > } > } > {noformat} > Line: 248, same file > {noformat} > if (in != null){ > try { > in.close(); > } catch (IOException e) { > LOG.warn("Not able to close the InputStream", e); > } > } > {noformat} > ========================================== > ========================= > Case 6: empty handler for exception: java.io.IOException > Line: 312, File: "org/apache/hadoop/hbase/rest/RowResource.java" > {noformat} > } finally { > if (table != null) try { > table.close(); > } catch (IOException ioe) { } > } > {noformat} > IOException is completely ignored. This behavior is inconsistent with the same > code snippet at line 249 in the same file, where the IOExceptions was logged: > {noformat} > } finally { > if (table != null) try { > table.close(); > } catch (IOException ioe) { > LOG.debug("Exception received while closing the table", ioe); > } > } > {noformat} > ========================================== > ========================= > Case 7: > Line: 95, File: > "org/apache/hadoop/hbase/master/handler/EnableTableHandler.java" > {noformat} > try { > this.assignmentManager.getZKTable().removeEnablingTable(tableName, > true); > } catch (KeeperException e) { > // TODO : Use HBCK to clear such nodes > LOG.warn("Failed to delete the ENABLING node for the table " + > tableName > + ". The table will remain unusable. Run HBCK to manually fix > the problem."); > } > {noformat} > The log message in the exception handler and the comment seem to suggest that > such nodes should be cleared using HBCK. > ========================================== > ========================= > Case 8: > Line: 463, File: "org/apache/hadoop/hbase/client/ClientScanner.java" > {noformat} > try { > this.caller.callWithRetries(callable); > } catch (IOException e) { > // We used to catch this error, interpret, and rethrow. However, we > // have since decided that it's not nice for a scanner's close to > // throw exceptions. Chances are it was just an UnknownScanner > // exception due to lease time out. > } > {noformat} > Currently the handler is empty because it may be caused by "just an > UnknownScanner exception". > But what if it has other causes? Maybe the catch block can differentiate the > exception into > different causes, ignoring the ones caused by UnknownScanner while handle > others differently? > ========================================== -- This message was sent by Atlassian JIRA (v6.1.5#6160)