[ https://issues.apache.org/jira/browse/HBASE-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902900#comment-13902900 ]
Lars Hofhansl commented on HBASE-10546:
---------------------------------------

Also see HBASE-6069, which *added* the initialize() call we're removing now (to avoid an NPE with Hive).
With that in mind, we should not remove the initialize() calls. Let's just fix restart(...).

> Two scanner objects are open for each hbase map task but only one scanner object is closed
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10546
>                 URL: https://issues.apache.org/jira/browse/HBASE-10546
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Vasu Mariyala
>             Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17
>
>         Attachments: 0.94-HBASE-10546.patch, trunk-HBASE-10546.patch
>
>
> The MapReduce framework calls createRecordReader of
> TableInputFormat/MultiTableInputFormat to get the record reader instance. In
> this method, we initialize the TableRecordReaderImpl (restart method), which
> opens a scanner object. After this, the MapReduce framework calls
> initialize on the RecordReader. In our case, this calls restart of the
> TableRecordReaderImpl again, and restart does not close the first scanner. At the
> end of the task, only the second scanner object is closed. Because of this,
> the smallest read point of the HRegion is affected.
> We don't need to initialize the RecordReader in the createRecordReader method,
> and we need to close the existing scanner object in the restart method (in case
> the restart method is called because of exceptions in the nextKeyValue method).
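To make the proposed restart(...) fix concrete, here is a minimal, self-contained sketch of the pattern. It does not use the HBase API: FakeScanner, SketchRecordReader, and the open-scanner counter are hypothetical stand-ins invented for illustration, and the actual change in TableRecordReaderImpl may look different. The only point is that restart() closes any scanner it already holds before opening a new one, so calling it from both createRecordReader and initialize() no longer leaks a scanner.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ScannerRestartSketch {

    /** Hypothetical stand-in for a scanner; it only tracks open/close. */
    static final class FakeScanner implements AutoCloseable {
        private final AtomicInteger openScanners;
        private boolean closed;

        FakeScanner(AtomicInteger openScanners) {
            this.openScanners = openScanners;
            openScanners.incrementAndGet(); // one more scanner is open
        }

        @Override
        public void close() {
            if (!closed) {                  // closing twice is harmless
                closed = true;
                openScanners.decrementAndGet();
            }
        }
    }

    /** Hypothetical reader mirroring the createRecordReader/initialize flow. */
    static final class SketchRecordReader {
        private final AtomicInteger openScanners;
        private FakeScanner scanner;

        SketchRecordReader(AtomicInteger openScanners) {
            this.openScanners = openScanners;
        }

        /** Closes the current scanner, if any, before opening a new one. */
        void restart() {
            if (scanner != null) {
                scanner.close();
            }
            scanner = new FakeScanner(openScanners);
        }

        /** Kept in place, as suggested in the comment; it simply restarts. */
        void initialize() {
            restart();
        }

        void close() {
            if (scanner != null) {
                scanner.close();
                scanner = null;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger open = new AtomicInteger();
        SketchRecordReader reader = new SketchRecordReader(open);

        reader.restart();     // what createRecordReader does today
        reader.initialize();  // what the framework calls afterwards
        System.out.println("open after initialize: " + open.get()); // prints 1

        reader.close();       // end of the task
        System.out.println("open after close: " + open.get());      // prints 0
    }
}
```

Without the null check and close() at the top of restart(), the same call sequence would leave one scanner open after the task ends, which is the leak described in the issue.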