[ https://issues.apache.org/jira/browse/HBASE-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902780#comment-13902780 ]

Lars Hofhansl commented on HBASE-10546:
---------------------------------------

For 0.94, would it be safer to leave the initialize calls in place? Looking at the code, I find it hard to be sure that removing them will not cause any issues if somebody has subclassed any of the involved classes.
Seems safer to just have the change in restart (in the worst case we create and close a scanner per input split).

> Two scanner objects are open for each hbase map task but only one scanner
> object is closed
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10546
>                 URL: https://issues.apache.org/jira/browse/HBASE-10546
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Vasu Mariyala
>             Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17
>
>         Attachments: 0.94-HBASE-10546.patch, trunk-HBASE-10546.patch
>
>
> Map reduce framework calls createRecordReader of the
> TableInputFormat/MultiTableInputFormat to get the record reader instance. In
> this method, we are initializing the TableRecordReaderImpl (restart method).
> This initializes the scanner object. After this, map reduce framework calls
> initialize on the RecordReader. In our case, this calls restart of the
> TableRecordReaderImpl again. Here, it doesn't close the first scanner. At the
> end of the task, only the second scanner object is closed. Because of this,
> the smallest read point of HRegion is affected.
> We don't need to initialize the RecordReader in the createRecordReader method
> and we need to close the scanner object in the restart method (in case the
> restart method is called because of exceptions in the nextKeyValue method).

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)