[
https://issues.apache.org/jira/browse/PHOENIX-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ujjawal Kumar updated PHOENIX-7367:
-----------------------------------
Description:
HBASE-28401 introduced a regression that causes HRegion#close to throw an NPE
while closing the memstore within the mapper.
As a result, snapshot-based MR jobs have started failing in Phoenix.
The failure occurs because TableSnapshotResultIterator ends up trying to
release the read lock twice via HRegion#closeRegionOperation:
* TableSnapshotResultIterator's next method [calls ScanningResultIterator's
next
method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L180].
* ScanningResultIterator's [next tries to close the SnapshotScanner
early|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/ScanningResultIterator.java#L225]
* Within [SnapshotScanner's close
method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java#L180-L187]
** HRegion#closeRegionOperation releases the read lock successfully
** HRegion#close then throws an IOException due to the memstore issue (HBASE-28401)
** SnapshotScanner catches the IOException but doesn't set its region field to
null
* TableSnapshotResultIterator's [finally block calls ScanningResultIterator's
close
method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L187-L190].
** *ScanningResultIterator's close is called again*
** *Since the region field wasn't null, HRegion#closeRegionOperation is called
again and throws IllegalMonitorStateException while trying to release the read
lock*
** The IllegalMonitorStateException then causes the whole mapper to fail
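The double release described above can be reproduced in isolation. The following is a minimal sketch, not the Phoenix or HBase code: ScannerSketch and its clearRegionOnFailure flag are hypothetical stand-ins for SnapshotScanner, and it assumes the region's read lock behaves like a plain java.util.concurrent ReentrantReadWriteLock. It shows that a second close throws IllegalMonitorStateException when the region field is left non-null, and that clearing the field makes close idempotent.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleReleaseSketch {

    /** Hypothetical stand-in for SnapshotScanner; only the locking behaviour is modelled. */
    static class ScannerSketch implements AutoCloseable {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final boolean clearRegionOnFailure;
        private Object region = new Object(); // stands in for the region field

        ScannerSketch(boolean clearRegionOnFailure) {
            this.clearRegionOnFailure = clearRegionOnFailure;
            lock.readLock().lock(); // models the read lock taken when the scan starts
        }

        @Override
        public void close() {
            if (region == null) {
                return; // already closed, nothing left to release
            }
            try {
                lock.readLock().unlock(); // models HRegion#closeRegionOperation
                throw new IOException("simulated failure while closing the region");
            } catch (IOException e) {
                // swallowed, mirroring the behaviour described in the issue
            } finally {
                if (clearRegionOnFailure) {
                    region = null; // makes a later close() a no-op
                }
            }
        }
    }

    /** Closes the scanner twice and reports whether the second close throws. */
    static boolean secondCloseThrows(boolean clearRegionOnFailure) {
        ScannerSketch scanner = new ScannerSketch(clearRegionOnFailure);
        scanner.close(); // early close, as done from next()
        try {
            scanner.close(); // second close, as done from the finally block
            return false;
        } catch (IllegalMonitorStateException e) {
            return true; // read lock was released twice
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCloseThrows(false)); // prints "true"
        System.out.println(secondCloseThrows(true));  // prints "false"
    }
}
```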
This doesn't cause failures for snapshot reads done via HBase (see HBASE-28743,
where the same NPE was observed but the mapper still passes), because the
closest equivalent code (the RecordReader within TableSnapshotInputFormat)
doesn't try to close the region [as part of its nextKeyValue
method|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L275-L280].
This is generally much safer [because record readers are always closed
explicitly (even if the mapper's run method
fails)|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L466-L481].
There are 2 improvements that can be made here:
1. Disable MSLAB for the region created within the snapshot (by setting
hbase.hregion.memstore.mslab.enabled to false)
2. In TableSnapshotResultIterator, remove the SnapshotScanner's close (via
ScanningResultIterator) that is called within the next method. The scanner
would be closed by the mapper at the end anyway.
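The second improvement amounts to leaving the close to the caller. A minimal sketch of that shape follows, assuming hypothetical names (CountingScanner and drain are illustrative and do not exist in Phoenix): next() never closes the scanner, and the caller closes it exactly once in a finally block, the way the mapper closes its record reader.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CloseOnceSketch {

    /** Hypothetical scanner that counts how many times close() is invoked. */
    static class CountingScanner implements AutoCloseable {
        private final Iterator<String> rows;
        int closeCalls = 0;

        CountingScanner(List<String> rows) {
            this.rows = rows.iterator();
        }

        /** Returns the next row, or null when exhausted; never closes the scanner. */
        String next() {
            return rows.hasNext() ? rows.next() : null;
        }

        @Override
        public void close() {
            closeCalls++;
        }
    }

    /** Reads every row, then closes the scanner once, even if reading fails. */
    static int drain(CountingScanner scanner) {
        int seen = 0;
        try {
            while (scanner.next() != null) {
                seen++;
            }
        } finally {
            scanner.close(); // the single close, owned by the caller
        }
        return seen;
    }

    public static void main(String[] args) {
        CountingScanner scanner = new CountingScanner(Arrays.asList("row1", "row2"));
        System.out.println(drain(scanner));     // prints "2"
        System.out.println(scanner.closeCalls); // prints "1"
    }
}
```

With a single owner for close, a failure inside close can no longer be followed by a second release attempt from a different code path.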
> Snapshot based mapreduce jobs fails after HBASE-28401
> -----------------------------------------------------
>
> Key: PHOENIX-7367
> URL: https://issues.apache.org/jira/browse/PHOENIX-7367
> Project: Phoenix
> Issue Type: Bug
> Reporter: Ujjawal Kumar
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)