[ https://issues.apache.org/jira/browse/PHOENIX-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani updated PHOENIX-7367:
----------------------------------
    Fix Version/s: 5.2.1
                   5.3.0

> Snapshot based mapreduce jobs fails after HBASE-28401
> -----------------------------------------------------
>
>                 Key: PHOENIX-7367
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7367
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Ujjawal Kumar
>            Assignee: Ujjawal Kumar
>            Priority: Major
>             Fix For: 5.2.1, 5.3.0
>
>         Attachments: Screenshot 2024-07-19 at 8.18.06 PM.png, Screenshot 2024-07-19 at 8.18.25 PM.png
>
>
> HBASE-28401 introduced a regression that makes HRegion#close throw an NPE while trying to close the memstore within the mapper.
> As a result, snapshot-based MR jobs have started failing in Phoenix.
> The failure occurs because TableSnapshotResultIterator ends up trying to release the read lock twice via HRegion#closeRegionOperation:
> * TableSnapshotResultIterator's next method [calls ScanningResultIterator's next method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L180].
> ** ScanningResultIterator's [next tries to close the SnapshotScanner early|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/ScanningResultIterator.java#L225]
> ** Within [SnapshotScanner's close method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java#L180-L187]
> *** HRegion#closeRegionOperation released the read lock and was successful
> *** HRegion#close then threw an IOException due to the memstore issue (HBASE-28401)
> *** SnapshotScanner catches the IOException but doesn't set the region field to null
> * TableSnapshotResultIterator's [finally block calls ScanningResultIterator's close method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L187-L190].
> ** *ScanningResultIterator's close is called again*
> ** *Since the region field wasn't null, HRegion#closeRegionOperation is called again and throws IllegalMonitorStateException while trying to release the read lock*
> ** The IllegalMonitorStateException then causes the whole mapper to fail
>
> This doesn't cause failures for snapshot reads done via HBase (ref HBASE-28743, where the same NPE was observed but the mapper still passes), because the closest equivalent code (the RecordReader within TableSnapshotInputFormat) doesn't try to close the region [as part of its nextKeyValue method|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L275-L280].
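
The double release described in the call sequence above can be reproduced without HBase. The following is only a minimal sketch: `FakeRegion` and `FakeSnapshotScanner` are hypothetical stand-ins (not the real HRegion or SnapshotScanner classes), the read lock is modelled with a plain `ReentrantReadWriteLock`, and the memstore failure is simulated by always throwing an IOException from `close`.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class DoubleReleaseSketch {

    /** Hypothetical stand-in for HRegion; only the read-lock handling is modelled. */
    static class FakeRegion {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        void startRegionOperation() {
            lock.readLock().lock();
        }

        void closeRegionOperation() {
            // Throws IllegalMonitorStateException if this thread holds no read lock.
            lock.readLock().unlock();
        }

        void close() throws IOException {
            // Assumption: simulates the memstore failure attributed to HBASE-28401.
            throw new IOException("simulated memstore close failure");
        }
    }

    /** Hypothetical stand-in for SnapshotScanner, simplified from the description. */
    static class FakeSnapshotScanner {
        private FakeRegion region;

        FakeSnapshotScanner(FakeRegion region) {
            this.region = region;
            this.region.startRegionOperation();
        }

        void close() {
            if (region != null) {
                try {
                    region.closeRegionOperation(); // first call: releases the read lock
                    region.close();                // throws IOException
                    region = null;                 // never reached on failure
                } catch (IOException e) {
                    // Swallowed; the region field stays non-null.
                }
            }
        }
    }

    /** Closes the scanner twice and reports what the second close does. */
    static String run() {
        FakeSnapshotScanner scanner = new FakeSnapshotScanner(new FakeRegion());
        scanner.close(); // early close from next()
        try {
            scanner.close(); // close from the finally block
            return "no error";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "IllegalMonitorStateException"
    }
}
```

In this model the second `close` fails only because the first one left `region` non-null after releasing the lock, which mirrors the sequence reported in the issue.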
>
> Leaving the close out of nextKeyValue is generally much safer [because record readers are always closed explicitly (even if the mapper's run method fails)|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L466-L481]
> There are 2 improvements that can be made here:
> 1. Disable mslab for the region created within the snapshot (by setting hbase.hregion.memstore.mslab.enabled to false)
> 2. In TableSnapshotResultIterator, remove the SnapshotScanner's close (via ScanningResultIterator) that is called within the next method. The scanner would be closed by the mapper at the end anyway.
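
The second improvement amounts to giving the scanner a single owner that closes it exactly once. The sketch below illustrates that pattern only; `FakeScanner` and `drain` are hypothetical names, not Phoenix or HBase APIs, and the read lock is again modelled with a plain `ReentrantReadWriteLock`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SingleCloseSketch {

    /** Hypothetical scanner that holds a read lock for its whole lifetime. */
    static class FakeScanner implements AutoCloseable {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Iterator<String> rows;
        int closeCalls = 0;

        FakeScanner(List<String> data) {
            this.rows = data.iterator();
            lock.readLock().lock();
        }

        /** Returns the next row, or null when exhausted. It does not close on exhaustion. */
        String next() {
            return rows.hasNext() ? rows.next() : null;
        }

        @Override
        public void close() {
            closeCalls++;
            lock.readLock().unlock(); // released exactly once, by the owner
        }
    }

    /** The caller owns the scanner and closes it once, even if iteration fails. */
    static int drain(FakeScanner scanner) {
        int count = 0;
        try (FakeScanner s = scanner) {
            while (s.next() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        FakeScanner scanner = new FakeScanner(Arrays.asList("a", "b", "c"));
        int rows = drain(scanner);
        System.out.println("rows=" + rows + " closeCalls=" + scanner.closeCalls); // prints "rows=3 closeCalls=1"
    }
}
```

Because `next` never closes anything, the read lock cannot be released twice, regardless of whether the underlying close fails.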