[ 
https://issues.apache.org/jira/browse/PHOENIX-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated PHOENIX-7367:
----------------------------------
    Fix Version/s: 5.2.1
                   5.3.0

> Snapshot based mapreduce jobs fails after HBASE-28401
> -----------------------------------------------------
>
>                 Key: PHOENIX-7367
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7367
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Ujjawal Kumar
>            Assignee: Ujjawal Kumar
>            Priority: Major
>             Fix For: 5.2.1, 5.3.0
>
>         Attachments: Screenshot 2024-07-19 at 8.18.06 PM.png, Screenshot 
> 2024-07-19 at 8.18.25 PM.png
>
>
> HBASE-28401 introduced a regression that causes HRegion#close to throw an NPE 
> while trying to close the memstore within the mapper.
> As a result, snapshot-based MR jobs have started failing in Phoenix. 
> The failure occurs because TableSnapshotResultIterator ends up trying to 
> release the read lock twice via HRegion#closeRegionOperation:
>  * TableSnapshotResultIterator's next method [calls ScanningResultIterator's 
> next 
> method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L180].
>  ** ScanningResultIterator's [next tries to close the SnapshotScanner 
> early|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/ScanningResultIterator.java#L225].
>  ** Within [SnapshotScanner's close 
> method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java#L180-L187]:
>  *** HRegion#closeRegionOperation releases the read lock and succeeds.
>  *** HRegion#close throws an IOException due to the memstore issue 
> (HBASE-28401).
>  *** SnapshotScanner catches the IOException but doesn't set its region field 
> to null.
>  * TableSnapshotResultIterator's [finally block calls 
> ScanningResultIterator's close 
> method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L187-L190].
>  ** *ScanningResultIterator's close is called again.*
>  ** *Since the region field wasn't null, HRegion#closeRegionOperation is 
> called again and throws IllegalMonitorStateException while trying to release 
> the read lock.*
>  ** The IllegalMonitorStateException then causes the whole mapper to fail.
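The double release described above can be reproduced in isolation. The following is a minimal sketch, assuming the region's read lock behaves like a plain java.util.concurrent ReentrantReadWriteLock; the class and method names (DoubleReadUnlockDemo, startOperation, closeOperation) are hypothetical stand-ins and not HBase or Phoenix code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleReadUnlockDemo {
    // Hypothetical stand-in for the lock a region holds internally.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Acquires the read lock, as a "start operation" call would.
    void startOperation() {
        lock.readLock().lock();
    }

    // Releases the read lock, as a "close operation" call would.
    void closeOperation() {
        lock.readLock().unlock();
    }

    // Releases the read lock twice after a single acquire and reports the outcome.
    static String run() {
        DoubleReadUnlockDemo region = new DoubleReadUnlockDemo();
        region.startOperation();
        region.closeOperation(); // first release: matches the acquire
        try {
            region.closeOperation(); // second release: the thread no longer holds the lock
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The second unlock fails because the calling thread holds no read lock at that point, which mirrors the second closeRegionOperation call in the sequence above.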
> This doesn't cause a failure when doing snapshot reads via HBase (ref 
> HBASE-28743, where the same NPE was observed but the mapper still passes), 
> because the closest equivalent code (RecordReader within 
> TableSnapshotInputFormat) doesn't try to close the region [as part of its 
> nextKeyValue 
> method|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L275-L280].
>   
> This is generally much safer [because record readers are always closed 
> explicitly (even if the mapper's run method 
> fails)|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L466-L481].
> There are two improvements that can be made here: 
> 1. Disable MSLAB for the region created within the snapshot (by setting 
> hbase.hregion.memstore.mslab.enabled to false).
> 2. In TableSnapshotResultIterator, remove the SnapshotScanner's close 
> (via ScanningResultIterator) called within the next method. It would be 
> closed by the mapper at the end anyway.
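
The description notes that SnapshotScanner swallows the IOException without clearing its region field, which is what lets the second close reach the lock release again. The sketch below shows a defensive, idempotent close under that reading. It is not the fix proposed in this issue (which is to disable MSLAB and drop the early close), and FakeRegion and GuardedScanner are hypothetical names that only imitate the call order described above.

```java
import java.io.Closeable;
import java.io.IOException;

public class IdempotentCloseSketch {
    // Hypothetical stand-in for a region; not the HBase HRegion API.
    static class FakeRegion {
        int releases = 0;

        // Counts how many times the "read lock" is released.
        void closeRegionOperation() {
            releases++;
        }

        // Simulates the close failure described in the issue.
        void close() throws IOException {
            throw new IOException("simulated memstore close failure");
        }
    }

    // Hypothetical scanner whose close() is safe to call more than once.
    static class GuardedScanner implements Closeable {
        private FakeRegion region;

        GuardedScanner(FakeRegion region) {
            this.region = region;
        }

        @Override
        public void close() {
            if (region == null) {
                return; // already closed: do not release the lock again
            }
            FakeRegion r = region;
            region = null; // clear the field before any call that can throw
            try {
                r.closeRegionOperation();
                r.close();
            } catch (IOException e) {
                // Swallowed here, as in the behaviour described above.
            }
        }
    }

    // Closes the scanner twice and returns how many lock releases happened.
    static int releasesAfterTwoCloses() {
        FakeRegion region = new FakeRegion();
        GuardedScanner scanner = new GuardedScanner(region);
        scanner.close(); // early close from next()
        scanner.close(); // second close from the finally block
        return region.releases;
    }

    public static void main(String[] args) {
        System.out.println(releasesAfterTwoCloses());
    }
}
```

Clearing the field before the calls that can throw means a failed close still leaves the scanner in the "closed" state, so the later close from the finally block becomes a no-op.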



