[jira] [Updated] (PHOENIX-7367) Snapshot based mapreduce jobs fails after HBASE-28401

Ujjawal Kumar (Jira) Fri, 19 Jul 2024 08:19:43 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ujjawal Kumar updated PHOENIX-7367:
-----------------------------------
    Description: 
HBASE-28401 introduced a regression that causes HRegion#close to throw an NPE 
while closing the memstore within the mapper.

As a result, snapshot-based MR jobs have started failing in Phoenix. 

The failure happens because TableSnapshotResultIterator ends up trying to 
release the read lock twice via HRegion#closeRegionOperation:
 * TableSnapshotResultIterator's next method [calls ScanningResultIterator's 
next 
method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L180].
 * ScanningResultIterator's [next tries to close the SnapshotScanner 
early|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/ScanningResultIterator.java#L225].
 * Within [SnapshotScanner's close 
method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java#L180-L187]:
 ** HRegion#closeRegionOperation succeeds and releases the read lock.
 ** HRegion#close then throws an IOException due to the memstore issue 
(HBASE-28401).
 ** SnapshotScanner catches the IOException but doesn't set its region field to 
null.
 * TableSnapshotResultIterator's [finally block calls ScanningResultIterator's 
close 
method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L187-L190]:
 ** *ScanningResultIterator's close is called again.*
 ** *Since the region field wasn't null, HRegion#closeRegionOperation is called 
again and throws IllegalMonitorStateException while trying to release the read 
lock.*
 ** The IllegalMonitorStateException then causes the whole mapper to fail.
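
The sequence above can be reproduced outside HBase. The following is a minimal sketch, not Phoenix or HBase code: FakeRegion and FakeScanner are hypothetical stand-ins for HRegion and SnapshotScanner, a plain java.util.concurrent.locks.ReentrantReadWriteLock stands in for the region lock, and the close failure is simulated.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleReleaseSketch {

    // Hypothetical stand-in for HRegion: a read lock plus a close() that always fails.
    static class FakeRegion {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        void startRegionOperation() {
            lock.readLock().lock();
        }

        // Releases the read lock; throws IllegalMonitorStateException if it is not held.
        void closeRegionOperation() {
            lock.readLock().unlock();
        }

        void close() throws IOException {
            throw new IOException("simulated memstore close failure");
        }
    }

    // Hypothetical stand-in for SnapshotScanner; close() follows the sequence in the report.
    static class FakeScanner {
        private FakeRegion region = new FakeRegion();

        FakeScanner() {
            region.startRegionOperation();
        }

        void close() {
            if (region != null) {
                try {
                    region.closeRegionOperation(); // succeeds on the first call only
                    region.close();                // always throws in this sketch
                    region = null;                 // never reached
                } catch (IOException e) {
                    // Swallowed: the region field stays non-null.
                }
            }
        }
    }

    // Returns the simple name of the exception thrown by the second close(), or "none".
    static String secondCloseOutcome() {
        FakeScanner scanner = new FakeScanner();
        scanner.close(); // models the early close from next()
        try {
            scanner.close(); // models the close from the finally block
            return "none";
        } catch (IllegalMonitorStateException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCloseOutcome());
    }
}
```

In this sketch the first close() releases the lock and swallows the IOException, and the second close() fails when it tries to release a read lock the thread no longer holds.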

This doesn't cause failures for snapshot reads done directly via HBase (see 
HBASE-28743, where the same NPE was observed but the mapper still passes), 
because the closest equivalent code (the RecordReader within 
TableSnapshotInputFormat) doesn't try to close the region [as part of its 
nextKeyValue 
method|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L275-L280].
  
This is generally much safer [because record readers are always closed 
explicitly (even if the mapper's run method 
fails)|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L466-L481].
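
The ownership pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the MapTask or TableSnapshotInputFormat implementation: CountingReader and runAndCountCloses are hypothetical names, and the mapper failure is simulated.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CloseInFinallySketch {

    // Hypothetical reader: next() only advances; it never releases resources itself.
    static class CountingReader {
        private final Iterator<String> rows;
        private int closeCalls = 0;

        CountingReader(List<String> rows) {
            this.rows = rows.iterator();
        }

        String next() {
            return rows.hasNext() ? rows.next() : null;
        }

        void close() {
            closeCalls++;
        }

        int closeCalls() {
            return closeCalls;
        }
    }

    // Hypothetical driver: the owner closes the reader exactly once, in finally,
    // whether or not processing fails.
    static int runAndCountCloses(boolean failMidway) {
        CountingReader reader = new CountingReader(Arrays.asList("r1", "r2", "r3"));
        try {
            try {
                while (reader.next() != null) {
                    if (failMidway) {
                        throw new IllegalStateException("simulated mapper failure");
                    }
                }
            } finally {
                reader.close();
            }
        } catch (IllegalStateException e) {
            // The failure is observed here; cleanup has already run.
        }
        return reader.closeCalls();
    }

    public static void main(String[] args) {
        System.out.println(runAndCountCloses(false));
        System.out.println(runAndCountCloses(true));
    }
}
```

Because the reader never closes itself while iterating, close runs once on both the success path and the failure path.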

There are 2 improvements that can be made here: 
1. Disable MSLAB for the region created within the snapshot (by setting 
hbase.hregion.memstore.mslab.enabled to false).
2. In TableSnapshotResultIterator, remove the SnapshotScanner's close (via 
ScanningResultIterator) that is called within the next method. It would be 
closed by the mapper at the end anyway.
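
The effect of the second improvement can be sketched with a toggle. This is a simplified model, not the Phoenix classes: FakeScanner and FakeResultIterator are hypothetical, the closeOnExhaustion flag exists only to contrast the two behaviours, and the failing region close is simulated.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LateCloseSketch {

    // Hypothetical scanner: close() releases a read lock, then hits a simulated
    // failure that it swallows without marking itself closed.
    static class FakeScanner {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private int remaining;

        FakeScanner(int rows) {
            this.remaining = rows;
            lock.readLock().lock();
        }

        Integer next() {
            return remaining > 0 ? Integer.valueOf(remaining--) : null;
        }

        void close() {
            try {
                lock.readLock().unlock(); // IllegalMonitorStateException if not held
                throw new IOException("simulated region close failure");
            } catch (IOException e) {
                // Swallowed, so a later close() will try to unlock again.
            }
        }
    }

    // Hypothetical iterator wrapping the scanner.
    static class FakeResultIterator {
        private final FakeScanner scanner;
        private final boolean closeOnExhaustion;

        FakeResultIterator(FakeScanner scanner, boolean closeOnExhaustion) {
            this.scanner = scanner;
            this.closeOnExhaustion = closeOnExhaustion;
        }

        Integer next() {
            Integer row = scanner.next();
            if (row == null && closeOnExhaustion) {
                scanner.close(); // the early close that improvement 2 removes
            }
            return row;
        }

        void close() {
            scanner.close();
        }
    }

    // Drains the iterator and closes it in finally, as the caller would.
    static String drain(boolean closeOnExhaustion) {
        FakeResultIterator it =
                new FakeResultIterator(new FakeScanner(3), closeOnExhaustion);
        try {
            try {
                while (it.next() != null) {
                    // consume rows
                }
            } finally {
                it.close();
            }
            return "completed";
        } catch (IllegalMonitorStateException e) {
            return "failed: IllegalMonitorStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(drain(true));
        System.out.println(drain(false));
    }
}
```

With the early close enabled, the close in finally is the second release and fails. With it disabled, the single close swallows the simulated IOException and the drain completes.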


> Snapshot based mapreduce jobs fails after HBASE-28401
> -----------------------------------------------------
>
>                 Key: PHOENIX-7367
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7367
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Ujjawal Kumar
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
