[ 
https://issues.apache.org/jira/browse/HBASE-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950266#comment-17950266
 ] 

Ujjawal Kumar commented on HBASE-29295:
---------------------------------------

In our production use case, we saw around 2400 input splits for 
TableSnapshotInputFormat, each about 5.4 MB in size (mainly from the scan's 
base64-encoded representation), leading to a memory footprint of about 13 GB 
on a client JVM with a 16 GB max heap. 
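
As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. The class and method names are made up for illustration, and it assumes every split carries roughly the same 5.4 MB payload.

```java
// Hypothetical helper, not part of HBase: estimates the total heap held by
// split objects when every split embeds its own copy of the encoded scan.
class SplitFootprint {

    // Total size in MB for splitCount splits of megabytesPerSplit each.
    static double totalMegabytes(int splitCount, double megabytesPerSplit) {
        return splitCount * megabytesPerSplit;
    }

    public static void main(String[] args) {
        // Figures from the comment above: ~2400 splits, ~5.4 MB each.
        double totalMb = totalMegabytes(2400, 5.4);
        System.out.printf("%.0f MB, i.e. about %.2f GB%n", totalMb, totalMb / 1000.0);
    }
}
```

2400 splits at 5.4 MB each come to about 12,960 MB, which matches the roughly 13 GB observed and leaves little of the 16 GB heap for anything else.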

> Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's 
> split objects
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-29295
>                 URL: https://issues.apache.org/jira/browse/HBASE-29295
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, snapshots
>            Reporter: Ujjawal Kumar
>            Priority: Major
>         Attachments: Screenshot 2025-05-08 at 6.26.15 PM.png
>
>
> Similar to HBASE-24859, we have seen that while performing reads via a 
> snapshot in an MR job, memory consumption increases significantly on the 
> client side.
> This happens for the same reason described in HBASE-24859: the scan is 
> embedded within TableSnapshotInputFormat's split object, which can explode 
> the client's memory usage for tables with a large number of regions. 
> The solution would be the same as in the other Jira: don't store the scan 
> object in the split; instead, read it from the conf while initializing the 
> record reader. 
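
The proposed approach can be sketched roughly as follows. This is a dependency-free illustration, not the actual HBase patch: a plain Map stands in for the Hadoop Configuration, a String stands in for the Scan, and LightSplit, Reader, storeScan and createSplits are hypothetical names. The conf key is assumed to be the one TableInputFormat uses for the serialized scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

// Illustrative only: the scan is stored once in the job conf, and each split
// carries just its region identity instead of its own copy of the scan.
class ScanInConfSketch {

    // Assumed conf key for the base64-encoded scan.
    static final String SCAN_KEY = "hbase.mapreduce.scan";

    // Hypothetical lightweight split: no scan field at all.
    static final class LightSplit {
        final String regionName;

        LightSplit(String regionName) {
            this.regionName = regionName;
        }
    }

    // Hypothetical record reader that resolves the scan from the conf
    // during initialization rather than from the split.
    static final class Reader {
        String scan;
        String region;

        void initialize(LightSplit split, Map<String, String> conf) {
            String encoded = conf.get(SCAN_KEY);
            if (encoded == null) {
                throw new IllegalStateException("scan missing from conf");
            }
            this.scan = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
            this.region = split.regionName;
        }
    }

    // Stores the scan in the conf exactly once, base64-encoded.
    static void storeScan(Map<String, String> conf, String scan) {
        byte[] raw = scan.getBytes(StandardCharsets.UTF_8);
        conf.put(SCAN_KEY, Base64.getEncoder().encodeToString(raw));
    }

    // Creates one small split per region; per-split cost no longer depends
    // on the size of the scan.
    static List<LightSplit> createSplits(int regionCount) {
        List<LightSplit> splits = new ArrayList<>(regionCount);
        for (int i = 0; i < regionCount; i++) {
            splits.add(new LightSplit("region-" + i));
        }
        return splits;
    }
}
```

With this shape, client memory for the splits grows with the number of regions times a small constant, while the large encoded scan is held only once in the conf.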



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
