[jira] [Comment Edited] (HBASE-29295) Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's split objects

Ujjawal Kumar (Jira) Thu, 08 May 2025 06:04:19 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950266#comment-17950266
 ]


Ujjawal Kumar edited comment on HBASE-29295 at 5/8/25 1:03 PM:
---------------------------------------------------------------

In our production use case, we saw (see attached screenshot "Screenshot
2025-05-08 at 6.26.15 PM.png") around 2400 input splits of
TableSnapshotInputFormat, each about 5.4 MB in size (mainly coming from the
scan's base64-encoded representation), leading to a memory footprint of 13 GB
on a client JVM with a 16 GB max heap size.
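As a quick consistency check of the figures above, the split count and per-split size account for nearly all of the reported footprint. This treats MB and GB as decimal units, which is an assumption because the units in the screenshot are not spelled out:

```latex
2400 \times 5.4\,\text{MB} = 12\,960\,\text{MB} \approx 13\,\text{GB},
\qquad \frac{13\,\text{GB}}{16\,\text{GB}} \approx 81\%\ \text{of the max heap}
```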


was (Author: ukumar):
In our production use case, we saw around 2400 input splits of
TableSnapshotInputFormat, each about 5.4 MB in size (mainly coming from the
scan's base64-encoded representation), leading to a memory footprint of 13 GB
on a client JVM with a 16 GB max heap size.

> Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's 
> split objects
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-29295
>                 URL: https://issues.apache.org/jira/browse/HBASE-29295
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, snapshots
>    Affects Versions: 2.5.10
>            Reporter: Ujjawal Kumar
>            Assignee: Ujjawal Kumar
>            Priority: Major
>         Attachments: Screenshot 2025-05-08 at 6.26.15 PM.png
>
>
> Similar to HBASE-24859, we have seen that while performing reads via a
> snapshot in an MR job, memory consumption on the client side increases
> significantly. It happens for the same reason described in HBASE-24859: the
> scan is embedded within TableSnapshotInputFormat's split object, which can
> explode the client's memory usage for tables with a large number of regions.
> The solution would be the same as in the other Jira: don't store the scan
> object in the split; instead, read it via the conf while initializing the
> record reader.
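The proposed change can be illustrated with a simplified, self-contained model. Everything in this sketch is hypothetical and stands in for the real code. That includes the classes `FatSplit` and `LeanSplit`, the `SCAN_KEY` config key, and the sizes. None of it is HBase's actual `TableSnapshotInputFormat` split API. `FatSplit` base64-encodes and keeps its own copy of the scan. `LeanSplit` carries only a region name and looks the scan up in a shared configuration map. That lookup is the step that would happen during record-reader initialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model; these are NOT HBase's real split classes. */
public class SplitSizeSketch {

  /** Hypothetical conf key standing in for wherever the job keeps its serialized scan. */
  static final String SCAN_KEY = "sketch.mapreduce.scan";

  abstract static class SketchSplit {
    final String regionName;

    SketchSplit(String regionName) {
      this.regionName = regionName;
    }

    abstract void write(DataOutputStream out) throws IOException;

    /** Serializes into a throwaway buffer and reports the byte count. */
    int serializedSize() {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(buf)) {
        write(out);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return buf.size();
    }
  }

  /** Before: every split base64-encodes and keeps its own copy of the scan. */
  static final class FatSplit extends SketchSplit {
    final String base64Scan;

    FatSplit(String regionName, byte[] rawScan) {
      super(regionName);
      this.base64Scan = Base64.getEncoder().encodeToString(rawScan);
    }

    @Override
    void write(DataOutputStream out) throws IOException {
      out.writeUTF(regionName);
      // writeUTF is capped at 65535 bytes, so the (large) scan goes out as length + bytes.
      byte[] scan = base64Scan.getBytes(StandardCharsets.US_ASCII);
      out.writeInt(scan.length);
      out.write(scan);
    }
  }

  /** After: the split only names its region; the scan is read from the conf at reader init. */
  static final class LeanSplit extends SketchSplit {
    LeanSplit(String regionName) {
      super(regionName);
    }

    @Override
    void write(DataOutputStream out) throws IOException {
      out.writeUTF(regionName);
    }

    /** Mimics record-reader initialization: fetch the one shared scan from the conf. */
    String resolveScan(Map<String, String> conf) {
      String scan = conf.get(SCAN_KEY);
      if (scan == null) {
        throw new IllegalStateException("no scan in conf under " + SCAN_KEY);
      }
      return scan;
    }
  }

  public static void main(String[] args) {
    byte[] rawScan = new byte[3_000]; // illustrative size, far smaller than the reported scans
    Map<String, String> conf = new HashMap<>();
    conf.put(SCAN_KEY, Base64.getEncoder().encodeToString(rawScan)); // stored once per job

    long fatBytes = 0;
    long leanBytes = 0;
    for (int i = 0; i < 2_400; i++) { // split count taken from the report above
      fatBytes += new FatSplit("region-" + i, rawScan).serializedSize();
      leanBytes += new LeanSplit("region-" + i).serializedSize();
    }
    System.out.println("scan embedded per split: " + fatBytes + " bytes");
    System.out.println("scan kept in conf:       " + leanBytes + " bytes");
    System.out.println("reader sees same scan:   "
        + new LeanSplit("region-0").resolveScan(conf).equals(conf.get(SCAN_KEY)));
  }
}
```

In this model, the embedded variant's total size grows with (number of splits) × (scan size). The conf-based variant pays for the scan once per job, and each split only costs its region identifier.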



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
