[ https://issues.apache.org/jira/browse/HDFS-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610179#comment-15610179 ]
Arpit Agarwal commented on HDFS-11047:
--------------------------------------

Nice catch [~xiaobingo]. Thanks for reporting and fixing this. I agree with [~jpallas] that we can simply change the behavior of {{getFinalizedBlocks}}, since it is a private interface. We can document the requirement that callers of {{getFinalizedBlocks}} first acquire the dataset lock via {{FsDatasetSpi#acquireDatasetLock}}.

In addition to the deep copy, there is an apparently unnecessary list-to-array conversion that you removed. I wasn't able to follow the source history past 2011 to see why it was introduced. In any case, I can't think of any reason to retain it.

> Remove deep copies of FinalizedReplica to alleviate heap consumption on
> DataNode
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-11047
>                 URL: https://issues.apache.org/jira/browse/HDFS-11047
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, fs
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>       Attachments: HDFS-11047.000.patch
>
>
> DirectoryScanner does its scan by deep-copying each FinalizedReplica. In a deployment with 500,000+ blocks, we've seen DN heap usage climb to high peaks very quickly. Deep copies of FinalizedReplica make DN heap usage even worse if directory scans are scheduled more frequently. This proposes removing the unnecessary deep copies, since DirectoryScanner#scan already holds the dataset lock.
> DirectoryScanner#scan
> {code}
>     try (AutoCloseableLock lock = dataset.acquireDatasetLock()) {
>       for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
>         String bpid = entry.getKey();
>         ScanInfo[] blockpoolReport = entry.getValue();
>
>         Stats statsRecord = new Stats(bpid);
>         stats.put(bpid, statsRecord);
>         LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
>         diffs.put(bpid, diffRecord);
>
>         statsRecord.totalBlocks = blockpoolReport.length;
>         List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid); /* deep copies here */
> {code}
> FsDatasetImpl#getFinalizedBlocks
> {code}
>   public List<ReplicaInfo> getFinalizedBlocks(String bpid) {
>     try (AutoCloseableLock lock = datasetLock.acquire()) {
>       ArrayList<ReplicaInfo> finalized =
>           new ArrayList<ReplicaInfo>(volumeMap.size(bpid));
>       for (ReplicaInfo b : volumeMap.replicas(bpid)) {
>         if (b.getState() == ReplicaState.FINALIZED) {
>           finalized.add(new ReplicaBuilder(ReplicaState.FINALIZED)
>               .from(b).build()); /* deep copies here */
>         }
>       }
>       return finalized;
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
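A minimal, self-contained sketch of the change under discussion. The {{Replica}} and {{Dataset}} classes below are hypothetical stand-ins, not the real HDFS {{ReplicaInfo}} and {{FsDatasetImpl}} types; the point is the semantic difference: the old method allocates a fresh replica object per block on every call, while the new one returns references to the existing objects, which is only safe while the caller holds the dataset lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for ReplicaInfo; illustrative only.
class Replica {
    final long blockId;
    Replica(long blockId) { this.blockId = blockId; }
}

// Hypothetical stand-in for FsDatasetImpl; illustrative only.
class Dataset {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final List<Replica> volumeMap = new ArrayList<>();

    Dataset(int blocks) {
        for (long i = 0; i < blocks; i++) volumeMap.add(new Replica(i));
    }

    ReentrantLock getDatasetLock() { return datasetLock; }

    // Old behavior: a fresh Replica is allocated per block on every call,
    // so each scan temporarily doubles the replica objects on the heap.
    List<Replica> getFinalizedBlocksDeepCopy() {
        datasetLock.lock();
        try {
            List<Replica> out = new ArrayList<>(volumeMap.size());
            for (Replica r : volumeMap) out.add(new Replica(r.blockId));
            return out;
        } finally {
            datasetLock.unlock();
        }
    }

    // New behavior: copy only the list, not the replicas. The caller must
    // already hold the dataset lock and keep it while using the result.
    List<Replica> getFinalizedBlocks() {
        return new ArrayList<>(volumeMap);
    }
}
```

A caller of the reference-returning variant would wrap its use of the list in {{getDatasetLock().lock()}} / {{unlock()}}, mirroring how {{DirectoryScanner#scan}} already holds the lock around {{getFinalizedBlocks}}.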