[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549321#comment-15549321 ]
Sean Mackrory commented on HDFS-10797:
--------------------------------------

Thanks, [~xiaochen]. Except as noted below, I'll incorporate all your feedback into another patch...

{quote}I don't think that will be a critical path to impact du performance{quote}

Yeah - not sure if anything performance critical depends on du, but I would think correctness of the final result is far more important here anyway.

{quote}In nodeIncluded, we safeguard includedNodes in a synchronized block, but we also provide a getIncludedNodes method, which could potentially be updated by the caller. No real usage yet, but I just feel this a bit unsafe in general, maybe return a clone of it instead?{quote}

So my concern was not that the contents of the HashSet instance might change, but that the reference 'counts' temporarily points to a different object entirely when tallying the deleted, snapshotted INodes. Rather than protecting the data structures, it ensures no one can call getCounts() while counts would point to the wrong object.

Beyond that, I think it's just as likely that threads calling getCounts in parallel will need their changes to propagate to the rest of the program, meaning the correct solution would be a thread-safe data structure rather than a clone. So I do think it's best to leave it as is until there is a use case for other concurrent accesses.
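As a rough illustration of the locking concern described above, here is a minimal sketch. It is not the actual patch: the class SummaryContext, the method tallyDeletedSnapshotted, and the use of a plain Map for the counts are hypothetical stand-ins chosen for the example. It only shows the idea that the lock guards the window in which 'counts' points at a different object, rather than guarding the contents of the collection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the summary context; names are assumptions, not HDFS code.
class SummaryContext {
    // Reference that normally points at the tally for live files.
    private Map<String, Long> counts = new HashMap<>();
    // Separate tally for deleted, snapshotted INodes.
    private final Map<String, Long> snapshotCounts = new HashMap<>();

    // Synchronized so a caller on another thread cannot observe 'counts'
    // while it temporarily points at snapshotCounts.
    synchronized Map<String, Long> getCounts() {
        return counts;
    }

    synchronized Map<String, Long> getSnapshotCounts() {
        return snapshotCounts;
    }

    // Adds to whichever map 'counts' currently refers to.
    synchronized void add(String key, long delta) {
        counts.merge(key, delta, Long::sum);
    }

    // Redirects 'counts' to the snapshot tally for the duration of the
    // callback, then restores it. Java monitors are reentrant, so the
    // callback may call add() on the same thread without deadlocking.
    synchronized void tallyDeletedSnapshotted(Runnable tally) {
        Map<String, Long> saved = counts;
        counts = snapshotCounts;
        try {
            tally.run();
        } finally {
            counts = saved;
        }
    }
}
```

Note that getCounts() still hands out the live map, so this sketch does nothing to protect the contents from concurrent modification by callers; that would need a thread-safe data structure, as discussed above.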
> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10797
>                 URL: https://issues.apache.org/jira/browse/HDFS-10797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch,
> HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch,
> HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch,
> HDFS-10797.009.patch
>
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how
> much disk usage is used by a snapshot by tallying up the files in the
> snapshot that have since been deleted (that way it won't overlap with regular
> files whose disk usage is computed separately). However that is determined
> from a diff that shows moved (to Trash or otherwise) or renamed files as a
> deletion and a creation operation that may overlap with the list of blocks.
> Only the deletion operation is taken into consideration, and this causes
> those blocks to get represented twice in the disk usage tallying.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)