[ 
https://issues.apache.org/jira/browse/HDFS-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238670#comment-16238670
 ] 

Tsz Wo Nicholas Sze commented on HDFS-12594:
--------------------------------------------

Some other comments on the patch.

- Since there is already a 
"dfs.namenode.snapshotdiff.allow.snap-root-descendant", rename 
"dfs.snapshotdiff-report.limit" to "dfs.namenode.snapshotdiff.listing.limit" 
and move it next to DFS_NAMENODE_SNAPSHOT_DIFF_ALLOW_SNAP_ROOT_DESCENDANT.
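
For illustration, a minimal sketch of how the renamed key could sit beside the existing one. The class name SnapshotDiffConfigKeys and the constant name DFS_NAMENODE_SNAPSHOT_DIFF_LISTING_LIMIT are assumptions made for this sketch, not taken from the patch; only the two property strings and the existing constant name come from the comment above.

```java
// Hypothetical stand-in for the real config-keys class; only the two
// property strings and the first constant name appear in the review.
final class SnapshotDiffConfigKeys {
  private SnapshotDiffConfigKeys() {}

  // Existing key named in the review comment.
  static final String DFS_NAMENODE_SNAPSHOT_DIFF_ALLOW_SNAP_ROOT_DESCENDANT =
      "dfs.namenode.snapshotdiff.allow.snap-root-descendant";

  // Renamed key, placed next to the one above (constant name is assumed).
  static final String DFS_NAMENODE_SNAPSHOT_DIFF_LISTING_LIMIT =
      "dfs.namenode.snapshotdiff.listing.limit";
}
```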

- Use int for index and snapshotDiffReportLimit instead of Integer.  Use long 
instead of Long, boolean instead of Boolean, etc.
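
A small standalone sketch of why the primitive types are preferable here. BoxingDemo and its methods are hypothetical names used only for this illustration and do not appear in the patch.

```java
// Hypothetical demo class, not part of the patch.
final class BoxingDemo {
  private BoxingDemo() {}

  // Boxed index and accumulator: each iteration unboxes, adds, and re-boxes.
  static long sumBoxed(int n) {
    Long sum = 0L;
    for (Integer i = 0; i < n; i++) {
      sum += i;
    }
    return sum;
  }

  // Primitive index and accumulator: same result, no wrapper objects.
  static long sumPrimitive(int n) {
    long sum = 0L;
    for (int i = 0; i < n; i++) {
      sum += i;
    }
    return sum;
  }

  // Auto-unboxing a null Integer throws NullPointerException, a failure
  // mode that cannot occur when the field is declared as int.
  static int unbox(Integer value) {
    return value;
  }
}
```

Both sum methods return the same value; the boxed variant only adds allocation and unboxing work, plus the possibility of a null.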

- SnapshotDiffReportGenerator should be moved to the 
org.apache.hadoop.hdfs.client.impl package.

- Use byte[][] in SnapshotDiffReportListing for sourcePath and targetPath
-* bytes2String and string2Bytes are expensive, please avoid calling them.
{code}
    public byte[] getParent() {
      if (sourcePath == null || DFSUtilClient.bytes2String(sourcePath)
          .isEmpty()) {
        return null;
      } else {
        Path path = new Path(DFSUtilClient.bytes2String(sourcePath));
        return DFSUtilClient.string2Bytes(path.getParent().toString());
      }
    }
{code}
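
One possible shape for getParent once sourcePath is a byte[][], sketched under assumptions: each array element is taken to be one path component relative to the snapshot root, and an empty array is taken to mean the root itself. PathComponents is a hypothetical holder class, not the class in the patch.

```java
import java.util.Arrays;

// Hypothetical helper, not the class from the patch.
final class PathComponents {
  private PathComponents() {}

  // Returns the parent as path components, or null when there is no parent.
  // Assumes one array element per path component; no String conversion
  // (bytes2String / string2Bytes) is needed.
  static byte[][] getParent(byte[][] sourcePath) {
    if (sourcePath == null || sourcePath.length == 0) {
      return null;
    }
    // Shallow copy of every component except the last one.
    return Arrays.copyOf(sourcePath, sourcePath.length - 1);
  }
}
```

With this representation the parent is obtained by dropping the last component, so the cost is a single shallow array copy.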

- In DistributedFileSystem.getSnapshotDiffReportInternal,
-* deltetedList should be deletedList
-* remove snapDiffReport, just return snapshotDiffReport.generateReport();

I have not finished reviewing the entire patch yet.  Will continue.


> SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC 
> response limit
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12594
>                 URL: https://issues.apache.org/jira/browse/HDFS-12594
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDFS-12594.001.patch, HDFS-12594.002.patch, 
> HDFS-12594.003.patch, HDFS-12594.004.patch, SnapshotDiff_Improvemnets .pdf
>
>
> The snapshotDiff command fails if the snapshotDiff report size is larger than 
> the configuration value of ipc.maximum.response.length which is by default 
> 128 MB. 
> In the worst case, where every operation in the snapshots is a rename whose 
> source and target names are both MAX_PATH_LEN (8k characters) long, each 
> entry carries about 16 KB of path data, so the 128 MB limit is reached at 
> roughly 8192 renames.
>  
> SnapshotDiff is currently used by distcp to optimize copy operations, and 
> when the diff report exceeds the limit, it fails with the exception below:
> Test set: 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.095 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> testDiffReportWithMillionFiles(org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport)
>   Time elapsed: 111.906 sec  <<< ERROR!
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "hw15685.local/10.200.5.230"; destination host 
> is: "localhost":59808;
> Attached is the proposal for the changes required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

