[ 
https://issues.apache.org/jira/browse/HDFS-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216518#comment-16216518
 ] 

Ewan Higgs commented on HDFS-12594:
-----------------------------------

Some minor things on a first pass:

{code}
+      if (getLastIndex() != -1) {
+        setLastIndex(-1);
+      }
{code}
Why not just set it?

I think the basic design is a good approach, but it would be nicer to 
restructure it by acknowledging that we're making a cursor/iterator here. So 
the report request/response, which currently look as follows: 

{code}
message GetSnapshotDiffReportListingRequestProto {
  required string snapshotRoot = 1;
  required string fromSnapshot = 2;
  required string toSnapshot = 3;
  required string startPath = 4;
  required int32 index = 5 [default = -1];
}
// ...

message SnapshotDiffReportListingProto {
  // full path of the directory where snapshots were taken
  repeated SnapshotDiffReportListingEntryProto modifiedEntries = 1;
  repeated SnapshotDiffReportListingEntryProto createdEntries = 2;
  repeated SnapshotDiffReportListingEntryProto deletedEntries = 3;
  required bytes startPath = 4;
  required int32 index = 5 [default = -1];
  required bool isFromEarlier = 6;
}
{code}

... could be: 

{code}

message SnapshotDiffReportCursorProto {
  required string startPath = 1;
  required int32 index = 2 [default = -1];
}

message GetSnapshotDiffReportListingRequestProto {
  required string snapshotRoot = 1;
  required string fromSnapshot = 2;
  required string toSnapshot = 3;
  optional SnapshotDiffReportCursorProto cursor = 4;
}

// ...

message SnapshotDiffReportListingProto {
  // full path of the directory where snapshots were taken
  repeated SnapshotDiffReportListingEntryProto modifiedEntries = 1;
  repeated SnapshotDiffReportListingEntryProto createdEntries = 2;
  repeated SnapshotDiffReportListingEntryProto deletedEntries = 3;
  required bool isFromEarlier = 4;
  optional SnapshotDiffReportCursorProto cursor = 5;
}
{code}

Making a request with no cursor starts at the beginning.
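
To illustrate what I mean, here is a rough, self-contained sketch of how a client loop could look with an optional cursor. This is not code from the patch: Cursor, Page, FakeNameNode and fetchAll are hypothetical stand-ins for the proto messages and the RPC, and the toy server simply pages through an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SnapshotDiffCursorSketch {

  /** Hypothetical stand-in for SnapshotDiffReportCursorProto. */
  static final class Cursor {
    final String startPath;
    final int index;

    Cursor(String startPath, int index) {
      this.startPath = startPath;
      this.index = index;
    }
  }

  /** Hypothetical stand-in for one listing response: entries plus an optional cursor. */
  static final class Page {
    final List<String> entries;
    final Optional<Cursor> next;

    Page(List<String> entries, Optional<Cursor> next) {
      this.entries = entries;
      this.next = next;
    }
  }

  /** Toy server (an assumption for illustration) that returns at most pageSize entries per call. */
  static final class FakeNameNode {
    private final List<String> allEntries;
    private final int pageSize;

    FakeNameNode(List<String> allEntries, int pageSize) {
      this.allEntries = allEntries;
      this.pageSize = pageSize;
    }

    Page getListing(Optional<Cursor> cursor) {
      // No cursor means "start at the beginning".
      int from = cursor.map(c -> c.index).orElse(0);
      int to = Math.min(from + pageSize, allEntries.size());
      List<String> chunk = new ArrayList<>(allEntries.subList(from, to));
      // Only hand back a cursor if there is more to read.
      Optional<Cursor> next = to < allEntries.size()
          ? Optional.of(new Cursor(allEntries.get(to), to))
          : Optional.empty();
      return new Page(chunk, next);
    }
  }

  /** Client loop: keep asking until the response carries no cursor. */
  static List<String> fetchAll(FakeNameNode nn) {
    List<String> out = new ArrayList<>();
    Optional<Cursor> cursor = Optional.empty();
    do {
      Page page = nn.getListing(cursor);
      out.addAll(page.entries);
      cursor = page.next;
    } while (cursor.isPresent());
    return out;
  }

  public static void main(String[] args) {
    FakeNameNode nn = new FakeNameNode(Arrays.asList("/a", "/b", "/c", "/d", "/e"), 2);
    System.out.println(fetchAll(nn));
  }
}
```

The point is that the client never has to interpret startPath or index itself; it just passes back whatever cursor the last response carried, and an absent cursor means "done" on the way back and "start from the beginning" on the way in.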

> SnapshotDiff - snapshotDiff fails if the snapshotDiff report exceeds the RPC 
> response limit
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12594
>                 URL: https://issues.apache.org/jira/browse/HDFS-12594
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>         Attachments: HDFS-12594.001.patch, HDFS-12594.002.patch, 
> HDFS-12594.003.patch, SnapshotDiff_Improvemnets .pdf
>
>
> The snapshotDiff command fails if the snapshotDiff report size is larger than 
> the configuration value of ipc.maximum.response.length which is by default 
> 128 MB. 
> In the worst case, where the snapshots contain only rename ops, each with a 
> source and target name equal to MAX_PATH_LEN (8k characters), the limit 
> would be reached at roughly 8192 renames.
>  
> SnapshotDiff is currently used by distcp to optimize copy operations, and if 
> the diff report exceeds the limit, it fails with the exception below:
> Test set: 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.095 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
> testDiffReportWithMillionFiles(org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport)
>   Time elapsed: 111.906 sec  <<< ERROR!
> java.io.IOException: Failed on local exception: 
> org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; 
> Host Details : local host is: "hw15685.local/10.200.5.230"; destination host 
> is: "localhost":59808;
> Attached is the proposal for the changes required.



