Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118
PS2, Line 118:   PartitionMetadataPB partitions = 8;
> Skipped 8?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163
PS2, Line 163:  * Node class to represent nodes in the backup graph.
> Curious why you went with the noun "vertex"; isn't "node" the more commonly
No specific reason, will change to Node.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35
PS2, Line 35:  */
> Nit: Javadoc not quite formatted properly?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58
PS2, Line 58:       // use the `to_ms` metadata as the `from_ms` time for this 
backup.
> Nit: too long?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144
PS2, Line 144:     val columnCount = if (incremental) fieldCount - 1 else 
fieldCount
> This function is called per-row, right? I wonder if it'd be more performant
We could assume the column count is static across the life of an RDD and 
memoize this calculation, but the existing implementation didn't do that, so I 
kept the behavior the same. The additional if check should be cheap relative to 
the other work in the method.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39
PS2, Line 39:  */
> Formatting?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54
PS2, Line 54:     // TODO: Make parallel so each table isn't processed serially.
            :     options.tables.foreach { tableName =>
            :       val graph = io.readBackupGraph(tableName).filterByTim
> Did you test what effect this has? If it may have a drastic impact on perfo
This wasn't meant to be included, I was experimenting with it. I will remove it.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Generate an operation based on the row action.
             :               val operation = rowAction match {
             :                 case RowAction.UPSERT => table.newUpsert()
> Does Scala allow returning tuples like in Python?

Yes it does

> Seems like it'd be more elegant to return the row action separately from the 
> row itself.

The Row abstraction is a result of the Spark framework. It expects Dataframes 
to work with Rows and is less flexible than the raw RDD API.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = 
InetAddress.getLocalHost.getCanonicalHostName,
> Why do these three need to be listed here and in RestoreOptions if both ext
The CommonOptions trait is specifying that BackupOptions/RestoreOptions needs 
to implement a value function for tables, rootPath, and  kuduMasterAddresses. 
But it doesn't specify "how". BackupOptions/RestoreOptions implement those 
functions via a constructor.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93
PS2, Line 93:         // TODO: Document the limitations based on cluster 
configuration
> Nit: too long?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala@59
PS2, Line 59:  */
> Could you doc at least the interesting methods here?
Done



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 18:26:47 +0000
Gerrit-HasComments: Yes

Reply via email to