Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18174

to look at the new patch set (#2).

Change subject: Adding repartitioning logic along with coalesce logic to backup 
output
......................................................................

Adding repartitioning logic along with coalesce logic to backup output

We optionally use the coalesce and repartition options in the BackUpKudu Spark
command.
Currently we have to re-apply this commit to our internal build for every
release.
We are requesting that it be merged into apache/kudu so that we no longer have
to re-apply it for every new Kudu release.

This change adds a repartition option alongside coalesce for the output files.
Both parameters are optional.
If both are defined, coalesce takes precedence over repartition.
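
A minimal sketch of the precedence rule described above. All names here (OutputPartitioning, BackupOutput.choose and its parameters) are hypothetical and chosen for illustration; this is not the code in the patch. In Spark, the two non-default choices would correspond to Dataset.coalesce(n) and Dataset.repartition(n).

```scala
// Illustrative only: hypothetical names, not the actual Kudu backup code.
sealed trait OutputPartitioning
case object KeepAsIs extends OutputPartitioning
final case class Coalesce(numPartitions: Int) extends OutputPartitioning
final case class Repartition(numPartitions: Int) extends OutputPartitioning

object BackupOutput {
  // Both settings are optional. When both are present, coalesce wins.
  def choose(
      coalesce: Option[Int],
      repartition: Option[Int]): OutputPartitioning =
    (coalesce, repartition) match {
      case (Some(n), _)    => Coalesce(n)    // coalesce takes precedence
      case (None, Some(n)) => Repartition(n) // used only when coalesce is unset
      case (None, None)    => KeepAsIs       // neither set: leave output as is
    }
}
```

Modelling the decision as a small value keeps the precedence rule testable without a Spark session; the caller would then apply the chosen operation to the DataFrame before writing.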

Testing

sudo /mnt/services/spark/bin/run-transform-cluster-mode-on report-center-batch-driver \
  --stack rcspark_envoy \
  --executor-cores 8 \
  --total-executor-cores 32 \
  --executor-memory 55g \
  --driver-memory 55g \
  --conf spark.log4j.logger.org.apache.spark=WARN \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --conf spark.speculation=false \
  --class com.twilio.backup.BackupKuduTable \
  /mnt/services/report-center-batch-indexer/appJar/spark-report-center-batch-indexer-shaded.jar \
  --kuduMasterAddresses report-center-leader-5.us1.twilio.com,report-center-leader-4.us1.twilio.com,report-center-leader-3.us1.twilio.com,report-center-leader-2.us1.twilio.com,report-center-leader-1.us1.twilio.com \
  --splitSizeBytes 1000000000 \
  --scanRequestTimeoutMs 60000000 \
  --coalesceOutputPartitions 32 \
  --rootPath s3a://com.twilio.prod.warehouse/data/report-center/kudu-table-backup/ \
  BillableItemUsageCategories
2022-01-27 08:07:01,015 - root - INFO:  TIMEOUT is None, status check interval 60, job file None and connection retry to spark REST API 5 and arguments to job ['--executor-cores', '8', '--total-executor-cores', '32', '--executor-memory', '55g', '--driver-memory', '55g', '--conf', 'spark.log4j.logger.org.apache.spark=WARN', '--conf', 'spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2', '--conf', 'spark.speculation=false', '--class', 'com.twilio.backup.BackupKuduTable', '/mnt/services/report-center-batch-indexer/appJar/spark-report-center-batch-indexer-shaded.jar', '--kuduMasterAddresses', 'report-center-leader-5.us1.twilio.com,report-center-leader-4.us1.twilio.com,report-center-leader-3.us1.twilio.com,report-center-leader-2.us1.twilio.com,report-center-leader-1.us1.twilio.com', '--splitSizeBytes', '1000000000', '--scanRequestTimeoutMs', '60000000', '--coalesceOutputPartitions', '32', '--rootPath', 's3a://com.twilio.prod.warehouse/data/report-center/kudu-table-backup/', 'BillableItemUsageCategories']
2022-01-27 08:07:01,772 - root - INFO:  Job submitted as driver-20220127080701-17528
2022-01-27 08:08:02,680 - root - INFO:  Job submission [driver-20220127080701-17528] alive with state RUNNING on worker-20220127062817-172.25.72.200-7078
2022-01-27 08:09:03,707 - root - INFO:  Job submission [driver-20220127080701-17528] completed (state: FINISHED)
Finished: SUCCESS


Change-Id: I328cb7e41bca14b7b6d73eb7721a86fb86203201
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestOptions.scala
3 files changed, 35 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/18174/2
--
To view, visit http://gerrit.cloudera.org:8080/18174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I328cb7e41bca14b7b6d73eb7721a86fb86203201
Gerrit-Change-Number: 18174
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
