Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/18174
to look at the new patch set (#4).
Change subject: Adding repartitioning logic along with coalesce logic to backup
output
......................................................................
Adding repartitioning logic along with coalesce logic to backup output
We optionally use the coalesce and repartition options in the BackupKuduTable
Spark command.
At present we have to re-apply this commit to our internal build for every
release. We request that it be merged into apache/kudu so that we no longer
need to re-apply it for each new Kudu release.
This change adds repartition logic along with coalesce logic for the output files.
Both of the above parameters are optional.
Coalesce takes precedence over repartition if both are defined.
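The precedence rule above can be sketched as a small, self-contained Scala
model. This is an illustration under stated assumptions, not the code in this
patch: the names BackupOutputPlan, choose, Coalesce, Repartition and KeepAsIs
are hypothetical, and the actual option handling lives in Options.scala and
KuduBackup.scala. In a Spark job, the chosen plan would presumably map to the
standard Dataset methods coalesce(numPartitions) or repartition(numPartitions)
before the output is written.

```scala
// Hypothetical sketch of "coalesce takes precedence over repartition".
// None of these names come from the patch itself.
object BackupOutputPlan {
  sealed trait Plan
  case object KeepAsIs extends Plan                         // neither option set
  final case class Coalesce(numPartitions: Int) extends Plan
  final case class Repartition(numPartitions: Int) extends Plan

  // Both inputs are optional; coalesce wins whenever it is defined.
  def choose(coalesce: Option[Int], repartition: Option[Int]): Plan =
    (coalesce, repartition) match {
      case (Some(n), _)    => Coalesce(n)     // coalesce set: repartition is ignored
      case (None, Some(n)) => Repartition(n)  // only repartition set
      case (None, None)    => KeepAsIs        // leave the partitioning unchanged
    }
}
```

With this model, choose(Some(32), Some(64)) yields Coalesce(32), which mirrors
the test run below where only --coalesceOutputPartitions 32 is passed.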
Testing
sudo /mnt/services/spark/bin/run-transform-cluster-mode-on
report-center-batch-driver --stack rcspark_envoy --executor-cores 8
--total-executor-cores 32 --executor-memory 55g --driver-memory 55g --conf
spark.log4j.logger.org.apache.spark=WARN --conf
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf
spark.speculation=false --class com.twilio.backup.BackupKuduTable
/mnt/services/report-center-batch-indexer/appJar/spark-report-center-batch-indexer-shaded.jar
--kuduMasterAddresses
report-center-leader-5.us1.twilio.com,report-center-leader-4.us1.twilio.com,report-center-leader-3.us1.twilio.com,report-center-leader-2.us1.twilio.com,report-center-leader-1.us1.twilio.com
--splitSizeBytes 1000000000 --scanRequestTimeoutMs 60000000
--coalesceOutputPartitions 32 --rootPath
s3a://com.twilio.prod.warehouse/data/report-center/kudu-table-backup/
BillableItemUsageCategories
2022-01-27 08:07:01,015 - root - INFO: TIMEOUT is None, status check interval
60, job file None and connection retry to spark REST API 5 and arguments to job
['--executor-cores', '8', '--total-executor-cores', '32', '--executor-memory',
'55g', '--driver-memory', '55g', '--conf',
'spark.log4j.logger.org.apache.spark=WARN', '--conf',
'spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2', '--conf',
'spark.speculation=false', '--class', 'com.twilio.backup.BackupKuduTable',
'/mnt/services/report-center-batch-indexer/appJar/spark-report-center-batch-indexer-shaded.jar',
'--kuduMasterAddresses',
'report-center-leader-5.us1.twilio.com,report-center-leader-4.us1.twilio.com,report-center-leader-3.us1.twilio.com,report-center-leader-2.us1.twilio.com,report-center-leader-1.us1.twilio.com',
'--splitSizeBytes', '1000000000', '--scanRequestTimeoutMs', '60000000',
'--coalesceOutputPartitions', '32', '--rootPath',
's3a://com.twilio.prod.warehouse/data/report-center/kudu-table-backup/',
'BillableItemUsageCategories']
2022-01-27 08:07:01,772 - root - INFO: Job submitted as
driver-20220127080701-17528
2022-01-27 08:08:02,680 - root - INFO: Job submission
[driver-20220127080701-17528] alive with state RUNNING on
worker-20220127062817-172.25.72.200-7078
2022-01-27 08:09:03,707 - root - INFO: Job submission
[driver-20220127080701-17528] completed (state: FINISHED)
Finished: SUCCESS
Change-Id: I328cb7e41bca14b7b6d73eb7721a86fb86203201
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestOptions.scala
3 files changed, 35 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/18174/4
--
To view, visit http://gerrit.cloudera.org:8080/18174
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I328cb7e41bca14b7b6d73eb7721a86fb86203201
Gerrit-Change-Number: 18174
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)