elangelo commented on code in PR #4023:
URL: https://github.com/apache/solr/pull/4023#discussion_r3195607724
##########
solr/solr-ref-guide/modules/deployment-guide/pages/backup-restore.adoc:
##########
@@ -396,6 +396,39 @@ Any children under the `<repository>` tag are passed as additional configuration
Information on each of the repository implementations provided with Solr is provided below.
+=== Parallel File Transfers
+
+Backup and restore operations can transfer multiple index files in parallel to improve throughput, especially when using cloud storage repositories like S3 or GCS where latency is higher.
+The parallelism is controlled via system properties or environment variables:
+
+`solr.backup.maxparalleluploads`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `1`
+|===
++
+Maximum number of index files to upload in parallel during backup operations.
+Can also be set via the `SOLR_BACKUP_MAXPARALLELUPLOADS` environment variable.
+For cloud storage repositories (S3, GCS), consider setting this to `8` or higher to improve backup performance.
Review Comment:
@mlbiscoc fixed this in the docs.
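
For reference, a minimal sketch of how the documented setting might be applied. The variable name and the value `8` come from the doc text above; where the line is placed (for example `solr.in.sh`) and the use of `SOLR_OPTS` for the system-property form are assumptions about a typical installation, not something this PR specifies.

```shell
# Assumed location: solr.in.sh or the service environment, set before Solr starts.
# Raises upload parallelism for backups from the default of 1 to 8.
export SOLR_BACKUP_MAXPARALLELUPLOADS=8

# Alternative (assumption): pass the equivalent JVM system property instead.
# SOLR_OPTS="$SOLR_OPTS -Dsolr.backup.maxparalleluploads=8"
```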
##########
solr/core/src/java/org/apache/solr/handler/IncrementalShardBackup.java:
##########
@@ -191,55 +213,154 @@ private BackupStats incrementalCopy(Collection<String> indexFiles, Directory dir
URI indexDir = incBackupFiles.getIndexDir();
BackupStats backupStats = new BackupStats();
- for (String fileName : indexFiles) {
- Optional<ShardBackupMetadata.BackedFile> opBackedFile = oldBackupPoint.getFile(fileName);
- Checksum originalFileCS = backupRepo.checksum(dir, fileName);
-
- if (opBackedFile.isPresent()) {
- ShardBackupMetadata.BackedFile backedFile = opBackedFile.get();
- Checksum existedFileCS = backedFile.fileChecksum;
- if (existedFileCS.equals(originalFileCS)) {
- currentBackupPoint.addBackedFile(opBackedFile.get());
- backupStats.skippedUploadingFile(existedFileCS);
- continue;
+ // Only use an executor for parallel uploads when parallelism > 1
+ // When set to 1, run synchronously to avoid thread-local state issues with CallerRunsPolicy
+ int maxParallelUploads = DEFAULT_MAX_PARALLEL_UPLOADS;
+ ExecutorService executor =
+ maxParallelUploads > 1
+ ? new ExecutorUtil.MDCAwareThreadPoolExecutor(
+ 0,
+ maxParallelUploads,
+ 60L,
+ TimeUnit.SECONDS,
+ new SynchronousQueue<>(),
+ new SolrNamedThreadFactory("IncrementalBackup"),
Review Comment:
fixed
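
For readers following the hunk above, here is a rough, standalone sketch of the pattern it appears to use: a synchronous loop when parallelism is 1, otherwise a bounded pool backed by a `SynchronousQueue`. It uses the plain JDK `ThreadPoolExecutor` rather than Solr's `ExecutorUtil.MDCAwareThreadPoolExecutor`. The `CallerRunsPolicy` handler is an assumption taken from the code comment, since the quoted constructor call is cut off. `ParallelUploadSketch`, `uploadAll`, and `upload` are hypothetical names, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelUploadSketch {

  /** Hypothetical stand-in for copying one index file to the backup repository. */
  static void upload(String fileName, AtomicInteger uploaded) {
    uploaded.incrementAndGet();
  }

  /** Uploads all files and returns how many were handled. */
  static int uploadAll(List<String> files, int maxParallelUploads)
      throws InterruptedException, ExecutionException {
    AtomicInteger uploaded = new AtomicInteger();

    // Parallelism of 1: stay on the calling thread and create no executor.
    if (maxParallelUploads <= 1) {
      for (String file : files) {
        upload(file, uploaded);
      }
      return uploaded.get();
    }

    // SynchronousQueue hands each task directly to a worker thread. Once
    // maxParallelUploads workers are busy, the (assumed) CallerRunsPolicy
    // runs the task on the submitting thread, which throttles submission.
    ExecutorService executor =
        new ThreadPoolExecutor(
            0,
            maxParallelUploads,
            60L,
            TimeUnit.SECONDS,
            new SynchronousQueue<>(),
            new ThreadPoolExecutor.CallerRunsPolicy());
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String file : files) {
        futures.add(executor.submit(() -> upload(file, uploaded)));
      }
      // Wait for every upload; get() rethrows any failure from a task.
      for (Future<?> future : futures) {
        future.get();
      }
    } finally {
      executor.shutdown();
    }
    return uploaded.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(uploadAll(List.of("_0.cfs", "_0.si", "segments_1"), 4));
  }
}
```

Because a task rejected under `CallerRunsPolicy` executes on the submitting thread, any thread-local state there is visible to the task, which is presumably why the PR avoids the executor entirely when parallelism is 1.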