[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889599#comment-16889599 ]
Mikhail Khludnev commented on SOLR-9961:
----------------------------------------

Linking a bunch of Jira issues suggesting that {{fs.hdfs.impl.disable.cache=true}} is our everything, which is hard for me to believe.

> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
>                 Key: SOLR-9961
>                 URL: https://issues.apache.org/jira/browse/SOLR-9961
>             Project: Solr
>          Issue Type: Improvement
>          Components: Backup/Restore
>   Affects Versions: 6.2.1
>            Reporter: Timothy Potter
>            Priority: Major
>         Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch,
> SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google Cloud Storage in this case, but I think
> this is a general problem) takes 8 minutes ... the restore of the same core
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me
> to parallelize the expensive part of this operation (the IO from the remote
> cloud storage service). We need the option to parallelize the download (like
> distcp).
> Also, I tried downloading the same directory using gsutil and it was very
> fast, about 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to
> consider a two-step approach: 1) download in parallel to a temp dir, 2)
> perform all of the checksum validation against the local temp dir. That
> will save round trips to the remote cloud storage.
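
For illustration only, the two-step approach from the issue description (parallel download to a temp dir, then checksum validation against the local copies) could be sketched roughly as follows. This is not the attached patch and does not use Solr's RestoreCore or backup repository API: the class name, the restore and crc32 helpers, the use of CRC32, and a plain local directory standing in for the remote store are all assumptions made for the sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

// Hypothetical sketch; none of these names come from Solr.
public class ParallelRestoreSketch {

    // Hypothetical helper: CRC32 of a local file, standing in for whatever
    // checksum a real restore would validate.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    // sourceDir stands in for the remote backup location (assumption).
    // expected maps each file name to the checksum recorded at backup time.
    static void restore(Path sourceDir, Path tempDir, Map<String, Long> expected, int threads)
            throws IOException, InterruptedException {
        // Step 1: copy all files into the temp dir in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> copies = new ArrayList<>();
            for (String name : expected.keySet()) {
                copies.add(pool.submit(() -> {
                    Files.copy(sourceDir.resolve(name), tempDir.resolve(name),
                            StandardCopyOption.REPLACE_EXISTING);
                    return null; // makes the lambda a Callable, so IOException may propagate
                }));
            }
            // Wait for every copy; surface the first failure as an IOException.
            for (Future<?> copy : copies) {
                try {
                    copy.get();
                } catch (ExecutionException e) {
                    throw new IOException("download failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        // Step 2: validate checksums against the local copies only,
        // so no further round trips to the remote store are needed.
        for (Map.Entry<String, Long> entry : expected.entrySet()) {
            long actual = crc32(tempDir.resolve(entry.getKey()));
            if (actual != entry.getValue()) {
                throw new IOException("checksum mismatch: " + entry.getKey());
            }
        }
    }
}
```

The thread count is left as a parameter because the right degree of parallelism presumably depends on the storage backend and on available bandwidth.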