[ 
https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822297#comment-15822297
 ] 

Hrishikesh Gadre commented on SOLR-9961:
----------------------------------------

[~thelabdude] I think this is a great improvement! A couple of comments:

bq. But as stated in the description, this now causes the various FileSystem 
already closed issue, so would need to be used with hdfs cache disabled.

I think the root cause of this problem is that HdfsDirectory uses the 
FileSystem.get(...) API, which returns a shared, cached instance. If we change 
that to FileSystem.newInstance(...), which returns a private instance, the 
problem will most likely go away. I think this would be a better solution than 
disabling HDFS caching. [~markrmil...@gmail.com] any thoughts?
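To illustrate the hazard: a rough self-contained sketch of the caching behavior, where the HandleCache class below is only a toy stand-in for Hadoop's internal FileSystem cache (not actual Hadoop code). FileSystem.get(...) hands every caller with the same key the same cached object, so one caller's close() invalidates the handle for everyone else, while FileSystem.newInstance(...) hands out a private instance each time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a FileSystem handle.
class Handle {
    private volatile boolean closed = false;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

// Toy stand-in for Hadoop's FileSystem cache, keyed by URI.
class HandleCache {
    private static final Map<String, Handle> CACHE = new ConcurrentHashMap<>();

    // Mirrors FileSystem.get(...): all callers with the same key share ONE object.
    static Handle get(String uri) {
        return CACHE.computeIfAbsent(uri, k -> new Handle());
    }

    // Mirrors FileSystem.newInstance(...): a fresh, private object every call.
    static Handle newInstance(String uri) {
        return new Handle();
    }
}

public class CacheDemo {
    public static void main(String[] args) {
        Handle a = HandleCache.get("hdfs://nn:8020");
        Handle b = HandleCache.get("hdfs://nn:8020"); // same cached object as 'a'
        a.close();
        // 'b' is now unusable too -- the "FileSystem already closed" symptom.
        System.out.println("cached handle closed for other user: " + b.isClosed());

        Handle c = HandleCache.newInstance("hdfs://nn:8020");
        Handle d = HandleCache.newInstance("hdfs://nn:8020");
        c.close();
        System.out.println("private handle unaffected: " + !d.isClosed());
    }
}
```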

bq. adds an option for BackupRepository implementations to download in parallel 
using a thread pool.

It seems a bit odd to add this configuration to the BackupRepository interface. 
If we can ensure that all BackupRepository implementations support concurrent 
copy operations, then we can make the thread-pool and timeout configurations 
global. For this to be feasible, each BackupRepository implementation just 
needs to make sure that the client state is kept separate for each copy 
operation (which I think is doable).

The other approach could be to add another API to the BackupRepository 
interface which accepts a list of files to be copied. The implementation of 
this API can choose multi-threaded (or sequential) execution. This could even 
benefit the backup operation. What do you think?
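A rough sketch of what that bulk API could look like. All names here (BulkCopyRepository, copyFile, copyFiles) are hypothetical, not the real BackupRepository API, and the in-memory maps stand in for remote/local storage: the interface gains a default sequential copyFiles(...), so existing implementations need no changes, while an implementation may override it to copy with a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical bulk-copy abstraction standing in for BackupRepository.
interface BulkCopyRepository {
    void copyFile(String source, String dest) throws Exception;

    // Proposed bulk API: sequential by default, so implementations that
    // cannot copy concurrently work unchanged.
    default void copyFiles(List<String> sources, List<String> dests) throws Exception {
        for (int i = 0; i < sources.size(); i++) {
            copyFile(sources.get(i), dests.get(i));
        }
    }
}

// An implementation that overrides the bulk API with a thread pool.
class InMemoryParallelRepository implements BulkCopyRepository {
    final Map<String, byte[]> remote = new ConcurrentHashMap<>();
    final Map<String, byte[]> local = new ConcurrentHashMap<>();

    @Override
    public void copyFile(String source, String dest) {
        local.put(dest, remote.get(source)); // real code would stream bytes
    }

    @Override
    public void copyFiles(List<String> sources, List<String> dests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                final int idx = i;
                pending.add(pool.submit(() -> copyFile(sources.get(idx), dests.get(idx))));
            }
            for (Future<?> f : pending) {
                f.get(60, TimeUnit.SECONDS); // surfaces the first copy failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```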

Also, as a minor comment: did you consider using a CompletionService to fetch 
the results of completed tasks? It seems a bit cleaner.
   


> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
>                 Key: SOLR-9961
>                 URL: https://issues.apache.org/jira/browse/SOLR-9961
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 6.2.1
>            Reporter: Timothy Potter
>         Attachments: SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think 
> this is a general problem) takes 8 minutes ... the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
