[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822190#comment-15822190 ]
Timothy Potter commented on SOLR-9961:
--------------------------------------
The other thing I found here is that HdfsDirectory is closing a shared
FileSystem object, because HdfsBackupRepository uses try-with-resources:
{code}
@Override
public void copyFileTo(URI sourceRepo, String fileName, Directory dest)
    throws IOException {
  try (HdfsDirectory dir = new HdfsDirectory(new Path(sourceRepo),
      NoLockFactory.INSTANCE,
      hdfsConfig, HdfsDirectory.DEFAULT_BUFFER_SIZE * 10)) {
    dest.copyFrom(dir, fileName, fileName,
        DirectoryFactory.IOCONTEXT_NO_CACHE);
  }
}
{code}
This closes the FileSystem object that was retrieved with FileSystem.get,
which by default returns a cached instance shared by every caller.
Because of this (I think), I'm seeing lots of errors like the following
during the restore:
{code}
WARN - 2017-01-13 14:09:44.249; [ ] org.apache.solr.handler.RestoreCore; Exception while restoring the backup index
java.lang.RuntimeException: Problem creating directory: gs://hd-fusion/aggr_solr/myAggr3/snapshot.shard1
    at org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:91)
    at org.apache.solr.core.backup.repository.HdfsBackupRepository.copyFileTo(HdfsBackupRepository.java:175)
    at org.apache.solr.handler.RestoreCore.downloadFile(RestoreCore.java:196)
    at org.apache.solr.handler.RestoreCore.access$000(RestoreCore.java:47)
    at org.apache.solr.handler.RestoreCore$1.call(RestoreCore.java:101)
    at org.apache.solr.handler.RestoreCore$1.call(RestoreCore.java:99)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.checkOpen(GoogleHadoopFileSystemBase.java:1802)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1284)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:83)
    ... 9 more
{code}
There's a handy property that disables the FileSystem cache for the gs
scheme (add it to core-site.xml), which makes this error go away:
{code}
<property>
  <name>fs.gs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}
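To make the failure mode concrete, here is a minimal sketch that reproduces it without Hadoop on the classpath. Everything in it is a hypothetical stand-in: FakeFileSystem and FakeFileSystemCache are not Hadoop classes, they only imitate the behaviour described above (a cache that hands the same instance to every caller, and a flag that plays the role of the disable.cache property).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SharedCloseDemo {
    // Imitates copyFileTo: obtain a file system, use it, and close it
    // through try-with-resources. Returns false if the instance was
    // already closed by an earlier caller.
    static boolean copyOnce(FakeFileSystemCache cache, boolean disableCache) {
        try (FakeFileSystem fs = cache.get("gs", disableCache)) {
            return fs.exists("snapshot.shard1");
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeFileSystemCache cache = new FakeFileSystemCache();
        System.out.println("cached, 1st call ok:   " + copyOnce(cache, false));
        System.out.println("cached, 2nd call ok:   " + copyOnce(cache, false));
        System.out.println("uncached, 1st call ok: " + copyOnce(cache, true));
        System.out.println("uncached, 2nd call ok: " + copyOnce(cache, true));
    }
}

// Hypothetical stand-in for a Hadoop FileSystem; NOT the real API.
final class FakeFileSystem implements Closeable {
    private volatile boolean open = true;

    boolean exists(String path) throws IOException {
        if (!open) {
            throw new IOException("FakeFileSystem has been closed or not initialized.");
        }
        return true;
    }

    @Override
    public void close() {
        open = false;
    }
}

// Hypothetical stand-in for the cache behind FileSystem.get, keyed by scheme.
final class FakeFileSystemCache {
    private final Map<String, FakeFileSystem> cache = new ConcurrentHashMap<>();

    FakeFileSystem get(String scheme, boolean disableCache) {
        if (disableCache) {
            // Cache disabled: every caller owns a private instance.
            return new FakeFileSystem();
        }
        // Cache enabled: every caller receives the same shared instance.
        return cache.computeIfAbsent(scheme, s -> new FakeFileSystem());
    }
}
```

With the cache enabled, the first call succeeds and closes the shared instance, so the second call fails. With the cache disabled, each call closes only its own instance. Disabling the cache is a workaround; not closing a FileSystem that the caller does not own would address the cause.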
> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 6.2.1
> Reporter: Timothy Potter
> Attachments: SOLR-9961.patch
>
>
> My backup to cloud storage (Google cloud storage in this case, but I think
> this is a general problem) takes 8 minutes ... the restore of the same core
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me
> to parallelize the expensive part of this operation (the IO from the remote
> cloud storage service). We need the option to parallelize the download (like
> distcp).
> Also, I tried downloading the same directory using gsutil and it was very
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to
> consider a two-step approach: 1) download in parallel to a temp dir, 2)
> perform all of the checksum validation against the local temp dir. That
> will save round trips to the remote cloud storage.
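
The two-step approach described above could look roughly like the sketch below. This is not the attached patch: the class and method names are hypothetical, plain java.nio paths stand in for the backup repository, and a CRC32 map stands in for whatever checksum validation the restore actually performs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

final class ParallelRestoreSketch {

    // Step 1: copy every named file from 'source' into 'tempDir',
    // using a fixed pool of 'threads' workers.
    static void downloadAll(Path source, List<String> fileNames, Path tempDir, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : fileNames) {
                pending.add(pool.submit(() -> {
                    Files.copy(source.resolve(name), tempDir.resolve(name),
                            StandardCopyOption.REPLACE_EXISTING);
                    return null; // Callable, so the IOException can propagate
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for the copy and surface any failure
                } catch (ExecutionException e) {
                    throw new IOException("Download failed", e.getCause());
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    // CRC32 of a local file; a stand-in for the real checksum logic.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    // Step 2: validate against the local temp dir only, so no further
    // round trips to remote storage are needed. Returns the names of
    // files whose checksum does not match the expected value.
    static List<String> findCorrupt(Path tempDir, Map<String, Long> expected)
            throws IOException {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            if (crc32(tempDir.resolve(e.getKey())) != e.getValue()) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }
}
```

The thread count is left as a parameter because the right degree of parallelism depends on the storage service and the network, which this sketch does not try to guess.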