[ https://issues.apache.org/jira/browse/SOLR-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Potter updated SOLR-9958:
---------------------------------
    Attachment: SOLR-9958.patch

I changed the title because I don't see HdfsBackupRepository's close method
being called before the error in the description occurs. It looks like
something else closed the FileSystem out from under the repository.
{{FileSystem.get}} hands every caller the same cached instance, so
{{FileSystem.close}} closes it for everyone who obtained it that way. I'm not
sure this patch (against 6.2.1) is the correct approach, but it fixes the
problem and lets the backup complete. Basically, it calls
{{FileSystem.newInstance}} instead of {{FileSystem.get}}, so the repository
gets its own FileSystem instance rather than the shared, cached one. This may
just be a bug in the underlying Google Cloud Storage implementation, but I
think we should work around it if possible, and this seems like a reasonable
approach to me. However, I haven't had my head in HDFS code in a long while, so
I may be missing something ...
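To illustrate the failure mode, here is a small, self-contained toy model of the two factory methods. It is NOT Hadoop's API. In Hadoop, {{FileSystem.get(URI, Configuration)}} returns an instance cached per scheme, authority and user. {{FileSystem.newInstance(URI, Configuration)}} always creates a fresh, uncached instance. The class {{ToyFileSystem}}, its methods, and the bucket name below are hypothetical stand-ins. The model shows two things: closing a shared instance breaks every other holder, and a private instance survives when someone else closes the cached one.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical toy model of a filesystem factory with a shared instance cache. */
class ToyFileSystem implements AutoCloseable {
    // Stand-in for Hadoop's internal cache (keyed here only by a URI string).
    private static final ConcurrentMap<String, ToyFileSystem> CACHE = new ConcurrentHashMap<>();

    private final String key;
    private volatile boolean open = true;

    private ToyFileSystem(String key) {
        this.key = key;
    }

    /** Like FileSystem.get: all callers with the same key share one instance. */
    static ToyFileSystem get(String key) {
        return CACHE.computeIfAbsent(key, ToyFileSystem::new);
    }

    /** Like FileSystem.newInstance: a private instance that bypasses the cache. */
    static ToyFileSystem newInstance(String key) {
        return new ToyFileSystem(key);
    }

    /** Mirrors the checkOpen() guard seen in the stack trace. */
    void mkdirs(String path) throws IOException {
        if (!open) {
            throw new IOException("FileSystem for " + key + " has been closed");
        }
        // A real implementation would create the directory here.
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
        // Evict only if this object is the cached one; private instances are not cached.
        CACHE.remove(key, this);
    }
}

public class FsCacheDemo {
    public static void main(String[] args) throws IOException {
        String bucket = "gs://example-bucket"; // hypothetical location

        // Shared case: some other component closes "its" FileSystem...
        ToyFileSystem otherComponentFs = ToyFileSystem.get(bucket);
        ToyFileSystem repoFs = ToyFileSystem.get(bucket);
        otherComponentFs.close();
        // ...and the repository's handle is closed too, because it is the same object.
        System.out.println("shared instance still open: " + repoFs.isOpen());

        // Private case: the repository owns its own instance.
        ToyFileSystem privateFs = ToyFileSystem.newInstance(bucket);
        ToyFileSystem.get(bucket).close(); // someone closes the cached instance again
        privateFs.mkdirs("/backups/zk_backup"); // still works
        System.out.println("private instance still open: " + privateFs.isOpen());
        privateFs.close(); // the owner is responsible for closing it
    }
}
```

Running {{FsCacheDemo}} prints {{shared instance still open: false}} followed by {{private instance still open: true}}. The trade-off of the {{newInstance}} approach is ownership: the repository must close its private instance itself, for example in its own {{close}} method, or the instance leaks.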

> The FileSystem used by HdfsBackupRepository gets closed before the backup 
> completes.
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-9958
>                 URL: https://issues.apache.org/jira/browse/SOLR-9958
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Hadoop Integration
>    Affects Versions: 6.2.1
>            Reporter: Timothy Potter
>         Attachments: SOLR-9958.patch
>
>
> My shards get backed up correctly, but then it fails when backing up the 
> state from ZK. From the logs, it looks like the underlying FS gets closed 
> before the config stuff is written:
> {code}
> DEBUG - 2017-01-11 22:39:12.889; [   ] 
> com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase; GHFS.close:=> 
> INFO  - 2017-01-11 22:39:12.889; [   ] org.apache.solr.handler.SnapShooter; 
> Done creating backup snapshot: shard1 at 
> gs://master-sector-142100.appspot.com/backups2/tim5
> INFO  - 2017-01-11 22:39:12.889; [   ] org.apache.solr.servlet.HttpSolrCall; 
> [admin] webapp=null path=/admin/cores 
> params={core=gettingstarted_shard1_replica1&qt=/admin/cores&name=shard1&action=BACKUPCORE&location=gs://master-sector-142100.appspot.com/backups2/tim5&wt=javabin&version=2}
>  status=0 QTime=24954
> INFO  - 2017-01-11 22:39:12.890; [   ] org.apache.solr.cloud.BackupCmd; 
> Starting to backup ZK data for backupName=tim5
> INFO  - 2017-01-11 22:39:12.890; [   ] 
> org.apache.solr.common.cloud.ZkStateReader; Load collection config from: 
> [/collections/gettingstarted]
> INFO  - 2017-01-11 22:39:12.891; [   ] 
> org.apache.solr.common.cloud.ZkStateReader; 
> path=[/collections/gettingstarted] [configName]=[gettingstarted] specified 
> config exists in ZooKeeper
> ERROR - 2017-01-11 22:39:12.892; [   ] org.apache.solr.common.SolrException; 
> Collection: gettingstarted operation: backup failed:java.io.IOException: 
> GoogleHadoopFileSystem has been closed or not initialized.
>     at 
> com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.checkOpen(GoogleHadoopFileSystemBase.java:1927)
>     at 
> com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.mkdirs(GoogleHadoopFileSystemBase.java:1367)
>     at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877)
>     at 
> org.apache.solr.core.backup.repository.HdfsBackupRepository.createDirectory(HdfsBackupRepository.java:153)
>     at 
> org.apache.solr.core.backup.BackupManager.downloadConfigDir(BackupManager.java:186)
>     at org.apache.solr.cloud.BackupCmd.call(BackupCmd.java:111)
>     at 
> org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:222)
>     at 
> org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:463)
>     at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)