Shalin Shekhar Mangar created SOLR-11381:
--------------------------------------------

             Summary: HdfsDirectoryFactory throws NPE on cleanup because file system has been closed
                 Key: SOLR-11381
                 URL: https://issues.apache.org/jira/browse/SOLR-11381
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: hdfs
            Reporter: Shalin Shekhar Mangar
            Priority: Trivial
             Fix For: master (8.0), 7.1

I saw this happening in tests related to autoscaling. The old index directory cleanup is triggered on core close in a separate thread. This can cause a race condition where the filesystem is closed before the cleanup starts running. The cleanup then throws an NPE and fails.

Fixing the NPE is simple, but I think this is a real bug where old directories can be left around on HDFS. I don't know enough about HDFS to investigate further. Leaving it here for interested people to pitch in.

{code}
105029 ERROR (OldIndexDirectoryCleanupThreadForCore-control_collection_shard1_replica_n1) [n:127.0.0.1:58542_ c:control_collection s:shard1 r:core_node2 x:control_collection_shard1_replica_n1] o.a.s.c.HdfsDirectoryFactory Error checking for old index directories to clean-up.
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2083)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2069)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:791)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
    at org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:540)
    at org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$32(SolrCore.java:3019)
    at java.lang.Thread.run(Thread.java:745)
105030 ERROR (OldIndexDirectoryCleanupThreadForCore-control_collection_shard1_replica_n1) [n:127.0.0.1:58542_ c:control_collection s:shard1 r:core_node2 x:control_collection_shard1_replica_n1] o.a.s.c.SolrCore Failed to cleanup old index directories for core control_collection_shard1_replica_n1
java.lang.NullPointerException
    at org.apache.solr.core.HdfsDirectoryFactory.cleanupOldIndexDirectories(HdfsDirectoryFactory.java:558)
    at org.apache.solr.core.SolrCore.lambda$cleanupOldIndexDirectories$32(SolrCore.java:3019)
    at java.lang.Thread.run(Thread.java:745)
{code}
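
For anyone picking this up, the sequence in the log above (a logged "Filesystem closed" IOException followed by an NPE a few lines later in the same method) can be reproduced in miniature. The sketch below is one plausible reading of the two stack traces, not the actual HdfsDirectoryFactory source: `CleanupNpeSketch`, `FakeFileSystem`, `cleanupUnguarded` and `cleanupGuarded` are all hypothetical names, and closing the file system before calling the cleanup is a deterministic stand-in for the core-close thread winning the race.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CleanupNpeSketch {

    // Hypothetical stand-in for an HDFS client: listing fails once close() has run.
    static class FakeFileSystem {
        private volatile boolean closed = false;

        void close() {
            closed = true;
        }

        String[] listStatus(String dir) throws IOException {
            if (closed) {
                throw new IOException("Filesystem closed");
            }
            return new String[] {dir + "/index.old1", dir + "/index.old2"};
        }
    }

    // Assumed shape of the failing path: the listing error is logged and
    // swallowed, so the array is still null when it is dereferenced.
    static List<String> cleanupUnguarded(FakeFileSystem fs, String dataDir) {
        String[] oldIndexDirs = null;
        try {
            oldIndexDirs = fs.listStatus(dataDir);
        } catch (IOException e) {
            System.err.println("Error checking for old index directories: " + e.getMessage());
        }
        // Throws NullPointerException when the listing failed.
        List<String> paths = new ArrayList<>(oldIndexDirs.length);
        for (String dir : oldIndexDirs) {
            paths.add(dir);
        }
        return paths;
    }

    // Same logic with a null guard: no NPE, but nothing gets cleaned up either.
    static List<String> cleanupGuarded(FakeFileSystem fs, String dataDir) {
        String[] oldIndexDirs = null;
        try {
            oldIndexDirs = fs.listStatus(dataDir);
        } catch (IOException e) {
            System.err.println("Error checking for old index directories: " + e.getMessage());
        }
        List<String> paths = new ArrayList<>();
        if (oldIndexDirs == null) {
            return paths; // listing failed; skip the cleanup
        }
        for (String dir : oldIndexDirs) {
            paths.add(dir);
        }
        return paths;
    }

    public static void main(String[] args) {
        FakeFileSystem fs = new FakeFileSystem();
        fs.close(); // simulates the file system being closed before cleanup runs

        try {
            cleanupUnguarded(fs, "/solr/data");
        } catch (NullPointerException e) {
            System.out.println("unguarded: NPE");
        }
        System.out.println("guarded: " + cleanupGuarded(fs, "/solr/data").size() + " directories");
    }
}
```

Running `main` prints `unguarded: NPE` and then `guarded: 0 directories` on standard output. The guard only removes the NPE. The old directories are still never listed or deleted, which is the leftover-directory concern described above.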