[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256861#comment-15256861 ]
Colin Patrick McCabe commented on HDFS-10323:
---------------------------------------------

Thanks for the detailed bug report, [~bpodgursky].

bq. 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child FileSystem, and not hold onto that path itself.

This would be an incompatible change, right? It seems like a lot of code calling {{FS#close}} might not work with this change.

bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all other FileSystems.

This seems like the safest way to go.

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10323
>                 URL: https://issues.apache.org/jira/browse/HDFS-10323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: federation
>            Reporter: Ben Podgursky
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the
> error is, but I believe what is happening is that the ViewFileSystem’s child
> FileSystems are being close()’d before the ViewFileSystem, due to the random
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem
> tries to close(), it tries to forward the delete() calls to the appropriate
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it
> involves testing behavior on actual JVM shutdown. However, I can verify that
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem)fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first
> glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all
> other FileSystems.
> Would appreciate any thoughts of whether this seems accurate, and thoughts
> (or help) on the fix.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
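
As a rough illustration of option 2 from the report (close the ViewFileSystems first, then all other FileSystems), here is a minimal, self-contained Java sketch. It does not use Hadoop's classes: ToyFs, ToyViewFs, closeAll and run are hypothetical stand-ins, and the real FileSystem.Cache.closeAll is not reproduced here. It only shows why a two-pass close order lets the view's pending deletes reach a child that is still open, even when the child comes first in the cache.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {
    /** Toy stand-in for a file system; NOT Hadoop's FileSystem. */
    static class ToyFs {
        final String name;
        final List<String> log;
        boolean closed = false;

        ToyFs(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        /** Deleting through an already-closed file system fails. */
        boolean delete(String path) {
            if (closed) {
                log.add(name + ": delete FAILED " + path);
                return false;
            }
            log.add(name + ": deleted " + path);
            return true;
        }

        void close() {
            closed = true;
            log.add(name + ": closed");
        }
    }

    /** Toy stand-in for a view that forwards pending deletes to a child on close. */
    static class ToyViewFs extends ToyFs {
        final ToyFs child;
        final List<String> deleteOnExit = new ArrayList<>();

        ToyViewFs(String name, ToyFs child, List<String> log) {
            super(name, log);
            this.child = child;
        }

        @Override
        void close() {
            for (String path : deleteOnExit) {
                child.delete(path); // only works while the child is still open
            }
            super.close();
        }
    }

    /** Two-pass close (hypothetical): views first, then everything else. */
    static void closeAll(List<ToyFs> cache) {
        for (ToyFs fs : cache) {
            if (fs instanceof ToyViewFs) fs.close();
        }
        for (ToyFs fs : cache) {
            if (!(fs instanceof ToyViewFs)) fs.close();
        }
    }

    /** Builds a cache with the child listed BEFORE the view, then closes it. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        ToyFs child = new ToyFs("child", log);
        ToyViewFs view = new ToyViewFs("view", child, log);
        view.deleteOnExit.add("/tmp/x");

        List<ToyFs> cache = new ArrayList<>();
        cache.add(child); // iteration order that would break a single-pass close
        cache.add(view);
        closeAll(cache);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With a single pass over the same cache, the child would be closed before the view and the forwarded delete would fail, which matches the failure mode described in the report.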