[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643906#comment-16643906 ]
Sean Mackrory commented on HADOOP-15819: ---------------------------------------- So I think it was mentioned in the other JIRA that each test should be in their own JVM, but if we're seeing FS's in one test that were already closed, and we get a stack trace that points to another test, then something isn't right. Not closing the fs in tests seems like a possible solution, but (a) we need to rule out the possibility that this is a bug in the actual product, and (b) closing the FS does multiple important things and duplicating that everywhere isn't just a Band-Aid, it's kind of a messy one. The FS cache really feels inherently broken in the parallel tests case, which is why I initially liked the idea of disabling caching for the tests. If you rely on the FileSystem class to either give you an existing instance or create one for you, then we need to rely on it to decide when to close it. Otherwise people either break what other threads are doing or they don't close their FS instances. Both of those are horrible. Some kind of ref-counting, maybe? But modifying the design of the FS cache scares me, because it's been that way for a long time and is kind of important. Which raises the question: if I'm seeing this all the time recently but none of us saw it before a little while ago - what broke it? Because I don't think it was the FS caching. I wonder if git-bisect might be a useful way to get a lead here. > S3A integration test failures: FileSystem is closed! - without parallel test > run > -------------------------------------------------------------------------------- > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 > Affects Versions: 3.1.1 > Reporter: Gabor Bota > Assignee: Gabor Bota > Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at > org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40) > at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > 2018-10-04 00:52:25,596 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(61)) - FS was closed 6 times by: > 2018-10-04 00:52:25,596 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(64)) - ----------------- close() > #1 --------------- > 2018-10-04 00:52:25,596 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(65)) - > java.lang.RuntimeException: close() called here > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83) > at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280) > at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298) > at > org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55) > at > org.apache.hadoop.fs.s3a.commit.AbstractCommitITest.teardown(AbstractCommitITest.java:206) > at > org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations.teardown(ITestAssumedRoleCommitOperations.java:90) > at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > 2018-10-04 00:52:25,596 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(64)) - ----------------- close() > #2 --------------- > 2018-10-04 00:52:25,596 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(65)) - > java.lang.RuntimeException: close() called here > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83) > at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280) > at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298) > at > org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55) > at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > 2018-10-04 00:52:25,597 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(64)) - ----------------- close() > #3 --------------- > 2018-10-04 00:52:25,598 [Thread-4201] WARN s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:logClosers(65)) - > java.lang.RuntimeException: close() called here > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83) > at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280) > at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298) > at > org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55) > at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > (...) > {noformat} > From the logs it seems that the closers are the teardown methods of the > abstract test classes, so the same FileSystem object gets reused between > tests. > To run a test with the modified fs the following should be in the config: > {noformat} > <property> > <name>fs.s3a.impl</name> > <value>org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem</value> > <description>The implementation class of the S3A Filesystem</description> > </property> > {noformat} > I file this jira to solve this issue, and start a conversation if needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org