[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2019-01-03 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733167#comment-16733167
 ] 

Gabor Bota edited comment on HADOOP-15819 at 1/3/19 3:52 PM:
-

We could extend the docs with something like "Do not add FileSystem instances 
that will be modified during the test runs to the cache (e.g. via 
org.apache.hadoop.fs.FileSystem#addFileSystemForTesting). This can cause other 
tests to fail when they are handed the same modified or closed FS instance. 
For more details see HADOOP-15819."
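
For illustration, a minimal sketch of the anti-pattern such a warning is about. 
Nothing here is from the patch: the class, the bucket URI and the surrounding 
flow are hypothetical, and since {{addFileSystemForTesting}} is package-private, 
code like this can only live in the {{org.apache.hadoop.fs}} package:

{code:java}
package org.apache.hadoop.fs;

import java.net.URI;
import org.apache.hadoop.conf.Configuration;

public class CachePoisoningSketch {
  static void poisonCache(Configuration conf, FileSystem specialFs)
      throws Exception {
    URI uri = URI.create("s3a://example-bucket/");
    // Inject the instance into the shared FileSystem cache...
    FileSystem.addFileSystemForTesting(uri, conf, specialFs);
    // ...and close it when this test is done with it.
    specialFs.close();
    // Any later FileSystem.get(uri, conf) - possibly in an unrelated test -
    // is now handed the closed instance and fails with
    // "FileSystem is closed!".
  }
}
{code}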

Should I create a new jira for this and upload a patch?


was (Author: gabor.bota):
We could extend the docs with something like "Do not add FileSystem instances 
to the cache that will be modified during the test runs with e.g 
org.apache.hadoop.fs.FileSystem#addFileSystemForTesting. This can cause other 
tests to fail when using the same modified or closed FS instance. For more 
details see HADOOP-15819."

Should I create a new jira for this and upload a patch?

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
>   at 

[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-12-19 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711917#comment-16711917
 ] 

Adam Antal edited comment on HADOOP-15819 at 12/19/18 1:59 PM:
---

I'd also like to add that it worked with my disabled-cache version of trunk (I 
had previously disabled the cache for {{AbstractITCommitProtocol}}); that 
change was added to the patch as well. It actually works without disabling the 
cache (i.e. with only the bindFileSystem call removed).
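
As a side note, a minimal sketch (not the patch itself) of what disabling the 
cache buys a single test: with the standard {{fs.s3a.impl.disable.cache}} 
switch set, {{FileSystem.get()}} returns a private instance that the test may 
safely close. The bucket URI is a placeholder:

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class UncachedFsSketch {
  static void withPrivateFs() throws Exception {
    Configuration conf = new Configuration();
    // Bypass the shared FileSystem cache for the s3a scheme.
    conf.setBoolean("fs.s3a.impl.disable.cache", true);
    FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
    try {
      // ... exercise the private instance ...
    } finally {
      // Safe: this instance was never in the cache, so no other test can
      // be handed it after the close.
      fs.close();
    }
  }
}
{code}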


was (Author: adam.antal):
I'd also like to add that it worked for my disabled-cache version of trunk (so 
I disabled cache for {{AbstractITCommitProtocol }}previously) - it was added to 
the patch as well. It actually works without disabling cache (so only removing 
the bindFileSystem call).


[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-11-29 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703616#comment-16703616
 ] 

Adam Antal edited comment on HADOOP-15819 at 11/29/18 6:26 PM:
---

Also a minor remark on {{S3ACloseEnforcedFileSystem}}: the method 
{{processDeleteOnExit()}} should not call {{checkIfClosed()}}, because it is 
only invoked after {{close()}} (the call order is 
{{S3ACloseEnforcedFileSystem:close() -> S3AFileSystem:close() -> 
FileSystem:close() -> S3ACloseEnforcedFileSystem:processDeleteOnExit()}}), so 
the check produces misleading errors during the normal close of the filesystem.
I uploaded that for reference.
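
To make the ordering concrete, a hypothetical sketch of the relevant part of 
{{S3ACloseEnforcedFileSystem}} (the real class is attached to this issue; only 
the method names that appear in the stack traces on this ticket are taken from 
it):

{code:java}
package org.apache.hadoop.fs.s3a;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class S3ACloseEnforcedFileSystem extends S3AFileSystem {
  private volatile boolean closed = false;

  private void checkIfClosed() {
    if (closed) {
      throw new RuntimeException("Using closed FS!");
    }
  }

  @Override
  public void close() throws IOException {
    closed = true;
    super.close(); // FileSystem.close() ends up calling processDeleteOnExit()
  }

  @Override
  public boolean mkdirs(Path path, FsPermission permission) throws IOException {
    checkIfClosed(); // legitimate guard: mkdirs after close() is a test bug
    return super.mkdirs(path, permission);
  }

  // Deliberately unguarded: this runs as part of every normal close(), after
  // the closed flag is already set, so a checkIfClosed() here would raise a
  // false "Using closed FS!" on every teardown.
  @Override
  protected boolean processDeleteOnExit() {
    return super.processDeleteOnExit();
  }
}
{code}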


was (Author: adam.antal):
Also some minor remark on {{S3ACloseEnforcedFileSystem}}: the method 
{{processDeleteOnExit()}} should not call the {{checkIfClosed()}} method in 
{{S3ACloseEnforcedFileSystem}}, because the former is only called after the 
{{close()}} (the order of the calls is {{S3ACloseEnforcedFileSystem:close() -> 
S3AFileSystem:close() -> FileSystem:close() -> 
S3ACloseEnforcedFileSystem:processDeleteOnExit()}}) thus creating misleading 
errors during the normal close of the filesystem.


[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-10-10 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644973#comment-16644973
 ] 

Gabor Bota edited comment on HADOOP-15819 at 10/10/18 2:19 PM:
---

I have some bad news: I get these issues for tests where 
{{FS_S3A_IMPL_DISABLE_CACHE}} is set to true, so the caching _should be_ 
disabled.

What I did was modify 
{{org.apache.hadoop.fs.s3a.AbstractS3ATestBase#teardown}} to:

{code:java}
  @Override
  public void teardown() throws Exception {
    super.teardown();
    // Only close the FS when this test opted out of the cache: an uncached
    // instance is private to the test, so closing it cannot affect others.
    boolean fsCacheDisabled = getConfiguration()
        .getBoolean(FS_S3A_IMPL_DISABLE_CACHE, false);
    if (fsCacheDisabled) {
      describe("closing file system");
      LOG.warn("Closing fs. FS_S3A_IMPL_DISABLE_CACHE: " + fsCacheDisabled);
      IOUtils.closeStream(getFileSystem());
    }
  }
{code}

And there were still issues after this.


was (Author: gabor.bota):
I have some bad news: I get these issues for tests where 
{{FS_S3A_IMPL_DISABLE_CACHE}} set to true, so the caching _should be_ disabled.


[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-10-10 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644940#comment-16644940
 ] 

Steve Loughran edited comment on HADOOP-15819 at 10/10/18 2:08 PM:
---

bq. The FS cache really feels inherently broken in the parallel tests case, 
which is why I initially liked the idea of disabling caching for the tests.

Parallel tests run in their own JVMs; the main issue there is that you need to 
be confident that tests aren't writing to the same local/remote paths.

At the same time, I've seen old test instances get recycled, which makes me 
think that the parallel runner farms work out to instantiated test runners as 
they complete individual test cases. So reuse does happen, it just happens a 
lot more in sequential runs.

Tests which want special configs of the FS can't handle recycled classes, hence 
the need for new filesystems & a close afterwards, but I don't see why other 
tests should be closing much.

* HADOOP-13131 added the close, along with 
{{S3ATestUtils.createTestFilesystem()}}, which does create filesystems that 
need to be closed.
* But I don't see that creation happening much, especially given 
{{S3AContract}}'s FS is just from a get().
* Many tests call {{S3ATestUtils.disableFilesystemCaching(conf)}} before 
FileSystem.get, which guarantees unique instances

Which makes me think: yes, closing the FS in teardown is overkill except in the 
special case of "creates a new filesystem()", either explicitly or implicitly.

As [~mackrorysd] says: surprising this hasn't surfaced before. But to fix it 
means that it should be done properly.

* If FS instances were always closed and new ones created (i.e. no caching), 
test cases would run way, way slower. 
* those tests which do need their own FS instance can set it up themselves and 
close it in teardown.
* And those tests which absolutely must have FS.get() return their specific 
filesystem must: (a) enable caching and (b) remove their FS from the cache in 
teardown (e.g. FileSystem.closeAll)

This is probably going to force a review of all the tests, maybe have some 
method in AbstractS3ATestBase

{code}
protected boolean uniqueFilesystemInstance() { return false; }
{code}

then 
# if true, call {{disableFilesystemCaching}} in createConfiguration()
# if true, close the FS in teardown (a rough sketch of this wiring follows)
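
A minimal sketch, under the assumption that the hook lives next to the 
existing {{createConfiguration()}}/{{teardown()}} methods of 
{{AbstractS3ATestBase}}; the hook and both overrides here are hypothetical, 
only {{S3ATestUtils.disableFilesystemCaching()}} and {{IOUtils.closeStream()}} 
are existing helpers:

{code:java}
// Sketch of the proposed hook inside AbstractS3ATestBase.

/** Subclasses return true when they need a private, uncached FS instance. */
protected boolean uniqueFilesystemInstance() {
  return false;
}

@Override
protected Configuration createConfiguration() {
  Configuration conf = super.createConfiguration();
  if (uniqueFilesystemInstance()) {
    // Guarantees FileSystem.get() hands back a fresh, uncached instance.
    S3ATestUtils.disableFilesystemCaching(conf);
  }
  return conf;
}

@Override
public void teardown() throws Exception {
  if (uniqueFilesystemInstance()) {
    // Safe to close: this instance was never shared through the cache.
    IOUtils.closeStream(getFileSystem());
  }
  super.teardown();
}
{code}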

Next:
* go through all uses of {{disableFilesystemCaching}}, and in those tests have 
{{uniqueFilesystemInstance}} return true. 
* look at uses of {{S3ATestUtils.createTestFilesystem()}} & make sure they are 
closing it afterwards

This is going to span all the tests. Joy.






was (Author: ste...@apache.org):
bq. The FS cache really feels inherently broken in the parallel tests case, 
which is why I initially liked the idea of disabling caching for the tests.

parallel tests run in their own JVMs,  the main issue there is that you need to 
be confident that tests aren't writing to the same local/remote paths.

At the same time, I've seen old test instances get recycled, which makes me 
thing that the parallel runner fields work out to instantiated test runners as 
they complete individual test cases. So reuse does happen, it just happens a 
lot more in sequential runs.

Tests which want special configs of the FS can't handle recycled classes, hence 
the need for  nee filesystems & close after, but I don't see why other tests 
should be closing much.

* HADOOP-13131 added the close, along with 
{{S3ATestUtils.createTestFilesystem()}}, which does create filesystems that 
need to be closed.
* But I don't see that creation happening much, especially given 
{{S3AContract}}'s FS is just from a get().
* Many tests call {{S3ATestUtils.disableFilesystemCaching(conf)}} before 
FileSystem.get, which guarantees unique instances

Which makes me think: yes, closing the FS in teardown is overkill except in the 
special case of "creates a new filesystem() either explicilty or implicitly.

As [~mackrorysd] says: surprising this hasn't surfaced before. But to fix it 
means that it should be done properly.

* IF tests were always closed and new ones created (i.e. no catching), test 
cases run way, way faster. 
* those tests which do need their own FS instance can close it in teardown, and 
set it up.
* And those tests which absolutely must have FS.get() Return their specific 
filesystem must: (a) enable caching and (b) remove their FS from the cache in 
teardown (e.g. FileSystem.closeAll)

This is probably going to force a review of all the tests, maybe have some 
method in AbstractS3ATestBase

{code}
protected boolean uniqueFilesystemInstance() { return false; }
{code}

then 
# if true, in createConfiguration() call {{disableFilesystemCaching))
# if true in teardown: close the FS.

Next:
*  go through all uses of the {{disableFilesystemCaching)), and in those tests 
have {{uniqueFilesystemInstance}} return true. 
* Look at uses of  {{S3ATestUtils.createTestFilesystem()}} & 

[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-10-10 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644973#comment-16644973
 ] 

Gabor Bota edited comment on HADOOP-15819 at 10/10/18 1:47 PM:
---

I have some bad news: I get these issues for tests where 
{{FS_S3A_IMPL_DISABLE_CACHE}} is set to true, so the caching _should be_ disabled.


was (Author: gabor.bota):
I have some bad news: I get these issues for tests where 
{{FS_S3A_IMPL_DISABLE_CACHE}} set to true, so the caching is disabled.


[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-10-09 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643749#comment-16643749
 ] 

Gabor Bota edited comment on HADOOP-15819 at 10/9/18 5:37 PM:
--

Also note: I was getting interesting errors when I removed the line, maybe 
related to the fact that we hadn't closed the fs:
{noformat}
[ERROR] Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.778 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir
[ERROR] testRecursiveRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.085 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.082 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.08 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testListEmptyRootDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.084 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir.testListEmptyRootDirectory(ITestS3AContractRootDir.java:63)

[ERROR] testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.078 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testSimpleRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.079 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testMkDirDepth1(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.108 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.083 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
(..)
{noformat}
So there should be another solution to this that won't break the tests but also 
won't cause the {{FileSystem is closed!}} issue. Running this test class 
separately, all 9 tests pass.
Could we disable the caching just for these tests and have them run on a new FS 
instance?

(I get other issues as well when running the tests without closing the fs. I 
think there were many reasons for closing the fs, but I missed the history.)
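
For illustration, one hypothetical shape for that opt-out (the real 
{{ITestS3AContractRootDir}} builds its configuration through the contract base 
class, so treat this purely as a sketch; the class name is made up):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest;
import org.apache.hadoop.fs.contract.AbstractFSContract;
import org.apache.hadoop.fs.contract.s3a.S3AContract;

public class ITestS3AContractRootDirUncached
    extends AbstractContractRootDirectoryTest {

  @Override
  protected Configuration createConfiguration() {
    Configuration conf = super.createConfiguration();
    // Every test in this class gets a fresh FS instead of a cached one,
    // so a close in teardown cannot leak into other test classes.
    conf.setBoolean("fs.s3a.impl.disable.cache", true);
    return conf;
  }

  @Override
  protected AbstractFSContract createContract(Configuration conf) {
    return new S3AContract(conf);
  }
}
{code}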


was (Author: gabor.bota):
Also note: I was getting interesting errors when I removed the line, maybe 
related that we haven't closed the fs:
{noformat}
[ERROR] Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.778 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir
[ERROR] testRecursiveRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.085 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.082 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.08 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testListEmptyRootDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.084 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir.testListEmptyRootDirectory(ITestS3AContractRootDir.java:63)

[ERROR] testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.078 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testSimpleRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.079 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testMkDirDepth1(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.108 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.083 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
(..)
{noformat}
So there should be another solution to 

[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-10-09 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643749#comment-16643749
 ] 

Gabor Bota edited comment on HADOOP-15819 at 10/9/18 5:10 PM:
--

Also note: I was getting interesting errors when I removed the line, maybe 
related to the fact that we hadn't closed the fs:
{noformat}
[ERROR] Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.778 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir
[ERROR] testRecursiveRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.085 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.082 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.08 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testListEmptyRootDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.084 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir.testListEmptyRootDirectory(ITestS3AContractRootDir.java:63)

[ERROR] testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.078 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testSimpleRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.079 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testMkDirDepth1(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.108 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.083 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
(..)
{noformat}
So there should be another solution to this that won't break the tests but also 
won't cause the {{FileSystem is closed!}} issue. Running this test class 
separately, all 9 tests pass.
Could we disable the caching just for these tests and have them run on a new FS 
instance?


was (Author: gabor.bota):
Also note: I was getting interesting errors when I removed the line, maybe 
related that we haven't closed the fs:
{noformat}
[ERROR] Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.778 s <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir
[ERROR] testRecursiveRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.085 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.082 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.08 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testListEmptyRootDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.084 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir.testListEmptyRootDirectory(ITestS3AContractRootDir.java:63)

[ERROR] testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.078 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testSimpleRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.079 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testMkDirDepth1(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.108 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string

[ERROR] testRmEmptyRootDirRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir)  Time elapsed: 0.083 s  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
(..)
{noformat}
So there should be another solution to this that won't break the tests but also 
won't cause the {{FileSystem is closed!}} issue. 
Could we disable the caching just for these test and have them run