smengcl edited a comment on issue #696: HDDS-3056. Allow users to list volumes they have access to, and optionally allow all users to list all volumes
URL: https://github.com/apache/hadoop-ozone/pull/696#issuecomment-611991079

I was able to dig a bit into the root cause of the timeout in the new integration test I added.

The symptom is that the tests succeed if I run each test case separately, but the **second** test fails when I run all the tests together.

It turns out that when a mini ozone cluster is launched for a second time in the **same** test class, the `setOwner()` call on the OM side will [add the same volume to the owner list](https://github.com/apache/hadoop-ozone/blob/80e9f0a7238953e41b06d22f0419f04ab31d4212/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/OMVolumeSetOwnerRequest.java#L156-L158) a second time and **succeed**, which is very weird. The result is a **malformed** list in `UserVolumeInfo` for the user; see the `prevVolList` variable in the screenshot below:

<img width="1440" alt="ss1" src="https://user-images.githubusercontent.com/50227127/78987769-cc2eb780-7ae3-11ea-9dc7-544b3783c667.png">

This eventually causes `testAclDisabledListAllDisallowed` to get stuck in an infinite `it.hasNext()` loop and time out, because of how `VolumeIterator` and [`OmMetadataManagerImpl#listVolumes`](https://github.com/apache/hadoop-ozone/blob/876bec0130094b24472a7017fdb1fd81a65023bc/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java#L825-L828) work.

I was able to confirm this by setting a breakpoint inside `addVolumeToOwnerList()`.

If I run only `testAclDisabledListAllDisallowed` directly in IntelliJ, the test case passes. This makes the problem weirder, because I do call the shutdown function in `MiniOzoneClusterImpl` to do the cleanup.
And it did [delete the temp directory for the entire cluster](https://github.com/apache/hadoop-ozone/blob/e2ebbf874d5e33565b27a24a02cfb4cee6330ea1/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java#L392), which in theory should have taken care of the cleanup.

My questions:

1. Could there be some other in-memory cache (`TableCache`) that accidentally persists across mini clusters (i.e. is not fully cleaned up in `MiniOzoneClusterImpl`)? If so, we just need to fix the test utility.
2. Or could it be that the [`userTable`](https://github.com/apache/hadoop-ozone/blob/876bec0130094b24472a7017fdb1fd81a65023bc/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java#L140) is flushed by mistake? In that case it would be a major bug (outside the scope of this jira) that should be fixed.

Pinging for some help @bharatviswa504 @elek
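
To make the hang easier to picture, here is a minimal Java sketch of how a repeated name in an owner's volume list could keep a "list everything after the previous key" pager from ever finishing. This is not the actual Ozone code. It assumes the malformed list holds the same volume name twice and that the server side resolves the start key with `List#indexOf`. The class and method names (`DuplicateVolumePagingSketch`, `listAfter`, `countPages`) and the `pageCap` guard are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateVolumePagingSketch {

  // Hypothetical stand-in for a server-side "list volumes after startKey" call.
  // Assumption: the start position is found with indexOf, which returns the
  // FIRST match, so a duplicated name always resolves to its earlier position.
  public static List<String> listAfter(List<String> ownerList, String startKey,
      int maxKeys) {
    int index = 0;
    if (startKey != null && !startKey.isEmpty()) {
      // Exclude the start key itself from the result.
      index = ownerList.indexOf(startKey) + 1;
    }
    List<String> page = new ArrayList<>();
    while (index < ownerList.size() && page.size() < maxKeys) {
      page.add(ownerList.get(index++));
    }
    return page;
  }

  // Hypothetical client-side iterator loop: keep fetching pages, using the last
  // returned name as the next start key, until an empty page comes back.
  // pageCap is only here so that this demo terminates.
  public static int countPages(List<String> ownerList, int maxKeys, int pageCap) {
    String prev = null;
    int pages = 0;
    while (pages < pageCap) {
      List<String> page = listAfter(ownerList, prev, maxKeys);
      if (page.isEmpty()) {
        break;
      }
      pages++;
      prev = page.get(page.size() - 1);
    }
    return pages;
  }

  public static void main(String[] args) {
    // Well-formed list: one non-empty page, then an empty one ends the loop.
    System.out.println(countPages(Arrays.asList("vol1", "vol2"), 10, 50));
    // Duplicated entry: every follow-up call returns the second "vol1" again,
    // so the loop only stops because of pageCap.
    System.out.println(countPages(Arrays.asList("vol1", "vol1"), 10, 50));
  }
}
```

In this simplified model the well-formed list finishes after a single page, while the list with the repeated name never yields an empty page and runs until the artificial cap. That matches the shape of the `it.hasNext()` hang described above, though the real iterator may differ in its details.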