smengcl edited a comment on issue #696: HDDS-3056. Allow users to list volumes 
they have access to, and optionally allow all users to list all volumes
URL: https://github.com/apache/hadoop-ozone/pull/696#issuecomment-611991079
 
 
   I was able to dig a bit into the root cause of the timeout in the new 
integration test I added.
   The symptom is that each test case succeeds when I run it separately, 
but the **second** test fails when I run all tests together.
   
   It turns out that when a mini ozone cluster launches for a second time in 
the **same** test class, the OM in `setOwner()` would [add the same volume to 
the owner list](https://github.com/apache/hadoop-ozone/blob/80e9f0a7238953e41b06d22f0419f04ab31d4212/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/OMVolumeSetOwnerRequest.java#L156-L158)
 for a second time (the list already has the volume entry) and **succeed**, 
which is very weird.
   The result is a **malformed** list in `UserVolumeInfo` for the user; 
see the `prevVolList` variable in the screenshot below:
   <img width="1440" alt="ss1" 
src="https://user-images.githubusercontent.com/50227127/78987769-cc2eb780-7ae3-11ea-9dc7-544b3783c667.png">
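
   For illustration only, here is a minimal sketch of an idempotent add that 
would keep the list well-formed. It is not the actual OM code: the class name, 
the method signature, and the use of a plain `List<String>` in place of 
`UserVolumeInfo` are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class OwnerListSketch {
  // Hypothetical stand-in for the per-user volume list; the real
  // addVolumeToOwnerList() in the OM has a different signature.
  static List<String> addVolumeToOwnerList(List<String> prevVolList,
      String volume) {
    List<String> updated = new ArrayList<>(prevVolList);
    // Only append when the entry is not already present, so a repeated
    // setOwner() for the same volume leaves the list unchanged.
    if (!updated.contains(volume)) {
      updated.add(volume);
    }
    return updated;
  }

  public static void main(String[] args) {
    List<String> once = addVolumeToOwnerList(new ArrayList<>(), "vol1");
    List<String> twice = addVolumeToOwnerList(once, "vol1");
    System.out.println(twice); // prints [vol1]
  }
}
```

   Whether the OM should silently skip the duplicate or reject the request is 
a separate design question that this sketch does not answer.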
   
   This eventually causes `testAclDisabledListAllDisallowed` to get stuck in 
an infinite `it.hasNext()` loop and time out because of how `VolumeIterator` 
and 
[`OmMetadataManagerImpl#listVolumes`](https://github.com/apache/hadoop-ozone/blob/876bec0130094b24472a7017fdb1fd81a65023bc/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java#L825-L828)
 work.
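
   To make the mechanism concrete, here is a simplified model of the loop. It 
is not the real OM code: `listPage` and `countPages` are hypothetical names, 
and the sketch assumes that listing resumes right after the **first** entry 
matching the previous key and that the client passes the last returned volume 
as the next start key. Under those assumptions a duplicated entry is returned 
on every call, so the iteration never sees an empty page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListVolumesLoopSketch {
  // Assumed behaviour: return up to maxKeys names that come after the
  // first entry equal to startKey (or from the start if startKey is null).
  static List<String> listPage(List<String> volumes, String startKey,
      int maxKeys) {
    List<String> page = new ArrayList<>();
    boolean startKeyFound = (startKey == null);
    for (String name : volumes) {
      if (!startKeyFound) {
        if (name.equals(startKey)) {
          startKeyFound = true; // resume after this entry
        }
        continue;
      }
      if (page.size() < maxKeys) {
        page.add(name);
      }
    }
    return page;
  }

  // Fetch pages until one comes back empty; maxPages is a safety cap so
  // the sketch terminates even when the listing would loop forever.
  static int countPages(List<String> volumes, int maxPages) {
    String prev = null;
    int pages = 0;
    while (pages < maxPages) {
      List<String> page = listPage(volumes, prev, 1);
      if (page.isEmpty()) {
        break;
      }
      prev = page.get(page.size() - 1);
      pages++;
    }
    return pages;
  }

  public static void main(String[] args) {
    // Well-formed list: one page, then an empty page ends the iteration.
    System.out.println(countPages(Arrays.asList("vol1"), 1000)); // prints 1
    // Malformed list: "vol1" follows itself, so only the cap stops it.
    System.out.println(countPages(Arrays.asList("vol1", "vol1"), 1000)); // prints 1000
  }
}
```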
   
   I was able to confirm this by setting a breakpoint inside 
`addVolumeToOwnerList()`.
   
   If I run only `testAclDisabledListAllDisallowed` directly in IntelliJ, the 
test case passes. This makes the problem weirder, because I do call the 
shutdown function in `MiniOzoneClusterImpl` to do the cleanup, and it did 
[delete the temp directory for the entire 
cluster](https://github.com/apache/hadoop-ozone/blob/e2ebbf874d5e33565b27a24a02cfb4cee6330ea1/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java#L392).
 In theory this should have performed the cleanup work.
   
   My questions:
   
   1. Is there some other in-memory cache (`TableCache`) that accidentally 
persists across mini clusters (i.e. is not fully cleaned up in 
`MiniOzoneClusterImpl`)? If so, we just need to fix the test utility.
   
   2. Or could it be that the 
[`userTable`](https://github.com/apache/hadoop-ozone/blob/876bec0130094b24472a7017fdb1fd81a65023bc/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java#L140)
 is flushed by mistake? If so, this would be a major bug (outside the 
scope of this jira) that should be fixed.
   
   Pinging for some help @bharatviswa504 @elek 
