fapifta commented on PR #6992: URL: https://github.com/apache/ozone/pull/6992#issuecomment-2256243165
Hi @tanvipenumudy thank you for your work on this. I have some general doubts, let me share them. When I moved off the getBucket call in [HDDS-11235](https://issues.apache.org/jira/browse/HDDS-11235) and [HDDS-11232](https://issues.apache.org/jira/browse/HDDS-11232), most of the tests failed, because the semantics changed. Here I don't see any test failures, however the semantics are changing as well, and that makes me nervous about the cache itself... Let's walk through a few cases:

- If the client initializes a FileSystem object, makes a call, and closes the object, then nothing changes, other than that we have one cache entry that is used only once before the cache is removed. (This is completely fine.)
- If the client initializes a FileSystem object and issues subsequent calls that should succeed, and those calls do not manipulate the bucket(s) after they are created and cached, then things are fine.
- If the client initializes a FileSystem object and issues subsequent calls that should succeed, but one of them modifies a bucket (e.g. changes a property), then the change goes undetected from the client's point of view unless it causes a subsequent call to fail. (I am not sure there is such a scenario at the moment, though.)
- If the client initializes a FileSystem object and issues subsequent calls that should fail in a particular way after a call removes the bucket, they might fail in a very different way. That might affect retries as well, since in this case it is the server side that fails, and the failure handling in getBucket won't be effective: we get the bucket from the cache, so we never see the error there.

As far as I can tell, this last case is not handled in the current code, and it can lead to unexpected behaviour, or behaviour we have no tests for. There is another thing we need to put more thought into when it comes to caching buckets.
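To illustrate the last case, here is a minimal sketch of one possible mitigation: drop the cached entry when an operation fails with a bucket-level error, so the next access refetches from the server instead of serving stale state. All names here (`BucketCache`, the fetcher) are hypothetical and do not match the Ozone client API; this is a sketch of the idea, not the implementation under review.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical bucket-info cache that can be invalidated when a call
 *  fails with a bucket-level error, so the next get() refetches and
 *  surfaces the real server-side state instead of a stale entry. */
class BucketCache {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> fetcher; // stands in for the getBucket RPC

  BucketCache(Function<String, String> fetcher) {
    this.fetcher = fetcher;
  }

  String get(String bucketName) {
    // computeIfAbsent: only the first access per bucket triggers the fetch
    return cache.computeIfAbsent(bucketName, fetcher);
  }

  /** Call when an operation fails with a bucket-level error (e.g. the
   *  bucket was removed server-side): drop the entry so the stale info
   *  is not served again on the next access. */
  void invalidate(String bucketName) {
    cache.remove(bucketName);
  }
}

public class InvalidationDemo {
  public static void main(String[] args) {
    int[] rpcCalls = {0};
    BucketCache cache = new BucketCache(name -> {
      rpcCalls[0]++; // count simulated RPCs
      return "info-of-" + name + "-v" + rpcCalls[0];
    });
    System.out.println(cache.get("b1")); // fetched: info-of-b1-v1
    System.out.println(cache.get("b1")); // cached, no RPC: info-of-b1-v1
    cache.invalidate("b1");              // simulate a failed call on b1
    System.out.println(cache.get("b1")); // refetched: info-of-b1-v2
  }
}
```

The open question this sketch does not answer is *where* the `invalidate` call would live: without a single, well-defined place that performs the RPC and handles its errors, every call site touching the cached bucket would need this handling.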
We need to consider what happens to the client if the bucket was recreated with different properties, or removed, after the client cached it: the client uses stale information, and I don't think the behaviour in these cases is fully defined or fully correct yet... I do not see any handling that would clear the cached bucket info, so if my assumptions are right, the client may behave incorrectly until the cache evicts the bucket info at the end of the 10-minute cache lifetime.

Let me add one more generic thought... I believe our first goal should be to change our code to use at most one VolumeInfo call, at most one BucketInfo call, and then at most the RPC calls required for the actual operation: for example 3 for CreateKey (OpenKey, CommitKey, and a RenameKey from x.COPYING to x), 1 for RenameKey, 1 for DeleteKey, and so on. Once that is done, we can introduce the cache and cache VolumeInfo and BucketInfo objects, in order to reduce the overall volume of RPC calls caused by long-running clients.

Why? Currently the code calls getBucket here and there; by introducing caching, we have to do error handling everywhere we use the bucket object, and at the moment that handling is a bit hectic and error-prone. If all our code that touches RPC is simplified, and it is clear how and when we issue an RPC, that helps greatly with handling errors properly.

I know this is arguable, and I am also sure that this is just a single point of view, based on assumptions about parts of the code I have not examined much, but this is how I think about the problem. Please argue with me if I am wrong; I would be happy to introduce the cache first and get rid of unnecessary InfoBucket calls, but at the moment I don't think it is that simple, even for just the delete/rename flow...
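The stale-read window described above can be made concrete with a small sketch: an entry cached with a fixed lifetime keeps serving the old bucket properties until it expires, even if the bucket was recreated with different properties in the meantime. This is not the Ozone implementation; the class, the property strings, and the injected clock are all illustrative assumptions (the clock is injected so the time-based behaviour is deterministic).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical TTL cache demonstrating the stale-read window: until an
 *  entry's lifetime elapses, the client sees the properties that were
 *  current when the entry was cached, not the server's current state. */
class TtlCache {
  private record Entry(String value, long cachedAtMillis) {}

  private final Map<String, Entry> entries = new HashMap<>();
  private final Function<String, String> fetcher; // stands in for the getBucket RPC
  private final long ttlMillis;
  private final LongSupplier clock;               // injected for deterministic tests

  TtlCache(Function<String, String> fetcher, long ttlMillis, LongSupplier clock) {
    this.fetcher = fetcher;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  String get(String bucket) {
    Entry e = entries.get(bucket);
    if (e == null || clock.getAsLong() - e.cachedAtMillis() >= ttlMillis) {
      e = new Entry(fetcher.apply(bucket), clock.getAsLong());
      entries.put(bucket, e); // refetch on miss or after expiry
    }
    return e.value(); // may be stale for up to ttlMillis
  }
}

public class StaleWindowDemo {
  public static void main(String[] args) {
    long[] now = {0};
    Map<String, String> server = new HashMap<>();
    server.put("b1", "replication=THREE");
    // 10-minute TTL, matching the cache lifetime discussed above
    TtlCache cache = new TtlCache(server::get, 600_000, () -> now[0]);

    System.out.println(cache.get("b1")); // replication=THREE
    server.put("b1", "replication=ONE"); // bucket recreated with different properties
    now[0] = 300_000;                    // 5 minutes later
    System.out.println(cache.get("b1")); // still replication=THREE (stale)
    now[0] = 600_000;                    // TTL elapsed
    System.out.println(cache.get("b1")); // replication=ONE
  }
}
```

In this sketch the client acts on the old replication setting for up to the full TTL; whether that window is acceptable, or needs explicit invalidation, is exactly the question raised above.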
