runzhiwang opened a new pull request #709: HDDS-3244. Improve write efficiency by opening RocksDB only once
URL: https://github.com/apache/hadoop-ozone/pull/709

## What changes were proposed in this pull request?

What's the problem?

1. The problem occurs when a datanode creates a container. I split `HddsDispatcher.WriteChunk` into `HddsDispatcher.WriteData` and `HddsDispatcher.CommitData`, as the code shows, so that the cost of each appears separately in the Jaeger UI.
   ![image](https://user-images.githubusercontent.com/51938049/77373666-cd51ac00-6da3-11ea-9c77-8d6864f05aac.png)
2. Each time a datanode creates a container, a new RocksDB instance is [created](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java#L76) in `HddsDispatcher.WriteData` and then immediately [closed](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java#L83). It is not [opened](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java#L123) again until `HddsDispatcher.PutBlock`, so RocksDB is opened twice on each datanode even though it is not used before `HddsDispatcher.PutBlock`.
3. In addition, as the image shows, while the leader datanode is opening RocksDB in `HddsDispatcher.WriteData`, the 2 follower datanodes cannot open RocksDB until the leader has finished. The three opens therefore run one after another, and the whole write spends 3 * cost(RocksDB.open) = 600ms opening RocksDB.
   ![image](https://user-images.githubusercontent.com/51938049/77320657-f0507180-6d4b-11ea-8188-acb26785f608.png)
4. When a 3KB file is uploaded five times, the average cost is 912ms.
   ![image](https://user-images.githubusercontent.com/51938049/77320748-170ea800-6d4c-11ea-8131-81a1180a349a.png)

How to fix it?

1. Open RocksDB in `HddsDispatcher.CommitData` rather than in `HddsDispatcher.WriteData`, because the leader datanode and the 2 follower datanodes can open RocksDB in parallel in `HddsDispatcher.CommitData`.
2. Put the RocksDB handle into the cache after opening it in `HddsDispatcher.CommitData`, so that it does not have to be opened again in `HddsDispatcher.PutBlock`.
3. With both changes, the whole write spends 1 * cost(RocksDB.open) = 200ms opening RocksDB.
   ![image](https://user-images.githubusercontent.com/51938049/77321385-17f40980-6d4d-11ea-925c-d3960342ed06.png)
4. When a 3KB file is uploaded five times, the average cost is 516ms, an improvement of about 43% over 912ms.
   ![image](https://user-images.githubusercontent.com/51938049/77321360-0dd20b00-6d4d-11ea-84b3-efec6734db74.png)

## What is the link to the Apache JIRA

https://issues.apache.org/jira/projects/HDDS/issues/HDDS-3244

## How was this patch tested?

I will update the unit tests so that CI passes.
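
The idea behind the fix (open the DB once, keep the handle in a cache, and reuse it at `PutBlock`) can be illustrated with a minimal sketch. This is not the actual `ContainerCache` code: `DbHandleCache`, `DbHandle`, `getOrOpen`, and the paths are hypothetical names chosen for illustration, and the "open" here is a stand-in that only counts calls instead of opening a real RocksDB instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of an open-once handle cache; not the real ContainerCache API. */
final class DbHandleCache {

  /** Stand-in for a store that is expensive to open, such as RocksDB. */
  static final class DbHandle {
    private final String path;

    DbHandle(String path) {
      this.path = path;
    }

    String getPath() {
      return path;
    }
  }

  private final Map<String, DbHandle> handles = new ConcurrentHashMap<>();
  private final AtomicInteger openCount = new AtomicInteger();

  /** Returns the cached handle, opening the store only on the first call per path. */
  DbHandle getOrOpen(String containerDbPath) {
    return handles.computeIfAbsent(containerDbPath, p -> {
      // A real implementation would pay the open cost here, once per container.
      openCount.incrementAndGet();
      return new DbHandle(p);
    });
  }

  /** Number of times a store was actually opened. */
  int getOpenCount() {
    return openCount.get();
  }

  public static void main(String[] args) {
    DbHandleCache cache = new DbHandleCache();
    // First access, corresponding to the CommitData step: opens and caches.
    DbHandle atCommit = cache.getOrOpen("/data/container1/db");
    // Second access, corresponding to the PutBlock step: served from the cache.
    DbHandle atPutBlock = cache.getOrOpen("/data/container1/db");
    System.out.println(atCommit == atPutBlock); // prints "true"
    System.out.println(cache.getOpenCount());   // prints "1"
  }
}
```

`ConcurrentHashMap.computeIfAbsent` is used in the sketch so that concurrent callers asking for the same path trigger at most one open; whether the real cache needs the same guarantee depends on how the dispatcher threads reach it.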