asharma4-lucid opened a new issue #2335: URL: https://github.com/apache/hudi/issues/2335
This is in reference to GitHub issue https://github.com/apache/hudi/issues/2269.

To mitigate the problem of a very high number of value-based partitions, we want to shift to range-based partitions. The value-based partitions had reached ~900K and were still growing, which led to significant increases in Hudi write time to our COW table. This was the case even with the auto cleaner turned off. Our understanding is that if there are fewer partitions and their count stays more or less constant, the write performance issue can be mitigated. With that in mind, we have the following questions:

1) We have already seeded a Hudi table with ~900K partitions based on the value of a column. Since subsequent writes are becoming progressively slower (even with the auto cleaner turned off), is there a Hudi utility to copy one Hudi table to another Hudi table? If there is one, can that utility be used to change the existing partitioning strategy to something else (range-based in our case)?

2) In your opinion, what is the optimum number of partitions for Hudi writes to work efficiently? We think 1000 should be a good number, but wanted to see whether this can be arrived at by some logic or other reasoning.

3) The reason we are moving to range-based partitions is that we want the ability to remove lower-value (older) range partitions from the Hudi table as we continue to add higher-value (newer) range partitions. We want to limit the number of partitions the Hudi table has at any time to a constant value (1000, if that turns out to be a good number based on question 2 above). With this in mind, our questions are as follows:

a) Can we remove partitions from the Hudi table by simply deleting the underlying directory for the partition? Or is there additional metadata in the Hudi table structure that would also need to be updated?
b) Can we just copy the older partition folders to a different Hudi table for cold storage? If so, would it be possible to add them back to the Hudi table manually by restoring the partition directory of the Hudi table? I guess this is related to question 3.a above.
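
For illustration, the range-based scheme described in questions 2 and 3 could be sketched roughly as follows. This assumes a non-negative integer partition column and a fixed range width. The names `range_partition` and `partitions_to_retire` and the width of 1000 are hypothetical and not part of Hudi, and the sketch only computes labels; it does not touch any Hudi files or metadata.

```python
from typing import Iterable, List


def range_partition(value: int, width: int) -> str:
    """Map a non-negative integer value to a range label such as '2000-2999'."""
    if value < 0 or width <= 0:
        raise ValueError("value must be >= 0 and width must be > 0")
    start = (value // width) * width
    return f"{start}-{start + width - 1}"


def partitions_to_retire(labels: Iterable[str], max_partitions: int) -> List[str]:
    """Return the lowest-range labels that exceed the max_partitions cap."""
    if max_partitions < 0:
        raise ValueError("max_partitions must be >= 0")
    # Sort by the numeric start of each range so the oldest ranges come first.
    ordered = sorted(set(labels), key=lambda label: int(label.split("-")[0]))
    excess = len(ordered) - max_partitions
    return ordered[:excess] if excess > 0 else []


if __name__ == "__main__":
    current = [range_partition(v, 1000) for v in (10, 1500, 2500, 2999)]
    print(current)
    print(partitions_to_retire(current, max_partitions=2))
```

Whether the retired ranges can then be dropped by deleting their directories is exactly what questions 3.a and 3.b ask.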