asharma4-lucid opened a new issue #2335:
URL: https://github.com/apache/hudi/issues/2335


   This is in reference to the GitHub issue
https://github.com/apache/hudi/issues/2269
   To mitigate the problems caused by a very high number of value-based 
partitions, we want to switch to range-based partitions. The value-based 
partitions had reached ~900K and were still growing, which led to 
significant increases in Hudi write time for our COW table. This was the case 
even with the auto cleaner turned off. Our understanding is that if the 
number of partitions is smaller and stays more or less constant, the write 
performance issue can be mitigated. With that in mind, we have the following 
questions:
   
   1) We have already seeded a Hudi table with ~900K partitions based on the 
value of a column. Since subsequent writes are becoming progressively 
slower (even with the auto cleaner turned off), we wanted to know whether 
there is a Hudi utility to copy one Hudi table to another. If there is one, 
can that utility be used to change the existing partitioning strategy to 
something else (range-based in our case)?
   
   2) In your opinion, what is the optimal number of partitions for Hudi 
writes to work efficiently? We think 1000 would be a good number, but wanted 
to see whether it can be derived by some logic or other reasoning.
   
   3) We are choosing range-based partitions because we want the ability to 
remove lower-value (older) range partitions from the Hudi table as we 
continue to add higher-value (newer) ones. The goal is to cap the number of 
partitions in the Hudi table at any given time at a constant value (1000, if 
that turns out to be a good number per question 2 above). With this in mind, 
our questions are as follows:
   
   a) Can we remove partitions from the Hudi table simply by deleting the 
underlying partition directory? Or does the Hudi table structure hold 
additional metadata that would also need to be updated?
   
   b) Can we just copy the older partition folders to a different Hudi table 
for cold storage? If so, would it be possible to add them back to the Hudi 
table manually by restoring the partition directory? I guess this is related 
to question 3.a above.
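
   For illustration only, the range-based scheme and the fixed partition cap 
described in question 3 could be sketched as below. The bucket width, the 
zero-padded label format, and both function names are assumptions made for 
this example; none of them are Hudi APIs, and the sketch only computes 
labels without touching any table.

```python
from typing import List


def range_partition(value: int, width: int) -> str:
    """Map a non-negative integer column value to a range-partition label.

    Hypothetical helper: the label format is an assumption, not a Hudi convention.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if value < 0:
        raise ValueError("value must be non-negative")
    lower = (value // width) * width  # inclusive lower bound of the bucket
    upper = lower + width - 1         # inclusive upper bound of the bucket
    # Zero-padding keeps lexical order identical to numeric order.
    return f"{lower:012d}-{upper:012d}"


def partitions_to_retire(partitions: List[str], max_partitions: int) -> List[str]:
    """Return the lowest-range labels that exceed the retention cap."""
    ordered = sorted(set(partitions))
    excess = len(ordered) - max_partitions
    return ordered[:excess] if excess > 0 else []
```

   With a width of 1000, every value from 1000 to 1999 maps to the same 
label, so the partition count grows with the value range rather than with 
the number of distinct values. Whether the retired partitions can then be 
dropped by deleting their directories is exactly what question 3.a asks.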


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

