Hi, we have a requirement to ingest 30M records into S3 backed by Hudi. While working out the partition strategy, I am ending up with a very large number of partitions: 25M (primary partition) --> 2.5M (secondary partition) --> 2.5M (third partition), and each parquet file ends up holding fewer than 10 rows of data.
The dataset will be ingested in full once, and after that it will receive daily incremental loads of fewer than 1k updates, so the workload is read-heavy rather than write-heavy. From a Hudi performance standpoint, what would you suggest: go ahead with the partition strategy above, or reduce the number of partitions and increase the number of rows in each parquet file?
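For reference, here is a minimal PySpark sketch of what the second option (fewer, coarser partitions with larger parquet files) might look like. The table name, column names (record_id, event_date, updated_at), and S3 paths are hypothetical placeholders, not from our actual schema; the Hudi options shown are the standard datasource write configs.

```python
# Minimal sketch: bulk-insert the full load with a single coarse partition column,
# sized so each parquet file holds many rows instead of <10.
# All names/paths below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-bulk-ingest").getOrCreate()
df = spark.read.parquet("s3://source-bucket/raw/")  # hypothetical source path

hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "record_id",
    # One coarse partition column instead of a three-level hierarchy
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.precombine.field": "updated_at",
    # bulk_insert for the one-time full load; the daily <1k updates would use upsert
    "hoodie.datasource.write.operation": "bulk_insert",
    # Target ~120 MB parquet files so each file packs many rows
    "hoodie.parquet.max.file.size": 125829120,
    "hoodie.parquet.small.file.limit": 104857600,
}

(df.write.format("hudi")
   .options(**hudi_options)
   .mode("overwrite")
   .save("s3://target-bucket/hudi/my_table/"))
```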
