Hi folks,

We have a use case where we want to ingest data concurrently into different partitions. Currently, Hudi doesn't support concurrent writes to the same Hudi table.
One of the approaches we were considering is to use one Hudi table per partition of data. Say we have 1000 partitions; we would then have 1000 Hudi tables, which would let us write to each partition concurrently. The metadata for each partition would be synced to a single metastore table (the assumption here is that the schema is the same for all partitions), so that single metastore table can serve all Spark and Hive queries. Essentially, the metastore glues the data from all the different Hudi tables together into one logical table.

We have already tested this approach and it works fine: each partition gets its own timeline and its own Hudi table. We wanted to know if there are any gotchas or other issues with this approach to enabling concurrent writes, or whether there are other approaches we could take.

Thanks,
Shayan
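To make the pattern concrete, here is a minimal sketch of the concurrency idea in plain Python, without Hudi or Spark: each partition writes into its own table directory with its own commit marker (standing in for a per-table timeline), so writers never contend on shared state. All names here (`write_partition`, `base_dir`, the file layout) are illustrative, not Hudi APIs.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_partition(base_dir, partition, records):
    """Write one partition's records into its own table directory.

    Each directory has an independent commit marker, mimicking the
    one-timeline-per-table isolation described above (hypothetical layout).
    """
    table_dir = os.path.join(base_dir, f"table_{partition}")
    os.makedirs(table_dir, exist_ok=True)
    # Data file for this commit.
    with open(os.path.join(table_dir, "data.json"), "w") as f:
        json.dump(records, f)
    # Per-table "timeline": a commit marker local to this table only.
    with open(os.path.join(table_dir, "commit.json"), "w") as f:
        json.dump({"partition": partition, "count": len(records)}, f)
    return partition

base_dir = tempfile.mkdtemp()
partitions = {p: [{"id": i, "part": p} for i in range(3)] for p in range(4)}

# One concurrent writer per partition; no coordination is needed because
# each writer touches a disjoint table directory.
with ThreadPoolExecutor(max_workers=4) as pool:
    done = sorted(pool.map(
        lambda p: write_partition(base_dir, p, partitions[p]),
        partitions))
```

In the real setup, each `write_partition` call would be a Spark job writing to its own Hudi table path, and a sync step would register all the table paths under one metastore table.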