Hi, do we have support for concurrent writes in 0.6? I have a similar requirement to ingest in parallel from multiple jobs. I am OK even if parallel writes are only supported across different partitions.
On Thu, 9 Jul 2020 at 9:22 AM, Vinoth Chandar <vin...@apache.org> wrote:

> We are looking into adding support for parallel writers in 0.6.0, so that
> should help.
>
> I am curious to understand, though, why you prefer to have 1000 different
> writer jobs as opposed to having just one writer. The typical use cases
> for parallel writing I have seen are related to backfills and such.
>
> +1 to Mario's comment. Can't think of anything else, if your users are
> happy querying 1000 tables.
>
> On Wed, Jul 8, 2020 at 7:28 AM Mario de Sá Vera <desav...@gmail.com>
> wrote:
>
> > Hey Shayan,
> >
> > That actually seems like a very good approach. I am just curious about
> > the Glue metastore you mentioned: would it be an external metastore for
> > Spark to query over? External in the sense of not being managed by Hudi?
> >
> > That would be my only concern: how to keep all the metadata partitions
> > in sync. But again, a very promising approach!
> >
> > Regards,
> >
> > Mario
> >
> > On Wed, Jul 8, 2020 at 3:20 PM, Shayan Hati <shayanh...@gmail.com>
> > wrote:
> >
> > > Hi folks,
> > >
> > > We have a use case where we want to ingest data concurrently for
> > > different partitions. Currently, Hudi doesn't support concurrent
> > > writes on the same Hudi table.
> > >
> > > One of the approaches we were considering is to use one Hudi table
> > > per partition of data. So if we have 1000 partitions, we will have
> > > 1000 Hudi tables, which will enable us to write concurrently to each
> > > partition. The metadata for each partition will be synced to a single
> > > metastore table (the assumption being that the schema is the same for
> > > all partitions), so this single metastore table can be used for all
> > > Spark and Hive queries when querying the data. Essentially, this
> > > metastore glues the data from all the different Hudi tables together
> > > into a single table.
> > >
> > > We have already tested this approach and it is working fine; each
> > > partition has its own timeline and Hudi table.
> > >
> > > We wanted to know if there are any gotchas or other issues with this
> > > approach to enabling concurrent writes, or if there are other
> > > approaches we could take?
> > >
> > > Thanks,
> > > Shayan
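The isolation idea behind the table-per-partition approach can be sketched with a toy example (plain Python standing in for actual Hudi writers; `write_partition`, the directory layout, and the commit-file naming here are illustrative assumptions, not Hudi APIs): because each writer appends only to its own table's private timeline, the jobs need no coordination or locking between them.

```python
import json
import os
import tempfile
import threading
import time

def write_partition(base_path, partition, records):
    """Toy stand-in for a Hudi writer: each partition owns its own
    'table' directory with an independent commit timeline."""
    table_path = os.path.join(base_path, f"table_{partition}")
    timeline = os.path.join(table_path, ".timeline")
    os.makedirs(timeline, exist_ok=True)
    # Suffix with the partition id so instants never collide across writers.
    commit_ts = time.strftime("%Y%m%d%H%M%S") + f"_{partition}"
    # Data file produced by this commit.
    with open(os.path.join(table_path, f"{commit_ts}.json"), "w") as f:
        json.dump(records, f)
    # Commit marker on this table's private timeline -- no lock needed,
    # because no other writer ever touches this table.
    with open(os.path.join(timeline, f"{commit_ts}.commit"), "w") as f:
        f.write("COMPLETED")

base = tempfile.mkdtemp()
threads = [
    threading.Thread(target=write_partition,
                     args=(base, p, [{"partition": p, "value": p * 10}]))
    for p in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each partition ended up with its own table and its own timeline.
tables = sorted(d for d in os.listdir(base) if d.startswith("table_"))
print(tables)  # ['table_0', 'table_1', 'table_2', 'table_3']
```

In real Hudi terms, each `table_p` would carry its own `.hoodie` timeline, and the metastore sync (to Glue or Hive, as proposed above) is what presents the 1000 physical tables as one logical table to query engines; the trade-off is that any cross-table metadata (schema evolution, cleaning policies) must now be kept consistent across 1000 timelines by the operator rather than by Hudi.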