Thank you so much.

On Mon, 19 Oct 2020 at 10:16 PM, Balaji Varadarajan <v.bal...@ymail.com.invalid> wrote:
> We are planning to add parallel writing to Hudi (at different partition
> levels) in the next release.
>
> Balaji.V
>
> On Friday, October 16, 2020, 11:54:51 PM PDT, tanu dua <
> tanu.dua...@gmail.com> wrote:
>
> > Hi,
> > Do we have support for concurrent writes in 0.6? I have a similar
> > requirement to ingest in parallel from multiple jobs. I am fine even if
> > parallel writes are only supported across different partitions.
> >
> > On Thu, 9 Jul 2020 at 9:22 AM, Vinoth Chandar <vin...@apache.org> wrote:
> >
> > > We are looking into adding support for parallel writers in 0.6.0. So
> > > that should help.
> > >
> > > I am curious to understand, though, why you prefer to have 1000
> > > different writer jobs as opposed to having just one writer. Typical use
> > > cases for parallel writing I have seen are related to backfills and
> > > such.
> > >
> > > +1 to Mario's comment. Can't think of anything else if your users are
> > > happy querying 1000 tables.
> > >
> > > On Wed, Jul 8, 2020 at 7:28 AM Mario de Sá Vera <desav...@gmail.com>
> > > wrote:
> > >
> > > > hey Shayan,
> > > >
> > > > that seems actually a very good approach ... just curious about the
> > > > glue metastore you mentioned. Would it be an external metastore for
> > > > Spark to query over? External in the sense of not being managed by
> > > > Hudi?
> > > >
> > > > that would be my only concern ... how to maintain the sync between
> > > > all metadata partitions, but, again, a very promising approach!
> > > >
> > > > regards,
> > > >
> > > > Mario.
> > > >
> > > > On Wed, 8 Jul 2020 at 15:20, Shayan Hati <shayanh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > We have a use case where we want to ingest data concurrently for
> > > > > different partitions. Currently Hudi doesn't support concurrent
> > > > > writes on the same Hudi table.
> > > > >
> > > > > One of the approaches we were considering was to use one Hudi table
> > > > > per partition of data. So if we have 1000 partitions, we will have
> > > > > 1000 Hudi tables, which enables us to write concurrently to each
> > > > > partition. The metadata for each partition will be synced to a
> > > > > single metastore table (the assumption here is that the schema is
> > > > > the same for all partitions), so this single metastore table can be
> > > > > used for all the Spark and Hive queries when querying data.
> > > > > Basically, this metastore table glues all the different Hudi table
> > > > > data together into a single table.
> > > > >
> > > > > We have already tested this approach and it is working fine; each
> > > > > partition has its own timeline and Hudi table.
> > > > >
> > > > > We wanted to know if there are any gotchas or other issues with
> > > > > this approach to enabling concurrent writes? Or are there any other
> > > > > approaches we could take?
> > > > >
> > > > > Thanks,
> > > > > Shayan
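A minimal sketch of the approach Shayan describes above, assuming Spark with Scala: each job writes its own Hudi table under a per-partition base path, and a single pre-created external table in the shared (Glue/Hive) metastore is pointed at those paths. The bucket, table, and column names below are illustrative assumptions, not details from the thread.

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

object PerPartitionHudiWriter {
  def main(args: Array[String]): Unit = {
    // Hudi's Spark datasource writer needs Kryo serialization.
    val spark = SparkSession.builder()
      .appName("per-partition-hudi-writer")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical inputs: the logical partition this job owns and its data.
    val dt = "2020-07-08"
    val df = spark.read.json(s"s3://my-bucket/incoming/$dt/").withColumn("dt", lit(dt))

    // One Hudi table (its own base path and timeline) per logical partition,
    // so concurrent jobs never write to the same table.
    val basePath = s"s3://my-bucket/hudi/events/$dt"

    df.write.format("hudi")
      .option("hoodie.table.name", s"events_$dt")
      .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
      .option("hoodie.datasource.write.recordkey.field", "id")   // assumed key column
      .option("hoodie.datasource.write.precombine.field", "ts")  // assumed ordering column
      .option("hoodie.datasource.write.partitionpath.field", "dt")
      .mode(SaveMode.Append)
      .save(basePath)

    // The "glue" layer: one pre-created external table (analytics.events here,
    // purely illustrative) in the shared metastore, with one partition pointing
    // at each per-partition Hudi path. With the default key generator the data
    // lands under basePath/<dt>/. For reads that must skip older file versions
    // the table would need Hudi's HoodieParquetInputFormat; omitted here.
    spark.sql(
      s"""ALTER TABLE analytics.events
         |ADD IF NOT EXISTS PARTITION (dt='$dt')
         |LOCATION '$basePath/$dt'""".stripMargin)

    spark.stop()
  }
}

Because every job owns a distinct base path and timeline, no two writers ever contend on the same Hudi table; the trade-off, as Vinoth notes, is operating 1000 tables and keeping the shared metastore table's partitions in sync.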