Hi, do we have support for concurrent writes in 0.6? I have a similar requirement to ingest in parallel from multiple jobs. I am OK even if parallel writes are only supported across different partitions.
On Thu, 9 Jul 2020 at 9:22 AM, Vinoth Chandar <vin...@apache.org> wrote:

> We are looking into adding support for parallel writers in 0.6.0, so that
> should help.
>
> I am curious to understand, though, why you prefer to have 1000 different
> writer jobs as opposed to having just one writer. The typical use cases
> for parallel writing I have seen are related to backfills and such.
>
> +1 to Mario's comment. Can't think of anything else, if your users are
> happy querying 1000 tables.
>
> On Wed, Jul 8, 2020 at 7:28 AM Mario de Sá Vera <desav...@gmail.com>
> wrote:
>
> > Hey Shayan,
> >
> > That actually seems like a very good approach. I am just curious about
> > the Glue metastore you mentioned: would it be an external metastore for
> > Spark to query over? External in the sense of not being managed by Hudi?
> >
> > That would be my only concern: how to keep all the metadata partitions
> > in sync. But again, a very promising approach!
> >
> > Regards,
> >
> > Mario
> >
> > On Wed, Jul 8, 2020 at 3:20 PM, Shayan Hati <shayanh...@gmail.com>
> > wrote:
> >
> > > Hi folks,
> > >
> > > We have a use case where we want to ingest data concurrently for
> > > different partitions. Currently, Hudi doesn't support concurrent
> > > writes on the same Hudi table.
> > >
> > > One of the approaches we were considering is to use one Hudi table
> > > per partition of data. So if we have 1000 partitions, we will have
> > > 1000 Hudi tables, which will enable us to write concurrently to each
> > > partition. The metadata for each partition will be synced to a single
> > > metastore table (the assumption being that the schema is the same for
> > > all partitions), so this single metastore table can be used for all
> > > Spark and Hive queries when querying the data. Essentially, this
> > > metastore glues the data from all the different Hudi tables together
> > > into a single table.
> > >
> > > We have already tested this approach and it is working fine; each
> > > partition has its own timeline and Hudi table.
> > >
> > > We wanted to know if there are any gotchas or other issues with this
> > > approach to enabling concurrent writes, or if there are other
> > > approaches we could take?
> > >
> > > Thanks,
> > > Shayan
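The isolation idea behind the table-per-partition approach can be sketched with a toy example (plain Python standing in for actual Hudi writers; `write_partition`, the directory layout, and the commit-file naming here are illustrative assumptions, not Hudi APIs): because each writer appends only to its own table's private timeline, the jobs need no coordination or locking between them.

```python
import json
import os
import tempfile
import threading
import time

def write_partition(base_path, partition, records):
    """Toy stand-in for a Hudi writer: each partition owns its own
    'table' directory with an independent commit timeline."""
    table_path = os.path.join(base_path, f"table_{partition}")
    timeline = os.path.join(table_path, ".timeline")
    os.makedirs(timeline, exist_ok=True)
    # Suffix with the partition id so instants never collide across writers.
    commit_ts = time.strftime("%Y%m%d%H%M%S") + f"_{partition}"
    # Data file produced by this commit.
    with open(os.path.join(table_path, f"{commit_ts}.json"), "w") as f:
        json.dump(records, f)
    # Commit marker on this table's private timeline -- no lock needed,
    # because no other writer ever touches this table.
    with open(os.path.join(timeline, f"{commit_ts}.commit"), "w") as f:
        f.write("COMPLETED")

base = tempfile.mkdtemp()
threads = [
    threading.Thread(target=write_partition,
                     args=(base, p, [{"partition": p, "value": p * 10}]))
    for p in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each partition ended up with its own table and its own timeline.
tables = sorted(d for d in os.listdir(base) if d.startswith("table_"))
print(tables)  # ['table_0', 'table_1', 'table_2', 'table_3']
```

In real Hudi terms, each `table_p` would carry its own `.hoodie` timeline, and the metastore sync (to Glue or Hive, as proposed above) is what presents the 1000 physical tables as one logical table to query engines; the trade-off is that any cross-table metadata (schema evolution, cleaning policies) must now be kept consistent across 1000 timelines by the operator rather than by Hudi.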