Thank you so much.

On Mon, 19 Oct 2020 at 10:16 PM, Balaji Varadarajan <v.bal...@ymail.com.invalid> wrote:
> We are planning to add parallel writing to Hudi (at different partition
> levels) in the next release.
>
> Balaji.V
>
> On Friday, October 16, 2020, 11:54:51 PM PDT, tanu dua <
> tanu.dua...@gmail.com> wrote:
>
> > Hi,
> > Do we have support for concurrent writes in 0.6? I have a similar
> > requirement to ingest in parallel from multiple jobs. I am fine even if
> > parallel writes are only supported across different partitions.
> >
> > On Thu, 9 Jul 2020 at 9:22 AM, Vinoth Chandar <vin...@apache.org> wrote:
> >
> > > We are looking into adding support for parallel writers in 0.6.0. So
> > > that should help.
> > >
> > > I am curious to understand, though, why you prefer to have 1000
> > > different writer jobs as opposed to having just one writer. Typical use
> > > cases for parallel writing I have seen are related to backfills and
> > > such.
> > >
> > > +1 to Mario's comment. Can't think of anything else if your users are
> > > happy querying 1000 tables.
> > >
> > > On Wed, Jul 8, 2020 at 7:28 AM Mario de Sá Vera <desav...@gmail.com>
> > > wrote:
> > >
> > > > hey Shayan,
> > > >
> > > > that seems actually a very good approach ... just curious about the
> > > > glue metastore you mentioned. Would it be an external metastore for
> > > > Spark to query over? External in the sense of not being managed by
> > > > Hudi?
> > > >
> > > > that would be my only concern ... how to maintain the sync between
> > > > all metadata partitions, but, again, a very promising approach!
> > > >
> > > > regards,
> > > >
> > > > Mario.
> > > >
> > > > On Wed, 8 Jul 2020 at 15:20, Shayan Hati <shayanh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > We have a use case where we want to ingest data concurrently for
> > > > > different partitions. Currently Hudi doesn't support concurrent
> > > > > writes on the same Hudi table.
> > > > >
> > > > > One of the approaches we were considering was to use one Hudi table
> > > > > per partition of data. So if we have 1000 partitions, we will have
> > > > > 1000 Hudi tables, which enables us to write concurrently to each
> > > > > partition. The metadata for each partition will be synced to a
> > > > > single metastore table (the assumption here is that the schema is
> > > > > the same for all partitions), so this single metastore table can be
> > > > > used for all the Spark and Hive queries when querying data.
> > > > > Basically, this metastore table glues all the different Hudi table
> > > > > data together into a single table.
> > > > >
> > > > > We have already tested this approach and it is working fine; each
> > > > > partition has its own timeline and Hudi table.
> > > > >
> > > > > We wanted to know if there are any gotchas or other issues with
> > > > > this approach to enabling concurrent writes? Or are there any other
> > > > > approaches we could take?
> > > > >
> > > > > Thanks,
> > > > > Shayan
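A minimal sketch of the approach Shayan describes above, assuming Spark with Scala: each job writes its own Hudi table under a per-partition base path, and a single pre-created external table in the shared (Glue/Hive) metastore is pointed at those paths. The bucket, table, and column names below are illustrative assumptions, not details from the thread.

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

object PerPartitionHudiWriter {
  def main(args: Array[String]): Unit = {
    // Hudi's Spark datasource writer needs Kryo serialization.
    val spark = SparkSession.builder()
      .appName("per-partition-hudi-writer")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical inputs: the logical partition this job owns and its data.
    val dt = "2020-07-08"
    val df = spark.read.json(s"s3://my-bucket/incoming/$dt/").withColumn("dt", lit(dt))

    // One Hudi table (its own base path and timeline) per logical partition,
    // so concurrent jobs never write to the same table.
    val basePath = s"s3://my-bucket/hudi/events/$dt"

    df.write.format("hudi")
      .option("hoodie.table.name", s"events_$dt")
      .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
      .option("hoodie.datasource.write.recordkey.field", "id")   // assumed key column
      .option("hoodie.datasource.write.precombine.field", "ts")  // assumed ordering column
      .option("hoodie.datasource.write.partitionpath.field", "dt")
      .mode(SaveMode.Append)
      .save(basePath)

    // The "glue" layer: one pre-created external table (analytics.events here,
    // purely illustrative) in the shared metastore, with one partition pointing
    // at each per-partition Hudi path. With the default key generator the data
    // lands under basePath/<dt>/. For reads that must skip older file versions
    // the table would need Hudi's HoodieParquetInputFormat; omitted here.
    spark.sql(
      s"""ALTER TABLE analytics.events
         |ADD IF NOT EXISTS PARTITION (dt='$dt')
         |LOCATION '$basePath/$dt'""".stripMargin)

    spark.stop()
  }
}

Because every job owns a distinct base path and timeline, no two writers ever contend on the same Hudi table; the trade-off, as Vinoth notes, is operating 1000 tables and keeping the shared metastore table's partitions in sync.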