We are looking into adding support for parallel writers in 0.6.0. So that
should help.

I am curious to understand, though, why you prefer to have 1000 different
writer jobs as opposed to having just one writer. The typical use cases for
parallel writing that I have seen are related to backfills and such.

+1 to Mario’s comment. I can’t think of anything else if your users are happy
querying the 1000 tables.
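
For reference, here is a rough PySpark sketch of what the setup described
below could look like: one Hudi table per data partition, each one registered
as a partition of a single external metastore table. All names (the
analytics.events_all table, the event_id/ts fields, the dt partition column,
the S3 base path) are placeholders rather than anything from this thread, the
ALTER TABLE call is just one plausible way to do the metastore sync you
describe, and each write_partition call would normally run in its own writer
job to get the concurrency.

# Sketch only: one Hudi table per data partition, glued together by a
# single external metastore table. Names and paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("per-partition-hudi-writer")
         .enableHiveSupport()
         .getOrCreate())

BASE_PATH = "s3://my-bucket/events"       # placeholder base path
UNIFIED_TABLE = "analytics.events_all"    # the single table users query


def write_partition(df, dt):
    """Write one data partition into its own Hudi table (its own timeline),
    then register that table's path as a partition of the unified table."""
    table_name = f"events_{dt}"
    table_path = f"{BASE_PATH}/{dt}"

    (df.write.format("hudi")  # use "org.apache.hudi" on older releases
       .option("hoodie.table.name", table_name)
       .option("hoodie.datasource.write.recordkey.field", "event_id")
       .option("hoodie.datasource.write.precombine.field", "ts")
       .option("hoodie.datasource.write.operation", "upsert")
       .mode("append")
       .save(table_path))

    # The "glue" step: expose this Hudi table as one partition of a single
    # external metastore table, so queries see one logical table.
    spark.sql(f"""
        ALTER TABLE {UNIFIED_TABLE}
        ADD IF NOT EXISTS PARTITION (dt='{dt}')
        LOCATION '{table_path}'
    """)

One thing to watch with this layout: for the unified table to return correct
snapshot results, it would need to be defined with Hudi's
HoodieParquetInputFormat (for copy-on-write tables), and Spark queries would
need spark.sql.hive.convertMetastoreParquet=false so that the input format is
actually honoured.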

On Wed, Jul 8, 2020 at 7:28 AM Mario de Sá Vera <desav...@gmail.com> wrote:

> hey Shayan,
>
> that actually seems like a very good approach ... just curious about the Glue
> metastore you mentioned. Would it be an external metastore for Spark to
> query over? External in the sense of not being managed by Hudi?
>
> That would be my only concern ... how to maintain the sync between all the
> metadata partitions. But, again, a very promising approach!
>
> regards,
>
> Mario.
>
> On Wed, Jul 8, 2020 at 3:20 PM Shayan Hati <shayanh...@gmail.com>
> wrote:
>
> > Hi folks,
> >
> > We have a use case where we want to ingest data concurrently for
> > different partitions. Currently, Hudi doesn't support concurrent writes
> > on the same Hudi table.
> >
> > One of the approaches we were considering was to use one Hudi table per
> > partition of data. So, let us say we have 1000 partitions; we would then
> > have 1000 Hudi tables, which would enable us to write concurrently to
> > each partition. The metadata for each partition would be synced to a
> > single metastore table (the assumption here is that the schema is the
> > same for all partitions), so this single metastore table can be used for
> > all the Spark and Hive queries when querying data. Basically, this
> > metastore glues all the different Hudi table data together into a single
> > table.
> >
> > We have already tested this approach and it is working fine: each
> > partition has its own timeline and Hudi table.
> >
> > We wanted to know if there are any gotchas or other issues with this
> > approach to enabling concurrent writes, or if there are any other
> > approaches we could take.
> >
> > Thanks,
> > Shayan
> >
>
