Great points, Prashant!

> So one partition has to ingest much more data than another. Basically,
> one large partition delta affects the ingestion time of a smaller size
> partition as well

So this indicates you want to commit writes to the smaller partitions first,
without waiting for the larger partitions. You do need concurrent writing
in that case.
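
To make that concrete, here is a rough Spark/Scala sketch of what a
multi-writer setup could look like once parallel writing lands. The
lock-provider options below follow the optimistic-concurrency configs from
later Hudi releases, so treat the exact keys and classes as assumptions, and
the table/column names (events, event_id, dt, updated_at) are made up:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical example: several jobs like this run concurrently, each
// upserting the delta for a different partition of the same table.
def writePartitionDelta(delta: DataFrame, basePath: String): Unit = {
  delta.write.format("hudi")
    .option("hoodie.table.name", "events")                             // assumed table name
    .option("hoodie.datasource.write.recordkey.field", "event_id")     // assumed key column
    .option("hoodie.datasource.write.partitionpath.field", "dt")       // assumed partition column
    .option("hoodie.datasource.write.precombine.field", "updated_at")  // assumed ordering column
    .option("hoodie.datasource.write.operation", "upsert")
    // Multi-writer settings as they appear in releases with OCC support:
    .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
    .option("hoodie.cleaner.policy.failed.writes", "LAZY")
    .option("hoodie.write.lock.provider",
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
    .option("hoodie.write.lock.zookeeper.url", "zk-host")       // placeholder ZK endpoint
    .option("hoodie.write.lock.zookeeper.port", "2181")
    .option("hoodie.write.lock.zookeeper.lock_key", "events")
    .option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks")
    .mode(SaveMode.Append)
    .save(basePath)
}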


> Also failure/corrupt data of one partition delta affects others if we
have single write. So we wanted these writes to be independent per
partition.

This actually seems to indicate you want restores/rollbacks at the
partition level. So if this is a hard requirement, then even concurrent
writing won’t help; you actually need separate physical tables with their
own timelines.
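
For illustration, a minimal sketch of that layout, assuming a hypothetical
per-partition base path (/data/events/<partition>). Each base path carries
its own .hoodie timeline, so a bad delta can be rolled back (e.g. via
hudi-cli) for just that one partition without touching the others:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical example: one physical Hudi table (and timeline) per partition.
def writeOnePartitionTable(delta: DataFrame, partition: String): Unit = {
  delta.write.format("hudi")
    .option("hoodie.table.name", s"events_$partition")                 // assumed naming scheme
    .option("hoodie.datasource.write.recordkey.field", "event_id")     // assumed key column
    .option("hoodie.datasource.write.precombine.field", "updated_at")  // assumed ordering column
    .option("hoodie.datasource.write.operation", "upsert")
    .mode(SaveMode.Append)
    .save(s"/data/events/$partition")  // separate base path => separate timeline
}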


On Thu, Jul 9, 2020 at 12:49 PM Prashant Wason <pwa...@uber.com.invalid>
wrote:

> With a large number of tables you also run into the following potential
> issues:
> 1. Consistency: There is no single timeline, so different tables (one per
> partition) expose data from different ingestion times. If the data across
> partitions is inter-dependent, then queries may see inconsistent results.
>
> 2. Complicated error handling / debugging: If some of the pipelines fail,
> then data in some partitions may not have been updated for some time. This
> may lead to data consistency issues on the query side. Debugging any issue
> when 1000 separate datasets are involved is much more complicated than with
> a single dataset (e.g. hudi-cli connects to one dataset at a time).
>
> 3. (Possibly minor) Excess load on the infra: With several parallel
> operations, the worst-case load on the NameNode may go up N times (N = number
> of parallel pipelines). An under-provisioned NameNode may lead to
> out-of-resource errors.
>
> 4. Adding new partitions would be complicated: Assuming you would want a
> new partition in the future, the steps would be more involved.
>
> If only a few partitions are having load issues, you can also look into
> changing the partitioning scheme:
> 1. Maybe introduce a new column in the schema that is more
> uniformly distributed
> 2. Maybe split the heavily loaded partitions into two partitions (range-based
> or something similar)
> 3. If possible (depending on the ingestion source), prioritize ingestion
> for particular partitions (partition priority queue)
> 4. Limit the number of records ingested at a time to cap the maximum job
> time
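
(A quick illustrative Spark/Scala sketch of suggestions 1 and 2 above:
derive a bucket column from a skewed key so the partitions become more
uniform. The column names customer_id and dt are made up.)

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{abs, col, concat_ws, hash}

def addBucketedPartitionPath(delta: DataFrame, numBuckets: Int = 16): DataFrame = {
  // Hypothetical example: spread a skewed key (customer_id) across N buckets
  // and combine it with the existing date column into a new partition path.
  delta
    .withColumn("bucket", (abs(hash(col("customer_id"))) % numBuckets).cast("string"))
    .withColumn("partition_path", concat_ws("/", col("dt"), col("bucket")))
  // then point hoodie.datasource.write.partitionpath.field at "partition_path"
}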
>
> Thanks
> Prashant
>
>
> On Thu, Jul 9, 2020 at 12:00 AM Shayan Hati <shayanh...@gmail.com> wrote:
>
> > Thanks for your response.
> >
> > @Mario: So the metastore can be something like a Glue/Hive metastore,
> > which basically has the metadata about different partitions in a single
> > table. One challenge is that a per-partition Hudi table can be queried
> > using the Hudi library bundle, but across partitions it has to be queried
> > via the metastore itself.
> >
> > @Vinoth: The use-case is we have different partitions and the data as
> > well as the load is skewed on them. So one partition has to ingest much
> > more data than another. Basically, one large partition delta affects the
> > ingestion time of a smaller size partition as well. Also failure/corrupt
> > data of one partition delta affects others if we have single write. So we
> > wanted these writes to be independent per partition.
> >
> > Also any timeline when 0.6.0 will be released?
> >
> > Thanks,
> > Shayan
> >
> >
> > On Thu, Jul 9, 2020 at 9:22 AM Vinoth Chandar <vin...@apache.org> wrote:
> >
> > > We are looking into adding support for parallel writers in 0.6.0. So
> > > that should help.
> > >
> > > I am curious to understand though why you prefer to have 1000 different
> > > writer jobs, as opposed to having just one writer. Typical use cases for
> > > parallel writing I have seen are related to backfills and such.
> > >
> > > +1 to Mario’s comment. Can’t think of anything else if your users are
> > > happy querying 1000 tables.
> > >
> > > On Wed, Jul 8, 2020 at 7:28 AM Mario de Sá Vera <desav...@gmail.com>
> > > wrote:
> > >
> > > > hey Shayan,
> > > >
> > > > that actually seems a very good approach ... just curious about the
> > > > glue metastore you mentioned. Would it be an external metastore for
> > > > spark to query over ??? external in terms of not managed by Hudi ???
> > > >
> > > > that would be my only concern ... how to maintain the sync between all
> > > > the metadata partitions but, again, a very promising approach!
> > > >
> > > > regards,
> > > >
> > > > Mario.
> > > >
> > > > On Wed, Jul 8, 2020 at 3:20 PM, Shayan Hati <shayanh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > We have a use-case where we want to ingest data concurrently for
> > > > > different partitions. Currently Hudi doesn't support concurrent
> > > > > writes on the same Hudi table.
> > > > >
> > > > > One of the approaches we were thinking of was to use one Hudi table
> > > > > per partition of data. So let us say we have 1000 partitions; we will
> > > > > have 1000 Hudi tables, which will enable us to write concurrently on
> > > > > each partition. And the metadata for each partition will be synced to
> > > > > a single metastore table (the assumption here is that the schema is
> > > > > the same for all partitions). So this single metastore table can be
> > > > > used for all the Spark and Hive queries when querying data. Basically
> > > > > this metastore glues all the different Hudi table data together into
> > > > > a single table.
> > > > >
> > > > > We already tested this approach and it's working fine, and each
> > > > > partition will have its own timeline and Hudi table.
> > > > >
> > > > > We wanted to know if there are any gotchas or other issues with this
> > > > > approach to enabling concurrent writes? Or if there are any other
> > > > > approaches we can take?
> > > > >
> > > > > Thanks,
> > > > > Shayan
> > > > >
> > > >
> > >
> >
> >
> > --
> > Shayan Hati
> >
>
