+1 for the vision. Personally, I find the incremental ETL part promising: with
an engine like Apache Flink we can do intermediate aggregation in a streaming
style.

Best,
Danny Chan

On Wed, Apr 14, 2021 at 9:52 AM, leesf <leesf0...@gmail.com> wrote:

> +1. Cool and promising.
>
> On Wed, Apr 14, 2021 at 2:57 AM, Mehrotra, Udit <udi...@amazon.com.invalid> wrote:
>
> > Agree with the rebranding, Vinoth. Hudi is not just a "table format" and we
> > need to do justice to all the cool auxiliary features/services we have
> > built.
> >
> > Also, a timeline metadata service in particular would be a really big win if
> > we move towards something like that.
> >
> > On 4/13/21, 11:01 AM, "Pratyaksh Sharma" <pratyaks...@gmail.com> wrote:
> >
> >     Definitely we are doing much more than only ingesting and managing data
> >     over DFS.
> >
> >     +1 from my side as well. :)
> >
> >     On Tue, Apr 13, 2021 at 10:02 PM Susu Dong <susudo...@gmail.com> wrote:
> >
> >     > I love this rebranding. Totally agree. +1
> >     >
> >     > On Wed, Apr 14, 2021 at 1:25 AM Raymond Xu <xu.shiyan.raym...@gmail.com> wrote:
> >     >
> >     > > +1 The vision looks fantastic.
> >     > >
> >     > > On Tue, Apr 13, 2021 at 7:45 AM Gary Li <gar...@apache.org> wrote:
> >     > >
> >     > > > Awesome summary of Hudi! +1 as well.
> >     > > >
> >     > > > Gary Li
> >     > > > On 2021/04/13 14:13:24, Rubens Rodrigues <rubenssoto2...@gmail.com> wrote:
> >     > > > > Excellent, I agree
> >     > > > >
> >     > > > > On Tue, Apr 13, 2021 at 07:23, vino yang <yanghua1...@gmail.com> wrote:
> >     > > > >
> >     > > > > > +1 Excited by this new vision!
> >     > > > > >
> >     > > > > > Best,
> >     > > > > > Vino
> >     > > > > >
> >     > > > > > On Tue, Apr 13, 2021 at 3:53 PM, Dianjin Wang <djw...@streamnative.io.invalid> wrote:
> >     > > > > >
> >     > > > > > > +1 The new brand is straightforward, a better description of Hudi.
> >     > > > > > >
> >     > > > > > > Best,
> >     > > > > > > Dianjin Wang
> >     > > > > > >
> >     > > > > > >
> >     > > > > > > On Tue, Apr 13, 2021 at 1:41 PM Bhavani Sudha <bhavanisud...@gmail.com> wrote:
> >     > > > > > >
> >     > > > > > > > +1. Cannot agree more. I think this makes total sense and will
> >     > > > > > > > provide for a much better representation of the project.
> >     > > > > > > >
> >     > > > > > > > On Mon, Apr 12, 2021 at 10:30 PM Vinoth Chandar <vin...@apache.org> wrote:
> >     > > > > > > >
> >     > > > > > > > > Hello all,
> >     > > > > > > > >
> >     > > > > > > > > Reading one more article today, positioning Hudi as just a
> >     > > > > > > > > table format, made me wonder if we have done enough justice
> >     > > > > > > > > in explaining what we have built together here.
> >     > > > > > > > > I tend to think of Hudi as the data lake platform, which has
> >     > > > > > > > > the following components, of which one is a table format and
> >     > > > > > > > > one is a transactional storage layer.
> >     > > > > > > > > But the whole stack we have is definitely worth more than the
> >     > > > > > > > > sum of all the parts IMO (speaking from my own experience
> >     > > > > > > > > from the past 10+ years of open source software dev).
> >     > > > > > > > >
> >     > > > > > > > > Here's what we have built so far.
> >     > > > > > > > >
> >     > > > > > > > > a) *table format*: something that stores the table schema,
> >     > > > > > > > > plus a metadata table that stores file listings today and is
> >     > > > > > > > > being extended to store column ranges and more in the future
> >     > > > > > > > > (RFC-27)
> >     > > > > > > > > b) *aux metadata*: bloom filters and external record-level
> >     > > > > > > > > indexes today; bitmaps/interval trees and other advanced
> >     > > > > > > > > on-disk data structures tomorrow
> >     > > > > > > > > c) *concurrency control*: we have always supported MVCC-based,
> >     > > > > > > > > log-based concurrency (serializing writes into a time-ordered
> >     > > > > > > > > log), and we now also have OCC for batch merge workloads with
> >     > > > > > > > > 0.8.0. We will have multi-table and fully non-blocking
> >     > > > > > > > > writers soon (see the future work section of RFC-22)
> >     > > > > > > > > d) *updates/deletes*: this is the bread-and-butter use-case
> >     > > > > > > > > for Hudi, but we also support primary/unique key constraints,
> >     > > > > > > > > and we could add foreign keys as an extension once our
> >     > > > > > > > > transactions can span tables.
> >     > > > > > > > > e) *table services*: a Hudi pipeline today is self-managing -
> >     > > > > > > > > it sizes files, cleans, compacts, clusters data, bootstraps
> >     > > > > > > > > existing data - all these actions working off each other
> >     > > > > > > > > without blocking one another (for the most part).
> >     > > > > > > > > f) *data services*: we also have higher-level functionality
> >     > > > > > > > > with deltastreamer sources (scalable DFS listing source,
> >     > > > > > > > > Kafka, Pulsar is coming, ...and more), incremental ETL
> >     > > > > > > > > support, de-duplication, and commit callbacks; pre-commit
> >     > > > > > > > > validations are coming, and error tables have been proposed.
> >     > > > > > > > > I could also envision us building towards streaming egress
> >     > > > > > > > > and data monitoring.
> >     > > > > > > > >
> >     > > > > > > > > I also think we should build the following (subject to
> >     > > > > > > > > separate DISCUSS/RFCs)
> >     > > > > > > > >
> >     > > > > > > > > g) *caching service*: a Hudi-specific caching service that
> >     > > > > > > > > can hold mutable data and serve oft-queried data across
> >     > > > > > > > > engines.
> >     > > > > > > > > h) *timeline metaserver*: We already run a metaserver in
> >     > > > > > > > > Spark writers/drivers, backed by RocksDB & even Hudi's
> >     > > > > > > > > metadata table. Let's turn it into a scalable, sharded
> >     > > > > > > > > metastore that all engines can use to obtain any metadata.
> >     > > > > > > > >
> >     > > > > > > > > To this end, I propose we rebrand to "*Data Lake Platform*"
> >     > > > > > > > > as opposed to "ingests & manages storage of large analytical
> >     > > > > > > > > datasets over DFS (hdfs or cloud stores)." and convey the
> >     > > > > > > > > scope of our vision, given we have already been building
> >     > > > > > > > > towards that. It would also provide new contributors a good
> >     > > > > > > > > lens to look at the project from.
> >     > > > > > > > >
> >     > > > > > > > > (This is very similar to, e.g., the evolution of Kafka from a
> >     > > > > > > > > pub-sub system to an event streaming platform, with the
> >     > > > > > > > > addition of MirrorMaker/Connect etc.)
> >     > > > > > > > >
> >     > > > > > > > > Please share your thoughts!
> >     > > > > > > > >
> >     > > > > > > > > Thanks
> >     > > > > > > > > Vinoth
> >     > > > > > > > >
> >     > > > > > > >
> >     > > > > > >
> >     > > > > >
> >     > > > >
> >     > > >
> >     > >
> >     >
> >
> >
>
