Yes. I think we can also store the bloom filter in the log block footer;
that way, if we want to log Avro we can still do so, and indexing would
still work.
I plan to explain this clearly in an RFC. And yes, we will definitely add
other metadata into the headers :)

Our log format is pretty flexible. (good job Nishith! :))
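
As a rough illustration of the layout discussed in this thread (a JSON
header carrying the file type and length, the embedded file's bytes, and a
bloom filter footer), here is a minimal Python sketch. It is not Hudi code:
the block layout, the field names (`format`, `length`, `footer_length`), and
the helpers `write_block` / `read_block` are all hypothetical, and the toy
`BloomFilter` stands in for whatever filter implementation would actually be
used.

```python
import hashlib
import io
import json
import struct


class BloomFilter:
    """Toy Bloom filter over record keys (illustrative, not Hudi's)."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(bits) if bits is not None else bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def write_block(out, file_format, payload, keys):
    """Append one block: [4-byte header length][JSON header][payload][bloom footer]."""
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    footer = bytes(bloom.bits)
    header = json.dumps(
        {"format": file_format, "length": len(payload), "footer_length": len(footer)}
    ).encode()
    out.write(struct.pack(">I", len(header)))
    out.write(header)
    out.write(payload)
    out.write(footer)


def read_block(inp):
    """Read one block; return (header, payload start offset, payload, bloom filter)."""
    (header_len,) = struct.unpack(">I", inp.read(4))
    header = json.loads(inp.read(header_len))
    offset = inp.tell()  # starting offset of the embedded file in the outer log
    payload = inp.read(header["length"])
    bloom = BloomFilter(bits=inp.read(header["footer_length"]))
    return header, offset, payload, bloom


if __name__ == "__main__":
    log = io.BytesIO()
    write_block(log, "parquet", b"PAR1-bytes-PAR1", ["key-1", "key-2"])
    write_block(log, "hfile", b"hfile-bytes", ["key-3"])
    log.seek(0)
    header, offset, payload, bloom = read_block(log)
    print(header["format"], offset, bloom.might_contain("key-1"))
```

In this sketch the starting offset is computed by the reader rather than
stored in the header, which avoids the header's size depending on its own
contents; storing it explicitly would work just as well with a fixed-width
field. Because each block is self-describing, blocks of different formats
can sit side by side in the same outer file.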

On Thu, Oct 24, 2019 at 5:16 AM Jaimin Shah <shahjaimin0...@gmail.com>
wrote:

> Hi Vinoth,
>   I looked at the example. I think this will enable faster real-time and
> incremental views. Along with that, could a bloom filter footer, like the
> one we have in parquet files, help to index log files as well?
>   We are storing the parquet length, which helps us read the file. Would
> a JSON header with file type, starting offset, and length help? I think
> it would enable us to store multiple files of the same format or of
> different formats (parquet + HBase, etc.)?
>
> Thanks,
> Jaimin
>
> On Thu, 24 Oct 2019 at 02:37, Kabeer Ahmed <kab...@linuxmail.org> wrote:
>
> > Vinoth,
> >
> > Thanks for the clarification :-).
> > I only skimmed the email without getting into the details. I will
> > review it thoroughly in a few days and catch up.
> > Thanks,
> > On Oct 23 2019, at 3:06 pm, Vinoth Chandar <vin...@apache.org> wrote:
> > > Yes, to the append log that is used for compaction. My guess was that
> > > Kabeer's concern was about actually sending user data into debug logs
> > > (slf4j/log4j), which we don't do.
> > >
> > > On the second part: yes, we want the option to write parquet data
> > > inline instead of Avro. Once we harden this, any other format, e.g. ORC,
> > > would also be easy to do. That's my thinking. WDYT?
> > >
> > > On Wed, Oct 23, 2019 at 6:28 AM Jaimin Shah <shahjaimin0...@gmail.com>
> > > wrote:
> > >
> > > > Hi Vinoth,
> > > > Aren’t we currently writing user data to the append log? As I
> > > > understand it, data is currently written in Avro, which you want to
> > > > move to inline parquet. Please correct me if I am missing something.
> > > >
> > > > Thanks,
> > > > Jaimin
> > > >
> > > > On Wednesday, 23 October 2019, Vinoth Chandar <vin...@apache.org> wrote:
> > > > > Sure. Take your time! Just to clarify, "log" here refers to the Hudi
> > > > > append log, not the user's log4j or similar logs. Yes, that would be
> > > > > very strange to do. :)
> > > > >
> > > > > On Wed, Oct 23, 2019 at 3:06 AM Kabeer Ahmed <kab...@linuxmail.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Vinoth,
> > > > > > I have a crazy week, and the next 2 to 3 weeks are going to be very
> > > > > > busy. I haven't had a chance to look into this.
> > > > > > My thoughts are around security. The idea of building external
> > > > > > indexes comes with loads of advantages, but throwing user data into
> > > > > > the logs etc. makes me anxious. Let me do a deep dive and come back
> > > > > > to you.
> > > > > > Thanks
> > > > > > Kabeer.
> > > > > >
> > > > > > On Oct 21 2019, at 3:07 pm, Vinoth Chandar <vin...@apache.org> wrote:
> > > > > > > Any thoughts? :) anyone?
> > > > > > >
> > > > > > > On Wed, Oct 9, 2019 at 11:06 AM Vinoth Chandar <vin...@apache.org>
> > > > > > > wrote:
> > > > > > > > Hi all,
> > > > > > > > Wanted to share some prototyping I was doing for HUDI-46. The idea
> > > > > > > > here is to see if we can embed a parquet file "inline" into an
> > > > > > > > outer file (our log), so that if the user chooses to, they can
> > > > > > > > also get parquet data in the logs to speed up real-time view
> > > > > > > > queries. We would be using the standard ParquetWriter and
> > > > > > > > ParquetReader on top of a custom FileSystem implementation.
> > > > > > > >
> > > > > > > > https://github.com/vinothchandar/incubator-hudi/commit/c60f4578f794d0f0d0e194b3e509cc0c5f132576
> > > > > > > >
> > > > > > > > Wrote a small PoC with TODOs and gaps annotated. Wanted to see if
> > > > > > > > you all can poke more holes here and see if we can generalize to
> > > > > > > > embedding any file, e.g. HFile.
> > > > > > > >
> > > > > > > > I believe we can generalize it and thus build things like external
> > > > > > > > indexing very easily on the existing log format.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Vinoth
