Vinoth,
Thanks for the clarification. :-)
I only skimmed the email without getting into the details. I will
review it thoroughly in a few days and catch up.
Thanks,
On Oct 23 2019, at 3:06 pm, Vinoth Chandar <[email protected]> wrote:
> Yes, we write to the append log, which is used for compaction. My guess was
> that Kabeer's concern was about sending user data into debug logs
> (slf4j/log4j), which we don't do.
>
> On the second part: yes, we want the option to write Parquet data inline
> instead of Avro. Once we harden this, any other format (e.g. ORC) would also
> be easy to support. That's my thinking. WDYT?
>
> On Wed, Oct 23, 2019 at 6:28 AM Jaimin Shah <[email protected]> wrote:
>
> > Hi Vinoth,
> > Aren’t we already writing user data to the append log? The way I
> > understand it, data is currently written in Avro, and you want to move
> > it to inline Parquet. Please correct me if I am missing something.
> >
> > Thanks,
> > Jaimin
> >
> > On Wednesday, 23 October 2019, Vinoth Chandar <[email protected]> wrote:
> > > Sure. Take your time! Just to clarify, "log" here refers to the Hudi append
> > > log, not the user's log4j or similar logs. Yes, that would be very strange to do.
> > > :)
> > >
> > > On Wed, Oct 23, 2019 at 3:06 AM Kabeer Ahmed <[email protected]> wrote:
> > >
> > > > Hi Vinoth,
> > > > Having a crazy week, and the next 2 to 3 weeks are going to be very busy. I
> > > > haven't had a chance to look into this.
> > > > My thoughts are around security. The idea of building external indexes
> > > > comes with loads of advantages, but throwing user data into the logs etc.
> > > > makes me anxious. Let me do a deep dive and come back to you.
> > > > Thanks
> > > > Kabeer.
> > > >
> > > > On Oct 21 2019, at 3:07 pm, Vinoth Chandar <[email protected]> wrote:
> > > > > Any thoughts? :) anyone?
> > > > >
> > > > > On Wed, Oct 9, 2019 at 11:06 AM Vinoth Chandar <[email protected]> wrote:
> > > > > > Hi all,
> > > > > > Wanted to share some prototyping I was doing for HUDI-46. The idea here is
> > > > > > to see if we can embed a Parquet file "inline" into an outer file (our
> > > > > > log), so that, if the user chooses to, they can also get Parquet data in
> > > > > > the logs to speed up real-time view queries. We would be using the
> > > > > > standard ParquetWriter and ParquetReader on top of a custom FileSystem
> > > > > > implementation.
> > > > > >
> > > > > > https://github.com/vinothchandar/incubator-hudi/commit/c60f4578f794d0f0d0e194b3e509cc0c5f132576
> > > > > >
> > > > > > I wrote a small PoC with the TODOs and gaps annotated. Wanted to see if
> > > > > > you all can poke more holes here, and whether we can generalize this to
> > > > > > embedding any file, e.g. an HFile.
> > > > > >
> > > > > > I believe we can generalize it and thus build things like external
> > > > > > indexing very easily on the existing log format.
> > > > > >
> > > > > > Thanks
> > > > > > Vinoth