Re: Monthly or Bi-Monthly Dev meeting?

2021-10-22 Thread Pratyaksh Sharma
I can save them all on my external hard disk. :)

On Fri, Oct 22, 2021 at 8:04 PM Vinoth Chandar  wrote:

> We could, but just need storage space over the longer term. :)
>
> On Wed, Oct 20, 2021 at 9:56 PM Raymond Xu 
> wrote:
>
> > Timing looks ok. Are we going to record the sessions too?
> >
> > On Wed, Oct 20, 2021 at 7:17 PM Vinoth Chandar 
> wrote:
> >
> > > I think we can do 7AM PST winters and 8AM summers.
> > > Will draft a page with a zoom link we can use and put up a PR.
> > >
> > >
> > > On Thu, Oct 14, 2021 at 9:48 AM Vinoth Chandar 
> > wrote:
> > >
> > > > Yes. I can do 7AM PST. Can others in PST chime in please?
> > > >
> > > > We can wrap this up this week.
> > > >
> > > > On Tue, Oct 12, 2021 at 7:25 PM Gary Li  wrote:
> > > >
> > > >> Hi Vinoth,
> > > >>
> > > > >> Summertime 8 AM PST was 11 PM in China, so I guess it works for some
> > > > >> folks, but switching to wintertime it was 12 AM in China. It might be
> > > > >> a bit late IMO. Does 3 PM UTC (7 AM PST in winter, 8 AM in summer) work?
> > > >>
> > > >> Best,
> > > >> Gary
> > > >>
> > > >> On Tue, Oct 5, 2021 at 9:20 PM Pratyaksh Sharma <
> > pratyaks...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Works for me in India :)
> > > >> >
> > > >> > On Tue, Oct 5, 2021 at 9:41 AM Vinoth Chandar 
> > > >> wrote:
> > > >> >
> > > >> > > Looks like there is enough interest here.
> > > >> > >
> > > > >> > > Moving on to timing. Does 8AM PST, on the second Thursday of every
> > > > >> > > month, work for everyone?
> > > > >> > > This is the time I find works best for most time zones.
> > > >> > >
> > > >> > > On Thu, Sep 23, 2021 at 1:15 PM Y Ethan Guo <
> > > ethan.guoyi...@gmail.com
> > > >> >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > +1 on monthly community sync.
> > > >> > > >
> > > >> > > > On Thu, Sep 23, 2021 at 12:32 PM Udit Mehrotra <
> > udi...@apache.org
> > > >
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > +1 for the monthly meeting. It would be great to start
> syncing
> > > up
> > > >> > > > > again. Thanks Vinoth for bringing it up !
> > > >> > > > >
> > > >> > > > > On Thu, Sep 23, 2021 at 12:14 PM Sivabalan <
> > n.siv...@gmail.com>
> > > >> > wrote:
> > > >> > > > > >
> > > >> > > > > > +1 on monthly meet up.
> > > >> > > > > >
> > > >> > > > > > On Thu, Sep 23, 2021 at 11:01 AM vino yang <
> > > >> yanghua1...@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > +1 for monthly
> > > >> > > > > > >
> > > >> > > > > > > Best,
> > > >> > > > > > > Vino
> > > >> > > > > > >
> > > > >> > > > > > > On Thu, Sep 23, 2021 at 9:36 PM Pratyaksh Sharma wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Monthly should be good. Been a long time since we
> > > connected
> > > >> in
> > > >> > > > these
> > > >> > > > > > > > meetings. :)
> > > >> > > > > > > >
> > > >> > > > > > > > On Thu, Sep 23, 2021 at 7:02 PM Vinoth Chandar <
> > > >> > > > > > > > mail.vinoth.chan...@gmail.com> wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > 1 hour monthly is what I was proposing to be
> specific.
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Thu, Sep 23, 2021 at 6:30 AM Gary Li <
> > > >> gar...@apache.org>
> > > >> > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > +1 for monthly.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Thu, Sep 23, 2021 at 8:28 PM Vinoth Chandar <
> > > >> > > > > vin...@apache.org>
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > Hi all,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Once upon a time, we used to have a weekly
> > community
> > > >> > sync.
> > > >> > > > > > > Wondering
> > > >> > > > > > > > if
> > > >> > > > > > > > > > > there is interest in having a monthly or
> > bi-monthly
> > > >> dev
> > > >> > > > > meeting?
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Agenda could be
> > > >> > > > > > > > > > > - Update/Summary of all dev work tracks
> > > >> > > > > > > > > > > - Show and tell, where people can present their
> > > >> ongoing
> > > >> > > work
> > > >> > > > > > > > > > > - Open floor discussions, bring up new issues.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Thanks
> > > >> > > > > > > > > > > Vinoth
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > Regards,
> > > >> > > > > > -Sivabalan
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
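
The time-zone arithmetic discussed above can be checked with a short Python sketch. It is only an illustration: the fixed offsets (PST = UTC-8, PDT = UTC-7, China = UTC+8), the helper name `meeting_times`, and the sample dates are assumptions made for this example, not something from the thread.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration (no DST database lookup).
PST = timezone(timedelta(hours=-8), "PST")    # Pacific, winter
PDT = timezone(timedelta(hours=-7), "PDT")    # Pacific, summer
CHINA = timezone(timedelta(hours=8), "CST")   # China, all year


def meeting_times(utc_dt, pacific_tz):
    """Return the local wall-clock time of a UTC instant in Pacific and China."""
    return {
        "pacific": utc_dt.astimezone(pacific_tz).strftime("%H:%M %Z"),
        "china": utc_dt.astimezone(CHINA).strftime("%H:%M"),
    }


# Proposal in the thread: a fixed 3 PM UTC slot all year round.
winter = datetime(2022, 1, 13, 15, 0, tzinfo=timezone.utc)
summer = datetime(2022, 7, 14, 15, 0, tzinfo=timezone.utc)
winter_slot = meeting_times(winter, PST)
summer_slot = meeting_times(summer, PDT)

# Earlier proposal: a fixed 8 AM Pacific slot, evaluated in winter.
old_winter_in_china = datetime(2022, 1, 13, 8, 0, tzinfo=PST).astimezone(CHINA)

print(winter_slot, summer_slot, old_winter_in_china.strftime("%H:%M"))
```

Under these assumptions, a fixed 3 PM UTC slot stays at 11 PM in China in both seasons, whereas a fixed 8 AM Pacific slot lands at midnight in China during winter.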


Re: Monthly or Bi-Monthly Dev meeting?

2021-10-22 Thread Vinoth Chandar
We could, but just need storage space over the longer term. :)

On Wed, Oct 20, 2021 at 9:56 PM Raymond Xu 
wrote:

> Timing looks ok. Are we going to record the sessions too?
>
> On Wed, Oct 20, 2021 at 7:17 PM Vinoth Chandar  wrote:
>
> > I think we can do 7AM PST winters and 8AM summers.
> > Will draft a page with a zoom link we can use and put up a PR.
> >
> >
> > On Thu, Oct 14, 2021 at 9:48 AM Vinoth Chandar 
> wrote:
> >
> > > Yes. I can do 7AM PST. Can others in PST chime in please?
> > >
> > > We can wrap this up this week.
> > >
> > > On Tue, Oct 12, 2021 at 7:25 PM Gary Li  wrote:
> > >
> > >> Hi Vinoth,
> > >>
> > > >> Summertime 8 AM PST was 11 PM in China, so I guess it works for some
> > > >> folks, but switching to wintertime it was 12 AM in China. It might be
> > > >> a bit late IMO. Does 3 PM UTC (7 AM PST in winter, 8 AM in summer) work?
> > >>
> > >> Best,
> > >> Gary
> > >>
> > >> On Tue, Oct 5, 2021 at 9:20 PM Pratyaksh Sharma <
> pratyaks...@gmail.com>
> > >> wrote:
> > >>
> > >> > Works for me in India :)
> > >> >
> > >> > On Tue, Oct 5, 2021 at 9:41 AM Vinoth Chandar 
> > >> wrote:
> > >> >
> > >> > > Looks like there is enough interest here.
> > >> > >
> > > >> > > Moving on to timing. Does 8AM PST, on the second Thursday of every
> > > >> > > month, work for everyone?
> > > >> > > This is the time I find works best for most time zones.
> > >> > >
> > >> > > On Thu, Sep 23, 2021 at 1:15 PM Y Ethan Guo <
> > ethan.guoyi...@gmail.com
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > +1 on monthly community sync.
> > >> > > >
> > >> > > > On Thu, Sep 23, 2021 at 12:32 PM Udit Mehrotra <
> udi...@apache.org
> > >
> > >> > > wrote:
> > >> > > >
> > >> > > > > +1 for the monthly meeting. It would be great to start syncing
> > up
> > >> > > > > again. Thanks Vinoth for bringing it up !
> > >> > > > >
> > >> > > > > On Thu, Sep 23, 2021 at 12:14 PM Sivabalan <
> n.siv...@gmail.com>
> > >> > wrote:
> > >> > > > > >
> > >> > > > > > +1 on monthly meet up.
> > >> > > > > >
> > >> > > > > > On Thu, Sep 23, 2021 at 11:01 AM vino yang <
> > >> yanghua1...@gmail.com>
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > +1 for monthly
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Vino
> > >> > > > > > >
> > > >> > > > > > > On Thu, Sep 23, 2021 at 9:36 PM Pratyaksh Sharma wrote:
> > >> > > > > > >
> > >> > > > > > > > Monthly should be good. Been a long time since we
> > connected
> > >> in
> > >> > > > these
> > >> > > > > > > > meetings. :)
> > >> > > > > > > >
> > >> > > > > > > > On Thu, Sep 23, 2021 at 7:02 PM Vinoth Chandar <
> > >> > > > > > > > mail.vinoth.chan...@gmail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > 1 hour monthly is what I was proposing to be specific.
> > >> > > > > > > > >
> > >> > > > > > > > > On Thu, Sep 23, 2021 at 6:30 AM Gary Li <
> > >> gar...@apache.org>
> > >> > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > +1 for monthly.
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Thu, Sep 23, 2021 at 8:28 PM Vinoth Chandar <
> > >> > > > > vin...@apache.org>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hi all,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Once upon a time, we used to have a weekly
> community
> > >> > sync.
> > >> > > > > > > Wondering
> > >> > > > > > > > if
> > >> > > > > > > > > > > there is interest in having a monthly or
> bi-monthly
> > >> dev
> > >> > > > > meeting?
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Agenda could be
> > >> > > > > > > > > > > - Update/Summary of all dev work tracks
> > >> > > > > > > > > > > - Show and tell, where people can present their
> > >> ongoing
> > >> > > work
> > >> > > > > > > > > > > - Open floor discussions, bring up new issues.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Thanks
> > >> > > > > > > > > > > Vinoth
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Regards,
> > >> > > > > > -Sivabalan
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: feature request/proposal: leverage bloom indexes for reading

2021-10-22 Thread Vinoth Chandar
Hi Nicolas,

Thanks for raising this! I think it's a very valid ask.
https://issues.apache.org/jira/browse/HUDI-2601 has been raised.

As a proof of concept, would you be able to give filterExists() a shot and
see if the filtering time improves?
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java#L172

In the upcoming 0.10.0 release, we are planning to move the bloom filters
out to a partition on the metadata table, to speed this up even further for
very large tables.
https://issues.apache.org/jira/browse/HUDI-1295

Please let us know if you are interested in testing that when the PR is up.

Thanks
Vinoth

On Tue, Oct 19, 2021 at 4:38 AM Nicolas Paris 
wrote:

> hi !
>
> In my use case, for GDPR I have to export all information about a given
> user from several HUGE hudi tables. Filtering the table results in a
> full scan of around 10 hours and this will get worse year after year.
>
> Since the filter criterion is based on the bloom key (user_id) it would
> be handy to exploit the bloom and produce a temporary table (in the
> metastore, e.g.) with the resulting rows.
>
> So far the bloom indexing is used for update/delete operations on a hudi
> table.
>
> 1. There is an opportunity to exploit the bloom for select operations.
> The hudi options would be:
> operation: select
> result-table: 
> result-path: 
> result-schema:  (optional ; when empty no
> sync with the hms, only raw path)
>
>
> 2. It could be implemented as predicate pushdown in the Spark
> datasource API, when filtering with an IN statement.
>
>
> Thoughts?
>
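
To make the idea in this thread concrete, here is a minimal Python sketch of using per-file bloom filters on the record key to skip files during a read. It is not Hudi's implementation and does not use any Hudi API: `SimpleBloom`, `candidate_files`, the file names, and the filter sizing are all hypothetical and chosen only for illustration.

```python
import hashlib


class SimpleBloom:
    """Toy bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a single Python int

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def candidate_files(blooms, wanted_keys):
    """Keep only the files whose bloom filter may contain a wanted key."""
    return [
        name
        for name, bloom in blooms.items()
        if any(bloom.might_contain(key) for key in wanted_keys)
    ]


# Hypothetical layout: file name -> user_id values stored in that file.
files = {
    "file_a": ["user_1", "user_2"],
    "file_b": ["user_3", "user_4"],
    "file_c": ["user_5"],
}
blooms = {}
for name, keys in files.items():
    bloom = SimpleBloom()
    for key in keys:
        bloom.add(key)
    blooms[name] = bloom

to_scan = candidate_files(blooms, ["user_3"])
print(to_scan)
```

Only the files returned by `candidate_files` would need to be opened and filtered. Because a bloom filter can report false positives, the rows read from those files still have to be checked against the actual key.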