Re: Monthly or Bi-Monthly Dev meeting?
I can save them all on my external hard disk. :)

On Fri, Oct 22, 2021 at 8:04 PM Vinoth Chandar wrote:
> We could, but just need storage space over the longer term. :)
>
> On Wed, Oct 20, 2021 at 9:56 PM Raymond Xu wrote:
> > Timing looks ok. Are we going to record the sessions too?
> >
> > On Wed, Oct 20, 2021 at 7:17 PM Vinoth Chandar wrote:
> > > I think we can do 7AM PST winters and 8AM summers.
> > > Will draft a page with a Zoom link we can use and put up a PR.
> > >
> > > On Thu, Oct 14, 2021 at 9:48 AM Vinoth Chandar wrote:
> > > > Yes. I can do 7AM PST. Can others in PST chime in please?
> > > >
> > > > We can wrap this up this week.
> > > >
> > > > On Tue, Oct 12, 2021 at 7:25 PM Gary Li wrote:
> > > > > Hi Vinoth,
> > > > >
> > > > > Summertime 8 AM PST was 11 PM in China so I guess it works for some folks,
> > > > > but switching to wintertime it was 12 AM in China. It might be a bit late
> > > > > IMO. Does 3 PM UTC (7 AM PST in winter, 8 AM in summer) work?
> > > > >
> > > > > Best,
> > > > > Gary
> > > > >
> > > > > On Tue, Oct 5, 2021 at 9:20 PM Pratyaksh Sharma <pratyaks...@gmail.com> wrote:
> > > > > > Works for me in India :)
> > > > > >
> > > > > > On Tue, Oct 5, 2021 at 9:41 AM Vinoth Chandar wrote:
> > > > > > > Looks like there is enough interest here.
> > > > > > >
> > > > > > > Moving onto timing. Does 8AM PST, on the second Thursday of every
> > > > > > > month, work for everyone?
> > > > > > > This is the time I find works best for most time zones.
> > > > > > >
> > > > > > > On Thu, Sep 23, 2021 at 1:15 PM Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > > > > > > > +1 on monthly community sync.
> > > > > > > >
> > > > > > > > On Thu, Sep 23, 2021 at 12:32 PM Udit Mehrotra <udi...@apache.org> wrote:
> > > > > > > > > +1 for the monthly meeting. It would be great to start syncing up
> > > > > > > > > again. Thanks Vinoth for bringing it up!
> > > > > > > > >
> > > > > > > > > On Thu, Sep 23, 2021 at 12:14 PM Sivabalan <n.siv...@gmail.com> wrote:
> > > > > > > > > > +1 on monthly meet up.
> > > > > > > > > >
> > > > > > > > > > On Thu, Sep 23, 2021 at 11:01 AM vino yang <yanghua1...@gmail.com> wrote:
> > > > > > > > > > > +1 for monthly
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Vino
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 23, 2021 at 9:36 PM Pratyaksh Sharma wrote:
> > > > > > > > > > > > Monthly should be good. Been a long time since we connected in
> > > > > > > > > > > > these meetings. :)
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Sep 23, 2021 at 7:02 PM Vinoth Chandar <mail.vinoth.chan...@gmail.com> wrote:
> > > > > > > > > > > > > 1 hour monthly is what I was proposing to be specific.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Sep 23, 2021 at 6:30 AM Gary Li <gar...@apache.org> wrote:
> > > > > > > > > > > > > > +1 for monthly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Sep 23, 2021 at 8:28 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Once upon a time, we used to have a weekly community sync.
> > > > > > > > > > > > > > > Wondering if there is interest in having a monthly or
> > > > > > > > > > > > > > > bi-monthly dev meeting?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Agenda could be
> > > > > > > > > > > > > > > - Update/Summary of all dev work tracks
> > > > > > > > > > > > > > > - Show and tell, where people can present their ongoing work
> > > > > > > > > > > > > > > - Open floor discussions, bring up new issues.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > Vinoth
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Regards,
> > > > > > > > > > -Sivabalan
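For reference, the time zone arithmetic discussed in this thread can be checked with a short sketch. It assumes fixed offsets (PST = UTC-8, PDT = UTC-7, China = UTC+8, India = UTC+5:30) written out by hand. The two dates are example second Thursdays chosen for illustration, not dates agreed in the thread, and the helper name `local_times` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets, written out by hand so the sketch needs no time zone database.
PST = timezone(timedelta(hours=-8), "PST")   # US Pacific, winter
PDT = timezone(timedelta(hours=-7), "PDT")   # US Pacific, summer (daylight saving)
CST = timezone(timedelta(hours=8), "CST")    # China Standard Time, no daylight saving
IST = timezone(timedelta(hours=5, minutes=30), "IST")  # India Standard Time


def local_times(utc_dt, zones):
    """Return {zone name: 'HH:MM'} for a single UTC instant."""
    return {tz.tzname(None): utc_dt.astimezone(tz).strftime("%H:%M") for tz in zones}


# Example second Thursdays, one in winter and one in summer (illustrative only).
winter = datetime(2021, 12, 9, 15, 0, tzinfo=timezone.utc)
summer = datetime(2022, 7, 14, 15, 0, tzinfo=timezone.utc)

# A fixed 3 PM UTC slot shifts by one hour on the US Pacific clock,
# while staying at the same local time in China and India.
print(local_times(winter, [PST, CST, IST]))
print(local_times(summer, [PDT, CST, IST]))

# By contrast, a fixed 8 AM Pacific slot in winter lands at midnight in China.
fixed_8am_winter = datetime(2021, 12, 9, 8, 0, tzinfo=PST)
print(fixed_8am_winter.astimezone(CST).strftime("%Y-%m-%d %H:%M"))
```

Pinning the meeting to UTC rather than to Pacific wall-clock time is what keeps the China slot at 11 PM all year.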
Re: Monthly or Bi-Monthly Dev meeting?
We could, but just need storage space over the longer term. :)

On Wed, Oct 20, 2021 at 9:56 PM Raymond Xu wrote:
> Timing looks ok. Are we going to record the sessions too?
>
> On Wed, Oct 20, 2021 at 7:17 PM Vinoth Chandar wrote:
> > I think we can do 7AM PST winters and 8AM summers.
> > Will draft a page with a Zoom link we can use and put up a PR.
> >
> > On Thu, Oct 14, 2021 at 9:48 AM Vinoth Chandar wrote:
> > > Yes. I can do 7AM PST. Can others in PST chime in please?
> > >
> > > We can wrap this up this week.
> > >
> > > On Tue, Oct 12, 2021 at 7:25 PM Gary Li wrote:
> > > > Hi Vinoth,
> > > >
> > > > Summertime 8 AM PST was 11 PM in China so I guess it works for some folks,
> > > > but switching to wintertime it was 12 AM in China. It might be a bit late
> > > > IMO. Does 3 PM UTC (7 AM PST in winter, 8 AM in summer) work?
> > > >
> > > > Best,
> > > > Gary
> > > >
> > > > On Tue, Oct 5, 2021 at 9:20 PM Pratyaksh Sharma <pratyaks...@gmail.com> wrote:
> > > > > Works for me in India :)
> > > > >
> > > > > On Tue, Oct 5, 2021 at 9:41 AM Vinoth Chandar wrote:
> > > > > > Looks like there is enough interest here.
> > > > > >
> > > > > > Moving onto timing. Does 8AM PST, on the second Thursday of every
> > > > > > month, work for everyone?
> > > > > > This is the time I find works best for most time zones.
> > > > > >
> > > > > > On Thu, Sep 23, 2021 at 1:15 PM Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > > > > > > +1 on monthly community sync.
> > > > > > >
> > > > > > > On Thu, Sep 23, 2021 at 12:32 PM Udit Mehrotra <udi...@apache.org> wrote:
> > > > > > > > +1 for the monthly meeting. It would be great to start syncing up
> > > > > > > > again. Thanks Vinoth for bringing it up!
> > > > > > > >
> > > > > > > > On Thu, Sep 23, 2021 at 12:14 PM Sivabalan <n.siv...@gmail.com> wrote:
> > > > > > > > > +1 on monthly meet up.
> > > > > > > > >
> > > > > > > > > On Thu, Sep 23, 2021 at 11:01 AM vino yang <yanghua1...@gmail.com> wrote:
> > > > > > > > > > +1 for monthly
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Vino
> > > > > > > > > >
> > > > > > > > > > On Thu, Sep 23, 2021 at 9:36 PM Pratyaksh Sharma wrote:
> > > > > > > > > > > Monthly should be good. Been a long time since we connected in
> > > > > > > > > > > these meetings. :)
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 23, 2021 at 7:02 PM Vinoth Chandar <mail.vinoth.chan...@gmail.com> wrote:
> > > > > > > > > > > > 1 hour monthly is what I was proposing to be specific.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Sep 23, 2021 at 6:30 AM Gary Li <gar...@apache.org> wrote:
> > > > > > > > > > > > > +1 for monthly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Sep 23, 2021 at 8:28 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Once upon a time, we used to have a weekly community sync.
> > > > > > > > > > > > > > Wondering if there is interest in having a monthly or
> > > > > > > > > > > > > > bi-monthly dev meeting?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Agenda could be
> > > > > > > > > > > > > > - Update/Summary of all dev work tracks
> > > > > > > > > > > > > > - Show and tell, where people can present their ongoing work
> > > > > > > > > > > > > > - Open floor discussions, bring up new issues.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > Vinoth
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > > -Sivabalan
Re: feature request/proposal: leverage bloom indexes for reading
Hi Nicolas,

Thanks for raising this! I think it's a very valid ask.
https://issues.apache.org/jira/browse/HUDI-2601 has been raised.

As a proof of concept, would you be able to give filterExists() a shot and see if the filtering time improves?
https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java#L172

In the upcoming 0.10.0 release, we are planning to move the bloom filters out to a partition on the metadata table, to speed this up even more for very large tables.
https://issues.apache.org/jira/browse/HUDI-1295

Please let us know if you are interested in testing that when the PR is up.

Thanks
Vinoth

On Tue, Oct 19, 2021 at 4:38 AM Nicolas Paris wrote:
> Hi!
>
> In my use case, for GDPR I have to export all information about a given
> user from several HUGE Hudi tables. Filtering the table results in a
> full scan of around 10 hours, and this will get worse year after year.
>
> Since the filter criterion is based on the bloom key (user_id), it would
> be handy to exploit the bloom index and produce a temporary table (in the
> metastore, e.g.) with the resulting rows.
>
> So far the bloom indexing is used for update/delete operations on a Hudi
> table.
>
> 1. There is an opportunity to exploit the bloom index for select operations.
> The Hudi options would be:
> operation: select
> result-table:
> result-path:
> result-schema: (optional; when empty, no sync with the HMS, only raw path)
>
> 2. It could be implemented as predicate pushdown in the Spark
> datasource API, when filtering with an IN statement.
>
> Thoughts?
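To illustrate the idea of using bloom filters to prune a read, here is a minimal sketch in plain Python. It is not Hudi's implementation or API: the `BloomFilter` class, `build_file_index`, `select_by_user`, and the toy "files" are all hypothetical stand-ins. It assumes one filter per file, keyed on `user_id`.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 hash of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_file_index(files):
    """files: {file name: list of row dicts}. Returns {file name: BloomFilter over user_id}."""
    index = {}
    for name, rows in files.items():
        bf = BloomFilter()
        for row in rows:
            bf.add(row["user_id"])
        index[name] = bf
    return index


def select_by_user(files, index, user_id):
    """Scan only files whose filter admits user_id, then filter rows exactly."""
    candidates = [name for name, bf in index.items() if bf.might_contain(user_id)]
    rows = [row for name in candidates for row in files[name] if row["user_id"] == user_id]
    return candidates, rows


# Hypothetical toy data standing in for the base files of a large table.
files = {
    "file-a": [{"user_id": "u1", "event": "login"}, {"user_id": "u2", "event": "view"}],
    "file-b": [{"user_id": "u3", "event": "login"}],
    "file-c": [{"user_id": "u1", "event": "logout"}],
}
index = build_file_index(files)
candidates, rows = select_by_user(files, index, "u1")
print(candidates, rows)
```

Because a bloom filter can return false positives, the candidate files still need an exact row-level filter. The gain comes from skipping every file whose filter rules the key out, which is the same pruning an IN-list predicate pushdown would need.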