Yes. We always keep at least one version around, since deleting it could fail
the queries.
Thanks for the feedback. Will not remove it then.

We can work towards Impala support for your use case as a long-term
solution, and maybe revisit this later.

On Tue, Jun 11, 2019 at 9:54 PM Gary Li <yanjia.gary...@gmail.com> wrote:

> Thanks, Vinoth. That's very helpful.
>
> When I use data consumers that don't support the hoodie format, I have to
> use KEEP_LATEST_FILE_VERSIONS and CLEANER_FILE_VERSIONS_RETAINED_PROP = "1"
> to keep the parquet files clean, as discussed in
> https://github.com/apache/incubator-hudi/issues/715. When I use
> KEEP_LATEST_COMMITS with hoodie.cleaner.commits.retained = "1", I
> still have two versions of the parquet files.
>
> Compared with running batch jobs, this approach actually makes my situation
> much better. So I'd recommend not retiring KEEP_LATEST_FILE_VERSIONS, since
> some people might find it useful, as I do.
>
> Thanks!
> Gary
>
>
> On Tue, Jun 11, 2019 at 9:20 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > Cool. So, the cleaning policy determines how we clean up older versions of
> > file groups (simplistically, old parquet and log files) to bound storage
> > growth.
> >
> > KEEP_LATEST_COMMITS (default): Retains (does not delete) any file (slice)
> > that was touched in the last X commits. The idea here is that you are able
> > to pull the incremental changes worth up to X commits.
> > KEEP_LATEST_FILE_VERSIONS: If you are not interested in incremental pull
> > at all, you can choose to just retain X files (slices) per file group
> > (i.e., files that share the same prefix) instead. This could result in
> > fewer files in some cases.
> >
> > In practice, we always use KEEP_LATEST_COMMITS. I keep thinking about
> > starting a discussion to retire KEEP_LATEST_FILE_VERSIONS, actually.
> >
> > Hope that helps.
> >
> > On Tue, Jun 11, 2019 at 9:05 AM Gary Li <yanjia.gary...@gmail.com> wrote:
> >
> > > Hello Vinoth,
> > >
> > > Yes, that’s what I mean.
> > >
> > > Thanks
> > > Gary
> > >
> > > On Tue, Jun 11, 2019 at 9:03 AM Vinoth Chandar <vin...@apache.org> wrote:
> > >
> > > > Hi Gary,
> > > >
> > > > Do you mean the cleaning policy? KEEP_LATEST_FILE_VERSIONS vs.
> > > > KEEP_LATEST_COMMITS?
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Mon, Jun 10, 2019 at 9:47 PM Gary Li <yanjia.gary...@gmail.com> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I got a little confused while looking at the compaction policy. What
> > > > > is the difference between KEEP_LATEST_COMMIT and KEEP_LATEST_VERSION?
> > > > > What is the exact definition of "COMMIT" and "VERSION"?
> > > > >
> > > > > Thanks,
> > > > > Gary
> > > > >
> > > >
> > >
> >
>

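For readers skimming the thread, the difference between the two cleaning
policies can be sketched as a toy model in Python. This is not Hudi's cleaner
code: the function names, the integer commit numbers, and the dict-of-lists
layout are all made up for illustration. In particular, the rule that
KEEP_LATEST_COMMITS also keeps the newest slice older than the retained window
is an assumption, chosen because it reproduces the "two versions of parquet
files" behaviour Gary reports with hoodie.cleaner.commits.retained = "1".

```python
from typing import Dict, List


def keep_latest_file_versions(
    file_groups: Dict[str, List[int]], versions_retained: int
) -> Dict[str, List[int]]:
    """Toy KEEP_LATEST_FILE_VERSIONS: keep the newest N slices per file group."""
    if versions_retained < 1:
        # The thread notes that at least one version is always kept.
        raise ValueError("versions_retained must be >= 1")
    return {
        group: sorted(commits)[-versions_retained:]
        for group, commits in file_groups.items()
    }


def keep_latest_commits(
    file_groups: Dict[str, List[int]],
    all_commits: List[int],
    commits_retained: int,
) -> Dict[str, List[int]]:
    """Toy KEEP_LATEST_COMMITS: keep slices touched in the last N commits."""
    if commits_retained < 1:
        raise ValueError("commits_retained must be >= 1")
    ordered_commits = sorted(all_commits)
    if len(ordered_commits) <= commits_retained:
        # Not enough history yet, so nothing is eligible for cleaning.
        return {g: sorted(c) for g, c in file_groups.items()}
    earliest_retained = ordered_commits[-commits_retained]
    kept = {}
    for group, commits in file_groups.items():
        ordered = sorted(commits)
        recent = [c for c in ordered if c >= earliest_retained]
        older = [c for c in ordered if c < earliest_retained]
        # Assumption: the newest slice older than the window is also kept,
        # so a reader positioned at the earliest retained commit still has
        # a file to read for this group.
        kept[group] = older[-1:] + recent
    return kept


if __name__ == "__main__":
    # Each file group maps to the (hypothetical) commits that wrote a slice.
    groups = {"fg1": [1, 2, 3, 4], "fg2": [1, 4], "fg3": [2]}
    print(keep_latest_file_versions(groups, 1))
    # {'fg1': [4], 'fg2': [4], 'fg3': [2]}
    print(keep_latest_commits(groups, [1, 2, 3, 4], 1))
    # {'fg1': [3, 4], 'fg2': [1, 4], 'fg3': [2]}
```

With one retained version, every file group ends up with exactly one slice,
which is what a consumer that does not understand the hoodie format needs.
With one retained commit, updated file groups still carry two slices under
this model.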