Thanks for the suggestion, Ravi.

We can add an option to the clean files command that controls whether it
performs a dry run:

clean files on table t1 options('dry_run' = true)

With this option set, the command will only list the segments that would be
removed; it will not clean or delete those segments or any other data.

By default, dry_run will be false, and users can set it to true when they
want a dry run.
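
To make the intended behaviour concrete, here is a rough Python sketch of the
dry run semantics combined with the trash rule discussed earlier in this
thread. It is only an illustration under assumptions, not CarbonData code:
the names Segment, plan_clean and clean_files, the status strings, and the
in-memory dicts standing in for the table and the trash folder are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Segment:
    segment_id: str
    status: str  # hypothetical status names, e.g. "SUCCESS", "IN_PROGRESS"


# Assumed policy from this thread: in-progress segments go to the trash,
# compacted / marked-for-delete segments are deleted directly.
TRASH_STATUSES = {"IN_PROGRESS"}
DELETE_STATUSES = {"COMPACTED", "MARKED_FOR_DELETE"}


def plan_clean(store: Dict[str, Segment]) -> Dict[str, str]:
    """Map each removable segment id to the action that would be taken."""
    plan = {}
    for sid, seg in store.items():
        if seg.status in TRASH_STATUSES:
            plan[sid] = "trash"
        elif seg.status in DELETE_STATUSES:
            plan[sid] = "delete"
    return plan


def clean_files(store: Dict[str, Segment],
                trash: Dict[str, Segment],
                dry_run: bool = False) -> Dict[str, str]:
    """Return the clean plan; apply it only when dry_run is False."""
    plan = plan_clean(store)
    if dry_run:
        # Dry run: report what would happen, touch nothing.
        return plan
    for sid, action in plan.items():
        seg = store.pop(sid)       # remove the segment from the table
        if action == "trash":
            trash[sid] = seg       # keep a recoverable copy in the trash
    return plan
```

A user would first call clean_files(store, trash, dry_run=True), inspect the
returned plan, and only then repeat the call with the default dry_run=False.
Both calls compute the same plan, so the dry run result matches what the
actual run does as long as the table has not changed in between.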

Rgds,
Vikram

On Mon, Sep 28, 2020 at 11:57 AM Akash r <akashr...@gmail.com> wrote:

> +1 for Ravi's comment. It's better, cleaner, and safer.
>
> Regards,
> Akash R Nilugal
>
> On Thu, Sep 24, 2020, 8:34 PM Ravindra Pesala <ravi.pes...@gmail.com>
> wrote:
>
> > Hi Vikram,
> >
> > +1
> >
> > It is good to remove the automatic cleanup.
> > But I am still worried about the clean files command executed by the user
> > as well. We need to enhance the clean files command with a dry run that
> > prints which segments will be deleted and which will be left. If the user
> > is OK with the dry run result, they can go for the actual run.
> >
> > Regards,
> > Ravindra.
> >
> > On Mon, 21 Sep 2020 at 1:27 PM, Vikram Ahuja <vikramahuja8...@gmail.com>
> > wrote:
> >
> > > Hi Ravi and David,
> > >
> > > 1. All automatic data cleaning during load/insert/compact/delete will
> > > be removed, so cleaning will only happen when the clean files command
> > > is called.
> > >
> > > 2. We will only move data to the trash when we clean data that is in
> > > the IN PROGRESS state. Compacted/Marked For Delete segments will not be
> > > moved to the trash; they will be deleted directly. The user will only
> > > be able to recover In Progress segments, if the user wants. @Ravi -> Is
> > > this okay for trash usage? Only using it for in progress segments.
> > >
> > > 3. No trash management will be implemented; the data will ONLY BE
> > > REMOVED from the trash folder immediately when the clean files command
> > > is called. There will be no time to live; the data can be kept in the
> > > trash folder until the user triggers the clean files command.
> > >
> > > Let me know if you have any questions.
> > >
> > > Vikram Ahuja
> > >
> > > On Fri, Sep 18, 2020 at 1:43 PM David CaiQiang <david.c...@gmail.com>
> > > wrote:
> > >
> > > > agree with Ravindra,
> > > >
> > > > 1. stop all automatic data cleaning in
> > > > load/insert/compact/update/delete...
> > > >
> > > > 2. when the clean files command cleans in-progress or uncertain data,
> > > > we can move it to the data trash.
> > > >     this can prevent deleting useful data by mistake; we have already
> > > >     found this issue in some scenarios.
> > > >     other cases (for example, cleaning mark_for_delete/compacted
> > > >     segments) should not use the data trash folder; clean the data
> > > >     directly.
> > > >
> > > > 3. no need for data trash management; suggest keeping it simple.
> > > >     The clean files command should support emptying the trash
> > > >     immediately; that will be enough.
> > > >
> > > > -----
> > > > Best Regards
> > > > David Cai
> > > > --
> > > > Sent from:
> > > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > > >
> > >
> >
> > --
> > Thanks & Regards,
> > Ravi
> >
>
