Thanks for the suggestion, Ravi. We can add an option to the clean files command that controls whether it performs a dry run:

    clean files on table t1 options('dry_run' = true)

With this option set, the command only lists the segments that would be removed; it does not clean or delete those segments or any other data.
By default, dry_run will be set to false, and users can enable it when they want a dry run.

Rgds,
Vikram

On Mon, Sep 28, 2020 at 11:57 AM Akash r <akashr...@gmail.com> wrote:

> +1 for Ravi's comment. It's better, cleaner, and safer.
>
> Regards,
> Akash R Nilugal
>
> On Thu, Sep 24, 2020, 8:34 PM Ravindra Pesala <ravi.pes...@gmail.com> wrote:
>
> > Hi Vikram,
> >
> > +1
> >
> > It is good to remove the automatic cleanup.
> > But I am still worried about the clean files command executed by the user
> > as well. We need to enhance the clean files command with a dry run that
> > prints which segments are going to be deleted and which are left. If the
> > user is OK with the dry run result, then he can go for the actual run.
> >
> > Regards,
> > Ravindra.
> >
> > On Mon, 21 Sep 2020 at 1:27 PM, Vikram Ahuja <vikramahuja8...@gmail.com> wrote:
> >
> > > Hi Ravi and David,
> > >
> > > 1. All automatic data cleaning in the case of load/insert/compact/delete
> > > will be removed, so cleaning will only happen when the clean files
> > > command is called.
> > >
> > > 2. We will only add data to the trash when we clean data that is in the
> > > IN PROGRESS state. Compacted/Marked For Delete data will not be moved
> > > to the trash; it will be deleted directly. The user will only be able
> > > to recover the In Progress segments if the user wants. @Ravi -> Is this
> > > okay for trash usage? Only using it for in-progress segments.
> > >
> > > 3. No trash management will be implemented; the data will ONLY BE REMOVED
> > > from the trash folder immediately when the clean files command is called.
> > > There will be no time to live; the data can be kept in the trash folder
> > > until the user triggers the clean files command.
> > >
> > > Let me know if you have any questions.
> > >
> > > Vikram Ahuja
> > >
> > > On Fri, Sep 18, 2020 at 1:43 PM David CaiQiang <david.c...@gmail.com> wrote:
> > >
> > > > agree with Ravindra,
> > > >
> > > > 1. stop all automatic data cleaning in
> > > > load/insert/compact/update/delete...
> > > >
> > > > 2. when the clean files command cleans in-progress or uncertain data,
> > > > we can move it to the data trash.
> > > > This can prevent deleting useful data by mistake; we have already
> > > > found this issue in some scenarios.
> > > > Other cases (for example, cleaning mark_for_delete/compacted segments)
> > > > should not use the data trash folder; clean the data directly.
> > > >
> > > > 3. no need for data trash management; suggest keeping it simple.
> > > > The clean files command should support emptying the trash immediately;
> > > > that will be enough.
> > > >
> > > > -----
> > > > Best Regards
> > > > David Cai
> > > > --
> > > > Sent from:
> > > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
> > --
> > Thanks & Regards,
> > Ravi
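
For reference, the behaviour agreed in this thread can be sketched roughly as follows. This is only an illustrative model in Python, not CarbonData code: the `Table` class, the `Status` enum, and the `clean_files` function are hypothetical names, and segments are modelled as an in-memory dict. It assumes in-progress segments go to the trash, compacted and marked-for-delete segments are deleted directly, and a dry run reports the plan without changing anything. Emptying the trash is left out, since the thread does not pin down when that happens relative to the move.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUCCESS = "Success"
    IN_PROGRESS = "In Progress"
    COMPACTED = "Compacted"
    MARKED_FOR_DELETE = "Marked for Delete"


@dataclass
class Table:
    # Hypothetical stand-in for a table: segment id -> segment status.
    segments: dict = field(default_factory=dict)
    # Hypothetical stand-in for the trash folder.
    trash: dict = field(default_factory=dict)


def clean_files(table, dry_run=False):
    """Build the cleanup plan; apply it only when dry_run is False."""
    plan = {"move_to_trash": [], "delete": [], "keep": []}
    for seg_id, status in table.segments.items():
        if status is Status.IN_PROGRESS:
            plan["move_to_trash"].append(seg_id)   # recoverable later
        elif status in (Status.COMPACTED, Status.MARKED_FOR_DELETE):
            plan["delete"].append(seg_id)          # removed directly
        else:
            plan["keep"].append(seg_id)

    if not dry_run:
        for seg_id in plan["move_to_trash"]:
            table.trash[seg_id] = table.segments.pop(seg_id)
        for seg_id in plan["delete"]:
            del table.segments[seg_id]
    return plan


table = Table(segments={
    "0": Status.SUCCESS,
    "1": Status.IN_PROGRESS,
    "2": Status.COMPACTED,
    "3": Status.MARKED_FOR_DELETE,
})
preview = clean_files(table, dry_run=True)   # reports only, changes nothing
```

Calling `clean_files(table)` afterwards, with dry_run left at its default of false, would apply the same plan that the preview reported.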