Hi all,

This mail is about enhancing the clean files command.

Current behaviour: When clean files is called, segments that are MARKED_FOR_DELETE or COMPACTED are deleted. Their entries are removed from the tablestatus file, and their files are removed from the Fact folder and the metadata/segments folder.
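The current selection rule is small enough to state as code. The following is a minimal Java sketch (Java 16+ for the record). `SegmentStatus`, `Segment`, and `segmentsToClean` are hypothetical simplifications for illustration, not CarbonData's actual classes.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CleanFilesSelection {
    // Simplified stand-in for the statuses named in the mail; not CarbonData's own class.
    public enum SegmentStatus { SUCCESS, MARKED_FOR_DELETE, COMPACTED, INSERT_IN_PROGRESS }

    // Hypothetical minimal view of one tablestatus entry.
    public record Segment(String id, SegmentStatus status) {}

    // Current behaviour: only MARKED_FOR_DELETE and COMPACTED segments are eligible.
    // INSERT_IN_PROGRESS segments are not selected by this rule.
    static boolean isCleanable(Segment s) {
        return s.status() == SegmentStatus.MARKED_FOR_DELETE
            || s.status() == SegmentStatus.COMPACTED;
    }

    // Returns the ids of the segments whose files and tablestatus entries would be removed.
    public static List<String> segmentsToClean(List<Segment> entries) {
        return entries.stream()
                .filter(CleanFilesSelection::isCleanable)
                .map(Segment::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Segment> entries = List.of(
                new Segment("0", SegmentStatus.COMPACTED),
                new Segment("1", SegmentStatus.SUCCESS),
                new Segment("2", SegmentStatus.MARKED_FOR_DELETE),
                new Segment("3", SegmentStatus.INSERT_IN_PROGRESS));
        System.out.println(segmentsToClean(entries)); // [0, 2]
    }
}
```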
Enhancement behaviour idea: The idea is to create a trash folder (like a Recycle Bin, with 777 permissions) that is stored under /tmp by default, or in a user-defined folder configured through a new property. Whenever a segment is cleaned, its carbondata files (and no other files) are copied to this folder. The trash folder will contain one folder per table, named DBName_TableName.

The carbondata files are kept there for 3 days by default, or for as long as the user wants; a new carbon property will be exposed for this. Files that have not been modified within that period are deleted. A background thread will periodically check the aging time and delete the expired carbondata files from the trash folder.

In addition, clean files will also clean INSERT_IN_PROGRESS segments, but it will first try to acquire the segment lock. If the lock can be acquired, no operation is using the segment, i.e., it is a stale folder, and it can be cleaned. If the lock cannot be acquired, a load or some other operation is in progress, so the INSERT_IN_PROGRESS segment is not cleaned.

Please share your inputs and suggestions on this enhancement idea.

Thanks,
Vikram Ahuja

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
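The trash-folder lifecycle and the lock check proposed in this mail could be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not CarbonData code. `TrashFolderSketch`, its methods, the `<db>_<table>` layout under a caller-supplied root, and the use of a `java.util.concurrent.locks.Lock` as a stand-in for CarbonData's segment lock are all assumptions for illustration. Setting 777 permissions on the folder is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.locks.*;
import java.util.stream.*;

public class TrashFolderSketch {
    private final Path trashRoot;     // /tmp/... by default, or a user-defined folder (hypothetical property)
    private final Duration retention; // 3 days by default (hypothetical carbon property)

    public TrashFolderSketch(Path trashRoot, Duration retention) {
        this.trashRoot = trashRoot;
        this.retention = retention;
    }

    /** Copies only the .carbondata files of a cleaned segment into <trashRoot>/<db>_<table>/. */
    public List<Path> copySegmentToTrash(Path segmentDir, String db, String table) throws IOException {
        Path tableTrash = Files.createDirectories(trashRoot.resolve(db + "_" + table));
        List<Path> dataFiles;
        try (Stream<Path> files = Files.list(segmentDir)) {
            dataFiles = files.filter(p -> p.getFileName().toString().endsWith(".carbondata"))
                             .collect(Collectors.toList());
        }
        List<Path> copied = new ArrayList<>();
        for (Path f : dataFiles) {
            Path target = tableTrash.resolve(f.getFileName());
            Files.copy(f, target, StandardCopyOption.REPLACE_EXISTING);
            // Stamp the copy with the time it entered the trash, so aging counts from the clean.
            Files.setLastModifiedTime(target, FileTime.from(Instant.now()));
            copied.add(target);
        }
        return copied;
    }

    /** Deletes trashed files not modified within the retention period and returns them. */
    public List<Path> purgeExpired(Instant now) throws IOException {
        if (!Files.isDirectory(trashRoot)) return List.of();
        Instant cutoff = now.minus(retention);
        List<Path> expired;
        try (Stream<Path> walk = Files.walk(trashRoot)) {
            expired = walk.filter(Files::isRegularFile)
                          .filter(p -> lastModified(p).isBefore(cutoff))
                          .collect(Collectors.toList());
        }
        for (Path p : expired) {
            Files.deleteIfExists(p);
        }
        return expired;
    }

    private static Instant lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Background thread that re-checks the aging time every {@code period}. */
    public ScheduledExecutorService startAgingThread(Duration period) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            try {
                purgeExpired(Instant.now());
            } catch (IOException | UncheckedIOException e) { /* real code would log; swallowing keeps later runs scheduled */ }
        }, 0, period.toMillis(), TimeUnit.MILLISECONDS);
        return ses;
    }

    /** Cleans an INSERT_IN_PROGRESS segment only if its lock is free, i.e. the folder is stale. */
    public static boolean cleanIfStale(Lock segmentLock, Runnable clean) {
        if (!segmentLock.tryLock()) {
            return false; // a load or other operation holds the lock: leave the segment alone
        }
        try {
            clean.run();
            return true;
        } finally {
            segmentLock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        Path seg = Files.createTempDirectory("Segment_0");
        Files.createFile(seg.resolve("part-0.carbondata"));
        Files.createFile(seg.resolve("0.carbonindex")); // not a .carbondata file, so not copied
        TrashFolderSketch trash = new TrashFolderSketch(Files.createTempDirectory("trash"), Duration.ofDays(3));
        System.out.println(trash.copySegmentToTrash(seg, "db", "t1").size());                   // 1
        System.out.println(trash.purgeExpired(Instant.now()).size());                           // 0
        System.out.println(trash.purgeExpired(Instant.now().plus(Duration.ofDays(4))).size()); // 1
        System.out.println(cleanIfStale(new ReentrantLock(), () -> {}));                       // true
    }
}
```

Using `tryLock()` rather than a blocking `lock()` follows the proposal: if the lock is held, a load is assumed to be running, so clean files skips that segment instead of waiting for it.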
