Hi David,

Thanks for starting this discussion. I have some questions and inputs:
1. Solution 1 is just plain compression. We get the benefit of a smaller file, but we will still face reliability issues under concurrent operations, so I would give it -1.

2. Solution 2, writing and reading separate files, is a pretty good idea for avoiding many of the issues I mentioned in point 1. You mentioned a new format. My understanding is that there will be a new file containing the list of all table status files, like "statusFileName":"status-uuid1","status-uuid2",.., and that the "status-uuid1" files are stored in the metadata. Am I right? If so, your plan is to read this new file first and then go to the actual status files, right? When do you merge all these files, and what is the threshold for them? In other words, on what basis do you decide to create a new status file?

3. Solution 3: what is the obvious benefit of writing a delta file? Whenever we query, we still need to read all the status entries and decide which segments are valid, right? I don't think we gain any benefit here; please correct me if my understanding is wrong.

4. Keeping the in-progress entries in a separate file is a better idea; with this we can avoid some unnecessary validations in many operations. But we need to decide which solution to combine it with. Once my doubts are cleared, maybe I can suggest something.

*Suggestion/Idea:* Currently the table status file holds many details, but we do not read or need all of them in every case. Can we have an abstraction layer, or a lightweight status on top of the actual status file that includes some of the optimizations above, so that we read less data (only what is required), especially during query?

Regards,
Akash

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/