Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

akashrn5 Thu, 03 Sep 2020 01:33:58 -0700

Hi David,

Thanks for starting this discussion. I have some questions and inputs.


1. Solution 1 is just plain compression. We get the benefit of a smaller
file, but we will still face reliability issues under concurrency. So this
can be -1.

2. Solution 2:
Writing to and reading from separate files is a pretty good idea, as it
avoids many of the issues I mentioned in point 1.
You mentioned a new format. My understanding is that you will have a new
file that contains a list of all table status files, like
"statusFileName":"status-uuid1","status-uuid2",.. and that you store the
"status-uuid1" files in the metadata.
Am I right?

If I am, then your plan is to read this new format and then go to the actual
files, right?
When do you merge all these files, and what is the threshold for them? I
mean, on what basis do you decide that a new status file should be created?
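
The layout described in point 2 could be sketched roughly as below. This is
only an illustration in Python, not the proposed format: the index file name
"tablestatus.index", the JSON layout, the entry fields and all helper names
are assumptions made up for this example.

```python
import json
import os


def write_status_file(metadata_dir, name, segments):
    # Each writer creates its own status file, so writers do not
    # rewrite one large shared file.
    with open(os.path.join(metadata_dir, name), "w") as f:
        json.dump(segments, f)


def write_index(metadata_dir, status_file_names):
    # Hypothetical index file listing the status files.
    # It is written to a temp file and renamed so that readers never
    # see a half-written index.
    index_path = os.path.join(metadata_dir, "tablestatus.index")
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"statusFileName": status_file_names}, f)
    os.replace(tmp_path, index_path)


def read_all_segments(metadata_dir):
    # Step 1: read the index. Step 2: go to the actual status files.
    with open(os.path.join(metadata_dir, "tablestatus.index")) as f:
        index = json.load(f)
    segments = []
    for name in index["statusFileName"]:
        with open(os.path.join(metadata_dir, name)) as f:
            segments.extend(json.load(f))
    return segments
```

In this sketch a reader opens one file per entry in the index, which is why
the merge policy and the threshold for creating a new status file matter.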

3. Solution 3:
What is the obvious benefit we are going to get from writing a delta file?
Whenever I query, we still need to read all the statuses and decide the
valid segments, right?

I don't think we get any benefit here; correct me if my understanding is
wrong.
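
The read-side concern in point 3 can be illustrated with a small sketch. It
assumes, purely for illustration, that a delta file is a list of
{"segment", "status"} entries and that later deltas override earlier ones;
the field names and status strings here are not taken from the proposal.

```python
def valid_segments(base, deltas):
    # Start from the base status file.
    status = {entry["segment"]: entry["status"] for entry in base}
    # Replay every delta in write order; later entries win.
    for delta in deltas:
        for entry in delta:
            status[entry["segment"]] = entry["status"]
    # Only after replaying everything do we know which segments are valid.
    return sorted(seg for seg, st in status.items() if st == "Success")


base = [
    {"segment": "0", "status": "Success"},
    {"segment": "1", "status": "Success"},
]
deltas = [
    [{"segment": "1", "status": "Compacted"}],
    [{"segment": "2", "status": "Success"}],
]
print(valid_segments(base, deltas))  # ['0', '2']
```

Under these assumptions the writer only appends a small delta, but the query
path still touches the base file and every delta before it can decide.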


4. It is a better idea to keep in-progress entries in a separate file; with
this we can avoid some unnecessary validations in many operations. But we
need to decide which solution to combine this with. Maybe once my doubts are
cleared, I can suggest some.

*Suggestion/Idea:* Now we have a table status file with so many details, but
in all cases we do not read or require all the details. Can we have some
abstraction layer, or a status on top of the actual status, with some of the
above optimizations, so that we read less (only the required data),
especially during query?
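
One possible shape for such a layer, as a minimal sketch only: a small
summary holding just the fields a query needs, derived from the full
entries. The field names ("segment", "status", "details") and the function
name are hypothetical and chosen for this example.

```python
def build_query_summary(full_entries, needed_fields=("segment", "status")):
    # Keep only the fields the query path needs; everything else stays
    # in the full status file and is read only by operations that need it.
    return [
        {field: entry[field] for field in needed_fields}
        for entry in full_entries
    ]


full_entries = [
    {"segment": "0", "status": "Success", "details": {"size": 1024}},
    {"segment": "1", "status": "Compacted", "details": {"size": 2048}},
]
summary = build_query_summary(full_entries)
```

Here the summary carries two small fields per segment, so a query would read
far less than the full entries, at the cost of keeping the two in sync.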

Regards,
Akash



