Re: Discussion regrading design of data load after kettle removal.

Jacky Li Tue, 11 Oct 2016 21:09:07 -0700

Hi Ravindra,

Regarding the design
(https://drive.google.com/file/d/0B4TWTVbFSTnqTF85anlDOUQ5S1BqYzFpLWcwZnBLSVVqSWpj/view),
I have the following questions:


1. In SortProcessorStep, I think it is better to include MergeSort in this
step as well, so that the step contains all of the sorting logic. A developer
could then implement an external sort that spills to files only when
necessary, so the load sorts entirely in memory whenever memory is
sufficient. I expect this to improve loading performance considerably.

2. In EncoderProcessorStep, apart from dictionary encoding, what other
processing will it do? What about delta encoding, RLE, etc.?

3. InputProcessorStep needs some schema definition in order to parse the
input and convert it to rows, right? For example, how would it read from a
JSON or Avro file?
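
To make point 1 concrete, here is a minimal sketch of the kind of sort I have in mind. It is written in Python for brevity and is not taken from the design document; the names sort_rows and max_rows_in_memory are hypothetical, and a real step would bound memory by bytes rather than by row count. Rows are buffered and sorted in memory, and sorted runs are written to temporary files only once the buffer reaches the limit.

```python
import heapq
import pickle
import tempfile
from typing import Iterable, Iterator, List


def _spill(sorted_rows: List[tuple]):
    """Write one sorted run to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for row in sorted_rows:
        pickle.dump(row, f)
    f.seek(0)
    return f


def _read_run(f) -> Iterator[tuple]:
    """Stream rows back from a spilled run, closing the file at the end."""
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()


def sort_rows(rows: Iterable[tuple], max_rows_in_memory: int) -> Iterator[tuple]:
    """Sort rows, spilling sorted runs to disk only when the buffer fills up."""
    buffer: List[tuple] = []
    runs = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_rows_in_memory:
            buffer.sort()
            runs.append(_spill(buffer))
            buffer = []
    buffer.sort()
    if not runs:
        # The buffer never reached the limit: no file I/O happened at all.
        return iter(buffer)
    # Merge the spilled runs with whatever is still in memory.
    return heapq.merge(*(_read_run(f) for f in runs), buffer)


if __name__ == "__main__":
    data = [(3, "c"), (1, "a"), (2, "b"), (0, "z")]
    print(list(sort_rows(data, max_rows_in_memory=100)))  # in-memory path
    print(list(sort_rows(data, max_rows_in_memory=2)))    # spill-and-merge path
```

With a large enough limit the merge phase disappears entirely, which is where I would expect the performance gain to come from.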

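For point 2, the two encodings I mentioned can be sketched as follows. This is only an illustration in Python with hypothetical function names; it says nothing about how EncoderProcessorStep would actually apply them.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values by keeping a running sum of the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


if __name__ == "__main__":
    print(delta_encode([100, 101, 103, 103]))
    print(rle_encode([7, 7, 7, 2, 2, 9]))
```

Delta encoding tends to help on sorted or slowly changing numeric columns, and RLE on columns with long runs of repeated values, so both could matter after the sort step.
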
Regards,
Jacky



