Hi Ravindra,

Regarding the design (https://drive.google.com/file/d/0B4TWTVbFSTnqTF85anlDOUQ5S1BqYzFpLWcwZnBLSVVqSWpj/view), I have the following questions:
1. In SortProcessorStep, I think it is better to include MergeSort in this step as well, so that the step contains all of the sorting logic. The developer can then implement an external sort that spills to files only when necessary, so loading becomes an in-memory (online) sort whenever memory is sufficient. I think this will improve loading performance a lot.

2. In EncoderProcessorStep, apart from dictionary encoding, what other processing will it do? How about delta, RLE, etc.?

3. In InputProcessorStep, it needs some schema definition to parse the input and convert it to rows, right? For example, how would it read from a JSON or Avro file?

Regards,
Jacky
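
P.S. To make point 1 concrete, here is a minimal sketch of what "spill to files only if necessary" could look like. It is an illustration only, not the design's implementation: the names `external_sort`, `_spill`, `_read_run` and the parameter `max_rows_in_memory` are hypothetical, a row count stands in for a real memory budget, and Python is used just to keep the sketch short.

```python
import heapq
import os
import pickle
import tempfile


def _spill(sorted_rows):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for row in sorted_rows:
            pickle.dump(row, f)
    return path


def _read_run(path):
    """Stream rows back from a spilled run; delete the file afterwards."""
    try:
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    break
    finally:
        os.remove(path)


def external_sort(rows, max_rows_in_memory, key=None):
    """Yield rows in sorted order, spilling sorted runs only when needed.

    Assumption: max_rows_in_memory is a stand-in for a real memory limit.
    """
    buffer, runs = [], []
    for row in rows:
        if len(buffer) >= max_rows_in_memory:
            # Buffer is full and more input is arriving: sort and spill it.
            buffer.sort(key=key)
            runs.append(_spill(buffer))
            buffer = []
        buffer.append(row)

    buffer.sort(key=key)
    if not runs:
        # Everything fit in memory: a plain in-memory sort, no file I/O.
        yield from buffer
        return

    # Merge the in-memory tail with the spilled runs (k-way merge).
    yield from heapq.merge(buffer, *(_read_run(p) for p in runs), key=key)
```

With a limit larger than the input, for example `list(external_sort([3, 1, 2], 10))`, no temporary file is created; with a small limit the same call spills sorted runs and merges them. Keeping the run sort and the merge in one step is what lets the step decide on its own whether any file I/O is needed.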