Hi All,

Currently, CarbonScanRDD in the carbondata spark integration module overrides 
Spark's task distribution mechanism. This was required in older versions of 
carbon: in the V1 and V2 formats the blocklets in a file are small, so 
distributing Spark tasks according to the number of blocklets improves task 
parallelism. 

However, this feature is not required for the V3 format. The blocklet size is 
now much bigger, so we get little benefit from this feature, and it makes the 
code very complex. Furthermore, it is not good for the carbon layer to 
manipulate even executor allocation.

So I suggest removing this feature.

Regards,
Jacky Li
