Hi,

First, let me make sure I understand your proposal. You mean:

1. If the user defines "sort_columns=columns": all behavior stays the same as it is today, with no change. (Most users set this key option when creating a carbondata table.)
2. If the user doesn't define "sort_columns": the current default behavior is that all the dimension columns are selected for sort_columns and sort_scope is local_sort. *You propose to change this default behavior to use no_sort, right?*

If yes, I agree with this proposal, and I also propose removing the "empty sort_columns" option. *That would make it easier for users to understand: if sort_columns is defined, use local_sort; if sort_columns is not defined, use no_sort.*

Regards
Liang

Ajantha Bhat wrote
> Hi all,
> Currently in carbondata, we have 'local_sort' as the default sort_scope, and by
> default all the dimension columns are selected for sort_columns.
> This slows down data loading.
> *To give users the best performance with the default values,*
> we can change sort_scope to 'no_sort' and stop using all dimensions for
> sort_columns by default.
> Also, if sort_columns are specified but sort_scope is not specified by the
> user, we need to implicitly treat sort_scope as 'local_sort'.
> These default values apply to carbonsession, spark file format,
> and SDK as well (all will have the same behavior).
>
> With these changes, below are the performance results of TPCH queries on
> 500GB data:
>
> * Load time is improved by nearly 4 times.
> * Total query time across all queries is improved. (50% of queries are
>   faster with no_sort; the other 50% are slightly degraded or the same.
>   Overall, performance is better.)
>
> Also, when I made this change, I found a few major issues in the existing
> code in the 'no_sort' and empty sort_columns flows. I have fixed those as well.
> Below are the issues found:
>
> * [CARBONDATA-3162] Range filters don't remove null values for no_sort
>   direct dictionary dimension columns.
> * [CARBONDATA-3163] If a table has a different time format, for no_sort
>   columns data goes as bad record (null) for the second table when loaded
>   after the first table.
> * [CARBONDATA-3164] During no_sort, an exception that happens at the
>   converter step does not reach the user. The same problem exists in the
>   SDK and spark file format flows.
> * Also fixed multiple test case issues.
>
> I have already opened a PR for fixing these issues.
> https://github.com/apache/carbondata/pull/2966
>
> Let me know if you have any suggestions about these changes.
>
> Thanks,
> Ajantha
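
To make the rule discussed in this thread concrete, here is a minimal Python sketch of how the proposed defaults could be resolved. The function name `resolve_sort_defaults` is hypothetical and is not part of the carbondata API. Keeping an explicitly supplied sort_scope unchanged is also an assumption, since the thread only covers the case where sort_scope is omitted.

```python
from typing import Optional, Sequence, Tuple


def resolve_sort_defaults(
    sort_columns: Optional[Sequence[str]] = None,
    sort_scope: Optional[str] = None,
) -> Tuple[Tuple[str, ...], str]:
    """Hypothetical helper illustrating the proposed defaults.

    - sort_columns not defined -> no sort columns, sort_scope 'no_sort'
    - sort_columns defined, sort_scope omitted -> sort_scope 'local_sort'
    - sort_scope given explicitly -> kept as-is (assumption, not from the thread)
    """
    # Under the proposal, dimensions are no longer picked as sort columns by default.
    columns = tuple(sort_columns or ())

    if sort_scope is not None:
        # Assumption: an explicit user choice always wins.
        return columns, sort_scope.lower()

    # Implicit scope depends only on whether sort columns were defined.
    return columns, ("local_sort" if columns else "no_sort")


if __name__ == "__main__":
    print(resolve_sort_defaults())                    # sort_columns not defined
    print(resolve_sort_defaults(["c_name", "c_id"]))  # sort_columns defined
```

With this rule a user only has to reason about one property: defining sort_columns opts in to sorting, and leaving it out skips the sort step during load.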