Re: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

Liang Chen Sat, 15 Dec 2018 01:53:11 -0800

Hi

First, let me check that I understand your proposal. You mean:
1. If the user defines "sort_columns=columns": all behavior stays the same as
it is today, with no change. (Most users set this option when creating a
carbondata table.)
2. If the user doesn't define "sort_columns": the current default behavior is
that all dimension columns are selected for sort_columns and sort_scope is
local_sort. *You propose to change this default behavior to use no_sort,
right?*


If yes, I agree with this proposal, and I also propose removing the "empty
sort_columns" option. *It would be easier for users to understand: if
sort_columns is defined, use local_sort; if sort_columns is not defined,
use no_sort.*
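
To make the rule concrete, here is a minimal Python sketch of the defaulting
logic as I understand it. The function name resolve_sort_defaults is
hypothetical and is not part of the carbondata code; it only illustrates the
proposal, assuming an explicitly specified sort_scope is always kept as given.

```python
from typing import Optional, Sequence, Tuple


def resolve_sort_defaults(
    sort_columns: Optional[Sequence[str]] = None,
    sort_scope: Optional[str] = None,
) -> Tuple[Tuple[str, ...], str]:
    """Return the effective (sort_columns, sort_scope) pair.

    Hypothetical helper for illustration only, not a carbondata API.
    """
    # No sort_columns defined means no columns are sorted by default.
    columns = tuple(sort_columns or ())

    # Assumption: a sort_scope given explicitly by the user is respected.
    if sort_scope is not None:
        return columns, sort_scope

    # Proposed defaults: local_sort when sort_columns is defined,
    # no_sort when it is not.
    return columns, ("local_sort" if columns else "no_sort")
```

For example, resolve_sort_defaults(["c1", "c2"]) picks local_sort, while
resolve_sort_defaults() picks no_sort with no sort columns.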

Regards
Liang


Ajantha Bhat wrote
> Hi all,
> Currently in carbondata, 'local_sort' is the default sort_scope and, by
> default, all the dimension columns are selected for sort_columns.
> This slows down data loading.
> *To give users the best performance with the default values,*
> we can change the default sort_scope to 'no_sort' and stop using all
> dimensions for sort_columns by default.
> Also, if sort_columns is specified but sort_scope is not specified by the
> user, sort_scope should implicitly be treated as 'local_sort'.
> These default values apply to carbonsession, spark file format
> and SDK as well. (All will have the same behavior.)
> 
> With these changes, below are the performance results of TPCH queries on
> 500GB data:
> 
> * Load time is improved by nearly 4 times.
> * Total query time across all queries is improved. (50% of queries are
>   faster with no_sort; the other 50% are slightly degraded or the same.
>   Overall, performance is better.)
> Also, while making this change, I found a few major issues in the existing
> code in the 'no_sort' and empty sort_columns flows. I have fixed those as
> well. Below are the issues found:
> 
> 
> 
> 
> * [CARBONDATA-3162] Range filters don't remove null values for no_sort
>   direct dictionary dimension columns.
> * [CARBONDATA-3163] If a table has a different time format, no_sort column
>   data goes in as bad records (null) for the second table when it is loaded
>   after the first table.
> * [CARBONDATA-3164] During no_sort, an exception that happens at the
>   converter step does not reach the user. The same problem exists in the
>   SDK and spark file format flows.
> 
> Also fixed multiple test case issues.
> I have already opened a PR for fixing these issues.
> https://github.com/apache/carbondata/pull/2966
> 
> Let me know if you have any suggestions about these changes.
> 
> Thanks,
> Ajantha





--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
