This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 10cbf4e  [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS
10cbf4e is described below

commit 10cbf4ec018de4671284e9f6974d05b22609f3a0
Author: manishnalla1994 <manish.nalla1...@gmail.com>
AuthorDate: Mon May 27 12:09:04 2019 +0530

    [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS

    Documentation change done for Global Sort Partitions during Range Column DataLoad/Compaction.

    This closes #3234
---
 docs/dml-of-carbondata.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 6ec0520..3e2a22d 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -281,6 +281,8 @@ CarbonData DML statements are documented here,which includes:
 
     If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map task as reduce task. It is recommended that each reduce task deal with 512MB-1GB data. For RANGE_COLUMN, GLOBAL_SORT_PARTITIONS is used to specify the number of range partitions also.
+    GLOBAL_SORT_PARTITIONS should be specified optimally during RANGE_COLUMN LOAD because if a higher number is configured then the load time may be less but it will result in creation of more files which would degrade the query and compaction performance.
+    Conversely, if less partitions are configured then the load performance may degrade due to less use of parallelism but the query and compaction will become faster. Hence the user may choose optimal number depending on the use case.
 
     ```
     OPTIONS('GLOBAL_SORT_PARTITIONS'='2')
     ```
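
The documented guidance that each reduce task should handle 512MB-1GB of data can be turned into a rough sizing calculation. The sketch below is only an illustration of that arithmetic, not part of CarbonData: the function names `suggest_global_sort_partitions` and `as_load_option` are hypothetical, and it assumes the input size in bytes is already known. It returns the smallest and largest partition counts that keep the per-task volume inside the recommended band; per the trade-off described in the commit, the lower end favors query and compaction performance and the upper end favors load time.

```python
MB = 1024 * 1024
GB = 1024 * MB


def suggest_global_sort_partitions(input_bytes, min_per_task=512 * MB, max_per_task=1 * GB):
    """Return (low, high) partition counts for a load of input_bytes.

    Hypothetical helper: low keeps each task at or below max_per_task,
    high keeps each task at or above min_per_task where that is possible.
    """
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    # Ceiling division: fewest partitions such that no task exceeds max_per_task.
    low = max(1, -(-input_bytes // max_per_task))
    # Floor division: most partitions such that every task still gets min_per_task.
    high = max(low, input_bytes // min_per_task)
    return low, high


def as_load_option(partitions):
    """Format a partition count in the OPTIONS form shown in the documentation."""
    return f"OPTIONS('GLOBAL_SORT_PARTITIONS'='{partitions}')"


if __name__ == "__main__":
    low, high = suggest_global_sort_partitions(100 * GB)
    print(low, high)
    print(as_load_option(low))
```

For inputs smaller than 512MB the helper falls back to a single partition, since the band cannot be satisfied; any value actually used should still be validated against the observed file counts and query behavior for the workload.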