This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 10cbf4e  [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS
10cbf4e is described below

commit 10cbf4ec018de4671284e9f6974d05b22609f3a0
Author: manishnalla1994 <manish.nalla1...@gmail.com>
AuthorDate: Mon May 27 12:09:04 2019 +0530

    [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS
    
    Documentation change done for Global Sort Partitions during Range Column DataLoad/Compaction.
    
    This closes #3234
---
 docs/dml-of-carbondata.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 6ec0520..3e2a22d 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -281,6 +281,8 @@ CarbonData DML statements are documented here,which includes:
 
     If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map task as reduce task. It is recommended that each reduce task deal with 512MB-1GB data.
     For RANGE_COLUMN, GLOBAL_SORT_PARTITIONS is used to specify the number of range partitions also.
+    GLOBAL_SORT_PARTITIONS should be specified optimally during RANGE_COLUMN LOAD because if a higher number is configured then the load time may be less but it will result in creation of more files which would degrade the query and compaction performance.
+    Conversely, if less partitions are configured then the load performance may degrade due to less use of parallelism but the query and compaction will become faster. Hence the user may choose optimal number depending on the use case.
   ```
   OPTIONS('GLOBAL_SORT_PARTITIONS'='2')
   ```
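
As a rough illustration of the sizing guidance in the documented text (each reduce task handling 512MB-1GB), the sketch below estimates a starting value for GLOBAL_SORT_PARTITIONS from the input size. The helper names and the 768 MB default target are assumptions made for this example and are not part of CarbonData; for RANGE_COLUMN loads the result is only a starting point, since a higher value also means more files.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def suggest_global_sort_partitions(input_bytes, target_bytes_per_task=768 * MB):
    """Hypothetical helper (not a CarbonData API).

    Returns a partition count such that each reduce task handles roughly
    target_bytes_per_task of input. The 768 MB default is an assumed
    midpoint of the recommended 512MB-1GB range.
    """
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    if target_bytes_per_task <= 0:
        raise ValueError("target_bytes_per_task must be positive")
    # Integer ceiling division; never return fewer than one partition.
    return max(1, (input_bytes + target_bytes_per_task - 1) // target_bytes_per_task)


def load_option(partitions):
    """Format the value as the OPTIONS clause shown in the documentation."""
    return "OPTIONS('GLOBAL_SORT_PARTITIONS'='{}')".format(partitions)


if __name__ == "__main__":
    n = suggest_global_sort_partitions(10 * GB)
    print(load_option(n))
```

For a 10 GB input this yields a count that keeps each task inside the recommended range; lowering the target raises load parallelism at the cost of more output files.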
