[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1534


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-22 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152589826
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -0,0 +1,713 @@
+
+
+# Data Management on CarbonData
+
+This tutorial introduces all commands and data operations on CarbonData.
+
+* [CREATE TABLE](#create-table)
+* [TABLE MANAGEMENT](#table-management)
+* [LOAD DATA](#load-data)
+* [UPDATE AND DELETE](#update-and-delete)
+* [COMPACTION](#compaction)
+* [PARTITION](#partition)
+* [BUCKETING](#bucketing)
+* [SEGMENT MANAGEMENT](#segment-management)
+
+## CREATE TABLE
+
+  This command can be used to create a CarbonData table by specifying the list of fields along with the table properties.
+  
+  ```
+  CREATE TABLE [IF NOT EXISTS] [db_name.]table_name[(col_name data_type , ...)]
+  STORED BY 'carbondata'
+  [TBLPROPERTIES (property_name=property_value, ...)]
+  ```  
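+  A minimal illustration of the syntax above (the database, table, and column names here are hypothetical, not taken from this document):
+
+  ```
+  CREATE TABLE IF NOT EXISTS db1.sample_table (id Int, name String)
+  STORED BY 'carbondata'
+  ```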
+  
+### Usage Guidelines
+
+  The following are the guidelines for TBLPROPERTIES; CarbonData's additional table options can be set via carbon.properties.
+  
+   - **Dictionary Encoding Configuration**
+
+ Dictionary encoding is turned off for all columns by default from 1.3 onwards; you can use this property to include columns for dictionary encoding.
+ Suggested use cases: apply dictionary encoding to low cardinality columns; it might help to improve the data compression ratio and performance.
+
+ ```
+ TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2')
+ ```
+ 
+   - **Inverted Index Configuration**
+
+ By default the inverted index is enabled; it might help to improve compression ratio and query speed, especially for low cardinality columns which are in reward position.
+ Suggested use cases: for high cardinality columns, you can disable the inverted index to improve the data loading performance.
+
+ ```
+ TBLPROPERTIES ('NO_INVERTED_INDEX'='column1, column3')
+ ```
+
+   - **Sort Columns Configuration**
+
+ This property is for users to specify which columns belong to the MDK (Multi-Dimensions-Key) index.
+ * If users don't specify the "SORT_COLUMNS" property, by default the MDK index is built using all dimension columns except complex datatype columns.
+ * If this property is specified with an empty argument, then the table will be loaded without sort.
+ Suggested use cases: only build the MDK index for required columns; it might help to improve the data loading performance.
+
+ ```
+ TBLPROPERTIES ('SORT_COLUMNS'='column1, column3')
+ OR
+ TBLPROPERTIES ('SORT_COLUMNS'='')
+ ```
+
+   - **Sort Scope Configuration**
+   
+ This property is for users to specify the scope of the sort during data load. The following are the types of sort scope:
+ 
+ * LOCAL_SORT: the default sort scope.
+ * NO_SORT: loads the data in an unsorted manner; it will significantly increase load performance.
+ * BATCH_SORT: increases the load performance but decreases the query performance if identified blocks > parallelism.
+ * GLOBAL_SORT: increases the query performance, especially for high concurrent point queries. Choose it if you care strictly about loading resource isolation: because the system uses Spark GroupBy to sort data, the resources can be controlled by Spark.
+ 
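+ As with the other options above, the sort scope can be set as a table property; a brief sketch, assuming the SORT_SCOPE property name shown in the example later in this file:
+
+ ```
+ TBLPROPERTIES ('SORT_SCOPE'='GLOBAL_SORT')
+ ```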
+   - **Table Block Size Configuration**
+
+ This property is for setting the block size of this table; the default value is 1024 MB and it supports a range of 1 MB to 2048 MB.
+
+ ```
+ TBLPROPERTIES ('TABLE_BLOCKSIZE'='512')
+ //512 or 512M both are accepted.
--- End diff --

accept, fixed.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-22 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152589597
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -0,0 +1,713 @@
+### Example:
+```
+CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
+   productNumber Int,
+   productName String,
+   storeCity String,
+   storeProvince String,
+   productCategory String,
+   productBatch String,
+   saleQuantity Int,
+   revenue Int)
+STORED BY 'carbondata'
+TBLPROPERTIES ('DICTIONARY_INCLUDE'='productNumber',
+   'NO_INVERTED_INDEX'='productBatch',
+   'SORT_COLUMNS'='productName,storeCity',
+   'SORT_SCOPE'='NO_SORT',
+   'TABLE_BLOCKSIZE'='512')
+```
+
+## TABLE MANAGEMENT  
+
+### SHOW TABLE
+
+  This command can be 

[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-22 Thread vandana7
Github user vandana7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152549583
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -0,0 +1,713 @@

[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-22 Thread vandana7
Github user vandana7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152548815
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -0,0 +1,713 @@
+ TBLPROPERTIES ('TABLE_BLOCKSIZE'='512')
+ //512 or 512M both are accepted.
--- End diff --

Add a Note tag before "512 or 512M both are accepted", as "//" is used in code for making notes or comments.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-21 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152480438
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 
 ## COMPACTION
 
-This command merges the specified number of segments into one segment, compaction help to improve query performance.
-```
+  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load(per load per segment and one B+ tree index).
+  This means that there will be one index for each load and as number of data load increases, the number of indices also increases. 
+  Compaction feature combines several segments into one large segment by merge sorting the data from across the segments.
+  
+  There are two types of compaction Minor and Major compaction.
+  
+  ```
   ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR'
-```
+  ```
 
   - **Minor Compaction**
+  
+  In minor compaction the user can specify how many loads to be merged. 
+  Minor compaction triggers for every data load if the parameter carbon.enable.auto.load.merge is set to true. 
+  If any segments are available to be merged, then compaction will run parallel with data load, there are 2 levels in minor compaction:
+  * Level 1: Merging of the segments which are not yet compacted.
+  * Level 2: Merging of the compacted segments again to form a bigger segment.
+  
   ```
   ALTER TABLE table_name COMPACT 'MINOR'
   ```
   
   - **Major Compaction**
+  
+  In Major compaction, many segments can be merged into one big segment. 
--- End diff --

In Major compaction, multiple segments can be merged into one large segment.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-21 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152480127
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
+  There are two types of compaction Minor and Major compaction.
--- End diff --

There are two types of compaction, Minor and Major compaction.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-21 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152480541
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   - **Major Compaction**
+  
+  In Major compaction, many segments can be merged into one big segment. 
+  User will specify the compaction size until which segments can be merged, Major compaction is usually done during the off-peak time.
+  This command merges the specified number of segments into one segment: 
+ 
   ```
   ALTER TABLE table_name COMPACT 'MAJOR'
   ```
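+  The automatic-compaction behaviour described above is configured through carbon.properties; a hedged sketch, where carbon.enable.auto.load.merge is named in the text above, and the threshold/size property names are assumptions about CarbonData's configuration, not taken from this diff:
+
+  ```
+  # enable compaction to run with every data load (named above)
+  carbon.enable.auto.load.merge=true
+  # assumed property names for minor-compaction levels and major-compaction size (MB)
+  carbon.compaction.level.threshold=4,3
+  carbon.major.compaction.size=1024
+  ```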
 
 ## PARTITION
 
+  Similar other system's partition features, CarbonData's partition feature can be used to improve query performance by filtering on the partition column. 
--- End diff --

Similar to other systems' partition features, CarbonData's partition feature can also be used to improve query performance by filtering on the partition column.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-21 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152480386
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
+  * Level 1: Merging of the segments which are not yet compacted.
+  * Level 2: Merging of the compacted segments again to form a bigger segment.
--- End diff --

Level 2: Merging of the compacted segments again to form a larger segment.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-21 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152480183
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   - **Minor Compaction**
+  
+  In minor compaction the user can specify how many loads to be merged. 
--- End diff --

In Minor compaction, user can specify the number of loads to be merged.


---


[GitHub] carbondata pull request #1534: [CARBONDATA-1770] Update error docs and conso...

2017-11-21 Thread sgururajshetty
Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1534#discussion_r152480015
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -461,25 +461,46 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 
 ## COMPACTION
 
-This command merges the specified number of segments into one segment, compaction help to improve query performance.
-```
+  Compaction help to improve query performance, because frequently load data, will generate several CarbonData files, because data is sorted only within each load(per load per segment and one B+ tree index).
--- End diff --

Compaction improves the query performance significantly. During data load, several CarbonData files are generated; this is because data is sorted only within each load (per load, one segment and one B+ tree index).


---