Github user vandana7 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1534#discussion_r152549583

    --- Diff: docs/data-management-on-carbondata.md ---
    @@ -0,0 +1,713 @@
    +<!--
    + Licensed to the Apache Software Foundation (ASF) under one or more
    + contributor license agreements. See the NOTICE file distributed with
    + this work for additional information regarding copyright ownership.
    + The ASF licenses this file to you under the Apache License, Version 2.0
    + (the "License"); you may not use this file except in compliance with
    + the License. You may obtain a copy of the License at
    +
    + http://www.apache.org/licenses/LICENSE-2.0
    +
    + Unless required by applicable law or agreed to in writing, software
    + distributed under the License is distributed on an "AS IS" BASIS,
    + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + See the License for the specific language governing permissions and
    + limitations under the License.
    +-->
    +
    +# Data Management on CarbonData
    +
    +This tutorial is going to introduce all commands and data operations on CarbonData.
    +
    +* [CREATE TABLE](#create-table)
    +* [TABLE MANAGEMENT](#table-management)
    +* [LOAD DATA](#load-data)
    +* [UPDATE AND DELETE](#update-and-delete)
    +* [COMPACTION](#compaction)
    +* [PARTITION](#partition)
    +* [BUCKETING](#bucketing)
    +* [SEGMENT MANAGEMENT](#segment-management)
    +
    +## CREATE TABLE
    +
    + This command can be used to create a CarbonData table by specifying the list of fields along with the table properties.
    +
    + ```
    + CREATE TABLE [IF NOT EXISTS] [db_name.]table_name[(col_name data_type , ...)]
    + STORED BY 'carbondata'
    + [TBLPROPERTIES (property_name=property_value, ...)]
    + ```
    +
    +### Usage Guidelines
    +
    + Following are the guidelines for TBLPROPERTIES, CarbonData's additional table options can be set via carbon.properties.
    +
    + - **Dictionary Encoding Configuration**
    +
    + Dictionary encoding is turned off for all columns by default from 1.3 onwards, you can use this command for including columns to do dictionary encoding.
    + Suggested use cases : do dictionary encoding for low cardinality columns, it might help to improve data compression ratio and performance.
    +
    + ```
    + TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2')
    + ```
    +
    + - **Inverted Index Configuration**
    +
    + By default inverted index is enabled, it might help to improve compression ratio and query speed, especially for low cardinality columns which are in reward position.
    + Suggested use cases : For high cardinality columns, you can disable the inverted index for improving the data loading performance.
    +
    + ```
    + TBLPROPERTIES ('NO_INVERTED_INDEX'='column1, column3')
    + ```
    +
    + - **Sort Columns Configuration**
    +
    + This property is for users to specify which columns belong to the MDK(Multi-Dimensions-Key) index.
    + * If users don't specify "SORT_COLUMN" property, by default MDK index be built by using all dimension columns except complex datatype column.
    + * If this property is specified but with empty argument, then the table will be loaded without sort..
    + Suggested use cases : Only build MDK index for required columns,it might help to improve the data loading performance.
    +
    + ```
    + TBLPROPERTIES ('SORT_COLUMNS'='column1, column3')
    + OR
    + TBLPROPERTIES ('SORT_COLUMNS'='')
    + ```
    +
    + - **Sort Scope Configuration**
    +
    + This property is for users to specify the scope of the sort during data load, following are the types of sort scope.
    +
    + * LOCAL_SORT: It is the default sort scope.
    + * NO_SORT: It will load the data in unsorted manner, it will significantly increase load performance.
    + * BATCH_SORT: It increases the load performance but decreases the query performance if identified blocks > parallelism.
    + * GLOBAL_SORT: It increases the query performance, especially high concurrent point query.
    + And if you care about loading resources isolation strictly, because the system uses the spark GroupBy to sort data, the resource can be controlled by spark.
    +
    + - **Table Block Size Configuration**
    +
    + This command is for setting block size of this table, the default value is 1024 MB and supports a range of 1 MB to 2048 MB.
    +
    + ```
    + TBLPROPERTIES ('TABLE_BLOCKSIZE'='512')
    + //512 or 512M both are accepted.
    + ```
    +
    +### Example:
    + ```
    + CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
    + productNumber Int,
    + productName String,
    + storeCity String,
    + storeProvince String,
    + productCategory String,
    + productBatch String,
    + saleQuantity Int,
    + revenue Int)
    + STORED BY 'carbondata'
    + TBLPROPERTIES ('DICTIONARY_INCLUDE'='productNumber',
    + 'NO_INVERTED_INDEX'='productBatch',
    + 'SORT_COLUMNS'='productName,storeCity',
    + 'SORT_SCOPE'='NO_SORT',
    + 'TABLE_BLOCKSIZE'='512')
    + ```
    +
    +## TABLE MANAGEMENT
    +
    +### SHOW TABLE
    +
    + This command can be used to list all the tables in current database or all the tables of a specific database.
    + ```
    + SHOW TABLES [IN db_Name]
    + ```
    +
    + Example:
    + ```
    + SHOT TABLES
    --- End diff --

    SHOW TABLES
---