Repository: incubator-carbondata-site Updated Branches: refs/heads/asf-site b96a419a5 -> 2f826c1b5
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/2f826c1b/src/main/webapp/quick-start-guide.html ---------------------------------------------------------------------- diff --git a/src/main/webapp/quick-start-guide.html b/src/main/webapp/quick-start-guide.html index 0c58684..0246f64 100644 --- a/src/main/webapp/quick-start-guide.html +++ b/src/main/webapp/quick-start-guide.html @@ -156,21 +156,17 @@ <div class="row"> <div class="col-sm-12 col-md-12"> <div> - <h1> <a id="quick-start" class="anchor" href="#quick-start" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Quick Start</h1> - <p>This tutorial provides a quick introduction to using CarbonData.</p> - <h2> <a id="prerequisites" class="anchor" href="#prerequisites" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Prerequisites</h2> - <ul> <li> -<a href="https://github.com/apache/incubator-carbondata/blob/master/build" target=_blank>Installation and building CarbonData</a>.</li> +<p><a href="https://github.com/apache/incubator-carbondata/blob/master/build" target=_blank>Installation and building CarbonData</a>.</p> +</li> <li> <p>Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData.</p> - <pre><code>cd carbondata cat > sample.csv << EOF id,name,city,age @@ -181,124 +177,82 @@ EOF </code></pre> </li> </ul> - <h2> <a id="interactive-analysis-with-spark-shell-version-21" class="anchor" href="#interactive-analysis-with-spark-shell-version-21" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Interactive Analysis with Spark Shell Version 2.1</h2> - <p>Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. 
Please visit <a href="http://spark.apache.org/docs/latest/" target=_blank>Apache Spark Documentation</a> for more details on the Spark shell.</p>
- <h4> <a id="basics" class="anchor" href="#basics" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Basics</h4>
- <p>Start the Spark shell by running the following command in the Spark directory:</p>
- <pre><code>./bin/spark-shell --jars <carbondata assembly jar path>
</code></pre>
- <p><strong>NOTE</strong>: The assembly jar will be available after <a href="https://github.com/apache/incubator-carbondata/blob/master/build/README.md" target=_blank>building CarbonData</a> and can be copied from <code>./assembly/target/scala-2.1x/carbondata_xxx.jar</code>.</p>
- <p>In this shell, SparkSession is readily available as <code>spark</code> and the Spark context is readily available as <code>sc</code>.</p>
- <p>To create a CarbonSession, we have to configure it explicitly in the following manner:</p>
- <ul> <li>Import the following:</li> </ul>
- <pre><code>import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
</code></pre>
- <ul> <li>Create a CarbonSession:</li> </ul>
- <pre><code>val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs store path>")
</code></pre>
- <p><strong>NOTE</strong>: By default the metastore location points to <code>../carbon.metastore</code>; the user can provide their own metastore location to CarbonSession, e.g. <code>SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs store path>", "<local metastore path>")</code>.</p>
- <h4> <a id="executing-queries" class="anchor" href="#executing-queries" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Executing Queries</h4>
- <h6> <a id="creating-a-table" class="anchor" href="#creating-a-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Creating a Table</h6>
- <pre><code>scala>carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
</code></pre>
- <h6> <a id="loading-data-to-a-table" class="anchor" href="#loading-data-to-a-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading Data to a Table</h6>
- <pre><code>scala>carbon.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
</code></pre>
- <p><strong>NOTE</strong>: Please provide the real file path of <code>sample.csv</code> for the above script.</p>
- <h6> <a id="query-data-from-a-table" class="anchor" href="#query-data-from-a-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Query Data from a Table</h6>
- <pre><code>scala>carbon.sql("SELECT * FROM test_table").show()
scala>carbon.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
</code></pre>
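<p>Putting the steps above together, a minimal end-to-end Spark 2.1 shell session might look as follows. This is a sketch: the assembly jar path, the HDFS store path, and the path to <code>sample.csv</code> are placeholder values to substitute for your environment.</p>
<pre><code>// Launch the shell first (placeholder jar path):
// ./bin/spark-shell --jars /path/to/carbondata-assembly.jar
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// Assumed store path; replace with your own HDFS (or local) store path.
val carbon = SparkSession.builder().config(sc.getConf)
  .getOrCreateCarbonSession("hdfs://localhost:9000/carbon/store")

carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
// Placeholder path: point this at the sample.csv created in the prerequisites.
carbon.sql("LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table")
carbon.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
</code></pre>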
- <h2> <a id="interactive-analysis-with-spark-shell-version-16" class="anchor" href="#interactive-analysis-with-spark-shell-version-16" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Interactive Analysis with Spark Shell Version 1.6</h2>
- <h4> <a id="basics-1" class="anchor" href="#basics-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Basics</h4>
- <p>Start the Spark shell by running the following command in the Spark directory:</p>
- <pre><code>./bin/spark-shell --jars <carbondata assembly jar path>
</code></pre>
- <p><strong>NOTE</strong>: The assembly jar will be available after <a href="https://github.com/apache/incubator-carbondata/blob/master/build/README.md" target=_blank>building CarbonData</a> and can be copied from <code>./assembly/target/scala-2.1x/carbondata_xxx.jar</code>.</p>
- <p><strong>NOTE</strong>: In this shell, SparkContext is readily available as <code>sc</code>.</p>
- <ul> <li>To execute queries, we need to import CarbonContext:</li> </ul>
- <pre><code>import org.apache.spark.sql.CarbonContext
</code></pre>
- <ul> <li>Create an instance of CarbonContext in the following manner:</li> </ul>
- <pre><code>val cc = new CarbonContext(sc, "<hdfs store path>")
</code></pre>
- <p><strong>NOTE</strong>: If running on a local machine without HDFS, configure the local machine's store path instead of the HDFS store path.</p>
- <h4> <a id="executing-queries-1" class="anchor" href="#executing-queries-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Executing Queries</h4>
- <h6> <a id="creating-a-table-1" class="anchor" href="#creating-a-table-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Creating a Table</h6>
- <pre><code>scala>cc.sql("CREATE TABLE IF NOT EXISTS test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
</code></pre>
- <p>To see the created table:</p>
- <pre><code>scala>cc.sql("SHOW TABLES").show()
</code></pre>
- <h6> <a id="loading-data-to-a-table-1" class="anchor" href="#loading-data-to-a-table-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Loading Data to a Table</h6>
- <pre><code>scala>cc.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
</code></pre>
- <p><strong>NOTE</strong>: Please provide the real file path of <code>sample.csv</code> for the above script.</p>
- <h6> <a id="query-data-from-a-table-1" class="anchor" href="#query-data-from-a-table-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Query Data from a Table</h6>
- <pre><code>scala>cc.sql("SELECT * FROM test_table").show()
scala>cc.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
</code></pre>
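<p>For comparison with the 2.1 flow above, the corresponding end-to-end session on Spark 1.6 goes through CarbonContext rather than CarbonSession. A minimal sketch, assuming a local store path and a placeholder path to <code>sample.csv</code>:</p>
<pre><code>import org.apache.spark.sql.CarbonContext

// Assumed local store path (no HDFS); replace with your own.
val cc = new CarbonContext(sc, "/tmp/carbon/store")

cc.sql("CREATE TABLE IF NOT EXISTS test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
cc.sql("SHOW TABLES").show()
// Placeholder path to the sample.csv created in the prerequisites.
cc.sql("LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table")
cc.sql("SELECT * FROM test_table").show()
</code></pre>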
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/2f826c1b/src/main/webapp/supported-data-types-in-carbondata.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/supported-data-types-in-carbondata.html b/src/main/webapp/supported-data-types-in-carbondata.html
index 13e640f..b56bc59 100644
--- a/src/main/webapp/supported-data-types-in-carbondata.html
+++ b/src/main/webapp/supported-data-types-in-carbondata.html
@@ -156,17 +156,13 @@ <div class="row"> <div class="col-sm-12 col-md-12"> <div>
- <h1> <a id="data-types" class="anchor" href="#data-types" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Data Types</h1>
- <h4> <a id="carbondata-supports-the-following-data-types" class="anchor" href="#carbondata-supports-the-following-data-types" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>CarbonData supports the following data types:</h4>
- <ul> <li> <p>Numeric Types</p>
- <ul> <li>SMALLINT</li> <li>INT/INTEGER</li>
@@ -177,7 +173,6 @@ </li> <li> <p>Date/Time Types</p>
- <ul> <li>TIMESTAMP</li> <li>DATE</li>
@@ -185,7 +180,6 @@ </li> <li> <p>String Types</p>
- <ul> <li>STRING</li> <li>CHAR</li>
@@ -193,7 +187,6 @@ </li> <li> <p>Complex Types</p>
- <ul> <li>arrays: ARRAY<code><data_type></code>
</li>
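<p>As an illustration, a table touching several of the listed types can be declared from the shell described in the quick start. The table and column names below are hypothetical, and the <code>carbon</code> CarbonSession is assumed to exist:</p>
<pre><code>scala>carbon.sql("CREATE TABLE IF NOT EXISTS type_demo (id INT, flag SMALLINT, name STRING, event_time TIMESTAMP, event_date DATE, tags ARRAY<STRING>) STORED BY 'carbondata'")
</code></pre>
<p>Here <code>id</code> and <code>flag</code> exercise the numeric types, <code>name</code> the string types, <code>event_time</code> and <code>event_date</code> the date/time types, and <code>tags</code> the complex ARRAY type.</p>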
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/2f826c1b/src/main/webapp/troubleshooting.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/troubleshooting.html b/src/main/webapp/troubleshooting.html
index 96ac1f2..48f8871 100644
--- a/src/main/webapp/troubleshooting.html
+++ b/src/main/webapp/troubleshooting.html
@@ -156,250 +156,179 @@ <div class="row"> <div class="col-sm-12 col-md-12"> <div>
- <h1> <a id="troubleshooting" class="anchor" href="#troubleshooting" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Troubleshooting</h1>
- <p>This tutorial is designed to provide troubleshooting for end users and developers who are building, deploying, and using CarbonData.</p>
- <h2> <a id="failed-to-load-thrift-libraries" class="anchor" href="#failed-to-load-thrift-libraries" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to load thrift libraries</h2>
- <p><strong>Symptom</strong></p>
- <p>Thrift throws the following exception:</p>
-
-<pre><code> thrift: error while loading shared libraries:
- libthriftc.so.0: cannot open shared object file: No such file or directory
+<pre><code>thrift: error while loading shared libraries:
+libthriftc.so.0: cannot open shared object file: No such file or directory
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The complete path to the directory containing the libraries is not configured correctly.</p>
- <p><strong>Procedure</strong></p>
- <p>Follow the Apache thrift docs at <a href="https://thrift.apache.org/docs/install" target=_blank>https://thrift.apache.org/docs/install</a> to install thrift correctly.</p>
- <h2> <a id="failed-to-launch-the-spark-shell" class="anchor" href="#failed-to-launch-the-spark-shell" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to launch the Spark Shell</h2>
- <p><strong>Symptom</strong></p>
- <p>The shell prompts the following error:</p>
-
-<pre><code> org.apache.spark.sql.CarbonContext$$anon$$apache$spark$sql$catalyst$analysis
- $OverrideCatalog$_setter_$org$apache$spark$sql$catalyst$analysis
- $OverrideCatalog$$overrides_$e
+<pre><code>org.apache.spark.sql.CarbonContext$$anon$$apache$spark$sql$catalyst$analysis
+$OverrideCatalog$_setter_$org$apache$spark$sql$catalyst$analysis
+$OverrideCatalog$$overrides_$e
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The Spark version and the selected Spark profile do not match.</p>
- <p><strong>Procedure</strong></p>
- <ol>
-<li><p>Ensure your spark version and selected profile for spark are correct.</p></li>
+<li>
+<p>Ensure your Spark version and the selected Spark profile are correct.</p>
+</li>
<li>
<p>Use the following command:</p>
-
-<pre><code> "mvn -Pspark-2.1 -Dspark.version {yourSparkVersion} clean package"
-</code></pre>
-
-<p>Note : Refrain from using "mvn clean package" without specifying the profile.</p>
+<pre><code>mvn -Pspark-2.1 -Dspark.version={yourSparkVersion} clean package
+</code></pre>
+<p>Note: Refrain from using "mvn clean package" without specifying the profile.</p>
</li>
</ol>
<h2> <a id="failed-to-execute-load-query-on-cluster" class="anchor" href="#failed-to-execute-load-query-on-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to execute load query on cluster.</h2>
- <p><strong>Symptom</strong></p>
- <p>Load query failed with the following exception:</p>
-
-<pre><code> Dictionary file is locked for updation.
+<pre><code>Dictionary file is locked for updation.
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The carbon.properties file is not identical in all the nodes of the cluster.</p>
- <p><strong>Procedure</strong></p>
- <p>Follow the steps to ensure the carbon.properties file is consistent across all the nodes:</p>
- <ol>
-<li><p>Copy the carbon.properties file from the master node to all the other nodes in the cluster.
- For example, you can use ssh to copy this file to all the nodes.</p></li>
-<li><p>For the changes to take effect, restart the Spark cluster.</p></li>
+<li>
+<p>Copy the carbon.properties file from the master node to all the other nodes in the cluster.
+For example, you can use scp to copy this file to all the nodes.</p>
+</li>
+<li>
+<p>For the changes to take effect, restart the Spark cluster.</p>
+</li>
</ol>
- <h2> <a id="failed-to-execute-insert-query-on-cluster" class="anchor" href="#failed-to-execute-insert-query-on-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to execute insert query on cluster.</h2>
- <p><strong>Symptom</strong></p>
- <p>Insert query failed with the following exception:</p>
-
-<pre><code> Dictionary file is locked for updation.
+<pre><code>Dictionary file is locked for updation.
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The carbon.properties file is not identical in all the nodes of the cluster.</p>
- <p><strong>Procedure</strong></p>
- <p>Follow the steps to ensure the carbon.properties file is consistent across all the nodes:</p>
- <ol>
-<li><p>Copy the carbon.properties file from the master node to all the other nodes in the cluster.
- For example, you can use scp to copy this file to all the nodes.</p></li>
-<li><p>For the changes to take effect, restart the Spark cluster.</p></li>
+<li>
+<p>Copy the carbon.properties file from the master node to all the other nodes in the cluster.
+For example, you can use scp to copy this file to all the nodes.</p>
+</li>
+<li>
+<p>For the changes to take effect, restart the Spark cluster.</p>
+</li>
</ol>
- <h2> <a id="failed-to-connect-to-hiveuser-with-thrift" class="anchor" href="#failed-to-connect-to-hiveuser-with-thrift" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to connect to hiveuser with thrift</h2>
- <p><strong>Symptom</strong></p>
- <p>We get the following exception:</p>
-
-<pre><code> Cannot connect to hiveuser.
+<pre><code>Cannot connect to hiveuser.
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The external process does not have permission to access the hiveuser.</p>
- <p><strong>Procedure</strong></p>
- <p>Ensure that the hiveuser in MySQL allows access from external processes.</p>
- <h2> <a id="failure-to-read-the-metastore-db-during-table-creation" class="anchor" href="#failure-to-read-the-metastore-db-during-table-creation" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failure to read the metastore db during table creation.</h2>
- <p><strong>Symptom</strong></p>
- <p>We get the following exception on trying to connect:</p>
-
-<pre><code> Cannot read the metastore db
+<pre><code>Cannot read the metastore db
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The metastore db is dysfunctional.</p>
- <p><strong>Procedure</strong></p>
- <p>Remove the metastore db from the carbon.metastore in the Spark directory.</p>
- <h2> <a id="failed-to-load-data-on-the-cluster" class="anchor" href="#failed-to-load-data-on-the-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to load data on the cluster</h2>
- <p><strong>Symptom</strong></p>
- <p>Data loading fails with the following exception:</p>
-
-<pre><code> Data Load failure exception
+<pre><code>Data Load failure exception
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The following issues can cause the failure:</p>
- <ol>
-<li><p>The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties are not consistent across all nodes of the cluster.</p></li>
+<li>
+<p>The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties are not consistent across all nodes of the cluster.</p>
+</li>
<li>
<p>The path to the hdfs ddl is not configured correctly in carbon.properties.</p>
+</li>
+</ol>
<p><strong>Procedure</strong></p>
- <p>Follow the steps to ensure the following configuration files are consistent across all the nodes:</p>
- <ol>
<li>
<p>Copy the core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files from the master node to all the other nodes in the cluster.
For example, you can use scp to copy these files to all the nodes.</p>
- <p>Note: Set the path to the hdfs ddl in carbon.properties on the master node.</p>
</li>
+<li>
+<p>For the changes to take effect, restart the Spark cluster.</p>
+</li>
</ol>
- <h2> <a id="failed-to-insert-data-on-the-cluster" class="anchor" href="#failed-to-insert-data-on-the-cluster" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to insert data on the cluster</h2>
- <p><strong>Symptom</strong></p>
- <p>Insertion fails with the following exception:</p>
-
-<pre><code> Data Load failure exception
+<pre><code>Data Load failure exception
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>The following issues can cause the failure:</p>
- <ol>
-<li><p>The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties are not consistent across all nodes of the cluster.</p></li>
+<li>
+<p>The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties are not consistent across all nodes of the cluster.</p>
+</li>
<li>
<p>The path to the hdfs ddl is not configured correctly in carbon.properties.</p>
+</li>
+</ol>
<p><strong>Procedure</strong></p>
- <p>Follow the steps to ensure the following configuration files are consistent across all the nodes:</p>
- <ol>
<li>
<p>Copy the core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files from the master node to all the other nodes in the cluster.
For example, you can use scp to copy these files to all the nodes.</p>
- <p>Note: Set the path to the hdfs ddl in carbon.properties on the master node.</p>
</li>
+<li>
+<p>For the changes to take effect, restart the Spark cluster.</p>
+</li>
</ol>
- <h2> <a id="failed-to-execute-concurrent-operationsloadinsertupdate-on-table-by-multiple-workers" class="anchor" href="#failed-to-execute-concurrent-operationsloadinsertupdate-on-table-by-multiple-workers" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to execute concurrent operations (Load, Insert, Update) on a table by multiple workers.</h2>
- <p><strong>Symptom</strong></p>
- <p>Execution fails with the following exception:</p>
-
-<pre><code> Table is locked for updation.
+<pre><code>Table is locked for updation.
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>Concurrent operations are not supported.</p>
- <p><strong>Procedure</strong></p>
- <p>A worker must wait for the running query execution to complete and the table to release its lock before another query execution can succeed.</p>
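<p>There is no API-level fix here, but client code can honor the lock by waiting and retrying. A minimal sketch from the shell, assuming the <code>carbon</code> session from the quick start; the helper name, retry count, sleep interval, and matched message fragment are illustrative assumptions, not part of the CarbonData API:</p>
<pre><code>// Hypothetical helper: re-submit a statement while the table lock is held.
def withRetry(statement: String, attempts: Int = 5, waitMs: Long = 10000): Unit = {
  var remaining = attempts
  var done = false
  while (!done && remaining > 0) {
    try {
      carbon.sql(statement)  // assumes the CarbonSession named `carbon`
      done = true
    } catch {
      // Assumed match on the lock message quoted above; adjust for your version.
      case e: Exception if e.getMessage != null && e.getMessage.contains("locked for updation") =>
        remaining -= 1
        Thread.sleep(waitMs)  // wait for the other worker to release the lock
    }
  }
  if (!done) throw new RuntimeException(s"Gave up after $attempts attempts: $statement")
}

withRetry("LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table")
</code></pre>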
- <h2> <a id="failed-to-create-a-table-with-a-single-numeric-column" class="anchor" href="#failed-to-create-a-table-with-a-single-numeric-column" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Failed to create a table with a single numeric column.</h2>
- <p><strong>Symptom</strong></p>
- <p>Execution fails with the following exception:</p>
-
-<pre><code> Table creation fails.
+<pre><code>Table creation fails.
</code></pre>
- <p><strong>Possible Cause</strong></p>
- <p>Behaviour not supported.</p>
- <p><strong>Procedure</strong></p>
- <p>At least one column that can be considered a dimension is mandatory for table creation.</p>
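<p>As an illustration in the quick-start shell (the table names below are made up for the example), the first statement is expected to fail because its only column is numeric, while the second succeeds because the String column can act as a dimension:</p>
<pre><code>// Expected to fail: the single column is numeric, so no dimension exists.
scala>carbon.sql("CREATE TABLE only_measures (age Int) STORED BY 'carbondata'")

// Works: the String column 'name' can be treated as a dimension.
scala>carbon.sql("CREATE TABLE with_dimension (name String, age Int) STORED BY 'carbondata'")
</code></pre>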
</div>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/2f826c1b/src/main/webapp/useful-tips-on-carbondata.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/useful-tips-on-carbondata.html b/src/main/webapp/useful-tips-on-carbondata.html
index 977fe40..39e6b3c 100644
--- a/src/main/webapp/useful-tips-on-carbondata.html
+++ b/src/main/webapp/useful-tips-on-carbondata.html
@@ -156,29 +156,21 @@ <div class="row"> <div class="col-sm-12 col-md-12"> <div>
- <h1> <a id="useful-tips" class="anchor" href="#useful-tips" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Useful Tips</h1>
- <p>This tutorial guides you to create CarbonData tables and optimize performance. The following sections elaborate on these topics:</p>
- <ul> <li><a href="#suggestions-to-create-carbondata-table">Suggestions to create CarbonData Table</a></li> <li><a href="#configurations-for-optimizing-carbondata-performance">Configurations For Optimizing CarbonData Performance</a></li> </ul>
- <h2> <a id="suggestions-to-create-carbondata-table" class="anchor" href="#suggestions-to-create-carbondata-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Suggestions to Create CarbonData Table</h2>
- <p>Recently CarbonData was used to analyze performance in the telecommunication field. The results of the analysis for table creation with dimensions ranging from
-10 thousand to 10 billion rows and 100 to 300 columns have been summarized below. </p>
+10 thousand to 10 billion rows and 100 to 300 columns have been summarized below.</p>
<p>The following table describes some of the columns from the table used.</p>
- <p><strong>Table Column Description</strong></p>
- <table> <thead> <tr> @@ -233,18 +225,14 @@ The results of the analysis for table creation with dimensions ranging from </tr> </tbody> </table>
- <p>CarbonData has more than 50 test cases; on the basis of these, we have the following suggestions to enhance query performance:</p>
- <ul> <li> <p><strong>Put the frequently-used column filter in the beginning</strong></p>
-
-<p>For example, MSISDN filter is used in most of the query then we must put the MSISDN in the first column.
+<p>For example, if the MSISDN filter is used in most of the queries, then we must put MSISDN as the first column.
The create table command can be modified as suggested below:</p>
</li> </ul>
- <pre><code> create table carbondata_table(
msisdn String,
...
@@ -252,23 +240,18 @@ The create table command can be modified as suggested below :</p> TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,..', 'DICTIONARY_INCLUDE'='...');
</code></pre>
- <p>Now the query with MSISDN in the filter will be more efficient.</p>
- <ul> <li> <p><strong>Put the frequently-used columns in the order of low to high cardinality</strong></p>
- <p>If the table in the specified query has multiple columns which are frequently used to filter the results, it is suggested to put
-the columns in the order of cardinality low to high. This ordering of frequently used columns improves the compression ratio and
+the columns in the order of cardinality low to high.
This ordering of frequently used columns improves the compression ratio and
enhances the performance of queries with filter on these columns.</p>
-
-<p>For example if MSISDN, HOST and Dime_1 are frequently-used columns, then the column order of table is suggested as
-Dime_1>HOST>MSISDN as Dime_1 has the lowest cardinality.
+<p>For example, if MSISDN, HOST and Dime_1 are frequently-used columns, then the suggested column order of the table is
+Dime_1>HOST>MSISDN, as Dime_1 has the lowest cardinality.
The create table command can be modified as suggested below:</p>
</li> </ul>
- <pre><code> create table carbondata_table(
Dime_1 String,
HOST String,
@@ -278,16 +261,13 @@ The create table command can be modified as below :</p> TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST..', 'DICTIONARY_INCLUDE'='Dime_1..');
</code></pre>
- <ul> <li> <p><strong>Put the Dimension type columns in order of low to high cardinality</strong></p>
- <p>If the columns used to filter are not frequently used, then it is suggested to order all the columns of dimension type in order of low to high cardinality.
The create table command can be modified as below:</p>
</li> </ul>
- <pre><code> create table carbondata_table(
Dime_1 String,
BEGIN_TIME bigint
@@ -298,16 +278,13 @@ The create table command can be modified as below :</p> TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI..', 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME..');
</code></pre>
- <ul> <li> <p><strong>For measure type columns with non-high accuracy, replace Numeric(20,0) data type with Double data type</strong></p>
-
-<p>For columns of measure type, not requiring high accuracy, it is suggested to replace Numeric data type with Double to enhance
+<p>For columns of measure type not requiring high accuracy, it is suggested to replace the Numeric data type with Double to enhance
query performance. The create table command can be modified as below:</p>
</li> </ul>
- <pre><code> create table carbondata_table(
Dime_1 String,
BEGIN_TIME bigint
@@ -321,20 +298,15 @@ query performance. The create table command can be modified as below :</p> TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
</code></pre>
- <p>The result of the performance analysis of this test case shows a reduction in query execution time from 15 to 3 seconds, thereby improving performance by nearly 5 times.</p>
- <ul> <li> <p><strong>Columns of incremental character should be re-arranged at the end of dimensions</strong></p>
- <p>Consider the following scenario: data is loaded each day and the start_time is incremental for each load; it is
-suggested to put start_time at the end of dimensions. </p>
-
+suggested to put start_time at the end of the dimensions.</p>
<p>Incremental values are efficient in using the min/max index. The create table command can be modified as below:</p>
</li> </ul>
- <pre><code> create table carbondata_table(
Dime_1 String,
HOST String,
@@ -348,26 +320,20 @@ suggested to put start_time at the end of dimensions. </p> TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME');
</code></pre>
- <ul> <li> <p><strong>Avoid adding high cardinality columns to dictionary</strong></p>
-
-<p>If the system has low memory configuration, then it is suggested to exclude high cardinality columns from the dictionary to
-enhance load performance. Creation of dictionary for high cardinality columns at time of load will degrade load performance due to
-excessive memory usage.
</p>
-
+<p>If the system has a low memory configuration, then it is suggested to exclude high cardinality columns from the dictionary to
+enhance load performance. Creating a dictionary for high cardinality columns at load time will degrade load performance due to
+excessive memory usage.</p>
<p>By default CarbonData determines the cardinality at the first data load and allows dictionary creation only if the cardinality is less than 1 million.</p>
</li> </ul>
- <h2> <a id="configurations-for-optimizing-carbondata-performance" class="anchor" href="#configurations-for-optimizing-carbondata-performance" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Configurations for Optimizing CarbonData Performance</h2>
-
-<p>Recently we did some performance POC on CarbonData for Finance and telecommunication Field. It involved detailed queries and aggregation
+<p>Recently we did a performance POC on CarbonData for the finance and telecommunication fields. It involved detailed queries and aggregation
scenarios. After the completion of the POC, some of the configurations impacting performance were identified and are tabulated below:</p>
<table> <thead> <tr>