http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/6b9f3f2c/src/main/webapp/documents/installation.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/documents/installation.html 
b/src/main/webapp/documents/installation.html
new file mode 100644
index 0000000..e308dfe
--- /dev/null
+++ b/src/main/webapp/documents/installation.html
@@ -0,0 +1,282 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled 
Document.md</title><style></style></head><body id="preview">
+<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>
+<h1><a id="Installation_Guide_19"></a>Installation Guide</h1>
+<p>This tutorial will guide you through the installation and configuration of CarbonData in the following two modes:</p>
+<ul>
+<li><a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide";>On
 Standalone Spark cluster</a></li>
+<li><a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide";>On
 Spark on Yarn cluster</a></li>
+</ul>
+<p>followed by :</p>
+<ul>
+<li><a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide";>Query
 Execution using Carbon Thrift Server</a></li>
+</ul>
+<h2><a 
id="Installing_and_Configuring_CarbonData_on_Standalone_Spark_Cluster_28"></a>Installing
 and Configuring CarbonData on “Standalone Spark” Cluster</h2>
+<h3><a id="Prerequisite_30"></a>Prerequisite</h3>
+<ul>
+<li>Hadoop HDFS and Yarn should be installed and running.</li>
+<li>Spark should be installed and running in all the clients.</li>
+<li>CarbonData user should have permission to access HDFS.</li>
+</ul>
+<h3><a id="Procedure_35"></a>Procedure</h3>
+<p>The following steps are only for driver nodes. (Driver nodes are the ones that start the Spark context.)</p>
+<ol>
+<li>
+<p><a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration";>Build
 the CarbonData</a> project and get the assembly jar from 
“./assembly/target/scala-2.10/carbondata_xxx.jar” and put in the 
“&lt;SPARK_HOME&gt;/carbonlib” folder.</p>
+<p>(Note: Create the carbonlib folder if it does not exist inside the “&lt;SPARK_HOME&gt;” path.)</p>
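+<p>As an illustrative sketch of this step (assuming SPARK_HOME is set; the jar name is the placeholder used above):</p>
+<pre><code># run from the root of the CarbonData repository; the exact jar name depends on your build
+mkdir -p "$SPARK_HOME/carbonlib"
+cp ./assembly/target/scala-2.10/carbondata_xxx.jar "$SPARK_HOME/carbonlib/"
+</code></pre>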
+</li>
+<li>
+<p>The carbonlib folder path must be added to the Spark classpath. (Edit the “&lt;SPARK_HOME&gt;/conf/spark-env.sh” file and modify the value of SPARK_CLASSPATH by appending “&lt;SPARK_HOME&gt;/carbonlib/*” to the existing value.)</p>
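+<p>As a rough sketch, the resulting entry in spark-env.sh might look like the following (assuming SPARK_HOME points at your Spark installation):</p>
+<pre><code># appended to $SPARK_HOME/conf/spark-env.sh
+export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/carbonlib/*
+</code></pre>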
+</li>
+<li>
+<p>Copy carbon.properties.template from the “./conf/” folder of the CarbonData repository to “&lt;SPARK_HOME&gt;/conf/carbon.properties”.</p>
+</li>
+<li>
+<p>Copy the “carbonplugins” folder from the “./processing/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/carbonlib” folder.</p>
+<p>(Note: The carbonplugins folder contains the .kettle folder.)</p>
+</li>
+<li>
+<p>On the Spark node, configure the properties listed in the table below in the “&lt;SPARK_HOME&gt;/conf/spark-defaults.conf” file.</p>
+</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Description</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.kettle.home</td>
+<td>Path that will be used by CarbonData internally to create graph for 
loading the data</td>
+<td>$SPARK_HOME/carbonlib/carbonplugins</td>
+</tr>
+<tr>
+<td>spark.driver.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to the driver. For instance, GC 
settings or other logging.</td>
+<td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+</tr>
+<tr>
+<td>spark.executor.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to executors. For instance, GC 
settings or other logging. NOTE: You can enter multiple values separated by 
space.</td>
+<td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+</tr>
+</tbody>
+</table>
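+<p>Assembled, the additions to spark-defaults.conf from the table above might look like the following sketch (paths are placeholders and may need to be absolute on your cluster):</p>
+<pre><code># values taken from the table above
+carbon.kettle.home $SPARK_HOME/carbonlib/carbonplugins
+spark.driver.extraJavaOptions -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
+spark.executor.extraJavaOptions -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
+</code></pre>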
+<ol start="6">
+<li>Add the following properties in “&lt;SPARK_HOME&gt;/conf/carbon.properties”:</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Required</th>
+<th>Description</th>
+<th>Example</th>
+<th>Remark</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.storelocation</td>
+<td>NO</td>
+<td>Location where CarbonData will create the store and write the data in its own format.</td>
+<td>hdfs://IP:PORT/Opt/CarbonStore</td>
+<td>Propose</td>
+</tr>
+<tr>
+<td>carbon.kettle.home</td>
+<td>YES</td>
+<td>Path that will be used by CarbonData internally to create the graph for loading the data.</td>
+<td>$SPARK_HOME/carbonlib/carbonplugins</td>
+<td></td>
+</tr>
+</tbody>
+</table>
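+<p>For example, the corresponding entries in carbon.properties might look like this sketch (the HDFS host, port and store path are placeholders):</p>
+<pre><code># placeholder values; use your cluster's HDFS address and Spark installation path
+carbon.storelocation=hdfs://IP:PORT/Opt/CarbonStore
+carbon.kettle.home=$SPARK_HOME/carbonlib/carbonplugins
+</code></pre>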
+<ol start="7">
+<li>Verify the installation, for example:</li>
+</ol>
+<pre><code>./spark-shell --master spark://IP:PORT --total-executor-cores 2 
--executor-memory 2G
+</code></pre>
+<p>Note: Make sure that the user has permission on the carbon jars and files with which the driver and executors are started.</p>
+<p>To get started with CarbonData: <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start">Quick Start</a>, <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/DDL+operations+on+CarbonData">DDL Operations</a></p>
+<h2><a 
id="Installing_and_Configuring_Carbon_on_Spark_on_YARN_Cluster_72"></a>Installing
 and Configuring Carbon on “Spark on YARN” Cluster</h2>
+<p>This section provides the procedure to install Carbon on “Spark on 
YARN” cluster.</p>
+<h3><a id="Prerequisite_75"></a>Prerequisite</h3>
+<ul>
+<li>Hadoop HDFS and Yarn should be installed and running.</li>
+<li>Spark should be installed and running in all the clients.</li>
+<li>CarbonData user should have permission to access HDFS.</li>
+</ul>
+<h3><a id="Procedure_80"></a>Procedure</h3>
+<ol>
+<li>
+<p><a 
href="https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration";>Build
 the CarbonData</a> project and get the assembly jar from 
“./assembly/target/scala-2.10/carbondata_xxx.jar” and put in the 
“&lt;SPARK_HOME&gt;/carbonlib” folder.</p>
+<p>(Note: Create the carbonlib folder if it does not exist inside the “&lt;SPARK_HOME&gt;” path.)</p>
+</li>
+<li>
+<p>Copy the “carbonplugins” folder from the “./processing/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/carbonlib” folder.<br>
+(Note: The carbonplugins folder contains the .kettle folder.)</p>
+</li>
+<li>
+<p>Copy “carbon.properties.template” from the “./conf/” folder of the CarbonData repository to “&lt;SPARK_HOME&gt;/conf/carbon.properties”.</p>
+</li>
+<li>
+<p>Modify the parameters listed in the table below in “spark-defaults.conf”, located in “&lt;SPARK_HOME&gt;/conf”.</p>
+</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Description</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>spark.master</td>
+<td>Set this value to run Spark on YARN.</td>
+<td>Set to “yarn-client” to run Spark on YARN in client mode.</td>
+</tr>
+<tr>
+<td>spark.yarn.dist.files</td>
+<td>Comma-separated list of files to be placed in the working directory of 
each executor.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/conf/carbon.properties</td>
+</tr>
+<tr>
+<td>spark.yarn.dist.archives</td>
+<td>Comma-separated list of archives to be extracted into the working 
directory of each executor.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td>
+</tr>
+<tr>
+<td>spark.executor.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to executors, for instance GC settings or other logging. NOTE: You can enter multiple values separated by spaces.</td>
+<td>-Dcarbon.properties.filepath=carbon.properties</td>
+</tr>
+<tr>
+<td>spark.executor.extraClassPath</td>
+<td>Extra classpath entries to prepend to the classpath of executors. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append the value to this parameter, spark.executor.extraClassPath.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td>
+</tr>
+<tr>
+<td>spark.driver.extraClassPath</td>
+<td>Extra classpath entries to prepend to the classpath of the driver. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append the value to this parameter, spark.driver.extraClassPath.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td>
+</tr>
+<tr>
+<td>spark.driver.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to the driver. For instance, GC 
settings or other logging.</td>
+<td>-Dcarbon.properties.filepath=&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/conf/carbon.properties</td>
+</tr>
+<tr>
+<td>carbon.kettle.home</td>
+<td>Path that will be used by CarbonData internally to create the graph for loading the data.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbonplugins</td>
+</tr>
+</tbody>
+</table>
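+<p>Put together, the spark-defaults.conf additions for the YARN setup might look like the following sketch, where /opt/spark and carbondata_xxx.jar are placeholders for your actual Spark home and assembly jar:</p>
+<pre><code># placeholder paths; substitute your actual Spark home and assembly jar name
+spark.master yarn-client
+spark.yarn.dist.files /opt/spark/conf/carbon.properties
+spark.yarn.dist.archives /opt/spark/carbonlib/carbondata_xxx.jar
+spark.executor.extraJavaOptions -Dcarbon.properties.filepath=carbon.properties
+spark.executor.extraClassPath /opt/spark/carbonlib/carbondata_xxx.jar
+spark.driver.extraClassPath /opt/spark/carbonlib/carbondata_xxx.jar
+spark.driver.extraJavaOptions -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+carbon.kettle.home /opt/spark/carbonlib/carbonplugins
+</code></pre>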
+<ol start="5">
+<li>Add the following properties in &lt;SPARK_HOME&gt;/conf/carbon.properties:</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Required</th>
+<th>Description</th>
+<th>Example</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.storelocation</td>
+<td>NO</td>
+<td>Location where CarbonData will create the store and write the data in its own format.</td>
+<td>hdfs://IP:PORT/Opt/CarbonStore</td>
+<td>Propose</td>
+</tr>
+<tr>
+<td>carbon.kettle.home</td>
+<td>YES</td>
+<td>Path that will be used by CarbonData internally to create the graph for loading the data.</td>
+<td>$SPARK_HOME/carbonlib/carbonplugins</td>
+<td></td>
+</tr>
+</tbody>
+</table>
+<ol start="6">
+<li>Verify the installation:</li>
+</ol>
+<pre><code>./bin/spark-shell --master yarn-client --driver-memory 1g 
--executor-cores 2 --executor-memory 2G 
+</code></pre>
+<p>Note: Make sure that the user has permission on the carbon jars and files with which the driver and executors are started.</p>
+<p>To get started with CarbonData: <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start">Quick Start</a>, <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/DDL+operations+on+CarbonData">DDL Operations</a></p>
+<h2><a id="Query_execution_using_Carbon_thrift_server_118"></a>Query execution 
using Carbon thrift server</h2>
+<h3><a id="Start_Thrift_server_120"></a>Start Thrift server</h3>
+<p>a. cd &lt;SPARK_HOME&gt;</p>
+<p>b. Run the following command to start the Carbon Thrift server:</p>
+<pre><code>./bin/spark-submit --conf 
spark.sql.hive.thriftServer.singleSession=true --class 
org.apache.carbondata.spark.thriftserver.CarbonThriftServer
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR &lt;carbon_store_path&gt;
+</code></pre>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Description</th>
+<th>Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>CARBON_ASSEMBLY_JAR</td>
+<td>Carbon assembly jar name present in the “&lt;SPARK_HOME&gt;”/carbonlib/ folder.</td>
+<td>carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar</td>
+</tr>
+<tr>
+<td>carbon_store_path</td>
+<td>This is a parameter to the CarbonThriftServer class. It is an HDFS path where carbon files will be kept. It is strongly recommended to set it to the same value as the carbon.storelocation parameter in carbon.properties.</td>
+<td>hdfs://hacluster/user/hive/warehouse/carbon.store or hdfs://10.10.10.10:54310/user/hive/warehouse/carbon.store</td>
+</tr>
+</tbody>
+</table>
+<h3><a id="Examples_133"></a>Examples</h3>
+<ol>
+<li>Start with default memory and executors</li>
+</ol>
+<pre><code>./bin/spark-submit --conf 
spark.sql.hive.thriftServer.singleSession=true --class 
org.apache.carbondata.spark.thriftserver.CarbonThriftServer 
$SPARK_HOME/carbonlib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
 hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre>
+<ol start="2">
+<li>Start with Fixed executors and resources</li>
+</ol>
+<pre><code>./bin/spark-submit --conf 
spark.sql.hive.thriftServer.singleSession=true --class 
org.apache.carbondata.spark.thriftserver.CarbonThriftServer --num-executors 3 
--driver-memory 20g --executor-memory 250g --executor-cores 32 
/srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
 hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre>
+<h3><a 
id="Connecting_to_Carbon_Thrift_Server_Using_Beeline_142"></a>Connecting to 
Carbon Thrift Server Using Beeline</h3>
+<pre><code>cd &lt;SPARK_HOME&gt;
+./bin/beeline -u jdbc:hive2://&lt;thriftserver_host&gt;:&lt;port&gt;
+
+Example
+./bin/beeline -u jdbc:hive2://10.10.10.10:10000
+</code></pre>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/6b9f3f2c/src/main/webapp/documents/overview.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/documents/overview.html 
b/src/main/webapp/documents/overview.html
new file mode 100644
index 0000000..c23b883
--- /dev/null
+++ b/src/main/webapp/documents/overview.html
@@ -0,0 +1,184 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled 
Document.md</title><style>
+p img{ max-width: 100% }
+</style></head><body id="preview">
+<!--<p>&lt;!–-<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>-–&gt;</p>-->
+<p><img src="../docs/images/format/CarbonData_logo.png?raw=true" 
alt="CarbonData_Logo"></p>
+<p>This tutorial provides a detailed overview of:</p>
+<ul>
+<li>CarbonData,</li>
+<li>Working and File Format</li>
+<li>Features</li>
+<li>Supported Data Types</li>
+<li>Compatibility</li>
+<li>Packaging and Interfaces.</li>
+</ul>
+<h2><a id="Introduction_30"></a>Introduction</h2>
+<p>CarbonData is a fully indexed, columnar, Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. CarbonData enables faster interactive queries by using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.</p>
+<p>In customer benchmarks, CarbonData has proven to manage petabytes of data running on extraordinarily low-cost hardware and to answer queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data stores).</p>
+<p>Some of the salient features of CarbonData are:</p>
+<ul>
+<li>Low latency for various types of data access patterns like sequential, random and OLAP.</li>
+<li>Allows fast query on fast data.</li>
+<li>Ensures Space Efficiency.</li>
+<li>General format available on Hadoop-ecosystem.</li>
+</ul>
+<h2><a id="CarbonData_File_Structure_42"></a>CarbonData File Structure</h2>
+<p>A CarbonData file contains groups of data called blocklets, along with all required information such as schema, offsets and indices in a file footer, co-located in HDFS.</p>
+<p>The file footer can be read once to build the indices in memory, which can 
be utilized for optimizing the scans and processing for all subsequent 
queries.</p>
+<p>Each blocklet in the file is further divided into chunks of data called 
Data Chunks. Each data chunk is organized either in columnar format or row 
format, and stores the data of either a single column or a set of columns. All 
blocklets in one file contain the same number and type of Data Chunks.</p>
+<p><img 
src="../docs/images/format/carbon_data_file_structure_new.png?raw=true" 
alt="Carbon File Structure"></p>
+<p>Each Data Chunk contains multiple groups of data called Pages. There are three types of pages:</p>
+<ul>
+<li>Data Page: Contains the encoded data of a column/group of columns.</li>
+<li>Row ID Page (optional): Contains the row id mappings used when the Data Page is stored as an inverted index.</li>
+<li>RLE Page (optional): Contains additional metadata used when the Data Page is RLE coded.</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_format_new.png?raw=true" 
alt="Carbon File Format"></p>
+<h2><a id="Features_59"></a>Features</h2>
+<p>The CarbonData file format is a columnar store in HDFS. It has many features that a modern columnar format has, such as splittability, compression schemes and complex data types, and it also has the following unique features:</p>
+<ul>
+<li>Stores data along with index: this can significantly accelerate query performance and reduce I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices; a processing framework can leverage this index to reduce the number of tasks it needs to schedule and process, and it can also perform skip scans at a finer-grained unit (called a blocklet) during task-side scanning, instead of scanning the whole file.</li>
+<li>Operable encoded data: by supporting efficient compression and global encoding schemes, CarbonData can query directly on compressed/encoded data; the data is converted only just before returning the results to the user, which is “late materialization”.</li>
+<li>Column group: allows multiple columns to form a column group that is stored in row format. This reduces the row reconstruction cost at query time.</li>
+<li>Supports various use cases with one single data format: interactive OLAP-style queries, sequential access (big scans), and random access (narrow scans).</li>
+</ul>
+<h2><a id="Data_Types_68"></a>Data Types</h2>
+<p>The following types are supported :</p>
+<ul>
+<li>
+<p>Numeric Types</p>
+<ul>
+<li>SMALLINT</li>
+<li>INT/INTEGER</li>
+<li>BIGINT</li>
+<li>DOUBLE</li>
+<li>DECIMAL</li>
+</ul>
+</li>
+<li>
+<p>Date/Time Types</p>
+<ul>
+<li>TIMESTAMP</li>
+</ul>
+</li>
+<li>
+<p>String Types</p>
+<ul>
+<li>STRING</li>
+</ul>
+</li>
+<li>
+<p>Complex Types</p>
+<ul>
+<li>arrays: ARRAY&lt;data_type&gt;</li>
+<li>structs: STRUCT&lt;col_name : data_type [COMMENT col_comment], …&gt;</li>
+</ul>
+</li>
+</ul>
+<h2><a id="Compatibility_89"></a>Compatibility</h2>
+<h2><a id="Packaging_and_Interfaces_92"></a>Packaging and Interfaces</h2>
+<h3><a id="Packaging_94"></a>Packaging</h3>
+<p>Carbon provides the following JAR packages:</p>
+<p><img 
src="https://cloud.githubusercontent.com/assets/6500698/14255195/831c6e90-fac5-11e5-87ab-3b16d84918fb.png";
 alt="carbon modules2"></p>
+<ul>
+<li>
+<p><strong>carbon-store.jar or carbondata-assembly.jar:</strong> This is the main jar for the carbon project; its target audience includes both users and developers.<br>
+- For MapReduce application users, this jar provides APIs to read and write carbon files through CarbonInput/OutputFormat in the carbon-hadoop module.<br>
+- For developers, this jar can be used to integrate carbon with processing engines like Spark and Hive, by leveraging the APIs in the carbon-processing module.</p>
+</li>
+<li>
+<p><strong>carbon-spark.jar (currently part of the assembly jar):</strong> provides support for Spark users; they can manipulate carbon data files using the native Spark DataFrame/SQL interfaces. Apart from this, in order to leverage carbon’s built-in lifecycle management functions, higher-level concepts such as Managed Carbon Table, Database and the corresponding DDL are introduced.</p>
+</li>
+<li>
+<p><strong>carbon-hive.jar (not yet provided):</strong> similar to carbon-spark, providing integration between carbon and Hive.</p>
+</li>
+</ul>
+<h3><a id="Interfaces_107"></a>Interfaces</h3>
+<h4><a id="API_109"></a>API</h4>
+<p>Carbon can be used in the following scenarios:</p>
+<ol>
+<li>
+<p>For MapReduce application users<br>
+This user API is provided by carbon-hadoop. In this scenario, users can process carbon files in their MapReduce applications by choosing CarbonInput/OutputFormat, and are responsible for using it correctly. Currently only CarbonInputFormat is provided; an OutputFormat will be provided soon.</p>
+</li>
+<li>
+<p>For Spark users<br>
+This user API is provided by Spark itself. There are two levels of APIs:</p>
+<ul>
+<li>
+<p><strong>Carbon File</strong></p>
+<p>Similar to Parquet, JSON, or other data sources in Spark, carbon can be used with the data source API. For example (please refer to DataFrameAPIExample for more detail):</p>
+<pre><code>// User can create a DataFrame from any data source or 
transformation.
+val df = ...
+
+// Write data
+// User can write a DataFrame to a carbon file
+df.write
+.format(&quot;carbondata&quot;)
+.option(&quot;tableName&quot;, &quot;carbontable&quot;)
+.mode(SaveMode.Overwrite)
+.save()
+
+
+// read carbon data by data source API
+df = carbonContext.read
+.format(&quot;carbondata&quot;)
+.option(&quot;tableName&quot;, &quot;carbontable&quot;)
+.load(&quot;/path&quot;)
+
+// User can then use DataFrame for analysis
+df.count
+SVMWithSGD.train(df, numIterations)
+
+// User can also register the DataFrame with a table name, and use SQL for 
analysis
+df.registerTempTable(&quot;t1&quot;)  // register temporary table in SparkSQL 
catalog
+df.registerHiveTable(&quot;t2&quot;)  // Or, use an implicit function to register it in the Hive metastore
+sqlContext.sql(&quot;select count(*) from t1&quot;).show
+</code></pre>
+</li>
+<li>
+<p><strong>Managed Carbon Table</strong></p>
+<p>Carbon has built-in support for high-level concepts like Table and Database, and supports full data lifecycle management. Instead of dealing with just files, users can use carbon-specific DDL to manipulate data at the Table and Database level. Please refer to <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DDL">DDL</a> and <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DML">DML</a>.</p>
+<pre><code>// Use SQL to manage table and query data
+create database db1;
+use database db1;
+show databases;
+create table tbl1 using org.apache.carbondata.spark;
+load data into table tbl1 path 'some_files';
+select count(*) from tbl1;
+</code></pre>
+</li>
+</ul>
+</li>
+<li>
+<p>For developers who want to integrate carbon into processing engines like Spark, Hive or Flink, use the APIs provided by carbon-hadoop and carbon-processing:</p>
+<ul>
+<li>
+<p><strong>Query</strong>: integrate carbon-hadoop with engine-specific APIs, like the Spark data source API.</p>
+</li>
+<li>
+<p><strong>Data life cycle management</strong>: carbon provides utility functions in carbon-processing to manage the data life cycle, such as data loading, compaction, retention, and schema evolution. Developers can implement DDLs of their choice and leverage these utility functions to do data life cycle management.</p>
+</li>
+</ul>
+</li>
+</ol>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/6b9f3f2c/src/main/webapp/documents/overviewdashboardpages.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/documents/overviewdashboardpages.html 
b/src/main/webapp/documents/overviewdashboardpages.html
new file mode 100644
index 0000000..c38640b
--- /dev/null
+++ b/src/main/webapp/documents/overviewdashboardpages.html
@@ -0,0 +1,282 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <title>CarbonData</title>
+
+    <!-- Bootstrap -->
+       
+    <link rel="stylesheet" href="css/bootstrap.min.css">
+    <link href="css/style.css" rel="stylesheet">       
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media 
queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script 
src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js";></script>
+      <script 
src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js";></script>
+    <![endif]-->
+  </head>
+  <body>
+    <header>
+     <nav class="navbar navbar-default navbar-custom cd-navbar-wrapper" >
+      <div class="container">
+        <div class="navbar-header">
+          <button aria-controls="navbar" aria-expanded="false" 
data-target="#navbar" data-toggle="collapse" class="navbar-toggle collapsed" 
type="button">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a href="index.html" class="logo">
+             <img src="images/CarbonDataLogo.png" alt="CarbonData logo" 
title="CarbocnData logo"  />      
+          </a>
+        </div>
+        <div class="navbar-collapse collapse cd_navcontnt" id="navbar">        
 
+          <ul class="nav navbar-nav navbar-right navlist-custom">
+              <li><a href="index.html"><i class="fa fa-home" 
aria-hidden="true"></i> </a></li>
+              <li><a href="#">Download  </a></li>
+              <li><a href="#">OverView </a></li>
+              <li><a href="dashboard.html" class="active" 
target="blank">Documents </a></li>
+              <li><a href="#">Community </a></li>
+              <li><a href="#" class="apache_link">apache</a>
+           </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+     </header> <!-- end Header part -->
+   
+   <div class="fixed-padding"></div> <!--  top padding with fixde header  -->
+ 
+   <section><!-- Dashboard nav -->
+    <div class="container-fluid">
+      <div class="row">
+        <div class="col-sm-3 col-md-2 sidebar">
+          <ul class="nav nav-sidebar">
+            <li class="active"><a href="#">Overview <span 
class="sr-only">(current)</span></a></li>
+            <li><a href="#">Contributing to CarbonData</a></li>
+            <li><a href="#">Quick start</a></li>
+            <li><a href="#">User Guide</a></li>
+            <li><a href="#">Using CarbonData</a></li>
+            <li><a href="#">FAQ</a></li>
+          </ul>        
+        </div>
+        <div class="col-sm-9 col-sm-offset-3 col-md-10 col-md-offset-2 
maindashboard">
+           <div class="row placeholders">            
+            <section>
+              <div style="padding:40px;">
+
+                <p>This tutorial provides a detailed overview of:</p>
+<ul>
+<li>CarbonData,</li>
+<li>Working and File Format</li>
+<li>Features</li>
+<li>Supported Data Types</li>
+<li>Compatibility</li>
+<li>Packaging and Interfaces.</li>
+</ul>
+<h2><a id="Introduction_30"></a>Introduction</h2>
+<p>CarbonData is a fully indexed, columnar, Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. CarbonData enables faster interactive queries by using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.</p>
+<p>In customer benchmarks, CarbonData has proven to manage petabytes of data running on extraordinarily low-cost hardware and to answer queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data stores).</p>
+<p>Some of the salient features of CarbonData are:</p>
+<ul>
+<li>Low latency for various types of data access patterns like sequential, random and OLAP.</li>
+<li>Allows fast query on fast data.</li>
+<li>Ensures Space Efficiency.</li>
+<li>General format available on Hadoop-ecosystem.</li>
+</ul>
+<h2><a id="CarbonData_File_Structure_42"></a>CarbonData File Structure</h2>
+<p>A CarbonData file contains groups of data called blocklets, along with all required information such as schema, offsets and indices in a file footer, co-located in HDFS.</p>
+<p>The file footer can be read once to build the indices in memory, which can 
be utilized for optimizing the scans and processing for all subsequent 
queries.</p>
+<p>Each blocklet in the file is further divided into chunks of data called 
Data Chunks. Each data chunk is organized either in columnar format or row 
format, and stores the data of either a single column or a set of columns. All 
blocklets in one file contain the same number and type of Data Chunks.</p>
+<p><img 
src="../docs/images/format/carbon_data_file_structure_new.png?raw=true" 
alt="Carbon File Structure"></p>
+<p>Each Data Chunk contains multiple groups of data called Pages. There are three types of pages:</p>
+<ul>
+<li>Data Page: Contains the encoded data of a column/group of columns.</li>
+<li>Row ID Page (optional): Contains the row id mappings used when the Data Page is stored as an inverted index.</li>
+<li>RLE Page (optional): Contains additional metadata used when the Data Page is RLE coded.</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_format_new.png?raw=true" 
alt="Carbon File Format"></p>
+<h2><a id="Features_59"></a>Features</h2>
+<p>The CarbonData file format is a columnar store in HDFS. It has many features that a modern columnar format has, such as splittability, compression schemes and complex data types, and it also has the following unique features:</p>
+<ul>
+<li>Stores data along with index: this can significantly accelerate query performance and reduce I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices; a processing framework can leverage this index to reduce the number of tasks it needs to schedule and process, and it can also perform skip scans at a finer-grained unit (called a blocklet) during task-side scanning, instead of scanning the whole file.</li>
+<li>Operable encoded data: by supporting efficient compression and global encoding schemes, CarbonData can query directly on compressed/encoded data; the data is converted only just before returning the results to the user, which is “late materialization”.</li>
+<li>Column group: allows multiple columns to form a column group that is stored in row format. This reduces the row reconstruction cost at query time.</li>
+<li>Supports various use cases with one single data format: interactive OLAP-style queries, sequential access (big scans), and random access (narrow scans).</li>
+</ul>
+<h2><a id="Data_Types_68"></a>Data Types</h2>
+<p>The following types are supported :</p>
+<ul>
+<li>
+<p>Numeric Types</p>
+<ul>
+<li>SMALLINT</li>
+<li>INT/INTEGER</li>
+<li>BIGINT</li>
+<li>DOUBLE</li>
+<li>DECIMAL</li>
+</ul>
+</li>
+<li>
+<p>Date/Time Types</p>
+<ul>
+<li>TIMESTAMP</li>
+</ul>
+</li>
+<li>
+<p>String Types</p>
+<ul>
+<li>STRING</li>
+</ul>
+</li>
+<li>
+<p>Complex Types</p>
+<ul>
+<li>arrays: ARRAY&lt;data_type&gt;</li>
+<li>structs: STRUCT&lt;col_name : data_type [COMMENT col_comment], …&gt;</li>
+</ul>
+</li>
+</ul>
+<h2><a id="Compatibility_89"></a>Compatibility</h2>
+<h2><a id="Packaging_and_Interfaces_92"></a>Packaging and Interfaces</h2>
+<h3><a id="Packaging_94"></a>Packaging</h3>
+<p>Carbon provides the following JAR packages:</p>
+<p><img 
src="https://cloud.githubusercontent.com/assets/6500698/14255195/831c6e90-fac5-11e5-87ab-3b16d84918fb.png";
 alt="carbon modules2"></p>
+<ul>
+<li>
+<p><strong>carbon-store.jar or carbondata-assembly.jar:</strong> This is the main jar for the carbon project; its target audience includes both users and developers.<br>
+- For MapReduce application users, this jar provides APIs to read and write carbon files through CarbonInput/OutputFormat in the carbon-hadoop module.<br>
+- For developers, this jar can be used to integrate carbon with processing engines like Spark and Hive, by leveraging the APIs in the carbon-processing module.</p>
+</li>
+<li>
+<p><strong>carbon-spark.jar (currently part of the assembly jar):</strong> provides support for Spark users; they can manipulate carbon data files using the native Spark DataFrame/SQL interfaces. Apart from this, in order to leverage carbon’s built-in lifecycle management functions, higher-level concepts such as Managed Carbon Table, Database and the corresponding DDL are introduced.</p>
+</li>
+<li>
+<p><strong>carbon-hive.jar (not yet provided):</strong> similar to carbon-spark, providing integration between carbon and Hive.</p>
+</li>
+</ul>
+<h3><a id="Interfaces_107"></a>Interfaces</h3>
+<h4><a id="API_109"></a>API</h4>
+<p>Carbon can be used in the following scenarios:</p>
+<ol>
+<li>
+<p>For MapReduce application users<br>
+This user API is provided by carbon-hadoop. In this scenario, users can process carbon files in their MapReduce applications by choosing CarbonInput/OutputFormat, and are responsible for using it correctly. Currently only CarbonInputFormat is provided; an OutputFormat will be provided soon.</p>
+</li>
+<li>
+<p>For Spark users<br>
+This user API is provided by Spark itself. There are two levels of APIs:</p>
+<ul>
+<li>
+<p><strong>Carbon File</strong></p>
+<p>Similar to Parquet, JSON, or other data sources in Spark, carbon can be used with the data source API. For example (please refer to DataFrameAPIExample for more detail):</p>
+<pre><code>// User can create a DataFrame from any data source or 
transformation.
+val df = ...
+
+// Write data
+// User can write a DataFrame to a carbon file
+df.write
+.format(&quot;carbondata&quot;)
+.option(&quot;tableName&quot;, &quot;carbontable&quot;)
+.mode(SaveMode.Overwrite)
+.save()
+
+
+// read carbon data by data source API
+df = carbonContext.read
+.format(&quot;carbondata&quot;)
+.option(&quot;tableName&quot;, &quot;carbontable&quot;)
+.load(&quot;/path&quot;)
+
+// User can then use DataFrame for analysis
+df.count
+SVMWithSGD.train(df, numIterations)
+
+// User can also register the DataFrame with a table name, and use SQL for 
analysis
+df.registerTempTable(&quot;t1&quot;)  // register temporary table in SparkSQL 
catalog
+df.registerHiveTable(&quot;t2&quot;)  // Or, use an implicit function to register it in the Hive metastore
+sqlContext.sql(&quot;select count(*) from t1&quot;).show
+</code></pre>
+</li>
+<li>
+<p><strong>Managed Carbon Table</strong></p>
+<p>Carbon has built-in support for high-level concepts like Table and Database, and supports full data lifecycle management. Instead of dealing with just files, users can use carbon-specific DDL to manipulate data at the Table and Database level. Please refer to <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DDL">DDL</a> and <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DML">DML</a>.</p>
+<pre><code>// Use SQL to manage table and query data
+create database db1;
+use database db1;
+show databases;
+create table tbl1 using org.apache.carbondata.spark;
+load data into table tbl1 path 'some_files';
+select count(*) from tbl1;
+</code></pre>
+</li>
+</ul>
+</li>
+<li>
+<p>For developers who want to integrate carbon into processing engines like Spark, Hive or Flink, use the APIs provided by carbon-hadoop and carbon-processing:</p>
+<ul>
+<li>
+<p><strong>Query</strong>: integrate carbon-hadoop with engine-specific APIs, like the Spark data source API.</p>
+</li>
+<li>
+<p><strong>Data life cycle management</strong>: carbon provides utility functions in carbon-processing to manage the data life cycle, such as data loading, compaction, retention, and schema evolution. Developers can implement DDLs of their choice and leverage these utility functions to do data life cycle management.</p>
+</li>
+</ul>
+</li>
+</ol>
+
+              </div>
+            </section>
+            <footer>
+    <div class="topcontant">
+      <div class="container-fluid">
+          <div class="col-md-4 col-sm-4">
+            <p class="footext">
+              Apache CarbonData, CarbonData, Apache, the Apache feather logo, 
and the Apache CarbonData project logo are trademarks of The Apache Software 
Foundation
+            </p>
+ 
+          </div>
+          <div class="col-md-8 col-sm-8">
+             <ul class="footer-nav">
+              <li><a href="">Site Map</a></li>
+              <li><a href="">Service</a></li>
+              <li><a href="">Contact us</a></li>
+             </ul>
+          </div>
+       </div>
+    </div>
+    <div class="bottomcontant">
+       <div class="container-fluid">
+          <div class="col-md-8 col-sm-8">
+            <p class="copyright-txt">Copyright © 2016. All rights reserved  
&nbsp;&nbsp;|&nbsp;&nbsp;
+              <a href="#" class="term-links">Apache Software Foundation  
</a>&nbsp;&nbsp;| &nbsp;&nbsp; <a href="#" class="term-links"> Privacy Policy 
</a>
+            </p>
+
+          </div>
+          <div class="col-md-4 col-sm-4">
+                 <div class="social-icon">
+                  <a href="#" class="icons"><i class="fa fa-facebook" 
aria-hidden="true"></i></a>
+                  <a href="#" class="icons"><i class="fa fa-twitter" 
aria-hidden="true"></i></a>
+                  <a href="#" class="icons"><i class="fa fa-linkedin" 
aria-hidden="true"></i></a>
+                 </div>
+          </div>
+    </div>
+     </div>
+
+  </footer>
+          </div>           
+
+        </div>
+      </div>
+    </div>
+   </section><!-- End systemblock part -->
+
+  <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
+
+    <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js";></script>
+    <!-- Include all compiled plugins (below), or include individual files as 
needed -->
+    <script src="js/bootstrap.min.js"></script>
+  </body>
+  </html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/6b9f3f2c/src/main/webapp/documents/quickstart.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/documents/quickstart.html 
b/src/main/webapp/documents/quickstart.html
new file mode 100644
index 0000000..2d2ad93
--- /dev/null
+++ b/src/main/webapp/documents/quickstart.html
@@ -0,0 +1,120 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled 
Document.md</title><style>
+p img{ max-width:100%}
+pre {
+    width: 80% !important;
+    white-space: normal;
+}
+</style>
+</head><body id="preview">
+<!--<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>-->
+<p><img src="../docs/images/format/CarbonData_logo.png?raw=true" 
alt="CarbonData_Logo"></p>
+<h1><a id="Quick_Start_20"></a>Quick Start</h1>
+<p>This tutorial provides a quick introduction to using CarbonData.</p>
+<h2><a id="Getting_started_with_Apache_CarbonData_23"></a>Getting started with 
Apache CarbonData</h2>
+<ul>
+<li><a href="#installation">Installation</a></li>
+<li><a href="#InteractiveAnalysis-with-Carbon-Spark-Shell">Interactive 
Analysis with Carbon-Spark Shell</a>
+<ul>
+<li><a href="#basics">Basics</a></li>
+<li><a href="#executing-queries">Executing Queries</a>
+<ul>
+<li><a href="#prerequisites">Prerequisites</a></li>
+<li><a href="#create-table">Create Table</a></li>
+<li><a href="#load-data-to-table">Load data to Table</a></li>
+<li><a href="#query-data-from-table">Query data from table</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#carbon-sql-cli">Carbon SQL CLI</a>
+<ul>
+<li><a href="#basics">Basics</a></li>
+<li><a href="#execute-queries-in-cli">Execute Queries in CLI</a></li>
+</ul>
+</li>
+<li><a href="">Building CarbonData</a></li>
+</ul>
+<h2><a id="Installation_39"></a>Installation</h2>
+<ul>
+<li>Download released package of <a 
href="http://spark.apache.org/downloads.html";>Spark 1.5.0 to 1.6.2</a></li>
+<li>Download and install <a 
href="http://thrift-tutorial.readthedocs.io/en/latest/installation.html";>Apache 
Thrift 0.9.3</a>, make sure thrift is added to system path.</li>
+<li>Download <a href="https://github.com/apache/incubator-carbondata";>Apache 
CarbonData code</a> and build it. Please visit <a 
href="Installing-CarbonData-And-IDE-Configuartion.md">Building CarbonData And 
IDE Configuration</a> for more information.</li>
+</ul>
+<h2><a id="Interactive_Analysis_with_CarbonSpark_Shell_44"></a>Interactive 
Analysis with Carbon-Spark Shell</h2>
+<p>The Carbon Spark shell is a wrapper around the Apache Spark shell. It provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more details on the Spark shell.</p>
+<h4><a id="Basics_47"></a>Basics</h4>
+<p>Start Spark shell by running the following in the Carbon directory:</p>
+<pre><code>./bin/carbon-spark-shell
+</code></pre>
+<p><em>Note</em>: In this shell SparkContext is readily available as sc and 
CarbonContext is available as cc.</p>
+<p>CarbonData stores and writes the data in its specified format at the default location on HDFS.<br>
+By default, carbon.storelocation is set as:</p>
+<pre><code>hdfs://IP:PORT/Opt/CarbonStore
+</code></pre>
+<p>You can provide your own store location by passing configuration via the --conf option, for example:</p>
+<pre><code>./bin/carbon-spark-shell --conf spark.carbon.storepath=&lt;storelocation&gt;
+</code></pre>
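+<p>For instance, a hypothetical invocation with an explicit HDFS store path (reusing the default path shown above) could be:</p>
+<pre><code>./bin/carbon-spark-shell --conf spark.carbon.storepath=hdfs://IP:PORT/Opt/CarbonStore
+</code></pre>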
+<h4><a id="Executing_Queries_64"></a>Executing Queries</h4>
+<p><strong>Prerequisites</strong></p>
+<p>Create a sample.csv file in the CarbonData directory. The CSV file is required for loading data into Carbon.</p>
+<pre><code>$ cd carbondata
+$ cat &gt; sample.csv &lt;&lt; EOF
+  id,name,city,age
+  1,david,shenzhen,31
+  2,eason,shenzhen,27
+  3,jarry,wuhan,35
+  EOF
+</code></pre>
+<p><strong>Create table</strong></p>
+<pre><code>scala&gt;cc.sql(&quot;create table if not exists test_table (id 
string, name string, city string, age Int) STORED BY 'carbondata'&quot;)
+</code></pre>
+<p><strong>Load data to table</strong></p>
+<pre><code>scala&gt;val dataFilePath = new 
File(&quot;../carbondata/sample.csv&quot;).getCanonicalPath
+scala&gt;cc.sql(s&quot;load data inpath '$dataFilePath' into table 
test_table&quot;)
+</code></pre>
+<p><strong>Query data from table</strong></p>
+<pre><code>scala&gt;cc.sql(&quot;select * from test_table&quot;).show
+scala&gt;cc.sql(&quot;select city, avg(age), sum(age) from test_table group by 
city&quot;).show
+</code></pre>
+<h2><a id="Carbon_SQL_CLI_98"></a>Carbon SQL CLI</h2>
+<p>The Carbon Spark SQL CLI is a wrapper around the Apache Spark SQL CLI. It is a convenient tool to execute queries from the command line. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more information on the Spark SQL CLI.</p>
+<h4><a id="Basics_101"></a>Basics</h4>
+<p>To start the Carbon Spark SQL CLI, run the following in the Carbon directory:</p>
+<pre><code>./bin/carbon-spark-sql
+</code></pre>
+<p>CarbonData stores and writes the data in its specified format at the default location on HDFS.<br>
+By default, carbon.storelocation is set as:</p>
+<pre><code>hdfs://IP:PORT/Opt/CarbonStore
+</code></pre>
+<p>You can provide your own store location by passing configuration via the --conf option, for example:</p>
+<pre><code>./bin/carbon-spark-sql --conf 
spark.carbon.storepath=/home/root/carbonstore
+</code></pre>
+<h4><a id="Execute_Queries_in_CLI_118"></a>Execute Queries in CLI</h4>
+<pre><code>spark-sql&gt; create table if not exists test_table (id string, 
name string, city string, age Int) STORED BY 'carbondata'
+spark-sql&gt; load data inpath '../sample.csv' into table test_table
+spark-sql&gt; select city, avg(age), sum(age) from test_table group by city
+</code></pre>
+<h2><a id="Building_CarbonData_124"></a>Building CarbonData</h2>
+<p>To get started, get CarbonData from the <a href="">downloads</a> on the <a 
href="http://carbondata.incubator.apache.org.";>http://carbondata.incubator.apache.org.</a><br>
+CarbonData uses Hadoop’s client libraries for HDFS and YARN and Spark’s 
libraries. Downloads are pre-packaged for a handful of popular Spark 
versions.</p>
+<p>If you’d like to build CarbonData from source, please visit <a href="">Building CarbonData And IDE Configuration</a>.</p>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/6b9f3f2c/src/main/webapp/documents/troubleshooting.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/documents/troubleshooting.html 
b/src/main/webapp/documents/troubleshooting.html
new file mode 100644
index 0000000..8d71dfa
--- /dev/null
+++ b/src/main/webapp/documents/troubleshooting.html
@@ -0,0 +1,42 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled 
Document.md</title><style></style></head><body id="preview">
+<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>
+<h1><a id="Troubleshooting_19"></a>Troubleshooting</h1>
+<p>This tutorial is designed to provide troubleshooting for end users and 
developers<br>
+who are building, deploying, and using CarbonData.</p>
+<ul>
+<li>
+<h2><a id="Prerequisites_for_Developers_23"></a>Prerequisites for 
Developers</h2>
+</li>
+<li>
+<h2><a id="Prerequisites_for_End_Users_25"></a>Prerequisites for End Users</h2>
+</li>
+<li>
+<h2><a id="General_Prevention_and_Best_Practices_27"></a>General Prevention 
and Best Practices</h2>
+</li>
+<li>
+<h2><a id="Procedures_29"></a>Procedure(s)</h2>
+</li>
+<li>
+<h2><a id="References_31"></a>References</h2>
+</li>
+</ul>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/6b9f3f2c/src/main/webapp/documents/usecases.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/documents/usecases.html 
b/src/main/webapp/documents/usecases.html
new file mode 100644
index 0000000..d4bac49
--- /dev/null
+++ b/src/main/webapp/documents/usecases.html
@@ -0,0 +1,93 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled 
Document.md</title><style></style></head><body id="preview">
+<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>
+<h1><a id="CarbonData_Use_Cases_19"></a>CarbonData Use Cases</h1>
+<p>This tutorial discusses the problems that CarbonData addresses. It shall take you through the identified top use cases of Carbon.</p>
+<h2><a id="Introduction_22"></a>Introduction</h2>
+<p>For big data interactive analysis scenarios, many customers expect 
sub-second response to query TB-PB level data on general hardware clusters with 
just a few nodes.</p>
+<p>In the current big data ecosystem, there are a few columnar storage formats, such as ORC and Parquet, that are designed for SQL on big data. Apache Hive’s ORC format is a columnar storage format with basic indexing capability. However, ORC cannot meet the sub-second query response expectation on TB-level data, because the ORC format performs only stride-level dictionary encoding, and all analytical operations such as filtering and aggregation are done on the actual data. Apache Parquet is a columnar storage format that can improve performance in comparison to ORC because of its more efficient storage organization. Though Parquet can provide query responses on TB-level data in a few seconds, it is still far from the sub-second expectation of interactive analysis users. Cloudera Kudu can effectively solve some query performance issues, but Kudu is not Hadoop native and cannot seamlessly integrate historic HDFS data into a new Kudu system.</p>
+<p>CarbonData, however, uses specially engineered optimizations targeted at improving the performance of analytical queries that can include filters, aggregations and distinct counts. Because the required data is stored in an indexed, well organized, read-optimized format, CarbonData’s query performance can achieve sub-second response times.</p>
+<h2><a 
id="Motivation_Single_Format_to_provide_low_latency_response_for_all_use_cases_35"></a>Motivation:
 Single Format to provide low latency response for all use cases</h2>
+<p>The main motivation behind CarbonData is to provide a single storage format for all the use cases of querying big data on Hadoop. Thus CarbonData is able to cover all use cases within a single storage format.</p>
+<p><img src="../docs/images/format/carbon_data_motivation.png?raw=true" 
alt="Motivation"></p>
+<h2><a id="Use_Cases_41"></a>Use Cases</h2>
+<ul>
+<li>
+<h3><a id="Sequential_Access_42"></a>Sequential Access</h3>
+<ul>
+<li>Supports queries that select only a few columns with a group by clause but do not contain any filters. This results in a full scan over the complete store for the selected columns.</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_full_scan.png?raw=true" 
alt="Sequential_Scan"></p>
+<p><strong>Scenario</strong></p>
+<ul>
+<li>ETL jobs</li>
+<li>Log Analysis</li>
+</ul>
+</li>
+<li>
+<h3><a id="Random_Access_53"></a>Random Access</h3>
+<ul>
+<li>Supports Point Query. These are queries used from operational applications 
and usually select all or most of the columns but do involve a large number 
of<br>
+filters which reduce the result to a small size. Such queries generally do not 
involve any aggregation or group by clause.
+<ul>
+<li>Row-key query(like HBase)</li>
+<li>Narrow Scan</li>
+<li>Requires second/sub-second level low latency</li>
+</ul>
+</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_random_scan.png?raw=true" 
alt="random_access"></p>
+<p><strong>Scenario</strong></p>
+<ul>
+<li>Operational Query</li>
+<li>User Profiling</li>
+</ul>
+</li>
+<li>
+<h3><a id="Olap_Style_Query_67"></a>Olap Style Query</h3>
+<ul>
+<li>Supports Interactive data analysis for any dimensions. These are queries 
which are typically fired from Interactive Analysis tools.<br>
+Such queries often select a few columns but involve filters and group by on a 
column or a grouping expression.<br>
+It also supports queries that :
+<ul>
+<li>involves aggregation/join</li>
+<li>Roll-up,Drill-down,Slicing and Dicing</li>
+<li>Low-latency ad-hoc query</li>
+</ul>
+</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_olap_scan.png?raw=true" 
alt="Olap_style_query"></p>
+<p><strong>Scenario</strong></p>
+<ul>
+<li>Dash-board reporting</li>
+<li>Fraud &amp; Ad-hoc Analysis</li>
+</ul>
+</li>
+</ul>
+
+</body></html>
