[HOTFIX] Correct CI URL and add standard partition usage. This closes #1889
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/24ba2fe2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/24ba2fe2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/24ba2fe2

Branch: refs/heads/fgdatamap
Commit: 24ba2fe2226f9168dcde6c216948f8656488293d
Parents: 8a86d3f
Author: chenliang613 <chenliang...@huawei.com>
Authored: Tue Jan 30 22:35:02 2018 +0800
Committer: Jacky Li <jacky.li...@qq.com>
Committed: Wed Jan 31 19:18:26 2018 +0800

----------------------------------------------------------------------
 README.md                                          | 12 +++----
 docs/data-management-on-carbondata.md              | 38 ++++++++++++++++++--
 .../examples/StandardPartitionExample.scala        |  7 ++--
 .../preaggregate/TestPreAggCreateCommand.scala     | 17 +++++++++
 4 files changed, 61 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 15dba93..3b6792e 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@
 <img src="/docs/images/CarbonData_logo.png" width="200" height="40">
 
-Apache CarbonData is an indexed columnar data format for fast analytics on big data platform, e.g.Apache Hadoop, Apache Spark, etc.
+Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.
 
 You can find the latest CarbonData document and learn more at:
 [http://carbondata.apache.org](http://carbondata.apache.org/)
@@ -25,14 +25,9 @@ You can find the latest CarbonData document and learn more at:
 [CarbonData cwiki](https://cwiki.apache.org/confluence/display/CARBONDATA/)
 
 ## Status
-Spark2.1:
-[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.1)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.1/badge/icon)
+Spark2.2:
+[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.2)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.2/lastBuild/testReport)
 [![Coverage Status](https://coveralls.io/repos/github/apache/carbondata/badge.svg?branch=master)](https://coveralls.io/github/apache/carbondata?branch=master)
-## Features
-CarbonData file format is a columnar store in HDFS, it has many features that a modern columnar format has, such as splittable, compression schema ,complex data type etc, and CarbonData has following unique features:
-* Stores data along with index: it can significantly accelerate query performance and reduces the I/O scans and CPU resources, where there are filters in the query. CarbonData index consists of multiple level of indices, a processing framework can leverage this index to reduce the task it needs to schedule and process, and it can also do skip scan in more finer grain unit (called blocklet) in task side scanning instead of scanning the whole file.
-* Operable encoded data :Through supporting efficient compression and global encoding schemes, can query on compressed/encoded data, the data can be converted just before returning the results to the users, which is "late materialized".
-* Supports for various use cases with one single Data format : like interactive OLAP-style query, Sequential Access (big scan), Random Access (narrow scan).
 
 ## Building CarbonData
 CarbonData is built using Apache Maven, to [build CarbonData](https://github.com/apache/carbondata/blob/master/build)
@@ -50,6 +45,7 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
 
 ## Other Technical Material
 [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
+[Use Case Articles](https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Articles)
 
 ## Fork and Contribute
 This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it.


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 3af95ac..d7954e1 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -567,9 +567,43 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   ALTER TABLE table_name COMPACT 'MAJOR'
   ```
 
-## PARTITION
+  - **CLEAN SEGMENTS AFTER Compaction**
+
+    Clean the segments which are compacted:
+    ```
+    CLEAN FILES FOR TABLE carbon_table
+    ```
+
+## STANDARD PARTITION
+
+  The standard partition is the same as in Spark; the command to create a partitioned table is as below:
+
+  ```
+  CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+    [(col_name data_type , ...)]
+  PARTITIONED BY (partition_col_name data_type)
+  STORED BY 'carbondata'
+  [TBLPROPERTIES (property_name=property_value, ...)]
+  ```
+
+  Example:
+  ```
+  CREATE TABLE partitiontable0
+  (id Int,
+  vin String,
+  phonenumber Long,
+  area String,
+  salary Int)
+  PARTITIONED BY (country String)
+  STORED BY 'org.apache.carbondata.format'
+  TBLPROPERTIES('SORT_COLUMNS'='id,vin')
+  ```
+
+
+## CARBONDATA PARTITION (HASH, RANGE, LIST)
 
-  Similar to other system's partition features, CarbonData's partition feature also can be used to improve query performance by filtering on the partition column.
+  The partition supports three types (Hash, Range, List); similar to other systems' partition features, CarbonData's partition feature can be used to improve query performance by filtering on the partition column.
 
 ### Create Hash Partition Table


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
index 5a8e3f5..1126ecc 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
@@ -47,6 +47,7 @@ object StandardPartitionExample {
          | salary Int)
          | PARTITIONED BY (country String)
          | STORED BY 'org.apache.carbondata.format'
+         | TBLPROPERTIES('SORT_COLUMNS'='id,vin')
        """.stripMargin)
 
     spark.sql(s"""
@@ -55,7 +56,7 @@ object StandardPartitionExample {
 
     spark.sql(
       s"""
-         | SELECT *
+         | SELECT country,id,vin,phonenumber,area,salary
          | FROM partitiontable0
        """.stripMargin).show()
 
@@ -65,8 +66,8 @@ object StandardPartitionExample {
     import scala.util.Random
     import spark.implicits._
     val r = new Random()
-    val df = spark.sparkContext.parallelize(1 to 10 * 1000 * 1000)
-      .map(x => ("No." + r.nextInt(100000), "country" + x % 8, "city" + x % 50, x % 300))
+    val df = spark.sparkContext.parallelize(1 to 10 * 1000 * 10)
+      .map(x => ("No." + r.nextInt(1000), "country" + x % 8, "city" + x % 50, x % 300))
       .toDF("ID", "country", "city", "population")
 
     // Create table without partition


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
index 303abf4..23132de 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 package org.apache.carbondata.integration.spark.testsuite.preaggregate
 
 import scala.collection.JavaConverters._
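
The new STANDARD PARTITION doc section states the partition behaves as in Spark, so the table from the doc example can be exercised with Spark's ordinary partition DML. The sketch below is illustrative only — it assumes the `partitiontable0` example above, Spark's standard partition syntax, and made-up literal values:

```
-- Sketch (not part of this commit): insert into a static partition;
-- the partition column value comes from the PARTITION clause
INSERT INTO partitiontable0 PARTITION (country='india')
  SELECT 1, 'A42158424831', 125479776598, 'Asia', 10000;

-- List the partitions materialized so far
SHOW PARTITIONS partitiontable0;

-- A filter on the partition column lets the engine prune
-- non-matching partitions instead of scanning the whole table
SELECT id, vin, salary FROM partitiontable0 WHERE country = 'india';
```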