[HOTFIX] Correct CI URL and add standard partition usage. This closes #1889
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/24ba2fe2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/24ba2fe2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/24ba2fe2

Branch: refs/heads/fgdatamap
Commit: 24ba2fe2226f9168dcde6c216948f8656488293d
Parents: 8a86d3f
Author: chenliang613 <chenliang...@huawei.com>
Authored: Tue Jan 30 22:35:02 2018 +0800
Committer: Jacky Li <jacky.li...@qq.com>
Committed: Wed Jan 31 19:18:26 2018 +0800

----------------------------------------------------------------------
 README.md                                          | 12 +++----
 docs/data-management-on-carbondata.md              | 38 ++++++++++++++++++--
 .../examples/StandardPartitionExample.scala        |  7 ++--
 .../preaggregate/TestPreAggCreateCommand.scala     | 17 +++++++++
 4 files changed, 61 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 15dba93..3b6792e 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@
 <img src="/docs/images/CarbonData_logo.png" width="200" height="40">
 
-Apache CarbonData is an indexed columnar data format for fast analytics on big data platform, e.g.Apache Hadoop, Apache Spark, etc.
+Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.
 
 You can find the latest CarbonData document and learn more at:
 [http://carbondata.apache.org](http://carbondata.apache.org/)
@@ -25,14 +25,9 @@ You can find the latest CarbonData document and learn more at:
 [CarbonData cwiki](https://cwiki.apache.org/confluence/display/CARBONDATA/)
 
 ## Status
-Spark2.1:
-[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.1)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.1/badge/icon)
+Spark2.2:
+[![Build Status](https://builds.apache.org/buildStatus/icon?job=carbondata-master-spark-2.2)](https://builds.apache.org/view/A-D/view/CarbonData/job/carbondata-master-spark-2.2/lastBuild/testReport)
 [![Coverage Status](https://coveralls.io/repos/github/apache/carbondata/badge.svg?branch=master)](https://coveralls.io/github/apache/carbondata?branch=master)
-## Features
-CarbonData file format is a columnar store in HDFS, it has many features that a modern columnar format has, such as splittable, compression schema ,complex data type etc, and CarbonData has following unique features:
-* Stores data along with index: it can significantly accelerate query performance and reduces the I/O scans and CPU resources, where there are filters in the query. CarbonData index consists of multiple level of indices, a processing framework can leverage this index to reduce the task it needs to schedule and process, and it can also do skip scan in more finer grain unit (called blocklet) in task side scanning instead of scanning the whole file.
-* Operable encoded data :Through supporting efficient compression and global encoding schemes, can query on compressed/encoded data, the data can be converted just before returning the results to the users, which is "late materialized".
-* Supports for various use cases with one single Data format : like interactive OLAP-style query, Sequential Access (big scan), Random Access (narrow scan).
 
 ## Building CarbonData
 CarbonData is built using Apache Maven, to [build CarbonData](https://github.com/apache/carbondata/blob/master/build)
@@ -50,6 +45,7 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com
 
 ## Other Technical Material
 [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
+[Use Case Articles](https://cwiki.apache.org/confluence/display/CARBONDATA/CarbonData+Articles)
 
 ## Fork and Contribute
 This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it.


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 3af95ac..d7954e1 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -567,9 +567,43 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   ALTER TABLE table_name COMPACT 'MAJOR'
   ```
 
-## PARTITION
+  - **CLEAN SEGMENTS AFTER Compaction**
+
+    Clean the segments which are compacted:
+    ```
+    CLEAN FILES FOR TABLE carbon_table
+    ```
+
+## STANDARD PARTITION
+
+  The standard partition is the same as in Spark; the command to create a partitioned table is as below:
+
+  ```
+  CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+    [(col_name data_type , ...)]
+  PARTITIONED BY (partition_col_name data_type)
+  STORED BY 'carbondata'
+  [TBLPROPERTIES (property_name=property_value, ...)]
+  ```
+
+  Example:
+  ```
+  CREATE TABLE partitiontable0
+  (id Int,
+  vin String,
+  phonenumber Long,
+  area String,
+  salary Int)
+  PARTITIONED BY (country String)
+  STORED BY 'org.apache.carbondata.format'
+  TBLPROPERTIES('SORT_COLUMNS'='id,vin')
+  ```
+
+
+## CARBONDATA PARTITION (HASH, RANGE, LIST)
 
-  Similar to other system's partition features, CarbonData's partition feature also can be used to improve query performance by filtering on the partition column.
+  The partition supports three types (Hash, Range, List); similar to other systems' partition features, CarbonData's partition feature can be used to improve query performance by filtering on the partition column.
 
 ### Create Hash Partition Table


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
index 5a8e3f5..1126ecc 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/StandardPartitionExample.scala
@@ -47,6 +47,7 @@ object StandardPartitionExample {
          | salary Int)
          | PARTITIONED BY (country String)
          | STORED BY 'org.apache.carbondata.format'
+         | TBLPROPERTIES('SORT_COLUMNS'='id,vin')
        """.stripMargin)
 
     spark.sql(s"""
@@ -55,7 +56,7 @@ object StandardPartitionExample {
 
     spark.sql(
       s"""
-         | SELECT *
+         | SELECT country,id,vin,phonenumber,area,salary
          | FROM partitiontable0
        """.stripMargin).show()
 
@@ -65,8 +66,8 @@ object StandardPartitionExample {
     import scala.util.Random
     import spark.implicits._
     val r = new Random()
-    val df = spark.sparkContext.parallelize(1 to 10 * 1000 * 1000)
-      .map(x => ("No." + r.nextInt(100000), "country" + x % 8, "city" + x % 50, x % 300))
+    val df = spark.sparkContext.parallelize(1 to 10 * 1000 * 10)
+      .map(x => ("No." + r.nextInt(1000), "country" + x % 8, "city" + x % 50, x % 300))
       .toDF("ID", "country", "city", "population")
 
     // Create table without partition


http://git-wip-us.apache.org/repos/asf/carbondata/blob/24ba2fe2/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
index 303abf4..23132de 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 package org.apache.carbondata.integration.spark.testsuite.preaggregate
 
 import scala.collection.JavaConverters._
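
The new STANDARD PARTITION doc section states the partition behaves as in Spark, so the table from the doc example can be exercised with Spark's ordinary partition DML. The sketch below is illustrative only — it assumes the `partitiontable0` example above, Spark's standard partition syntax, and made-up literal values:

```
-- Sketch (not part of this commit): insert into a static partition;
-- the partition column value comes from the PARTITION clause
INSERT INTO partitiontable0 PARTITION (country='india')
  SELECT 1, 'A42158424831', 125479776598, 'Asia', 10000;

-- List the partitions materialized so far
SHOW PARTITIONS partitiontable0;

-- A filter on the partition column lets the engine prune
-- non-matching partitions instead of scanning the whole table
SELECT id, vin, salary FROM partitiontable0 WHERE country = 'india';
```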