[ANNOUNCE] Apache CarbonData 1.5.3 release

2019-04-09 Thread Raghunandan S
Hi All,

The Apache CarbonData community is pleased to announce the release of
version 1.5.3 at The Apache Software Foundation (ASF).

CarbonData is a high-performance data solution that supports various data
analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter
lookups on detail records, streaming analytics, and so on. CarbonData has
been deployed in many enterprise production environments. In one of the
largest deployments, it supports queries on a single table with 3PB of data
(more than 5 trillion records) with a response time of less than 3 seconds!

We encourage you to use the release at
https://dist.apache.org/repos/dist/release/carbondata/1.5.3/ and to send
feedback through the CarbonData user mailing lists!

This release note provides information on the new features, improvements,
and bug fixes of this release.

What’s New in CarbonData Version 1.5.3?

The intention of CarbonData 1.5.3 was to move closer to unified analytics.
DDL can now operate on the LRU cache, so users can manage the cache
according to their requirements. We have also upgraded the Presto
integration to a newer Presto version. More importantly, we have further
improved CarbonData performance.

In this version of CarbonData, more than 20 JIRA tickets related to new
features, improvements, and bugs have been resolved. The following is a
summary.

CarbonData Core

DDL Support on CarbonData LRU Cache

Previously, although users could set the cache size, the functionality was
limited because they had no clear picture of how much cache to configure
for their requirements.

From this version, we support DDL on the CarbonData LRU cache, which allows
you to do the following operations:

   - Show the current cache used by each table.
   - Show the current cache used by a specific table.
   - Clear the cache for a specific table.
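
As a rough illustration of the operations listed above, the sketch below
builds the corresponding statements as plain strings. The
SHOW METACACHE ON TABLE form is quoted in the bug list of this release; the
other two forms, the helper names (CacheDdl, showAll, showForTable,
dropForTable), and the table name in the usage comment are assumptions made
for illustration, so please check the DDL documentation for the exact syntax.

```scala
// Hypothetical helper, not part of CarbonData: builds cache-related DDL strings.
object CacheDdl {
  // Cache used by every table (assumed syntax).
  val showAll: String = "SHOW METACACHE"

  // Cache used by one table; this form is quoted in the bug list of this release.
  def showForTable(table: String): String = s"SHOW METACACHE ON TABLE $table"

  // Clear the cache of one table (assumed syntax).
  def dropForTable(table: String): String = s"DROP METACACHE ON TABLE $table"
}

// Usage, assuming `carbon` is an existing CarbonSession and `my_table` exists:
// carbon.sql(CacheDdl.showForTable("my_table")).show()
```

Keeping the statements as strings means the sketch can be tried without a
running cluster; the commented line shows where a session would execute them.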

SDK Read Support for Different Schemas

This version allows the user to read two or more CarbonData files with
different schemas from a single location.

Performance Improvements

Improved Single/Concurrent Query Performance

When a table has a large number of segments, query performance degrades
because of a higher memory footprint, the cost of pruning, retrieval from
the unsafe DataMap store, and so on.

In this version, we have improved query performance through the following
modifications:

   - Reduced the memory footprint during queries.
   - Added multi-thread pruning for non-filter queries.
   - Updated the unsafe storage format of the driver cache for faster
   retrieval of data.

Improved Count(*) Query Performance

Previously, pruning for a count(*) query was the same as for a select *
query, which is very time-consuming because of the different processes
involved.

In this version, we have optimized the count(*) query performance by
reading blocklet row count directly from DataMapRow. This reduces query
time and improves the query performance.
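
The idea behind this optimization can be sketched as follows. This is only
an illustration under stated assumptions: BlockletIndexRow and CountStar are
hypothetical names, not CarbonData classes, and the real DataMapRow layout
is not shown here.

```scala
// Illustrative only: these types are NOT CarbonData classes.
// Each entry stands for one blocklet's index row, which already stores its row count.
final case class BlockletIndexRow(blockletId: Int, rowCount: Long)

object CountStar {
  // Answer count(*) from index metadata alone: sum the stored row counts
  // instead of going through the same pruning path as a select * query.
  def fromIndex(rows: Seq[BlockletIndexRow]): Long = rows.map(_.rowCount).sum
}
```

Because the row counts are already held in the index, the total is a simple
sum over the index rows and no data pages need to be read.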

Other Improvements

Presto Version Upgrade

CarbonData now integrates with Presto version 0.217.

Behavior Change

None


Please find the detailed JIRA list:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344322

Bug

   - [CARBONDATA-3202] - Updated schema is not updated in session catalog
   after add, drop or rename column.
   - [CARBONDATA-3223] - Datasize and Indexsize showing 0B for 1.1 store
   when show segments is done
   - [CARBONDATA-3284] - Workaround for Create-PreAgg Datamap Fail
   - [CARBONDATA-3287] - Remove the validation of same schema data files in
   location for external table and file format
   - [CARBONDATA-3298] - Logs are getting printed when clean files is
   executed for old stores
   - [CARBONDATA-3301] - Array column is giving null data in case of spark
   carbon file format
   - [CARBONDATA-3313] - count(*) is not invalidating the invalid segments
   cache
   - [CARBONDATA-3314] - Index Cache Size printed in SHOW METACACHE ON TABLE
   DDL is not accurate
   - [CARBONDATA-3315] - Range Filter query with two between clauses with an
   OR gives wrong results
   - [CARBONDATA-3320] - Number of partitions are always zero in describe
   formatted for hive native partition
   - [CARBONDATA-3322] - After renaming table, "SHOW METACACHE ON TABLE"
   still works for old table
   - [CARBONDATA-3323] - Output is null when cache is empty
   - [CARBONDATA-3328
   

Why metadata path didn't show up on my local disk

2019-04-09 Thread Kuai Yu
Hi Carbondata experts,

I'm new to both Spark and CarbonData.

I'm trying to leverage Carbondata to store some key-value pairs on HDFS. To
start with, I issued a few commands on Spark shell to help me better
understand the behavior.

Here is how I launched spark shell:
=
spark-shell --spark-version 2.3.0 spark.hive.support=true --driver-memory
2G --num-executors 50 --executor-cores 2 --executor-memory 2G --jars
apache-carbondata-1.5.2-bin-spark2.3.2-hadoop2.7.2.jar

Here is how I issued the commands:
===
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonDataFrameWriter
import org.apache.spark.sql.types._

// First argument: store location on HDFS; second argument: metastore path.
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
  "hdfs://:9000/user/kuyu/carbondata3", "/export/home/kuyu/wenye")

val schema = StructType(Array(
  StructField("keyCol", StringType, false),
  StructField("deltaCol", LongType, false),
  StructField("__opalSegmentId", IntegerType, false),
  StructField("__opalSegmentOffset", IntegerType, false)))

// Load the key-value pairs from CSV using the explicit schema.
val keyStoreDF = carbon.read.format("csv").option("header", "true").schema(schema).load(
  "hdfs://:9000/user/kuyu/keystore.csv")

// Write the DataFrame out as a CarbonData table.
val carbonDFWriter = new CarbonDataFrameWriter(carbon.sqlContext, keyStoreDF)
val options = Map("tableName" -> "wenye_xyz")
carbonDFWriter.saveAsCarbonFile(options)

What I found:

'Fact', 'LockFiles', and 'Metadata' are created under
hdfs://:9000/user/kuyu/carbondata3/wenye_xyz. However, I couldn't find
/export/home/kuyu/wenye created anywhere. I saw that Carbon uses Derby DB by
default, which should create /export/home/kuyu/wenye on the local disk. Is
my understanding correct?

Thanks,
KY


Re: [VOTE] Apache CarbonData 1.5.3(RC1) release

2019-04-09 Thread Raghunandan S
Hi all,

The PMC vote has passed for the Apache CarbonData 1.5.3 release. The result
is as below:

+1 (binding): 5 (Jacky, Kumar Vishal, Ravindra, David CaiQiang, Liang Chen)

+1 (non-binding): 1

Thanks all for your vote.

Regards

On Mon, Apr 8, 2019 at 6:31 PM Ravindra Pesala 
wrote:

> +1
>
> Regards,
> Ravindra.
>
> On Wed, 3 Apr 2019 at 1:23 PM, Raghunandan S <
> carbondatacontributi...@gmail.com> wrote:
>
> > Hi
> >
> > I submit the Apache CarbonData 1.5.3 (RC1) for your vote.
> >
> > 1. Release Notes:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344322
> >
> > Some key features and improvements in this release:
> >
> >    1. Supported DDL to operate on CarbonData LRU Cache
> >    2. Improved single, concurrent query performance.
> >    3. Count(*) query performance enhanced by optimising datamaps pruning
> >    4. Supported adding new columns through SDK
> >    5. Presto version upgraded to 0.217
> >
> > [Behavior Changes]
> >
> >    1. None
> >
> > 2. The tag to be voted upon: apache-carbondata-1.5.3-rc1 (commit:
> > 7f271d0aba272f9fbe9642a4900cd4da61eb43bb)
> > https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.3-rc1
> >
> > 3. The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/carbondata/1.5.3-rc1/
> >
> > 4. A staged Maven repository is available for review at:
> > https://repository.apache.org/content/repositories/orgapachecarbondata-1039/
> >
> > 5. Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/raghunandan.asc
> >
> > Please vote on releasing this package as Apache CarbonData 1.5.3. The vote
> > will be open for the next 72 hours and passes if a majority of
> > at least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache CarbonData 1.5.3
> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > Regards,
> > Raghunandan.
> >
> --
> Thanks & Regards,
> Ravi
>


Re: [Discussion] is it necessary to support SORT_COLUMNS modification

2019-04-09 Thread David CaiQiang
Please check the JIRA below, where you can find the design doc:
https://issues.apache.org/jira/browse/CARBONDATA-3347



-
Best Regards
David Cai