Vinay created CARBONDATA-3240:
---------------------------------

             Summary: Performance Report CD vs parquet
                 Key: CARBONDATA-3240
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3240
             Project: CarbonData
          Issue Type: Bug
          Components: sql
    Affects Versions: 1.5.1
         Environment: 3-node cluster, 32GB each, 8 cores per machine. Installed 
Spark 2.3.2, Hadoop, and Hive with MySQL.
            Reporter: Vinay


Hi, 

With the performance report published on the site, it is exciting to use CarbonData in our projects. 

We ran the TPC-DS benchmark on 100GB of data for both Parquet and CarbonData, but the 
results are not up to the mark: on average CarbonData is slower than Parquet 
when we use getOrCreateCarbonSession. We used the following to create the sessions:

import org.apache.spark.sql.CarbonSession;
import org.apache.spark.sql.SparkSession;

// Plain Hive-enabled SparkSession (sparkConf is built earlier)
SparkSession spark = SparkSession.builder().config(sparkConf)
    .appName("WritetocarbonData").enableHiveSupport().getOrCreate();

// CarbonSession created from a separate builder, with the store path on EFS
SparkSession.Builder builder = SparkSession.builder().config(sparkConf)
    .appName("WritetocarbonData").master("local");
SparkSession carbon = new CarbonSession.CarbonBuilder(builder)
    .getOrCreateCarbonSession("/home/ec2-user/efs/mysql");
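
In case it helps reproduce the numbers, this is roughly how we time each query against 
the two sessions; the query text and table name below are only an illustration, not our 
actual TPC-DS harness:

// Illustrative timing: run the same SQL against the Carbon and Parquet tables.
// The query and the "store_sales" table name here are placeholders.
String query = "SELECT ss_item_sk, SUM(ss_net_paid) FROM store_sales "
    + "GROUP BY ss_item_sk ORDER BY ss_item_sk";

long start = System.currentTimeMillis();
carbon.sql(query).collectAsList();      // CarbonData table via the carbon session
long carbonMs = System.currentTimeMillis() - start;

start = System.currentTimeMillis();
spark.sql(query).collectAsList();       // Parquet table via the plain session
long parquetMs = System.currentTimeMillis() - start;

System.out.println("carbon=" + carbonMs + " ms, parquet=" + parquetMs + " ms");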

We don't see CarbonData performing better than Parquet at the query level, or any 
significant difference between the two.

I would like to know how you performed the benchmarking and obtained results better 
than Parquet.

The latest slides presented by Huawei at a conference in China showcased 
CarbonData as 10x to 20x faster. 

Can anyone share the detailed benchmarking steps and code?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
