Hi

1. Did you use the latest master version, or 1.0? I suggest you use master
for testing.
2. Have you tested other TPC-H queries that include a where/filter clause?
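
For example, a query like the sketch below (lineitem_4 and its columns are
taken from your mail below; the filter constants are only illustrative, in
the style of TPC-H Q6):

// hedged example: constants are illustrative, not from the original mail
val filtered = carbon.sql(
  """SELECT orderkey, partkey, linenumber
    |FROM lineitem_4
    |WHERE shipdate >= '1994-01-01'
    |  AND discount BETWEEN 0.05 AND 0.07""".stripMargin)
println(filtered.count())
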
3. In your case, is the query itself slow, or is the "write.format" step below slow?
write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1/")
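
To tell the two apart, one simple sketch (proj1 is the DataFrame from your
mail below; the output path here is only illustrative) is to force the scan
with an action before writing the CSV:

val t0 = System.nanoTime()
val rows = proj1.rdd.count()  // .rdd forces row materialization, so the
                              // projected columns are really read
println(s"scan only: ${(System.nanoTime() - t0) / 1e6} ms for $rows rows")

val t1 = System.nanoTime()
// illustrative output path, adjust as needed
proj1.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1_timing/")
println(s"csv write: ${(System.nanoTime() - t1) / 1e6} ms")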

4. Use the master version to run the query test, and set
"ENABLE_VECTOR_READER" to true:
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants
CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true")

The community is currently running TPC-H tests as well; would you like to
participate in the testing together?

Regards
Liang

2017-04-13 1:18 GMT+05:30 Rana Faisal Munir <fmu...@essi.upc.edu>:

> Dear all,
>
>
>
> I am running some experiments to benchmark the performance of both Parquet
> and CarbonData. I am using the TPC-H lineitem table, 8 GB in size. It has
> 16 columns, and I am running different projection queries that read
> different numbers of columns (3, 6, 9, 12, 15). I am facing a problem with
> CarbonData: it seems to be very slow when I select more than 8 columns.
> It takes almost hours to process my request, whereas Parquet is very
> quick. Could anybody please help me understand this behavior?
>
>
>
>   This is my cluster configuration:
>
>
>
> 3 Machines
>
> 1 Driver Machine (128 GB, 24 cores)
>
> 2 Worker Machines  (128GB, 24 cores)
>
>
>
> My configuration settings for Spark are:
>
>
>
> spark.executor.instances        12
>
> spark.executor.memory   18g
>
> spark.driver.memory     57g
>
> spark.executor.cores    3
>
> spark.driver.cores      5
>
> spark.default.parallelism       72
>
>
>
> carbon.sort.file.buffer.size=20
>
> carbon.graph.rowset.size=100000
>
> carbon.number.of.cores.while.loading=6
>
> carbon.sort.size=500000
>
> carbon.enableXXHash=true
>
> carbon.number.of.cores.while.compacting=2
>
> carbon.compaction.level.threshold=4,3
>
> carbon.major.compaction.size=1024
>
> carbon.number.of.cores=4
>
> carbon.inmemory.record.size=120000
>
> carbon.enable.quick.filter=false
>
>
>
>
>
> My Queries:
>
>
>
> carbon.sql("CREATE TABLE IF NOT EXISTS lineitem_4  (orderkey BIGINT,
> partkey
> BIGINT, suppkey BIGINT, linenumber BIGINT, quantity DOUBLE, extendedprice
> DOUBLE, discount DOUBLE, tax DOUBLE, returnflag STRING, linestatus STRING,
> shipdate DATE, commitdate DATE, receiptdate DATE, shipinstruct STRING,
> shipmode STRING, comment STRING) STORED BY 'carbondata'
> TBLPROPERTIES('TABLE_BLOCKSIZE'='128 MB')")
>
>
>
> carbon.sql("LOAD DATA INPATH 'hdfs://hdfsmaster/input/lineitem/' INTO
> TABLE
> lineitem_4 OPTIONS ('FILEHEADER' =
> 'orderkey,partkey,suppkey,linenumber,quantity,
> extendedprice,discount,tax,ret
> urnflag,linestatus,shipdate,commitdate,receiptdate,
> shipinstruct,shipmode,com
> ment', 'USE_KETTLE' = 'false', 'DELIMITER'='|')")
>
>
>
>
>
> val proj1 = carbon.sql("SELECT orderkey,partkey,linenumber FROM
> lineitem_4")
>
> proj1.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1/")
>
>
>
> val proj2 = carbon.sql("SELECT
> orderkey,partkey,linenumber,quantity,discount,returnflag FROM lineitem_4")
>
> proj2.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj2/")
>
>
>
> val proj3 = carbon.sql("SELECT
> orderkey,partkey,linenumber,quantity,discount,returnflag,linestatus,
> commitdate,receiptdate FROM lineitem_4")
>
> proj3.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj3/")
>
>
>
> val proj4 = carbon.sql("SELECT
> orderkey,partkey,linenumber,quantity,discount,returnflag,linestatus,
> commitdate,receiptdate,shipinstruct,shipmode,comment FROM lineitem_4")
>
> proj4.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj4/")
>
>
>
> Thank you
>
>
>
> Regards
>
> Faisal
>


-- 
Regards
Liang
