Hi Rana,

That would be very nice; you are welcome to join us in testing TPC-H. A
contributor will contact you and share the TPC-H scripts and DDL with you.

Actually, you are using an old version (the January build); the current
master has many optimizations for TPC-H, so you may want to clone the
master branch.

Regards,
Liang
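For reference, here is a minimal, self-contained version of the
"ENABLE_VECTOR_READER" suggestion that comes up in the quoted thread
below. It is a sketch only: the property constant and its imports come
from the thread itself, while the CarbonSession builder call, the app
name, and the store path are assumptions based on the master-branch API
of that period.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants

// Enable the vectorized reader before the first query so that column
// batches are decoded in bulk rather than row by row.
CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true")

// Illustrative session setup; the app name and store path are placeholders.
val carbon = SparkSession
  .builder()
  .appName("TpchCarbonTest")
  .getOrCreateCarbonSession("hdfs://hdfsmaster/carbon/store")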
2017-04-13 4:38 GMT+05:30 Rana Faisal Munir <fmu...@essi.upc.edu>:

> Hi Liang,
>
> Thank you very much for your reply. I am answering your questions one
> by one.
>
> 1. Did you use the latest master version, or 1.0? I suggest you use
> master to test.
>
> I have downloaded the latest version from Git and compiled it. It is
> carbondata_2.11-1.0.0-incubating-shade-hadoop2.2.0.
>
> 2. Have you tested other TPC-H queries which include where/filter
> clauses?
>
> I just started recently, and my future plan is to run the full set of
> TPC-H queries to see CarbonData's performance improvements over
> Parquet. Right now, I am just running my own queries with different
> projected columns to see how well CarbonData can push down the
> projection.
>
> 3. In your case, is the query slow, or is the "write.format" below slow?
> write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1/")
>
> I have the same line for Parquet, and Parquet works perfectly fine. I
> don't think this write is causing any problem.
>
> 4. Use the master version to do the query test, and set
> "ENABLE_VECTOR_READER" to true.
> import org.apache.carbondata.core.util.CarbonProperties
> import org.apache.carbondata.core.constants.CarbonCommonConstants
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true")
>
> Thanks for this suggestion. I will enable it and share the updated
> results with you.
>
> The community is also doing TPC-H testing currently; do you want to
> participate in the test together?
>
> It would be nice to be part of this. Could you please guide me on how
> I can contribute?
>
> Thank you
>
> Regards
> Faisal
>
> -----Original Message-----
> From: Liang Chen [mailto:chenliang6...@gmail.com]
> Sent: Thursday, April 13, 2017 1:00 AM
> To: dev@carbondata.incubator.apache.org
> Subject: Re: CarbonData performance benchmarking
>
> Hi
>
> 1. Did you use the latest master version, or 1.0? I suggest you use
> master to test.
> 2. Have you tested other TPC-H queries which include where/filter
> clauses?
> 3. In your case, is the query slow, or is the "write.format" below slow?
> write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1/")
>
> 4. Use the master version to do the query test, and set
> "ENABLE_VECTOR_READER" to true.
> import org.apache.carbondata.core.util.CarbonProperties
> import org.apache.carbondata.core.constants.CarbonCommonConstants
> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true")
>
> The community is also doing TPC-H testing currently; do you want to
> participate in the test together?
>
> Regards
> Liang
>
> 2017-04-13 1:18 GMT+05:30 Rana Faisal Munir <fmu...@essi.upc.edu>:
>
> > Dear all,
> >
> > I am running some experiments to benchmark the performance of both
> > Parquet and CarbonData. I am using the TPC-H lineitem table of size
> > 8 GB. It has 16 columns, and I am running different projection
> > queries that read different numbers of columns (3, 6, 9, 12, 15). I
> > am facing a problem with CarbonData: it seems to be very slow when I
> > select more than 8 columns. It takes hours to process my request,
> > whereas Parquet is very quick. Could anybody please help me
> > understand this behavior?
> >
> > This is my cluster configuration:
> >
> > 3 machines:
> > 1 driver machine (128 GB, 24 cores)
> > 2 worker machines (128 GB, 24 cores)
> >
> > My configuration settings for Spark are:
> >
> > spark.executor.instances 12
> > spark.executor.memory 18g
> > spark.driver.memory 57g
> > spark.executor.cores 3
> > spark.driver.cores 5
> > spark.default.parallelism 72
> >
> > carbon.sort.file.buffer.size=20
> > carbon.graph.rowset.size=100000
> > carbon.number.of.cores.while.loading=6
> > carbon.sort.size=500000
> > carbon.enableXXHash=true
> > carbon.number.of.cores.while.compacting=2
> > carbon.compaction.level.threshold=4,3
> > carbon.major.compaction.size=1024
> > carbon.number.of.cores=4
> > carbon.inmemory.record.size=120000
> > carbon.enable.quick.filter=false
> >
> > My queries:
> >
> > carbon.sql("CREATE TABLE IF NOT EXISTS lineitem_4 (orderkey BIGINT,
> > partkey BIGINT, suppkey BIGINT, linenumber BIGINT, quantity DOUBLE,
> > extendedprice DOUBLE, discount DOUBLE, tax DOUBLE, returnflag STRING,
> > linestatus STRING, shipdate DATE, commitdate DATE, receiptdate DATE,
> > shipinstruct STRING, shipmode STRING, comment STRING) STORED BY
> > 'carbondata' TBLPROPERTIES('TABLE_BLOCKSIZE'='128 MB')")
> >
> > carbon.sql("LOAD DATA INPATH 'hdfs://hdfsmaster/input/lineitem/' INTO TABLE lineitem_4 OPTIONS ('FILEHEADER' = 'orderkey,partkey,suppkey,linenumber,quantity,extendedprice,discount,tax,returnflag,linestatus,shipdate,commitdate,receiptdate,shipinstruct,shipmode,comment', 'USE_KETTLE' = 'false', 'DELIMITER'='|')")
> >
> > val proj1 = carbon.sql("SELECT orderkey,partkey,linenumber FROM lineitem_4")
> > proj1.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1/")
> >
> > val proj2 = carbon.sql("SELECT orderkey,partkey,linenumber,quantity,discount,returnflag FROM lineitem_4")
> > proj2.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj2/")
> >
> > val proj3 = carbon.sql("SELECT orderkey,partkey,linenumber,quantity,discount,returnflag,linestatus,commitdate,receiptdate FROM lineitem_4")
> > proj3.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj3/")
> >
> > val proj4 = carbon.sql("SELECT orderkey,partkey,linenumber,quantity,discount,returnflag,linestatus,commitdate,receiptdate,shipinstruct,shipmode,comment FROM lineitem_4")
> > proj4.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj4/")
> >
> > Thank you
> >
> > Regards
> > Faisal
>
> --
> Regards
> Liang

--
Regards
Liang
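Question 3 in the thread above asks whether the scan itself or the CSV
write is the slow part. One way to separate the two costs is to time a
count() (which forces the full scan without writing anything) against
the full write. A sketch only, assuming the carbon session and the
lineitem_4 table from the thread; the time helper and the output path
are illustrative, not part of any CarbonData or Spark API.

// Illustrative wall-clock helper.
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - start) / 1e9}%.1f s")
  result
}

// count() forces the scan alone, so comparing the two timings shows
// whether the query or the CSV write dominates. Note the DataFrame is
// recomputed for the write, which is what we want for separate numbers.
val proj1 = carbon.sql("SELECT orderkey,partkey,linenumber FROM lineitem_4")
time("scan only")(proj1.count())
time("scan + csv write") {
  proj1.write.format("csv").save("hdfs://hdfsmaster/output/carbon/proj1_timed/")
}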