Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Rajesh Balamohan
ate` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”) take 8 > seconds. > > > thanks > > From: Mich Talebzadeh <mich.talebza...@gmail.com> > Date: Sunday, April 17, 2016 at 2:52 PM > > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh
> > Date: Sunday, April 17, 2016 at 2:52 PM > > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: orc vs parquet aggregation, orc is really slow > > hang on so it takes 15 seconds

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart
<user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Re: orc vs parquet aggregation, orc is really slow hang on so it takes 15 seconds to switch the database context with HiveContext.sql("use myDatabase") ? Dr Mich Talebzadeh LinkedIn https://www.li

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh
unday, April 17, 2016 at 2:22 PM > > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: orc vs parquet aggregation, orc is really slow > > Hi Maurin, > > Have you tried to create your table in Hive as parque

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart
gmail.com<mailto:mich.talebza...@gmail.com>> Date: Sunday, April 17, 2016 at 2:22 PM To: maurin lenglart <mau...@cuberonlabs.com<mailto:mau...@cuberonlabs.com>> Cc: "user @spark" <user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Re: orc vs parquet aggrega

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh
read.format('orc').load('mytableFiles').registerAsTable('myTable') > The queries done on myTable take at least twice the amount of time > compared to queries done on the table loaded with hive directly. > For technical reasons my pipeline is not fully migrated to use hive > tables, and in a lo

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart
ich.talebza...@gmail.com>> Date: Saturday, April 16, 2016 at 4:14 AM To: maurin lenglart <mau...@cuberonlabs.com<mailto:mau...@cuberonlabs.com>>, "user @spark" <user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Re: orc vs parquet aggregation, o

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Mich Talebzadeh
Saturday, April 16, 2016 at 12:32 AM > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: orc vs parquet aggregation, orc is really slow > > Have you analysed statistics on the ORC table? How many rows are the

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart
cuberonlabs.com<mailto:mau...@cuberonlabs.com>> Cc: "user @spark" <user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Re: orc vs parquet aggregation, orc is really slow Generally a recommendation (besides the issue) - Do not put dates as String. I recommend he

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart
org<mailto:user@spark.apache.org>> Subject: Re: orc vs parquet aggregation, orc is really slow Have you analysed statistics on the ORC table? How many rows are there? Also send the output of desc formatted statistics HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/prof

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Jörn Franke
Generally a recommendation (besides the issue): do not store dates as strings. I recommend here to make them ints. It will be much faster in both cases. It could be that you load them differently in the tables. Generally for these tables you should insert them in both cases sorted into the
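
To illustrate the suggestion above about keeping dates as ints rather than strings, here is a minimal sketch in plain Python. The `YYYYMMDD` integer encoding and the helper name `date_str_to_int` are assumptions made for illustration; the thread does not say which integer representation was meant. The sketch only shows that the range filter from the original query behaves the same on the integer form.

```python
from datetime import date


def date_str_to_int(s: str) -> int:
    """Convert an ISO 'YYYY-MM-DD' string to an integer of the form YYYYMMDD.

    Hypothetical helper: the YYYYMMDD encoding is an assumption, chosen
    because integer ordering then matches chronological ordering.
    """
    d = date.fromisoformat(s)
    return d.year * 10000 + d.month * 100 + d.day


# Bounds taken from the WHERE clause of the query in the original post.
lower = date_str_to_int("2016-01-06")
upper = date_str_to_int("2016-04-02")

# Made-up sample values, used only to exercise the filter.
sample = ["2016-01-05", "2016-01-06", "2016-03-15", "2016-04-02", "2016-04-03"]

# Integer comparison replaces the string comparison on event_date.
in_range = [s for s in sample if lower <= date_str_to_int(s) <= upper]
print(in_range)
```

With this encoding the predicate compares fixed-width integers instead of strings. Whether that alone explains the ORC versus Parquet gap discussed in this thread is not established here.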

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Mich Talebzadeh
Have you analysed statistics on the ORC table? How many rows are there? Also send the output of desc formatted statistics HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart
Hi, I am executing one query: “SELECT `event_date` as `event_date`,sum(`bookings`) as `bookings`,sum(`dealviews`) as `dealviews` FROM myTable WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”. My table was created something like: CREATE