subject:"orc vs parquet aggregation, orc is really slow"

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Rajesh Balamohan

ate` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”) take 8 > seconds. > > > thanks > > From: Mich Talebzadeh <mich.talebza...@gmail.com> > Date: Sunday, April 17, 2016 at 2:52 PM > > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh

> > Date: Sunday, April 17, 2016 at 2:52 PM > > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: orc vs parquet aggregation, orc is really slow > > hang on so it takes 15 seconds

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart

<user@spark.apache.org> Subject: Re: orc vs parquet aggregation, orc is really slow hang on so it takes 15 seconds to switch the database context with HiveContext.sql("use myDatabase")? Dr Mich Talebzadeh LinkedIn https://www.li

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh

unday, April 17, 2016 at 2:22 PM > > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: orc vs parquet aggregation, orc is really slow > > Hi Maurin, > > Have you tried to create your table in Hive as parque

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart

gmail.com> Date: Sunday, April 17, 2016 at 2:22 PM To: maurin lenglart <mau...@cuberonlabs.com> Cc: "user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggrega

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Mich Talebzadeh

read.format('orc').load('mytableFiles').registerAsTable('myTable') > The queries done on myTable take at least twice the amount of time > compared to queries done on the table loaded with hive directly. > For technical reasons my pipeline is not fully migrated to use hive > tables, and in a lo

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Maurin Lenglart

ich.talebza...@gmail.com> Date: Saturday, April 16, 2016 at 4:14 AM To: maurin lenglart <mau...@cuberonlabs.com>, "user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggregation, o

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Mich Talebzadeh

Saturday, April 16, 2016 at 12:32 AM > To: maurin lenglart <mau...@cuberonlabs.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: orc vs parquet aggregation, orc is really slow > > Have you analysed statistics on the ORC table? How many rows are the

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart

cuberonlabs.com> Cc: "user @spark" <user@spark.apache.org> Subject: Re: orc vs parquet aggregation, orc is really slow Generally a recommendation (besides the issue) - Do not put dates as String. I recommend he

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart

org> Subject: Re: orc vs parquet aggregation, orc is really slow Have you analysed statistics on the ORC table? How many rows are there? Also send the output of desc formatted statistics HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/prof

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Jörn Franke

Generally a recommendation (besides the issue) - Do not put dates as String. I recommend here to make them ints. It will be in both cases much faster. It could be that you load them differently in the tables. Generally for these tables you should insert them in both cases sorted into the

Re: orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Mich Talebzadeh

Have you analysed statistics on the ORC table? How many rows are there? Also send the output of desc formatted statistics HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

orc vs parquet aggregation, orc is really slow

2016-04-16 Thread Maurin Lenglart

Hi, I am executing one query : “SELECT `event_date` as `event_date`,sum(`bookings`) as `bookings`,sum(`dealviews`) as `dealviews` FROM myTable WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2” My table was created something like : CREATE


13 matches
