> the same query (“SELECT `event_date` as `event_date`,sum(`bookings`) as
> `bookings`,sum(`dealviews`) as `dealviews` FROM myTable WHERE `event_date` >=
> '2016-01-06' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”) takes 8
> seconds.
>
>
> thanks
>
> From: Mich Talebzadeh <mich.talebza...@gmail.com>
> Date: Sunday, April 17, 2016 at 2:52 PM
> To: maurin lenglart <mau...@cuberonlabs.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: orc vs parquet aggregation, orc is really slow
>
> hang on so it takes 15 seconds to switch the database context with
> HiveContext.sql("use myDatabase") ?
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> From: Mich Talebzadeh <mich.talebza...@gmail.com>
> Date: Sunday, April 17, 2016 at 2:22 PM
> To: maurin lenglart <mau...@cuberonlabs.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: orc vs parquet aggregation, orc is really slow
>
> Hi Maurin,
>
> Have you tried to create your table in Hive as parquet?
> read.format('orc').load('mytableFiles').registerAsTable('myTable')
> The queries done on myTable take at least twice the amount of time
> compared to queries done on the table loaded with hive directly.
> For technical reasons my pipeline is not fully migrated to use hive
> tables, and in a lo
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Saturday, April 16, 2016 at 4:14 AM
To: maurin lenglart <mau...@cuberonlabs.com>, "user @spark" <user@spark.apache.org>
Subject: Re: orc vs parquet aggregation, orc is really slow
> Date: Saturday, April 16, 2016 at 12:32 AM
> To: maurin lenglart <mau...@cuberonlabs.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: orc vs parquet aggregation, orc is really slow
>
> Have you analysed statistics on the ORC table? How many rows are there?
To: maurin lenglart <mau...@cuberonlabs.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: orc vs parquet aggregation, orc is really slow

Generally a recommendation (besides the issue): do not store dates as Strings. I
recommend making them ints here; it will be much faster in both cases.
It could be that you load them differently into the tables. Generally, for these
tables you should insert the rows in both cases sorted into the
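The "dates as ints" advice above can be sketched in plain Python; the `yyyymmdd` encoding and the function name below are illustrative assumptions, not something stated in the thread:

```python
# Illustrative sketch of storing dates as ints (yyyymmdd) instead of
# 'YYYY-MM-DD' strings; integer comparisons are cheaper and the values
# compress well in columnar formats such as ORC and Parquet.

def date_str_to_int(s):
    """Convert a 'YYYY-MM-DD' string to the int YYYYMMDD."""
    return int(s.replace("-", ""))

# The range predicate from the query in this thread, rewritten on ints:
lo = date_str_to_int("2016-01-06")   # 20160106
hi = date_str_to_int("2016-04-02")   # 20160402

# Zero-padded date strings sort lexicographically in the same order as
# their numeric encodings, so the rewritten WHERE clause keeps the same
# semantics: event_date_int >= lo AND event_date_int <= hi.
assert ("2016-01-06" < "2016-04-02") == (lo < hi)
```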
Have you analysed statistics on the ORC table? How many rows are there?
Also send the output of

desc formatted statistics

HTH

Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
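For reference, the statistics asked about above are gathered in Hive with ANALYZE TABLE; a minimal HiveQL sketch, assuming the table name used in this thread:

```sql
-- Gather table-level statistics (row count, total size) for the ORC table
ANALYZE TABLE myTable COMPUTE STATISTICS;

-- Gather per-column statistics used by the optimizer
ANALYZE TABLE myTable COMPUTE STATISTICS FOR COLUMNS;

-- Inspect what was collected
DESC FORMATTED myTable;
```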
Hi,
I am executing one query :
“SELECT `event_date` as `event_date`,sum(`bookings`) as
`bookings`,sum(`dealviews`) as `dealviews` FROM myTable WHERE `event_date` >=
'2016-01-06' AND `event_date` <= '2016-04-02' GROUP BY `event_date` LIMIT 2”
My table was created something like:
CREATE