There is no problem with the SQL read. When I do the following, it works fine.
val dataframe1 = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
  "dbtable" -> "customer"))

dataframe1.filter("country = 'BA'").show()

On Mon, Jan 11, 2016 at 1:41 PM, Xingchi Wang <regrec...@gmail.com> wrote:

> The error happened at "Lost task 0.0 in stage 0.0". I think it is not a
> "groupBy" problem; it is an issue with the SQL read of the "customer"
> table. Please check the JDBC link and whether the data is loaded
> successfully.
>
> Thanks
> Xingchi
>
> 2016-01-11 15:43 GMT+08:00 Gaini Rajeshwar <raja.rajeshwar2...@gmail.com>:
>
>> Hi All,
>>
>> I have a table named customer (customer_id, event, country, ...) in a
>> PostgreSQL database. This table has more than 100 million rows.
>>
>> I want to know the number of events from each country. To achieve that,
>> I am doing a groupBy using Spark as follows.
>>
>> val dataframe1 = sqlContext.load("jdbc", Map(
>>   "url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
>>   "dbtable" -> "customer"))
>>
>> dataframe1.groupBy("country").count().show()
>>
>> The above code seems to fetch the complete customer table before doing
>> the groupBy. Because of that, it throws the following error:
>>
>> 16/01/11 12:49:04 WARN HeartbeatReceiver: Removing executor 0 with no
>> recent heartbeats: 170758 ms exceeds timeout 120000 ms
>> 16/01/11 12:49:04 ERROR TaskSchedulerImpl: Lost executor 0 on 10.2.12.59:
>> Executor heartbeat timed out after 170758 ms
>> 16/01/11 12:49:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID
>> 0, 10.2.12.59): ExecutorLostFailure (executor 0 exited caused by one of the
>> running tasks) Reason: Executor heartbeat timed out after 170758 ms
>>
>> I am using Spark 1.6.0.
>>
>> Is there any way I can solve this?
>>
>> Thanks,
>> Rajeshwar Gaini.
>>
>
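As a follow-up note on the groupBy timeout: a JDBC load without partitioning options is read as a single partition, so one task scans the whole 100M-row table and the executor can miss heartbeats. A sketch of two ways around this, using the Spark 1.6 DataFrameReader.jdbc API against the same database (the partition column customer_id, the bounds 1..100000000, and the partition count 16 are assumptions here, not values from the thread; adjust them to the real min/max ids):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("user", "postgres")
props.setProperty("password", "postgres")

// Option 1: partitioned JDBC read. Spark issues 16 parallel queries,
// each covering a slice of the assumed customer_id range, instead of
// one task scanning the entire table.
val customers = sqlContext.read.jdbc(
  "jdbc:postgresql://localhost/customerlogs",
  "customer",
  "customer_id",  // assumed numeric, roughly uniformly distributed column
  1L,             // assumed minimum id
  100000000L,     // assumed maximum id
  16,             // number of partitions
  props)

customers.groupBy("country").count().show()

// Option 2: push the aggregation down to PostgreSQL via a subquery in
// place of the table name, so only the per-country counts cross the wire.
val counts = sqlContext.read.jdbc(
  "jdbc:postgresql://localhost/customerlogs",
  "(select country, count(*) as cnt from customer group by country) as t",
  props)

counts.show()
```

Independently of either approach, raising spark.network.timeout can paper over the heartbeat error, but it does not fix the underlying single-partition scan.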