Re: GroupBy on DataFrame taking too much time

2016-01-11 Thread Xingchi Wang
The error happened at "Lost task 0.0 in stage 0.0", so I think it is not a "groupBy" problem; it is an issue with the SQL read of the "customer" table. Please check the JDBC link and verify that the data loads successfully. Thanks, Xingchi
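A quick way to check the read in isolation (a minimal Scala sketch; the URL, credentials, and customer_id column are assumptions taken from elsewhere in this thread):

    // Probe the JDBC source with a tiny read to confirm the link works
    // before blaming the groupBy stage.
    val probe = sqlContext.load("jdbc",
      Map("url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
          "dbtable" -> "customer"))
    probe.select("customer_id").show(1)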

Re: GroupBy on DataFrame taking too much time

2016-01-11 Thread Gaini Rajeshwar
There is no problem with the SQL read. When I do the following, it works fine:

    val dataframe1 = sqlContext.load("jdbc",
      Map("url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
          "dbtable" -> "customer"))
    dataframe1.filter("country = 'BA'").show()
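Comparing the two query plans can show why the filter returns quickly while the groupBy does not (a sketch; explain() is the standard DataFrame call, but the interpretation in the comments is my assumption, not from the thread):

    // The filter predicate can be pushed down to PostgreSQL, so few rows
    // cross the wire; the groupBy must first pull the entire table into Spark.
    dataframe1.filter("country = 'BA'").explain()
    dataframe1.groupBy("country").count().explain()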

Re: GroupBy on DataFrame taking too much time

2016-01-11 Thread Todd Nist
Hi Rajeshwar Gaini, dbtable can be any valid SQL query; simply define it as a subquery, something like:

    val query = "(SELECT country, count(*) FROM customer group by country) as X"
    val df1 = sqlContext.read
      .format("jdbc")
      .option("url", url)
      .option("user", username)
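The archived snippet is cut off there; a complete sketch of the same pushdown approach (the url, username, and password values are placeholders of my own, not from the original message):

    // Hypothetical connection details; substitute your own.
    val url = "jdbc:postgresql://localhost/customerlogs"
    val username = "postgres"
    val password = "postgres"

    // Passing a subquery as dbtable pushes the aggregation down to
    // PostgreSQL, so only one row per country is transferred to Spark.
    val query = "(SELECT country, count(*) AS cnt FROM customer GROUP BY country) AS X"
    val df1 = sqlContext.read
      .format("jdbc")
      .option("url", url)
      .option("user", username)
      .option("password", password)
      .option("dbtable", query)
      .load()

    df1.show()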

GroupBy on DataFrame taking too much time

2016-01-10 Thread Gaini Rajeshwar
Hi All, I have a table named *customer* (customer_id, event, country, ...) in a PostgreSQL database. The table has more than 100 million rows. I want to know the number of events from each country. To achieve that I am doing a groupBy using Spark, as follows:

    val dataframe1 =
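The snippet ends there in the archive; a minimal sketch of what the described approach presumably looks like (load the whole table over JDBC, then aggregate in Spark; the connection details are assumed from the reply above):

    // Loads the entire 100M+ row table over JDBC before aggregating,
    // which is presumably why the groupBy stage takes so long.
    val dataframe1 = sqlContext.load("jdbc",
      Map("url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
          "dbtable" -> "customer"))
    dataframe1.groupBy("country").count().show()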