Re: Fw:multiple group by action

2018-08-24 Thread Reynold Xin
Use rollup and cube. On Fri, Aug 24, 2018 at 7:55 PM 崔苗 wrote: > Forwarding messages > From: "崔苗" > Date: 2018-08-25 10:54:31 > To: d...@spark.apache.org > Subject: multiple group by action > Hi, > we have some user data with
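In Spark, the suggestion above corresponds to `Dataset.rollup` / `Dataset.cube` (PySpark `DataFrame.rollup` / `DataFrame.cube`), or `GROUP BY ... GROUPING SETS` in SQL. Below is a minimal plain-Python sketch (not Spark code) of which column combinations each operator aggregates over. It assumes `count(userId)` semantics (null userIds skipped). The column names `country` and `city` come from the question. The sample rows and the helper names `rollup_sets`, `cube_sets` and `count_user_ids` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1, ..., cn): the n + 1 prefixes, from all columns down to the grand total."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1, ..., cn): all 2**n subsets of the grouping columns."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]


def count_user_ids(rows, cols, grouping_sets):
    """Emulate SELECT cols, count(userId) ... GROUP BY GROUPING SETS (...).

    Columns outside a grouping set are reported as None, similar to the
    nulls Spark returns for them. Assumes the grouping columns themselves
    contain no None values (Spark uses grouping_id() to tell those apart).
    """
    counts = Counter()
    for row in rows:
        if row.get("userId") is None:  # count(userId) skips null user ids
            continue
        for gs in grouping_sets:
            key = tuple(row[c] if c in gs else None for c in cols)
            counts[key] += 1
    return counts


# Hypothetical sample data using two of the columns from the question.
rows = [
    {"userId": 1, "country": "CN", "city": "Beijing"},
    {"userId": 2, "country": "CN", "city": "Shanghai"},
    {"userId": 3, "country": "US", "city": "Boston"},
]
cols = ("country", "city")
by_rollup = count_user_ids(rows, cols, rollup_sets(cols))
by_cube = count_user_ids(rows, cols, cube_sets(cols))
```

Rollup only produces the hierarchical subtotals (country+city, country, grand total). Cube also produces the per-city totals across all countries. With the real data, a roughly equivalent PySpark call would be `df.cube("country", "city").agg(count("userId"))`, with `count` imported from `pyspark.sql.functions`. Swap in `rollup` when only the hierarchical subtotals are needed.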

Fw:multiple group by action

2018-08-24 Thread 崔苗
Forwarding messages From: "崔苗" Date: 2018-08-25 10:54:31 To: d...@spark.apache.org Subject: multiple group by action Hi, we have some user data with columns (userId, company, client, country, region, city), and now we want to count userId grouped by multiple columns, such as: select

CBO not working for Parquet Files

2018-08-24 Thread rajat mishra
Hi All, I have an external table in Spark whose underlying data files are in Parquet format. The table is partitioned. When I try to compute the statistics for a query where the partition column is in the WHERE clause, the statistics returned contain only sizeInBytes and not the row count.

Re: Caching small Rdd's take really long time and Spark seems frozen

2018-08-24 Thread Sonal Goyal
Without knowing more about your application, it is hard to say. Maybe it runs faster in local mode because there is no shuffling across the network, etc.? The Spark UI is your best bet for finding out which stage is slowing things down. On Fri 24 Aug, 2018, 3:26 PM Guillermo Ortiz, wrote: > Another test I just

Re: Caching small Rdd's take really long time and Spark seems frozen

2018-08-24 Thread Guillermo Ortiz
Another test I just did was to run with local[X], and this problem doesn't happen. Communication problems? 2018-08-23 22:43 GMT+02:00 Guillermo Ortiz : > It's a complex DAG before the point where I cache the RDD; there are some joins, filters and maps before caching the data, but most of the time it