Use rollup and cube.
On Fri, Aug 24, 2018 at 7:55 PM 崔苗 wrote:
Forwarded message
From: "崔苗"
Date: 2018-08-25 10:54:31
To: d...@spark.apache.org
Subject: multiple group by action
Hi,
we have some user data with columns (userId, company, client, country, region, city);
now we want to count userId by multiple columns, such as:
select ...
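
A minimal sketch of that rollup/cube suggestion using the DataFrame API,
assuming the column names from the question and a hypothetical parquet
source path:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.countDistinct

    val spark = SparkSession.builder()
      .appName("MultiGroupByCounts") // hypothetical app name
      .getOrCreate()

    // Hypothetical source path; the real data may live in a table instead.
    val users = spark.read.parquet("/path/to/user_data")

    // cube() aggregates over every combination of the listed columns in a
    // single job, rather than running one GROUP BY query per column.
    val counts = users
      .cube("company", "client", "country", "region", "city")
      .agg(countDistinct("userId").as("userCount"))

    counts.show()

rollup() is the hierarchical variant (company, then company+client, and so
on); in plain SQL the equivalents are GROUP BY ... WITH CUBE / WITH ROLLUP,
and GROUPING SETS covers the case where only specific column combinations
are wanted.
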
Hi All,
I have an external table in Spark whose underlying data files are in
parquet format. The table is partitioned. When I try to compute the
statistics for a query where the partition column is in the WHERE clause,
the statistics returned contain only sizeInBytes, not the row count.
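
One thing that usually explains this: without collected statistics, Spark
falls back to a size-based estimate from the files, so only sizeInBytes is
filled in. A hedged sketch of collecting them first, assuming Spark 2.3+
and hypothetical database/table/partition names:

    // COMPUTE STATISTICS scans the data and records rowCount as well;
    // the NOSCAN variant would record only the size, so it is omitted here.
    spark.sql(
      "ANALYZE TABLE my_db.events PARTITION (dt = '2018-08-24') COMPUTE STATISTICS")

    // Inspect what the optimizer now sees for the filtered query.
    val plan = spark.sql("SELECT * FROM my_db.events WHERE dt = '2018-08-24'")
      .queryExecution.optimizedPlan
    println(plan.stats) // e.g. Statistics(sizeInBytes=..., rowCount=...)
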
Without knowing too much about your application, it would be hard to say.
Maybe it works faster in local mode because there is no shuffling, etc.?
The Spark UI would be your best bet to find out which stage is slowing
things down.
On Fri, 24 Aug 2018, 3:26 PM Guillermo Ortiz wrote:
Another test I just did is to execute with local[X], and this problem
doesn't happen. Communication problems?
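
For anyone following along, local[X] is the local master URL, where X is
the number of worker threads; everything runs in one JVM, so there is no
network traffic between executors, which is why communication problems
would not show up there. A minimal sketch of the two modes (the cluster
master URL and app name are hypothetical):

    import org.apache.spark.sql.SparkSession

    // Local mode: one JVM, 4 threads, no network between executors.
    val localSpark = SparkSession.builder()
      .appName("DagCacheTest") // hypothetical app name
      .master("local[4]")
      .getOrCreate()

    // Against a cluster, shuffles and block replication go over the
    // network, e.g. via spark-submit:
    //   spark-submit --master spark://host:7077 --class MyApp my-app.jar
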
2018-08-23 22:43 GMT+02:00 Guillermo Ortiz:
> It's a complex DAG before the point where I cache the RDD; there are some
> joins, filters and maps before caching the data, but most of the time it