What's the plan if you run explain?
In 1.5 the default should be TungstenAggregate, which does spill (switching
from hash-based aggregation to sort-based aggregation).
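To make the hash-to-sort fallback concrete, here is a toy, single-process Python sketch of the idea. It is not Spark's TungstenAggregate code. The names `hash_then_sort_aggregate` and `max_keys` are hypothetical, and `max_keys` stands in for the memory budget. The "spill" is an in-memory sorted list rather than a file on disk, and only sums are computed, since partial sums can be merged. The point is that running out of hash-map space changes the strategy, not the result.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def hash_then_sort_aggregate(rows, max_keys):
    """Sum values per key. Hash-aggregate while at most `max_keys` distinct
    keys are held in memory; on overflow, fall back to sort-based aggregation.

    `max_keys` is a hypothetical stand-in for the memory budget.
    """
    table = {}  # in-memory hash map: key -> partial sum
    it = iter(rows)
    for key, value in it:
        if key not in table and len(table) >= max_keys:
            # Map is "full": emit its partial sums as a sorted run (Spark
            # would spill this to disk; here it stays in a list).
            spilled = sorted(table.items(), key=itemgetter(0))
            # Switch strategy: sort this row plus all remaining input by key.
            rest = sorted([(key, value), *it], key=itemgetter(0))
            # Merge both sorted streams and aggregate adjacent equal keys.
            merged = heapq.merge(spilled, rest, key=itemgetter(0))
            return {k: sum(v for _, v in group)
                    for k, group in groupby(merged, key=itemgetter(0))}
        table[key] = table.get(key, 0) + value
    return table  # never overflowed: pure hash-based result
```

With rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6)], both hash_then_sort_aggregate(rows, 10) and hash_then_sort_aggregate(rows, 2) return the sums {"a": 4, "b": 7, "c": 4, "d": 6}. Only the second call takes the sort-based path.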
On Mon, Sep 21, 2015 at 5:34 PM, Matt Cheah wrote:
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Mingyu Kim <m...@palantir.com>, Peter Faiman <peterfai...@palantir.com>
Subject: Re: DataFrames Aggregate does not spill?
Hi everyone,
I'm debugging some slowness and apparent memory pressure and GC issues after I
ported some workflows from raw RDDs to DataFrames. In particular, I'm
looking into an aggregation workflow that computes many aggregations per key
at once.
My workflow before was doing a fairly