Hi Reynold,
That's what I've been told a few times already (most notably by Adam on
Twitter), but I couldn't understand what it meant exactly. It's only now
that I understand what you're saying, Reynold :)
Does this make DataFrame's Column-based or SQL-based queries usually
faster than Datasets with
The UDF is a black box so Spark can't know what it is dealing with. There
are simple cases in which we can analyze a UDF's bytecode and infer what
it is doing, but it is pretty difficult to do in general.
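To make the "black box" point concrete, here is a toy model in plain Scala. It has no Spark dependency, and every type in it (`Predicate`, `GreaterThan`, `OpaqueUdf`, `toSourceFilter`) is hypothetical, not Spark's Catalyst API. The idea it sketches: a predicate built from expression nodes can be inspected and translated into a filter a data source understands, while the same logic wrapped in an opaque function can only be run row by row.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Predicate
// Expression-based predicate: its structure is visible to an "optimizer".
final case class GreaterThan(column: String, value: Int) extends Predicate
// UDF-like predicate: a compiled function whose logic cannot be inspected.
final case class OpaqueUdf(f: Map[String, Int] => Boolean) extends Predicate

object BlackBoxDemo {
  // Try to translate a predicate into something a data source could evaluate itself.
  // Only predicates whose structure is known can be translated.
  def toSourceFilter(p: Predicate): Option[String] = p match {
    case GreaterThan(c, v) => Some(s"$c > $v")
    case OpaqueUdf(_)      => None // nothing to inspect: must run after rows are loaded
  }

  def main(args: Array[String]): Unit = {
    val exprFilter = GreaterThan("age", 21)
    val udfFilter  = OpaqueUdf(row => row("age") > 21) // same logic, hidden in a function
    println(toSourceFilter(exprFilter)) // Some(age > 21)
    println(toSourceFilter(udfFilter))  // None
  }
}
```

Both predicates select the same rows, but only the expression-based one exposes enough structure to be handed to the source.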
On Tuesday, August 30, 2016, Jacek Laskowski wrote:
Hi,
I've been playing with UDFs to see why they're a black box for Spark's
optimizer, and I started with filters to showcase the optimizations in
play.
My current understanding is that predicate pushdown is supported
by the following data sources:
1. Hive tables
2. Parquet files
3. ORC files
4.
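A rough sketch of what predicate pushdown buys, again as a hypothetical plain-Scala model rather than Spark's Data Source API (`PushdownDemo`, `Row`, and `scan` are illustrative names only). When the reader is handed a filter it understands, it can drop non-matching rows at the source; when it is not (as with a UDF), every row is read and the engine filters afterwards.

```scala
// Hypothetical toy reader; this is NOT Spark's DataSource API.
object PushdownDemo {
  final case class Row(name: String, age: Int)

  // Rows "stored" in the simulated data source.
  val stored: Vector[Row] = Vector(Row("a", 15), Row("b", 30), Row("c", 42))

  // Simulated scan: applies a pushed-down "age > min" filter (if any) while reading.
  def scan(pushedMinAge: Option[Int]): Vector[Row] =
    pushedMinAge match {
      case Some(min) => stored.filter(_.age > min)
      case None      => stored
    }

  def main(args: Array[String]): Unit = {
    // With pushdown: the source filters, so only matching rows leave it.
    val pushed = scan(Some(21))
    // Without pushdown (e.g. a UDF predicate): read everything, then filter in the engine.
    val unpushed = scan(None).filter(_.age > 21)
    println(s"rows read with pushdown: ${pushed.size}, without: ${scan(None).size}")
    assert(pushed == unpushed) // same result; only the amount of data read differs
  }
}
```

The result is identical either way; the difference is how much data crosses from the source into the engine.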