Re: Dataframe vs Dataset dilemma: either Row parsing or no filter push-down

2018-06-18 Thread Koert Kuipers

We use DataFrame and RDD. Dataset not only has issues with predicate pushdown, it also adds shuffles at times where it shouldn't, and there is some overhead from the encoders themselves, because under the hood it is still just Row objects. On Mon, Jun 18, 2018 at 5:00 PM, Valery Khamenya

Dataframe vs Dataset dilemma: either Row parsing or no filter push-down

2018-06-18 Thread Valery Khamenya

Hi Spark gurus, I was surprised to read here: https://stackoverflow.com/questions/50129411/why-is-predicate-pushdown-not-used-in-typed-dataset-api-vs-untyped-dataframe-ap that filters are not pushed down in typed Datasets and one should rather stick to DataFrames. But writing code for
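
For readers following the thread, here is a rough toy model of why a typed filter may block push-down. It is plain Python, not Spark: `ToySource`, `GreaterThan` and `run_filter` are hypothetical names made up for this sketch. The assumption is that a planner can only hand a predicate to the data source when it can inspect it (a column expression), while an arbitrary function (like a typed lambda) is a black box, so every row has to be read first.

```python
from dataclasses import dataclass

# Toy data standing in for a file on disk; this is not Spark.
ROWS = [{"id": i, "name": f"user{i}"} for i in range(10)]


@dataclass(frozen=True)
class GreaterThan:
    """A structured predicate the planner can inspect: column > value."""
    column: str
    value: int

    def matches(self, row):
        return row[self.column] > self.value


class ToySource:
    """Hypothetical data source that counts how many rows it emits."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_read = 0

    def scan(self, pushed=None):
        for row in self.rows:
            # A pushed filter is applied inside the source, so rejected rows
            # are never handed to the engine.
            if pushed is not None and not pushed.matches(row):
                continue
            self.rows_read += 1
            yield row


def run_filter(source, predicate):
    """Push the predicate down if it is inspectable, else filter afterwards."""
    if isinstance(predicate, GreaterThan):
        return list(source.scan(pushed=predicate))
    # An arbitrary function is opaque: read everything, then filter.
    return [row for row in source.scan() if predicate(row)]


# Expression-style filter (analogous to an untyped DataFrame filter).
untyped_source = ToySource(ROWS)
untyped_result = run_filter(untyped_source, GreaterThan("id", 6))

# Lambda-style filter (analogous to a typed Dataset filter).
typed_source = ToySource(ROWS)
typed_result = run_filter(typed_source, lambda row: row["id"] > 6)

print(untyped_source.rows_read, typed_source.rows_read)
```

Both calls return the same rows; the only difference in this model is how many rows the source had to emit. In real Spark the corresponding check would be to compare the physical plans of the two filter styles with `explain()`.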