The Encoder API remains a pain point due to its lack of composability. The serialization overhead is also still there, as far as I know. I don't remember exactly what happened with the predicate pushdown issues; I believe they are mostly resolved. In practice we use the Dataset API in our method and interface signatures where it fits, then switch to the DataFrame API for the actual work.
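To illustrate the pattern described above, here is a minimal sketch (the `Order`/`CustomerTotal` case classes and column names are my own assumptions, not from this thread): typed Datasets at the interface boundary for compile-time safety, with the body dropping to untyped DataFrame operations so the aggregation stays inside Catalyst instead of deserializing each row into a JVM object, as a typed `map` or `groupByKey` would.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical domain types; Spark derives Encoders for case classes.
case class Order(id: Long, customerId: Long, amount: Double)
case class CustomerTotal(customerId: Long, total: Double)

object OrderStats {
  // Public signature is typed, so callers get compile-time checking...
  def totalsByCustomer(orders: Dataset[Order])
                      (implicit spark: SparkSession): Dataset[CustomerTotal] = {
    import spark.implicits._
    // ...but the body uses untyped column expressions. These stay visible
    // to the Catalyst optimizer (so e.g. filters written this way can be
    // pushed down to the source), whereas a typed lambda such as
    // orders.filter(o => o.amount > 0) is an opaque function to Catalyst.
    orders.toDF()
      .groupBy($"customerId")
      .agg(sum($"amount").as("total"))
      .as[CustomerTotal]
  }
}
```

The trade-off is that the `.toDF()`/`.as[...]` boundary gives up type checking of the intermediate column names, which is exactly the sweet-spot compromise the thread describes.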
On Mon, Oct 4, 2021 at 6:55 AM Magnus Nilsson <ma...@kth.se> wrote:
> Hi,
>
> I tried using the (typed) Dataset API about three years ago. At the time
> there were limitations with predicate pushdown, serialization overhead,
> and maybe more things I've forgotten. Ultimately we chose the
> DataFrame API as the sweet spot.
>
> Does anyone know of a good overview of the current state of the
> Dataset API, pros/cons as of Spark 3?
>
> Is it fully usable? Do you get the advantages of a strongly typed
> DataFrame? Any known limitations or drawbacks to take into account?
>
> br,
>
> Magnus
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org