The Encoder API remains a pain point due to its lack of composability. The serialization overhead is also still there, as far as I know. I don't remember exactly what happened with the predicate pushdown issues; I believe they are mostly resolved. In practice we use the Dataset API in our method and interface signatures where it fits, then switch to the DataFrame API for the actual work.
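To illustrate the pattern described above, here is a minimal sketch (the `Order`/`CustomerTotal` case classes and column names are my own assumptions, not from this thread): typed Datasets at the interface boundary for compile-time safety, with the body dropping to untyped DataFrame operations so the aggregation stays inside Catalyst instead of deserializing each row into a JVM object, as a typed `map` or `groupByKey` would.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical domain types; Spark derives Encoders for case classes.
case class Order(id: Long, customerId: Long, amount: Double)
case class CustomerTotal(customerId: Long, total: Double)

object OrderStats {
  // Public signature is typed, so callers get compile-time checking...
  def totalsByCustomer(orders: Dataset[Order])
                      (implicit spark: SparkSession): Dataset[CustomerTotal] = {
    import spark.implicits._
    // ...but the body uses untyped column expressions. These stay visible
    // to the Catalyst optimizer (so e.g. filters written this way can be
    // pushed down to the source), whereas a typed lambda such as
    // orders.filter(o => o.amount > 0) is an opaque function to Catalyst.
    orders.toDF()
      .groupBy($"customerId")
      .agg(sum($"amount").as("total"))
      .as[CustomerTotal]
  }
}
```

The trade-off is that the `.toDF()`/`.as[...]` boundary gives up type checking of the intermediate column names, which is exactly the sweet-spot compromise the thread describes.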
On Mon, Oct 4, 2021 at 6:55 AM Magnus Nilsson <ma...@kth.se> wrote:
> Hi,
>
> I tried using the (typed) Dataset API about three years ago. At the time
> there were limitations with predicate pushdown, serialization overhead,
> and maybe more things I've forgotten. Ultimately we chose the
> DataFrame API as the sweet spot.
>
> Does anyone know of a good overview of the current state of the
> Dataset API, pros/cons as of Spark 3?
>
> Is it fully usable? Do you get the advantages of a strongly typed
> DataFrame? Any known limitations or drawbacks to take into account?
>
> br,
>
> Magnus
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org