Hey team, I have been working on DataFrame support for a while, and I’m fairly confident I can open a PR in the next few days. Along the way I noticed a lot of scaffolding around RDDs in order to use DataFrames; multiple existing operators already wrap temporary DataFrames. I decided to refactor those too. The idea is that the user sets a switch that selects either RDDs or DataFrames.
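To make the switch idea concrete, here is a rough sketch of how a source operator could expose it. This is not the code from the branch: `OutputSwitchSketch`, `SketchSource`, `OutputKind`, and `plannedOutput()` are made-up names for illustration, and the RDD default and the `void` return type of the setter are assumptions. Only the accessor names `preferDatasetOutput(boolean)` and `isDatasetOutputPreferred()` mirror the ones on ParquetSource.

```java
// Hypothetical sketch, not the actual operator code.
public class OutputSwitchSketch {

    /** The two execution styles the switch chooses between. */
    enum OutputKind { RDD, DATASET }

    /** Stand-in for a source operator such as ParquetSource (illustrative only). */
    static class SketchSource {
        // Assumption: RDD-backed output stays the default unless the user opts in.
        private boolean preferDatasetOutput = false;

        /** Ask for Dataset-backed output (true) or fall back to RDDs (false). */
        void preferDatasetOutput(boolean prefer) {
            this.preferDatasetOutput = prefer;
        }

        boolean isDatasetOutputPreferred() {
            return this.preferDatasetOutput;
        }

        /** What a higher-level API might consult when planning execution. */
        OutputKind plannedOutput() {
            return this.preferDatasetOutput ? OutputKind.DATASET : OutputKind.RDD;
        }
    }

    public static void main(String[] args) {
        SketchSource source = new SketchSource();
        System.out.println(source.plannedOutput()); // default path

        source.preferDatasetOutput(true);
        System.out.println(source.plannedOutput()); // after opting in
    }
}
```

A plain boolean keeps the existing RDD path untouched for anyone who does not opt in; a real implementation would more likely feed this flag into channel selection than return an enum.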
In more detail: `ParquetSource` now carries a `preferDatasetOutput` flag, with `preferDatasetOutput(boolean)` and `isDatasetOutputPreferred()`, so that higher-level APIs can request Dataset-backed execution.

Thoughts?

Alex

Alexander Alten
CTO & co-founder, Scalytics
