I think moving the Spark platform to DataFrames makes a lot of sense.
Some newer operators, such as the Parquet reader, already use DataFrames
and wrap/unwrap them to RDDs. We just need to make sure there is a mapping
from each of our operators to a Spark DataFrame operator.
Is that what you meant, Alex?
Best
--
Zoi
On Monday, December 15, 2025 at 07:55:39 AM GMT-6,
Alexander Alten <[email protected]> wrote:
Hey team,
I have been working on DataFrame support for a while, and I'm fairly
confident that I can open a PR in the next few days. I noticed a lot of
scaffolding around RDDs that is there to use DataFrames: multiple existing
operators already wrap temporary DataFrames. I decided to refactor those too.
The idea is that the user sets a switch that selects either RDDs or DataFrames.
In more detail: ParquetSource now carries a preferDatasetOutput flag, exposed
via `preferDatasetOutput(boolean)` and `isDatasetOutputPreferred()`, so that
higher-level APIs can request Dataset-backed execution.
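
To make the flag concrete, here is a rough sketch of how such a switch could
look. Only the names `preferDatasetOutput(boolean)` and
`isDatasetOutputPreferred()` come from the description above; the class
`ParquetSourceSketch`, the enum `OutputMode`, the fluent return type, and the
RDD default are assumptions for illustration and not the actual code in the PR.

```java
import java.util.Objects;

/** Small demo driver; placed first so the file can be run directly. */
final class DatasetPreferenceSketch {
    public static void main(String[] args) {
        ParquetSourceSketch source =
                new ParquetSourceSketch("file:///tmp/data.parquet")
                        .preferDatasetOutput(true);
        // The platform would branch on this when choosing the execution path.
        System.out.println(source.getInputUrl() + " -> " + source.resolveOutputMode());
    }
}

/** Hypothetical switch between the two execution paths. */
enum OutputMode { RDD, DATASET }

/** Simplified stand-in for a Parquet source operator (assumed, not the real class). */
final class ParquetSourceSketch {
    private final String inputUrl;
    // Assumption: RDD output stays the default so existing plans are unaffected.
    private boolean datasetOutputPreferred = false;

    ParquetSourceSketch(String inputUrl) {
        this.inputUrl = Objects.requireNonNull(inputUrl, "inputUrl");
    }

    /** Requests Dataset-backed output; returning this for chaining is an assumption. */
    ParquetSourceSketch preferDatasetOutput(boolean prefer) {
        this.datasetOutputPreferred = prefer;
        return this;
    }

    boolean isDatasetOutputPreferred() {
        return datasetOutputPreferred;
    }

    /** Maps the flag to the execution path a platform could pick. */
    OutputMode resolveOutputMode() {
        return datasetOutputPreferred ? OutputMode.DATASET : OutputMode.RDD;
    }

    String getInputUrl() {
        return inputUrl;
    }
}
```

With a per-operator flag like this, a global user-facing switch could simply
set the preference on every operator that supports it and leave the rest on
the RDD path.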
Thoughts?
—Alex
--
Alexander Alten
CTO & co-founder
Scalytics - We Connect the World’s Data