Re: Spark Data Frames support

Zoi Kaoudi via dev Mon, 15 Dec 2025 08:09:05 -0800

I think moving the Spark platform to DataFrames makes a lot of sense.
Some newer operators, such as the one that reads Parquet files, already use
DataFrames and wrap/unwrap them to RDDs. We just need to make sure there is a
mapping from each of our operators to a Spark DataFrame operator.
Is that what you meant, Alex?
Best
--
Zoi
    On Monday, December 15, 2025 at 07:55:39 AM GMT-6, 
Alexander Alten <[email protected]> wrote:  
 
 Hey team,


I have been working on DataFrame support for a while, and I'm fairly confident 
I can open a PR in the next few days. I noticed a lot of scaffolding around 
RDDs in order to use DataFrames: multiple existing operators already wrap 
temporary DataFrames. I decided to refactor those too. The idea is that the 
user sets a switch that selects either RDDs or DataFrames. 

In more detail: ParquetSource now carries a preferDatasetOutput flag, exposed 
via `preferDatasetOutput(boolean)` and `isDatasetOutputPreferred()`, so that 
higher-level APIs can request Dataset-backed execution.
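
To make the flag idea concrete, here is a minimal, self-contained sketch. It is 
not the actual ParquetSource code: `SketchParquetSource`, `OutputKind`, 
`chooseOutput`, the fluent return type of the setter, and the RDD default are 
all assumptions for illustration. Only the two method names come from the 
description above.

```java
import java.util.Objects;

/** Illustration only: a hypothetical stand-in, not the real ParquetSource. */
public class DatasetPreferenceSketch {

    /** Hypothetical names for the two kinds of output an operator could produce. */
    enum OutputKind { RDD, DATASET }

    static final class SketchParquetSource {
        private final String inputUrl;
        // Assumed default: keep the RDD-backed path unless asked otherwise.
        private boolean preferDatasetOutput = false;

        SketchParquetSource(String inputUrl) {
            this.inputUrl = Objects.requireNonNull(inputUrl);
        }

        String getInputUrl() {
            return this.inputUrl;
        }

        // Setter named as in the mail; returning `this` is an assumption.
        SketchParquetSource preferDatasetOutput(boolean prefer) {
            this.preferDatasetOutput = prefer;
            return this;
        }

        boolean isDatasetOutputPreferred() {
            return this.preferDatasetOutput;
        }
    }

    /** Hypothetical helper: how a higher-level API might act on the flag. */
    static OutputKind chooseOutput(SketchParquetSource source) {
        return source.isDatasetOutputPreferred() ? OutputKind.DATASET : OutputKind.RDD;
    }

    public static void main(String[] args) {
        SketchParquetSource source =
                new SketchParquetSource("file:///tmp/example.parquet")
                        .preferDatasetOutput(true);
        System.out.println(chooseOutput(source)); // prints DATASET
    }
}
```

Keeping the preference on the operator, rather than in one global setting, 
would let a plan mix RDD-backed and Dataset-backed operators; whether the 
user-facing switch works that way is not stated here.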

Thoughts?

—Alex 

--
Alexander Alten
CTO & co-founder
Scalytics - We Connect the World’s Data
