GitHub user Yicong-Huang created a discussion: Use Polars to avoid data 
conversion from arrow to pandas

Currently, our Tuple and Table in Python are implemented as a pandas 
Series and DataFrame, respectively. We need to convert the Arrow data to 
pandas to process UDFs, then convert the results back to Arrow to send them 
back to the JVM. 

[Polars](https://github.com/pola-rs/polars) is a DataFrame library that uses 
Arrow as its (columnar) data format. I think we could switch to Polars 
DataFrames (possibly as a drop-in replacement for pandas DataFrames), for two 
benefits:
1. To avoid the two extra data conversions. 
2. To reduce processing time, as Polars is written in Rust and benchmarks show 
that it is faster than pandas.


GitHub link: https://github.com/apache/texera/discussions/4033
