wjones127 commented on code in PR #4806:
URL: https://github.com/apache/arrow-rs/pull/4806#discussion_r1322224496
##########
arrow/src/pyarrow.rs:
##########
@@ -270,25 +270,13 @@ impl FromPyArrow for RecordBatch {
impl ToPyArrow for RecordBatch {
fn to_pyarrow(&self, py: Python) -> PyResult<PyObject> {
- let mut py_arrays = vec![];
-
- let schema = self.schema();
- let columns = self.columns().iter();
-
- for array in columns {
- py_arrays.push(array.to_data().to_pyarrow(py)?);
- }
-
- let py_schema = schema.to_pyarrow(py)?;
-
- let module = py.import("pyarrow")?;
- let class = module.getattr("RecordBatch")?;
- let args = (py_arrays,);
- let kwargs = PyDict::new(py);
- kwargs.set_item("schema", py_schema)?;
Review Comment:
Basically, PyArrow reports a schema mismatch when it is given a schema containing an
extension type but the arrays passed alongside it are plain storage arrays.
I opened an issue on PyArrow's tracker to fix this:
https://github.com/apache/arrow/issues/37669
In principle, the removed code should work as written, so we can consider this change a
workaround for a bug in PyArrow 😁
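To make the mismatch concrete, here is a simplified, hypothetical model of it. This is not PyArrow's or arrow-rs's actual validation code. `StorageType`, `LogicalType`, `Field`, `strict_match` and `storage_aware_match` are made-up names for illustration. The only detail taken from Arrow itself is that extension types are recorded in field metadata under the `ARROW:extension:name` key. The model assumes, as the comment above describes, that each exported array carries only its storage type while the schema field resolves to the extension type. A strict equality check then rejects valid data, and a storage-aware check accepts it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a column's physical (storage) data type.
#[derive(Debug, Clone, PartialEq)]
enum StorageType {
    Int64,
    FixedSizeBinary(i32),
}

/// Hypothetical logical type a consumer sees after importing a schema field.
#[derive(Debug, Clone, PartialEq)]
enum LogicalType {
    Plain(StorageType),
    Extension { name: String, storage: StorageType },
}

/// Hypothetical schema field: storage type plus key/value metadata, which is
/// where Arrow records extension types ("ARROW:extension:name").
struct Field {
    storage: StorageType,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Resolve the logical type the way an extension-aware consumer would.
    fn logical_type(&self) -> LogicalType {
        match self.metadata.get("ARROW:extension:name") {
            Some(name) => LogicalType::Extension {
                name: name.clone(),
                storage: self.storage.clone(),
            },
            None => LogicalType::Plain(self.storage.clone()),
        }
    }
}

/// Strict check: the array only knows its storage type, so any field that
/// resolves to an extension type is rejected even when the data is valid.
fn strict_match(field: &Field, array_type: &StorageType) -> bool {
    field.logical_type() == LogicalType::Plain(array_type.clone())
}

/// Storage-aware check: accept a storage array for an extension field
/// as long as the underlying storage types agree.
fn storage_aware_match(field: &Field, array_type: &StorageType) -> bool {
    match field.logical_type() {
        LogicalType::Plain(t) => &t == array_type,
        LogicalType::Extension { storage, .. } => &storage == array_type,
    }
}

fn main() {
    // Hypothetical extension field backed by 16-byte fixed-size binary.
    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "example.uuid".to_string(),
    );
    let uuid_field = Field { storage: StorageType::FixedSizeBinary(16), metadata };
    // A plain field with no extension metadata, for comparison.
    let id_field = Field { storage: StorageType::Int64, metadata: HashMap::new() };

    let exported = StorageType::FixedSizeBinary(16);
    println!("extension field, strict: {}", strict_match(&uuid_field, &exported));
    println!("extension field, storage-aware: {}", storage_aware_match(&uuid_field, &exported));
    println!("plain field, strict: {}", strict_match(&id_field, &StorageType::Int64));
}
```

Running it prints `false` for the strict check on the extension field and `true` for the other two. Plain fields pass either way, which matches why the problem only appears once an extension type is in the schema.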
--
This is an automated message from the Apache Git Service.