Re: [PR] GH-40062: [C++][Python] Conversion of Table to Arrow Tensor [arrow]

via GitHub Mon, 01 Jun 2026 05:50:43 -0700


Copilot commented on code in PR #41870:
URL: https://github.com/apache/arrow/pull/41870#discussion_r3334236002



##########
python/pyarrow/table.pxi:
##########
@@ -5084,6 +5080,81 @@ cdef class Table(_Tabular):
 
         return result
 
+    def to_tensor(self, c_bool null_to_nan=False, c_bool row_major=True, 
MemoryPool memory_pool=None):
+        """
+        Convert to a :class:`~pyarrow.Tensor`.
+
+        Tables that can be converted have fields of type signed or unsigned 
integer or float,
+        including all bit-widths.
+
+        ``null_to_nan`` is ``False`` by default and this method will raise an 
error in case
+        any nulls are present. Tables with nulls can be converted with 
``null_to_nan`` set to
+        ``True``. In this case null values are converted to ``NaN`` and 
integer type arrays are
+        promoted to the appropriate float type.
+
+        Parameters
+        ----------
+        null_to_nan : bool, default False
+            Whether to write null values in the result as ``NaN``.
+        row_major : bool, default True
+            Whether resulting Tensor is row-major or column-major
+        memory_pool : MemoryPool, default None
+            For memory allocations, if required, otherwise use default pool
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> table = pa.table(
+        ...    [
+        ...       pa.chunked_array([[1, 2], [3, 4, None]], type=pa.int32()),
+        ...       pa.chunked_array([[10, 20, 30], [40, None]], 
type=pa.float32()),
+        ...    ], names = ["a", "b"]
+        ... )
+
+        >>> table
+        pyarrow.Table
+        a: int32
+        b: float
+        ----
+        a: [[1,2],[3,4,null]]
+        b: [[10,20,30],[40,null]]
+
+        Convert a Table to row-major Tensor with null values written as 
``NaN``:
+
+        >>> table.to_tensor(null_to_nan=True)
+        <pyarrow.Tensor>
+        type: double
+        shape: (5, 2)
+        strides: (16, 8)
+        >>> table.to_tensor(null_to_nan=True).to_numpy()
+        array([[ 1., 10.],
+               [ 2., 20.],
+               [ 3., 30.],
+               [ 4., 40.],
+               [nan, nan]])
+
+        Convert a Table to column-major Tensor
+
+        >>> table.to_tensor(null_to_nan=True, row_major=False)
+        <pyarrow.Tensor>
+        type: double
+        shape: (5, 2)
+        strides: (8, 40)
+        >>> table.to_tensor(null_to_nan=True, row_major=False).to_numpy()
+        array([[ 1., 10.],
+               [ 2., 20.],
+               [ 3., 30.],
+               [ 4., 40.],
+               [nan, nan]])
+        """
+        cdef:
+            shared_ptr[CTensor] c_tensor
+            CMemoryPool* pool = maybe_unbox_memory_pool(memory_pool)
+

Review Comment:
   `Table.to_tensor()` should call `self._assert_cpu()` before invoking the C++ 
conversion, consistent with other Table methods (e.g. `flatten`, 
`combine_chunks`) and with `RecordBatch.to_tensor()`. Without this, calling 
`to_tensor()` on non-CPU-backed data (device allocations) can lead to undefined 
behavior rather than a clear error.



##########
cpp/src/arrow/table.cc:
##########
@@ -36,10 +36,12 @@
 #include "arrow/record_batch.h"
 #include "arrow/result.h"
 #include "arrow/status.h"
+#include "arrow/tensor.h"
 #include "arrow/type.h"
 #include "arrow/type_fwd.h"
 #include "arrow/type_traits.h"
 #include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
 #include "arrow/util/logging_internal.h"
 #include "arrow/util/vector.h"

Review Comment:
   `#include "arrow/util/logging.h"` was added here but isn't used anywhere in 
this file (and `arrow/util/logging_internal.h` is already included). Dropping 
the unused include reduces compile work and avoids potential 
include-order/style issues.



##########
cpp/src/arrow/tensor.cc:
##########
@@ -28,8 +28,8 @@
 #include <type_traits>
 #include <vector>
 
-#include "arrow/record_batch.h"
 #include "arrow/status.h"
+#include "arrow/table.h"
 #include "arrow/type.h"

Review Comment:
   `tensor.cc` instantiates `ToTensorImpl` for `RecordBatch` and calls 
`RecordBatch` members (e.g. `num_columns()`, `column_data()`, `schema()`), but 
the translation unit no longer includes `arrow/record_batch.h` (it was removed 
in this PR). With only a forward declaration available, this will fail to 
compile when `RecordBatchToTensor` instantiates the template. Add the missing 
include.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-40062: [C++][Python] Conversion of Table to Arrow Tensor [arrow]

Reply via email to