Copilot commented on code in PR #41870:
URL: https://github.com/apache/arrow/pull/41870#discussion_r3332535436


##########
cpp/src/arrow/tensor_benchmark.cc:
##########
@@ -51,6 +52,34 @@ static void BatchToTensorSimple(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * ty->byte_width() * num_rows * 
num_cols);
 }
 
+template <typename ValueType, bool row_major>
+static void TableToTensorSimple(benchmark::State& state) {
+  using CType = typename ValueType::c_type;
+  std::shared_ptr<DataType> ty = TypeTraits<ValueType>::type_singleton();
+
+  const int64_t num_cols = state.range(1);
+  const int64_t num_rows = state.range(0) / num_cols / sizeof(CType);
+  arrow::random::RandomArrayGenerator gen_{42};
+
+  std::vector<std::shared_ptr<Field>> fields = {};
+  std::vector<std::shared_ptr<ChunkedArray>> columns = {};
+
+  for (int64_t i = 0; i < num_cols; ++i) {
+    fields.push_back(field("f" + std::to_string(i), ty));
+    ArrayVector arrays = {gen_.ArrayOf(ty, num_rows / 2), gen_.ArrayOf(ty, 
num_rows / 2)};
+    auto chunks = std::make_shared<ChunkedArray>(arrays, ty);
+    columns.push_back(chunks);
+  }
+  auto schema = std::make_shared<Schema>(std::move(fields));
+  auto table = Table::Make(schema, columns);
+
+  for (auto _ : state) {
+    ASSERT_OK_AND_ASSIGN(auto tensor, 
table->ToTensor(/*row_major=*/row_major));

Review Comment:
   `Table::ToTensor` takes `(null_to_nan, row_major, pool)`, so passing a 
single boolean argument here sets `null_to_nan` (and leaves `row_major` at its 
default `true`). This makes the `row_major=false` benchmark still run 
row-major, and `row_major=true` also enables `null_to_nan` (which can promote 
integer output to float and skew results).



##########
cpp/src/arrow/tensor_benchmark.cc:
##########
@@ -51,6 +52,34 @@ static void BatchToTensorSimple(benchmark::State& state) {
   state.SetBytesProcessed(state.iterations() * ty->byte_width() * num_rows * 
num_cols);
 }
 
+template <typename ValueType, bool row_major>
+static void TableToTensorSimple(benchmark::State& state) {
+  using CType = typename ValueType::c_type;
+  std::shared_ptr<DataType> ty = TypeTraits<ValueType>::type_singleton();
+
+  const int64_t num_cols = state.range(1);
+  const int64_t num_rows = state.range(0) / num_cols / sizeof(CType);
+  arrow::random::RandomArrayGenerator gen_{42};
+
+  std::vector<std::shared_ptr<Field>> fields = {};
+  std::vector<std::shared_ptr<ChunkedArray>> columns = {};
+
+  for (int64_t i = 0; i < num_cols; ++i) {
+    fields.push_back(field("f" + std::to_string(i), ty));
+    ArrayVector arrays = {gen_.ArrayOf(ty, num_rows / 2), gen_.ArrayOf(ty, 
num_rows / 2)};

Review Comment:
   The table is constructed with two chunks of length `num_rows / 2`, which 
drops a row when `num_rows` is odd (and makes the table's actual row count 
smaller than `num_rows`, skewing the benchmark counters and possibly triggering 
validation issues). Split the remainder into the second chunk so the chunks sum 
to `num_rows`.



##########
python/pyarrow/tests/test_table.py:
##########
@@ -1269,6 +1276,70 @@ def test_recordbatch_to_tensor_unsupported():
         batch.to_tensor()
 
 
[email protected]
[email protected]('typ_str', [
+    "uint8", "uint16", "uint32", "uint64",
+    "int8", "int16", "int32", "int64",
+    "float32", "float64",
+])

Review Comment:
   `Table.to_tensor` supports `float16` (uniform) via the new C++ 
implementation, but this parametrized test omits `float16`, so that dtype path 
isn't exercised for chunked Tables (including slice/offset cases covered 
below). Adding `float16` here would improve coverage of the newly added API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to