westonpace commented on code in PR #14024:
URL: https://github.com/apache/arrow/pull/14024#discussion_r963800613


##########
cpp/src/arrow/engine/substrait/util.cc:
##########
@@ -107,19 +111,36 @@ class SubstraitExecutor {
   bool plan_started_;
   compute::ExecContext exec_context_;
   std::shared_ptr<SubstraitSinkConsumer> sink_consumer_;
+  const ConversionOptions& conversion_options_;
 };
 
 }  // namespace
 
 Result<std::shared_ptr<RecordBatchReader>> ExecuteSerializedPlan(
-    const Buffer& substrait_buffer, const ExtensionIdRegistry* extid_registry,
-    compute::FunctionRegistry* func_registry) {
+    const Buffer& substrait_buffer, const PythonTableProvider& table_provider,
+    const ExtensionIdRegistry* registry, compute::FunctionRegistry* 
func_registry) {

Review Comment:
   ```suggestion
       const Buffer& substrait_buffer,
       const ExtensionIdRegistry* registry, compute::FunctionRegistry* 
func_registry, const ConversionOptions& conversion_options = {}) {
   ```
   
   I suspect it will only be a matter of time before we want to specify other 
conversion options.  Also, by defaulting to `{}` we can avoid changing existing 
tests that are unrelated to the table provider.



##########
python/pyarrow/_substrait.pyx:
##########
@@ -25,14 +26,84 @@ from pyarrow.includes.libarrow cimport *
 from pyarrow.includes.libarrow_substrait cimport *
 
 
-def run_query(plan):
+cdef shared_ptr[CTable] _process_named_table(dict named_args, const 
std_vector[c_string]& names):
+    cdef:
+        c_string c_name
+        shared_ptr[CTable] empty_table
+    py_names = []
+
+    # provider function could raise an exception
+    try:
+        for i in range(names.size()):
+            c_name = names[i]
+            py_names.append(frombytes(c_name))
+        return pyarrow_unwrap_table(named_args["provider"](py_names))
+    except Exception:
+        return empty_table

Review Comment:
   Actually, no, not quite.  I'm still not sure why we are catching the 
exception and rethrowing.  Wouldn't it be better to just not catch it at all?



##########
cpp/src/arrow/engine/substrait/serde_test.cc:
##########
@@ -1000,15 +1000,17 @@ TEST(Substrait, GetRecordBatchReader) {
   GTEST_SKIP() << "ARROW-16392: Substrait File URI not supported for Windows";
 #else
   ASSERT_OK_AND_ASSIGN(std::string substrait_json, GetSubstraitJSON());
-  test_with_registries([&substrait_json](ExtensionIdRegistry* ext_id_reg,
-                                         compute::FunctionRegistry* 
func_registry) {
-    ASSERT_OK_AND_ASSIGN(auto buf, SerializeJsonPlan(substrait_json));
-    ASSERT_OK_AND_ASSIGN(auto reader, ExecuteSerializedPlan(*buf));
-    ASSERT_OK_AND_ASSIGN(auto table, 
Table::FromRecordBatchReader(reader.get()));
-    // Note: assuming the binary.parquet file contains fixed amount of records
-    // in case of a test failure, re-evalaute the content in the file
-    EXPECT_EQ(table->num_rows(), 12);
-  });
+  PythonTableProvider table_provider;
+  test_with_registries(
+      [&substrait_json, table_provider](ExtensionIdRegistry* ext_id_reg,
+                                        compute::FunctionRegistry* 
func_registry) {
+        ASSERT_OK_AND_ASSIGN(auto buf, SerializeJsonPlan(substrait_json));
+        ASSERT_OK_AND_ASSIGN(auto reader, ExecuteSerializedPlan(*buf, 
table_provider));
+        ASSERT_OK_AND_ASSIGN(auto table, 
Table::FromRecordBatchReader(reader.get()));
+        // Note: assuming the binary.parquet file contains fixed amount of 
records
+        // in case of a test failure, re-evalaute the content in the file
+        EXPECT_EQ(table->num_rows(), 12);
+      });

Review Comment:
   ```suggestion
     test_with_registries(
         [&substrait_json](ExtensionIdRegistry* ext_id_reg,
                                     compute::FunctionRegistry* func_registry) {
           ASSERT_OK_AND_ASSIGN(auto buf, SerializeJsonPlan(substrait_json));
           ASSERT_OK_AND_ASSIGN(auto reader, ExecuteSerializedPlan(*buf, {}));
           ASSERT_OK_AND_ASSIGN(auto table, 
Table::FromRecordBatchReader(reader.get()));
           // Note: assuming the binary.parquet file contains fixed amount of 
records
           // in case of a test failure, re-evalaute the content in the file
           EXPECT_EQ(table->num_rows(), 12);
         });
   ```
   Although...I don't think this is the best change.  We should make 
`table_provider` an optional argument to `ExecuteSerializedPlan`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to