Re: [PR] Store example data directly inside the datafusion-examples (#19141) [datafusion]

via GitHub Sun, 28 Dec 2025 13:02:41 -0800


alamb commented on code in PR #19319:
URL: https://github.com/apache/datafusion/pull/19319#discussion_r2649929217



##########
datafusion-examples/examples/data_io/parquet_exec_visitor.rs:
##########
@@ -29,23 +31,47 @@ use datafusion::physical_plan::metrics::MetricValue;
 use datafusion::physical_plan::{
     ExecutionPlan, ExecutionPlanVisitor, execute_stream, visit_execution_plan,
 };
+use datafusion::prelude::CsvReadOptions;
 use futures::StreamExt;
+use tempfile::TempDir;
+use tokio::fs::create_dir_all;
 
 /// Example of collecting metrics after execution by visiting the `ExecutionPlan`
 pub async fn parquet_exec_visitor() -> datafusion::common::Result<()> {
     let ctx = SessionContext::new();
 
-    let test_data = datafusion::test_util::parquet_test_data();
+    // Load CSV into an in-memory DataFrame, then materialize it to Parquet.

Review Comment:
   I think this repeated code fragment that writes out a Parquet file gets in
   the way of the example. It is about 20 lines of setup unrelated to what the
   example is trying to show, and I fear it will confuse first-time users
   (imagine this being someone's first exposure to DataFusion).
   
   Could we move this into a function (something like
   `fn write_csv_to_parquet`, for example)? I think it is OK to have the code
   replicated (so the examples stay self-contained), but not inline like this.
   
   I am sorry I have been away for a few days and haven't been able to give
   you more timely feedback.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

