drin commented on a change in pull request #9810:
URL: https://github.com/apache/arrow/pull/9810#discussion_r604466376



##########
File path: cpp/examples/arrow/dataset-documentation-example.cc
##########
@@ -217,24 +229,29 @@ std::shared_ptr<arrow::Table> SelectAndProjectDataset(
   auto scan_builder = dataset->NewScan().ValueOrDie();
   std::vector<std::string> names;
   std::vector<ds::Expression> exprs;
+  // Read all the original columns.
   for (const auto& field : dataset->schema()->fields()) {
     names.push_back(field->name());
     exprs.push_back(ds::field_ref(field->name()));
   }
+  // Also derive a new column.
   names.push_back("b_large");
   exprs.push_back(ds::greater(ds::field_ref("b"), ds::literal(1)));
   ABORT_ON_FAILURE(scan_builder->Project(exprs, names));

Review comment:
   From what I've seen, we've tried to do selection and projection directly 
on the ScannerBuilder:
   
   ```cpp
       std::vector<std::string>   projection_attrs;
       arrow::dataset::Expression selection_expr;
   
       ...
   
       ARROW_RETURN_NOT_OK(scanbuilder->Project(projection_attrs));
       ARROW_RETURN_NOT_OK(scanbuilder->Filter(selection_expr));
   
       ...
   ```
   
   I am reading the code first, so the documentation included in this PR may 
already cover this. Still, it would be nice to add a comment here that points 
to that documentation and explains why one approach is preferred over the 
other, or states that the two are equivalent mechanisms.
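
To make the comparison concrete, here is a minimal sketch of the two styles using toy stand-ins built only on the standard library. Nothing in it is Arrow's API: `Row`, `Table`, `Expr`, `FieldRef`, `GreaterThan`, `ProjectNames`, `ProjectExprs`, and `FilterRows` are hypothetical names invented for illustration, and the sketch makes no claim about how the real `ScannerBuilder` behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins, NOT Arrow types: a row maps a column name to an integer.
using Row = std::map<std::string, long>;
using Table = std::vector<Row>;
// A toy "expression" computes one output value from an input row.
using Expr = std::function<long(const Row&)>;

// Toy analogue of a field reference: reads an existing column unchanged.
Expr FieldRef(const std::string& name) {
  return [name](const Row& row) { return row.at(name); };
}

// Toy analogue of a comparison expression: 1 if column > threshold, else 0.
Expr GreaterThan(const std::string& name, long threshold) {
  return [name, threshold](const Row& row) {
    return row.at(name) > threshold ? 1L : 0L;
  };
}

// Projection by name: can only keep a subset of the existing columns.
Table ProjectNames(const Table& in, const std::vector<std::string>& names) {
  Table out;
  for (const Row& row : in) {
    Row projected;
    for (const auto& name : names) projected[name] = row.at(name);
    out.push_back(projected);
  }
  return out;
}

// Projection by expression: every output column is computed, so a column
// that does not exist in the input (such as "b_large") can be derived.
Table ProjectExprs(const Table& in, const std::vector<Expr>& exprs,
                   const std::vector<std::string>& names) {
  assert(exprs.size() == names.size());
  Table out;
  for (const Row& row : in) {
    Row projected;
    for (std::size_t i = 0; i < exprs.size(); ++i) {
      projected[names[i]] = exprs[i](row);
    }
    out.push_back(projected);
  }
  return out;
}

// Filter: drops rows where the predicate is 0; the columns are unchanged.
Table FilterRows(const Table& in, const Expr& predicate) {
  Table out;
  for (const Row& row : in) {
    if (predicate(row) != 0) out.push_back(row);
  }
  return out;
}
```

In this toy model, `ProjectNames` gives the same result as `ProjectExprs` when every expression is a plain `FieldRef`. The expression form is only needed once a derived column is wanted, and `FilterRows` removes rows rather than adding a column. Whether the real API has the same relationship is exactly what the requested comment should confirm.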




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

