jorisvandenbossche commented on code in PR #19706: URL: https://github.com/apache/arrow/pull/19706#discussion_r1073391100
##########
r/src/expression.cpp:
##########
@@ -46,13 +46,26 @@ std::shared_ptr<compute::Expression> compute___expr__call(std::string func_name,
       compute::call(std::move(func_name), std::move(arguments), std::move(options_ptr)));
 }
 
+// [[arrow::export]]
+bool compute___expr__is_field_ref(const std::shared_ptr<compute::Expression>& x) {
+  return x->field_ref() != nullptr;
+}
+
 // [[arrow::export]]
 std::vector<std::string> field_names_in_expression(
     const std::shared_ptr<compute::Expression>& x) {
   std::vector<std::string> out;
+  std::vector<arrow::FieldRef> nested;
+
   auto field_refs = FieldsInExpression(*x);
   for (auto f : field_refs) {
-    out.push_back(*f.name());
+    if (f.IsNested()) {
+      // We keep the top-level field name.

Review Comment:
   You can also specify field refs (or, more generally, arbitrary expressions) instead of names, but then you also need to pass the resulting names for the schema. See the second Project signature at https://github.com/apache/arrow/blob/4e439f6a597180c5fc8ff1552c860cecd33736c5/cpp/src/arrow/dataset/scanner.h#L463-L484, which gets translated to ScanOptions.projection. It seems that this is also what the R bindings actually do inside `ExecNode_Scan` (it converts the materialized_field_names back to FieldRefs).

   That said, the scanner itself currently uses only the top-level name of a nested field ref when pruning what it needs to read, so right now preserving the nested field ref is not useful. Ideally, in the future we would optimize this for formats that support it (like Parquet, cf. https://github.com/apache/arrow/issues/33167).
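
   To illustrate the "keep only the top-level name" behavior, here is a minimal self-contained sketch. It does not use the Arrow API: `ToyFieldRef`, `TopLevelName` and `TopLevelNames` are hypothetical stand-ins, and de-duplicating the names is an assumption made for the example, not something the PR is known to do.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::FieldRef (NOT the Arrow API):
// a path of field names, e.g. {"a", "b"} for the nested reference a.b.
struct ToyFieldRef {
  std::vector<std::string> path;
  bool IsNested() const { return path.size() > 1; }
};

// Return the top-level column name of a (possibly nested) reference.
const std::string& TopLevelName(const ToyFieldRef& ref) {
  assert(!ref.path.empty());
  return ref.path.front();
}

// Collect the distinct top-level names in first-seen order.
// De-duplication is an assumption of this sketch.
std::vector<std::string> TopLevelNames(const std::vector<ToyFieldRef>& refs) {
  std::vector<std::string> out;
  std::set<std::string> seen;
  for (const auto& ref : refs) {
    const std::string& name = TopLevelName(ref);
    if (seen.insert(name).second) out.push_back(name);
  }
  return out;
}
```

   With references to `a.b`, `c` and `a.d`, this yields the names `a` and `c`: a scanner that prunes by top-level name would still read all of `a`, which is why keeping the nested ref gains nothing until a format can read sub-fields selectively.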