[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct
jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073391100

## r/src/expression.cpp: ##
@@ -46,13 +46,26 @@ std::shared_ptr compute___expr__call(std::string func_name,
       compute::call(std::move(func_name), std::move(arguments), std::move(options_ptr)));
 }

+// [[arrow::export]]
+bool compute___expr__is_field_ref(const std::shared_ptr& x) {
+  return x->field_ref() != nullptr;
+}
+
 // [[arrow::export]]
 std::vector field_names_in_expression(
     const std::shared_ptr& x) {
   std::vector out;
+  std::vector nested;
+
   auto field_refs = FieldsInExpression(*x);
   for (auto f : field_refs) {
-    out.push_back(*f.name());
+    if (f.IsNested()) {
+      // We keep the top-level field name.

Review Comment:
You can also specify field refs (well, generic expressions), but then you also need to pass the resulting name for the schema. See the second Project signature at https://github.com/apache/arrow/blob/4e439f6a597180c5fc8ff1552c860cecd33736c5/cpp/src/arrow/dataset/scanner.h#L463-L484, which gets translated to ScanOptions.projection. It seems that is also what the R bindings actually do inside `ExecNode_Scan` (it converts the materialized_field_names back to FieldRefs).

Now, the scanner itself also just uses the top-level name of a nested field ref to prune what it needs to read, so right now preserving the nested field ref is not useful. But ideally, in the future, we would optimize that for formats that support it (like Parquet, cf. https://github.com/apache/arrow/issues/33167).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
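To illustrate the pruning behaviour described in the review comment above, here is a minimal, stdlib-only sketch. It does not use the Arrow API: `FieldPathSketch` and `TopLevelNames` are hypothetical names invented for this example, and a nested reference is modelled simply as the list of names from the top-level column down to the child. The sketch only shows why several nested references into the same struct column collapse to a single top-level column to read.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a (possibly nested) field reference: the names
// from the top-level column down to the referenced child.
// This is NOT arrow::FieldRef; it only illustrates the idea.
using FieldPathSketch = std::vector<std::string>;

// Collect the distinct top-level column names, in first-seen order.
// A reader that can only prune whole columns needs nothing more than this.
std::vector<std::string> TopLevelNames(const std::vector<FieldPathSketch>& refs) {
  std::vector<std::string> out;
  std::set<std::string> seen;
  for (const auto& path : refs) {
    if (path.empty()) continue;  // an empty reference names no column
    const std::string& top = path.front();
    if (seen.insert(top).second) out.push_back(top);  // keep first occurrence only
  }
  return out;
}
```

Under these assumptions, references to `a.b`, `a.c` and `d` yield the columns `a` and `d`. The nested part of each path is dropped, which is the information a format-aware reader would need to keep in order to read only the requested struct children.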
jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073377526

## r/R/expression.R: ##
@@ -89,6 +92,56 @@ Expression$create <- function(function_name,
   expr
 }

+
+#' @export
+`[[.Expression` <- function(x, i, ...) {
+  # TODO: integer (positional) field refs are supported in C++
+  assert_that(is.string(i))
+  get_nested_field(x, i)
+}
+
+#' @export
+`$.Expression` <- function(x, name, ...) {
+  assert_that(is.string(name))
+  if (name %in% ls(x)) {
+    get(name, x)
+  } else {
+    get_nested_field(x, name)
+  }
+}
+
+get_nested_field <- function(expr, name) {
+  if (expr$is_field_ref()) {
+    # Make a nested field ref
+    out <- compute___expr__nested_field_ref(expr, name)
+  } else {
+    # Use the struct_field kernel, but that only works if:
+    # * expr has a knowable type (has a schema set)
+    # * that type is struct
+    # * `name` exists in the struct (bc we have to map to an integer position)

Review Comment:
Opened https://github.com/apache/arrow/issues/33745
jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073352175

## r/R/expression.R: ##
@@ -89,6 +92,56 @@ Expression$create <- function(function_name,
   expr
 }

+
+#' @export
+`[[.Expression` <- function(x, i, ...) {
+  # TODO: integer (positional) field refs are supported in C++
+  assert_that(is.string(i))
+  get_nested_field(x, i)
+}
+
+#' @export
+`$.Expression` <- function(x, name, ...) {
+  assert_that(is.string(name))
+  if (name %in% ls(x)) {
+    get(name, x)
+  } else {
+    get_nested_field(x, name)
+  }
+}
+
+get_nested_field <- function(expr, name) {
+  if (expr$is_field_ref()) {
+    # Make a nested field ref
+    out <- compute___expr__nested_field_ref(expr, name)
+  } else {
+    # Use the struct_field kernel, but that only works if:
+    # * expr has a knowable type (has a schema set)
+    # * that type is struct
+    # * `name` exists in the struct (bc we have to map to an integer position)

Review Comment:
Ah, yes, that documentation is indeed outdated (I wasn't aware it was described in such detail there, so we didn't update it when updating the kernel in https://github.com/apache/arrow/pull/14495).
jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1068333553

## r/R/expression.R: ##
@@ -89,6 +92,56 @@ Expression$create <- function(function_name,
   expr
 }

+
+#' @export
+`[[.Expression` <- function(x, i, ...) {
+  # TODO: integer (positional) field refs are supported in C++
+  assert_that(is.string(i))
+  get_nested_field(x, i)
+}
+
+#' @export
+`$.Expression` <- function(x, name, ...) {
+  assert_that(is.string(name))
+  if (name %in% ls(x)) {
+    get(name, x)
+  } else {
+    get_nested_field(x, name)
+  }
+}
+
+get_nested_field <- function(expr, name) {
+  if (expr$is_field_ref()) {
+    # Make a nested field ref
+    out <- compute___expr__nested_field_ref(expr, name)
+  } else {
+    # Use the struct_field kernel, but that only works if:
+    # * expr has a knowable type (has a schema set)
+    # * that type is struct
+    # * `name` exists in the struct (bc we have to map to an integer position)

Review Comment:
FYI, nowadays it shouldn't be necessary to map it to an integer position: the "struct_field" kernel now also accepts a field ref given as a string name.

## r/src/expression.cpp: ##
@@ -46,13 +46,26 @@ std::shared_ptr compute___expr__call(std::string func_name,
       compute::call(std::move(func_name), std::move(arguments), std::move(options_ptr)));
 }

+// [[arrow::export]]
+bool compute___expr__is_field_ref(const std::shared_ptr& x) {
+  return x->field_ref() != nullptr;
+}
+
 // [[arrow::export]]
 std::vector field_names_in_expression(
     const std::shared_ptr& x) {
   std::vector out;
+  std::vector nested;
+
   auto field_refs = FieldsInExpression(*x);
   for (auto f : field_refs) {
-    out.push_back(*f.name());
+    if (f.IsNested()) {
+      // We keep the top-level field name.

Review Comment:
This might not be used in practice (in a `mutate` call where you select the field, you also directly specify the resulting column name), but otherwise it might also make sense to keep the innermost field name?
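The naming question raised in the last comment (top-level versus innermost field name) can be made concrete with a small stdlib-only sketch. As before, nothing here is Arrow API: `FieldPathSketch`, `NameChoice` and `ResultName` are hypothetical names used purely for illustration, and a nested reference is modelled as a list of names.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a nested field reference as a path of names,
// e.g. {"a", "x"} for child "x" of struct column "a". Not arrow::FieldRef.
using FieldPathSketch = std::vector<std::string>;

// Which component of the path to use as the derived name.
enum class NameChoice { kTopLevel, kInnermost };

// Derive a single name from a nested path. An empty path yields an empty
// name; a caller would have to treat that as an error in practice.
std::string ResultName(const FieldPathSketch& path, NameChoice choice) {
  if (path.empty()) return std::string();
  return choice == NameChoice::kTopLevel ? path.front() : path.back();
}
```

With this model, `a.x` maps to `a` under the top-level choice and to `x` under the innermost choice. One trade-off worth noting: `a.x` and `b.x` both map to `x` under the innermost choice, so a caller relying on it would need some way to disambiguate, whereas the top-level choice keeps names tied to columns that actually exist in the source schema.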