[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox


jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073391100


##
r/src/expression.cpp:
##
@@ -46,13 +46,26 @@ std::shared_ptr<compute::Expression> compute___expr__call(std::string func_name,
       compute::call(std::move(func_name), std::move(arguments), std::move(options_ptr)));
 }
 
+// [[arrow::export]]
+bool compute___expr__is_field_ref(const std::shared_ptr<compute::Expression>& x) {
+  return x->field_ref() != nullptr;
+}
+
 // [[arrow::export]]
 std::vector<std::string> field_names_in_expression(
     const std::shared_ptr<compute::Expression>& x) {
   std::vector<std::string> out;
+  std::vector nested;
+
   auto field_refs = FieldsInExpression(*x);
   for (auto f : field_refs) {
-    out.push_back(*f.name());
+    if (f.IsNested()) {
+      // We keep the top-level field name.

Review Comment:
   You can also specify field refs (well, generic expressions), but then you
   also need to pass the resulting name for the schema. See the second `Project`
   signature at

   https://github.com/apache/arrow/blob/4e439f6a597180c5fc8ff1552c860cecd33736c5/cpp/src/arrow/dataset/scanner.h#L463-L484

   which gets translated to `ScanOptions.projection`. It seems that is also what
   the R bindings actually do inside `ExecNode_Scan` (it converts the
   `materialized_field_names` back to FieldRefs). The scanner itself currently
   uses only the top-level name of a nested field ref to prune what it needs to
   read, so right now preserving the nested field ref is not useful. Ideally, in
   the future we would optimize that for formats that support it (like Parquet,
   cf. https://github.com/apache/arrow/issues/33167).
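
To illustrate the pruning behaviour described above, here is a minimal, self-contained sketch. It does not use Arrow: `FieldPath` and `TopLevelColumns` are hypothetical names made up for this example, and a nested field ref is modelled simply as a list of names. It only shows why `address.city` and `address.zip` both collapse to reading the whole `address` column when only the top-level name is considered.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a nested field reference: a path of names,
// e.g. {"address", "city"} for address.city. This is NOT arrow::FieldRef.
using FieldPath = std::vector<std::string>;

// Collect the top-level columns a scan would have to read when everything
// below the first path component is ignored.
std::set<std::string> TopLevelColumns(const std::vector<FieldPath>& refs) {
  std::set<std::string> columns;
  for (const FieldPath& path : refs) {
    assert(!path.empty() && "a field path needs at least one component");
    columns.insert(path.front());
  }
  return columns;
}
```

With the paths `address.city`, `address.zip` and `id`, this yields the two columns `address` and `id`. A format-aware scanner could in principle keep the full paths and read only the requested struct children instead.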
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox


jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073377526


##
r/R/expression.R:
##
@@ -89,6 +92,56 @@ Expression$create <- function(function_name,
   expr
 }
 
+
+#' @export
+`[[.Expression` <- function(x, i, ...) {
+  # TODO: integer (positional) field refs are supported in C++
+  assert_that(is.string(i))
+  get_nested_field(x, i)
+}
+
+#' @export
+`$.Expression` <- function(x, name, ...) {
+  assert_that(is.string(name))
+  if (name %in% ls(x)) {
+    get(name, x)
+  } else {
+    get_nested_field(x, name)
+  }
+}
+
+get_nested_field <- function(expr, name) {
+  if (expr$is_field_ref()) {
+    # Make a nested field ref
+    out <- compute___expr__nested_field_ref(expr, name)
+  } else {
+    # Use the struct_field kernel, but that only works if:
+    # * expr has a knowable type (has a schema set)
+    # * that type is struct
+    # * `name` exists in the struct (bc we have to map to an integer position)

Review Comment:
   Opened https://github.com/apache/arrow/issues/33745






[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox


jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073352175


##
r/R/expression.R:
##
@@ -89,6 +92,56 @@ Expression$create <- function(function_name,
   expr
 }
 
+
+#' @export
+`[[.Expression` <- function(x, i, ...) {
+  # TODO: integer (positional) field refs are supported in C++
+  assert_that(is.string(i))
+  get_nested_field(x, i)
+}
+
+#' @export
+`$.Expression` <- function(x, name, ...) {
+  assert_that(is.string(name))
+  if (name %in% ls(x)) {
+    get(name, x)
+  } else {
+    get_nested_field(x, name)
+  }
+}
+
+get_nested_field <- function(expr, name) {
+  if (expr$is_field_ref()) {
+    # Make a nested field ref
+    out <- compute___expr__nested_field_ref(expr, name)
+  } else {
+    # Use the struct_field kernel, but that only works if:
+    # * expr has a knowable type (has a schema set)
+    # * that type is struct
+    # * `name` exists in the struct (bc we have to map to an integer position)

Review Comment:
   Ah, yes, that documentation is indeed outdated. I wasn't aware it was
   described in such detail there, so we didn't update it when updating the
   kernel in https://github.com/apache/arrow/pull/14495.






[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-17 Thread GitBox


jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1068333553


##
r/R/expression.R:
##
@@ -89,6 +92,56 @@ Expression$create <- function(function_name,
   expr
 }
 
+
+#' @export
+`[[.Expression` <- function(x, i, ...) {
+  # TODO: integer (positional) field refs are supported in C++
+  assert_that(is.string(i))
+  get_nested_field(x, i)
+}
+
+#' @export
+`$.Expression` <- function(x, name, ...) {
+  assert_that(is.string(name))
+  if (name %in% ls(x)) {
+    get(name, x)
+  } else {
+    get_nested_field(x, name)
+  }
+}
+
+get_nested_field <- function(expr, name) {
+  if (expr$is_field_ref()) {
+    # Make a nested field ref
+    out <- compute___expr__nested_field_ref(expr, name)
+  } else {
+    # Use the struct_field kernel, but that only works if:
+    # * expr has a knowable type (has a schema set)
+    # * that type is struct
+    # * `name` exists in the struct (bc we have to map to an integer position)

Review Comment:
   FYI, nowadays it shouldn't be necessary to map it to an integer position:
   the "struct_field" kernel now also accepts a string name field ref.



##
r/src/expression.cpp:
##
@@ -46,13 +46,26 @@ std::shared_ptr<compute::Expression> compute___expr__call(std::string func_name,
       compute::call(std::move(func_name), std::move(arguments), std::move(options_ptr)));
 }
 
+// [[arrow::export]]
+bool compute___expr__is_field_ref(const std::shared_ptr<compute::Expression>& x) {
+  return x->field_ref() != nullptr;
+}
+
 // [[arrow::export]]
 std::vector<std::string> field_names_in_expression(
     const std::shared_ptr<compute::Expression>& x) {
   std::vector<std::string> out;
+  std::vector nested;
+
   auto field_refs = FieldsInExpression(*x);
   for (auto f : field_refs) {
-    out.push_back(*f.name());
+    if (f.IsNested()) {
+      // We keep the top-level field name.

Review Comment:
   This might not be used in practice (in a `mutate` call where you select the 
field, you also directly specify the resulting column name), but otherwise it 
might also make sense to keep the innermost field name?
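
To make the two naming options concrete, here is a small standalone sketch. It is not the PR's code and does not use Arrow: `NameChoice` and `OutputName` are hypothetical names for illustration, and the nested ref is again modelled as a plain list of names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical switch between the two naming options discussed here.
enum class NameChoice { kTopLevel, kInnermost };

// Pick the name to report for a nested reference given as a path of names,
// e.g. {"address", "city"} for address.city.
std::string OutputName(const std::vector<std::string>& path, NameChoice choice) {
  assert(!path.empty() && "a field path needs at least one component");
  return choice == NameChoice::kTopLevel ? path.front() : path.back();
}
```

For `address.city`, the top-level choice reports `address` (what the scanner needs for pruning), while the innermost choice reports `city` (arguably the more natural default column name). For a non-nested ref, both choices give the same name.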
   


