dragosmg commented on code in PR #13541:
URL: https://github.com/apache/arrow/pull/13541#discussion_r919899240
##########
r/R/dplyr.R:
##########
@@ -219,6 +219,31 @@ tail.arrow_dplyr_query <- function(x, n = 6L, ...) {
x
}
+#' Show the details of an Arrow Execution Plan
+#'
+#' This is a function which gives more details about the Execution Plan
(`ExecPlan`)
+#' of an `arrow_dplyr_query` object. It is similar to `dplyr::explain()`.
+#'
+#' @param x an `arrow_dplyr_query` to print the `ExecPlan` for.
+#'
+#' @return The argument, invisibly.
+#' @export
+#'
+#' @examplesIf arrow_with_dataset() & requireNamespace("dplyr", quietly = TRUE)
+#' library(dplyr)
+#' mtcars %>%
+#' arrow_table() %>%
+#' filter(mpg > 20) %>%
+#' mutate(x = gear/carb) %>%
+#' show_exec_plan()
+show_exec_plan <- function(x) {
Review Comment:
Here's a link to the [design
doc](https://docs.google.com/document/d/1Ep8aV4jDsNCCy9uv1bjWY_JF17nzHQogv0EnGJvraQI/edit),
also linked in the PR description and the Jira ticket.
In short, in my opinion, a _first step_ would be to have `show_exec_plan()`
as a function that outputs exactly what we get from C++. That's what I
interpreted the scope of the
[ticket](https://issues.apache.org/jira/browse/ARROW-15016) to be.
My proposal rests on this point-of-view: I think we need, for accessibility
reasons, 2 separate ways of surfacing details regarding the ExecPlan, for 2
separate audiences (one readable by the seasoned Arrow developer and one
readable by the regular R user).
1. `show_exec_plan()` would cater to the first audience and, thus, focus on
minimising cognitive load by keeping things the same across languages.
2. `explain()` should probably not aim to be both and would be targeted
towards the second audience and do it well. By _well_ I understand here as in a
way that makes sense to the regular R user, who expects arrow to just work. In
this context, `explain()` would be a tool allowing them to inspect what is
going on in a dplyr-like pipeline, but do so in a language they are familiar
with.
Having an `explain()` method would absolutely be super useful, but I do not
think that is fully scoped out yet. Probably, before we start the
implementation of `explain()`, we should flesh out what the output of
`explain()` will look like. I strongly believe we should start from the
description of the generic which says: "gives more details about an object
(...) and is more focused on _human readable_ output" (my underlining).
In this PR there are already some really good suggestions of stuff we could
include in the output of a function like `explain()` (e.g. to cover
`query_can_stream()`).
If we add to the above the time constraint of the impending release, I do
not think extending this PR to cover `explain()` would be the best course of
action.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]