[GitHub] [spark] huonw commented on a change in pull request #24414: [SPARK-22044][SQL] Add `cost` and `codegen` arguments to `explain`

GitBox Wed, 24 Apr 2019 22:24:28 -0700

huonw commented on a change in pull request #24414: [SPARK-22044][SQL] Add `cost` and `codegen` arguments to `explain`
URL: https://github.com/apache/spark/pull/24414#discussion_r278398094


 ##########
 File path: R/pkg/R/DataFrame.R
 ##########
 @@ -147,19 +155,16 @@ setMethod("schema",
 #' sparkR.session()
 #' path <- "path/to/file.json"
 #' df <- read.json(path)
-#' explain(df, TRUE)
+#' explain(df)
+#' explain(df, extended = TRUE)
+#' explain(df, codegen = TRUE)
+#' explain(df, cost = TRUE)
 #'}
 #' @note explain since 1.4.0
 setMethod("explain",
           signature(x = "SparkDataFrame"),
-          function(x, extended = FALSE) {
-            queryExec <- callJMethod(x@sdf, "queryExecution")
-            if (extended) {
-              cat(callJMethod(queryExec, "toString"))
-            } else {
-              execPlan <- callJMethod(queryExec, "executedPlan")
-              cat(callJMethod(execPlan, "toString"))
-            }
+          function(x, extended = FALSE, codegen = FALSE, cost = FALSE) {
 
 Review comment:
   > does this change the result (by default when extended = FALSE, codegen = FALSE, cost = FALSE) from before?
   
   Yes, but the new default output matches that of Scala Spark's `.explain` and SQL's `EXPLAIN ...`; the most visible difference is the `== Physical Plan ==` header line. For instance, given a file `test.json` that contains:
   
   ```json
   {"a": 1,"b":1.2}
   {"a": 2,"b":3.4}
   {"a": 3,"b":4.5}
   ```
   
   Spark 2.4:
   
   ```
   > explain(read.json("/tmp/test.json"))
   *(1) FileScan json [a#24L,b#25] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/private/tmp/test.json], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:double>
   ```
   
   This PR:
   
   ```
   > explain(read.json("/tmp/test.json"))
   == Physical Plan ==
   *(1) FileScan json [a#37L,b#38] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/private/tmp/test.json], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:double>
   ```
   
   Scala, for reference (this is on `master`, but 2.4 is similar):
   
   ```
   scala> spark.read.json("/tmp/test.json").explain
   == Physical Plan ==
   *(1) FileScan json [a#19L,b#20] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/tmp/test.json], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:double>
   ```
   
   > can you check / test explain(df, TRUE) if the same as explain(df, extended = TRUE)
    
   I added a test.
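   The two calls should agree because of R's ordinary argument matching: with the new signature `function(x, extended = FALSE, codegen = FALSE, cost = FALSE)`, an unnamed `TRUE` in second position binds to `extended`. The sketch below illustrates this with a hypothetical `explain_stub` that has the same formals but never calls Spark; it only returns the flags it received, and `df` is a placeholder rather than a real SparkDataFrame.

   ```r
   # Hypothetical stub: same formals as the new SparkR `explain` method
   # (x, extended, codegen, cost), but it never calls Spark; it only
   # returns the flag values it was given.
   explain_stub <- function(x, extended = FALSE, codegen = FALSE, cost = FALSE) {
     list(extended = extended, codegen = codegen, cost = cost)
   }

   df <- NULL  # placeholder; a real call would pass a SparkDataFrame

   # An unnamed second argument is matched by position to `extended`,
   # so these two calls receive identical flags.
   identical(explain_stub(df, TRUE), explain_stub(df, extended = TRUE))
   #> [1] TRUE
   ```

   Against a live session, the same comparison could presumably be made on the printed plans using base R's `capture.output()`; this is an untested suggestion, not the test added in the PR.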

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

