[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20477


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20477#discussion_r166175748
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
```diff
@@ -36,11 +38,14 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
     fullOutput: Seq[AttributeReference],
-    @transient reader: DataSourceReader)
+    @transient reader: DataSourceReader,
+    @transient sourceClass: Class[_ <: DataSourceV2])
   extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {

   override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]

+  override def simpleString: String = s"Scan $metadataString"
```
--- End diff --

I've replied on that PR. I don't think overriding `nodeName` is the right 
way to fix the UI issue, since we would need to override more methods. We can 
discuss that problem further on the other PR, but it should not block this one.


---




[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-02 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/20477#discussion_r165728696
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
```diff
@@ -36,11 +38,14 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
     fullOutput: Seq[AttributeReference],
-    @transient reader: DataSourceReader)
+    @transient reader: DataSourceReader,
+    @transient sourceClass: Class[_ <: DataSourceV2])
   extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {

   override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]

+  override def simpleString: String = s"Scan $metadataString"
```
--- End diff --

+1 for overriding nodeName.


---




[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20477#discussion_r165726915
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceReaderHolder.scala ---
```diff
@@ -65,4 +73,23 @@ trait DataSourceReaderHolder {
   lazy val output: Seq[Attribute] = reader.readSchema().map(_.name).map { name =>
     fullOutput.find(_.name == name).get
   }
+
+  def metadataString: String = {
+    val entries = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
+    if (filters.nonEmpty) entries += "PushedFilter" -> filters.mkString("[", ", ", "]")
+
+    val outputStr = Utils.truncatedString(output, "[", ", ", "]")
+
+    val entriesStr = if (entries.nonEmpty) {
+      Utils.truncatedString(entries.map {
+        case (key, value) => key + ": " + StringUtils.abbreviate(redact(value), 100)
+      }, " (", ", ", ")")
+    } else ""
```
--- End diff --

Nit: style.
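
For readers skimming the diff, the `metadataString` logic can be sketched in isolation. This is a simplified stand-in, not the real code: Spark's `Utils.truncatedString`, `StringUtils.abbreviate`, and `redact` are replaced here by hypothetical local helpers, so edge-case behavior differs.

```scala
// Standalone sketch of metadataString: output attributes plus an optional
// pushed-filter entry, with long values abbreviated.
object MetadataStringSketch {
  // Simplified stand-in for Commons Lang's StringUtils.abbreviate.
  private def abbreviate(s: String, maxLen: Int): String =
    if (s.length <= maxLen) s else s.take(maxLen - 3) + "..."

  def metadataString(output: Seq[String], filters: Seq[String]): String = {
    val entries = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
    // Pushed-down filters are the only optional metadata entry in this diff.
    if (filters.nonEmpty) {
      entries += "PushedFilter" -> filters.mkString("[", ", ", "]")
    }
    val outputStr = output.mkString("[", ", ", "]")
    val entriesStr =
      if (entries.nonEmpty) {
        entries.map { case (key, value) => key + ": " + abbreviate(value, 100) }
          .mkString(" (", ", ", ")")
      } else {
        ""
      }
    outputStr + entriesStr
  }
}
```

With `output = Seq("i#0", "j#1")` and `filters = Seq("IsNotNull(i)")` this yields `[i#0, j#1] (PushedFilter: [IsNotNull(i)])`, matching the shape of the physical-plan lines in the PR description.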


---




[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20477#discussion_r165726645
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
```diff
@@ -36,11 +38,14 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
     fullOutput: Seq[AttributeReference],
-    @transient reader: DataSourceReader)
+    @transient reader: DataSourceReader,
+    @transient sourceClass: Class[_ <: DataSourceV2])
   extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {

   override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]

+  override def simpleString: String = s"Scan $metadataString"
```
--- End diff --

For your info, 
https://github.com/apache/spark/pull/20226/files#diff-3e1258979e16f72a829abb8a1cd88bda
also updates the explain output. Overriding `nodeName` looks better for the UI.


---




[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-01 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20477

[SPARK-23303][SQL] improve the explain result for data source v2 relations

## What changes were proposed in this pull request?

The current explain result for data source v2 relations is hard to read:
```
== Parsed Logical Plan ==
'Filter ('i > 6)
+- AnalysisBarrier
   +- Project [j#1]
      +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Filter (i#0 > 6)
   +- Project [j#1, i#0]
      +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Optimized Logical Plan ==
Project [j#1]
+- Filter isnotnull(i#0)
   +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Physical Plan ==
*(1) Project [j#1]
+- *(1) Filter isnotnull(i#0)
   +- *(1) DataSourceV2Scan [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940
```

After this PR:
```
== Parsed Logical Plan ==
'Project [unresolvedalias('j, None)]
+- AnalysisBarrier
   +- Relation SimpleDataSourceV2[i#0, j#1]

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Relation SimpleDataSourceV2[i#0, j#1]

== Optimized Logical Plan ==
Project [j#1]
+- Relation SimpleDataSourceV2[i#0, j#1]

== Physical Plan ==
*(1) Project [j#1]
+- *(1) Scan SimpleDataSourceV2[i#0, j#1]
```
---
```
== Parsed Logical Plan ==
'Filter ('i > 3)
+- AnalysisBarrier
   +- Relation AdvancedDataSourceV2[i#0, j#1]

== Analyzed Logical Plan ==
i: int, j: int
Filter (i#0 > 3)
+- Relation AdvancedDataSourceV2[i#0, j#1]

== Optimized Logical Plan ==
Relation AdvancedDataSourceV2[i#0, j#1]

== Physical Plan ==
*(1) Scan AdvancedDataSourceV2[i#0, j#1] (PushedFilter: [IsNotNull(i), GreaterThan(i,3)])
```

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark explain

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20477






---
