[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23018


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23018#discussion_r232960406
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -469,7 +471,21 @@ abstract class TreeNode[BaseType <: 
TreeNode[BaseType]] extends Product {
   def treeString: String = treeString(verbose = true)
 
   def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
-generateTreeString(0, Nil, new StringBuilder, verbose = verbose, 
addSuffix = addSuffix).toString
+val writer = new StringBuilderWriter()
+try {
+  treeString(writer, verbose, addSuffix, None)
+  writer.toString
+} finally {
+  writer.close()
+}
+  }
+
+  def treeString(
+  writer: Writer,
+  verbose: Boolean,
+  addSuffix: Boolean,
+  maxFields: Option[Int]): Unit = {
+generateTreeString(0, Nil, writer, verbose, "", addSuffix)
--- End diff --

I would prefer to avoid overcomplicating the PR again, frankly speaking.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23018#discussion_r232883084
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -469,7 +471,21 @@ abstract class TreeNode[BaseType <: 
TreeNode[BaseType]] extends Product {
   def treeString: String = treeString(verbose = true)
 
   def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
-generateTreeString(0, Nil, new StringBuilder, verbose = verbose, 
addSuffix = addSuffix).toString
+val writer = new StringBuilderWriter()
+try {
+  treeString(writer, verbose, addSuffix, None)
+  writer.toString
+} finally {
+  writer.close()
+}
+  }
+
+  def treeString(
+  writer: Writer,
+  verbose: Boolean,
+  addSuffix: Boolean,
+  maxFields: Option[Int]): Unit = {
+generateTreeString(0, Nil, writer, verbose, "", addSuffix)
--- End diff --

If #22879 is merged first, we should add that function here. If this one is 
merged first, that PR better have the function.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-12 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/23018#discussion_r232871356
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -469,7 +471,21 @@ abstract class TreeNode[BaseType <: 
TreeNode[BaseType]] extends Product {
   def treeString: String = treeString(verbose = true)
 
   def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
-generateTreeString(0, Nil, new StringBuilder, verbose = verbose, 
addSuffix = addSuffix).toString
+val writer = new StringBuilderWriter()
+try {
+  treeString(writer, verbose, addSuffix, None)
+  writer.toString
+} finally {
+  writer.close()
+}
+  }
+
+  def treeString(
+  writer: Writer,
+  verbose: Boolean,
+  addSuffix: Boolean,
+  maxFields: Option[Int]): Unit = {
+generateTreeString(0, Nil, writer, verbose, "", addSuffix)
--- End diff --

How about add another function only save `nodeName`? I'll use it in another 
PR: https://github.com/apache/spark/pull/22879


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-12 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/23018

[SPARK-26023][SQL] Dumping truncated plans and generated code to a file

## What changes were proposed in this pull request?

In the PR, I propose new method for debugging queries by dumping info about 
their execution to a file. It saves logical, optimized and physical plan 
similar to the `explain()` method + generated code. One of the advantages of 
the method over `explain` is it does not materializes full output as one string 
in memory which can cause OOMs.

## How was this patch tested?

Added a few tests to `QueryExecutionSuite` to check positive and negative 
scenarios.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 truncated-plan-to-file

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23018


commit 090e8c25969f8e51d71c135c80f8e43f5c5101f9
Author: Maxim Gekk 
Date:   2018-11-12T21:28:45Z

Dumping truncated plans to a file




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org