[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-28 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
Closing this pull request in favor of #23169


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-19 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23076
  
> I assume we still want a config parameter for maximum size. Any suggestions on name: spark.debug.maxPlanStringSizeInMB?

I would propose making it more SQL-specific and putting the config in the `spark.sql.debug` namespace, especially in light of moving `truncatedString` to `sql/catalyst` (see #23039).
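
As a rough illustration only (not a final API), a byte-size config in that namespace could be declared inside `SQLConf` with the existing `bytesConf` builder; the key name, doc text, and default below are placeholders:

```scala
// Hypothetical entry in SQLConf, where buildConf and ByteUnit are in scope.
// The key name and default are illustrative, not a committed proposal.
val MAX_PLAN_STRING_SIZE = buildConf("spark.sql.debug.maxPlanStringSize")
  .doc("Maximum number of bytes a query plan string may occupy before it " +
    "is truncated. Accepts size strings such as '10m' or '1g'.")
  .bytesConf(ByteUnit.BYTE)
  .createWithDefaultString("100m")
```

This would also answer the byte-size question below: the builder accepts human-readable size strings.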


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-19 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
Limiting on pure size makes sense. I'd be willing to implement that in a different PR. Now that we are using a writer, it should be pretty easy to do; that was harder in the Spark 2.4 branch, which is why I took the approach I did.

 

A few questions about that:

*   I assume we still want a config parameter for maximum size. Any suggestions on name: spark.debug.maxPlanStringSizeInMB?
*   Is there a way to represent byte sizes in config?
*   Should the limit be on each element of the plan or on the overall plan (Logical Plan, Physical Plan, Optimized Logical Plan, etc.)? I kind of like the idea of putting it on the treeString method itself, since then it gets applied wherever that method is called rather than only in queryExecution.toString (see the sketch below).
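
To make the treeString idea concrete, here is a minimal sketch using a made-up, simplified Node class rather than Spark's actual TreeNode; because the cap lives in treeString itself, every call site gets it for free:

```scala
import java.io.{StringWriter, Writer}

// Simplified stand-in for Spark's TreeNode; all names here are illustrative.
case class Node(name: String, children: Seq[Node]) {

  // Renders this subtree, spending at most `budget` lines; returns the
  // budget left, or a negative value if at least one node was dropped.
  private def write(out: Writer, indent: Int, budget: Int): Int = {
    if (budget <= 0) return -1
    out.write(" " * indent + name + "\n")
    children.foldLeft(budget - 1) { (b, c) =>
      if (b < 0) b else c.write(out, indent + 2, b)
    }
  }

  // Capped version of treeString: applies wherever the method is called.
  def treeString(maxLines: Int = Int.MaxValue): String = {
    val sw = new StringWriter()
    if (write(sw, 0, maxLines) < 0) sw.write("... (plan truncated)\n")
    sw.toString
  }
}

// e.g. Node("Project", Seq(Node("Filter", Seq(Node("Scan", Nil)))))
//        .treeString(maxLines = 2) keeps two nodes and appends the marker.
```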

 

Dave

 

---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-18 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/23076
  
I'm seeing needs on both sides: while I think dumping the full plan to a file is a good feature for debugging a specific issue, retaining full plans to render them on the UI page has been a headache, and three related issues ([SPARK-23904](https://issues.apache.org/jira/browse/SPARK-23904), [SPARK-25380](https://issues.apache.org/jira/browse/SPARK-25380), [SPARK-26103](https://issues.apache.org/jira/browse/SPARK-26103)) have been filed in three months, which doesn't look like something we can tell end users to just work around.

One thing we should be aware of is that a huge plan comes not only from nested joins but also from lots of columns, as in SPARK-23904. For SPARK-25380 we don't know which parts generate the huge plan. So it might be easier and more flexible to just truncate to a specific size rather than applying conditions.
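
A minimal sketch of that "truncate to a specific size" approach, written as a wrapper around any `java.io.Writer`; the class name and truncation marker are invented for illustration and this is not Spark's actual implementation:

```scala
import java.io.Writer

// Illustrative wrapper: forwards writes to `underlying` until `maxChars`
// characters have been written, then silently drops the rest and appends
// a marker on close.
class SizeLimitedWriter(underlying: Writer, maxChars: Long) extends Writer {
  private var written = 0L
  private var truncated = false

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val allowed = math.min(len.toLong, maxChars - written).toInt
    if (allowed > 0) {
      underlying.write(cbuf, off, allowed)
      written += allowed
    }
    if (allowed < len) truncated = true
  }

  override def flush(): Unit = underlying.flush()

  override def close(): Unit = {
    if (truncated) underlying.write("... (truncated)")
    underlying.close()
  }
}
```

Plan rendering would then go through such a wrapper, so the output stays bounded no matter which part of the plan is huge.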


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-18 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
Anywhere we use truncatedString to limit the output columns, I think it also makes sense to limit the plan depth.
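
For context, `truncatedString` behaves roughly like this (a simplified re-implementation for illustration only; the real helper in `sql/catalyst` has a different signature):

```scala
// Render at most `maxFields` elements and summarize the rest.
def truncatedString[T](seq: Seq[T], sep: String, maxFields: Int): String = {
  if (seq.length > maxFields) {
    val dropped = seq.length - maxFields
    seq.take(maxFields).mkString(sep) + sep + s"... $dropped more fields"
  } else {
    seq.mkString(sep)
  }
}

// e.g. truncatedString(Seq("a", "b", "c", "d"), ", ", 2)
//        == "a, b, ... 2 more fields"
```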


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-18 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
I originally wrote this against 2.3.0, where toString was always called as part of newExecutionId. I'm not sure whether that still happens in master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-18 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
The goal of this is to avoid creating massive strings when toString gets called on a QueryExecution. In our use case we have some large plans that generate plan strings in the hundreds of megabytes. These plans execute efficiently because of caching and partitioning, but we are running into out-of-memory errors because of QueryExecution.toString.

In our case, writing to a file would help the memory issue, but we'd still be writing 200 MB+ files for each query. (Note these queries take seconds to run because 99% of the work is cached, and we run a bunch of them.)


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-18 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23076
  
@DaveDeCaprio What is the practical use case for limiting the plans? My initial goal in PRs #22429, #23018 and #23039 was to dump whole plans and gather more info for debugging. Your PR seems to have the opposite purpose.


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-18 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
@MaxGekk and @hvanhovell, I think this PR is complementary to the PR for SPARK-26023, which recently added support for dumping query plans to a file.

It might make sense to have the default depth apply only to the toString function and not to writing to a file. We could give toFile a parameter to control the depth, defaulting to Int.MaxValue, while toString would use the configured parameter (see the sketch below).
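
A rough sketch of that split, reusing the illustrative Node.treeString from the sketch earlier in this digest; the QueryDebug class and its method names are placeholders, not Spark's API:

```scala
import java.io.{BufferedWriter, FileWriter}

// Placeholder wrapper standing in for QueryExecution; `plan` is the
// illustrative Node defined in the earlier sketch.
class QueryDebug(plan: Node, configuredMaxDepth: Int) {

  // toString honors the configured cap, protecting UI pages and logs.
  override def toString: String = plan.treeString(maxLines = configuredMaxDepth)

  // File dumps keep the whole plan unless the caller asks for less.
  def toFile(path: String, maxLines: Int = Int.MaxValue): Unit = {
    val out = new BufferedWriter(new FileWriter(path))
    try out.write(plan.treeString(maxLines)) finally out.close()
  }
}
```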



---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-17 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
This contribution is my original work and I license the work to the project 
under the project’s open source license.


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23076
  
Can one of the admins verify this patch?


---



