This one seems to be relevant, but it's already fixed in 2.1.0.

One way to debug this is to turn on TRACE-level logging and check how the analyzer and optimizer behave.


On 2/22/17 11:11 PM, StanZhai wrote:
Could this be related to https://issues.apache.org/jira/browse/SPARK-17733 ?


------------------ Original ------------------
*From:* "Cheng Lian-3 [via Apache Spark Developers List]" <[hidden email]>;
*Send time:* Thursday, Feb 23, 2017 9:43 AM
*To:* "Stan Zhai" <[hidden email]>;
*Subject:* Re: The driver hangs at DataFrame.rdd in Spark 2.1.0

Just from the thread dump you provided, it seems that this particular query plan jams our optimizer. However, it's also possible that the driver just happened to be running optimizer rules at the moment the dump was taken.

Since query planning doesn't touch any actual data, could you please try to minimize this query by replacing the actual relations with temporary views derived from Scala local collections? That would make it much easier for others to reproduce the issue.

Cheng


On 2/22/17 5:16 PM, Stan Zhai wrote:
Thanks for Lian's reply.

Here is the query plan generated by Spark 1.6.2 (I can't get it in Spark 2.1.0):
...

------------------ Original ------------------
*Subject:* Re: The driver hangs at DataFrame.rdd in Spark 2.1.0

What is the query plan? We once observed query plans that grow exponentially in iterative ML workloads, causing the query planner to hang forever. For example, if each iteration combines four plan trees from the previous iteration into a larger plan tree, the plan tree can easily reach billions of nodes after 15 iterations.
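As a rough back-of-the-envelope check of that growth, here is a minimal Python sketch. It assumes, purely for illustration, that the first plan has a single node and that each iteration adds one new node whose children are four copies of the previous plan tree (`plan_size` is a hypothetical helper, not a Spark API). A larger starting plan would only make the counts bigger.

```python
def plan_size(iterations: int, initial_nodes: int = 1, fan_in: int = 4) -> int:
    """Return the node count of a hypothetical plan tree after `iterations` rounds.

    Assumption for illustration: each round creates one new root node whose
    children are `fan_in` full copies of the previous round's tree.
    """
    size = initial_nodes
    for _ in range(iterations):
        # One new combining node plus `fan_in` copies of the old tree.
        size = fan_in * size + 1
    return size


if __name__ == "__main__":
    for i in (5, 10, 15):
        # With the defaults this equals the closed form (4**(i + 1) - 1) // 3.
        print(f"{i:>2} iterations: {plan_size(i):,} nodes")
```

Under these assumptions the tree has 1,365 nodes after 5 iterations, 1,398,101 after 10, and 1,431,655,765 (about 1.4 billion) after 15; each extra iteration multiplies the size by roughly four.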


On 2/22/17 9:29 AM, Stan Zhai wrote:
Hi all,

The driver hangs at DataFrame.rdd in Spark 2.1.0 when the DataFrame (SQL) is complex. Here is the thread dump of my driver:
...





------------------------------------------------------------------------
Sent from the Apache Spark Developers List mailing list archive <http://apache-spark-developers-list.1001551.n3.nabble.com/> at Nabble.com.
