Thanks for your help, Cheng.

Something must be wrong with InferFiltersFromConstraints. As a workaround, I
removed InferFiltersFromConstraints from
org/apache/spark/sql/catalyst/optimizer/Optimizer.scala to avoid this issue. I
will analyze the issue using the method you suggested.




------------------ Original ------------------
From:  "Cheng Lian [via Apache Spark Developers 
List]";<ml-node+s1001551n21069...@n3.nabble.com>;
Send time: Friday, Feb 24, 2017 2:28 AM
To: "Stan Zhai"<m...@zhaishidan.cn>; 

Subject:  Re: The driver hangs at DataFrame.rdd in Spark 2.1.0



                           
This one seems to be relevant, but it's already fixed in 2.1.0.

One way to debug is to turn on trace logging and check how the
analyzer/optimizer behaves.
     
     
     On 2/22/17 11:11 PM, StanZhai wrote:
     
     Could this be related to
     https://issues.apache.org/jira/browse/SPARK-17733?
                
         
         
         
         ------------------ Original ------------------
         From: "Cheng Lian-3 [via Apache Spark Developers List]";<[hidden email]>;
         Send time: Thursday, Feb 23, 2017 9:43 AM
         To: "Stan Zhai"<[hidden email]>;
         Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
         
         
         
         
Just from the thread dump you provided, it seems that this particular query
plan jams our optimizer. However, it's also possible that the driver just
happened to be running optimizer rules at that particular point in time.
         
         
Since query planning doesn't touch any actual data, could you please try to
minimize this query by replacing the actual relations with temporary views
derived from Scala local collections? That would make it much easier for
others to reproduce the issue.
         
Cheng
         
         
         On 2/22/17 5:16 PM, Stan Zhai wrote:

           Thanks for Lian's reply.

           Here is the query plan generated by Spark 1.6.2 (I can't get it
           in Spark 2.1.0):
                        ...           
                        
                        
             
             ------------------ Original ------------------
             Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
             
             
             
             
What is the query plan? We once observed query plans that grew exponentially
in iterative ML workloads, causing the query planner to hang forever. For
example, each iteration combines 4 plan trees of the last iteration and forms
a larger plan tree. The size of the plan tree can easily reach billions of
nodes after 15 iterations.
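
To sanity-check the growth estimate above, here is a minimal Python sketch. It
assumes each iteration places one new operator node on top of four copies of
the previous iteration's tree; the thread does not say exactly how the trees
are combined, and `plan_tree_size` is a hypothetical helper, not a Spark API.

```python
def plan_tree_size(iterations, fan_in=4, initial_nodes=1):
    """Node count of a plan tree after `iterations` rounds.

    Assumption (not from the thread): each round adds one new operator
    node whose children are `fan_in` copies of the previous round's tree,
    so size(n) = fan_in * size(n - 1) + 1.
    """
    size = initial_nodes
    for _ in range(iterations):
        size = fan_in * size + 1
    return size


# Print the node count at a few iteration counts to show the growth rate.
for n in (5, 10, 15):
    print(n, plan_tree_size(n))
```

Under these assumptions the tree passes one billion nodes by iteration 15,
which is in line with the estimate above. A plan of that size cannot
realistically be traversed by optimizer rules.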
             
             
             On 2/22/17 9:29 AM, Stan Zhai wrote:

               Hi all,

               The driver hangs at DataFrame.rdd in Spark 2.1.0 when the
               DataFrame (SQL) is complex. The following is the thread dump
               of my driver:
               ...
                          
           
                  
         
         
         
         If you reply to this email, your message will be added to the
         discussion below:
         http://apache-spark-developers-list.1001551.n3.nabble.com/Re-The-driver-hangs-at-DataFrame-rdd-in-Spark-2-1-0-tp21052p21053.html
         
       
       
       
       View this message in context: Re: The driver hangs at DataFrame.rdd
       in Spark 2.1.0
       Sent from the Apache Spark Developers List mailing list archive at
       Nabble.com.
          
                                
        
        



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-The-driver-hangs-at-DataFrame-rdd-in-Spark-2-1-0-tp21052p21073.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
