Re: [VOTE] Spark 2.2.1 (RC2)
Hi Felix Cheung, when will the Spark 2.2.1 documentation be published to the website? It still shows version 2.2.0. -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: Result obtained before the completion of Stages
Hi Reynold, I am running a Spark SQL query:

val df = spark.sql("select * from table1 t1 join table2 t2 on t1.col1=t2.col1")
df.count()
Re: Result obtained before the completion of Stages
What did you run?

On Tue, Dec 26, 2017 at 10:21 PM, ckhari4u wrote:
> Hi Sean,
>
> Thanks for the reply. I believe I am not facing the scenarios you mentioned.
>
> Timestamp conflict: I see the Spark driver logs on the console (tried with INFO and DEBUG). In all the scenarios, I see the result getting printed and the application execution continues for 4 more minutes. That is, I have seen cases where Spark History Server timestamps do not match the Spark driver logs, but in this case I am checking only the driver logs, and I can see logs still being printed on the console after the result is generated.
>
> Stages of a different action: I am performing a join on 2 tables and doing a count operation, so there is only one action. The stage that takes the most time is the join phase (sort merge join specifically). To improve the join, I tried caching the smaller dataset. With that change I do not see the issue.
>
> I am just wondering how Spark can get the result before the completion of the join operation.
>
> PS: My actual query in the application has many operators, UDFs, etc. The above is the minimal query for which I am able to reproduce the issue.
Re: Result obtained before the completion of Stages
My guess is that either they haven't actually finished before the result and something about the timestamps you're comparing is misleading, or else you're looking at stages that belong to a later part of the program.

On Tue, Dec 26, 2017 at 3:49 PM ckhari4u wrote:
> I found this interesting behavior while running an ad hoc analysis query. I have a Spark SQL query where I am joining 2 tables and then performing a count operation. In the Spark Web UI, I see there are 4 stages getting launched.
>
> The interesting behavior is that I see the result before all stages are executed. Stage 2, which performs the sort merge join, is still running, but I see the result in the Spark shell before Stage 2 completes. However, the application still continues to run.
>
> Any thoughts on this behavior?
Result obtained before the completion of Stages
I found this interesting behavior while running an ad hoc analysis query. I have a Spark SQL query where I am joining 2 tables and then performing a count operation. In the Spark Web UI, I see there are 4 stages getting launched. The interesting behavior is that I see the result before all stages are executed. Stage 2, which performs the sort merge join, is still running, but I see the result in the Spark shell before Stage 2 completes. However, the application still continues to run. Any thoughts on this behavior?
[JDBC-WRITER] Support for Updates
Hey guys, Currently the JDBC writer only supports inserts. Are there plans to add update support to the JDBC writer? -- Diogo Munaro Vieira