[ 
https://issues.apache.org/jira/browse/SPARK-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096247#comment-15096247
 ] 

Allen Liang commented on SPARK-12691:
-------------------------------------

Hi Bo Meng, do you have any comments in response to my reply. Do we consider 
this a performance issue in Dataframe? Thanks.

> Multiple unionAll on Dataframe seems to cause repeated calculations in a 
> "Fibonacci" manner
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12691
>                 URL: https://issues.apache.org/jira/browse/SPARK-12691
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
>         Environment: Tested in Spark 1.3 and 1.4.
>            Reporter: Allen Liang
>
> Multiple unionAll on Dataframe seems to cause repeated calculations. Here is 
> the sample code to reproduce this issue.
> val dfs = for (i<-0 to 100) yield {
>   val df = sc.parallelize((0 to 10).zipWithIndex).toDF("A", "B")
>   df
> }
> var i = 1
> val s1 = System.currentTimeMillis()
> dfs.reduce{(a,b)=>{
>   val t1 = System.currentTimeMillis()
>   val dd = a unionAll b
>   val t2 = System.currentTimeMillis()
>   println("Round " + i + " unionAll took " + (t2 - t1) + " ms")
>   i = i + 1
>   dd
>   }
> }
> val s2 = System.currentTimeMillis()
> println((i - 1) + " unionAll took totally " + (s2 - s1) + " ms")
> And it printed as follows. And as you can see, it looks like each unionAll 
> seems to redo all the previous unionAll and therefore took self time plus all 
> previous time, which, not precisely speaking, makes each unionAll look like a 
> "Fibonacci" action.
> BTW, this behaviour doesn't happen if I directly union all the RDDs in 
> Dataframes.
> ----- output start ----
> Round 1 unionAll took 1 ms
> Round 2 unionAll took 1 ms
> Round 3 unionAll took 1 ms
> Round 4 unionAll took 1 ms
> Round 5 unionAll took 1 ms
> Round 6 unionAll took 1 ms
> Round 7 unionAll took 1 ms
> Round 8 unionAll took 2 ms
> Round 9 unionAll took 2 ms
> Round 10 unionAll took 2 ms
> Round 11 unionAll took 3 ms
> Round 12 unionAll took 3 ms
> Round 13 unionAll took 3 ms
> Round 14 unionAll took 3 ms
> Round 15 unionAll took 3 ms
> Round 16 unionAll took 4 ms
> Round 17 unionAll took 4 ms
> Round 18 unionAll took 4 ms
> Round 19 unionAll took 4 ms
> Round 20 unionAll took 4 ms
> Round 21 unionAll took 5 ms
> Round 22 unionAll took 5 ms
> Round 23 unionAll took 5 ms
> Round 24 unionAll took 5 ms
> Round 25 unionAll took 5 ms
> Round 26 unionAll took 6 ms
> Round 27 unionAll took 6 ms
> Round 28 unionAll took 6 ms
> Round 29 unionAll took 6 ms
> Round 30 unionAll took 6 ms
> Round 31 unionAll took 6 ms
> Round 32 unionAll took 7 ms
> Round 33 unionAll took 7 ms
> Round 34 unionAll took 7 ms
> Round 35 unionAll took 7 ms
> Round 36 unionAll took 7 ms
> Round 37 unionAll took 8 ms
> Round 38 unionAll took 8 ms
> Round 39 unionAll took 8 ms
> Round 40 unionAll took 8 ms
> Round 41 unionAll took 9 ms
> Round 42 unionAll took 9 ms
> Round 43 unionAll took 9 ms
> Round 44 unionAll took 9 ms
> Round 45 unionAll took 9 ms
> Round 46 unionAll took 9 ms
> Round 47 unionAll took 9 ms
> Round 48 unionAll took 9 ms
> Round 49 unionAll took 10 ms
> Round 50 unionAll took 10 ms
> Round 51 unionAll took 10 ms
> Round 52 unionAll took 10 ms
> Round 53 unionAll took 11 ms
> Round 54 unionAll took 11 ms
> Round 55 unionAll took 11 ms
> Round 56 unionAll took 12 ms
> Round 57 unionAll took 12 ms
> Round 58 unionAll took 12 ms
> Round 59 unionAll took 12 ms
> Round 60 unionAll took 12 ms
> Round 61 unionAll took 12 ms
> Round 62 unionAll took 13 ms
> Round 63 unionAll took 13 ms
> Round 64 unionAll took 13 ms
> Round 65 unionAll took 13 ms
> Round 66 unionAll took 14 ms
> Round 67 unionAll took 14 ms
> Round 68 unionAll took 14 ms
> Round 69 unionAll took 14 ms
> Round 70 unionAll took 14 ms
> Round 71 unionAll took 14 ms
> Round 72 unionAll took 14 ms
> Round 73 unionAll took 14 ms
> Round 74 unionAll took 15 ms
> Round 75 unionAll took 15 ms
> Round 76 unionAll took 15 ms
> Round 77 unionAll took 15 ms
> Round 78 unionAll took 16 ms
> Round 79 unionAll took 16 ms
> Round 80 unionAll took 16 ms
> Round 81 unionAll took 16 ms
> Round 82 unionAll took 17 ms
> Round 83 unionAll took 17 ms
> Round 84 unionAll took 17 ms
> Round 85 unionAll took 17 ms
> Round 86 unionAll took 17 ms
> Round 87 unionAll took 18 ms
> Round 88 unionAll took 17 ms
> Round 89 unionAll took 18 ms
> Round 90 unionAll took 18 ms
> Round 91 unionAll took 18 ms
> Round 92 unionAll took 18 ms
> Round 93 unionAll took 18 ms
> Round 94 unionAll took 19 ms
> Round 95 unionAll took 19 ms
> Round 96 unionAll took 20 ms
> Round 97 unionAll took 20 ms
> Round 98 unionAll took 20 ms
> Round 99 unionAll took 20 ms
> Round 100 unionAll took 20 ms
> 100 unionAll took totally 1337 ms
> ----- output end ----



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to