Performance Spark SQL vs Dataframe API faster

2015-09-22 Thread sanderg
Is there a difference in performance between writing a Spark job using only SQL statements and writing it using the DataFrame API, or do both translate to the same thing under the hood?

RE: Performance Spark SQL vs Dataframe API faster

2015-09-22 Thread Cheng, Hao

Re: SQL vs. DataFrame API

2015-06-23 Thread Bob Corsaro
Thanks! The solution: https://gist.github.com/dokipen/018a1deeab668efdf455 On Mon, Jun 22, 2015 at 4:33 PM Davies Liu dav...@databricks.com wrote: Right now, we cannot figure out which column you referenced in `select` if there are multiple columns with the same name in the joined DataFrame

Re: SQL vs. DataFrame API

2015-06-23 Thread Ignacio Blasco
Does that issue happen only in the Python DSL? On 23/6/2015 at 5:05 p.m., Bob Corsaro rcors...@gmail.com wrote: Thanks! The solution: https://gist.github.com/dokipen/018a1deeab668efdf455 On Mon, Jun 22, 2015 at 4:33 PM Davies Liu dav...@databricks.com wrote: Right now, we cannot figure out which

Re: SQL vs. DataFrame API

2015-06-23 Thread Bob Corsaro
I've only tried it in Python. On Tue, Jun 23, 2015 at 12:16 PM Ignacio Blasco elnopin...@gmail.com wrote: Does that issue happen only in the Python DSL? On 23/6/2015 at 5:05 p.m., Bob Corsaro rcors...@gmail.com wrote: Thanks! The solution: https://gist.github.com/dokipen/018a1deeab668efdf455 On

Re: SQL vs. DataFrame API

2015-06-23 Thread Ignacio Blasco
It seems that it doesn't happen in the Scala API. Not exactly the same as in Python, but pretty close: https://gist.github.com/elnopintan/675968d2e4be68958df8 2015-06-23 23:11 GMT+02:00 Davies Liu dav...@databricks.com: I think it also happens in the DataFrame API of all languages. On Tue, Jun 23,

Re: SQL vs. DataFrame API

2015-06-23 Thread Davies Liu
I think it also happens in the DataFrame API of all languages. On Tue, Jun 23, 2015 at 9:16 AM, Ignacio Blasco elnopin...@gmail.com wrote: Does that issue happen only in the Python DSL? On 23/6/2015 at 5:05 p.m., Bob Corsaro rcors...@gmail.com wrote: Thanks! The solution:

Re: SQL vs. DataFrame API

2015-06-23 Thread Davies Liu
If you change it to ```val numbers2 = numbers```, then it has the same problem. On Tue, Jun 23, 2015 at 2:54 PM, Ignacio Blasco elnopin...@gmail.com wrote: It seems that it doesn't happen in the Scala API. Not exactly the same as in Python, but pretty close.

Re: SQL vs. DataFrame API

2015-06-22 Thread Ignacio Blasco
Sorry, I thought it was Scala/Spark. On 22/6/2015 at 9:49 p.m., Bob Corsaro rcors...@gmail.com wrote: That's invalid syntax. I'm pretty sure PySpark is using a DSL to create a query here and not actually doing an equality operation. On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco

Re: SQL vs. DataFrame API

2015-06-22 Thread Davies Liu
Right now, we cannot figure out which column you referenced in `select` if there are multiple columns with the same name in the joined DataFrame (for example, two `value` columns). A workaround could be: numbers2 = numbers.select(df.name, df.value.alias('other')) rows = numbers.join(numbers2,
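To make the ambiguity described above concrete, here is a toy model in plain Python. It is not the Spark API and does not reflect how Spark resolves columns internally; the helpers `join`, `select`, and `rename` are hypothetical names invented for this sketch. It only shows why a lookup by name fails once a self-join produces two columns called `value`, and why renaming one side first (the idea behind `alias('other')`) removes the problem.

```python
# Toy model only: plain Python, NOT the Spark API.
# A "table" is a list of column names plus a list of row tuples.

def join(left_cols, left_rows, right_cols, right_rows, key):
    """Inner-join two toy tables on the column called `key`."""
    li, ri = left_cols.index(key), right_cols.index(key)
    cols = left_cols + right_cols
    rows = [l + r for l in left_rows for r in right_rows if l[li] == r[ri]]
    return cols, rows

def select(cols, rows, name):
    """Pick one column by name; refuse if the name is not unique."""
    matches = [i for i, c in enumerate(cols) if c == name]
    if len(matches) != 1:
        raise LookupError(f"column {name!r} matches {len(matches)} columns")
    return [row[matches[0]] for row in rows]

def rename(cols, old, new):
    """Return a copy of the column list with `old` renamed to `new`."""
    return [new if c == old else c for c in cols]

cols = ["name", "value"]
rows = [("a", 1), ("a", 2), ("b", 3)]

# Self-join without renaming: two columns are now called "value".
jc, jr = join(cols, rows, cols, rows, "name")
try:
    select(jc, jr, "value")
    ambiguous = False
except LookupError:
    ambiguous = True

# Rename one side first, mirroring the alias('other') idea.
cols2 = rename(cols, "value", "other")
jc2, jr2 = join(cols, rows, cols2, rows, "name")
pairs = list(zip(select(jc2, jr2, "value"), select(jc2, jr2, "other")))
```

After the rename, `value` and `other` each match exactly one column of the joined table, so both can be selected unambiguously.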

Re: SQL vs. DataFrame API

2015-06-22 Thread Ignacio Blasco
You should probably use === instead of == and !== instead of !=. Can anyone explain why the DataFrame API doesn't work as I expect it to here? It seems like the column identifiers are getting confused. https://gist.github.com/dokipen/4b324a7365ae87b7b0e5

SQL vs. DataFrame API

2015-06-22 Thread Bob Corsaro
Can anyone explain why the DataFrame API doesn't work as I expect it to here? It seems like the column identifiers are getting confused. https://gist.github.com/dokipen/4b324a7365ae87b7b0e5

Re: SQL vs. DataFrame API

2015-06-22 Thread Bob Corsaro
That's invalid syntax. I'm pretty sure PySpark is using a DSL to create a query here and not actually doing an equality operation. On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco elnopin...@gmail.com wrote: You should probably use === instead of == and !== instead of !=. Can anyone explain why
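The point about `==` building a query rather than comparing values can be illustrated with a small sketch of operator overloading. This is a toy class, not PySpark's actual `Column` implementation; the class `Expr` and the helper `_show` are hypothetical names made up for the example. Python has no `===` operator, so a Python DSL has to overload `==` and `!=` themselves and return an expression object instead of a boolean.

```python
# Toy expression DSL: NOT PySpark's Column class, just the same idea.

def _show(x):
    """Render an operand: nested expressions by text, literals by repr."""
    return x.text if isinstance(x, Expr) else repr(x)

class Expr:
    """A node in a tiny expression tree (hypothetical, for illustration)."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Returns a new expression, not True/False.
        return Expr(f"({self.text} = {_show(other)})")

    def __ne__(self, other):
        return Expr(f"({self.text} <> {_show(other)})")

    def __and__(self, other):
        return Expr(f"({self.text} AND {_show(other)})")

name = Expr("name")
value = Expr("value")

# Looks like a comparison, but only describes one.
cond = (name == "a") & (value != 3)
print(cond.text)
```

Because `==` returns an `Expr`, nothing is compared when the line runs; the resulting object only records what should be compared later, which is why the ordinary Python operators are the right ones to use in such a DSL.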