Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Axel Dahl
Still feels like a bug to have to create unique names before a join. On Fri, Jun 26, 2015 at 9:51 PM, ayan guha guha.a...@gmail.com wrote: You can declare the schema with unique names before creating the DataFrame. On 27 Jun 2015 13:01, Axel Dahl a...@whisperstream.com wrote: I have the following
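
A minimal sketch of the workaround ayan describes, i.e. giving each side unique column names through an explicit schema before creating the DataFrames. The prefixed column names, the schemas, and the sample rows below are illustrative assumptions, not code from the thread:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    sc = SparkContext()
    sqlContext = SQLContext(sc)

    # Schemas whose column names do not collide across the two frames
    # (the 'd1_'/'d2_' prefixes are arbitrary choices for this sketch).
    schema1 = StructType([StructField('d1_name', StringType(), True),
                          StructField('d1_country', StringType(), True),
                          StructField('age', IntegerType(), True)])
    schema2 = StructType([StructField('d2_name', StringType(), True),
                          StructField('d2_country', StringType(), True),
                          StructField('colour', StringType(), True)])

    df1 = sqlContext.createDataFrame(
        [('bob', 'usa', 1), ('alice', 'jpn', 2), ('carol', 'ire', 3)], schema1)
    df2 = sqlContext.createDataFrame([('bob', 'usa', 'red')], schema2)

    # With unique names there is no ambiguity in the join condition or in
    # selecting columns from the joined result.
    joined = df1.join(df2, df1['d1_name'] == df2['d2_name'], 'left_outer')
    joined.show()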

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs 1.4? Nick On Sat, Jun 27, 2015 at 2:51 AM, Axel Dahl a...@whisperstream.com wrote: Still feels like a bug to have to create unique names before a join. On Fri, Jun 26, 2015 at 9:51 PM, ayan
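
For context, the renaming step that Nick says shouldn't be necessary might look like the sketch below. It assumes DataFrames d1 and d2 that share the 'name' and 'country' columns, as in the original message; withColumnRenamed is a standard DataFrame method, and the new names are arbitrary:

    # Manually rename the colliding columns on one side so that the join
    # condition and any later select are unambiguous; this is the extra
    # step the thread argues should not be required.
    d2r = (d2.withColumnRenamed('name', 'd2_name')
             .withColumnRenamed('country', 'd2_country'))
    joined = d1.join(d2r, d1['name'] == d2r['d2_name'], 'left_outer')
    joined.select(d1['name'], d1['age'], d2r['colour']).show()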

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Axel Dahl
I've only tested on 1.4, but I imagine 1.3 is the same, or a lot of people's code would be failing right now. On Saturday, June 27, 2015, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
I would test it against 1.3 to be sure, because it could -- though unlikely -- be a regression. For example, I recently stumbled upon this issue (https://issues.apache.org/jira/browse/SPARK-8670), which was specific to 1.4. On Sat, Jun 27, 2015 at 12:28 PM Axel Dahl a...@whisperstream.com wrote:

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Axel Dahl
Created as SPARK-8685 (https://issues.apache.org/jira/browse/SPARK-8685). @Yin, thanks, I have fixed the sample code with the correct names. On Sat, Jun 27, 2015 at 1:56 PM, Yin Huai yh...@databricks.com wrote: Axel, can you file a JIRA and attach your code in the description of the JIRA? This looks

dataframe left joins are not working as expected in pyspark

2015-06-26 Thread Axel Dahl
I have the following code:

    from pyspark import SQLContext

    d1 = [{'name': 'bob', 'country': 'usa', 'age': 1},
          {'name': 'alice', 'country': 'jpn', 'age': 2},
          {'name': 'carol', 'country': 'ire', 'age': 3}]
    d2 = [{'name': 'bob', 'country': 'usa', 'colour': 'red'},
          {'name': 'alice', 'country': 'ire',
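
The message is cut off in the archive. Below is a minimal reconstruction of the setup being described; the colour value for the second d2 record, the DataFrame creation calls, and the join itself are assumptions added for illustration rather than the original code:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    sqlContext = SQLContext(sc)

    d1 = [{'name': 'bob', 'country': 'usa', 'age': 1},
          {'name': 'alice', 'country': 'jpn', 'age': 2},
          {'name': 'carol', 'country': 'ire', 'age': 3}]
    d2 = [{'name': 'bob', 'country': 'usa', 'colour': 'red'},
          {'name': 'alice', 'country': 'ire', 'colour': 'green'}]  # colour here is a guess

    df1 = sqlContext.createDataFrame(d1)
    df2 = sqlContext.createDataFrame(d2)

    # Both frames share the 'name' and 'country' columns, which is what makes
    # conditions like df1['name'] == df2['name'] awkward to express without
    # renaming, and is the behaviour the thread is questioning.
    joined = df1.join(df2, df1['name'] == df2['name'], 'left_outer')
    joined.show()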