I just tested your pr

On 25 Apr 2015 10:18, "Ali Bajwa" <ali.ba...@gmail.com> wrote:
> Any ideas on this? Any sample code to join 2 data frames on two columns?
>
> Thanks
> Ali
>
> On Apr 23, 2015, at 1:05 PM, Ali Bajwa <ali.ba...@gmail.com> wrote:
>
> > Hi experts,
> >
> > Sorry if this is a n00b question or has already been answered...
> >
> > Am trying to use the data frames API in Python to join 2 DataFrames
> > with more than 1 column. The example I've seen in the documentation
> > only shows a single column - so I tried this:
> >
> > ****Example code****
> >
> > import pandas as pd
> > from pyspark.sql import SQLContext
> > hc = SQLContext(sc)
> > A = pd.DataFrame({'year': ['1993', '2005', '1994'],
> >                   'month': ['5', '12', '12'],
> >                   'value': [100, 200, 300]})
> > a = hc.createDataFrame(A)
> > B = pd.DataFrame({'year': ['1993', '1993'],
> >                   'month': ['12', '12'],
> >                   'value': [101, 102]})
> > b = hc.createDataFrame(B)
> >
> > print "Pandas"  # try with Pandas
> > print A
> > print B
> > print pd.merge(A, B, on=['year', 'month'], how='inner')
> >
> > print "Spark"
> > print a.toPandas()
> > print b.toPandas()
> > print a.join(b, a.year==b.year and a.month==b.month, 'inner').toPandas()
> >
> > ****Output****
> >
> > Pandas
> >   month  value  year
> > 0     5    100  1993
> > 1    12    200  2005
> > 2    12    300  1994
> >
> >   month  value  year
> > 0    12    101  1993
> > 1    12    102  1993
> >
> > Empty DataFrame
> > Columns: [month, value_x, year, value_y]
> > Index: []
> >
> > Spark
> >   month  value  year
> > 0     5    100  1993
> > 1    12    200  2005
> > 2    12    300  1994
> >
> >   month  value  year
> > 0    12    101  1993
> > 1    12    102  1993
> >
> >   month  value  year  month  value  year
> > 0    12    200  2005     12    102  1993
> > 1    12    200  2005     12    101  1993
> > 2    12    300  1994     12    102  1993
> > 3    12    300  1994     12    101  1993
> >
> > It looks like Spark returns some results where an inner join should
> > return nothing.
> >
> > Am I doing the join with two columns in the wrong way? If yes, what is
> > the right syntax for this?
> >
> > Thanks!
> > Ali
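The symptom (pandas returns an empty inner join, Spark returns month-only matches) is consistent with the `and` in the join condition silently dropping the year comparison: Python's `and` cannot be overloaded by column objects, so it short-circuits on the truthiness of the left operand and returns one operand unchanged rather than combining both. A minimal sketch of that behavior, using a hypothetical `Expr` class standing in for a column expression (not part of any library):

```python
class Expr:
    """Toy stand-in for a column expression (hypothetical, for illustration)."""
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Build a comparison expression instead of returning a bool.
        return Expr("(%s = %s)" % (self.text, other.text))

    def __and__(self, other):
        # `&` CAN be overloaded, so both operands survive in the result.
        return Expr("(%s AND %s)" % (self.text, other.text))

    def __repr__(self):
        return self.text


year_a, year_b = Expr("a.year"), Expr("b.year")
month_a, month_b = Expr("a.month"), Expr("b.month")

# `and` short-circuits: the left Expr is truthy, so only the right-hand
# expression is returned and the year condition is lost entirely.
cond_and = (year_a == year_b) and (month_a == month_b)
print(cond_and)   # (a.month = b.month)

# `&` keeps both conditions, which is what a two-column join needs.
cond_amp = (year_a == year_b) & (month_a == month_b)
print(cond_amp)   # ((a.year = b.year) AND (a.month = b.month))
```

Applied to the example above, the join condition would be written with `&` and explicit parentheses (needed because `&` binds tighter than `==`):

    a.join(b, (a.year == b.year) & (a.month == b.month), 'inner')

which matches how PySpark's Column API expects compound conditions to be combined.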