Re: Join with multiple conditions (In reference to SPARK-7197)
Davies, I created an issue - SPARK-10246 (https://issues.apache.org/jira/browse/SPARK-10246).

On Tue, Aug 25, 2015 at 12:53 PM, Davies Liu <dav...@databricks.com> wrote:
> It's good to support this. Could you create a JIRA for it and target it for 1.6?
>
> On Tue, Aug 25, 2015 at 11:21 AM, Michal Monselise <michal.monsel...@gmail.com> wrote:
> > Hello All,
> >
> > PySpark currently has two ways of performing a join: specifying a join
> > condition or column names. I would like to perform a join using a list of
> > columns that appear in both the left and right DataFrames. I have created
> > an example in this question on Stack Overflow.
> >
> > Basically, I would like to do the following as specified in the
> > documentation in /spark/python/pyspark/sql/dataframe.py, line 560, and
> > specify a list of column names:
> >
> >     df.join(df4, ['name', 'age']).select(df.name, df.age).collect()
> >
> > However, this produces an error. In JIRA issue SPARK-7197, it is
> > mentioned that the syntax is actually different from the one specified
> > in the documentation for joining using a condition.
> >
> > Documentation:
> >
> >     cond = [df.name == df3.name, df.age == df3.age]
> >     df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
> >
> > JIRA issue:
> >
> >     a.join(b, (a.year == b.year) & (a.month == b.month), 'inner')
> >
> > In other words, the join function cannot take a list. I was wondering if
> > you could also clarify what the correct syntax for providing a list of
> > columns is.
> >
> > Thanks,
> > Michal
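[Editor's note: the condition-list form discussed above can be illustrated with a toy sketch. This is plain Python, not PySpark - `join_on_conditions` is a hypothetical helper invented here to show what a join on a list of equality conditions computes: the conditions are AND-ed per row pair, which is also why the JIRA example combines Column expressions with `&` rather than Python's `and`.]

```python
# Hypothetical plain-Python sketch (NOT PySpark) of joining on a list of
# equality conditions. Each (left_col, right_col) pair is one condition;
# a row pair matches only if ALL conditions hold (logical AND).
def join_on_conditions(left, right, conditions):
    out = []
    for l in left:
        for r in right:
            if all(l[lc] == r[rc] for lc, rc in conditions):
                merged = dict(l)
                # Prefix right-hand columns to avoid name collisions.
                merged.update({f"r_{k}": v for k, v in r.items()})
                out.append(merged)
    return out

# Mirrors a.join(b, (a.year == b.year) & (a.month == b.month), 'inner'):
a = [{"year": 2015, "month": 8, "x": 1}, {"year": 2015, "month": 9, "x": 2}]
b = [{"year": 2015, "month": 8, "y": 10}]
result = join_on_conditions(a, b, [("year", "year"), ("month", "month")])
# Only the (year=2015, month=8) pair matches.
```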
Fwd: Join with multiple conditions (In reference to SPARK-7197)
Hello All,

PySpark currently has two ways of performing a join: specifying a join condition or column names. I would like to perform a join using a list of columns that appear in both the left and right DataFrames. I have created an example in this question on Stack Overflow: http://stackoverflow.com/questions/32193488/joining-multiple-columns-in-pyspark

Basically, I would like to do the following as specified in the documentation in /spark/python/pyspark/sql/dataframe.py, line 560, and specify a list of column names:

    df.join(df4, ['name', 'age']).select(df.name, df.age).collect()

However, this produces an error. In JIRA issue SPARK-7197 (https://issues.apache.org/jira/browse/SPARK-7197), it is mentioned that the syntax is actually different from the one specified in the documentation for joining using a condition.

Documentation:

    cond = [df.name == df3.name, df.age == df3.age]
    df.join(df3, cond, 'outer').select(df.name, df3.age).collect()

JIRA issue:

    a.join(b, (a.year == b.year) & (a.month == b.month), 'inner')

In other words, the join function cannot take a list. I was wondering if you could also clarify what the correct syntax for providing a list of columns is.

Thanks,
Michal
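[Editor's note: a toy sketch of the column-name-list form requested above. This is plain Python, not PySpark - `join_on_columns` is a hypothetical helper invented here to show the intended semantics of `df.join(df4, ['name', 'age'])`: an equi-join on the shared columns, with each join column appearing only once in the output.]

```python
# Hypothetical plain-Python sketch (NOT PySpark) of an equi-join on a list
# of column names shared by both sides. The join keys are emitted once,
# followed by the remaining columns from each side.
def join_on_columns(left, right, cols):
    out = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in cols):
                merged = {c: l[c] for c in cols}  # join keys, once
                merged.update({k: v for k, v in l.items() if k not in cols})
                merged.update({k: v for k, v in r.items() if k not in cols})
                out.append(merged)
    return out

# Mirrors df.join(df4, ['name', 'age']):
df = [{"name": "Alice", "age": 30, "dept": "eng"}]
df4 = [{"name": "Alice", "age": 30, "city": "SF"}]
rows = join_on_columns(df, df4, ["name", "age"])
```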
Re: Join with multiple conditions (In reference to SPARK-7197)
It's good to support this. Could you create a JIRA for it and target it for 1.6?

On Tue, Aug 25, 2015 at 11:21 AM, Michal Monselise <michal.monsel...@gmail.com> wrote:
> Hello All,
>
> PySpark currently has two ways of performing a join: specifying a join
> condition or column names. I would like to perform a join using a list of
> columns that appear in both the left and right DataFrames. I have created
> an example in this question on Stack Overflow.
>
> Basically, I would like to do the following as specified in the
> documentation in /spark/python/pyspark/sql/dataframe.py, line 560, and
> specify a list of column names:
>
>     df.join(df4, ['name', 'age']).select(df.name, df.age).collect()
>
> However, this produces an error. In JIRA issue SPARK-7197, it is mentioned
> that the syntax is actually different from the one specified in the
> documentation for joining using a condition.
>
> Documentation:
>
>     cond = [df.name == df3.name, df.age == df3.age]
>     df.join(df3, cond, 'outer').select(df.name, df3.age).collect()
>
> JIRA issue:
>
>     a.join(b, (a.year == b.year) & (a.month == b.month), 'inner')
>
> In other words, the join function cannot take a list. I was wondering if
> you could also clarify what the correct syntax for providing a list of
> columns is.
>
> Thanks,
> Michal

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org