Oh, sorry, you’re right. I looked at the doc for join() <http://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join> and didn’t realize it could do a cartesian join. It turns out that df1.join(df2), called with no join condition, does the job and matches the SQL equivalent too.
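
For anyone skimming the thread later, here is a rough plain-Python sketch of what a join with no condition returns: every row on the left paired with every row on the right, with the column names from both sides kept. This is only a model of the semantics, not Spark code; the helper `cartesian_join` and the sample rows and column names are made up for illustration.

```python
from itertools import product


def cartesian_join(left_rows, right_rows):
    """Pair every left row with every right row, merging their columns.

    Hypothetical helper: models the result of a join with no condition.
    """
    return [{**left, **right} for left, right in product(left_rows, right_rows)]


# Made-up sample data standing in for df1 and df2.
df1_rows = [{"id": 1}, {"id": 2}]
df2_rows = [{"letter": "a"}, {"letter": "b"}, {"letter": "c"}]

joined = cartesian_join(df1_rows, df2_rows)
print(len(joined))  # 2 * 3 = 6 rows
print(joined[0])    # {'id': 1, 'letter': 'a'}
```

The point relevant to the original question is that each output row still carries named columns from both inputs, which is what gets lost when dropping down to the underlying RDDs.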
On Mon, Jul 25, 2016 at 6:45 PM Reynold Xin <r...@databricks.com> wrote:

> DataFrame can do cartesian joins.
>
>
> On July 25, 2016 at 3:43:19 PM, Nicholas Chammas (
> nicholas.cham...@gmail.com) wrote:
>
> It appears that RDDs can do a cartesian join, but not DataFrames. Is there
> a fundamental reason why not, or is this just waiting for someone to
> implement?
>
> I know you can get the RDDs underlying the DataFrames and do the cartesian
> join that way, but you lose the schema of course.
>
> Nick
>
>