Hi,
After taking a look at the code, I found the problem:
Spark uses BroadcastNestedLoopJoin to handle non-equality join conditions.
One of my DataFrames (df1) is created from an existing RDD (a LogicalRDD),
so Spark uses defaultSizeInBytes * length to estimate its size. The other
DataFrame (df2)
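If it helps to see the arithmetic behind that estimate, here is a minimal sketch. The per-row fallback size, row count, and broadcast threshold below are assumed values for illustration, not Spark's actual configuration on my cluster:

```python
# Hypothetical illustration of the planner's size estimate for a DataFrame
# built from an RDD: with no real statistics available, it falls back to a
# per-row default. All numbers below are made-up assumptions.

default_size_in_bytes = 10 * 1024 * 1024 + 1   # assumed per-row fallback estimate
num_rows = 1000                                 # assumed length of df1's RDD

# The estimate the email describes: defaultSizeInBytes * length
estimated_size = default_size_in_bytes * num_rows

# Assumed broadcast cutoff, in the spirit of spark.sql.autoBroadcastJoinThreshold
auto_broadcast_join_threshold = 10 * 1024 * 1024

# A side whose estimate exceeds the threshold will not be chosen for broadcast
print(estimated_size > auto_broadcast_join_threshold)
```

With a fallback this coarse, even a small RDD-backed DataFrame gets a huge estimated size, which can push the planner away from the broadcast choice you might expect.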
Hi,
I am sorry to bother again.
When I do a join as follows:
df = sqlContext.sql("select a.someItem, b.someItem from a full outer join b
on condition1 or condition2")
df.first()
The program fails because the result size is bigger than
spark.driver.maxResultSize.
It is really strange, as one record is no
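For anyone following along, here is a plain-Python sketch of what a broadcast nested-loop join has to do for a predicate like `condition1 or condition2`: compare every row of one side against every row of the other. The table contents and the two conditions are hypothetical, not my real data:

```python
# Nested-loop join sketch: with an OR of conditions there is no single join
# key, so every (left, right) pair must be tested. Rows and predicates are
# hypothetical examples.

left = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
right = [{"id": 2, "w": 5}, {"id": 3, "w": 30}]

def condition1(a, b):
    return a["id"] == b["id"]      # an equality clause

def condition2(a, b):
    return a["v"] < b["w"]         # a non-equality clause

def nested_loop_join(left, right):
    out = []
    for a in left:                 # O(len(left) * len(right)) comparisons
        for b in right:
            if condition1(a, b) or condition2(a, b):
                out.append((a, b))
    return out

print(nested_loop_join(left, right))
```

Because the OR lets many pairs qualify, the output can grow toward the full cross product, which may be part of why the collected result blows past spark.driver.maxResultSize even when each individual record is small.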
Hi,
I may have a naive question about Spark SQL's implementation of joins on
non-equality conditions, for instance condition1 or condition2.
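To make the difficulty concrete, here is a small sketch of why an equi-join fits the map/reduce model: rows can be hashed on the join key so matching rows meet in the same bucket. With `condition1 or condition2` there is no single key to hash on, so this trick no longer applies. The data and key names are hypothetical:

```python
# Hash-based equi-join sketch: group one side by its join key ("map"), then
# probe those buckets from the other side ("reduce"). This only works because
# equality gives us a key to partition on.

from collections import defaultdict

left = [("k1", "a"), ("k2", "b")]
right = [("k2", "x"), ("k3", "y")]

def equi_join(left, right):
    buckets = defaultdict(list)
    for key, value in right:           # build phase: bucket by join key
        buckets[key].append(value)
    # probe phase: each left row only meets right rows with the same key
    return [(key, lv, rv) for key, lv in left for rv in buckets[key]]

print(equi_join(left, right))   # only ("k2", "b", "x") matches
```

An OR of arbitrary conditions gives no such partitioning key, which is why engines fall back to comparing all pairs instead.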
In fact, Hive doesn't support such a join, as it is very difficult to express
such conditions as a map/reduce job. However, Spark SQL supports such