[ 
https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948511#comment-14948511
 ] 

Sean Owen commented on SPARK-10914:
-----------------------------------

I don't think having it on one machine necessarily matters. You still have two 
JVMs in play; whereas you can't reproduce when just one JVM is involved. What 
if the driver has oops on, but the executor does not? and the results from the 
executor are parsed somewhere as if oops are on? Normally this would be wholly 
transparent to JVM bytecode but tungsten / SizeEstimator are depending in part 
on the actual representation of the object in memory.

> Incorrect empty join sets when executor-memory >= 32g
> -----------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>
> Using an inner join, to match together two integer columns, I generally get 
> no results when there should be matches.  But the results vary and depend on 
> whether the dataframes are coming from SQL, JSON, or cached, as well as the 
> order in which I cache things and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on 
> new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from 
> http://spark.apache.org/downloads.html.)
> {code}
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy:2} */
> val x = sql("select 1 xx union all select 2") 
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to