[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945638#comment-14945638 ]
Sean Owen commented on SPARK-10914:
-----------------------------------

Hm, it could be a valid lead after all. The size estimator code is aware of 32-bit vs 64-bit pointers, but a next line of inquiry might be to determine whether in your case this is somehow detected incorrectly and that causes an error. It sounds like compressed oops are off, but it thinks they're on. You could try adding "-Dspark.test.useCompressedOops=false" to see if it works then. That would pretty much confirm it.

Then the question is what, for example, SizeEstimator thinks for these values; if you can dig into that and see what it comes up with, that would help. I'm assuming it is related to the part of the code that looks for compressed oops, but I wonder if somehow this affects Tungsten? CC [~rxin] for what may be a dumb question.

> Incorrect empty join sets when executor-memory >= 32g
> -----------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>
> Using an inner join, to match together two integer columns, I generally get
> no results when there should be matches. But the results vary and depend on
> whether the dataframes are coming from SQL, JSON, or cached, as well as the
> order in which I cache things and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on
> new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from
> http://spark.apache.org/downloads.html.)
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy":2} */
> val x = sql("select 1 xx union all select 2")
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
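
As a rough way to follow up on the compressed-oops suggestion in the comment above, the sketch below prints what the running JVM itself reports for UseCompressedOops, whether the "spark.test.useCompressedOops" override is set, and the maximum heap size. It is only an illustration and not Spark's own detection code: the object and method names (CompressedOopsCheck, jvmFlag, overrideProperty, report) are hypothetical, and it assumes a HotSpot/OpenJDK JVM where com.sun.management.HotSpotDiagnosticMXBean is available.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean
import scala.util.Try

// Hypothetical helper, not part of Spark. Assumes a HotSpot/OpenJDK JVM.
object CompressedOopsCheck {

  // What the running JVM reports for -XX:+/-UseCompressedOops.
  // Returns None if the option cannot be read (e.g. the JVM does not expose it).
  def jvmFlag(): Option[Boolean] =
    Try {
      val bean = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
      bean.getVMOption("UseCompressedOops").getValue.toBoolean
    }.toOption

  // Value of the override property named in the comment, if it has been set.
  def overrideProperty(): Option[String] =
    Option(System.getProperty("spark.test.useCompressedOops"))

  // One-line summary suitable for pasting into the issue.
  def report(): String = {
    val maxHeapGb = Runtime.getRuntime.maxMemory.toDouble / (1L << 30)
    val flag = jvmFlag().map(_.toString).getOrElse("unknown")
    val prop = overrideProperty().getOrElse("not set")
    s"UseCompressedOops (JVM): $flag; " +
      s"spark.test.useCompressedOops: $prop; " +
      s"max heap: ${"%.1f".format(maxHeapGb)} GB"
  }

  def main(args: Array[String]): Unit = println(report())
}
```

Calling CompressedOopsCheck.report() in the spark-shell describes the driver JVM only; since the issue depends on executor memory, the values on the executors are the ones that matter, so the same check would need to run there as well.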