[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

Sean Owen (JIRA) Tue, 06 Oct 2015 12:43:46 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945638#comment-14945638
 ]


Sean Owen commented on SPARK-10914:
-----------------------------------

Hm, it could be a valid lead after all. The size estimator code is aware of 
32-bit vs 64-bit pointers, so the next line of inquiry might be to determine 
whether the pointer size is somehow detected incorrectly in your case and 
that causes the error. It sounds like compressed oops are off, but the code 
thinks they're on. You could try adding 
"-Dspark.test.useCompressedOops=false" to see if it works then. That would 
pretty much confirm it. Then the question is what SizeEstimator, for example, 
computes for these values; if you can dig into that and see what it comes up 
with, that would help.
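
One way to dig into this is to ask the JVM directly and compare its answer 
with a heap-size guess. The following is only a sketch, not Spark's actual 
code: the object and method names are hypothetical, it assumes a HotSpot JVM 
that exposes HotSpotDiagnosticMXBean, and the "heap below 32 GB" fallback is 
an assumption about the kind of heuristic that could disagree with the JVM.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Hypothetical helper, not part of Spark.
object CompressedOopsCheck {

  // Ask the running JVM whether UseCompressedOops is enabled.
  // Returns None if the JVM does not expose the option (e.g. non-HotSpot).
  def jvmUsesCompressedOops(): Option[Boolean] =
    try {
      val bean = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
      Option(bean).map(_.getVMOption("UseCompressedOops").getValue.toBoolean)
    } catch {
      case _: Exception | _: LinkageError => None
    }

  // Assumed fallback heuristic: guess "compressed" when max heap is below 32 GB.
  def guessFromHeapSize(maxHeapBytes: Long): Boolean =
    maxHeapBytes < 32L * 1024 * 1024 * 1024

  def main(args: Array[String]): Unit = {
    val maxHeap = Runtime.getRuntime.maxMemory
    println(s"max heap bytes:                $maxHeap")
    println(s"JVM reports UseCompressedOops: ${jvmUsesCompressedOops()}")
    println(s"heap-size guess:               ${guessFromHeapSize(maxHeap)}")
  }
}
```

If the two answers differ on a JVM started with the same memory settings as 
the executor, that would point at the detection. The check only describes 
the JVM it runs in, so the driver and the executors may each need checking.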

I'm assuming it is related to the part of the code that looks for compressed 
oops, but I wonder if somehow this affects Tungsten? CC [~rxin] for what may be 
a dumb question.

> Incorrect empty join sets when executor-memory >= 32g
> -----------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>
> Using an inner join to match two integer columns, I generally get 
> no results when there should be matches.  But the results vary and depend on 
> whether the dataframes are coming from SQL, JSON, or cached, as well as the 
> order in which I cache things and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on 
> new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from 
> http://spark.apache.org/downloads.html).
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy":2} */
> val x = sql("select 1 xx union all select 2") 
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */


