[ 
https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-10914:
--------------------------------
    Description: 
Updated description (by rxin on Oct 8, 2015)

To reproduce, launch Spark using
{code}
MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
{code}
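
The command above disables compressed oops on the executors only, so the driver and executor JVMs presumably end up with different settings. As a rough way to confirm which setting a given JVM is actually running with, here is a minimal sketch, assuming a HotSpot JVM that exposes {{com.sun.management.HotSpotDiagnosticMXBean}}. The helper name {{compressedOopsSetting}} is made up for illustration and is not part of Spark.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean
import scala.util.Try

// Hypothetical helper (not a Spark API): reads the UseCompressedOops VM flag.
// Returns Some("true") or Some("false") when the JVM exposes the flag,
// and None when the lookup fails (for example, the flag does not exist on this JVM).
def compressedOopsSetting(): Option[String] =
  Try {
    ManagementFactory
      .getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
      .getVMOption("UseCompressedOops")
      .getValue
  }.toOption

println(s"UseCompressedOops on this JVM: ${compressedOopsSetting().getOrElse("unknown")}")
```

Pasted into the spark-shell, this reports the driver-side value. Calling the same function inside a task should report the executor-side value, though that part is untested here.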





*Original bug report description*:

When I use an inner join to match two integer columns, I generally get no 
results even though there should be matches.  The results vary, however, 
depending on whether the dataframes come from SQL, JSON, or a cache, as well 
as on the order in which I cache and query them.

This minimal example reproduces it consistently for me in the spark-shell, on 
new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6, from 
http://spark.apache.org/downloads.html).

{code}
/* x is {"xx":1}{"xx":2} and y is {"yy":1}{"yy":2} */
val x = sql("select 1 xx union all select 2") 
val y = sql("select 1 yy union all select 2")

x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
/* If I cache both tables it works: */
x.cache()
y.cache()
x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */

/* but this still doesn't work: */
x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */
{code}



> UnsafeRow serialization breaks when two machines have different Oops size
> -------------------------------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>


