[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948492#comment-14948492 ]
Ben Moran commented on SPARK-10914:
-----------------------------------

I just tried moving the master to the worker box, so it's entirely on one machine (Ubuntu 14.04, now with Oracle JDK 1.8). It still reproduces the bug.

So, entirely on spark-worker:

{code}
spark@spark-worker:~/spark-1.5.1-bin-hadoop2.6$ sbin/start-master.sh
spark@spark-worker:~/spark-1.5.1-bin-hadoop2.6$ sbin/start-slave.sh --master spark://spark-worker:7077
spark@spark-worker:~/spark-1.5.1-bin-hadoop2.6$ bin/spark-shell --master spark://spark-worker:7077 --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/08 12:15:12 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/10/08 12:15:14 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/10/08 12:15:14 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/10/08 12:15:19 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/10/08 12:15:20 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/10/08 12:15:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/08 12:15:21 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/10/08 12:15:21 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

scala> val x = sql("select 1 xx union all select 2")
x: org.apache.spark.sql.DataFrame = [xx: int]

scala> val y = sql("select 1 yy union all select 2")
y: org.apache.spark.sql.DataFrame = [yy: int]

scala> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
res0: Long = 0
{code}

This does give me the incorrect count.

> Incorrect empty join sets when executor-memory >= 32g
> -----------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>        Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>
> Using an inner join to match together two integer columns, I generally get no results when there should be matches. The results vary and depend on whether the dataframes come from SQL, JSON, or a cache, as well as the order in which I cache things and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on fresh installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from http://spark.apache.org/downloads.html).
> {code}
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy":2} */
> val x = sql("select 1 xx union all select 2")
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
>
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
>
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
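For reference (not part of the original thread): the expected semantics of the failing query can be modeled with plain Scala collections, no Spark involved. The `JoinExpectation` name below is just for illustration. Note that HotSpot automatically turns off compressed oops for heaps of roughly 32 GB or more, which is presumably why passing `-XX:-UseCompressedOops` explicitly, as in the repro above, exercises the same code path as `executor-memory >= 32g`.

```scala
// Plain-Scala model of the failing query: two single-column tables
// holding the rows (1, 2), inner-joined on equality. A correct engine
// must return 2 matching rows, not the 0 observed in the bug report.
object JoinExpectation {
  def main(args: Array[String]): Unit = {
    val x = Seq(1, 2) // rows of column xx
    val y = Seq(1, 2) // rows of column yy
    val joined = for (xx <- x; yy <- y if xx == yy) yield (xx, yy)
    println(joined.size)                              // prints 2
    println(joined.count { case (_, yy) => yy == 1 }) // prints 1
  }
}
```

The second count mirrors the `filter("yy=1")` case from the issue description, which should keep exactly one of the two matched rows.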