[jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g

Ben Moran (JIRA) Tue, 06 Oct 2015 10:42:51 -0700

    [ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945441#comment-14945441 ]


Ben Moran commented on SPARK-10914:
-----------------------------------

I just ran with

    --executor-memory 100g --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"

but the problem persists. The worker log shows the flag being passed to the executor:


15/10/06 18:36:36 INFO ExecutorRunner: Launch command: 
"/usr/lib/jvm/java-7-oracle/jre/bin/java" "-cp" 
"/home/spark/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar"
 "-Xms102400M" "-Xmx102400M" "-Dspark.driver.port=53169" 
"-XX:-UseCompressedOops" "-XX:MaxPermSize=256m" 
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
"akka.tcp://sparkDriver@10.122.82.99:53169/user/CoarseGrainedScheduler" 
"--executor-id" "0" "--hostname" "10.122.82.99" "--cores" "20" "--app-id" 
"app-20151006183636-0019" "--worker-url" 
"akka.tcp://sparkWorker@10.122.82.99:51402/user/Worker"

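Besides reading the launch command, the flag can also be checked from inside a running JVM by looking at its input arguments. The following is only a minimal sketch: `compressedOopsDisabled` and `localJvmArgs` are hypothetical helper names, not part of Spark, and the commented-out last line assumes the `sc` that spark-shell predefines.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Hypothetical helper: true if the argument list explicitly disables compressed oops.
def compressedOopsDisabled(jvmArgs: Seq[String]): Boolean =
  jvmArgs.contains("-XX:-UseCompressedOops")

// Input arguments of the JVM this code runs in (the driver, if pasted into spark-shell).
def localJvmArgs: Seq[String] =
  ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toList

// Assumption: in spark-shell, the same check could be run on an executor, e.g.
// sc.parallelize(1 to 1, 1).map(_ => compressedOopsDisabled(localJvmArgs)).collect()
```

This only reports whether the flag was passed explicitly; it does not say anything about why the join returns no rows.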

> Incorrect empty join sets when executor-memory >= 32g
> -----------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>
> When I use an inner join to match two integer columns, I generally get 
> no results even though there should be matches.  The results vary, though, 
> depending on whether the DataFrames come from SQL, from JSON, or from the 
> cache, and on the order in which I cache and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on 
> new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from 
> http://spark.apache.org/downloads.html).
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy":2} */
> val x = sql("select 1 xx union all select 2") 
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */



