[jira] [Commented] (SPARK-19628) Duplicate Spark jobs in 2.1.0

Jork Zijlstra (JIRA) Tue, 31 Oct 2017 03:15:32 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226559#comment-16226559
 ]


Jork Zijlstra commented on SPARK-19628:
---------------------------------------

Hello [~guilhermeslucas],

I no longer work at the company where we encountered the problem.
[~skoning], do you still see the problem, and could you help?

How much more code do you need? Usually you want to scale down the test to find 
the problem, and this is pretty much the minimal version.

{code}
spark.read.orc(...).show(20)
// or
spark.read.orc(...).collect()
{code}
Both trigger the duplicate jobs.
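
To put a number on "duplicate", the jobs started by a single action can be counted with a listener. This is only a rough sketch, not something taken from the original report: `JobCounter` and `countJobsAndStop` are made-up names, and the ORC path is assumed to be passed as the first program argument. Some actions may legitimately start more than one job, so the count is mainly useful for comparing 2.0.1 against 2.1.0 on the same input.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of Spark: counts the jobs started by one action.
object JobCounter {

  /** Runs `action`, stops the session, and returns the number of jobs started. */
  def countJobsAndStop(spark: SparkSession)(action: SparkSession => Unit): Int = {
    val started = new AtomicInteger(0)

    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        started.incrementAndGet()
        println(s"job ${jobStart.jobId} started, stages: ${jobStart.stageIds.mkString(",")}")
      }
    })

    try {
      action(spark)
    } finally {
      // Listener events are delivered asynchronously, so stop the context
      // before reading the counter; stopping lets queued events be processed.
      spark.stop()
    }
    started.get
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[4]")
      .appName("job counter")
      .getOrCreate()

    // Assumption: args(0) is the path to some ORC source.
    val jobs = countJobsAndStop(spark)(_.read.orc(args(0)).show(20))
    println(s"jobs started by show(20): $jobs")
  }
}
```

Running this once per Spark version against the same ORC source should show whether 2.1.0 starts more jobs for the same call.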

Regards, Jork

> Duplicate Spark jobs in 2.1.0
> -----------------------------
>
>                 Key: SPARK-19628
>                 URL: https://issues.apache.org/jira/browse/SPARK-19628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jork Zijlstra
>         Attachments: spark2.0.1.png, spark2.1.0-examplecode.png, 
> spark2.1.0.png
>
>
> After upgrading to Spark 2.1.0 we noticed that duplicate jobs are 
> executed. After going back to Spark 2.0.1 they are gone again.
> {code}
> import org.apache.spark.sql._
> object DoubleJobs {
>   def main(args: Array[String]): Unit = {
>     // Point hadoop.home.dir at a placeholder directory for local runs
>     System.setProperty("hadoop.home.dir", "/tmp")
>     val sparkSession: SparkSession = SparkSession.builder
>       .master("local[4]")
>       .appName("spark session example")
>       .config("spark.driver.maxResultSize", "6G")
>       .config("spark.sql.orc.filterPushdown", true)
>       .config("spark.sql.hive.metastorePartitionPruning", true)
>       .getOrCreate()
>     // Same setting as in the builder above, applied again on the session
>     sparkSession.sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>     val paths = Seq(
>       "" // some orc source
>     )
>     def dataFrame(path: String): DataFrame = {
>       sparkSession.read.orc(path)
>     }
>     // One show(20) per path; on 2.1.0 this results in duplicate jobs
>     paths.foreach(path => {
>       dataFrame(path).show(20)
>     })
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

