[jira] [Updated] (SPARK-19628) Duplicate Spark jobs in 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-19628:
---------------------------------
    Labels: bulk-closed  (was: )

> Duplicate Spark jobs in 2.1.0
> -----------------------------
>
>                 Key: SPARK-19628
>                 URL: https://issues.apache.org/jira/browse/SPARK-19628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jork Zijlstra
>            Priority: Major
>              Labels: bulk-closed
>         Attachments: spark2.0.1.png, spark2.1.0-examplecode.png, spark2.1.0.png
>
>
> After upgrading to Spark 2.1.0, we noticed that duplicate jobs are executed. Going back to Spark 2.0.1, they are gone again.
> {code}
> import org.apache.spark.sql._
>
> object DoubleJobs {
>   def main(args: Array[String]): Unit = {
>     System.setProperty("hadoop.home.dir", "/tmp")
>
>     val sparkSession: SparkSession = SparkSession.builder
>       .master("local[4]")
>       .appName("spark session example")
>       .config("spark.driver.maxResultSize", "6G")
>       .config("spark.sql.orc.filterPushdown", true)
>       .config("spark.sql.hive.metastorePartitionPruning", true)
>       .getOrCreate()
>
>     sparkSession.sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>
>     val paths = Seq(
>       "" // some orc source
>     )
>
>     def dataFrame(path: String): DataFrame = {
>       sparkSession.read.orc(path)
>     }
>
>     paths.foreach(path => {
>       dataFrame(path).show(20)
>     })
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
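Beyond eyeballing the web UI screenshots attached to the issue, the duplicate jobs can be counted programmatically with a `SparkListener`. The sketch below is illustrative, not part of the original report: the object name `JobCounter` is hypothetical, and it substitutes `spark.range` for the elided ORC source so it is self-contained. Running it against 2.0.1 and 2.1.0 and comparing the counter per action is one way to confirm the regression.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

// Hypothetical harness for counting jobs submitted per action.
object JobCounter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[4]")
      .appName("job counter")
      .getOrCreate()

    val jobCount = new AtomicInteger(0)

    // Increment the counter every time the scheduler starts a job.
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        jobCount.incrementAndGet()
      }
    })

    // Stand-in for dataFrame(path).show(20) from the report;
    // the real reproduction reads from an ORC source instead.
    spark.range(100).toDF("id").show(20)

    // Listener events are delivered asynchronously, so give the
    // listener bus a moment to drain before reading the counter.
    Thread.sleep(2000)
    println(s"jobs started: ${jobCount.get()}")

    spark.stop()
  }
}
```

On an affected version, the printed count for a single `show` would be higher than on 2.0.1 with the same input.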
[jira] [Updated] (SPARK-19628) Duplicate Spark jobs in 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-19628:
------------------------------
    Fix Version/s:     (was: 2.0.1)
[jira] [Updated] (SPARK-19628) Duplicate Spark jobs in 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19628:
---------------------------------
    Component/s:     (was: Spark Core)
                 SQL
[jira] [Updated] (SPARK-19628) Duplicate Spark jobs in 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jork Zijlstra updated SPARK-19628:
----------------------------------
    Attachment: spark2.1.0-examplecode.png
[jira] [Updated] (SPARK-19628) Duplicate Spark jobs in 2.1.0
[ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jork Zijlstra updated SPARK-19628:
----------------------------------
    Attachment: spark2.0.1.png
                spark2.1.0.png