[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149083#comment-17149083 ]

Jason Moore commented on SPARK-27780:
-------------------------------------

I encountered this on 3.0.0 running with a much older shuffle service (I think around 2.1.1). Is there any documentation on which shuffle services 3.0.0 will work with?

> Shuffle server & client should be versioned to enable smoother upgrade
> ----------------------------------------------------------------------
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
> Issue Type: New Feature
> Components: Shuffle, Spark Core
> Affects Versions: 3.1.0
> Reporter: Imran Rashid
> Priority: Major
>
> The external shuffle service is often upgraded at a different time than Spark
> itself. However, this causes problems when the protocol changes between the
> shuffle service and the Spark runtime -- it forces users to upgrade
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what
> messages the other will support. This would allow better handling of mixed
> versions, from better error messages to allowing some mismatched versions
> (with reduced capabilities).
> This originally came up in a discussion here:
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to
> discuss:
> 1) Version specified by config. This allows for mixed versions across the
> cluster and rolling upgrades. It also will let a Spark 3.0 client talk to a
> 2.4 shuffle service. But it may be a nuisance for users to get this right.
> 2) Auto-detection during registration with the local shuffle service. This
> makes the versioning easy for the end user, and can even handle a 2.4 shuffle
> service even though it does not support the new versioning. However, it will
> not handle a rolling upgrade correctly -- if the local shuffle service has
> been upgraded, but other nodes in the cluster have not, it will get the
> version wrong.
> 3) Exchange versions per-connection. When a connection is opened, the server
> & client could first exchange messages with their versions, so they know how
> to continue communication after that.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
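To make option 3 concrete, here is a minimal sketch of a per-connection version handshake. This is not Spark's actual protocol; the message and method names here are invented for illustration, and the negotiation rule (settle on the lower of the two advertised versions) is just one plausible choice:

{code:scala}
// Hypothetical sketch of option 3: each side advertises the highest protocol
// version it speaks, and both settle on the minimum of the two advertised
// versions. None of these names exist in Spark; this only illustrates the idea.
case class VersionHello(maxSupported: Int)

object Handshake {
  val MinSupported = 1

  // Returns the negotiated version, or None if the peer is too old to talk to.
  def negotiate(ours: VersionHello, theirs: VersionHello): Option[Int] = {
    val agreed = math.min(ours.maxSupported, theirs.maxSupported)
    if (agreed >= MinSupported) Some(agreed) else None
  }
}

// A newer client (say, protocol 3) talking to an older server (protocol 2)
// would agree on protocol 2 and disable the newer message types.
Handshake.negotiate(VersionHello(3), VersionHello(2))  // Some(2)
{code}

The appeal of this option over the other two is that the negotiated version is always correct per peer, so a rolling upgrade with mixed shuffle-service versions across the cluster still works.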
[jira] [Commented] (SPARK-32136) Spark producing incorrect groupBy results when key is a struct with nullable properties
[ https://issues.apache.org/jira/browse/SPARK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149073#comment-17149073 ]

Jason Moore commented on SPARK-32136:
-------------------------------------

Here is a similar test, and why it's a problem for what I'm needing to do:

{noformat}
case class C(d: Double)
case class B(c: Option[C])
case class A(b: Option[B])

val df = Seq(
  A(None),
  A(Some(B(None))),
  A(Some(B(Some(C(1.0)))))
).toDF

val res = df.groupBy("b").agg(count("*"))

> res.show
+-------+--------+
|      b|count(1)|
+-------+--------+
|   [[]]|       2|
|[[1.0]]|       1|
+-------+--------+

> res.as[(Option[B], Long)].collect
java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException: Null value appeared in non-nullable field:
- field (class: "scala.Double", name: "d")
- option value class: "C"
- field (class: "scala.Option", name: "c")
- option value class: "B"
- field (class: "scala.Option", name: "_1")
- root class: "scala.Tuple2"
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
newInstance(class scala.Tuple2)
{noformat}

Interestingly, and potentially useful to know: using an Int instead of a Double above works as expected.

> Spark producing incorrect groupBy results when key is a struct with nullable
> properties
> ----------------------------------------------------------------------------
>
> Key: SPARK-32136
> URL: https://issues.apache.org/jira/browse/SPARK-32136
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Jason Moore
> Priority: Major
>
> I'm in the process of migrating from Spark 2.4.x to Spark 3.0.0 and I'm
> noticing a behaviour change in a particular aggregation we're doing, and I
> think I've tracked it down to how Spark is now treating nullable properties
> within the column being grouped by.
>
> Here's a simple test I've been able to set up to repro it:
>
> {code:scala}
> case class B(c: Option[Double])
> case class A(b: Option[B])
>
> val df = Seq(
>   A(None),
>   A(Some(B(None))),
>   A(Some(B(Some(1.0))))
> ).toDF
>
> val res = df.groupBy("b").agg(count("*"))
> {code}
>
> Spark 2.4.6 has the expected result:
> {noformat}
> > res.show
> +-----+--------+
> |    b|count(1)|
> +-----+--------+
> |   []|       1|
> | null|       1|
> |[1.0]|       1|
> +-----+--------+
>
> > res.collect.foreach(println)
> [[null],1]
> [null,1]
> [[1.0],1]
> {noformat}
>
> But Spark 3.0.0 has an unexpected result:
> {noformat}
> > res.show
> +-----+--------+
> |    b|count(1)|
> +-----+--------+
> |   []|       2|
> |[1.0]|       1|
> +-----+--------+
>
> > res.collect.foreach(println)
> [[null],2]
> [[1.0],1]
> {noformat}
>
> Notice how it has keyed one of the values in b as `[null]`; that is, an
> instance of B with a null value for the `c` property, instead of a null for
> the overall value itself.
> Is this an intended change?

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
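For comparison, the same grouping expressed over plain Scala collections (no Spark involved; this is only to pin down the expected semantics) keeps `None` and `Some(B(None))` as distinct keys, which is the behaviour the Spark 2.4.6 output matches:

{code:scala}
// Plain-Scala rendering of the grouping from the comment above, for reference.
// None and Some(B(None)) are distinct keys here, matching Spark 2.4.6.
case class C(d: Double)
case class B(c: Option[C])
case class A(b: Option[B])

val rows = Seq(A(None), A(Some(B(None))), A(Some(B(Some(C(1.0))))))
val counts = rows.groupBy(_.b).map { case (k, v) => k -> v.size }
// counts == Map(None -> 1, Some(B(None)) -> 1, Some(B(Some(C(1.0)))) -> 1)
{code}

The Spark 3.0.0 output instead merges the first two keys, which is why the collected result can no longer be decoded back into `Option[B]`.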
[jira] [Updated] (SPARK-32136) Spark producing incorrect groupBy results when key is a struct with nullable properties
[ https://issues.apache.org/jira/browse/SPARK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Moore updated SPARK-32136:
--------------------------------
    Description:
I'm in the process of migrating from Spark 2.4.x to Spark 3.0.0 and I'm noticing a behaviour change in a particular aggregation we're doing, and I think I've tracked it down to how Spark is now treating nullable properties within the column being grouped by.

Here's a simple test I've been able to set up to repro it:

{code:scala}
case class B(c: Option[Double])
case class A(b: Option[B])

val df = Seq(
  A(None),
  A(Some(B(None))),
  A(Some(B(Some(1.0))))
).toDF

val res = df.groupBy("b").agg(count("*"))
{code}

Spark 2.4.6 has the expected result:
{noformat}
> res.show
+-----+--------+
|    b|count(1)|
+-----+--------+
|   []|       1|
| null|       1|
|[1.0]|       1|
+-----+--------+

> res.collect.foreach(println)
[[null],1]
[null,1]
[[1.0],1]
{noformat}

But Spark 3.0.0 has an unexpected result:
{noformat}
> res.show
+-----+--------+
|    b|count(1)|
+-----+--------+
|   []|       2|
|[1.0]|       1|
+-----+--------+

> res.collect.foreach(println)
[[null],2]
[[1.0],1]
{noformat}

Notice how it has keyed one of the values in b as `[null]`; that is, an instance of B with a null value for the `c` property, instead of a null for the overall value itself.

Is this an intended change?

  was:
I'm in the process of migrating from Spark 2.4.x to Spark 3.0.0 and I'm noticing a behaviour change in a particular aggregation we're doing, and I think I've tracked it down to how Spark is now treating nullable properties within the column being grouped by.

Here's a simple test I've been able to set up to repro it:

{code:scala}
case class B(c: Option[Double])
case class A(b: Option[B])

val df = Seq(
  A(None),
  A(Some(B(None))),
  A(Some(B(Some(1.0))))
).toDF

val res = df.groupBy("b").agg(count("*"))
{code}

Spark 2.4.6 has the expected result:
{noformat}
> res.show
+-----+--------+
|    b|count(1)|
+-----+--------+
|   []|       1|
| null|       1|
|[1.0]|       1|
+-----+--------+

> res.collect.foreach(println)
[[null],1]
[null,1]
[[1.0],1]
{noformat}

But Spark 3.0.0 has an unexpected result:
{noformat}
> res.show
+-----+--------+
|    b|count(1)|
+-----+--------+
|   []|       2|
|[1.0]|       1|
+-----+--------+

> res.collect.foreach(println)
[[null],2]
[[1.0],1]
{noformat}

Notice how it has keyed one of the values in b as `[null]`; that is, an instance of B with a null value for the `c` property, instead of a null for the overall value itself.

Is this an intended change?

> Spark producing incorrect groupBy results when key is a struct with nullable
> properties
> ----------------------------------------------------------------------------
>
> Key: SPARK-32136
> URL: https://issues.apache.org/jira/browse/SPARK-32136
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Jason Moore
> Priority: Major
>
> I'm in the process of migrating from Spark 2.4.x to Spark 3.0.0 and I'm
> noticing a behaviour change in a particular aggregation we're doing, and I
> think I've tracked it down to how Spark is now treating nullable properties
> within the column being grouped by.
>
> Here's a simple test I've been able to set up to repro it:
>
> {code:scala}
> case class B(c: Option[Double])
> case class A(b: Option[B])
>
> val df = Seq(
>   A(None),
>   A(Some(B(None))),
>   A(Some(B(Some(1.0))))
> ).toDF
>
> val res = df.groupBy("b").agg(count("*"))
> {code}
>
> Spark 2.4.6 has the expected result:
> {noformat}
> > res.show
> +-----+--------+
> |    b|count(1)|
> +-----+--------+
> |   []|       1|
> | null|       1|
> |[1.0]|       1|
> +-----+--------+
>
> > res.collect.foreach(println)
> [[null],1]
> [null,1]
> [[1.0],1]
> {noformat}
>
> But Spark 3.0.0 has an unexpected result:
> {noformat}
> > res.show
> +-----+--------+
> |    b|count(1)|
> +-----+--------+
> |   []|       2|
> |[1.0]|       1|
> +-----+--------+
>
> > res.collect.foreach(println)
> [[null],2]
> [[1.0],1]
> {noformat}
>
> Notice how it has keyed one of the values in b as `[null]`; that is, an
> instance of B with a null value for the `c` property, instead of a null for
> the overall value itself.
> Is this an intended change?

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32136) Spark producing incorrect groupBy results when key is a struct with nullable properties
Jason Moore created SPARK-32136:
-----------------------------------

             Summary: Spark producing incorrect groupBy results when key is a struct with nullable properties
                 Key: SPARK-32136
                 URL: https://issues.apache.org/jira/browse/SPARK-32136
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Jason Moore


I'm in the process of migrating from Spark 2.4.x to Spark 3.0.0 and I'm noticing a behaviour change in a particular aggregation we're doing, and I think I've tracked it down to how Spark is now treating nullable properties within the column being grouped by.

Here's a simple test I've been able to set up to repro it:

{code:scala}
case class B(c: Option[Double])
case class A(b: Option[B])

val df = Seq(
  A(None),
  A(Some(B(None))),
  A(Some(B(Some(1.0))))
).toDF

val res = df.groupBy("b").agg(count("*"))
{code}

Spark 2.4.6 has the expected result:
{noformat}
> res.show
+-----+--------+
|    b|count(1)|
+-----+--------+
|   []|       1|
| null|       1|
|[1.0]|       1|
+-----+--------+

> res.collect.foreach(println)
[[null],1]
[null,1]
[[1.0],1]
{noformat}

But Spark 3.0.0 has an unexpected result:
{noformat}
> res.show
+-----+--------+
|    b|count(1)|
+-----+--------+
|   []|       2|
|[1.0]|       1|
+-----+--------+

> res.collect.foreach(println)
[[null],2]
[[1.0],1]
{noformat}

Notice how it has keyed one of the values in b as `[null]`; that is, an instance of B with a null value for the `c` property, instead of a null for the overall value itself.

Is this an intended change?

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688957#comment-16688957 ]

Jason Moore edited comment on SPARK-26027 at 11/16/18 3:10 AM:
---------------------------------------------------------------

I was originally going to withdraw the ticket when I discovered my actual issue, and I'm happy for that to happen. The main concern I was left with was that the build scripts download based on the default Scala version (2.11.12 on the v2.4.0 tag) rather than taking the profile flag into account. If you don't see this as an issue to worry about, close this ticket and forget all about it.

was (Author: jasonmoore2k):
I was originally going to withdraw the ticket when I discovered my actual issue, and happy for that to happen. The main concern I was left with was that the build scripts download based on the default Scala version (2.11.12 on v2.4.0 tag) rater than taking the profile flag into account). If you don't see this as an issue to worry about, close this ticket and forget all about it.

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Minor
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think
> that only updates scala.binary.version)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688957#comment-16688957 ]

Jason Moore commented on SPARK-26027:
-------------------------------------

I was originally going to withdraw the ticket when I discovered my actual issue, and I'm happy for that to happen. The main concern I was left with was that the build scripts download based on the default Scala version (2.11.12 on the v2.4.0 tag) rather than taking the profile flag into account. If you don't see this as an issue to worry about, close this ticket and forget all about it.

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Minor
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think
> that only updates scala.binary.version)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
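For reference, the 2.4-era cross-build steps documented in Spark's "Building Spark" guide are below; the -Pscala-2.12 profile flag is what the ./build/mvn bootstrap was not consulting when it fetched its own copy of scala-library:

{noformat}
# Switch the POMs' scala.binary.version to 2.12, then build with the
# matching profile (per the Spark 2.4 build documentation).
./dev/change-scala-version.sh 2.12
./build/mvn -Pscala-2.12 -DskipTests clean package
{noformat}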
[jira] [Comment Edited] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684567#comment-16684567 ]

Jason Moore edited comment on SPARK-26027 at 11/13/18 12:49 AM:
----------------------------------------------------------------

Actually, the fact that scala-library 2.11.12 is downloaded to launch the zinc server turned out to be a red herring, and my issue was not properly setting the profile to scala-2.12. I'm not sure of the repercussions of having the zinc server started by ./build/mvn with 2.11.12, though (see https://github.com/apache/spark/blob/v2.4.0/build/mvn#L155), so maybe someone with a more intimate knowledge of it can comment.

was (Author: jasonmoore2k):
Actually, the fact that scala-library 2.11.12 is downloaded to launch the zinc server turned out to be a red herring and my issue was not properly setting the profile to scala-2.12. I'm not sure of the repercussion of having the zinc server started by ./build/mvn having 2.11.12 though (see [https://github.com/apache/spark/blob/master/build/mvn#L155|https://github.com/apache/spark/blob/master/build/mvn#L155).]) so maybe someone with a more intimate knowledge of it can comment.

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Minor
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think
> that only updates scala.binary.version)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684567#comment-16684567 ]

Jason Moore commented on SPARK-26027:
-------------------------------------

Actually, the fact that scala-library 2.11.12 is downloaded to launch the zinc server turned out to be a red herring, and my issue was not properly setting the profile to scala-2.12. I'm not sure of the repercussions of having the zinc server started by ./build/mvn with 2.11.12, though (see https://github.com/apache/spark/blob/master/build/mvn#L155), so maybe someone with a more intimate knowledge of it can comment.

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Minor
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think
> that only updates scala.binary.version)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Moore updated SPARK-26027:
--------------------------------
    Priority: Minor  (was: Major)

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Minor
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think
> that only updates scala.binary.version)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Moore updated SPARK-26027:
--------------------------------
    Description:
In ./build/mvn, the scala.version from pom.xml is used to determine which Scala library to fetch, but it doesn't seem to use the value under the scala-2.12 profile even if that is set.

The result is that the maven build still uses scala-library 2.11.12 and compilation fails.

Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think that only updates scala.binary.version)

  was:
In ./build/mvn, the scala.version from pom.xml is used to determine which Scala library to fetch, but it doesn't seem to use the value under the scala-2.12 profile even if that is set.

The result is that the maven build still uses scala-library 2.11.12 and compilation fails.

Am I missing a step?

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Major
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step? (I do run ./dev/change-scala-version.sh 2.12 but I think
> that only updates scala.binary.version)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
[ https://issues.apache.org/jira/browse/SPARK-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Moore updated SPARK-26027:
--------------------------------
    Description:
In ./build/mvn, the scala.version from pom.xml is used to determine which Scala library to fetch, but it doesn't seem to use the value under the scala-2.12 profile even if that is set.

The result is that the maven build still uses scala-library 2.11.12 and compilation fails.

Am I missing a step?

  was:
In ./build/mvn, the scala.version from pom.xml is used to determine which Scala library to fetch, but it doesn't seem to use the value under the scala-2.12 profile even if that is set.

The impact is that the maven build still uses scala-library 2.11.12 and compilation fails.

Am I missing a step?

> Unable to build Spark for Scala 2.12 with Maven script provided
> ---------------------------------------------------------------
>
> Key: SPARK-26027
> URL: https://issues.apache.org/jira/browse/SPARK-26027
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.0
> Reporter: Jason Moore
> Priority: Major
>
> In ./build/mvn, the scala.version from pom.xml is used to determine which
> Scala library to fetch, but it doesn't seem to use the value under the
> scala-2.12 profile even if that is set.
> The result is that the maven build still uses scala-library 2.11.12 and
> compilation fails.
> Am I missing a step?

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26027) Unable to build Spark for Scala 2.12 with Maven script provided
Jason Moore created SPARK-26027:
-----------------------------------

             Summary: Unable to build Spark for Scala 2.12 with Maven script provided
                 Key: SPARK-26027
                 URL: https://issues.apache.org/jira/browse/SPARK-26027
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 2.4.0
            Reporter: Jason Moore


In ./build/mvn, the scala.version from pom.xml is used to determine which Scala library to fetch, but it doesn't seem to use the value under the scala-2.12 profile even if that is set.

The impact is that the maven build still uses scala-library 2.11.12 and compilation fails.

Am I missing a step?

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20411) New features for expression.scalalang.typed
[ https://issues.apache.org/jira/browse/SPARK-20411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992517#comment-15992517 ]

Jason Moore commented on SPARK-20411:
-------------------------------------

And, ideally, anything else within org.apache.spark.sql.functions (e.g. countDistinct). We're looking to replace our use of DataFrames with Datasets, which means finding a replacement for all the aggregation functions that we use. If I end up putting together some functions myself, I'll pop back here to contribute them.

> New features for expression.scalalang.typed
> -------------------------------------------
>
> Key: SPARK-20411
> URL: https://issues.apache.org/jira/browse/SPARK-20411
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1, 2.1.0
> Reporter: Loic Descotte
> Priority: Minor
>
> In Spark 2 it is possible to use typed expressions for aggregation methods:
> {code}
> import org.apache.spark.sql.expressions.scalalang._
>
> dataset.groupByKey(_.productId).agg(typed.sum[Token](_.score)).toDF("productId", "sum").orderBy('productId).show
> {code}
> It seems that only avg, count and sum are defined:
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/expressions/scalalang/typed.html
> It is very nice to be able to use a typesafe DSL, but it would be good to
> have more possibilities, like min and max functions.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
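In the meantime, missing typed aggregations can be written against Spark's public org.apache.spark.sql.expressions.Aggregator API, which is what typed.sum and friends wrap. The following is only a sketch of what a typed "min" could look like (the class name is made up, and the Token usage at the end assumes the case class from the issue description):

{code:scala}
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Sketch of a typed "min" in the spirit of typed.sum/avg/count.
// Double.PositiveInfinity serves as the identity element for min.
class TypedMin[IN](val f: IN => Double) extends Aggregator[IN, Double, Double] {
  override def zero: Double = Double.PositiveInfinity
  override def reduce(b: Double, a: IN): Double = math.min(b, f(a))
  override def merge(b1: Double, b2: Double): Double = math.min(b1, b2)
  override def finish(reduction: Double): Double = reduction
  override def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  override def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Hypothetical usage, mirroring the typed.sum example above:
// dataset.groupByKey(_.productId)
//        .agg(new TypedMin[Token](_.score).toColumn)
{code}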
[jira] [Commented] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data
[ https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875115#comment-15875115 ]

Jason Moore commented on SPARK-18105:
-------------------------------------

I've hit the same using a very recent build from branch-2.1 (b083ec5115f53a79ac54b85024c358510a03a459).

> LZ4 failed to decompress a stream of shuffled data
> --------------------------------------------------
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Davies Liu
> Assignee: Davies Liu
>
> When lz4 is used to compress the shuffle files, it may fail to decompress it
> as "stream is corrupt"
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
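One mitigation sometimes used while an LZ4 decompression bug is outstanding (my suggestion, not something endorsed in this ticket) is to switch Spark's I/O compression codec away from lz4 entirely; snappy and lzf are the other codecs Spark ships with:

{code:scala}
import org.apache.spark.SparkConf

// Workaround sketch only: write shuffle blocks with snappy instead of lz4.
// This sidesteps the "Stream is corrupted" failure rather than fixing it,
// and trades away lz4's compression/speed characteristics.
val conf = new SparkConf()
  .setAppName("lz4-workaround")
  .set("spark.io.compression.codec", "snappy")
{code}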
[jira] [Commented] (SPARK-17436) dataframe.write sometimes does not keep sorting
[ https://issues.apache.org/jira/browse/SPARK-17436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836847#comment-15836847 ]

Jason Moore commented on SPARK-17436:
-------------------------------------

Ahh, I think you are correct. The issue on the write seems to be resolved as of 2.1+ (checked using parquet-tools). On read, some optimizations were made in https://github.com/apache/spark/pull/12095 that mean some files may get merged into a single partition (which will almost certainly not be sorted). I mentioned the issue on the mailing list, but had missed the (useful) responses: http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-within-partitions-is-not-maintained-in-parquet-td18618.html

If I set spark.sql.files.openCostInBytes very high, the re-read test now seems fine. Thanks!

> dataframe.write sometimes does not keep sorting
> -----------------------------------------------
>
> Key: SPARK-17436
> URL: https://issues.apache.org/jira/browse/SPARK-17436
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1, 1.6.2, 2.0.0
> Reporter: Ran Haim
> Priority: Minor
>
> update
> ***
> It seems that in spark 2.1 code, the sorting issue is resolved.
> The sorter does consider inner sorting in the sorting key - but I think it
> will be faster to just insert the rows to a list in a hash map.
> ***
> When using partition by, datawriter can sometimes mess up an ordered
> dataframe.
> The problem originates in
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.
> In the writeRows method when too many files are opened (configurable), it
> starts inserting rows to UnsafeKVExternalSorter, then it reads all the rows
> again from the sorter and writes them to the corresponding files.
> The problem is that the sorter actually sorts the rows using the partition
> key, and that can sometimes mess up the original sort (or secondary sort if
> you will).
> I think the best way to fix it is to stop using a sorter, and just put the
> rows in a map using key as partition key and value as an arraylist, and then
> just walk through all the keys and write it in the original order - this will
> probably be faster as there no need for ordering.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
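The openCostInBytes workaround mentioned in the comment above can be applied per-session; the exact value below is arbitrary (my assumption is that anything larger than the files involved prevents them from being packed together on read):

{code:scala}
// Discourage Spark from coalescing multiple small parquet files into a single
// read partition (which would concatenate, and hence un-sort, their rows).
// A huge "open cost" makes each file look expensive enough to keep on its own.
spark.conf.set("spark.sql.files.openCostInBytes", Long.MaxValue.toString)

val reread = spark.read.parquet("maybeSorted.parquet")
{code}

Note this is a blunt instrument: it also defeats the small-files optimization that PR 12095 introduced, so expect more, smaller partitions.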
[jira] [Commented] (SPARK-17436) dataframe.write sometimes does not keep sorting
[ https://issues.apache.org/jira/browse/SPARK-17436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830889#comment-15830889 ] Jason Moore commented on SPARK-17436: - Hi [~ran.h...@optimalplus.com], [~srowen], How sure are we that this is not a problem anymore? I've been testing some of my application code that relies on preserving ordering (for performance) on 2.1 (build from head of branch-2.1) and it still seems that after sorting within partitions, saving to parquet, then re-reading from parquet, that the rows are not correctly ordered. {code} val random = new scala.util.Random val raw = sc.parallelize((1 to 100).map { _ => random.nextInt }).toDF("A").repartition(100) raw.sortWithinPartitions("A").write.mode("overwrite").parquet("maybeSorted.parquet") val maybeSorted = spark.read.parquet("maybeSorted.parquet") val isSorted = maybeSorted.mapPartitions { rows => val isSorted = rows.map(_.getInt(0)).sliding(2).forall { case List(x, y) => x <= y } Iterator(isSorted) }.reduce(_ && _) {code} > dataframe.write sometimes does not keep sorting > --- > > Key: SPARK-17436 > URL: https://issues.apache.org/jira/browse/SPARK-17436 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Ran Haim >Priority: Minor > > update > *** > It seems that in spark 2.1 code, the sorting issue is resolved. > The sorter does consider inner sorting in the sorting key - but I think it > will be faster to just insert the rows to a list in a hash map. > *** > When using partition by, datawriter can sometimes mess up an ordered > dataframe. > The problem originates in > org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer. > In the writeRows method when too many files are opened (configurable), it > starts inserting rows to UnsafeKVExternalSorter, then it reads all the rows > again from the sorter and writes them to the corresponding files. 
> The problem is that the sorter actually sorts the rows using the partition > key, and that can sometimes mess up the original sort (or secondary sort if > you will).
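The fix sketched in the quoted description - replace the sorter with a map from partition key to a buffer of rows, then write each bucket out in its original order - can be illustrated in plain Scala. The key and row types below are simplified stand-ins for Spark's internal row machinery, not the actual DynamicPartitionWriterContainer code:

```scala
import scala.collection.mutable.{ArrayBuffer, LinkedHashMap}

// Simplified stand-in: each "row" is a (partitionKey, payload) pair.
// Appending into per-key buffers preserves the original row order within
// every partition key, unlike a sort keyed only on the partition key.
def groupByPartitionKey[K, V](rows: Seq[(K, V)]): Seq[(K, Seq[V])] = {
  val buckets = LinkedHashMap.empty[K, ArrayBuffer[V]]
  for ((key, value) <- rows)
    buckets.getOrElseUpdate(key, ArrayBuffer.empty[V]) += value
  // One "file" per key, rows inside each in arrival order
  buckets.iterator.map { case (k, vs) => (k, vs.toSeq) }.toSeq
}
```

Whether this is actually faster than the sorter depends on the number of open files and available memory, but it trivially preserves intra-partition order.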
[jira] [Commented] (SPARK-17195) Dealing with JDBC column nullability when it is not reliable
[ https://issues.apache.org/jira/browse/SPARK-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453801#comment-15453801 ] Jason Moore commented on SPARK-17195: - That's right, and I totally agree that's where the fix needs to be. And I'm pressing them to make this fix. I guess that means that this ticket can be closed, as it seems a reasonable workaround within Spark itself isn't possible. Once the TD driver has been fixed I'll return here to mention the version it is fixed in.
[jira] [Commented] (SPARK-17195) Dealing with JDBC column nullability when it is not reliable
[ https://issues.apache.org/jira/browse/SPARK-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432314#comment-15432314 ] Jason Moore commented on SPARK-17195: - That's right. The JDBC API has ResultSetMetaData.isNullable returning:
* ResultSetMetaData.columnNoNulls (= 0), meaning the column does not allow NULL values
* ResultSetMetaData.columnNullable (= 1), meaning the column allows NULL values
* ResultSetMetaData.columnNullableUnknown (= 2), meaning the nullability of the column's values is unknown
In Spark we take this result and do as you've described: if something is not non-null, then it is nullable. See the first link in the ticket description above.
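The mapping described in the comment above - anything other than an explicit columnNoNulls is treated as nullable - can be written out using the standard `java.sql` constants (this is a sketch of the rule, not a copy of Spark's JDBCRDD code):

```scala
import java.sql.ResultSetMetaData

// Spark's JDBC schema resolution treats any column that is not explicitly
// columnNoNulls as nullable, which covers both columnNullable and
// columnNullableUnknown.
def isNullable(isNullableCode: Int): Boolean =
  isNullableCode != ResultSetMetaData.columnNoNulls
```

This is why an unreliable driver is only a problem in one direction: a spurious columnNullable is harmless, but a spurious columnNoNulls produces nullable = false and the NullPointerException in the ticket description.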
[jira] [Commented] (SPARK-17195) Dealing with JDBC column nullability when it is not reliable
[ https://issues.apache.org/jira/browse/SPARK-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432285#comment-15432285 ] Jason Moore commented on SPARK-17195: - Correct, currently Spark doesn't allow us to override what the driver determines as the schema to apply. It wasn't that I was able to control this in the past, but that everything worked fine regardless of the value of StructField.nullable. But now in 2.0, it appears to me that the new code generation will break when processing a field (in my case a String) that is expected to be non-null but is null. It's definitely not Spark's fault (I hope I've made that clear enough); the blame lies either in the JDBC driver, or further downstream in the database server itself. Over on the Teradata forums (link in the description above) I'm raising with their team the possibility that they could mark the column as "columnNullableUnknown", which Spark will then map to nullable=true. We'll see if they can manage to do that, and then I hope my problem will be solved.
[jira] [Commented] (SPARK-17195) Dealing with JDBC column nullability when it is not reliable
[ https://issues.apache.org/jira/browse/SPARK-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432129#comment-15432129 ] Jason Moore commented on SPARK-17195: - > I think it might be sensible to support this by implementing > `SchemaRelationProvider`. That was one of the thoughts I had too. > it should be fixed in Teradata I more or less agree with this, but I'm not certain their JDBC driver is being provided with everything it needs from the server to decide that it should be "columnNullableUnknown". Maybe I'll shoot some more questions their way on this.
[jira] [Created] (SPARK-17195) Dealing with JDBC column nullability when it is not reliable
Jason Moore created SPARK-17195: --- Summary: Dealing with JDBC column nullability when it is not reliable Key: SPARK-17195 URL: https://issues.apache.org/jira/browse/SPARK-17195 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Jason Moore Starting with Spark 2.0.0, the column "nullable" property is important to have correct for the code generation to work properly. Marking the column as nullable = false used to (<2.0.0) allow null values to be operated on, but now this will result in: {noformat} Caused by: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) {noformat} I'm all for the change towards more rigid behavior (enforcing correct input). But the problem I'm facing now is that when I use JDBC to read from a Teradata server, the column nullability is often not correct (particularly when sub-queries are involved). This is the line in question: https://github.com/apache/spark/blob/v2.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L140 I'm trying to work out what would be the way forward for me on this. I know that it's really the fault of the Teradata database server not returning the correct schema, but I'll need to make Spark itself or my application resilient to this behavior.
One of the Teradata JDBC Driver tech leads has told me that "when the rsmd.getSchemaName and rsmd.getTableName methods return an empty zero-length string, then the other metadata values may not be completely accurate" - so one option could be to treat the nullability (at least) the same way as the "unknown" case (as nullable = true). For reference, see the rest of our discussion here: http://forums.teradata.com/forum/connectivity/teradata-jdbc-driver-returns-the-wrong-schema-column-nullability Any other thoughts?
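The workaround suggested above - distrust the reported nullability whenever the driver signals unreliable metadata via empty schema/table names - could look something like the following. This is a hypothetical sketch, not Spark's actual code; `effectiveNullable` is an invented helper name:

```scala
import java.sql.ResultSetMetaData

// Hypothetical workaround: per the Teradata tech lead's comment, empty
// rsmd.getSchemaName/rsmd.getTableName values mean the rest of the metadata
// may be inaccurate, so fall back to nullable = true (the same treatment as
// columnNullableUnknown). Otherwise apply the usual mapping.
def effectiveNullable(schemaName: String, tableName: String, isNullableCode: Int): Boolean = {
  val metadataReliable = schemaName.nonEmpty && tableName.nonEmpty
  if (!metadataReliable) true
  else isNullableCode != ResultSetMetaData.columnNoNulls
}
```

Erring on the side of nullable = true only costs some codegen optimization; erring on the side of nullable = false costs correctness.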
[jira] [Commented] (SPARK-17022) Potential deadlock in driver handling message
[ https://issues.apache.org/jira/browse/SPARK-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418470#comment-15418470 ] Jason Moore commented on SPARK-17022: - This one is perhaps related to SPARK-16533 and/or SPARK-16702, right? My team works in an environment where preemption (and killing of executors) is a common occurrence, so we have been burnt a bit by this one. We had been putting together a patch, but I'll see how this one holds up. > Potential deadlock in driver handling message > - > > Key: SPARK-17022 > URL: https://issues.apache.org/jira/browse/SPARK-17022 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 2.0.0 >Reporter: Tao Wang >Assignee: Tao Wang >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Suppose t1 < t2 < t3. > At t1, someone calls YarnSchedulerBackend.doRequestTotalExecutors from one > of three functions: CoarseGrainedSchedulerBackend.killExecutors, > CoarseGrainedSchedulerBackend.requestTotalExecutors or > CoarseGrainedSchedulerBackend.requestExecutors, all of which hold the > lock `CoarseGrainedSchedulerBackend`. > YarnSchedulerBackend.doRequestTotalExecutors then sends a > RequestExecutors message to `yarnSchedulerEndpoint` and waits for a reply. > At t2, someone sends a RemoveExecutor message to `yarnSchedulerEndpoint`, and the > message is received by the endpoint. > At t3, the RequestExecutors message sent at t1 is received by the endpoint. > The endpoint therefore handles the RemoveExecutor message before the RequestExecutors > message. > When handling RemoveExecutor, it sends the same message to > `driverEndpoint` and waits for a reply. > In `driverEndpoint`, handling that message requires the lock > `CoarseGrainedSchedulerBackend`, which has been held since t1. > So it causes a deadlock. > We have found the issue in our deployment; it blocks the driver from handling > any messages until both messages time out.
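The deadlock above follows the classic shape of blocking on a remote reply while holding a lock the reply handler also needs. A minimal sketch of the usual fix - mutate state inside the critical section, then do the blocking send outside it - using simplified stand-in names for Spark's internals:

```scala
// Sketch only, not the actual Spark fix. The hazardous shape holds `lock`
// across the blocking send(); the reply can only be produced by a handler
// that itself needs `lock`, so both sides wait forever.
class TinyBackend {
  private val lock = new Object
  private var requestedTotal = 0

  // Fixed shape: keep the critical section small, make the blocking call
  // (simulated here by the `send` callback) with no lock held.
  def requestTotalExecutors(n: Int)(send: Int => Boolean): Boolean = {
    lock.synchronized { requestedTotal = n } // state update under the lock
    send(n)                                  // blocking RPC made lock-free
  }

  def currentTotal: Int = lock.synchronized { requestedTotal }
}
```

The alternative fix, reordering so the endpoint never takes the same lock while replying, is harder to maintain because the lock-acquisition order then has to hold across every message type.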
[jira] [Updated] (SPARK-14915) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[ https://issues.apache.org/jira/browse/SPARK-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Moore updated SPARK-14915: Affects Version/s: 2.0.0 1.5.3 > Tasks that fail due to CommitDeniedException (a side-effect of speculation) > can cause job to never complete > --- > > Key: SPARK-14915 > URL: https://issues.apache.org/jira/browse/SPARK-14915 > Project: Spark > Issue Type: Bug >Affects Versions: 1.5.3, 1.6.2, 2.0.0 >Reporter: Jason Moore >Assignee: Jason Moore >Priority: Critical > Fix For: 2.0.0 > > > In SPARK-14357, code was corrected towards the originally intended behavior > that a CommitDeniedException should not count towards the failure count for a > job. After having run with this fix for a few weeks, it's become apparent > that this behavior has some unintended consequences - a speculative task > will continuously receive a CDE from the driver, now causing it to fail and > retry over and over without limit. > I'm thinking we could put a task that receives a CDE from the driver into > TaskState.FINISHED or some other state to indicate that the task shouldn't > be resubmitted by the TaskScheduler. I'd probably need some opinions on > whether there are other consequences of doing something like this.
[jira] [Commented] (SPARK-14915) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[ https://issues.apache.org/jira/browse/SPARK-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261252#comment-15261252 ] Jason Moore commented on SPARK-14915: - That's exactly my current thinking too. But even if we keep allowing some tasks to be retried without limit in certain contexts (the two I'm currently aware of are commit denied on speculative tasks, and an executor lost because of a YARN de-allocation), it does seem that the commit denial often happens when another copy has already succeeded. I'm about to do some testing on this now, and on not re-queuing in this scenario.
[jira] [Commented] (SPARK-14915) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[ https://issues.apache.org/jira/browse/SPARK-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259920#comment-15259920 ] Jason Moore commented on SPARK-14915: - Could I get thoughts on this: at [TaskSetManager.scala#L723|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L723] a call is made to addPendingTask after a task has failed. I can think of a scenario where it might be a good idea not to add the task back into the pending queue: when success(index) == true (which implies that another copy of the task has already succeeded). I'm soon going to test it out with that condition, as I think it's quite possibly what is causing tasks to continually re-queue after a CDE until the stage has completed (further lengthening the duration of the stage, as those copies take up execution resources).
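The guard proposed in the comment above - only re-queue a failed task when no other copy of it has already succeeded - can be sketched with a toy task set. The class below is a hypothetical stand-in, not the actual TaskSetManager code:

```scala
import scala.collection.mutable

// Toy model of the proposed change: `successful(index)` mirrors
// TaskSetManager's success bookkeeping, and handleFailure skips the
// addPendingTask call when a sibling copy of the task has already won.
class TinyTaskSet(numTasks: Int) {
  private val successful = Array.fill(numTasks)(false)
  val pending = mutable.Queue.empty[Int]

  def handleSuccess(index: Int): Unit = successful(index) = true

  def handleFailure(index: Int): Unit =
    if (!successful(index)) pending.enqueue(index) // don't re-queue a task a copy already finished
}
```

Under this model a speculative copy that dies with a CommitDeniedException after the original commits simply disappears instead of cycling through the queue for the rest of the stage.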
[jira] [Created] (SPARK-14915) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
Jason Moore created SPARK-14915: --- Summary: Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete Key: SPARK-14915 URL: https://issues.apache.org/jira/browse/SPARK-14915 Project: Spark Issue Type: Bug Affects Versions: 1.6.2 Reporter: Jason Moore Priority: Critical In SPARK-14357, code was corrected towards the originally intended behavior that a CommitDeniedException should not count towards the failure count for a job. After having run with this fix for a few weeks, it's become apparent that this behavior has some unintended consequences - a speculative task will continuously receive a CDE from the driver, now causing it to fail and retry over and over without limit. I'm thinking we could put a task that receives a CDE from the driver into TaskState.FINISHED or some other state to indicate that the task shouldn't be resubmitted by the TaskScheduler. I'd probably need some opinions on whether there are other consequences of doing something like this.
[jira] [Resolved] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[ https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Moore resolved SPARK-14357. - Resolution: Fixed Issue resolved by pull request 12228 https://github.com/apache/spark/pull/12228 > Tasks that fail due to CommitDeniedException (a side-effect of speculation) > can cause job failure > - > > Key: SPARK-14357 > URL: https://issues.apache.org/jira/browse/SPARK-14357 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.0, 1.6.1 >Reporter: Jason Moore >Assignee: Jason Moore >Priority: Critical > Fix For: 1.6.2, 2.0.0, 1.5.2 > > > Speculation can often result in a CommitDeniedException, but ideally this > shouldn't result in the job failing. So changes were made along with > SPARK-8167 to ensure that the CommitDeniedException is caught and given a > failure reason that doesn't increment the failure count. > However, I'm still noticing that this exception is causing jobs to fail using > the 1.6.1 release version. > {noformat} > 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in > stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage > 315.0 (TID 100793, qaphdd099.quantium.com.au.local): > org.apache.spark.SparkException: Task failed while writing rows. 
> at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Failed to commit task > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267) > ... 8 more > Caused by: org.apache.spark.executor.CommitDeniedException: > attempt_201604041136_0315_m_18_8: Not committed because the driver did > not authorize commit > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282) > ... 
9 more > {noformat} > It seems to me that the CommitDeniedException gets wrapped into a > RuntimeException at > [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286] > and then into a SparkException at > [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154] > which results in it not being able to be handled properly at > [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290] > The solution might be that this catch block should type match on the > inner-most cause of an error?
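The suggestion at the end of the description - type-match on the inner-most cause rather than the outermost exception - amounts to walking the `getCause` chain before classifying the failure. A sketch, using a local stand-in class rather than Spark's real CommitDeniedException:

```scala
import scala.annotation.tailrec

// Local stand-in for org.apache.spark.executor.CommitDeniedException.
class CommitDenied(msg: String) extends RuntimeException(msg)

// Walk to the inner-most cause; guard against null and self-referential
// cause chains, both of which java.lang.Throwable permits.
@tailrec
def rootCause(t: Throwable): Throwable =
  if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)

// The classification the comment proposes: a failure whose root cause is a
// commit denial should not increment the task-failure count.
def countsTowardsFailures(t: Throwable): Boolean =
  !rootCause(t).isInstanceOf[CommitDenied]
```

This makes the classification robust to the RuntimeException and SparkException wrapping described above, at the cost of potentially misclassifying an unrelated error that merely has a commit denial buried in its chain.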
[jira] [Updated] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[ https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Moore updated SPARK-14357: Component/s: Spark Core > Tasks that fail due to CommitDeniedException (a side-effect of speculation) > can cause job failure > - > > Key: SPARK-14357 > URL: https://issues.apache.org/jira/browse/SPARK-14357 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.0, 1.6.1 >Reporter: Jason Moore >Priority: Critical > > Speculation can often result in a CommitDeniedException, but ideally this > shouldn't result in the job failing. So changes were made along with > SPARK-8167 to ensure that the CommitDeniedException is caught and given a > failure reason that doesn't increment the failure count. > However, I'm still noticing that this exception is causing jobs to fail using > the 1.6.1 release version. > {noformat} > 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in > stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage > 315.0 (TID 100793, qaphdd099.quantium.com.au.local): > org.apache.spark.SparkException: Task failed while writing rows. 
> at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Failed to commit task > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267) > ... 8 more > Caused by: org.apache.spark.executor.CommitDeniedException: > attempt_201604041136_0315_m_18_8: Not committed because the driver did > not authorize commit > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282) > ... 
> 9 more
> {noformat}
> It seems to me that the CommitDeniedException gets wrapped into a RuntimeException at [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286] and then into a SparkException at [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154], so it cannot be handled properly at [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290]. The solution might be for this catch block to type match on the inner-most cause of the error. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
Jason Moore created SPARK-14357: --- Summary: Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure Key: SPARK-14357 URL: https://issues.apache.org/jira/browse/SPARK-14357 Project: Spark Issue Type: Bug Affects Versions: 1.6.1, 1.6.0, 1.5.2 Reporter: Jason Moore Priority: Critical Speculation can often result in a CommitDeniedException, but ideally this shouldn't result in the job failing. So changes were made along with SPARK-8167 to ensure that the CommitDeniedException is caught and given a failure reason that doesn't increment the failure count. However, I'm still noticing that this exception is causing jobs to fail using the 1.6.1 release version. {noformat} 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job. org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 315.0 (TID 100793, qaphdd099.quantium.com.au.local): org.apache.spark.SparkException: Task failed while writing rows. 
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
... 8 more
Caused by: org.apache.spark.executor.CommitDeniedException: attempt_201604041136_0315_m_18_8: Not committed because the driver did not authorize commit
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
...
9 more
{noformat}
It seems to me that the CommitDeniedException gets wrapped into a RuntimeException at [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286] and then into a SparkException at [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154], so it cannot be handled properly at [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290]. The solution might be for this catch block to type match on the inner-most cause of the error.
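For illustration only, a hedged sketch (hypothetical helper, not Spark's actual code) of what matching on the inner-most cause could look like on the JVM: walk the Throwable cause chain before classifying the failure, so wrappers like RuntimeException and SparkException do not hide a CommitDeniedException.

```java
// Hypothetical sketch, not actual Spark code: unwrap a Throwable to its
// inner-most cause so wrapper exceptions do not hide the real failure.
public class RootCause {
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        // Stop when there is no deeper cause, or the chain loops on itself.
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        // Mirror the nesting seen in the stack trace above:
        // outer wrapper <- "Failed to commit task" <- inner-most cause.
        Throwable inner = new IllegalStateException("driver did not authorize commit");
        Throwable wrapped = new RuntimeException("Failed to commit task", inner);
        System.out.println(rootCause(wrapped).getClass().getSimpleName());
        // prints IllegalStateException
    }
}
```

The catch block in Executor.scala could then match on `rootCause(t)` rather than `t` itself; whether that is the right fix is a design question for the Spark maintainers.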
[jira] [Created] (SPARK-8892) Column.cast(LongType) does not work for large values
Jason Moore created SPARK-8892: -- Summary: Column.cast(LongType) does not work for large values Key: SPARK-8892 URL: https://issues.apache.org/jira/browse/SPARK-8892 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Jason Moore It seems that casting a column from String to Long goes through an intermediate step of being cast to a Double (hits Cast.scala line 328 in castToDecimal). The result is that for large values, the wrong value is returned. This test reveals this bug:
{code:java}
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FlatSpec

import scala.util.Random

class DataFrameCastBug extends FlatSpec {
  "DataFrame" should "cast StringType to LongType correctly" in {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("app"))
    val qc = new SQLContext(sc)

    val values = Seq.fill(10)(Random.nextLong)
    val source = qc.createDataFrame(
      sc.parallelize(values.map(v => Row(v))),
      StructType(Seq(StructField("value", LongType))))

    val result = source.select(
      source("value"),
      source("value").cast(StringType).cast(LongType).as("castValue"))

    assert(result.where(result("value") !== result("castValue")).count() === 0)
  }
}
{code}
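Independent of Spark, the failure mode described above is the 53-bit double mantissa: any long whose magnitude exceeds 2^53 cannot round-trip through a double exactly. A minimal plain-Java demonstration (the specific large value is chosen here purely for illustration):

```java
public class LongDoubleRoundTrip {
    public static void main(String[] args) {
        // 2^53 is the threshold below which every long is exactly
        // representable as a double (53-bit mantissa).
        long exact = 1L << 53;                // 9007199254740992
        long tooBig = 1234567890123456789L;   // well above 2^53

        // Round-tripping through double preserves values up to 2^53...
        System.out.println(exact == (long) (double) exact);    // prints true
        // ...but silently alters larger ones, which is the wrong-value
        // symptom reported in this ticket.
        System.out.println(tooBig == (long) (double) tooBig);  // prints false
    }
}
```

This is why a String-to-Long cast that detours through a floating-point type corrupts large values even though each individual conversion is "successful".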