[jira] [Commented] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks
[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841339#comment-17841339 ]

PJ Fanning commented on SPARK-47959:

[~zshao] if you have a test environment, could you try it with the 2.18.0-SNAPSHOT Jackson jars to see if they help?

> Improve GET_JSON_OBJECT performance on executors running multiple tasks
> ---
>
> Key: SPARK-47959
> URL: https://issues.apache.org/jira/browse/SPARK-47959
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.5.1
> Reporter: Zheng Shao
> Priority: Major
>
> We have a Spark executor that is running 32 workers in parallel. The query
> is a simple SELECT with several `GET_JSON_OBJECT` UDF calls.
> We noticed that 80+% of the worker threads' stack traces were blocked on
> the following frames:
>
> {code:java}
> com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - blocked on java.lang.Object@7529fde1
> com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
> com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown Source)
> ...
> {code}
>
> Apparently jackson-core has had this performance bug from version 2.3 to 2.15,
> and it is not fixed until version 2.18 (unreleased):
> [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50]
>
> {code:java}
> synchronized (lock) {
>     if (size() >= MAX_ENTRIES) {
>         clear();
>     }
> }
> {code}
>
> instead of
> [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59]
>
> {code:java}
> /* As of 2.18, the limit is not strictly enforced, but we do try to
>  * clear entries if we have reached the limit. We do not expect to
>  * go too much over the limit, and if we do, it's not a huge problem.
>  * If some other thread has the lock, we will not clear but the lock
>  * should not be held for long, so another thread should be able to
>  * clear in the near future.
>  */
> if (lock.tryLock()) {
>     try {
>         if (size() >= DEFAULT_MAX_ENTRIES) {
>             clear();
>         }
>     } finally {
>         lock.unlock();
>     }
> }
> {code}
>
> Potential fixes:
> # Upgrade to jackson-core 2.18 when it is released;
> # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't totally understand the options suggested by this thread yet;
> # Introduce a new UDF that doesn't depend on jackson-core.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
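The difference between the two jackson-core versions quoted above boils down to "synchronized clear" versus "tryLock clear". A minimal, self-contained sketch of the 2.18-style pattern (the class and constant names below are illustrative stand-ins, not Jackson's actual InternCache code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an interning cache, showing the 2.18-style
// non-blocking eviction. Pre-2.18, the size check sat inside a
// synchronized block, so every parser thread serialized on one monitor.
public class BoundedInternCache {
    private static final int MAX_ENTRIES = 180; // small bound, as in Jackson

    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    public String intern(String key) {
        String existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        // Only the thread that wins tryLock pays for the clear; the losers
        // skip eviction and proceed without parking on a monitor.
        if (map.size() >= MAX_ENTRIES && lock.tryLock()) {
            try {
                if (map.size() >= MAX_ENTRIES) {
                    map.clear();
                }
            } finally {
                lock.unlock();
            }
        }
        String interned = key.intern();
        map.putIfAbsent(interned, interned);
        return interned;
    }

    public int size() {
        return map.size();
    }
}
```

Under contention, threads that lose the tryLock race simply skip the eviction instead of blocking, which is why the 2.18 change removes the `blocked on java.lang.Object` frames seen in the stack trace above.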
[jira] [Commented] (SPARK-35253) Upgrade Janino from 3.0.16 to 3.1.4
[ https://issues.apache.org/jira/browse/SPARK-35253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739895#comment-17739895 ]

PJ Fanning commented on SPARK-35253:

Janino 3.1.10 is out today and resolves [https://github.com/janino-compiler/janino/issues/201] - which may be an issue if you have to parse input that might not be entirely trustworthy. It appears that in trunk, Spark already uses 3.1.9. If this issue can be closed, I can raise a separate issue about a further upgrade to 3.1.10.

> Upgrade Janino from 3.0.16 to 3.1.4
> ---
>
> Key: SPARK-35253
> URL: https://issues.apache.org/jira/browse/SPARK-35253
> Project: Spark
> Issue Type: Improvement
> Components: Build, SQL
> Affects Versions: 3.2.0
> Reporter: Yang Jie
> Priority: Minor
>
> From the [change log|http://janino-compiler.github.io/janino/changelog.html],
> the janino 3.0.x line has been deprecated; we can use the 3.1.x line instead.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42854) Jackson 2.15
[ https://issues.apache.org/jira/browse/SPARK-42854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning resolved SPARK-42854.
Resolution: Duplicate

> Jackson 2.15
>
> Key: SPARK-42854
> URL: https://issues.apache.org/jira/browse/SPARK-42854
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output
> Affects Versions: 3.4.1
> Reporter: PJ Fanning
> Priority: Major
>
> I'm not yet advocating for an upgrade to [Jackson 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15].
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security-focused enhancements, including a new class called
> StreamReadConstraints. The defaults on
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
> are pretty high, but it is not inconceivable that some Spark users might need
> to relax them. Parsing large strings as numbers can be quadratic in the input
> length, hence the default limit of 1000 chars or bytes (depending on input context).
> When the Spark team consider upgrading to Jackson 2.15 or above, you might
> also want to consider adding some way for users to configure the
> StreamReadConstraints.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43263) Upgrade FasterXML jackson to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-43263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715949#comment-17715949 ]

PJ Fanning commented on SPARK-43263:

This is a duplicate of SPARK-42854, and the points made in SPARK-42854 should not be disregarded.

> Upgrade FasterXML jackson to 2.15.0
> ---
>
> Key: SPARK-43263
> URL: https://issues.apache.org/jira/browse/SPARK-43263
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Bjørn Jørgensen
> Priority: Major
>
> * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471]) (contributed by @pjfannin

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42854) Jackson 2.15
[ https://issues.apache.org/jira/browse/SPARK-42854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-42854:
---
Description:
I'm not yet advocating for an upgrade to [Jackson 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 2.15.0-rc1 has just been released and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called StreamReadConstraints. The defaults on [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html] are pretty high, but it is not inconceivable that some Spark users might need to relax them. Parsing large strings as numbers can be quadratic in the input length, hence the default limit of 1000 chars or bytes (depending on input context).

When the Spark team consider upgrading to Jackson 2.15 or above, you might also want to consider adding some way for users to configure the StreamReadConstraints.

was:
I'm not yet advocating for an upgrade to Jackson 2.15. 2.15.0-rc1 has just been released and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called StreamReadConstraints. The defaults on [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html] are pretty high, but it is not inconceivable that some Spark users might need to relax them. Parsing large strings as numbers can be quadratic in the input length, hence the default limit of 1000 chars or bytes (depending on input context).

When the Spark team consider upgrading to Jackson 2.15 or above, you might also want to consider adding some way for users to configure the StreamReadConstraints.

> Jackson 2.15
>
> Key: SPARK-42854
> URL: https://issues.apache.org/jira/browse/SPARK-42854
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output
> Affects Versions: 3.4.1
> Reporter: PJ Fanning
> Priority: Major
>
> I'm not yet advocating for an upgrade to [Jackson 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15].
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security-focused enhancements, including a new class called
> StreamReadConstraints. The defaults on
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
> are pretty high, but it is not inconceivable that some Spark users might need
> to relax them. Parsing large strings as numbers can be quadratic in the input
> length, hence the default limit of 1000 chars or bytes (depending on input context).
> When the Spark team consider upgrading to Jackson 2.15 or above, you might
> also want to consider adding some way for users to configure the
> StreamReadConstraints.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42854) Jackson 2.15
[ https://issues.apache.org/jira/browse/SPARK-42854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-42854:
---
Description:
I'm not yet advocating for an upgrade to Jackson 2.15. 2.15.0-rc1 has just been released and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called StreamReadConstraints. The defaults on [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html] are pretty high, but it is not inconceivable that some Spark users might need to relax them. Parsing large strings as numbers can be quadratic in the input length, hence the default limit of 1000 chars or bytes (depending on input context).

When the Spark team consider upgrading to Jackson 2.15 or above, you might also want to consider adding some way for users to configure the StreamReadConstraints.

was:
I'm not yet advocating for an upgrade to Jackson 2.15. 2.15.0-rc1 has just been released and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called StreamReadConstraints. The defaults on [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html] are pretty high, but it is not inconceivable that some Spark users might need to relax them. Parsing large strings as numbers can be quadratic in the input length, hence the default limit of 1000 chars or bytes (depending on input context).

> Jackson 2.15
>
> Key: SPARK-42854
> URL: https://issues.apache.org/jira/browse/SPARK-42854
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output
> Affects Versions: 3.4.1
> Reporter: PJ Fanning
> Priority: Major
>
> I'm not yet advocating for an upgrade to Jackson 2.15. 2.15.0-rc1 has just been
> released and 2.15.0 should be out soon.
> There are some security-focused enhancements, including a new class called
> StreamReadConstraints. The defaults on
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
> are pretty high, but it is not inconceivable that some Spark users might need
> to relax them. Parsing large strings as numbers can be quadratic in the input
> length, hence the default limit of 1000 chars or bytes (depending on input context).
> When the Spark team consider upgrading to Jackson 2.15 or above, you might
> also want to consider adding some way for users to configure the
> StreamReadConstraints.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42854) Jackson 2.15
PJ Fanning created SPARK-42854:
--
Summary: Jackson 2.15
Key: SPARK-42854
URL: https://issues.apache.org/jira/browse/SPARK-42854
Project: Spark
Issue Type: Improvement
Components: Input/Output
Affects Versions: 3.4.1
Reporter: PJ Fanning

I'm not yet advocating for an upgrade to Jackson 2.15. 2.15.0-rc1 has just been released and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called StreamReadConstraints. The defaults on [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html] are pretty high, but it is not inconceivable that some Spark users might need to relax them. Parsing large strings as numbers can be quadratic in the input length, hence the default limit of 1000 chars or bytes (depending on input context).

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
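On the configurability point raised above: with Jackson 2.15+, StreamReadConstraints are set when building the JsonFactory. The sketch below is a hedged configuration example, not Spark code; the limit values are arbitrary illustrations, and a real Spark integration would presumably wire them to SQL conf entries rather than hard-coding them.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ConstraintsConfig {
    // Build an ObjectMapper with relaxed read constraints (Jackson 2.15+).
    public static ObjectMapper relaxedMapper() {
        // Example values only; the 2.15 default for maxNumberLength is 1000.
        StreamReadConstraints constraints = StreamReadConstraints.builder()
                .maxNumberLength(10_000)      // allow longer numeric strings
                .maxStringLength(40_000_000)  // allow larger string values
                .build();
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(constraints)
                .build();
        return new ObjectMapper(factory);
    }
}
```

Constraints are per-factory, so a mapper built this way is unaffected by (and does not affect) other mappers in the same JVM.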
[jira] [Updated] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0
[ https://issues.apache.org/jira/browse/SPARK-40911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-40911:
---
Description:
The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the perf issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. I will also disable the .tasty file lookups for Scala 2.11/2.12 as they are not useful for those users. For Spark usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.

This change will require updating all Jackson jars to v2.14.0 (as Jackson does not support version mismatches, except at the patch version level).

was:
The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the jackson issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. For Scala usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.

This change will require updating all Jackson jars to v2.14.0 (as Jackson does not support version mismatches, except at the patch version level).

> Upgrade jackson-module-scala to 2.14.0
> --
>
> Key: SPARK-40911
> URL: https://issues.apache.org/jira/browse/SPARK-40911
> Project: Spark
> Issue Type: Improvement
> Components: Java API
> Affects Versions: 3.3.0
> Reporter: PJ Fanning
> Priority: Major
>
> The 2.14.0 release is still a few weeks away. There is an rc2 release but
> there will probably be an rc3 before a full release.
> The reason I marked the Jira component as 'Java API' is that this issue will
> affect Java users more than Scala users.
> I raised this separately to SPARK-40666 because I have a different reason for
> this issue. SPARK-40666 can probably already be closed as the CVE is fixed in
> jackson-databind 2.13.4.2.
> There are performance issues in jackson-module-scala 2.13.x that may affect
> some Spark users. Specifically, the perf issue is
> [https://github.com/FasterXML/jackson-module-scala/issues/576]
> Scala3 support added in jackson-module-scala 2.13.0 means that if you use
> Scala 2.13, you should be able to use Scala3 compiled classes with
[jira] [Updated] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0
[ https://issues.apache.org/jira/browse/SPARK-40911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-40911:
---
Description:
The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the jackson issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. For Scala usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.

This change will require updating all Jackson jars to v2.14.0 (as Jackson does not support version mismatches, except at the patch version level).

was:
The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the perf issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. For Scala usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.

This change will require updating all Jackson jars to v2.14.0 (as Jackson does not support version mismatches, except at the patch version level).

> Upgrade jackson-module-scala to 2.14.0
> --
>
> Key: SPARK-40911
> URL: https://issues.apache.org/jira/browse/SPARK-40911
> Project: Spark
> Issue Type: Improvement
> Components: Java API
> Affects Versions: 3.3.0
> Reporter: PJ Fanning
> Priority: Major
>
> The 2.14.0 release is still a few weeks away. There is an rc2 release but
> there will probably be an rc3 before a full release.
> The reason I marked the Jira component as 'Java API' is that this issue will
> affect Java users more than Scala users.
> I raised this separately to SPARK-40666 because I have a different reason for
> this issue. SPARK-40666 can probably already be closed as the CVE is fixed in
> jackson-databind 2.13.4.2.
> There are performance issues in jackson-module-scala 2.13.x that may affect
> some Spark users. Specifically, the jackson issue is
> [https://github.com/FasterXML/jackson-module-scala/issues/576]
> Scala3 support added in jackson-module-scala 2.13.0 means that if you use
> Scala 2.13, you should be able to use Scala3 compiled classes with
> jackson-module-scala. Scala3 compiled classes are harder to recognise using
> runtime reflection (and Jackson
[jira] [Updated] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0
[ https://issues.apache.org/jira/browse/SPARK-40911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-40911:
---
Description:
The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the perf issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. For Scala usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.

This change will require updating all Jackson jars to v2.14.0 (as Jackson does not support version mismatches, except at the patch version level).

was:
The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the perf issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. For Scala usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.

> Upgrade jackson-module-scala to 2.14.0
> --
>
> Key: SPARK-40911
> URL: https://issues.apache.org/jira/browse/SPARK-40911
> Project: Spark
> Issue Type: Improvement
> Components: Java API
> Affects Versions: 3.3.0
> Reporter: PJ Fanning
> Priority: Major
>
> The 2.14.0 release is still a few weeks away. There is an rc2 release but
> there will probably be an rc3 before a full release.
> The reason I marked the Jira component as 'Java API' is that this issue will
> affect Java users more than Scala users.
> I raised this separately to SPARK-40666 because I have a different reason for
> this issue. SPARK-40666 can probably already be closed as the CVE is fixed in
> jackson-databind 2.13.4.2.
> There are performance issues in jackson-module-scala 2.13.x that may affect
> some Spark users. Specifically, the perf issue is
> [https://github.com/FasterXML/jackson-module-scala/issues/576]
> Scala3 support added in jackson-module-scala 2.13.0 means that if you use
> Scala 2.13, you should be able to use Scala3 compiled classes with
> jackson-module-scala. Scala3 compiled classes are harder to recognise using
> runtime reflection (and Jackson is built around runtime reflection). Scala2
> compiled classes have specific annotations. With Scala3 compiled classes, we
> need to look for .tasty files.
[jira] [Created] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0
PJ Fanning created SPARK-40911:
--
Summary: Upgrade jackson-module-scala to 2.14.0
Key: SPARK-40911
URL: https://issues.apache.org/jira/browse/SPARK-40911
Project: Spark
Issue Type: Improvement
Components: Java API
Affects Versions: 3.3.0
Reporter: PJ Fanning

The 2.14.0 release is still a few weeks away. There is an rc2 release but there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for this issue. SPARK-40666 can probably already be closed as the CVE is fixed in jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect some Spark users. Specifically, the perf issue is [https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 2.13, you should be able to use Scala3 compiled classes with jackson-module-scala. Scala3 compiled classes are harder to recognise using runtime reflection (and Jackson is built around runtime reflection). Scala2 compiled classes have specific annotations. With Scala3 compiled classes, we need to look for .tasty files. This lookup can be slow if you have a lot of jars (or big jars). Issue [576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an issue where this .tasty lookup is done every time you try to serialize/deserialize a Java class with an ObjectMapper that has the DefaultScalaModule registered. For Scala usage, it may be worth turning off this .tasty file support altogether. This is another enhancement in jackson-module-scala (but not in the RC2 release). I will follow up and update this issue when the v2.14.0 release is ready.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
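The essence of the jackson-module-scala issue 576 fix described above is that an expensive per-class probe (the classpath scan for a .tasty file) was being repeated on every serialize/deserialize call instead of cached per class. A stdlib-only sketch of that caching pattern follows; the class and method names are hypothetical stand-ins, not the module's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative memoization of an expensive per-class lookup, analogous to
// the .tasty probe in jackson-module-scala issue 576.
public class Scala3ProbeCache {
    private final Map<Class<?>, Boolean> cache = new ConcurrentHashMap<>();
    private int probeCount = 0; // exposed only so the demo can count probes

    // Returns the cached answer; the expensive lookup runs at most once
    // per class, no matter how many mapper calls follow.
    public boolean isScala3Class(Class<?> clazz) {
        return cache.computeIfAbsent(clazz, c -> {
            probeCount++;
            return expensiveTastyLookup(c);
        });
    }

    // Stand-in for scanning the classpath for a matching .tasty file.
    private boolean expensiveTastyLookup(Class<?> clazz) {
        return clazz.getResource(clazz.getSimpleName() + ".tasty") != null;
    }

    public int probeCount() {
        return probeCount;
    }
}
```

`computeIfAbsent` on a ConcurrentHashMap gives a thread-safe memo per Class, which is the design choice that turns a per-call cost into a per-class one.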
[jira] [Commented] (SPARK-40666) Upgrade FasterXML jackson-databind to 2.14
[ https://issues.apache.org/jira/browse/SPARK-40666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623810#comment-17623810 ] PJ Fanning commented on SPARK-40666: The recent CVEs are fixed in Jackson 2.13.4.2 and Spark trunk branch uses that version. Can this be closed? > Upgrade FasterXML jackson-databind to 2.14 > -- > > Key: SPARK-40666 > URL: https://issues.apache.org/jira/browse/SPARK-40666 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [CVE-2022-42003|https://nvd.nist.gov/vuln/detail/CVE-2022-42003] > [Github|https://github.com/FasterXML/jackson-databind/issues/3590] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40457) upgrade jackson data mapper to latest
[ https://issues.apache.org/jira/browse/SPARK-40457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623809#comment-17623809 ] PJ Fanning commented on SPARK-40457: Maybe this could be closed as a duplicate of SPARK-30466 > upgrade jackson data mapper to latest > -- > > Key: SPARK-40457 > URL: https://issues.apache.org/jira/browse/SPARK-40457 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bilna >Priority: Major > > Upgrade jackson-mapper-asl to the latest to resolve CVE-2019-10172 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
[ https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623806#comment-17623806 ] PJ Fanning commented on SPARK-30466: An upcoming release of Hadoop 3 will remove its remaining use of the Jackson1 jars, possibly as soon as the Hadoop 3.3.5 release. > remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13 > -- > > Key: SPARK-30466 > URL: https://issues.apache.org/jira/browse/SPARK-30466 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.4, 3.0.0 >Reporter: Michael Burgener >Priority: Major > Labels: security > > These 2 libraries are deprecated and replaced by the jackson-databind > libraries which are already included. These two libraries are flagged by our > vulnerability scanners as having the following security vulnerabilities. > I've set the priority to Major due to the Critical nature and hopefully they > can be addressed quickly. Please note, I'm not a developer but work in > InfoSec and this was flagged when we incorporated spark into our product. If > you feel the priority is not set correctly please change accordingly. I'll > watch the issue and flag our dev team to update once resolved. 
> jackson-mapper-asl-1.9.13 > CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL) > [https://nvd.nist.gov/vuln/detail/CVE-2018-7489] > > CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL) > [https://nvd.nist.gov/vuln/detail/CVE-2017-7525] > > CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL) > [https://nvd.nist.gov/vuln/detail/CVE-2017-17485] > > CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL) > [https://nvd.nist.gov/vuln/detail/CVE-2017-15095] > > CVE-2018-5968 (CVSS 3.0 Score 8.1 High) > [https://nvd.nist.gov/vuln/detail/CVE-2018-5968] > > jackson-core-asl-1.9.13 > CVE-2016-7051 (CVSS 3.0 Score 8.6 High) > https://nvd.nist.gov/vuln/detail/CVE-2016-7051 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38665) upgrade jackson due to CVE-2020-36518
PJ Fanning created SPARK-38665: -- Summary: upgrade jackson due to CVE-2020-36518 Key: SPARK-38665 URL: https://issues.apache.org/jira/browse/SPARK-38665 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.1 Reporter: PJ Fanning * https://github.com/FasterXML/jackson-databind/issues/2816 * only jackson-databind has a 2.13.2.1 release * other jackson jars should stay at 2.13.2 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37630) Security issue from Log4j 1.X exploit
[ https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487275#comment-17487275 ] PJ Fanning commented on SPARK-37630: [~jinlow] there is little point commenting on this closed issue - please look at https://issues.apache.org/jira/browse/SPARK-6305 - this issue is marked as a duplicate of that one, and progress has been made on the switch to log4jv2 > Security issue from Log4j 1.X exploit > - > > Key: SPARK-37630 > URL: https://issues.apache.org/jira/browse/SPARK-37630 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.2.0 >Reporter: Ismail H >Priority: Major > Labels: security > > log4j is being used in version [1.2.17|#L122]] > > This version has been deprecated and since [then has a known issue that > hasn't been addressed in 1.X > versions|https://www.cvedetails.com/cve/CVE-2019-17571/]. > > *Solution:* > * Upgrade log4j to version 2.15.0 which corrects all known issues. [Last > known issues |https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37625) update log4j to 2.15
[ https://issues.apache.org/jira/browse/SPARK-37625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458802#comment-17458802 ] PJ Fanning commented on SPARK-37625: log4j 2.16.0 is out - it might be best to pause this as it doesn't seem urgent to change Spark > update log4j to 2.15 > - > > Key: SPARK-37625 > URL: https://issues.apache.org/jira/browse/SPARK-37625 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: weifeng zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37630) Security issue from Log4j 0day exploit
[ https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458798#comment-17458798 ] PJ Fanning commented on SPARK-37630: Maybe a duplicate of SPARK-6305 > Security issue from Log4j 0day exploit > -- > > Key: SPARK-37630 > URL: https://issues.apache.org/jira/browse/SPARK-37630 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.2.0 >Reporter: Ismail H >Priority: Major > Labels: security > > log4j is being used in version [1.2.17|#L122]] > > This version has been deprecated and since [then has a known issue that > hasn't been addressed in 1.X > versions|https://www.cvedetails.com/cve/CVE-2019-17571/]. > > *Solution:* > * Upgrade log4j to version 2.15.0 which corrects all known issues. [Last > known issues |https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27683) Remove usage of TraversableOnce
[ https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838086#comment-16838086 ] PJ Fanning edited comment on SPARK-27683 at 5/12/19 3:01 PM: - [~srowen] would it be possible to use the scala-collection-compat lib? It has a type alias `IterableOnce` that maps to `TraversableOnce` in the scala 2.11 and 2.12 versions of the lib but to the core IterableOnce in 2.13. [https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.11_2.12/scala/collection/compat/PackageShared.scala#L156] [https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.13/scala/collection/compat/package.scala#L22] The akka team created equivalent type aliases to avoid the dependency on scala-collection-compat and this approach could be used to add additional type aliases that suit Spark's requirements. was (Author: pj.fanning): [~srowen] would it be possible to use the scala-collection-compat lib? It has a type alias `IterableOnce` that maps to `TraversableOnce` in the scala 2.11 and 2.12 versions of the lib but to the core IterableOnce in 2.13. [https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.11_2.12/scala/collection/compat/PackageShared.scala#L156] [https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.13/scala/collection/compat/package.scala#L22] > Remove usage of TraversableOnce > --- > > Key: SPARK-27683 > URL: https://issues.apache.org/jira/browse/SPARK-27683 > Project: Spark > Issue Type: Sub-task > Components: ML, Spark Core, SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > As with {{Traversable}}, {{TraversableOnce}} is going away in Scala 2.13. We > should use {{IterableOnce}} instead. This one is a bigger change as there are > more API methods with the existing signature. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27683) Remove usage of TraversableOnce
[ https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838086#comment-16838086 ] PJ Fanning commented on SPARK-27683: [~srowen] would it be possible to use the scala-collection-compat lib? It has a type alias `IterableOnce` that maps to `TraversableOnce` in the scala 2.11 and 2.12 versions of the lib but to the core IterableOnce in 2.13. [https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.11_2.12/scala/collection/compat/PackageShared.scala#L156] [https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.13/scala/collection/compat/package.scala#L22] > Remove usage of TraversableOnce > --- > > Key: SPARK-27683 > URL: https://issues.apache.org/jira/browse/SPARK-27683 > Project: Spark > Issue Type: Sub-task > Components: ML, Spark Core, SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > As with {{Traversable}}, {{TraversableOnce}} is going away in Scala 2.13. We > should use {{IterableOnce}} instead. This one is a bigger change as there are > more API methods with the existing signature. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21708) use sbt 1.x
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21708: --- Summary: use sbt 1.x (was: use sbt 1.0.0) > use sbt 1.x > --- > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning >Priority: Minor > > Should improve sbt build times. > http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > According to https://github.com/sbt/sbt/issues/3424, we will need to change > the HTTP location where we get the sbt-launch jar. > Other related issues: > SPARK-14401 > https://github.com/typesafehub/sbteclipse/issues/343 > https://github.com/jrudolph/sbt-dependency-graph/issues/134 > https://github.com/AlpineNow/junit_xml_listener/issues/6 > https://github.com/spray/sbt-revolver/issues/62 > https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21708) use sbt 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124012#comment-16124012 ] PJ Fanning commented on SPARK-21708: [~srowen] Your point about IDEs is valid. IntelliJ IDEA has support (https://blog.jetbrains.com/scala/2017/07/19/intellij-idea-scala-plugin-2017-2-sbt-1-0-improved-sbt-shell-play-2-6-and-better-implicits-management/) and hopefully the sbteclipse plugin for generating eclipse workspaces from sbt files will be updated soon. All in all, there are quite a number of sbt plugins to upgrade before the sbt version can be raised to 1.0.0. So, by the time we are in a position to switch to 1.0.0, it should be easier for developers to adapt. > use sbt 1.0.0 > - > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning >Priority: Minor > > Should improve sbt build times. > http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > According to https://github.com/sbt/sbt/issues/3424, we will need to change > the HTTP location where we get the sbt-launch jar. > Other related issues: > SPARK-14401 > https://github.com/typesafehub/sbteclipse/issues/343 > https://github.com/jrudolph/sbt-dependency-graph/issues/134 > https://github.com/AlpineNow/junit_xml_listener/issues/6 > https://github.com/spray/sbt-revolver/issues/62 > https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21708) use sbt 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21708: --- Description: Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html According to https://github.com/sbt/sbt/issues/3424, we will need to change the HTTP location where we get the sbt-launch jar. Other related issues: SPARK-14401 https://github.com/typesafehub/sbteclipse/issues/343 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 was: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html Other related issues: SPARK-14401 https://github.com/sbt/sbt/issues/3424 https://github.com/typesafehub/sbteclipse/issues/343 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 > use sbt 1.0.0 > - > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning >Priority: Minor > > Should improve sbt build times. > http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > According to https://github.com/sbt/sbt/issues/3424, we will need to change > the HTTP location where we get the sbt-launch jar. 
> Other related issues: > SPARK-14401 > https://github.com/typesafehub/sbteclipse/issues/343 > https://github.com/jrudolph/sbt-dependency-graph/issues/134 > https://github.com/AlpineNow/junit_xml_listener/issues/6 > https://github.com/spray/sbt-revolver/issues/62 > https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21708) use sbt 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123451#comment-16123451 ] PJ Fanning commented on SPARK-21708: [~srowen] the build/sbt scripting will download the preferred sbt version. With a good internet connection, it takes a couple of minutes. > use sbt 1.0.0 > - > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning >Priority: Minor > > I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is > released. > Should improve sbt build times. > http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > Other related issues: > SPARK-14401 > https://github.com/sbt/sbt/issues/3424 > https://github.com/typesafehub/sbteclipse/issues/343 > https://github.com/jrudolph/sbt-dependency-graph/issues/134 > https://github.com/AlpineNow/junit_xml_listener/issues/6 > https://github.com/spray/sbt-revolver/issues/62 > https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21709) use sbt 0.13.16 and update sbt plugins
[ https://issues.apache.org/jira/browse/SPARK-21709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21709: --- Description: A preliminary step to SPARK-21708. Quite a lot of sbt plugin changes needed to get to full sbt 1.0.0 support. was: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html Other related issues: SPARK-14401 https://github.com/sbt/sbt/issues/3424 https://github.com/typesafehub/sbteclipse/issues/343 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 > use sbt 0.13.16 and update sbt plugins > -- > > Key: SPARK-21709 > URL: https://issues.apache.org/jira/browse/SPARK-21709 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning > > A preliminary step to SPARK-21708. > Quite a lot of sbt plugin changes needed to get to full sbt 1.0.0 support. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21709) use sbt 0.13.16 and update sbt plugins
PJ Fanning created SPARK-21709: -- Summary: use sbt 0.13.16 and update sbt plugins Key: SPARK-21709 URL: https://issues.apache.org/jira/browse/SPARK-21709 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.3.0 Reporter: PJ Fanning I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html Other related issues: SPARK-14401 https://github.com/sbt/sbt/issues/3424 https://github.com/typesafehub/sbteclipse/issues/343 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21708) use sbt 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21708: --- Description: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html Other related issues: SPARK-14401 https://github.com/sbt/sbt/issues/3424 https://github.com/typesafehub/sbteclipse/issues/343 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 was: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html Other related issues: SPARK-14401 https://github.com/sbt/sbt/issues/3424 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 > use sbt 1.0.0 > - > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning > > I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is > released. > Should improve sbt build times. 
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > Other related issues: > SPARK-14401 > https://github.com/sbt/sbt/issues/3424 > https://github.com/typesafehub/sbteclipse/issues/343 > https://github.com/jrudolph/sbt-dependency-graph/issues/134 > https://github.com/AlpineNow/junit_xml_listener/issues/6 > https://github.com/spray/sbt-revolver/issues/62 > https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21708) use sbt 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21708: --- Description: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html Other related issues: SPARK-14401 https://github.com/sbt/sbt/issues/3424 https://github.com/jrudolph/sbt-dependency-graph/issues/134 https://github.com/AlpineNow/junit_xml_listener/issues/6 https://github.com/spray/sbt-revolver/issues/62 https://github.com/ihji/sbt-antlr4/issues/14 was: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. https://github.com/sbt/sbt/issues/3424 Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > use sbt 1.0.0 > - > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning > > I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is > released. > Should improve sbt build times. > http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > Other related issues: > SPARK-14401 > https://github.com/sbt/sbt/issues/3424 > https://github.com/jrudolph/sbt-dependency-graph/issues/134 > https://github.com/AlpineNow/junit_xml_listener/issues/6 > https://github.com/spray/sbt-revolver/issues/62 > https://github.com/ihji/sbt-antlr4/issues/14 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14401) Switch to stock sbt-pom-reader plugin
[ https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123319#comment-16123319 ] PJ Fanning commented on SPARK-14401: This would be useful for a general upgrade to sbt 1.0.0 > Switch to stock sbt-pom-reader plugin > - > > Key: SPARK-14401 > URL: https://issues.apache.org/jira/browse/SPARK-14401 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen > > Spark currently depends on a forked version of {{sbt-pom-reader}} which we > build from source. It would be great to port our modifications to the > upstream project so that we can migrate to the official version and stop > maintaining our fork. > [~scrapco...@gmail.com], could you edit this ticket to fill in more detail > about which custom changes have not been ported yet? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21708) use sbt 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21708: --- Description: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. https://github.com/sbt/sbt/issues/3424 Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html was: I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html > use sbt 1.0.0 > - > > Key: SPARK-21708 > URL: https://issues.apache.org/jira/browse/SPARK-21708 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning > > I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is > released. https://github.com/sbt/sbt/issues/3424 > Should improve sbt build times. > http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21708) use sbt 1.0.0
PJ Fanning created SPARK-21708: -- Summary: use sbt 1.0.0 Key: SPARK-21708 URL: https://issues.apache.org/jira/browse/SPARK-21708 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.3.0 Reporter: PJ Fanning I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is released. Should improve sbt build times. http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20871) Only log Janino code in debug mode
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090125#comment-16090125 ] PJ Fanning commented on SPARK-20871: [~srowen] One of the main reasons for the Janino compile to fail is if the generated code is too large (> 64k). In this case, printing the full code is not as useful. The code is still logged in full at debug level, so users can always reconfigure their logging to get the full output. In my team's use case, the fallback to not using codegen works ok and the codegen is still so fast that it suits us best to allow the codegen and have the automatic fallback in the edge case of the codegen failing. We have a SaaS application that uses Spark and we don't really want our users' code to appear in our logs by default. > Only log Janino code in debug mode > -- > > Key: SPARK-20871 > URL: https://issues.apache.org/jira/browse/SPARK-20871 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Glen Takahashi >Priority: Trivial > Attachments: 6a57e344-3fcf-11e7-85cc-52a06df2a489.png > > > Currently if Janino code compilation fails, it will log the entirety of the > code in the executors. Because the generated code can often be very large, > the logging can cause heap pressure on the driver and cause it to fall over. > I propose removing the "$formatted" from here: > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L964 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
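The change being discussed above - keep the full generated source out of default logs but still available at debug level - can be sketched with java.util.logging. This is a minimal illustration of the idea, not Spark's actual logging code; the class name, logger name, and method are all made up for the example:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the SPARK-20871 idea: on codegen compile failure, log only a short
// summary by default, and emit the (possibly very large) generated source only
// when debug logging is explicitly enabled.
public class CodegenLogging {
    private static final Logger LOG = Logger.getLogger("codegen");

    public static void logCompileFailure(String generatedCode, Exception cause) {
        // Default log line stays small: no user code, no heap pressure.
        LOG.log(Level.SEVERE, "Generated code compilation failed ({0} chars): {1}",
                new Object[]{generatedCode.length(), cause.getMessage()});
        // Full source only at debug level.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Failed generated code:\n" + generatedCode);
        }
    }
}
```

The isLoggable guard also avoids building the large concatenated string at all unless debug logging is on, which matters in exactly the > 64k case described in the comment.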
[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-20871: --- Fix Version/s: 2.3.0 > Only log Janino code in debug mode > -- > > Key: SPARK-20871 > URL: https://issues.apache.org/jira/browse/SPARK-20871 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Glen Takahashi > Fix For: 2.3.0 > > Attachments: 6a57e344-3fcf-11e7-85cc-52a06df2a489.png > > > Currently if Janino code compilation fails, it will log the entirety of the > code in the executors. Because the generated code can often be very large, > the logging can cause heap pressure on the driver and cause it to fall over. > I propose removing the "$formatted" from here: > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L964 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-20871: --- Component/s: (was: Spark Core) SQL > Only log Janino code in debug mode > -- > > Key: SPARK-20871 > URL: https://issues.apache.org/jira/browse/SPARK-20871 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Glen Takahashi > Fix For: 2.3.0 > > Attachments: 6a57e344-3fcf-11e7-85cc-52a06df2a489.png > > > Currently if Janino code compilation fails, it will log the entirety of the > code in the executors. Because the generated code can often be very large, > the logging can cause heap pressure on the driver and cause it to fall over. > I propose removing the "$formatted" from here: > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L964 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21438) Update to scalatest 3.x
[ https://issues.apache.org/jira/browse/SPARK-21438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21438: --- Issue Type: Sub-task (was: Task) Parent: SPARK-14220 > Update to scalatest 3.x > --- > > Key: SPARK-21438 > URL: https://issues.apache.org/jira/browse/SPARK-21438 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.3.0 >Reporter: PJ Fanning > > Only scalatest 3.x supports scala 2.12. > https://mvnrepository.com/artifact/org.scalatest/scalatest_2.12 > This has already been attempted as part of SPARK-18896 but there were issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21438) Update to scalatest 3.x
PJ Fanning created SPARK-21438: -- Summary: Update to scalatest 3.x Key: SPARK-21438 URL: https://issues.apache.org/jira/browse/SPARK-21438 Project: Spark Issue Type: Task Components: Build Affects Versions: 2.3.0 Reporter: PJ Fanning Only scalatest 3.x supports scala 2.12. https://mvnrepository.com/artifact/org.scalatest/scalatest_2.12 This has already been attempted as part of SPARK-18896 but there were issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20495) Add StorageLevel to cacheTable API
[ https://issues.apache.org/jira/browse/SPARK-20495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998435#comment-15998435 ] PJ Fanning commented on SPARK-20495: Thanks everyone for working on this change. Is it too late to consider this for v2.2.0 or even v2.2.1? > Add StorageLevel to cacheTable API > --- > > Key: SPARK-20495 > URL: https://issues.apache.org/jira/browse/SPARK-20495 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li > Fix For: 2.3.0 > > > Currently, cacheTable API always uses the default MEMORY_AND_DISK storage > level. We can add a new cacheTable API with the extra parameter StorageLevel. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20539) support optional dataframe name
PJ Fanning created SPARK-20539: -- Summary: support optional dataframe name Key: SPARK-20539 URL: https://issues.apache.org/jira/browse/SPARK-20539 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: PJ Fanning In the Spark UI and some exception logging, Dataframes are described using the schemas. This is very useful. We use Spark SQL in an application where our customers can manipulate data. When we need to examine logs or check the Spark REST API or UI, we would prefer to be able to override the name of the Dataframe to be something that identifies the origin of the Dataframe as opposed to having the column names exposed. There is a small possibility that the column names could contain some Personally Identifiable Information.
[jira] [Created] (SPARK-20458) support getting Yarn Tracking URL in code
PJ Fanning created SPARK-20458: -- Summary: support getting Yarn Tracking URL in code Key: SPARK-20458 URL: https://issues.apache.org/jira/browse/SPARK-20458 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.1.0 Reporter: PJ Fanning org.apache.spark.deploy.yarn.Client logs the Yarn tracking URL but it would be useful to be able to access this in code, as opposed to mining log output. I have an application where I monitor the health of the SparkContext and associated Executors using the Spark REST API. Would it be feasible to add a listener API to listen for new ApplicationReports in org.apache.spark.deploy.yarn.Client? Alternatively, this URL could be exposed as a property associated with the SparkContext.
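Pending such a listener, the monitoring path the reporter describes goes through the REST API rooted at the UI/tracking URL. A tiny illustrative helper for deriving that endpoint; the object and method names here are hypothetical, not part of any Spark API:

```scala
// Hypothetical helper: build the Spark REST API endpoint that a
// health monitor would poll, given a tracking/UI URL.
object YarnUrlHelper {
  def applicationsEndpoint(trackingUrl: String): String = {
    // Normalize a trailing slash, then append the v1 applications path.
    val base = trackingUrl.stripSuffix("/")
    s"$base/api/v1/applications"
  }
}
```

Fetching that URL returns JSON describing the running applications, which is what the reporter's health checks consume.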
[jira] [Commented] (SPARK-18896) Suppress ScalaCheck warning -- Unknown ScalaCheck args provided when executing tests using sbt
[ https://issues.apache.org/jira/browse/SPARK-18896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769991#comment-15769991 ] PJ Fanning commented on SPARK-18896: I noticed from the pull request that you are looking at possibly upgrading scalatest too. Getting to scalatest 3.0.1 would be useful for later scala 2.12 support. Scalatest 2.x is not cross compiled for Scala 2.12. > Suppress ScalaCheck warning -- Unknown ScalaCheck args provided when > executing tests using sbt > -- > > Key: SPARK-18896 > URL: https://issues.apache.org/jira/browse/SPARK-18896 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Trivial > > While executing tests for {{DAGScheduler}} I've noticed the following warning: > {code} > > core/testOnly org.apache.spark.scheduler.DAGSchedulerSuite > ... > [info] Warning: Unknown ScalaCheck args provided: -oDF > {code} > The reason is due to a bug in ScalaCheck as reported in > https://github.com/rickynils/scalacheck/issues/212 and fixed in > https://github.com/rickynils/scalacheck/commit/df435a5 that is available in > ScalaCheck 1.13.4. > Spark uses [ScalaCheck > 1.12.5|https://github.com/apache/spark/blob/master/pom.xml#L717] which is > behind the latest 1.12.6 [released on Nov > 1|https://github.com/rickynils/scalacheck/releases] (not to mention 1.13.4). > Let's get rid of ScalaCheck's warning (and perhaps upgrade ScalaCheck along > the way too!).
[jira] [Closed] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked
[ https://issues.apache.org/jira/browse/SPARK-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning closed SPARK-16716. -- Resolution: Duplicate This looks like it was fixed by SPARK-16664 > calling cache on joined dataframe can lead to data being blanked > > > Key: SPARK-16716 > URL: https://issues.apache.org/jira/browse/SPARK-16716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: PJ Fanning > > I have reproduced the issue in Spark 1.6.2 and latest 1.6.3-SNAPSHOT code. > The code works ok on Spark 1.6.1. > I have a notebook up on Databricks Community Edition that demonstrates the > issue. The notebook depends on the library com.databricks:spark-csv_2.10:1.4.0 > The code uses some custom code to join 4 dataframes. > It calls show on this dataframe and the data is as expected. > After calling .cache, the data is blanked. > https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5458351705459939/3760010872339805/5521341683971298/latest.html
[jira] [Commented] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked
[ https://issues.apache.org/jira/browse/SPARK-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419430#comment-15419430 ] PJ Fanning commented on SPARK-16716: I set up an equivalent notebook for spark 2.0 in Databricks community edition and the join and cache worked out. It appears that the issue is just in Spark 1.6.2. > calling cache on joined dataframe can lead to data being blanked > > > Key: SPARK-16716 > URL: https://issues.apache.org/jira/browse/SPARK-16716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: PJ Fanning > > I have reproduced the issue in Spark 1.6.2 and latest 1.6.3-SNAPSHOT code. > The code works ok on Spark 1.6.1. > I have a notebook up on Databricks Community Edition that demonstrates the > issue. The notebook depends on the library com.databricks:spark-csv_2.10:1.4.0 > The code uses some custom code to join 4 dataframes. > It calls show on this dataframe and the data is as expected. > After calling .cache, the data is blanked. > https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5458351705459939/3760010872339805/5521341683971298/latest.html
[jira] [Created] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked
PJ Fanning created SPARK-16716: -- Summary: calling cache on joined dataframe can lead to data being blanked Key: SPARK-16716 URL: https://issues.apache.org/jira/browse/SPARK-16716 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.2 Reporter: PJ Fanning I have reproduced the issue in Spark 1.6.2 and latest 1.6.3-SNAPSHOT code. The code works ok on Spark 1.6.1. I have a notebook up on Databricks Community Edition that demonstrates the issue. The notebook depends on the library com.databricks:spark-csv_2.10:1.4.0 The code uses some custom code to join 4 dataframes. It calls show on this dataframe and the data is as expected. After calling .cache, the data is blanked. https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5458351705459939/3760010872339805/5521341683971298/latest.html
[jira] [Created] (SPARK-15615) Support for creating a dataframe from JSON in Dataset[String]
PJ Fanning created SPARK-15615: -- Summary: Support for creating a dataframe from JSON in Dataset[String] Key: SPARK-15615 URL: https://issues.apache.org/jira/browse/SPARK-15615 Project: Spark Issue Type: Bug Reporter: PJ Fanning We should deprecate DataFrameReader.scala json(rdd: RDD[String]) and support json(ds: Dataset[String]) instead
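This is what eventually happened: Spark 2.2 added `DataFrameReader.json(Dataset[String])` and deprecated the `RDD[String]` variant. A minimal sketch, assuming a local `SparkSession` on a Spark 2.2+ runtime:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object JsonFromDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-from-dataset")
      .getOrCreate()
    import spark.implicits._

    // JSON lines that were pre-processed elsewhere, held as Dataset[String]
    // rather than the older RDD[String].
    val lines: Dataset[String] =
      Seq("""{"a":1,"b":"x"}""", """{"a":2,"b":"y"}""").toDS()

    // The Dataset[String] overload requested by this issue.
    val df = spark.read.json(lines)
    df.show()

    spark.stop()
  }
}
```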
[jira] [Updated] (SPARK-15463) Support for creating a dataframe from CSV in Dataset[String]
[ https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-15463: --- Summary: Support for creating a dataframe from CSV in Dataset[String] (was: Support for creating a dataframe from CSV in RDD[String]) > Support for creating a dataframe from CSV in Dataset[String] > > > Key: SPARK-15463 > URL: https://issues.apache.org/jira/browse/SPARK-15463 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: PJ Fanning > > I currently use Databricks' spark-csv lib but some features don't work with > Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV > support into spark-sql directly, that spark-csv won't be modified. > I currently read some CSV data that has been pre-processed and is in > RDD[String] format. > There is sqlContext.read.json(rdd: RDD[String]) but other formats don't > appear to support the creation of DataFrames based on loading from > RDD[String].
[jira] [Comment Edited] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]
[ https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299881#comment-15299881 ] PJ Fanning edited comment on SPARK-15463 at 5/25/16 11:09 AM: -- Dataset[String] to DataFrame conversion seems fine to me. Would it make sense to change sqlContext.read.json(rdd: RDD[String]) to sqlContext.read.json(ds: Dataset[String]) too? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala was (Author: pj.fanning): Dataset[String] to DataFrame conversion seems fine to me > Support for creating a dataframe from CSV in RDD[String] > > > Key: SPARK-15463 > URL: https://issues.apache.org/jira/browse/SPARK-15463 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: PJ Fanning > > I currently use Databricks' spark-csv lib but some features don't work with > Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV > support into spark-sql directly, that spark-csv won't be modified. > I currently read some CSV data that has been pre-processed and is in > RDD[String] format. > There is sqlContext.read.json(rdd: RDD[String]) but other formats don't > appear to support the creation of DataFrames based on loading from > RDD[String].
[jira] [Commented] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]
[ https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299881#comment-15299881 ] PJ Fanning commented on SPARK-15463: Dataset[String] to DataFrame conversion seems fine to me > Support for creating a dataframe from CSV in RDD[String] > > > Key: SPARK-15463 > URL: https://issues.apache.org/jira/browse/SPARK-15463 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: PJ Fanning > > I currently use Databricks' spark-csv lib but some features don't work with > Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV > support into spark-sql directly, that spark-csv won't be modified. > I currently read some CSV data that has been pre-processed and is in > RDD[String] format. > There is sqlContext.read.json(rdd: RDD[String]) but other formats don't > appear to support the creation of DataFrames based on loading from > RDD[String].
[jira] [Created] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]
PJ Fanning created SPARK-15463: -- Summary: Support for creating a dataframe from CSV in RDD[String] Key: SPARK-15463 URL: https://issues.apache.org/jira/browse/SPARK-15463 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: PJ Fanning I currently use Databricks' spark-csv lib but some features don't work with Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV support into spark-sql directly, that spark-csv won't be modified. I currently read some CSV data that has been pre-processed and is in RDD[String] format. There is sqlContext.read.json(rdd: RDD[String]) but other formats don't appear to support the creation of DataFrames based on loading from RDD[String].
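For the pre-processed-lines case described here, Spark 2.2 ultimately added `DataFrameReader.csv(Dataset[String])`. A minimal sketch of that usage, assuming a local `SparkSession` on a Spark 2.2+ runtime:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object CsvFromDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-from-dataset")
      .getOrCreate()
    import spark.implicits._

    // CSV lines that were pre-processed elsewhere, header line first.
    val lines: Dataset[String] = Seq("id,name", "1,alice", "2,bob").toDS()

    // The Dataset[String] overload that resolved this request;
    // no external spark-csv dependency is needed.
    val df = spark.read.option("header", "true").csv(lines)
    df.show()

    spark.stop()
  }
}
```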
[jira] [Updated] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]
[ https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-15463: --- Issue Type: Improvement (was: Bug) > Support for creating a dataframe from CSV in RDD[String] > > > Key: SPARK-15463 > URL: https://issues.apache.org/jira/browse/SPARK-15463 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: PJ Fanning > > I currently use Databricks' spark-csv lib but some features don't work with > Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV > support into spark-sql directly, that spark-csv won't be modified. > I currently read some CSV data that has been pre-processed and is in > RDD[String] format. > There is sqlContext.read.json(rdd: RDD[String]) but other formats don't > appear to support the creation of DataFrames based on loading from > RDD[String].
[jira] [Closed] (SPARK-12956) add spark.yarn.hdfs.home.directory property
[ https://issues.apache.org/jira/browse/SPARK-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning closed SPARK-12956. -- Resolution: Duplicate > add spark.yarn.hdfs.home.directory property > --- > > Key: SPARK-12956 > URL: https://issues.apache.org/jira/browse/SPARK-12956 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.0 >Reporter: PJ Fanning > > https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala > uses the default home directory based on the hadoop configuration. I have a > use case where it would be useful to override this and to provide an explicit > base path. > If this seems like a generally useful config property, I can put together a pull > request.
[jira] [Commented] (SPARK-12956) add spark.yarn.hdfs.home.directory property
[ https://issues.apache.org/jira/browse/SPARK-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223463#comment-15223463 ] PJ Fanning commented on SPARK-12956: [~tgraves] I think you can close this as a duplicate of SPARK-13063 > add spark.yarn.hdfs.home.directory property > --- > > Key: SPARK-12956 > URL: https://issues.apache.org/jira/browse/SPARK-12956 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.6.0 >Reporter: PJ Fanning > > https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala > uses the default home directory based on the hadoop configuration. I have a > use case where it would be useful to override this and to provide an explicit > base path. > If this seems like a generally useful config property, I can put together a pull > request.
[jira] [Created] (SPARK-12956) add spark.yarn.hdfs.home.directory property
PJ Fanning created SPARK-12956: -- Summary: add spark.yarn.hdfs.home.directory property Key: SPARK-12956 URL: https://issues.apache.org/jira/browse/SPARK-12956 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.6.0 Reporter: PJ Fanning https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala uses the default home directory based on the hadoop configuration. I have a use case where it would be useful to override this and to provide an explicit base path. If this seems like a generally useful config property, I can put together a pull request.
[jira] [Commented] (SPARK-8616) SQLContext doesn't handle tricky column names when loading from JDBC
[ https://issues.apache.org/jira/browse/SPARK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072697#comment-15072697 ] PJ Fanning commented on SPARK-8616: --- Seems to duplicate the 'In Progress' task, SPARK-12437. > SQLContext doesn't handle tricky column names when loading from JDBC > > > Key: SPARK-8616 > URL: https://issues.apache.org/jira/browse/SPARK-8616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 > Environment: Ubuntu 14.04, Sqlite 3.8.7, Spark 1.4.0 >Reporter: Gergely Svigruha > > Reproduce: > - create a table in a relational database (in my case sqlite) with a column > name containing a space: > CREATE TABLE my_table (id INTEGER, "tricky column" TEXT); > - try to create a DataFrame using that table: > sqlContext.read.format("jdbc").options(Map( > "url" -> "jdbc:sqlite:...", > "dbtable" -> "my_table")).load() > java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (no such > column: tricky) > According to the SQL spec this should be valid: > http://savage.net.au/SQL/sql-99.bnf.html#delimited%20identifier -- This message was sent by Atlassian JIRA (v6.3.4#6332)
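The underlying fix for column names like `"tricky column"` is to emit SQL delimited identifiers: wrap the name in double quotes and double any embedded quote, per SQL-99. A minimal sketch of that quoting rule in plain Scala (this is illustrative, not Spark's actual JDBC dialect code):

```scala
object IdentifierQuoting {
  // Quote a column name as a SQL delimited identifier: wrap it in
  // double quotes and double any embedded double quote (SQL-99).
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  def main(args: Array[String]): Unit = {
    // A column name with a space survives quoting.
    println(s"SELECT ${quoteIdentifier("tricky column")} FROM my_table")
    // An embedded quote is escaped by doubling.
    println(s"SELECT ${quoteIdentifier("a\"b")} FROM my_table")
  }
}
```

With quoting applied, the generated `SELECT "tricky column" FROM my_table` is valid against SQLite and other SQL-99 databases.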
[jira] [Commented] (SPARK-11640) shading packages in spark-assembly jar
[ https://issues.apache.org/jira/browse/SPARK-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002537#comment-15002537 ] PJ Fanning commented on SPARK-11640: [~sowen] Thanks - the hadoop-provided profile does lead to bouncycastle classes being excluded from the spark-assembly jar > shading packages in spark-assembly jar > -- > > Key: SPARK-11640 > URL: https://issues.apache.org/jira/browse/SPARK-11640 > Project: Spark > Issue Type: Wish > Components: Build >Reporter: PJ Fanning > > The spark assembly jar contains classes from many external dependencies like > hadoop and bouncycastle. > I have run into issues trying to use bouncycastle code in a Spark job because > the JCE codebase expects the encryption code to be in a signed jar and since > the classes are copied into spark-assembly jar and it is not signed, the JCE > framework returns an error. > If the bouncycastle classes in spark-assembly were shaded, then I could > deploy the properly signed bcprov jar. The spark code could access the shaded > copies of the bouncycastle classes.
[jira] [Updated] (SPARK-11640) shading packages in spark-assembly jar
[ https://issues.apache.org/jira/browse/SPARK-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-11640: --- Issue Type: Wish (was: Bug) > shading packages in spark-assembly jar > -- > > Key: SPARK-11640 > URL: https://issues.apache.org/jira/browse/SPARK-11640 > Project: Spark > Issue Type: Wish > Components: Build >Reporter: PJ Fanning > > The spark assembly jar contains classes from many external dependencies like > hadoop and bouncycastle. > I have run into issues trying to use bouncycastle code in a Spark job because > the JCE codebase expects the encryption code to be in a signed jar and since > the classes are copied into spark-assembly jar and it is not signed, the JCE > framework returns an error. > If the bouncycastle classes in spark-assembly were shaded, then I could > deploy the properly signed bcprov jar. The spark code could access the shaded > copies of the bouncycastle classes.
[jira] [Created] (SPARK-11640) shading packages in spark-assembly jar
PJ Fanning created SPARK-11640: -- Summary: shading packages in spark-assembly jar Key: SPARK-11640 URL: https://issues.apache.org/jira/browse/SPARK-11640 Project: Spark Issue Type: Bug Components: Build Reporter: PJ Fanning The spark assembly jar contains classes from many external dependencies like hadoop and bouncycastle. I have run into issues trying to use bouncycastle code in a Spark job because the JCE codebase expects the encryption code to be in a signed jar and since the classes are copied into spark-assembly jar and it is not signed, the JCE framework returns an error. If the bouncycastle classes in spark-assembly were shaded, then I could deploy the properly signed bcprov jar. The spark code could access the shaded copies of the bouncycastle classes.
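When debugging this kind of JCE "unsigned jar" failure, the first question is which jar a class was actually loaded from: the signed bcprov jar or an unsigned copy inside the assembly. A small, generic Scala helper for that check (not Spark-specific; the object name is illustrative):

```scala
object JarOrigin {
  // Report the jar or directory a class was loaded from.
  // Bootstrap classes (e.g. java.lang.String) have no code source,
  // so they yield None.
  def jarOf(cls: Class[_]): Option[String] =
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)

  def main(args: Array[String]): Unit = {
    // A JDK bootstrap class has no code source.
    println(jarOf(classOf[String]))
    // A library class reports its jar; pointing this at a bouncycastle
    // class would show whether the signed bcprov jar or the assembly won.
    println(jarOf(classOf[Option[_]]))
  }
}
```

If a bouncycastle class reports the assembly jar rather than bcprov, JCE signature verification will fail exactly as described above.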
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Description: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923 Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} was: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. 
If I remove the spray 1.3.3 jars, the test case passes but has a Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. 
If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923 Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Description: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} was: I just wanted to document this for posterity. I had an issue when running a Spark 1.0 app locally with sbt. The issue was that if you both: 1. Reference a scala class (e.g. None) inside of a closure. 2. Run your program with 'sbt run' It throws an exception. Upgrading the scalaVersion to 2.10.4 in sbt solved this issue. 
Somehow scala classes were not being loaded correctly inside of the executors: Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. 
If I remove the spray 1.3.3 jars, the test case passes but has a Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354)
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-8494:
------------------------------
    Attachment: spark-test-case.zip

ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
-----------------------------------------------------------------------
        Key: SPARK-8494
        URL: https://issues.apache.org/jira/browse/SPARK-8494
    Project: Spark
 Issue Type: Bug
 Components: Spark Core
   Reporter: PJ Fanning
   Assignee: Patrick Wendell
Attachments: spark-test-case.zip

I found a similar issue to SPARK-1923, but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but took the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes; otherwise it fails with a ClassNotFoundException. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

build.sbt:
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"

val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"    % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"    % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion % "test",
  "junit"             %  "junit"           % "4.12"      % "test"
)

scalacOptions ++= Seq(
  "-unchecked", "-deprecation", "-Xlint", "-Ywarn-dead-code",
  "-language:_", "-target:jvm-1.7", "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
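[Editor's note: a commonly reported workaround for ClassNotFoundException under `sbt run` (the failure mode in SPARK-1923) is to fork the JVM so the application does not execute inside sbt's own layered classloader. The sketch below is a hypothetical build.sbt addition, not something tried in this report:]

{code}
// Run the application in a forked JVM rather than inside sbt's in-process
// classloader; sbt's layered loader can hide scala.* classes from Spark's
// JavaSerializer, which resolves closure classes via the context classloader.
fork in run := true

// Optionally give the forked JVM its own memory settings.
javaOptions in run += "-Xmx2g"
{code}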
[jira] [Created] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
PJ Fanning created SPARK-8494:
------------------------------
    Summary: ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
        Key: SPARK-8494
        URL: https://issues.apache.org/jira/browse/SPARK-8494
    Project: Spark
 Issue Type: Bug
 Components: Spark Core
   Reporter: PJ Fanning
   Assignee: Patrick Wendell

I just wanted to document this for posterity. I had an issue when running a Spark 1.0 app locally with sbt. The exception is thrown if you both:
1. Reference a Scala class (e.g. None) inside of a closure.
2. Run your program with 'sbt run'.

Upgrading the scalaVersion to 2.10.4 in sbt solved this issue. Somehow Scala classes were not being loaded correctly inside the executors.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}
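[Editor's note: the stack trace shows Spark's JavaSerializer failing to resolve a core Scala class through the context classloader. A hypothetical diagnostic, not from the original report, is to check directly whether those class names are visible from the same loader:]

```scala
// Hypothetical diagnostic: probe whether the class names the executor failed
// to resolve are visible from the current thread's context classloader --
// the loader consulted when deserializing closures.
object ClassLoaderCheck {
  def canLoad(name: String): Boolean =
    try {
      // initialize = false: we only care about visibility, not static init.
      Class.forName(name, false, Thread.currentThread.getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit =
    Seq("scala.None$", "scala.collection.immutable.Range").foreach { n =>
      println(s"$n -> ${canLoad(n)}")
    }
}
```

Under a plain JVM with scala-library on the classpath both probes succeed; inside `sbt run` with a misbehaving layered classloader they can fail, matching the reported exception.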
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-8494:
------------------------------
Description:
I found a similar issue to SPARK-1923, but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but took the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes; otherwise it fails with a ClassNotFoundException.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

build.sbt:
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"

val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"    % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"    % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion % "test",
  "junit"             %  "junit"           % "4.12"      % "test"
)

scalacOptions ++= Seq(
  "-unchecked", "-deprecation", "-Xlint", "-Ywarn-dead-code",
  "-language:_", "-target:jvm-1.7", "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}

was:
I found a similar issue to SPARK-1923, but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but took the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes; otherwise it fails with a ClassNotFoundException (SPARK-1923).

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
-----------------------------------------------------------------------
        Key: SPARK-8494
        URL: https://issues.apache.org/jira/browse/SPARK-8494
    Project: Spark
 Issue Type: Bug
 Components: Spark Core
   Reporter: PJ Fanning
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated SPARK-8494:
------------------------------
Description:
I found a similar issue to SPARK-1923, but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but took the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes; otherwise it fails with a ClassNotFoundException. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

build.sbt:
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"

val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"    % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"    % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion % "test",
  "junit"             %  "junit"           % "4.12"      % "test"
)

scalacOptions ++= Seq(
  "-unchecked", "-deprecation", "-Xlint", "-Ywarn-dead-code",
  "-language:_", "-target:jvm-1.7", "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}

was:
I found a similar issue to SPARK-1923, but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but took the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes; otherwise it fails with a ClassNotFoundException.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

build.sbt:
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"

val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"    % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"    % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion % "test",
  "junit"             %  "junit"           % "4.12"      % "test"
)
{code}
[jira] [Commented] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594132#comment-14594132 ]

PJ Fanning commented on SPARK-8494:
-----------------------------------
[~pwendell] Apologies about the JIRA being assigned to you. I cloned SPARK-1923 and now can't change the Assignee.

ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
-----------------------------------------------------------------------
        Key: SPARK-8494
        URL: https://issues.apache.org/jira/browse/SPARK-8494
    Project: Spark
 Issue Type: Bug
 Components: Spark Core
   Reporter: PJ Fanning
   Assignee: Patrick Wendell

I found a similar issue to SPARK-1923, but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but took the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes; otherwise it fails with a ClassNotFoundException.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
        java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:270)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

build.sbt:
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"

val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"    % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"    % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion % "test",
  "junit"             %  "junit"           % "4.12"      % "test"
)

scalacOptions ++= Seq(
  "-unchecked", "-deprecation", "-Xlint", "-Ywarn-dead-code",
  "-language:_", "-target:jvm-1.7", "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}
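[Editor's note: if the failure turns out to be a dependency conflict rather than a classloader problem, one hedged avenue is to exclude a suspect transitive artifact from the spray modules. The coordinates below are illustrative guesses, not a confirmed fix from this thread:]

{code}
// Hypothetical: exclude a conflicting transitive scala-library pulled in via
// spray, relying instead on the scalaVersion declared in this build.
libraryDependencies += "io.spray" %% "spray-can" % "1.3.3" exclude("org.scala-lang", "scala-library")

// sbt 0.13.6+ can list version conflicts resolved by eviction:
//   sbt evicted
{code}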