[jira] [Commented] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-26 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841339#comment-17841339
 ] 

PJ Fanning commented on SPARK-47959:


[~zshao] if you have a test environment, could you try it with the 
2.18.0-SNAPSHOT Jackson jars to see if they help?

> Improve GET_JSON_OBJECT performance on executors running multiple tasks
> ---
>
> Key: SPARK-47959
> URL: https://issues.apache.org/jira/browse/SPARK-47959
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Zheng Shao
>Priority: Major
>
> We have a Spark executor that is running 32 workers in parallel. The query 
> is a simple SELECT with several `GET_JSON_OBJECT` UDF calls.
> We noticed that 80+% of the worker threads' stack-trace samples were blocked 
> on the following stack trace:
>  
> {code:java}
> com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - blocked on java.lang.Object@7529fde1
> com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
> com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
> org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown Source)
> ...
> {code}
>  
> Apparently jackson-core has had this performance bug from version 2.3 through 
> 2.15, and it is not fixed until version 2.18 (unreleased): 
> [https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50]
>  
> {code:java}
>             synchronized (lock) {
>                 if (size() >= MAX_ENTRIES) {
>                     clear();
>                 }
>             }
> {code}
>  
> instead of 
> [https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59]
>  
> {code:java}
>             /* As of 2.18, the limit is not strictly enforced, but we do try to
>              * clear entries if we have reached the limit. We do not expect to
>              * go too much over the limit, and if we do, it's not a huge problem.
>              * If some other thread has the lock, we will not clear but the lock
>              * should not be held for long, so another thread should be able to
>              * clear in the near future.
>              */
>             if (lock.tryLock()) {
>                 try {
>                     if (size() >= DEFAULT_MAX_ENTRIES) {
>                         clear();
>                     }
>                 } finally {
>                     lock.unlock();
>                 }
>             }
> {code}
>  
> Potential fixes:
>  # Upgrade to jackson-core 2.18 when it's released;
>  # Follow [https://github.com/FasterXML/jackson-core/issues/998] - I don't 
> totally understand the options suggested by that thread yet;
>  # Introduce a new UDF that doesn't depend on jackson-core.
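
Another mitigation that may be worth testing: the InternCache is only touched 
when the parser interns field names, and jackson-core lets you turn that off 
per factory. Below is a minimal standalone sketch of the jackson-core feature 
involved - wiring it into Spark's get_json_object would need a Spark-side 
change, since the expression builds its own JsonFactory internally.

{code:java}
import java.io.IOException;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

public class NoInternSketch {
    public static void main(String[] args) throws IOException {
        // Field-name interning is what funnels every parser through the
        // shared InternCache lock; disabling it avoids that cache entirely.
        JsonFactory factory = JsonFactory.builder()
                .disable(JsonFactory.Feature.INTERN_FIELD_NAMES)
                .build();
        try (JsonParser parser = factory.createParser("{\"a\":{\"b\":1}}")) {
            // Walk the tokens the way evaluatePath/skipChildren would.
            while (parser.nextToken() != null) {
                // no-op: we only care that parsing no longer interns names
            }
        }
    }
}
{code}

The trade-off is that non-interned names make repeated field-name matching 
slightly more expensive, so this is worth benchmarking rather than assuming.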



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35253) Upgrade Janino from 3.0.16 to 3.1.4

2023-07-04 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739895#comment-17739895
 ] 

PJ Fanning commented on SPARK-35253:


Janino 3.1.10 is out today and resolves 
[https://github.com/janino-compiler/janino/issues/201] - which may matter if 
you have to parse input that might not be entirely trustworthy.

It appears that in trunk, Spark already uses 3.1.9. If this issue can be 
closed, I can raise a separate issue about doing a further upgrade to 3.1.10.

> Upgrade Janino from 3.0.16 to 3.1.4
> ---
>
> Key: SPARK-35253
> URL: https://issues.apache.org/jira/browse/SPARK-35253
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> From the [change log|http://janino-compiler.github.io/janino/changelog.html], 
> the Janino 3.0.x line has been deprecated, so we can use the 3.1.x line 
> instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42854) Jackson 2.15

2023-05-05 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning resolved SPARK-42854.

Resolution: Duplicate

> Jackson 2.15
> 
>
> Key: SPARK-42854
> URL: https://issues.apache.org/jira/browse/SPARK-42854
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: PJ Fanning
>Priority: Major
>
> I'm not advocating for an immediate upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]: 
> 2.15.0-rc1 has just been released, and 2.15.0 should be out soon.
> There are some security-focused enhancements, including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high, but it is not inconceivable that some Spark users might need 
> to relax them. Parsing large numbers from strings has quadratic cost, hence 
> the default limit of 1000 chars or bytes (depending on input context).
> When the Spark team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.
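
For illustration, relaxing these limits with the 2.15 builder API looks roughly 
like this. A sketch only - the limit values are made up, and how Spark would 
surface such settings to users is exactly the open question above.

{code:java}
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ConstraintsSketch {
    public static void main(String[] args) {
        // Raise the 2.15 defaults for workloads with unusually long values.
        StreamReadConstraints constraints = StreamReadConstraints.builder()
                .maxNumberLength(10_000)      // default is 1000 chars/bytes
                .maxStringLength(50_000_000)  // raise if inputs carry huge strings
                .build();
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(constraints)
                .build();
        ObjectMapper mapper = new ObjectMapper(factory);
        // Every parser created by this mapper now uses the relaxed limits.
        System.out.println(
                mapper.getFactory().streamReadConstraints().getMaxNumberLength());
    }
}
{code}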



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43263) Upgrade FasterXML jackson to 2.15.0

2023-04-24 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715949#comment-17715949
 ] 

PJ Fanning commented on SPARK-43263:


This is a duplicate of SPARK-42854 and it is not a good idea to disregard the 
points made in SPARK-42854

> Upgrade FasterXML jackson to 2.15.0
> ---
>
> Key: SPARK-43263
> URL: https://issues.apache.org/jira/browse/SPARK-43263
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> * #390: (yaml) Upgrade to Snakeyaml 2.0 (resolves 
> [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471])
>  (contributed by @pjfannin



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42854) Jackson 2.15

2023-03-19 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-42854:
---
Description: 
I'm not advocating for an immediate upgrade to [Jackson 
2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]: 
2.15.0-rc1 has just been released, and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called 
StreamReadConstraints. The defaults on 
[StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
 are pretty high, but it is not inconceivable that some Spark users might need 
to relax them. Parsing large numbers from strings has quadratic cost, hence the 
default limit of 1000 chars or bytes (depending on input context).

When the Spark team consider upgrading to Jackson 2.15 or above, you might also 
want to consider adding some way for users to configure the 
StreamReadConstraints.

  was:
I'm not advocating for an immediate upgrade to Jackson 2.15: 2.15.0-rc1 has 
just been released, and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called 
StreamReadConstraints. The defaults on 
[StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
 are pretty high, but it is not inconceivable that some Spark users might need 
to relax them. Parsing large numbers from strings has quadratic cost, hence the 
default limit of 1000 chars or bytes (depending on input context).

When the Spark team consider upgrading to Jackson 2.15 or above, you might also 
want to consider adding some way for users to configure the 
StreamReadConstraints.


> Jackson 2.15
> 
>
> Key: SPARK-42854
> URL: https://issues.apache.org/jira/browse/SPARK-42854
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: PJ Fanning
>Priority: Major
>
> I'm not advocating for an immediate upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]: 
> 2.15.0-rc1 has just been released, and 2.15.0 should be out soon.
> There are some security-focused enhancements, including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high, but it is not inconceivable that some Spark users might need 
> to relax them. Parsing large numbers from strings has quadratic cost, hence 
> the default limit of 1000 chars or bytes (depending on input context).
> When the Spark team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42854) Jackson 2.15

2023-03-19 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-42854:
---
Description: 
I'm not advocating for an immediate upgrade to Jackson 2.15: 2.15.0-rc1 has 
just been released, and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called 
StreamReadConstraints. The defaults on 
[StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
 are pretty high, but it is not inconceivable that some Spark users might need 
to relax them. Parsing large numbers from strings has quadratic cost, hence the 
default limit of 1000 chars or bytes (depending on input context).

When the Spark team consider upgrading to Jackson 2.15 or above, you might also 
want to consider adding some way for users to configure the 
StreamReadConstraints.

  was:
I'm not advocating for an immediate upgrade to Jackson 2.15: 2.15.0-rc1 has 
just been released, and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called 
StreamReadConstraints. The defaults on 
[StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
 are pretty high, but it is not inconceivable that some Spark users might need 
to relax them. Parsing large numbers from strings has quadratic cost, hence the 
default limit of 1000 chars or bytes (depending on input context).


> Jackson 2.15
> 
>
> Key: SPARK-42854
> URL: https://issues.apache.org/jira/browse/SPARK-42854
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.4.1
>Reporter: PJ Fanning
>Priority: Major
>
> I'm not advocating for an immediate upgrade to Jackson 2.15: 2.15.0-rc1 has 
> just been released, and 2.15.0 should be out soon.
> There are some security-focused enhancements, including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high, but it is not inconceivable that some Spark users might need 
> to relax them. Parsing large numbers from strings has quadratic cost, hence 
> the default limit of 1000 chars or bytes (depending on input context).
> When the Spark team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42854) Jackson 2.15

2023-03-19 Thread PJ Fanning (Jira)
PJ Fanning created SPARK-42854:
--

 Summary: Jackson 2.15
 Key: SPARK-42854
 URL: https://issues.apache.org/jira/browse/SPARK-42854
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output
Affects Versions: 3.4.1
Reporter: PJ Fanning


I'm not advocating for an immediate upgrade to Jackson 2.15: 2.15.0-rc1 has 
just been released, and 2.15.0 should be out soon.

There are some security-focused enhancements, including a new class called 
StreamReadConstraints. The defaults on 
[StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
 are pretty high, but it is not inconceivable that some Spark users might need 
to relax them. Parsing large numbers from strings has quadratic cost, hence the 
default limit of 1000 chars or bytes (depending on input context).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0

2022-10-25 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-40911:
---
Description: 
This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the pertinent issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered. I will also disable the .tasty file lookups for 
Scala 2.11/2.12 as they are not useful for those users.

For Spark usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready. This 
change will require updating all Jackson jars to v2.14.0 (as Jackson does not 
support using version mismatches - except at patch version level).

  was:
This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the Jackson issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered.

For Scala usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready. This 
change will require updating all Jackson jars to v2.14.0 (as Jackson does not 
support using version mismatches - except at patch version level).


> Upgrade jackson-module-scala to 2.14.0
> --
>
> Key: SPARK-40911
> URL: https://issues.apache.org/jira/browse/SPARK-40911
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.0
>Reporter: PJ Fanning
>Priority: Major
>
> This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
> there will probably be an rc3 before a full release.
> The reason I marked the Jira component as 'Java API' is that this issue will 
> affect Java users more than Scala users.
> I raised this separately to SPARK-40666 because I have a different reason for 
> this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
> jackson-databind 2.13.4.2.
> There are performance issues in jackson-module-scala 2.13.x that may affect 
> some Spark users. Specifically, the pertinent issue is 
> [https://github.com/FasterXML/jackson-module-scala/issues/576]
> Scala3 support added in jackson-module-scala 2.13.0 means that if you use 
> Scala 2.13, you should be able to use Scala3 compiled classes with 
> 

[jira] [Updated] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0

2022-10-25 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-40911:
---
Description: 
This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the Jackson issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered.

For Scala usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready. This 
change will require updating all Jackson jars to v2.14.0 (as Jackson does not 
support using version mismatches - except at patch version level).

  was:
This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the pertinent issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered.

For Scala usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready. This 
change will require updating all Jackson jars to v2.14.0 (as Jackson does not 
support using version mismatches - except at patch version level).


> Upgrade jackson-module-scala to 2.14.0
> --
>
> Key: SPARK-40911
> URL: https://issues.apache.org/jira/browse/SPARK-40911
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.0
>Reporter: PJ Fanning
>Priority: Major
>
> This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
> there will probably be an rc3 before a full release.
> The reason I marked the Jira component as 'Java API' is that this issue will 
> affect Java users more than Scala users.
> I raised this separately to SPARK-40666 because I have a different reason for 
> this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
> jackson-databind 2.13.4.2.
> There are performance issues in jackson-module-scala 2.13.x that may affect 
> some Spark users. Specifically, the Jackson issue is 
> [https://github.com/FasterXML/jackson-module-scala/issues/576]
> Scala3 support added in jackson-module-scala 2.13.0 means that if you use 
> Scala 2.13, you should be able to use Scala3 compiled classes with 
> jackson-module-scala. Scala3 compiled classes are harder to recognise using 
> runtime reflection (and Jackson 

[jira] [Updated] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0

2022-10-25 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-40911:
---
Description: 
This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the pertinent issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered.

For Scala usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready. This 
change will require updating all Jackson jars to v2.14.0 (as Jackson does not 
support using version mismatches - except at patch version level).

  was:
This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the pertinent issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered.

For Scala usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready.


> Upgrade jackson-module-scala to 2.14.0
> --
>
> Key: SPARK-40911
> URL: https://issues.apache.org/jira/browse/SPARK-40911
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 3.3.0
>Reporter: PJ Fanning
>Priority: Major
>
> This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
> there will probably be an rc3 before a full release.
> The reason I marked the Jira component as 'Java API' is that this issue will 
> affect Java users more than Scala users.
> I raised this separately to SPARK-40666 because I have a different reason for 
> this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
> jackson-databind 2.13.4.2.
> There are performance issues in jackson-module-scala 2.13.x that may affect 
> some Spark users. Specifically, the pertinent issue is 
> [https://github.com/FasterXML/jackson-module-scala/issues/576]
> Scala3 support added in jackson-module-scala 2.13.0 means that if you use 
> Scala 2.13, you should be able to use Scala3 compiled classes with 
> jackson-module-scala. Scala3 compiled classes are harder to recognise using 
> runtime reflection (and Jackson is built around runtime reflection). Scala2 
> compiled classes have specific annotations. With Scala3 compiled classes, we 
> need to look for .tasty files. 

[jira] [Created] (SPARK-40911) Upgrade jackson-module-scala to 2.14.0

2022-10-25 Thread PJ Fanning (Jira)
PJ Fanning created SPARK-40911:
--

 Summary: Upgrade jackson-module-scala to 2.14.0
 Key: SPARK-40911
 URL: https://issues.apache.org/jira/browse/SPARK-40911
 Project: Spark
  Issue Type: Improvement
  Components: Java API
Affects Versions: 3.3.0
Reporter: PJ Fanning


This 2.14.0 release is still a few weeks away. There is an rc2 release, but 
there will probably be an rc3 before a full release.

The reason I marked the Jira component as 'Java API' is that this issue will 
affect Java users more than Scala users.

I raised this separately to SPARK-40666 because I have a different reason for 
this issue. SPARK-40666 can probably already be closed as the CVE is fixed in 
jackson-databind 2.13.4.2.

There are performance issues in jackson-module-scala 2.13.x that may affect 
some Spark users. Specifically, the pertinent issue is 
[https://github.com/FasterXML/jackson-module-scala/issues/576]

Scala3 support added in jackson-module-scala 2.13.0 means that if you use Scala 
2.13, you should be able to use Scala3 compiled classes with 
jackson-module-scala. Scala3 compiled classes are harder to recognise using 
runtime reflection (and Jackson is built around runtime reflection). Scala2 
compiled classes have specific annotations. With Scala3 compiled classes, we 
need to look for .tasty files. This lookup can be slow if you have a lot of 
jars (or big jars). Issue 
[576|https://github.com/FasterXML/jackson-module-scala/issues/576] fixes an 
issue where this .tasty lookup is done every time you try to 
serialize/deserialize a Java class with an ObjectMapper that has the 
DefaultScalaModule registered.

For Scala usage, it may be worth turning off this .tasty file support 
altogether. This is another enhancement in jackson-module-scala (but not in the 
RC2 release).

I will follow up and update this issue when the v2.14.0 release is ready.
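
For context, the slow path is triggered by nothing more exotic than a mapper 
with the Scala module registered being asked to handle plain Java classes. A 
minimal sketch of that setup (Java shown to match the 'Java API' component; 
DefaultScalaModule is a Scala object, so from Java it is reached via its 
MODULE$ field - the Pair bean is just a made-up example class):

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.scala.DefaultScalaModule$;

public class ScalaModuleSketch {
    // A plain Java bean - no Scala types involved at all.
    public static class Pair {
        public int left = 1;
        public int right = 2;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        mapper.registerModule(DefaultScalaModule$.MODULE$);
        // In jackson-module-scala 2.13.x, serializing even this Java bean
        // repeated the .tasty classpath lookup on every call; issue 576
        // makes that lookup happen once.
        System.out.println(mapper.writeValueAsString(new Pair()));
    }
}
{code}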



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40666) Upgrade FasterXML jackson-databind to 2.14

2022-10-25 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623810#comment-17623810
 ] 

PJ Fanning commented on SPARK-40666:


The recent CVEs are fixed in Jackson 2.13.4.2, and the Spark trunk branch uses 
that version. Can this be closed?

> Upgrade FasterXML jackson-databind to 2.14
> --
>
> Key: SPARK-40666
> URL: https://issues.apache.org/jira/browse/SPARK-40666
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [CVE-2022-42003|https://nvd.nist.gov/vuln/detail/CVE-2022-42003]
> [Github|https://github.com/FasterXML/jackson-databind/issues/3590]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40457) upgrade jackson data mapper to latest

2022-10-25 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623809#comment-17623809
 ] 

PJ Fanning commented on SPARK-40457:


Maybe this could be closed as a duplicate of SPARK-30466

> upgrade jackson data mapper to latest 
> --
>
> Key: SPARK-40457
> URL: https://issues.apache.org/jira/browse/SPARK-40457
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Bilna
>Priority: Major
>
> Upgrade  jackson-mapper-asl to the latest to resolve CVE-2019-10172



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13

2022-10-25 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623806#comment-17623806
 ] 

PJ Fanning commented on SPARK-30466:


An upcoming release of Hadoop 3 will remove its remaining use of the Jackson 1 
jars - possibly as soon as the Hadoop 3.3.5 release.

> remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
> --
>
> Key: SPARK-30466
> URL: https://issues.apache.org/jira/browse/SPARK-30466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Michael Burgener
>Priority: Major
>  Labels: security
>
> These 2 libraries are deprecated and replaced by the jackson-databind 
> libraries which are already included.  These two libraries are flagged by our 
> vulnerability scanners as having the following security vulnerabilities.  
> I've set the priority to Major due to the Critical nature and hopefully they 
> can be addressed quickly.  Please note, I'm not a developer but work in 
> InfoSec, and this was flagged when we incorporated Spark into our product.  If 
> you feel the priority is not set correctly, please change it accordingly.  
> I'll watch the issue and flag our dev team to update once resolved.  
> jackson-mapper-asl-1.9.13
> CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-7489] 
>  
> CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-7525]
>  
> CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-17485]
>  
> CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-15095]
>  
> CVE-2018-5968 (CVSS 3.0 Score 8.1 High)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-5968]
>  
> jackson-core-asl-1.9.13
> CVE-2016-7051 (CVSS 3.0 Score 8.6 High)
> https://nvd.nist.gov/vuln/detail/CVE-2016-7051



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38665) upgrade jackson due to CVE-2020-36518

2022-03-26 Thread PJ Fanning (Jira)
PJ Fanning created SPARK-38665:
--

 Summary: upgrade jackson due to CVE-2020-36518
 Key: SPARK-38665
 URL: https://issues.apache.org/jira/browse/SPARK-38665
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: PJ Fanning


* https://github.com/FasterXML/jackson-databind/issues/2816
* only jackson-databind has a 2.13.2.1 release
* other jackson jars should stay at 2.13.2



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37630) Security issue from Log4j 1.X exploit

2022-02-04 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487275#comment-17487275
 ] 

PJ Fanning commented on SPARK-37630:


[~jinlow] there is little point commenting on this closed issue - please look 
at https://issues.apache.org/jira/browse/SPARK-6305 - this issue is marked as a 
duplicate of that one, and progress has been made there on the switch to Log4j 2.

> Security issue from Log4j 1.X exploit
> -
>
> Key: SPARK-37630
> URL: https://issues.apache.org/jira/browse/SPARK-37630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.2.0
>Reporter: Ismail H
>Priority: Major
>  Labels: security
>
> log4j is being used in version 1.2.17.
>  
> This version has been deprecated and since then has [a known issue that 
> hasn't been addressed in 1.X 
> versions|https://www.cvedetails.com/cve/CVE-2019-17571/].
>  
> *Solution:*
>  * Upgrade log4j to version 2.15.0, which corrects all known issues. [Last 
> known issues|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37625) update log4j to 2.15

2021-12-13 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458802#comment-17458802
 ] 

PJ Fanning commented on SPARK-37625:


log4j 2.16.0 is out - it might be best to pause this as it doesn't seem urgent 
to change Spark

> update log4j to 2.15 
> -
>
> Key: SPARK-37625
> URL: https://issues.apache.org/jira/browse/SPARK-37625
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: weifeng zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37630) Security issue from Log4j 0day exploit

2021-12-13 Thread PJ Fanning (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458798#comment-17458798
 ] 

PJ Fanning commented on SPARK-37630:


Maybe a duplicate of SPARK-6305

> Security issue from Log4j 0day exploit
> --
>
> Key: SPARK-37630
> URL: https://issues.apache.org/jira/browse/SPARK-37630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.2.0
>Reporter: Ismail H
>Priority: Major
>  Labels: security
>
> log4j is being used in version 1.2.17.
>  
> This version has been deprecated and since then has [a known issue that 
> hasn't been addressed in 1.X 
> versions|https://www.cvedetails.com/cve/CVE-2019-17571/].
>  
> *Solution:*
>  * Upgrade log4j to version 2.15.0, which corrects all known issues. [Last 
> known issues|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27683) Remove usage of TraversableOnce

2019-05-12 Thread PJ Fanning (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838086#comment-16838086
 ] 

PJ Fanning edited comment on SPARK-27683 at 5/12/19 3:01 PM:
-

[~srowen] would it be possible to use the scala-collection-compat lib? It has a 
type alias `IterableOnce` that maps to `TraversableOnce` in the Scala 2.11 and 
2.12 versions of the lib, but to the core IterableOnce in 2.13.

[https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.11_2.12/scala/collection/compat/PackageShared.scala#L156]

[https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.13/scala/collection/compat/package.scala#L22]

The akka team created equivalent type aliases to avoid the dependency on 
scala-collection-compat and this approach could be used to add additional type 
aliases that suit Spark's requirements.

 

 


was (Author: pj.fanning):
[~srowen] would it be possible to use the scala-collection-compat lib? It has a 
type alias `IterableOnce` that maps to `TraversableOnce` in the Scala 2.11 and 
2.12 versions of the lib, but to the core IterableOnce in 2.13.

[https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.11_2.12/scala/collection/compat/PackageShared.scala#L156]

[https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.13/scala/collection/compat/package.scala#L22]

 

 

> Remove usage of TraversableOnce
> ---
>
> Key: SPARK-27683
> URL: https://issues.apache.org/jira/browse/SPARK-27683
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, Spark Core, SQL, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
>
> As with {{Traversable}}, {{TraversableOnce}} is going away in Scala 2.13. We 
> should use {{IterableOnce}} instead. This one is a bigger change as there are 
> more API methods with the existing signature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27683) Remove usage of TraversableOnce

2019-05-12 Thread PJ Fanning (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838086#comment-16838086
 ] 

PJ Fanning commented on SPARK-27683:


[~srowen] would it be possible to use the scala-collection-compat lib? It has a 
type alias `IterableOnce` that maps to `TraversableOnce` in the Scala 2.11 and 
2.12 versions of the lib, but to the core IterableOnce in 2.13.

[https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.11_2.12/scala/collection/compat/PackageShared.scala#L156]

[https://github.com/scala/scala-collection-compat/blob/master/compat/src/main/scala-2.13/scala/collection/compat/package.scala#L22]

 

 

> Remove usage of TraversableOnce
> ---
>
> Key: SPARK-27683
> URL: https://issues.apache.org/jira/browse/SPARK-27683
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, Spark Core, SQL, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
>
> As with {{Traversable}}, {{TraversableOnce}} is going away in Scala 2.13. We 
> should use {{IterableOnce}} instead. This one is a bigger change as there are 
> more API methods with the existing signature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21708) use sbt 1.x

2019-01-22 Thread PJ Fanning (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21708:
---
Summary: use sbt 1.x  (was: use sbt 1.0.0)

> use sbt 1.x
> ---
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>Priority: Minor
>
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html
> According to https://github.com/sbt/sbt/issues/3424, we will need to change 
> the HTTP location where we get the sbt-launch jar.
> Other related issues:
> SPARK-14401
> https://github.com/typesafehub/sbteclipse/issues/343
> https://github.com/jrudolph/sbt-dependency-graph/issues/134
> https://github.com/AlpineNow/junit_xml_listener/issues/6
> https://github.com/spray/sbt-revolver/issues/62
> https://github.com/ihji/sbt-antlr4/issues/14



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124012#comment-16124012
 ] 

PJ Fanning commented on SPARK-21708:


[~srowen] Your point about IDEs is valid. IntelliJ IDEA has support 
(https://blog.jetbrains.com/scala/2017/07/19/intellij-idea-scala-plugin-2017-2-sbt-1-0-improved-sbt-shell-play-2-6-and-better-implicits-management/)
 and hopefully the sbteclipse plugin for generating Eclipse workspaces from sbt 
files will be updated soon. All in all, there are quite a number of sbt plugins 
to upgrade before the sbt version can be raised to 1.0.0, so by the time we are 
in a position to switch, it should be easier for developers to adapt.

> use sbt 1.0.0
> -
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>Priority: Minor
>
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html
> According to https://github.com/sbt/sbt/issues/3424, we will need to change 
> the HTTP location where we get the sbt-launch jar.
> Other related issues:
> SPARK-14401
> https://github.com/typesafehub/sbteclipse/issues/343
> https://github.com/jrudolph/sbt-dependency-graph/issues/134
> https://github.com/AlpineNow/junit_xml_listener/issues/6
> https://github.com/spray/sbt-revolver/issues/62
> https://github.com/ihji/sbt-antlr4/issues/14



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21708:
---
Description: 
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

According to https://github.com/sbt/sbt/issues/3424, we will need to change the 
HTTP location where we get the sbt-launch jar.

Other related issues:
SPARK-14401
https://github.com/typesafehub/sbteclipse/issues/343
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14


  was:
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

Other related issues:
SPARK-14401
https://github.com/sbt/sbt/issues/3424
https://github.com/typesafehub/sbteclipse/issues/343
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14



> use sbt 1.0.0
> -
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>Priority: Minor
>
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html
> According to https://github.com/sbt/sbt/issues/3424, we will need to change 
> the HTTP location where we get the sbt-launch jar.
> Other related issues:
> SPARK-14401
> https://github.com/typesafehub/sbteclipse/issues/343
> https://github.com/jrudolph/sbt-dependency-graph/issues/134
> https://github.com/AlpineNow/junit_xml_listener/issues/6
> https://github.com/spray/sbt-revolver/issues/62
> https://github.com/ihji/sbt-antlr4/issues/14



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123451#comment-16123451
 ] 

PJ Fanning commented on SPARK-21708:


[~srowen] the build/sbt scripting will download the preferred sbt version. With 
a good internet connection, it takes a couple of minutes.

> use sbt 1.0.0
> -
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>Priority: Minor
>
> I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
> released.
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html
> Other related issues:
> SPARK-14401
> https://github.com/sbt/sbt/issues/3424
> https://github.com/typesafehub/sbteclipse/issues/343
> https://github.com/jrudolph/sbt-dependency-graph/issues/134
> https://github.com/AlpineNow/junit_xml_listener/issues/6
> https://github.com/spray/sbt-revolver/issues/62
> https://github.com/ihji/sbt-antlr4/issues/14



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21709) use sbt 0.13.16 and update sbt plugins

2017-08-11 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21709:
---
Description: 
A preliminary step to SPARK-21708.
Quite a lot of sbt plugin changes needed to get to full sbt 1.0.0 support.


  was:
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

Other related issues:
SPARK-14401
https://github.com/sbt/sbt/issues/3424
https://github.com/typesafehub/sbteclipse/issues/343
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14



> use sbt 0.13.16 and update sbt plugins
> --
>
> Key: SPARK-21709
> URL: https://issues.apache.org/jira/browse/SPARK-21709
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>
> A preliminary step to SPARK-21708.
> Quite a lot of sbt plugin changes needed to get to full sbt 1.0.0 support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21709) use sbt 0.13.16 and update sbt plugins

2017-08-11 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-21709:
--

 Summary: use sbt 0.13.16 and update sbt plugins
 Key: SPARK-21709
 URL: https://issues.apache.org/jira/browse/SPARK-21709
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.3.0
Reporter: PJ Fanning


I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

Other related issues:
SPARK-14401
https://github.com/sbt/sbt/issues/3424
https://github.com/typesafehub/sbteclipse/issues/343
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21708:
---
Description: 
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

Other related issues:
SPARK-14401
https://github.com/sbt/sbt/issues/3424
https://github.com/typesafehub/sbteclipse/issues/343
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14


  was:
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

Other related issues:
SPARK-14401
https://github.com/sbt/sbt/issues/3424
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14



> use sbt 1.0.0
> -
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>
> I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
> released.
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html
> Other related issues:
> SPARK-14401
> https://github.com/sbt/sbt/issues/3424
> https://github.com/typesafehub/sbteclipse/issues/343
> https://github.com/jrudolph/sbt-dependency-graph/issues/134
> https://github.com/AlpineNow/junit_xml_listener/issues/6
> https://github.com/spray/sbt-revolver/issues/62
> https://github.com/ihji/sbt-antlr4/issues/14



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21708:
---
Description: 
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

Other related issues:
SPARK-14401
https://github.com/sbt/sbt/issues/3424
https://github.com/jrudolph/sbt-dependency-graph/issues/134
https://github.com/AlpineNow/junit_xml_listener/issues/6
https://github.com/spray/sbt-revolver/issues/62
https://github.com/ihji/sbt-antlr4/issues/14


  was:
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released. https://github.com/sbt/sbt/issues/3424
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html


> use sbt 1.0.0
> -
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>
> I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
> released.
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html
> Other related issues:
> SPARK-14401
> https://github.com/sbt/sbt/issues/3424
> https://github.com/jrudolph/sbt-dependency-graph/issues/134
> https://github.com/AlpineNow/junit_xml_listener/issues/6
> https://github.com/spray/sbt-revolver/issues/62
> https://github.com/ihji/sbt-antlr4/issues/14



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14401) Switch to stock sbt-pom-reader plugin

2017-08-11 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123319#comment-16123319
 ] 

PJ Fanning commented on SPARK-14401:


This would be useful for a general upgrade to sbt 1.0.0

> Switch to stock sbt-pom-reader plugin
> -
>
> Key: SPARK-14401
> URL: https://issues.apache.org/jira/browse/SPARK-14401
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Spark currently depends on a forked version of {{sbt-pom-reader}} which we 
> build from source. It would be great to port our modifications to the 
> upstream project so that we can migrate to the official version and stop 
> maintaining our fork.
> [~scrapco...@gmail.com], could you edit this ticket to fill in more detail 
> about which custom changes have not been ported yet?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21708:
---
Description: 
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released. https://github.com/sbt/sbt/issues/3424
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html

  was:
I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html


> use sbt 1.0.0
> -
>
> Key: SPARK-21708
> URL: https://issues.apache.org/jira/browse/SPARK-21708
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>
> I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
> released. https://github.com/sbt/sbt/issues/3424
> Should improve sbt build times.
> http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21708) use sbt 1.0.0

2017-08-11 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-21708:
--

 Summary: use sbt 1.0.0
 Key: SPARK-21708
 URL: https://issues.apache.org/jira/browse/SPARK-21708
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.3.0
Reporter: PJ Fanning


I had a quick look and I think we'll need to wait until sbt-launch 1.0 jar is 
released.
Should improve sbt build times.
http://www.scala-sbt.org/1.0/docs/sbt-1.0-Release-Notes.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090125#comment-16090125
 ] 

PJ Fanning commented on SPARK-20871:


[~srowen] One of the main reasons for the Janino compile to fail is that the 
generated code is too large (> 64k). In this case, printing the full code is 
not as useful.
The code is still logged in full at debug level, so users can always 
reconfigure their logging to get the full output.
In my team's use case, the fallback to not using codegen works ok, and codegen 
is still so fast that it suits us best to leave it enabled and rely on the 
automatic fallback in the edge case where it fails. We have a SaaS application 
that uses Spark and we don't really want our users' code to appear in our logs 
by default.
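
For reference, a minimal sketch of getting the full generated source back 
(assuming Spark's bundled log4j 1.x and that the logger name matches the 
CodeGenerator class):
{code}
// Raise the codegen logger to DEBUG so the full generated source is logged
// again. Executors need the same setting, typically via log4j.properties.
import org.apache.log4j.{Level, Logger}

Logger
  .getLogger("org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator")
  .setLevel(Level.DEBUG)
{code}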

> Only log Janino code in debug mode
> --
>
> Key: SPARK-20871
> URL: https://issues.apache.org/jira/browse/SPARK-20871
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Glen Takahashi
>Priority: Trivial
> Attachments: 6a57e344-3fcf-11e7-85cc-52a06df2a489.png
>
>
> Currently if Janino code compilation fails, it will log the entirety of the 
> code in the executors. Because the generated code can often be very large, 
> the logging can cause heap pressure on the driver and cause it to fall over.
> I propose removing the "$formatted" from here: 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L964



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-20871:
---
Fix Version/s: 2.3.0

> Only log Janino code in debug mode
> --
>
> Key: SPARK-20871
> URL: https://issues.apache.org/jira/browse/SPARK-20871
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Glen Takahashi
> Fix For: 2.3.0
>
> Attachments: 6a57e344-3fcf-11e7-85cc-52a06df2a489.png
>
>
> Currently if Janino code compilation fails, it will log the entirety of the 
> code in the executors. Because the generated code can often be very large, 
> the logging can cause heap pressure on the driver and cause it to fall over.
> I propose removing the "$formatted" from here: 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L964



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-20871:
---
Component/s: SQL  (was: Spark Core)

> Only log Janino code in debug mode
> --
>
> Key: SPARK-20871
> URL: https://issues.apache.org/jira/browse/SPARK-20871
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Glen Takahashi
> Fix For: 2.3.0
>
> Attachments: 6a57e344-3fcf-11e7-85cc-52a06df2a489.png
>
>
> Currently if Janino code compilation fails, it will log the entirety of the 
> code in the executors. Because the generated code can often be very large, 
> the logging can cause heap pressure on the driver and cause it to fall over.
> I propose removing the "$formatted" from here: 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L964



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21438) Update to scalatest 3.x

2017-07-17 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-21438:
---
Issue Type: Sub-task  (was: Task)
Parent: SPARK-14220

> Update to scalatest 3.x
> ---
>
> Key: SPARK-21438
> URL: https://issues.apache.org/jira/browse/SPARK-21438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: PJ Fanning
>
> Only scalatest 3.x supports scala 2.12.
> https://mvnrepository.com/artifact/org.scalatest/scalatest_2.12
> This has already been attempted as part of SPARK-18896 but there were issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21438) Update to scalatest 3.x

2017-07-17 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-21438:
--

 Summary: Update to scalatest 3.x
 Key: SPARK-21438
 URL: https://issues.apache.org/jira/browse/SPARK-21438
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 2.3.0
Reporter: PJ Fanning


Only scalatest 3.x supports scala 2.12.
https://mvnrepository.com/artifact/org.scalatest/scalatest_2.12

This has already been attempted as part of SPARK-18896 but there were issues.
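
For illustration, the sbt change the upgrade would amount to (the exact 3.x 
version here is an example, not a tested recommendation):
{code}
// Hypothetical build tweak: move the test dependency to a scalatest 3.x
// release, which is published for Scala 2.12.
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.3" % "test"
{code}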



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20495) Add StorageLevel to cacheTable API

2017-05-05 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998435#comment-15998435
 ] 

PJ Fanning commented on SPARK-20495:


Thanks everyone for working on this change. Is it too late to consider this for 
v2.2.0 or even v2.2.1?
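
For anyone landing here later, a sketch of how the extended API is called 
(hedged: check the signature that actually shipped):
{code}
// The extra StorageLevel parameter lets callers override the default
// MEMORY_AND_DISK level when caching a table.
import org.apache.spark.storage.StorageLevel

spark.catalog.cacheTable("my_table", StorageLevel.DISK_ONLY)
{code}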

> Add StorageLevel to cacheTable API 
> ---
>
> Key: SPARK-20495
> URL: https://issues.apache.org/jira/browse/SPARK-20495
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
> Fix For: 2.3.0
>
>
> Currently, cacheTable API always uses the default MEMORY_AND_DISK storage 
> level. We can add a new cacheTable API with the extra parameter StorageLevel. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20539) support optional dataframe name

2017-04-30 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-20539:
--

 Summary: support optional dataframe name
 Key: SPARK-20539
 URL: https://issues.apache.org/jira/browse/SPARK-20539
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.0
Reporter: PJ Fanning


In the Spark UI and some exception logging, Dataframes are described using the 
schemas. This is very useful.
We use Spark SQL in an application where our customers can manipulate data. 
When we need to examine logs or check the Spark REST API or UI, we would prefer 
to be able to override the name of the Dataframe to be something that 
identifies the origin of the Dataframe as opposed to having the column names 
exposed. There is a small possibility that the column names could contain some 
Personally Identifiable Information.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20458) support getting Yarn Tracking URL in code

2017-04-25 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-20458:
--

 Summary: support getting Yarn Tracking URL in code
 Key: SPARK-20458
 URL: https://issues.apache.org/jira/browse/SPARK-20458
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 2.1.0
Reporter: PJ Fanning


org.apache.spark.deploy.yarn.Client logs the Yarn tracking URL but it would be 
useful to be able to access this in code, as opposed to mining log output.

I have an application where I monitor the health of the SparkContext and 
associated Executors using the Spark REST API.

Would it be feasible to add a listener API to listen for new ApplicationReports 
in org.apache.spark.deploy.yarn.Client? Alternatively, this URL could be 
exposed as a property associated with the SparkContext.
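
To make the proposal concrete, a purely hypothetical shape for such a hook (no 
listener like this exists in the current Client):
{code}
// Hypothetical callback the YARN Client could invoke each time it receives a
// fresh ApplicationReport, exposing the tracking URL to application code.
trait ApplicationReportListener {
  def onApplicationReport(appId: String, trackingUrl: String): Unit
}
{code}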




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18896) Suppress ScalaCheck warning -- Unknown ScalaCheck args provided when executing tests using sbt

2016-12-22 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769991#comment-15769991
 ] 

PJ Fanning commented on SPARK-18896:


I noticed from the pull request that you are looking at possibly upgrading 
scalatest too. Getting to scalatest 3.0.1 would be useful for later scala 2.12 
support. Scalatest 2.x is not cross compiled for Scala 2.12.

> Suppress ScalaCheck warning -- Unknown ScalaCheck args provided when 
> executing tests using sbt
> --
>
> Key: SPARK-18896
> URL: https://issues.apache.org/jira/browse/SPARK-18896
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> While executing tests for {{DAGScheduler}} I've noticed the following warning:
> {code}
> > core/testOnly org.apache.spark.scheduler.DAGSchedulerSuite
> ...
> [info] Warning: Unknown ScalaCheck args provided: -oDF
> {code}
> The reason is due to a bug in ScalaCheck as reported in 
> https://github.com/rickynils/scalacheck/issues/212 and fixed in 
> https://github.com/rickynils/scalacheck/commit/df435a5 that is available in 
> ScalaCheck 1.13.4.
> Spark uses [ScalaCheck 
> 1.12.5|https://github.com/apache/spark/blob/master/pom.xml#L717] which is 
> behind the latest 1.12.6 [released on Nov 
> 1|https://github.com/rickynils/scalacheck/releases] (not to mention 1.13.4).
> Let's get rid of ScalaCheck's warning (and perhaps upgrade ScalaCheck along 
> the way too!).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked

2016-08-12 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning closed SPARK-16716.
--
Resolution: Duplicate

This looks like it was fixed by SPARK-16664

> calling cache on joined dataframe can lead to data being blanked
> 
>
> Key: SPARK-16716
> URL: https://issues.apache.org/jira/browse/SPARK-16716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: PJ Fanning
>
> I have reproduced the issue in Spark 1.6.2 and latest 1.6.3-SNAPSHOT code.
> The code works ok on Spark 1.6.1.
> I have a notebook up on Databricks Community Edition that demonstrates the 
> issue. The notebook depends on the library com.databricks:spark-csv_2.10:1.4.0
> The code uses some custom code to join 4 dataframes.
> It calls show on this dataframe and the data is as expected.
> After calling .cache, the data is blanked.
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5458351705459939/3760010872339805/5521341683971298/latest.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked

2016-08-12 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419430#comment-15419430
 ] 

PJ Fanning commented on SPARK-16716:


I set up an equivalent notebook for Spark 2.0 in Databricks Community Edition 
and the join and cache worked correctly. It appears the issue is confined to 
Spark 1.6.2.

> calling cache on joined dataframe can lead to data being blanked
> 
>
> Key: SPARK-16716
> URL: https://issues.apache.org/jira/browse/SPARK-16716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: PJ Fanning
>
> I have reproduced the issue in Spark 1.6.2 and latest 1.6.3-SNAPSHOT code.
> The code works ok on Spark 1.6.1.
> I have a notebook up on Databricks Community Edition that demonstrates the 
> issue. The notebook depends on the library com.databricks:spark-csv_2.10:1.4.0
> The code uses some custom code to join 4 dataframes.
> It calls show on this dataframe and the data is as expected.
> After calling .cache, the data is blanked.
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5458351705459939/3760010872339805/5521341683971298/latest.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16716) calling cache on joined dataframe can lead to data being blanked

2016-07-25 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-16716:
--

 Summary: calling cache on joined dataframe can lead to data being 
blanked
 Key: SPARK-16716
 URL: https://issues.apache.org/jira/browse/SPARK-16716
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.2
Reporter: PJ Fanning


I have reproduced the issue in Spark 1.6.2 and latest 1.6.3-SNAPSHOT code.
The code works ok on Spark 1.6.1.
I have a notebook up on Databricks Community Edition that demonstrates the 
issue. The notebook depends on the library com.databricks:spark-csv_2.10:1.4.0
The code uses some custom code to join 4 dataframes.
It calls show on this dataframe and the data is as expected.
After calling .cache, the data is blanked.

https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5458351705459939/3760010872339805/5521341683971298/latest.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15615) Support for creating a dataframe from JSON in Dataset[String]

2016-05-27 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-15615:
--

 Summary: Support for creating a dataframe from JSON in 
Dataset[String] 
 Key: SPARK-15615
 URL: https://issues.apache.org/jira/browse/SPARK-15615
 Project: Spark
  Issue Type: Bug
Reporter: PJ Fanning


We should deprecate DataFrameReader.scala json(rdd: RDD[String]) and support 
json(ds: Dataset[String]) instead
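
A short sketch of the proposed usage (assuming a SparkSession named spark is 
in scope):
{code}
// Build a Dataset[String] of JSON lines and hand it straight to the reader,
// instead of going through an RDD[String].
import spark.implicits._

val jsonLines = Seq("""{"id": 1}""", """{"id": 2}""").toDS()
val df = spark.read.json(jsonLines)
{code}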



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15463) Support for creating a dataframe from CSV in Dataset[String]

2016-05-26 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-15463:
---
Summary: Support for creating a dataframe from CSV in Dataset[String]  
(was: Support for creating a dataframe from CSV in RDD[String])

> Support for creating a dataframe from CSV in Dataset[String]
> 
>
> Key: SPARK-15463
> URL: https://issues.apache.org/jira/browse/SPARK-15463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: PJ Fanning
>
> I currently use Databricks' spark-csv lib but some features don't work with 
> Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV 
> support into spark-sql directly, spark-csv won't be modified.
> I currently read some CSV data that has been pre-processed and is in 
> RDD[String] format.
> There is sqlContext.read.json(rdd: RDD[String]) but other formats don't 
> appear to support the creation of DataFrames based on loading from 
> RDD[String].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]

2016-05-25 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299881#comment-15299881
 ] 

PJ Fanning edited comment on SPARK-15463 at 5/25/16 11:09 AM:
--

Dataset[String] to DataFrame conversion seems fine to me. Would it make sense 
to change sqlContext.read.json(rdd: RDD[String]) to sqlContext.read.json(ds: 
Dataset[String]) too? 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala


was (Author: pj.fanning):
Dataset[String] to DataFrame conversion seems fine to me

> Support for creating a dataframe from CSV in RDD[String]
> 
>
> Key: SPARK-15463
> URL: https://issues.apache.org/jira/browse/SPARK-15463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: PJ Fanning
>
> I currently use Databricks' spark-csv lib but some features don't work with 
> Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV 
> support into spark-sql directly, spark-csv won't be modified.
> I currently read some CSV data that has been pre-processed and is in 
> RDD[String] format.
> There is sqlContext.read.json(rdd: RDD[String]) but other formats don't 
> appear to support the creation of DataFrames based on loading from 
> RDD[String].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]

2016-05-25 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299881#comment-15299881
 ] 

PJ Fanning commented on SPARK-15463:


Dataset[String] to DataFrame conversion seems fine to me

> Support for creating a dataframe from CSV in RDD[String]
> 
>
> Key: SPARK-15463
> URL: https://issues.apache.org/jira/browse/SPARK-15463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: PJ Fanning
>
> I currently use Databricks' spark-csv lib but some features don't work with 
> Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV 
> support into spark-sql directly, spark-csv won't be modified.
> I currently read some CSV data that has been pre-processed and is in 
> RDD[String] format.
> There is sqlContext.read.json(rdd: RDD[String]) but other formats don't 
> appear to support the creation of DataFrames based on loading from 
> RDD[String].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]

2016-05-21 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-15463:
--

 Summary: Support for creating a dataframe from CSV in RDD[String]
 Key: SPARK-15463
 URL: https://issues.apache.org/jira/browse/SPARK-15463
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: PJ Fanning


I currently use Databricks' spark-csv lib but some features don't work with 
Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV support 
into spark-sql directly, spark-csv won't be modified.
I currently read some CSV data that has been pre-processed and is in 
RDD[String] format.
There is sqlContext.read.json(rdd: RDD[String]) but other formats don't appear 
to support the creation of DataFrames based on loading from RDD[String].
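
A sketch of the requested entry point (assuming a SparkSession named spark; an 
overload along these lines eventually landed in later releases):
{code}
// Parse pre-processed CSV lines held in a Dataset[String] directly, without
// writing them back out to files first.
import spark.implicits._

val csvLines = Seq("name,count", "a,1", "b,2").toDS()
val df = spark.read.option("header", "true").csv(csvLines)
{code}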



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]

2016-05-21 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-15463:
---
Issue Type: Improvement  (was: Bug)

> Support for creating a dataframe from CSV in RDD[String]
> 
>
> Key: SPARK-15463
> URL: https://issues.apache.org/jira/browse/SPARK-15463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: PJ Fanning
>
> I currently use Databricks' spark-csv lib but some features don't work with 
> Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV 
> support into spark-sql directly, spark-csv won't be modified.
> I currently read some CSV data that has been pre-processed and is in 
> RDD[String] format.
> There is sqlContext.read.json(rdd: RDD[String]) but other formats don't 
> appear to support the creation of DataFrames based on loading from 
> RDD[String].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12956) add spark.yarn.hdfs.home.directory property

2016-04-03 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning closed SPARK-12956.
--
Resolution: Duplicate

> add spark.yarn.hdfs.home.directory property
> ---
>
> Key: SPARK-12956
> URL: https://issues.apache.org/jira/browse/SPARK-12956
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: PJ Fanning
>
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
>  uses the default home directory based on the hadoop configuration. I have a 
> use case where it would be useful to override this and to provide an explicit 
> base path.
> If this seems like a generally useful config property, I can put together a 
> pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12956) add spark.yarn.hdfs.home.directory property

2016-04-03 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223463#comment-15223463
 ] 

PJ Fanning commented on SPARK-12956:


[~tgraves] I think you can close this as a duplicate of SPARK-13063

> add spark.yarn.hdfs.home.directory property
> ---
>
> Key: SPARK-12956
> URL: https://issues.apache.org/jira/browse/SPARK-12956
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: PJ Fanning
>
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
>  uses the default home directory based on the hadoop configuration. I have a 
> use case where it would be useful to override this and to provide an explicit 
> base path.
> If this seems like a generally useful config property, I can put together a 
> pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12956) add spark.yarn.hdfs.home.directory property

2016-01-21 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-12956:
--

 Summary: add spark.yarn.hdfs.home.directory property
 Key: SPARK-12956
 URL: https://issues.apache.org/jira/browse/SPARK-12956
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.6.0
Reporter: PJ Fanning


https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 uses the default home directory based on the hadoop configuration. I have a 
use case where it would be useful to override this and to provide an explicit 
base path.
If this seems like a generally useful config property, I can put together a 
pull request.
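
For illustration, how the proposed property might have been supplied (the 
config key is the proposal here, not a real Spark setting; SPARK-13063 later 
covered a similar need):
{code}
// Hypothetical usage of the suggested key; the HDFS path is a placeholder.
val conf = new org.apache.spark.SparkConf()
  .set("spark.yarn.hdfs.home.directory", "hdfs://nameservice1/user/spark")
{code}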



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8616) SQLContext doesn't handle tricky column names when loading from JDBC

2015-12-28 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072697#comment-15072697
 ] 

PJ Fanning commented on SPARK-8616:
---

Seems to duplicate the 'In Progress' task, SPARK-12437.

> SQLContext doesn't handle tricky column names when loading from JDBC
> 
>
> Key: SPARK-8616
> URL: https://issues.apache.org/jira/browse/SPARK-8616
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: Ubuntu 14.04, Sqlite 3.8.7, Spark 1.4.0
>Reporter: Gergely Svigruha
>
> Reproduce:
>  - create a table in a relational database (in my case sqlite) with a column 
> name containing a space:
>  CREATE TABLE my_table (id INTEGER, "tricky column" TEXT);
>  - try to create a DataFrame using that table:
> sqlContext.read.format("jdbc").options(Map(
>   "url" -> "jdbs:sqlite:...",
>   "dbtable" -> "my_table")).load()
> java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (no such 
> column: tricky)
> According to the SQL spec this should be valid:
> http://savage.net.au/SQL/sql-99.bnf.html#delimited%20identifier



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11640) shading packages in spark-assembly jar

2015-11-12 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002537#comment-15002537
 ] 

PJ Fanning commented on SPARK-11640:


[~sowen] Thanks - the hadoop-provided profile does lead to bouncycastle classes 
being excluded from the spark-assembly jar

> shading packages in spark-assembly jar
> --
>
> Key: SPARK-11640
> URL: https://issues.apache.org/jira/browse/SPARK-11640
> Project: Spark
>  Issue Type: Wish
>  Components: Build
>Reporter: PJ Fanning
>
> The spark assembly jar contains classes from many external dependencies like 
> hadoop and bouncycastle.
> I have run into issues trying to use bouncycastle code in a Spark job because 
> the JCE codebase expects the encryption code to be in a signed jar; since the 
> classes are copied into the spark-assembly jar, which is not signed, the JCE 
> framework returns an error.
> If the bouncycastle classes in spark-assembly were shaded, then I could 
> deploy the properly signed bcprov jar. The spark code could access the shaded 
> copies of the bouncycastle classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11640) shading packages in spark-assembly jar

2015-11-10 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-11640:
---
Issue Type: Wish  (was: Bug)

> shading packages in spark-assembly jar
> --
>
> Key: SPARK-11640
> URL: https://issues.apache.org/jira/browse/SPARK-11640
> Project: Spark
>  Issue Type: Wish
>  Components: Build
>Reporter: PJ Fanning
>
> The spark assembly jar contains classes from many external dependencies like 
> hadoop and bouncycastle.
> I have run into issues trying to use bouncycastle code in a Spark job because 
> the JCE codebase expects the encryption code to be in a signed jar; since the 
> classes are copied into the spark-assembly jar, which is not signed, the JCE 
> framework returns an error.
> If the bouncycastle classes in spark-assembly were shaded, then I could 
> deploy the properly signed bcprov jar. The spark code could access the shaded 
> copies of the bouncycastle classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11640) shading packages in spark-assembly jar

2015-11-10 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-11640:
--

 Summary: shading packages in spark-assembly jar
 Key: SPARK-11640
 URL: https://issues.apache.org/jira/browse/SPARK-11640
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: PJ Fanning


The spark assembly jar contains classes from many external dependencies like 
hadoop and bouncycastle.
I have run into issues trying to use bouncycastle code in a Spark job because 
the JCE codebase expects the encryption code to be in a signed jar; since the 
classes are copied into the spark-assembly jar, which is not signed, the JCE 
framework returns an error.
If the bouncycastle classes in spark-assembly were shaded, then I could deploy 
the properly signed bcprov jar. The spark code could access the shaded copies 
of the bouncycastle classes.
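
As a sketch of the kind of relocation involved (shown with sbt-assembly's 
shading DSL purely for illustration; Spark's assembly is actually built with 
Maven's shade plugin):
{code}
// Rename bouncycastle packages inside the fat jar so the separately deployed,
// properly signed bcprov jar no longer clashes with JCE signature checks.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.bouncycastle.**" -> "org.spark_project.bouncycastle.@1").inAll
)
{code}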





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-8494:
--
Description: 
I found a similar issue to SPARK-1923 but with Scala 2.10.4.
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
build.sbt that I am working on.
If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.None$
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}


  was:
I found a similar issue to SPARK-1923 but with Scala 2.10.4.
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
build.sbt that I am working on.
If I remove the spray 1.3.3 jars, the test case passes but has a 

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.None$
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}



 ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
 ---

 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
Assignee: Patrick Wendell

 I found a similar issue to SPARK-1923 but with Scala 2.10.4.
 I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
 build.sbt that I am working on.
 If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923
 Application:
 {code}
 import org.apache.spark.SparkConf
 import org.apache.spark.SparkContext
 object Test {
   def main(args: Array[String]): Unit = {
     val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
     val sc = new SparkContext(conf)
     sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
     sc.stop()
   }
 }
 {code}
 Exception:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
 failed 1 times, most recent failure: Exception failure in TID 1 on host 
 localhost: java.lang.ClassNotFoundException: scala.None$
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 java.lang.Class.forName0(Native 

[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-8494:
--
Description: 
I found a similar issue to SPARK-1923 but with Scala 2.10.4.
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
build.sbt that I am working on.
If I remove the spray 1.3.3 jars, the test case passes but has a 

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.None$
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}


  was:
I just wanted to document this for posterity. I had an issue when running a 
Spark 1.0 app locally with sbt. The issue was that if you both:

1. Reference a scala class (e.g. None) inside of a closure.
2. Run your program with 'sbt run'

It throws an exception. Upgrading the scalaVersion to 2.10.4 in sbt solved this 
issue. Somehow scala classes were not being loaded correctly inside of the 
executors:

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.None$
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}



 ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
 ---

 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
Assignee: Patrick Wendell

 I found a similar issue to SPARK-1923 but with Scala 2.10.4.
 I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
 build.sbt that I am working on.
 If I remove the spray 1.3.3 jars, the test case passes but has a 
 Application:
 {code}
 import org.apache.spark.SparkConf
 import org.apache.spark.SparkContext
 object Test {
   def main(args: Array[String]): Unit = {
     val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
     val sc = new SparkContext(conf)
     sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
     sc.stop()
   }
 }
 {code}
 Exception:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
 failed 1 times, most recent failure: Exception failure in TID 1 on host 
 localhost: java.lang.ClassNotFoundException: scala.None$
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 

[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-8494:
--
Attachment: spark-test-case.zip

 ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
 ---

 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
Assignee: Patrick Wendell
 Attachments: spark-test-case.zip


 I found a similar issue to SPARK-1923 but with Scala 2.10.4.
 I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
 build.sbt that I am working on.
 If I remove the spray 1.3.3 jars, the test case passes but has a 
 ClassNotFoundException otherwise.
 I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT.
 Application:
 {code}
 import org.apache.spark.SparkConf
 import org.apache.spark.SparkContext
 object Test {
   def main(args: Array[String]): Unit = {
     val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
     val sc = new SparkContext(conf)
     sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
     sc.stop()
   }
 }
 {code}
 Exception:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
 failed 1 times, most recent failure: Exception failure in TID 1 on host 
 localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 java.lang.Class.forName0(Native Method)
 java.lang.Class.forName(Class.java:270)
 
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 {code}
 {code}
 name := "spark-test-case"
 version := "1.0"
 scalaVersion := "2.10.4"
 resolvers += "spray repo" at "http://repo.spray.io"
 resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"
 val akkaVersion = "2.3.11"
 val sprayVersion = "1.3.3"
 libraryDependencies ++= Seq(
   "com.h2database"    %  "h2"              % "1.4.187",
   "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
   "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
   "ch.qos.logback"    %  "logback-classic" % "1.0.13",
   "io.spray"          %% "spray-can"       % sprayVersion,
   "io.spray"          %% "spray-routing"   % sprayVersion,
   "io.spray"          %% "spray-json"      % "1.3.1",
   "com.databricks"    %% "spark-csv"       % "1.0.3",
   "org.specs2"        %% "specs2"          % "2.4.17"     % "test",
   "org.specs2"        %% "specs2-junit"    % "2.4.17"     % "test",
   "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
   "com.typesafe.akka" %% "akka-testkit"    % akkaVersion  % "test",
   "junit"             %  "junit"           % "4.12"       % "test"
 )
 scalacOptions ++= Seq(
   "-unchecked",
   "-deprecation",
   "-Xlint",
   "-Ywarn-dead-code",
   "-language:_",
   "-target:jvm-1.7",
   "-encoding", "UTF-8"
 )
 testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-8494:
-

 Summary: ClassNotFoundException when running with sbt, scala 
2.10.4, spray 1.3.3
 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
Assignee: Patrick Wendell


I just wanted to document this for posterity. I had an issue when running a 
Spark 1.0 app locally with sbt. The issue was that if you both:

1. Reference a scala class (e.g. None) inside of a closure.
2. Run your program with 'sbt run'

It throws an exception. Upgrading the scalaVersion to 2.10.4 in sbt solved this 
issue. Somehow scala classes were not being loaded correctly inside of the 
executors:

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.None$
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-8494:
--
Description: 
I found a similar issue to SPARK-1923 but with Scala 2.10.4.
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
build.sbt that I am working on.
If I remove the spray 1.3.3 jars, the test case passes but has a 
ClassNotFoundException otherwise.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"
val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"     % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"     % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion  % "test",
  "junit"             %  "junit"           % "4.12"       % "test"
)

scalacOptions ++= Seq(
  "-unchecked",
  "-deprecation",
  "-Xlint",
  "-Ywarn-dead-code",
  "-language:_",
  "-target:jvm-1.7",
  "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}


  was:
I found a similar issue to SPARK-1923 but with Scala 2.10.4.
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
build.sbt that I am working on.
If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.None$
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}



 ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
 ---

 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
  

[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated SPARK-8494:
--
Description: 
I found a similar issue to SPARK-1923 but with Scala 2.10.4.
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
build.sbt that I am working on.
If I remove the spray 1.3.3 jars, the test case passes but has a 
ClassNotFoundException otherwise.
I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT.

Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}

Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
failed 1 times, most recent failure: Exception failure in TID 1 on host 
localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:270)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}

{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"
val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"     % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"     % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion  % "test",
  "junit"             %  "junit"           % "4.12"       % "test"
)

scalacOptions ++= Seq(
  "-unchecked",
  "-deprecation",
  "-Xlint",
  "-Ywarn-dead-code",
  "-language:_",
  "-target:jvm-1.7",
  "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}
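
A workaround often suggested for SPARK-1923-style sbt classloader problems (an assumption here, not something verified against this particular build) is to fork a separate JVM for run and test, so that classes resolve through a plain application classloader instead of sbt's own:

{code}
// sbt 0.13 syntax; commonly suggested for this class of failure,
// not confirmed as the fix for this particular report.
fork in run := true
fork in Test := true
{code}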



[jira] [Commented] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594132#comment-14594132
 ] 

PJ Fanning commented on SPARK-8494:
---

[~pwendell] Apologies about the JIRA being assigned to you. I cloned SPARK-1923 
and now can't change the Assignee.

 ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
 ---

 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
Assignee: Patrick Wendell

 I found a similar issue to SPARK-1923 but with Scala 2.10.4.
 I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
 build.sbt that I am working on.
 If I remove the spray 1.3.3 jars, the test case passes but has a 
 ClassNotFoundException otherwise.
 Application:
 {code}
 import org.apache.spark.SparkConf
 import org.apache.spark.SparkContext
 object Test {
   def main(args: Array[String]): Unit = {
 val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
 val sc = new SparkContext(conf)
 sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
 sc.stop()
   }
 }
 {code}
 Exception:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
 failed 1 times, most recent failure: Exception failure in TID 1 on host 
 localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 java.lang.Class.forName0(Native Method)
 java.lang.Class.forName(Class.java:270)
 
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 {code}
 {code}
 name := "spark-test-case"
 version := "1.0"
 scalaVersion := "2.10.4"
 resolvers += "spray repo" at "http://repo.spray.io"
 resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"
 val akkaVersion = "2.3.11"
 val sprayVersion = "1.3.3"
 libraryDependencies ++= Seq(
   "com.h2database"    %  "h2"              % "1.4.187",
   "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
   "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
   "ch.qos.logback"    %  "logback-classic" % "1.0.13",
   "io.spray"          %% "spray-can"       % sprayVersion,
   "io.spray"          %% "spray-routing"   % sprayVersion,
   "io.spray"          %% "spray-json"      % "1.3.1",
   "com.databricks"    %% "spark-csv"       % "1.0.3",
   "org.specs2"        %% "specs2"          % "2.4.17"     % "test",
   "org.specs2"        %% "specs2-junit"    % "2.4.17"     % "test",
   "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
   "com.typesafe.akka" %% "akka-testkit"    % akkaVersion  % "test",
   "junit"             %  "junit"           % "4.12"       % "test"
 )
 scalacOptions ++= Seq(
   "-unchecked",
   "-deprecation",
   "-Xlint",
   "-Ywarn-dead-code",
   "-language:_",
   "-target:jvm-1.7",
   "-encoding", "UTF-8"
 )
 testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
 {code}


