[jira] [Comment Edited] (SPARK-32385) Publish a "bill of materials" (BOM) descriptor for Spark with correct versions of various dependencies

2021-02-18 Thread Shannon Carey (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286704#comment-17286704
 ] 

Shannon Carey edited comment on SPARK-32385 at 2/18/21, 8:19 PM:
-

Here's another reason that either a BOM or a move away from dependency 
management on the Spark side would be helpful.

Problems such as this 
[https://stackoverflow.com/questions/42352091/spark-sql-fails-with-java-lang-noclassdeffounderror-org-codehaus-commons-compil]
 occur even if the user has apparently done everything right. The Spark 
top-level POM specifies version 3.0.9 of janino in its 
{{<dependencyManagement>}} section, but when Maven pulls that transitive 
dependency in via something like spark-sql, it gets the latest version instead 
(such as 3.1.2). The version mismatch causes exceptions at runtime. This 
occurs due to surprising behavior in Maven, recorded in 
https://issues.apache.org/jira/browse/MNG-5761 and 
https://issues.apache.org/jira/browse/MNG-6141.

This problem forces people to add direct dependencies on specific versions of 
transitive dependencies, sometimes without understanding the cause of the 
issue, and it leaves POMs more fragile.
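As an illustration of that workaround, here is a sketch of what such a pin 
looks like in a downstream {{pom.xml}} (the janino coordinates match what 
Spark 3.0-era POMs declare; treat the exact versions as an example, not a 
recommendation):

{code:xml}
<!-- Downstream pom.xml fragment: declare janino directly so that Maven's
     "nearest wins" resolution picks the version Spark was compiled against,
     rather than a newer, incompatible release pulled in transitively. -->
<dependencies>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>janino</artifactId>
        <version>3.0.9</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
        <version>3.0.9</version>
    </dependency>
</dependencies>
{code}

This works, but every such pin has to be discovered and maintained by hand, 
which is exactly the fragility just described.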

Providing a BOM could help with this, as long as the transitive versions are 
specified in it. Alternatively, don't rely purely on Maven dependency 
management for libraries.
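For comparison, a minimal sketch of what consuming a published BOM would look 
like in Maven. Note that the {{org.apache.spark:spark-bom}} coordinates are 
hypothetical; no such artifact exists today:

{code:xml}
<dependencyManagement>
    <dependencies>
        <dependency>
            <!-- Hypothetical coordinates; Spark does not currently publish a BOM -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-bom</artifactId>
            <version>3.1.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
{code}

With {{scope=import}}, the BOM's {{dependencyManagement}} entries are merged 
into the consumer's POM, so transitive versions resolve to the ones Spark was 
built with.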


was (Author: rehevkor5):
Here's another reason that either a BOM or a move away from dependency 
management on the Spark side would be helpful.

Problems such as this 
[https://stackoverflow.com/questions/42352091/spark-sql-fails-with-java-lang-noclassdeffounderror-org-codehaus-commons-compil]
 occur even if the user has apparently done everything right. The Spark 
top-level POM specifies version 3.0.9 of janino in its 
{{<dependencyManagement>}} section, 
but when Maven pulls that transitive dependency in via something like 
spark-sql, it gets the latest version instead (such as 3.1.2). This occurs due 
to surprising behavior in Maven, recorded in 
https://issues.apache.org/jira/browse/MNG-5761 and 
https://issues.apache.org/jira/browse/MNG-6141 .

This problem forces people to add direct dependencies to specific versions of 
transitive things, sometimes without understanding the cause of the issue, and 
leads to POMs being more fragile.

If you provide a BOM, that could help with this, if the versions are specified. 
Or, don't rely purely on dependency management in Maven, for libraries.

> Publish a "bill of materials" (BOM) descriptor for Spark with correct 
> versions of various dependencies
> --
>
> Key: SPARK-32385
> URL: https://issues.apache.org/jira/browse/SPARK-32385
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Vladimir Matveev
>Priority: Major
>
> Spark has a lot of dependencies, many of them very common (e.g. Guava, 
> Jackson). Also, versions of these dependencies are not updated as frequently 
> as they are released upstream, which is totally understandable and natural, 
> but which also means that often Spark has a dependency on a lower version of 
> a library, which is incompatible with a higher, more recent version of the 
> same library. This incompatibility can manifest in different ways, e.g. as 
> classpath errors or runtime check errors (like with Jackson), in certain 
> cases.
>  
> Spark does attempt to "fix" versions of its dependencies by declaring them 
> explicitly in its {{pom.xml}} file. However, this approach, being somewhat 
> workable if the Spark-using project itself uses Maven, breaks down if another 
> build system is used, like Gradle. The reason is that Maven uses an 
> unconventional "nearest first" version conflict resolution strategy, while 
> many other tools like Gradle use the "highest first" strategy which resolves 
> the highest possible version number inside the entire graph of dependencies. 
> This means that other dependencies of the project can pull a higher version 
> of some dependency, which is incompatible with Spark.
>  
> One example would be an explicit or a transitive dependency on a higher 
> version of Jackson in the project. Spark itself depends on several modules of 
> Jackson; if only one of them gets a higher version, and others remain on the 
> lower version, this will result in runtime exceptions due to an internal 
> version check in Jackson.
>  
> A widely used solution for this kind of version issue is publishing a 
> "bill of materials" descriptor (see here: 
> [https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html]
>  and here: 
> [https://docs.gradle.org/current/userguide/platforms.html#sub:bom_import]). 
> This descriptor would contain all versions of all dependencies of Spark; then 
> downstream projects will be able to use their build system's support for BOMs 
> to enforce version constraints required for Spark to function correctly.

[jira] [Comment Edited] (SPARK-32385) Publish a "bill of materials" (BOM) descriptor for Spark with correct versions of various dependencies

2021-02-18 Thread Shannon Carey (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286704#comment-17286704
 ] 

Shannon Carey edited comment on SPARK-32385 at 2/18/21, 8:12 PM:
-

Here's another reason that either a BOM or a move away from dependency 
management on the Spark side would be helpful.

Problems such as this 
[https://stackoverflow.com/questions/42352091/spark-sql-fails-with-java-lang-noclassdeffounderror-org-codehaus-commons-compil]
 occur even if the user has apparently done everything right. The Spark 
top-level POM specifies version 3.0.9 of janino in its 
{{<dependencyManagement>}} section, 
but when Maven pulls that transitive dependency in via something like 
spark-sql, it gets the latest version instead (such as 3.1.2). This occurs due 
to surprising behavior in Maven, recorded in 
https://issues.apache.org/jira/browse/MNG-5761 and 
https://issues.apache.org/jira/browse/MNG-6141 .

This problem forces people to add direct dependencies to specific versions of 
transitive things, sometimes without understanding the cause of the issue, and 
leads to POMs being more fragile.

If you provide a BOM, that could help with this, if the versions are specified. 
Or, don't rely purely on dependency management in Maven, for libraries.


was (Author: rehevkor5):
Here's another reason that either a BOM or a move away from dependency 
management on the Spark side would be helpful.

Problems such as this 
[https://stackoverflow.com/questions/42352091/spark-sql-fails-with-java-lang-noclassdeffounderror-org-codehaus-commons-compil]
 occur even if the user has apparently done everything right. The Spark 
top-level POM specifies version 3.0.9 of janino in its 
{{<dependencyManagement>}} section, 
but when Maven pulls that transitive dependency in via something like 
spark-sql, it gets the latest version instead (such as 3.1.2). This occurs due 
to surprising behavior in Maven, recorded in 
https://issues.apache.org/jira/browse/MNG-5761 
and https://issues.apache.org/jira/browse/MNG-6141 .

This problem forces people to add direct dependencies to specific versions of 
transitive things, sometimes without understanding the cause of the issue, and 
leads to POMs being more fragile.

If you provide a BOM, that could help with this, if the versions are specified. 
Or, don't rely purely on dependency management in Maven, for libraries.


[jira] [Comment Edited] (SPARK-32385) Publish a "bill of materials" (BOM) descriptor for Spark with correct versions of various dependencies

2020-08-28 Thread Vladimir Matveev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186846#comment-17186846
 ] 

Vladimir Matveev edited comment on SPARK-32385 at 8/28/20, 11:37 PM:
-

[~srowen] Sorry for the delayed response!

> This requires us fixing every version of every transitive dependency. How 
> does that get updated as the transitive dependency graph changes? this 
> exchanges one problem for another I think. That is, we are definitely not 
> trying to fix dependency versions except where necessary.

I don't think this is right — you don't have to fix more than just direct 
dependencies, like you already do. It's pretty much the same thing as defining 
the version numbers like 
[here|https://github.com/apache/spark/blob/a0bd273bb04d9a5684e291ec44617972dcd4accd/pom.xml#L121-L197]
 and then declaring specific dependencies with the versions below. It's just 
done slightly differently, using Maven's {{<dependencyManagement>}} 
mechanism and POM inheritance (in Gradle, the equivalent is the "platform" 
mechanism).

> Gradle isn't something that this project supports, but, wouldn't this be a 
> much bigger general problem if its resolution rules are different from Maven? 
> that is, surely gradle can emulate Maven if necessary.

I don't think Gradle can emulate Maven, and I personally don't think it should, 
because Maven's strategy for conflict resolution is quite unconventional, and 
is not used by most of the dependency management tools, not just in the Java 
world. Also, I naturally don't have statistics, so this is just my speculation, 
but it seems likely to me that most of the downstream projects which use Spark 
don't actually use Maven for dependency management, especially given its Scala 
heritage. Therefore, they can't take advantage of Maven's dependency resolution 
algorithm and Spark's current POM configuration.

Also I'd like to point out again that this whole BOM mechanism is something 
which _Maven_ supports natively; it's not a Gradle extension. The BOM concept 
originated in Maven, and a BOM is declared using Maven's 
{{<dependencyManagement>}} block, which is part of the POM syntax. Hopefully 
this reduces some of the concerns about it.


was (Author: netvl):
Sorry for the delayed response!

> This requires us fixing every version of every transitive dependency. How 
> does that get updated as the transitive dependency graph changes? this 
> exchanges one problem for another I think. That is, we are definitely not 
> trying to fix dependency versions except where necessary.

I don't think this is right — you don't have to fix more than just direct 
dependencies, like you already do. It's pretty much the same thing as defining 
the version numbers like 
[here|https://github.com/apache/spark/blob/a0bd273bb04d9a5684e291ec44617972dcd4accd/pom.xml#L121-L197]
 and then declaring specific dependencies with the versions below. It's just it 
is done slightly differently, by using Maven's `<dependencyManagement>` 
mechanism and POM inheritance (for Maven; for Gradle e.g. it would be this 
"platform" thing).

> Gradle isn't something that this project supports, but, wouldn't this be a 
> much bigger general problem if its resolution rules are different from Maven? 
> that is, surely gradle can emulate Maven if necessary.

I don't think Gradle can emulate Maven, and I personally don't think it should, 
because Maven's strategy for conflict resolution is quite unconventional, and 
is not used by most of the dependency management tools, not just in the Java 
world. Also, I naturally don't have statistics, so this is just my speculation, 
but it seems likely to me that most of the downstream projects which use Spark 
don't actually use Maven for dependency management, especially given its Scala 
heritage. Therefore, they can't take advantage of Maven's dependency resolution 
algorithm and the current Spark's POM configuration.

Also I'd like to point out again that this whole BOM mechanism is something 
which _Maven_ supports natively, it's not a Gradle extension or something. The 
BOM concept originated in Maven, and it is declared using Maven's 
{{<dependencyManagement>}} block, which is a part of POM syntax. Hopefully this 
would reduce some of the concerns about it.


[jira] [Comment Edited] (SPARK-32385) Publish a "bill of materials" (BOM) descriptor for Spark with correct versions of various dependencies

2020-07-29 Thread DB Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167504#comment-17167504
 ] 

DB Tsai edited comment on SPARK-32385 at 7/29/20, 9:04 PM:
---

+1 This will be very useful for users who include Spark as a dependency.

[~hyukjin.kwon] from  [https://www.baeldung.com/spring-maven-bom]

Following is an example of how to write a BOM file: 
{code:xml}
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>baeldung</groupId>
    <artifactId>Baeldung-BOM</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>pom</packaging>
    <name>BaelDung-BOM</name>
    <description>parent pom</description>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>test</groupId>
                <artifactId>a</artifactId>
                <version>1.2</version>
            </dependency>
            <dependency>
                <groupId>test</groupId>
                <artifactId>b</artifactId>
                <version>1.0</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>test</groupId>
                <artifactId>c</artifactId>
                <version>1.0</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>
{code}

As we can see, the BOM is an ordinary POM file with a {{dependencyManagement}} 
section where we can declare all the artifacts and their versions.
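A downstream project can then import this BOM with {{scope=import}}, which 
merges its {{dependencyManagement}} section into the consumer's POM without 
requiring POM inheritance:

{code:xml}
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>baeldung</groupId>
            <artifactId>Baeldung-BOM</artifactId>
            <version>0.0.1-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
{code}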










--
This message was sent by Atlassian Jira
(v8.3.4#803005)