[jira] [Commented] (SPARK-32385) Publish a "bill of materials" (BOM) descriptor for Spark with correct versions of various dependencies
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853172#comment-17853172 ] James Grinter commented on SPARK-32385:
---
Regarding the convention of projects publishing BOMs: those I'm familiar with serve to specify all of the project's own artifact versions, rather than downstream artifacts' versions as well. But there _would_ seem to be some overlap between the Spark assembly POM and this proposed idea of a BOM-style POM that includes transitive dependencies.

I agree with Vladimir that it is often desirable to "fix" the versions of artifacts when building a Spark application: because many of the jars are flagged as "provided", one can be fooled into thinking that the application is using different versions than will actually be present in the runtime environment.

I also share Vladimir's frustration with Maven's version resolution mechanisms: they come up short when one needs to ensure that a newer version of a dependency, one without a known vulnerability, is used both at build time (to correctly run verification, validation and testing, and to satisfy security scanning tools) and at run time (to actually fix the issue).

> Publish a "bill of materials" (BOM) descriptor for Spark with correct
> versions of various dependencies
> --
>
> Key: SPARK-32385
> URL: https://issues.apache.org/jira/browse/SPARK-32385
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Vladimir Matveev
> Priority: Major
>
> Spark has a lot of dependencies, many of them very common (e.g. Guava,
> Jackson). Also, versions of these dependencies are not updated as frequently
> as they are released upstream, which is totally understandable and natural,
> but which also means that Spark often depends on a lower version of a
> library which is incompatible with a higher, more recent version of the same
> library. This incompatibility can manifest in different ways, e.g. as
> classpath errors or runtime check errors (as with Jackson), in certain cases.
>
> Spark does attempt to "fix" versions of its dependencies by declaring them
> explicitly in its {{pom.xml}} file. However, this approach, while somewhat
> workable if the Spark-using project itself uses Maven, breaks down if another
> build system is used, like Gradle. The reason is that Maven uses an
> unconventional "nearest first" version conflict resolution strategy, while
> many other tools like Gradle use a "highest first" strategy, which resolves
> to the highest version number found anywhere in the graph of dependencies.
> This means that other dependencies of the project can pull in a higher
> version of some dependency which is incompatible with Spark.
>
> One example would be an explicit or transitive dependency on a higher
> version of Jackson in the project. Spark itself depends on several modules of
> Jackson; if only one of them gets a higher version, and the others remain on
> the lower version, this will result in runtime exceptions due to an internal
> version check in Jackson.
>
> A widely used solution for this kind of version issue is publishing a
> "bill of materials" descriptor (see
> [https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html]
> and
> [https://docs.gradle.org/current/userguide/platforms.html#sub:bom_import]).
> This descriptor would contain the versions of all of Spark's dependencies;
> downstream projects would then be able to use their build system's support
> for BOMs to enforce the version constraints required for Spark to function
> correctly.
>
> One example of a successful implementation of the BOM-based approach is
> Spring: [https://www.baeldung.com/spring-maven-bom#spring-bom]. For different
> Spring projects, e.g. Spring Boot, BOM descriptors are published which can be
> used in downstream projects to fix the versions of Spring components and
> their dependencies, significantly reducing confusion around proper version
> numbers.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
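To make the proposal concrete, here is a sketch of how a downstream Maven project might import such a BOM. The `org.apache.spark:spark-bom` coordinates are hypothetical (no such artifact is published today); the sketch only illustrates the standard Maven BOM-import mechanism:

```xml
<!-- Hypothetical consumer pom.xml fragment: importing a Spark BOM.
     The spark-bom artifact does not actually exist; coordinates and
     versions here are illustrative only. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-bom</artifactId>
      <version>3.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- Versions are omitted; they come from the imported BOM. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
  </dependency>
</dependencies>
```

In Gradle the same descriptor would be consumed with `platform(...)` or `enforcedPlatform(...)`, per the Gradle documentation referenced in the issue description.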
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286704#comment-17286704 ] Shannon Carey commented on SPARK-32385:
---
Here's another reason that either a BOM or a move away from dependency management on the Spark side would be helpful. Problems such as this [https://stackoverflow.com/questions/42352091/spark-sql-fails-with-java-lang-noclassdeffounderror-org-codehaus-commons-compil] occur even if the user has apparently done everything right. The Spark top-level POM specifies version 3.0.9 of janino in its {{dependencyManagement}}, but when Maven pulls that transitive dependency in via something like spark-sql, it gets the latest version instead (such as 3.1.2). This occurs due to surprising behavior in Maven, recorded in https://issues.apache.org/jira/browse/MNG-5761 and https://issues.apache.org/jira/browse/MNG-6141 .

This problem forces people to add direct dependencies on specific versions of transitive dependencies, sometimes without understanding the cause of the issue, and leads to POMs being more fragile. A BOM could help with this, if the versions are specified in it. Alternatively, don't rely purely on Maven's dependency management for libraries.
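Until a BOM exists, the usual workaround for the janino case described above is for the consumer to pin the version in their own `dependencyManagement`, which takes precedence over anything mediated transitively. A minimal sketch (the 3.0.9 version number comes from the comment above; verify it against the Spark release you actually use):

```xml
<!-- Consumer pom.xml fragment: pinning janino so Maven's mediation
     cannot select a newer, incompatible release. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
      <version>3.0.9</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>3.0.9</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Pinning both janino and commons-compiler together matters, since the `NoClassDefFoundError` in the linked question arises when the two modules end up on different versions.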
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186855#comment-17186855 ] Vladimir Matveev commented on SPARK-32385:
---
> I am still not quite sure what this gains if it doesn't change dependency resolution.

It does not change dependency resolution by itself; it gives the user the ability, if they want it, to automatically lock onto the versions explicitly declared by the Spark project. So yeah, this:

> Is it just that you declare one artifact POM to depend on that declares a bunch of dependent versions, so people don't go depending on different versions?

pretty much summarizes it. This could be expanded to say that (depending on the build tool) it may also enforce these dependent versions in case of conflicts.

> I mean people can already do that by setting some spark.version property in their build.

They can't in general: while that will enforce Spark's own version, it won't necessarily determine the versions of transitive dependencies. The latter only happens when the consumer also uses Maven, and when they have a particular order of dependencies in their POM declaration (e.g. no newer Jackson version pulled in transitively by a dependency declared lexically earlier than Spark).

> What is important is: if we change the build and it changes Spark's transitive dependencies for downstream users, that could be a breaking change.

My understanding is that this should not happen unless the user explicitly opts into using the BOM, in which case it arguably changes the situation for the better in most cases, because versions are then guaranteed to align with Spark's declarations.
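To illustrate the point about `spark.version`: a property only controls the Spark artifacts the consumer declares directly; it constrains nothing about Jackson, Guava, or other transitive dependencies. A sketch of the pattern being discussed (artifact names assume Scala 2.12; versions illustrative):

```xml
<!-- A spark.version property pins Spark's own coordinates only. -->
<properties>
  <spark.version>3.1.0</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
  <!-- Nothing here constrains the Jackson version that another
       dependency may pull in transitively; only a BOM import (or
       explicit dependencyManagement entries) would do that. -->
</dependencies>
```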
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186850#comment-17186850 ] Sean R. Owen commented on SPARK-32385:
---
OK, so it's just a different theory of organizing the Maven dependencies? That could be OK to clean up, but I think we'd have to see a prototype of some of the change to understand what it's going to mean. I get it, BOMs are just a design pattern in Maven, not some new tool or thing.

I am still not quite sure what this gains if it doesn't change dependency resolution. Is it just that you declare one artifact POM to depend on that declares a bunch of dependent versions, so people don't go depending on different versions? For Spark artifacts? I mean, people can already do that by setting some spark.version property in their build. If it doesn't change transitive dependency handling, what does it do, or does it?

I have no opinion on whether closest-first or latest-first resolution is more sound. I think Maven is still probably more widely used, but I don't particularly care. What is important is: if we change the build and it changes Spark's transitive dependencies for downstream users, that could be a breaking change. Or: anything we can do to make dependency resolution consistent across SBT, Gradle etc. is a win for sure.
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186846#comment-17186846 ] Vladimir Matveev commented on SPARK-32385:
---
Sorry for the delayed response!

> This requires us fixing every version of every transitive dependency. How does that get updated as the transitive dependency graph changes? this exchanges one problem for another I think. That is, we are definitely not trying to fix dependency versions except where necessary.

I don't think this is right: you don't have to fix more than just the direct dependencies, as you already do. It's pretty much the same thing as defining the version numbers like [here|https://github.com/apache/spark/blob/a0bd273bb04d9a5684e291ec44617972dcd4accd/pom.xml#L121-L197] and then declaring specific dependencies with those versions below; it's just done slightly differently, using Maven's {{dependencyManagement}} mechanism and POM inheritance (for Maven; for Gradle it would be the "platform" mechanism).

> Gradle isn't something that this project supports, but, wouldn't this be a much bigger general problem if its resolution rules are different from Maven? that is, surely gradle can emulate Maven if necessary.

I don't think Gradle can emulate Maven, and I personally don't think it should, because Maven's strategy for conflict resolution is quite unconventional and is not used by most dependency management tools, not just in the Java world. Also, I naturally don't have statistics, so this is just my speculation, but it seems likely to me that most downstream projects which use Spark don't actually use Maven for dependency management, especially given Spark's Scala heritage. Therefore they can't take advantage of Maven's dependency resolution algorithm and the current Spark POM configuration.

I'd also like to point out again that this whole BOM mechanism is something which _Maven_ supports natively; it's not a Gradle extension or anything like that. The BOM concept originated in Maven, and a BOM is declared using Maven's {{dependencyManagement}} block, which is part of POM syntax. Hopefully this reduces some of the concerns about it.
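The mechanism Vladimir describes, version properties plus `dependencyManagement` in a parent POM that modules inherit, looks roughly like this. This is a simplified sketch, not Spark's actual POM, and the version number is illustrative:

```xml
<!-- Simplified parent-POM sketch of the dependencyManagement pattern.
     The version is illustrative, not Spark's actual pin. -->
<properties>
  <fasterxml.jackson.version>2.10.0</fasterxml.jackson.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>${fasterxml.jackson.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Child modules then declare `jackson-databind` without a version and inherit the pinned one; publishing such a block as a standalone artifact with `pom` packaging is essentially what "publishing a BOM" means.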
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178588#comment-17178588 ] Sean R. Owen commented on SPARK-32385:
---
This requires us fixing every version of every transitive dependency. How does that get updated as the transitive dependency graph changes? This exchanges one problem for another, I think. That is, we are definitely not trying to fix dependency versions except where necessary.

Gradle isn't something that this project supports, but wouldn't this be a much bigger general problem if its resolution rules are different from Maven's? That is, surely Gradle can emulate Maven if necessary. (We have the same issue with SBT, which is why it is not used for builds or publishing artifacts.)
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178581#comment-17178581 ] Vladimir Matveev commented on SPARK-32385:
---
[~srowen] A BOM descriptor can be used as a "platform" in Gradle, and most likely in Maven too (I don't know for sure, but the concept of a BOM originates from Maven, so presumably the tool itself supports it), to enforce compatible version numbers across the dependency graph. Regular POMs cannot do this, because a regular POM forms just a single node in the dependency graph, and most dependency resolution tools take the entire graph into account, which may result in accidentally bumped versions somewhere and requires manual, ad-hoc resolution in most cases. With a BOM, it is sufficient to tell the dependency engine to use that BOM to enforce versions, and that's it: the versions are now fixed to those declared by the framework (Spark in this case). As I said, the Spring framework uses this concept to great success to ensure that applications using Spring always have compatible and tested versions.

Naturally, the `deps/` files are just lists of jar files and cannot be used for dependency resolution. Also note that such a BOM descriptor would allow centralizing the version declarations within the Spark project itself, so it won't be something "on top" to support, at least as far as I understand it.
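For concreteness, the BOM artifact itself would just be a module with `pom` packaging whose entire payload is a `dependencyManagement` section. A hypothetical sketch (the coordinates and versions are illustrative, not an actual published artifact):

```xml
<!-- Hypothetical spark-bom module: packaging "pom", no code,
     just a catalog of pinned versions. Illustrative only. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-bom</artifactId>
  <version>3.1.0</version>
  <packaging>pom</packaging>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.10.0</version>
      </dependency>
      <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>14.0.1</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
```

Consumers would then import this POM (Maven `scope=import`, or Gradle `platform(...)`) rather than depend on it directly.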
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178528#comment-17178528 ] Sean R. Owen commented on SPARK-32385:
--
What does this record that isn't available from a POM and/or the 'deps/' files? I get the problem about dependencies - a total nightmare. But do we want yet another descriptor to manage? It doesn't solve the problem.

> Publish a "bill of materials" (BOM) descriptor for Spark with correct versions of various dependencies
> --
>
> Key: SPARK-32385
> URL: https://issues.apache.org/jira/browse/SPARK-32385
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Vladimir Matveev
> Priority: Major
>
> Spark has a lot of dependencies, many of them very common (e.g. Guava, Jackson). Also, the versions of these dependencies are not updated as frequently as they are released upstream, which is totally understandable and natural, but which also means that Spark often depends on a lower version of a library that is incompatible with a higher, more recent version of the same library. This incompatibility can manifest in different ways, e.g. as classpath errors or runtime check errors (as with Jackson) in certain cases.
>
> Spark does attempt to "fix" the versions of its dependencies by declaring them explicitly in its {{pom.xml}} file. However, this approach, while somewhat workable if the Spark-using project itself uses Maven, breaks down if another build system such as Gradle is used. The reason is that Maven uses an unconventional "nearest first" version conflict resolution strategy, while many other tools like Gradle use the "highest first" strategy, which resolves to the highest version number found anywhere in the dependency graph. This means that other dependencies of the project can pull in a higher version of some dependency that is incompatible with Spark.
>
> One example would be an explicit or a transitive dependency on a higher version of Jackson in the project. Spark itself depends on several modules of Jackson; if only one of them gets a higher version while the others remain on the lower version, this will result in runtime exceptions due to an internal version check in Jackson.
>
> A widely used solution for this kind of version issue is publishing a "bill of materials" descriptor (see [https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html] and [https://docs.gradle.org/current/userguide/platforms.html#sub:bom_import]). This descriptor would contain the versions of all of Spark's dependencies; downstream projects would then be able to use their build system's support for BOMs to enforce the version constraints required for Spark to function correctly.
>
> One example of a successful implementation of the BOM-based approach is Spring: [https://www.baeldung.com/spring-maven-bom#spring-bom]. For the various Spring projects, e.g. Spring Boot, BOM descriptors are published which can be used in downstream projects to fix the versions of Spring components and their dependencies, significantly reducing confusion around proper version numbers.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
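To make the proposal concrete, here is a sketch of what importing such a BOM would look like in a downstream Maven project. The coordinates {{spark-bom_2.12}} are hypothetical - no such artifact is published today; {{scope=import}} with {{type=pom}} is Maven's standard BOM-import mechanism:

{code:xml}
<dependencyManagement>
    <dependencies>
        <!-- Hypothetical coordinates; Spark does not publish this artifact today -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-bom_2.12</artifactId>
            <version>3.1.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
{code}

With the import in place, the project's other dependency declarations could omit versions for any artifact the BOM manages, and Spark's pinned versions would win over transitively pulled ones.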
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167527#comment-17167527 ] Vladimir Matveev commented on SPARK-32385:
--
[~hyukjin.kwon] Almost: those are just lists of the artifacts in the distribution, while BOMs are proper Maven POM descriptors which describe the dependencies in terms of Maven coordinates. This makes BOMs usable directly as input to build systems like Gradle. Still, the general idea is similar, I guess.
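To illustrate the distinction being drawn: a {{dev/deps}} manifest records bare jar file names, one per line (something like {{guava-14.0.1.jar}} - the exact entry format and version vary by release), whereas a BOM records the same artifact as full Maven coordinates that a build tool can consume directly:

{code:xml}
<!-- Illustrative BOM-style entry for the same artifact -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>14.0.1</version>
</dependency>
{code}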
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167504#comment-17167504 ] DB Tsai commented on SPARK-32385:
--
+1 This will be very useful for users who include Spark as a dependency.

[~hyukjin.kwon] From [https://www.baeldung.com/spring-maven-bom], the following is an example of how to write a BOM file:

{code:xml}
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>baeldung</groupId>
    <artifactId>Baeldung-BOM</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>pom</packaging>
    <name>BaelDung-BOM</name>
    <description>parent pom</description>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>test</groupId>
                <artifactId>a</artifactId>
                <version>1.2</version>
            </dependency>
            <dependency>
                <groupId>test</groupId>
                <artifactId>b</artifactId>
                <version>1.0</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>test</groupId>
                <artifactId>c</artifactId>
                <version>1.0</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>
{code}

As we can see, the BOM is a normal POM file with a {{dependencyManagement}} section where we can include all of an artifact's information and versions.
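On the Gradle side, a BOM like the Baeldung-BOM example is consumed with the {{platform()}} dependency modifier, which is Gradle's documented BOM-import mechanism (the coordinates below are those of the illustrative example, not a published artifact):

{code}
dependencies {
    // Import the BOM: its dependencyManagement entries become version constraints
    implementation platform('baeldung:Baeldung-BOM:0.0.1-SNAPSHOT')
    // Versions can now be omitted; they are supplied by the BOM
    implementation 'test:a'
}
{code}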
[ https://issues.apache.org/jira/browse/SPARK-32385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166115#comment-17166115 ] Hyukjin Kwon commented on SPARK-32385:
--
Do you mean something like this?
https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-2.7-hive-1.2
https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-2.7-hive-2.3
https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2-hive-2.3