[ https://issues.apache.org/jira/browse/SPARK-30491 ]
Xinrong Meng updated SPARK-30491:
---------------------------------

Description:

Current dependency audit files under `dev/deps` only show jar names, and there is no simple rule for parsing a jar name into its individual fields. For example, `hadoop2` is the classifier in `avro-mapred-1.8.2-hadoop2.jar`, whereas `incubating` is part of the version in `htrace-core-3.1.0-incubating.jar`.

Thus, I propose to make the dependency audit files state the artifact id, version, and classifier of each dependency. For example, `avro-mapred-1.8.2-hadoop2.jar` would be expanded to `avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar`, where `avro-mapred` is the artifact id, `1.8.2` is the version, and `hadoop2` is the classifier. In this way, the dependency audit files can be consumed by automated tests or downstream tools.

Here is a good example of a downstream tool this would enable. Say we have a Spark application that depends on a third-party dependency `foo`, which pulls in `jackson` as a transitive dependency. Unfortunately, `foo` depends on a different version of `jackson` than Spark does, so in the pom of the Spark application we use the dependency management section to pin the version of `jackson`. By doing this, we lift `jackson` to a top-level dependency of our application, and we want a way to keep tracking what Spark uses. What we can do is cross-check our application's classpath against what Spark uses. Then, with a test written in our code base, whenever the application bumps its Spark version, the test compares what we define in the application with what Spark ships, and reminds us to change our application's pom if needed. In our case, we are fine with accessing git directly to get these audit files.

was:

Dependency audit files under `dev/deps` only show jar names. Given that, it is not trivial to figure out the dependency classifiers. For example, `avro-mapred-1.8.2-hadoop2.jar` is made up of artifact id `avro-mapred`, version `1.8.2`, and classifier `hadoop2`. In contrast, `htrace-core-3.1.0-incubating.jar` is made up of artifact id `htrace-core` and version `3.1.0-incubating`. All in all, the classifier cannot be inferred from its position in the jar name; however, as part of the identifier of a dependency, it should be clearly distinguishable.

> Enable dependency audit files to tell dependency classifier
> -----------------------------------------------------------
>
> Key: SPARK-30491
> URL: https://issues.apache.org/jira/browse/SPARK-30491
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.0.0
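To make the proposed layout concrete, here is a minimal sketch in Scala of how a consumer could parse an expanded audit entry back into its coordinate fields. It assumes that entries without a classifier simply omit that path segment; the type and function names are illustrative, not part of the proposal.

```scala
// Sketch, assuming the expanded dev/deps layout proposed above:
// <artifactId>/<version>/<classifier>/<jarName>, with the classifier
// segment omitted when a dependency has none.
final case class DepCoordinate(artifactId: String, version: String, classifier: Option[String])

def parseAuditEntry(entry: String): Option[DepCoordinate] =
  entry.split("/") match {
    case Array(artifact, version, classifier, _) => // artifact/version/classifier/jar
      Some(DepCoordinate(artifact, version, Some(classifier)))
    case Array(artifact, version, _) =>             // artifact/version/jar (no classifier)
      Some(DepCoordinate(artifact, version, None))
    case _ => None
  }

// The ambiguity in the raw jar names disappears once the path encodes the fields:
assert(parseAuditEntry("avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar")
  .contains(DepCoordinate("avro-mapred", "1.8.2", Some("hadoop2"))))
assert(parseAuditEntry("htrace-core/3.1.0-incubating/htrace-core-3.1.0-incubating.jar")
  .contains(DepCoordinate("htrace-core", "3.1.0-incubating", None)))
```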
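And here is a sketch of the downstream cross-check test described in the example above. The audit-file path and the pinned `jackson` version are illustrative assumptions, not real project values; an actual test would fetch the audit file matching the exact Spark version the application builds against.

```scala
import scala.io.Source

// Sketch of the cross-check between an application's pinned versions and
// Spark's expanded dependency audit file. Both the file path and the pins
// below are illustrative assumptions.
object DependencyCrossCheck {
  // Versions pinned in the application pom's <dependencyManagement> section.
  val pinnedVersions: Map[String, String] = Map(
    "jackson-databind" -> "2.10.0" // illustrative pin
  )

  // Read "artifactId -> version" pairs from an expanded audit file, where each
  // line has the form <artifactId>/<version>[/<classifier>]/<jarName>.
  def sparkVersions(auditFile: String): Map[String, String] =
    Source.fromFile(auditFile).getLines()
      .flatMap { line =>
        line.split("/") match {
          case parts if parts.length >= 3 => Some(parts(0) -> parts(1))
          case _ => None
        }
      }.toMap

  def main(args: Array[String]): Unit = {
    val spark = sparkVersions("dev/deps/spark-deps-hadoop-2.7") // assumed path
    for ((artifact, pinned) <- pinnedVersions; sparkVer <- spark.get(artifact))
      if (pinned != sparkVer)
        println(s"WARNING: $artifact pinned at $pinned but Spark ships $sparkVer; " +
          "update the application pom if needed.")
  }
}
```

Run on every Spark version bump, a test like this turns a silent classpath drift into an explicit reminder to revisit the pinned versions.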