[ 
https://issues.apache.org/jira/browse/SPARK-30491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-30491:
---------------------------------
    Description: 
The current dependency audit files under `dev/deps` show only jar names, and there
is no simple rule for parsing a jar name into its separate fields. For example,
`hadoop2` is the classifier in `avro-mapred-1.8.2-hadoop2.jar`, whereas
`incubating` is part of the version (`3.1.0-incubating`) in
`htrace-core-3.1.0-incubating.jar`.
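
To make the ambiguity concrete, here is a minimal sketch (purely illustrative,
not part of this proposal) of how a naive parsing rule goes wrong:

```scala
// Naive rule: treat everything after the last '-' as the classifier.
// This sketch only demonstrates why such a rule fails.
val naiveClassifier: String => String =
  jar => jar.stripSuffix(".jar").split('-').last

naiveClassifier("avro-mapred-1.8.2-hadoop2.jar")
// => "hadoop2"    -- correct: it really is the classifier
naiveClassifier("htrace-core-3.1.0-incubating.jar")
// => "incubating" -- wrong: it is part of the version, 3.1.0-incubating
```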

Thus, I propose to make the dependency audit files spell out each dependency's
artifact id, version, and classifier. For example,
`avro-mapred-1.8.2-hadoop2.jar` would be expanded to
`avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar`, where `avro-mapred`
is the artifact id, `1.8.2` is the version, and `hadoop2` is the classifier.
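
A consumer could then split each entry on `/` to recover the fields. Here is a
hedged sketch; how entries without a classifier are encoded is an assumption,
since the example above only shows the classifier case:

```scala
// Illustrative parser for the proposed "artifactId/version/classifier/jarName"
// entry format.
case class Dep(artifactId: String, version: String,
               classifier: Option[String], jarName: String)

def parseEntry(entry: String): Dep = entry.split('/') match {
  case Array(a, v, c, j) => Dep(a, v, Some(c).filter(_.nonEmpty), j)
  case Array(a, v, j)    => Dep(a, v, None, j) // assumed: classifier segment omitted
  case _                 => sys.error(s"unexpected audit entry: $entry")
}

parseEntry("avro-mapred/1.8.2/hadoop2/avro-mapred-1.8.2-hadoop2.jar")
// => Dep("avro-mapred", "1.8.2", Some("hadoop2"), "avro-mapred-1.8.2-hadoop2.jar")
```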

In this way, the dependency audit files can be consumed by automated tests or
downstream tools. Here is a good example of a downstream tool this would enable:
Say we have a Spark application that depends on a third-party dependency `foo`,
which pulls in `jackson` as a transitive dependency. Unfortunately, `foo`
depends on a different version of `jackson` than Spark does, so in the Spark
application's pom we use the dependency management section to pin the version
of `jackson`. By doing this, we lift `jackson` to a top-level dependency of our
application, and we want a way to keep tracking what Spark uses. What we can do
is cross-check our application's classpath against what Spark uses. Then, with
a test written in our code base, whenever the application bumps its Spark
version, the test compares what we define in the application with what Spark
has and reminds us to update our application's pom if needed. In our case, we
are fine with accessing git directly to get these audit files.
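
As a hedged sketch of that test, assuming the audit file has been fetched from
the Spark git repo into the local build (the file path, artifact filter, and
pinned version below are all illustrative assumptions, not real values):

```scala
import scala.io.Source

// Sketch of a build-time check that compares the jackson version pinned in
// our application's pom with what Spark's audit file declares. All names,
// paths, and versions here are hypothetical.
object JacksonPinCheck {
  val pinnedJacksonVersion = "2.10.0" // assumed: kept in sync with the pom

  def sparkJacksonVersions(auditFilePath: String): Set[String] = {
    val source = Source.fromFile(auditFilePath)
    try
      source.getLines()
        .map(_.split('/'))             // entries: artifactId/version/...
        .collect { case f if f.length >= 2 && f(0).startsWith("jackson") => f(1) }
        .toSet
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local copy of an audit file fetched from the Spark git repo.
    val sparkVersions = sparkJacksonVersions("spark-deps-audit.txt")
    require(sparkVersions == Set(pinnedJacksonVersion),
      s"Spark uses jackson $sparkVersions but the pom pins $pinnedJacksonVersion; " +
      "update the application's dependencyManagement section.")
  }
}
```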

 

  was:
Dependency audit files under `dev/deps` only show jar names. Given that, it is
not trivial to figure out a dependency's classifier.

For example, `avro-mapred-1.8.2-hadoop2.jar` is made up of the artifact id
`avro-mapred`, the version `1.8.2`, and the classifier `hadoop2`. In contrast,
`htrace-core-3.1.0-incubating.jar` is made up of the artifact id `htrace-core`
and the version `3.1.0-incubating`.

All in all, the classifier cannot be deduced from its position in the jar name;
however, as part of a dependency's identifier, it should be clearly
discernible.

 


> Enable dependency audit files to tell dependency classifier
> -----------------------------------------------------------
>
>                 Key: SPARK-30491
>                 URL: https://issues.apache.org/jira/browse/SPARK-30491
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Xinrong Meng
>            Assignee: Xinrong Meng
>            Priority: Major
>             Fix For: 3.0.0
>


