On Fri, Jan 14, 2022 at 9:34 AM Daniel Collins <dpcoll...@google.com> wrote:

> > In particular the Hadoop/Spark and Kafka dependencies must be
> > **provided** as they were. I am not sure of others but those three matter.
>
> I think there's a bit of a difference here between what should be the
> state in the short term versus the long term.
>
> In the short term, I agree that we should avoid changes to how these
> dependencies are reflected in the POM.
>
> In the long term, I don't think it makes sense for these to continue to be
> "provided" dependencies: if users wish to use a different version of
> Hadoop, Spark or Kafka, they can explicitly override the dependencies with
> the version they want when building their JAR, even if there is a version
> listed as "compile" in the POM file on maven central. The only difference
> is that if they don't have a version preference, the one listed in the POM
> (that we tested with) will be used, which seems like an unambiguous win to
> me.
>

Agree with the sentiment. But I believe the issue is that some tooling will
bundle up the "compile" dependencies and submit with the job, which will
then have a conflict with the libraries on the cluster. On the other hand,
the user will always want to override the "provided" version to match the
cluster, in which case it will just be harmless duplicates on the
classpath, no? I guess huge file size, but it isn't the 90s any more. Since
Ismaël commented, maybe he can help to clarify. I also knew about this
reasoning for Spark & Hadoop but I don't know exactly what is required to
make it work right.
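
For illustration only, here is a minimal sketch of how a user might pin the Kafka client to the version their cluster runs, in their own build.gradle. The Beam and Kafka versions are assumptions chosen for the example, not versions recommended in this thread. The dependency is declared explicitly because a "provided" entry in a published POM is not pulled in transitively.

```groovy
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    // Assumed Beam version, for illustration only.
    implementation 'org.apache.beam:beam-sdks-java-io-kafka:2.35.0'
    // Declare the client explicitly; the version is a hypothetical
    // stand-in for whatever the target cluster actually runs.
    implementation 'org.apache.kafka:kafka-clients:2.8.1'
}

configurations.all {
    resolutionStrategy {
        // Gradle normally resolves conflicts to the highest version;
        // force pins the chosen version even if a newer one is requested.
        force 'org.apache.kafka:kafka-clients:2.8.1'
    }
}
```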

This could become a bothersome issue long term - the Gradle dev community has
lots of posts that indicate they don't agree with the existence of
"provided" or "optional" dependencies. (I happen to agree with them, but
philosophy is not the point). We should have a very clear solution for the
cases that require one, and document it at least on the wiki.
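
As one possible shape for such a solution (a sketch only, not something settled in this thread), a "provided"-like configuration can be approximated in plain Gradle by making the dependency visible at compile time and in tests but not in the shipped runtime. The configuration name "provided" and the Hadoop version below are assumptions for the example. Whether the generated POM then lists the dependency with "provided" scope depends on the POM-generation logic, which is not shown here.

```groovy
plugins {
    id 'java-library'
}

repositories {
    mavenCentral()
}

configurations {
    // Hypothetical custom configuration standing in for the old
    // propdeps "provided".
    provided
    // Visible when compiling main sources, absent from the runtime classpath.
    compileOnly.extendsFrom(provided)
    // Visible when compiling and running tests.
    testImplementation.extendsFrom(provided)
}

dependencies {
    // Assumed version, for illustration only.
    provided 'org.apache.hadoop:hadoop-common:3.3.1'
}
```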

Kenn


>
> -Daniel
>
> On Thu, Jan 13, 2022 at 4:19 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> Optional dependencies should not be a major issue.
>>
>> What matters to validate that we are not breaking users is to compare
>> the generated POM files with the previous (pre gradle 7 / 2.35.0)
>> version and see that what was provided is still provided.
>>
>> In particular the Hadoop/Spark and Kafka dependencies must be
>> **provided** as they were. I am not sure of others but those three
>> matter.
>>
>> Ismaël
>>
>> On Wed, Jan 12, 2022 at 10:55 PM Emily Ye <emil...@google.com> wrote:
>> >
>> > We've chatted offline and have a tentative plan for what to do with
>> these dependencies that are currently marked as compileOnly (instead of
>> provided). Please review the list if possible [1].
>> >
>> > Two projects we aren't sure about:
>> >
>> > :sdks:java:io:hcatalog
>> >
>> > library.java.jackson_annotations
>> > library.java.jackson_core
>> > library.java.jackson_databind
>> > library.java.hadoop_common
>> > org.apache.hive:hive-exec
>> > org.apache.hive.hcatalog:hive-hcatalog-core
>> >
>> > :sdks:java:io:parquet
>> >
>> > library.java.hadoop_client
>> >
>> >
>> > Does anyone have experience with either of these IOs? ccing Chamikara
>> >
>> > Thank you,
>> > Emily
>> >
>> >
>> > [1]
>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>> >
>> > On Tue, Jan 11, 2022 at 6:38 PM Emily Ye <emil...@google.com> wrote:
>> >>
>> >> As the person volunteering to do fixes for this to unblock Beam
>> 2.36.0, I created a spreadsheet of the projects with dependencies changed
>> from provided to compile only [1]. I pre-filled with what I think things
>> should be, but I don't have very much background in java/maven/gradle
>> configurations so please give input!
>> >>
>> >> Some (mainly hadoop/kafka) I left blank, since I'm not sure - do we
>> keep them provided because it depends on the user's version?
>> >>
>> >> [1]
>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>> >>
>> >> On Tue, Jan 11, 2022 at 1:17 PM Luke Cwik <lc...@google.com> wrote:
>> >>>
>> >>> I'm not following what you're trying to say, Kenn, since provided in
>> maven requires the user to explicitly add the dependency themselves to have
>> it as part of their runtime.
>> >>>
>> >>> As per
>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#dependency-scope
>> >>> "
>> >>> * provided
>> >>> This is much like compile, but indicates you expect the JDK or a
>> container to provide the dependency at runtime. For example, when building
>> a web application for the Java Enterprise Edition, you would set the
>> dependency on the Servlet API and related Java EE APIs to scope provided
>> because the web container provides those classes. A dependency with this
>> scope is added to the classpath used for compilation and test, but not the
>> runtime classpath. It is not transitive."
>> >>>
>> >>> On Tue, Jan 11, 2022 at 11:54 AM Kenneth Knowles <k...@apache.org>
>> wrote:
>> >>>>
>> >>>> To clarify: "provided" should have been in the test runtime
>> configuration, but not in the shipped runtime configuration (otherwise dep
>> resolution for users would pull in provided deps, which should not happen)
>> >>>>
>> >>>> On Thu, Dec 30, 2021 at 10:05 AM Luke Cwik <lc...@google.com> wrote:
>> >>>>>
>> >>>>> During the migration to Gradle 7[1] the propdeps plugin was
>> removed[2] since there wasn't a newer version that was compatible with
>> Gradle 7 and a replacement couldn't be found. All existing usages of
>> "provided" were moved to "compileOnly" and "compileOnly" is being mapped to
>> the "provided" maven scope in the generated pom files. This has led to two
>> issues:
>> >>>>> 1) provided was also part of the runtime configuration, so we are
>> getting a few class not found exceptions when running tests [3]
>> >>>>> 2) the generated pom.xml files will list a bunch of compile-time-only
>> annotation libraries as provided dependencies[4]
>> >>>>>
>> >>>>> #1 can be fixed by adding the dependency to both the "compileOnly"
>> and "runtimeOnly" configurations or by adding the dependency to the
>> "implementation" configuration
>> >>>>> #2 will make the pom files messier which can lead to confusion for
>> users but shouldn't impact existing uses.
>> >>>>>
>> >>>>> There was a suggestion[5] to completely remove the usage of
>> provided from the generated pom.xml and have all our previously "provided"
>> dependencies declared as "implementation" allowing us to solve both #1 and
>> #2 above.
>> >>>>>
>> >>>>> The largest usage of "provided" in the past was for packages related
>> to the hadoop ecosystem, followed by packages such as
>> junit/hamcrest/aircompressor in sdks/java/core which aren't required to use
>> the module but can provide additional features if the dependency exists.
>> >>>>>
>> >>>>> What should we migrate if anything to the "implementation"
>> configuration or should we try to recreate what we were doing with the
>> "provided" configuration in the past?
>> >>>>>
>> >>>>> 1: https://issues.apache.org/jira/browse/BEAM-13430
>> >>>>> 2: https://github.com/apache/beam/pull/16308
>> >>>>> 3: https://issues.apache.org/jira/browse/BEAM-13569
>> >>>>> 4:
>> https://github.com/apache/beam/blob/fe456b79419d1a67ebf13d7d4b6695fa1aa6204d/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L964
>> >>>>> 5: https://issues.apache.org/jira/browse/BEAM-13504
>> >>>>>
>>
>
