Kimahriman opened a new pull request, #735:
URL: https://github.com/apache/incubator-sedona/pull/735

   
   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor 
Rules](https://sedona.apache.org/community/rule/) and [Contributor Development 
Guide](https://sedona.apache.org/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is 
https://issues.apache.org/jira/browse/SEDONA-212. The PR name follows the 
format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   This PR uses the shading problems as an opportunity to rethink how dependencies are managed. It's definitely a WIP meant to start a discussion; I finally got the tests to pass.
   
   The general way things are set up now is "everything is a provided dependency, except for python-adapter, where everything is shaded". This presents a few problems, mentioned in the ticket:
   - It's very difficult to deal with dependency conflicts if one of the 
conflicts is in a shaded jar. You can't simply exclude the transitive dependency
   - It's very difficult to actually use the package. I assume most people default to the python-adapter module so they get everything (assuming the shading doesn't cause them problems). This is very awkward, especially if you aren't using Python. For example, we use only the SQL functions in the `sql` module, but without knowing which sub-dependencies I need, I have to default to the python-adapter module to get the dependencies I need.
   - The python-adapter module has all the dependencies shaded, _plus_ they are still compile-scope dependencies in the resulting pom, so all the dependencies get double-included when using Java-based build tools or the Spark `--packages` option. This is a bug caused by the `resolved-pom-maven-plugin` and the `maven-shade-plugin` not playing nicely together, so the `dependency-reduced-pom` doesn't end up as the actual pom for the module.
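
   To make the first problem concrete, this is the kind of standard exclusion that works against a normal Maven dependency but is useless against classes that have already been shaded into the jar (the coordinates below are illustrative, not the exact ones from this project):

```xml
<!-- Hypothetical consumer pom. Excluding a conflicting transitive
     dependency only works when it is a real Maven dependency; if its
     classes are shaded into the artifact's jar, there is nothing for
     Maven to exclude. Coordinates are placeholders for illustration. -->
<dependency>
  <groupId>org.apache.sedona</groupId>
  <artifactId>sedona-sql-example</artifactId>
  <version>x.y.z</version>
  <exclusions>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```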
   
   Because of all that, I'm proposing a few changes in this PR:
   - Anything that should be a compile dependency is scoped as such, except for geotools, for licensing reasons.
   - Dependencies are defined in a `dependencyManagement` section in the parent 
pom instead of directly defining dependencies for all modules that might only 
be relevant to one. Each module includes just the relevant dependencies it 
needs.
   - Scala plugins are all opt-in, to make it clear that the common module has no stray Scala dependencies, even for testing.
   - python-adapter still includes shading, but in a separate target used for 
the Python tests.
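
   A rough sketch of what the `dependencyManagement` arrangement looks like (group/artifact names and property names here are placeholders for illustration, not the exact contents of this PR):

```xml
<!-- Parent pom: versions are centralized in dependencyManagement,
     but nothing is forced onto modules that don't need it. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.locationtech.jts</groupId>
      <artifactId>jts-core</artifactId>
      <version>${jts.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Child module pom: declares only the dependencies it actually
     uses; the version is inherited from the parent's
     dependencyManagement section. -->
<dependencies>
  <dependency>
    <groupId>org.locationtech.jts</groupId>
    <artifactId>jts-core</artifactId>
  </dependency>
</dependencies>
```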
   
   I did have to work through various exclusions to get the tests to work. I was also able to stop defining the jackson version for testing purposes. 
   
   Questions to answer would be:
   - Is there any reason to actually publish a shaded module? Anything Spark-related can either:
     - Java/Scala: use their Java build tool of choice to include the right 
dependencies in their subsequent jar
     - Python: use the `--packages` feature to automatically pull all required 
transitive dependencies
   - I don't know anything about Flink or whether it has a use case that needs a shaded module
   - What to do with the new edu.ucar module? It looks like it lives in its own private repo, so for users behind a proxy who don't have access to every repo on the internet, this could cause dependency resolution to fail.
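
   As a concrete illustration of the `--packages` path for Python users (the coordinates below are placeholders, not necessarily the published ones):

```shell
# Spark resolves the artifact and all of its compile-scope transitive
# dependencies from the configured repositories at submit time, so no
# shaded jar is needed. Coordinates are illustrative only.
spark-submit \
  --packages org.apache.sedona:sedona-sql-example:x.y.z \
  my_job.py
```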
   
   ## How was this patch tested?
   Existing tests.
   
   ## Did this PR include necessary documentation updates?
   - There may need to be documentation updates if this route is accepted; I would need to look at what the current examples do.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
