Yes, good point. A user would get whatever the POM says without
profiles enabled, so it matters.
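
For illustration, this is the kind of mechanism I mean: a hypothetical
sketch of a default-activated profile choosing the Hive version. The
profile ids and version numbers here are illustrative, not copied from
Spark's actual POM.

  <!-- Hypothetical sketch: a default-activated profile picks the Hive version. -->
  <!-- Profile ids and versions are illustrative, not taken from Spark's POM. -->
  <profiles>
    <profile>
      <id>hive-2.3</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <properties>
        <hive.version>2.3.6</hive.version>
      </properties>
    </profile>
    <profile>
      <id>hive-1.2</id>
      <properties>
        <hive.version>1.2.1.spark2</hive.version>
      </properties>
    </profile>
  </profiles>

A user who builds without -P gets whatever the default profile says.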

Playing it out, an app _should_ compile with the Spark dependency
marked 'provided'. In that case the app that is spark-submit-ted is
agnostic to the Hive dependency, as the only one that matters is
what's on the cluster. Right? We don't leak the Hive API through the
Spark API. And yes, it's then up to the cluster to provide whatever
version it wants. Vendors will have made a specific version choice
one way or the other when building their distro.
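
Concretely, by 'provided' I mean something like this in the
application's POM; the artifact id and version here are just
illustrative.

  <!-- Sketch of an app declaring Spark as provided: Spark (and its Hive
       dependency) comes from the cluster at runtime, not from the app jar.
       Coordinates and versions are illustrative. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.0</version>
    <scope>provided</scope>
  </dependency>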

If you run a Spark cluster yourself, you're using the binary distro,
and we're already talking about also publishing a binary distro with
this variation, so that's not the issue.

The corner cases where it might matter are:

- I unintentionally package Spark in the app and by default pull in
Hive 2 when I will deploy against Hive 1. But that's user error, and
it causes other problems anyway.
- I run tests locally in my project, which will pull in whatever
default version of Hive the POM defines (that can be overridden; see
the sketch below).
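
By 'overridden' I mean something like the following in the project's
own POM; a hypothetical sketch, and the Hive coordinates and version
here are illustrative rather than copied from Spark's POM.

  <!-- Sketch: pin the Hive version pulled in transitively for local tests,
       regardless of what the published Spark POM defaults to.
       Group/artifact ids and the version are illustrative. -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>2.3.6</version>
      </dependency>
    </dependencies>
  </dependencyManagement>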

Double-checking, is that right? If so, it kind of implies the default
doesn't matter, which is an argument either way about what the default
should be. I too would then prefer defaulting to Hive 2 in the POM. Am
I missing something about the implication?

(That fork will stay published forever anyway; that's not an issue per se.)

On Wed, Nov 20, 2019 at 1:40 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> Sean, our published POM is pointing to and advertising the illegitimate Hive 1.2
> fork as a compile dependency.
> Yes, it can be overridden. So why does Apache Spark need to publish like
> that?
> If someone wants to use that illegitimate Hive 1.2 fork, let them override it.
> We are unable to delete that illegitimate Hive 1.2 fork.
> Those artifacts will be orphans.
>
