Thank you all. I'll try to file a JIRA and open a PR for that.
Bests,
Dongjoon.

On Wed, Nov 20, 2019 at 4:08 PM Cheng Lian <lian.cs....@gmail.com> wrote:

> Sean, thanks for the corner cases you listed. They make a lot of sense.
> I'm now inclined to have Hive 2.3 as the default version.
>
> Dongjoon, apologies if I didn't make it clear before. What made me
> concerned initially was only the following part:
>
> > can we remove the usage of forked `hive` in Apache Spark 3.0 completely
> > officially?
>
> So having Hive 2.3 as the default Hive version and adding a `hive-1.2`
> profile to keep the Hive 1.2.1 fork looks like a feasible approach to me.
> Thanks for starting the discussion!
>
> On Wed, Nov 20, 2019 at 9:46 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Yes, right. That's the situation we are hitting and the result I
>> expected. We need to change our default to Hive 2 in the POM.
>>
>> Dongjoon.
>>
>>
>> On Wed, Nov 20, 2019 at 5:20 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> Yes, good point. A user would get whatever the POM says without
>>> profiles enabled, so it matters.
>>>
>>> Playing it out, an app _should_ compile with the Spark dependency
>>> marked 'provided'. In that case the app that is spark-submitted is
>>> agnostic to the Hive dependency, as the only one that matters is
>>> what's on the cluster. Right? We don't leak the Hive API through the
>>> Spark API. And yes, it's then up to the cluster to provide whatever
>>> version it wants. Vendors will have made a specific version choice
>>> when building their distro one way or the other.
>>>
>>> If you run a Spark cluster yourself, you're using the binary distro,
>>> and we're already talking about also publishing a binary distro with
>>> this variation, so that's not the issue.
>>>
>>> The corner cases where it might matter are:
>>>
>>> - I unintentionally package Spark in the app and by default pull in
>>>   Hive 2 when I will deploy against Hive 1. But that's user error, and
>>>   it causes other problems.
>>> - I run tests locally in my project, which will pull in a default
>>>   version of Hive defined by the POM.
>>>
>>> Double-checking, is that right? If so, it kind of implies it doesn't
>>> matter, which is an argument either way about what the default should
>>> be. I too would then prefer defaulting to Hive 2 in the POM. Am I
>>> missing something about the implication?
>>>
>>> (That fork will stay published forever anyway; that's not an issue per
>>> se.)
>>>
>>> On Wed, Nov 20, 2019 at 1:40 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>> > Sean, our published POM is pointing to and advertising the
>>> > illegitimate Hive 1.2 fork as a compile dependency.
>>> > Yes, it can be overridden. So why does Apache Spark need to publish
>>> > it like that?
>>> > If someone wants to use that illegitimate Hive 1.2 fork, let them
>>> > override it. We are unable to delete those illegitimate Hive 1.2
>>> > fork artifacts.
>>> > Those artifacts will be orphans.
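[Editor's note: for readers following the 'provided' scoping point Sean makes above, here is a minimal sketch of how an application build would declare it. The sbt syntax is standard; the version number and artifact choices are illustrative only, and which Hive version the cluster supplies depends on how that Spark distribution was built.]

    // build.sbt (sketch): mark Spark as 'provided' so the app compiles
    // against Spark's API but defers the actual Spark (and transitive
    // Hive) jars to the cluster at spark-submit time.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"  % "3.0.0" % Provided,
      "org.apache.spark" %% "spark-hive" % "3.0.0" % Provided
    )

With that scope, the Hive version the app runs against is whichever one the cluster's Spark distribution ships, so the POM default mostly matters for local tests and for apps that accidentally bundle Spark, as discussed in the thread.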