[
https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639706#comment-13639706
]
Sushanth Sowmyan commented on HIVE-4305:
----------------------------------------
Many of the comments on this thread compare a pure maven approach with an
ant-based approach, and to be honest, there are good sides and bad to both. A
"pure" maven project that is "done well" is simpler for developers and
better/simpler as a build system. It is worth it for most development projects
to spend the time fixing their build systems, and yes, hive's build is a
complex enough beast to want to simplify it. But that is a huge undertaking,
with no promise of a successful transition: projects that start as Maven
projects have an easier time getting there than projects that don't and then
undergo mavenization.
Ant+Ivy, for the most part, "works" currently. If someone does care about
mavenization enough to work on a patch and contribute it, we can compare the
approaches. Without that, we're arguing from our individual bad experiences
with spaghetti ant builds and inflexible maven builds, and feeding those
experiences into unproductive vitriol (which, btw, also chases away other
people who want to contribute to the discussion). Let's try to cool down a bit
and look at implementable changes for now.
On the question of standardizing on ant or maven as the primary build system,
I'm going to suggest we go with ant for now.
--
Given ant for building, there are still multiple build combinations potentially
at play, and it is those that I hoped we could discuss in this thread:
Ant for building, in conjunction with:
1) ivy for publishing to local ivy cache
2) ivy for publishing to local maven cache
3) maven-ant-tasks for publishing to local maven cache
& using
a) ivy for dependency resolution & retrieving
b) maven-ant-tasks for dependency resolution & retrieving
I) The publishing scenario:
Among the combinations I've described above, HCatalog was using 3-b, and Hive
was using 1-a, plus a bit of 3 as a separate step for publishing to
repositories.
When we were building hcatalog outside of hive and depending on it, we always
had to build hive, then do a maven publish, and then use it from hcat. When
hcat was merged in with hive, this became a problem because hcat's build was
integrated into the middle of hive's build, before we got an opportunity to
publish to the local maven cache. We have temporarily patched around that in a
hacky manner.
In my experience, ivy is more flexible than maven-ant-tasks in terms of
dependency resolution (I'll get to that in the second section): ivy can fetch
from a maven-published cache/repo, but maven-ant-tasks has issues fetching from
an ivy cache. In terms of how third-party projects can consume hive and/or
hcatalog, publishing to a maven repo is the most permissive and flexible way to
go.
Both ivy and maven-ant-tasks can publish to a local maven cache, and can use
the same codepath to publish to a maven repo as well. Ignoring maven-ant-tasks
for now, is there a need to have ivy publish to the ivy cache for the build,
and have a separate task to publish to a maven repo? Couldn't we streamline
this to have ivy just publish to and pull from the local maven cache? This is
not an invasive change to hive, it makes it easier for other projects to depend
on and work with hive, and it streamlines hive's build as well, because
publishing maven artifacts at publish time is no longer a special case. Is
there a good technical reason to avoid this?
I'm okay with using ivy to publish and thus streamline, but I would prefer to
publish to and retrieve from the local maven cache in doing so.
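As a rough sketch of what "ivy publishes to and pulls from the local maven
cache" could look like, assuming the default ~/.m2/repository location. The
resolver name local-m2, the target name, the jar dependency and the build/
paths are made up for illustration and are not taken from hive's build:
{noformat}
<!-- ivysettings.xml fragment: a filesystem resolver laid out like a maven repo -->
<resolvers>
  <filesystem name="local-m2" m2compatible="true">
    <!-- read module metadata from the poms already sitting in the cache -->
    <ivy pattern="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision].pom"/>
    <artifact pattern="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[artifact]-[revision](-[classifier]).[ext]"/>
  </filesystem>
</resolvers>

<!-- build.xml fragment: assumes xmlns:ivy="antlib:org.apache.ivy.ant" on <project> -->
<target name="publish-local-m2" depends="jar">
  <!-- publishivy="false" keeps ivy.xml out of the maven cache; a pom declared
       under <publications> in ivy.xml is published like any other artifact -->
  <ivy:publish resolver="local-m2" pubrevision="${version}"
               overwrite="true" publishivy="false">
    <artifacts pattern="build/[artifact].[ext]"/>
  </ivy:publish>
</target>
{noformat}
With something along these lines, the same resolver could sit at the front of
the resolver chain used for dependency resolution, so a module published
earlier in the build is visible to modules built later.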
--
II) The dependency resolution scenario:
The two tools at hand here are maven-ant-tasks and ivy. At HCatalog, we used to
use ivy, and then we moved to maven-ant-tasks so that we would have a single
pom.xml to serve as a transition point to eventual mavenization, but we hadn't
got there yet at the time we merged with hive. At this point in time, I'm
leaning towards using ivy, and changing HCatalog back to using ivy.
The real problems that we've faced with maven-ant-tasks are with transitive
dependencies and variable definitions. I might simply not know how to resolve
them, so if you can tell me how, we might be able to fix these.
Problem #1: Variable definitions. Currently, hcatalog has a primary pom.xml,
and all its subcomponents declare that pom.xml as their parent. The problem is
that each of them has to explicitly state which version of the parent it
inherits from. So:
In our primary pom.xml, we have:
{noformat}
<groupId>org.apache.hcatalog</groupId>
<artifactId>hcatalog</artifactId>
<version>0.12.0-SNAPSHOT</version>
<properties>
...
<hcatalog.version>${project.version}</hcatalog.version>
<hive.version>${project.version}</hive.version>
</properties>
{noformat}
Here, the properties seem to be read and parsed after the version is set, so
${project.version} is usable inside this pom.xml. The child pom.xml files are a
different matter. Inside hcatalog-pig-adaptor, for example, we have to refer to
the parent pom.xml, but at the time we encounter the child, we either need to
specify its own version and then say that its parent is the same version, or
skip specifying its own version and specify the parent's version so that the
parent pom.xml can be loaded. Either way, I wind up explicitly having a line in
there with "0.12.0-SNAPSHOT".
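To make that concrete, a child pom.xml of the kind described above might look
roughly like the following. This is an illustrative sketch, not a copy of the
actual file; the relativePath value in particular is an assumption about the
directory layout:
{noformat}
<!-- hcatalog-pig-adaptor/pom.xml (illustrative sketch) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hcatalog</groupId>
    <artifactId>hcatalog</artifactId>
    <!-- the hardcoded line: the parent's properties are not available yet,
         because this is what is used to find the parent in the first place -->
    <version>0.12.0-SNAPSHOT</version>
    <relativePath>../pom.xml</relativePath> <!-- assumed layout -->
  </parent>
  <artifactId>hcatalog-pig-adaptor</artifactId>
  <!-- no <version> here, so it is inherited from the parent -->
</project>
{noformat}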
If I were using mvn and not maven-ant-tasks, I would not have this problem, as
I could pass in an external variable ${hive.version} and use it inside. I could
even play around with things like ${env.HIVE_VERSION} if I so pleased. However,
these are not interpolated by maven-ant-tasks, and I don't see a way of
specifying them from within the ant task, which forces me to hardcode these
versions inside the pom.xml, and in several of them, before I build.
Effectively, for me, pom.xml is not a build source file but a generated
artifact of the build process. And I'd argue that in an ant-based build, that's
actually the correct way of going about it.
And if that's the case, ivy:makepom actually does a pretty good job of making a
pom file.
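For illustration, treating the pom as a generated artifact could look something
like this in ant. It is a sketch under assumptions: the build/ paths, the
target name and the "default" conf are hypothetical, and hcatalog.version is
expected to be an ordinary ant property (for example passed with
-Dhcatalog.version=0.12.0-SNAPSHOT):
{noformat}
<!-- build.xml fragment: assumes xmlns:ivy="antlib:org.apache.ivy.ant" on <project> -->
<target name="make-pom">
  <ivy:resolve file="ivy.xml"/>
  <!-- write a copy of ivy.xml with the publication revision filled in -->
  <ivy:deliver deliverpattern="build/ivy.xml"
               pubrevision="${hcatalog.version}"/>
  <!-- turn the delivered ivy file into a pom; the version comes from the
       ant property above, so nothing is hardcoded in a checked-in pom.xml -->
  <ivy:makepom ivyfile="build/ivy.xml"
               pomfile="build/hcatalog-pig-adaptor.pom">
    <mapping conf="default" scope="compile"/>
  </ivy:makepom>
</target>
{noformat}
The version then lives in exactly one place (the ant property), and every
generated pom picks it up from there.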
Problem #2: Transitive dependencies. We had some major issues with the hcatalog
build only recently, where we were bringing in jersey 1.9, which had a
hardcoded dependency on another package in a hardcoded glassfish repo that was
taken down. On our end, we were not able to disable the transitive dependency
on the glassfish repo, and the only thing we could do was bump our jersey
dependency to a version which had removed that repo. With ivy, before moving to
maven-ant-tasks, that was not a problem.
See HCATALOG-601 for details on this issue.
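As a hedged sketch of the kind of control ivy gives here: an ivy.xml dependency
can switch off transitive resolution entirely, or exclude one module. The
jersey artifact names below are only examples, and some.org / some-module are
placeholders, not the actual package involved in HCATALOG-601:
{noformat}
<!-- ivy.xml fragment (illustrative) -->
<dependencies>
  <!-- Option 1: take the artifact itself and none of its transitive deps -->
  <dependency org="com.sun.jersey" name="jersey-server" rev="1.9"
              transitive="false"/>
  <!-- Option 2: keep transitive resolution, but drop one offending module -->
  <dependency org="com.sun.jersey" name="jersey-json" rev="1.9">
    <exclude org="some.org" module="some-module"/>
  </dependency>
</dependencies>
{noformat}
As far as I understand it, ivy also resolves only against the resolvers listed
in ivysettings.xml, so a repository named inside a third-party pom does not get
contacted unless we add it ourselves.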
These two problems in particular, along with not wanting to be too invasive in
hive's current build, make me prefer ivy over maven-ant-tasks for dependency
resolution itself.
> Use a single system for dependency resolution
> ---------------------------------------------
>
> Key: HIVE-4305
> URL: https://issues.apache.org/jira/browse/HIVE-4305
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure, HCatalog
> Reporter: Travis Crawford
> Assignee: Carl Steinbach
>
> Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy
> for dependency resolution while HCatalog uses maven-ant-tasks. With the
> project merge we should converge on a single tool for dependency resolution.