Thanks, Edward.

I'm a big +1 on mavenizing Hive. Hive long ago reached the point where its
build is hard to manage with ant. I'd like to help with this too.

Thanks,
Xuefu


On Fri, Aug 16, 2013 at 7:31 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> For those interested in pitching in.
> https://github.com/edwardcapriolo/hive
>
>
>
> On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> > Summary from hive-irc channel. Minor edits for spell check/grammar.
> >
> > The last 10 lines are a summary of the key points.
> >
> > [10:59:17] <ecapriolo> noland: et al. Do you want to talk about hive in
> > maven?
> > [11:01:06] smonchi [~
> > ro...@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit IRC:
> > Quit: ... 'cause there is no patch for human stupidity ...
> > [11:10:04] <noland> ecapriolo: yeah that sounds good to me!
> > [11:10:22] <noland> I saw you created the jira but haven't had time to
> > look
> > [11:10:32] <ecapriolo> So I found a few things
> > [11:10:49] <ecapriolo> In common there are one or two tests that
> > actually fork a process :)
> > [11:10:56] <ecapriolo> and use build.test.resources
> > [11:11:12] <ecapriolo> Some serde, uses some methods from ql in testing
> > [11:11:27] <ecapriolo> and shims really needs a separate hadoop test shim
> > [11:11:32] <ecapriolo> But that is all simple stuff
> > [11:11:47] <ecapriolo> The biggest problem is I do not know how to solve
> > shims with maven
> > [11:11:50] <ecapriolo> do you have any ideas
> > [11:11:52] <ecapriolo> ?
> > [11:13:00] <noland> That one is going to be a challenge. It might be that
> > in that section we have to drop down to ant
> > [11:14:44] <noland> Is it a requirement that we build both the .20 and
> > .23 shims for a "package" as we do today?
> > [11:16:46] <ecapriolo> I was thinking we can do it like a JDBC driver
> > [11:16:59] <ecapriolo> We separate out the interface of shims
> > [11:17:22] <ecapriolo> And then at runtime we drop in a driver
> > implementing
> > [11:17:34] Wertax [~wer...@wolfkamp.xs4all.nl] has quit IRC: Remote host
> > closed the connection
> > [11:17:36] <ecapriolo> That or we could use maven's profile system
> > [11:18:09] <ecapriolo> It seems that everything else can actually link
> > against hadoop-0.20.2 as a provided dependency
> > [11:18:37] <noland> Yeah either would work. The driver method would
> > probably require us to use ant to build both the drivers?
> > [11:18:44] <noland> I am a fan of mvn profiles
> > [11:19:05] <ecapriolo> I was thinking we kinda separate the shim out into
> > its own project, not a module
> > [11:19:10] <ecapriolo> to achieve that jdbc thing
> > [11:19:27] <ecapriolo> But I do not have a solution yet, I was looking to
> > farm that out to someone smart...like you :)
> > [11:19:33] <noland> :)
> > [11:19:47] <ecapriolo> All I know is that we need a test shim because
> > HadoopShim requires hadoop-test jars
> > [11:20:10] <ecapriolo> then the Mini stuff is only used in qtest anyway
> > [11:20:48] <ecapriolo> Is this something you want to help with? I was
> > thinking of spinning up a github
> > [11:20:50] <noland> I think that the separate projects would work and
> > perhaps nicely.
> > [11:21:01] <noland> Yeah I'd be interested in helping!
> > [11:21:17] <noland> But I am going on vacation starting next week for
> > about 10 days
> > [11:21:27] <ecapriolo> Ah cool where are you going?
> > [11:21:37] <noland> Netherlands
> > [11:21:42] <noland> Biking around and such
> > [11:23:52] <noland> The one thing I was thinking about with regards to a
> > branch is keeping history. We'll want to keep history for the files but
> > AFAICT svn doesn't understand git mv.
> > [11:24:16] Wertax [~wer...@wolfkamp.xs4all.nl] has joined #hive
> > [11:31:19] jeromatron [~text...@host90-152-1-162.ipv4.regusnet.com] has
> > quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
> > [11:35:49] <ecapriolo> noland: Right I do not plan to suggest that we
> > will do this in git
> > [11:36:11] <ecapriolo> I just see that we are going to have to hack stuff
> > up and it is not the type of work that lends itself well to branches.
> > [11:36:17] <noland> Ahh ok
> > [11:36:56] <ecapriolo> Once we come up with a solution for the shims, and
> > we have something that can reasonably build and test hive we can figure
> > out how to apply that to a branch/trunk
> > [11:36:58] <noland> yeah so just do a POC on github and then implement on
> > svn
> > [11:37:05] <noland> cool
> > [11:37:29] <ecapriolo> Along the way we can probably find things that we
> > can do like that common test I found and other minor things
> > [11:37:41] <noland> sounds good
> > [11:37:50] <ecapriolo> Those we can likely just commit into the current
> > trunk and I will file issues for those now
> > [11:37:58] <noland> cool
> > [11:38:41] <ecapriolo> But yea man. I just can't take the project as it is
> > now
> > [11:38:51] <ecapriolo> in eclipse every time I touch a file it rebuilds
> > everything!
> > [11:38:53] <ecapriolo> It's like WTF
> > [11:39:09] <ecapriolo> Running one test takes like 3 minutes
> > [11:39:12] <ecapriolo> it's out of control
> > [11:39:23] <noland> LOL
> > [11:39:29] <noland> I agree 110%
> > [11:39:32] <ecapriolo> eclipse was not always like that I am not sure how
> > the hell it happened
> > [11:39:51] <noland> The eclipse sep thing is so harmful
> > [11:40:08] <noland> dep thing that is
> > [11:40:12] <ecapriolo> I mean command line ant was always bad, but you
> > used to be able to work in eclipse without having to rebuild everything
> > every change/test
> > [11:40:39] <noland> Yeah the first thing I do these days is disable the
> > ant builder
> > [11:40:52] <ecapriolo> Ow... I did not really know that was a thing
> > [11:40:55] <noland> it starts compiling while you are still working and
> > blocks for minutes
> > [11:41:02] <ecapriolo> Right that is what I mean
> > [11:41:11] <ecapriolo> Everyone has like 10 hacks to work on the project
> > [11:41:14] <noland> yeah you can remove it in project…one sec
> > [11:41:17] <ecapriolo> perm gen
> > [11:41:20] <ecapriolo> ant builder
> > [11:41:32] <noland> project -> properties -> builders
> > [11:41:34] <ecapriolo> hive does not build offline anymore
> > [11:41:37] <noland> yeah
> > [11:41:47] <ecapriolo> I'm not sure when this stuff went bad, but it has
> > gotten really really bad
> > [11:42:09] <ecapriolo> Also what I plan on doing is stripping out
> > non-essentials
> > [11:42:25] <ecapriolo> like serde has all this thrift and avro stuff to
> > support custom formats
> > [11:42:30] <ecapriolo> that is going into its own module
> > [11:42:43] <ecapriolo> Going to rip out all the udfs except between and
> > or.
> > [11:43:50] <noland> yeah it'd be nice to have those items in their own
> > modules so you can just build/test them when you want
> > [11:44:12] <ecapriolo> hbase zookeeper locking
> > [11:44:31] Wertax [~wer...@wolfkamp.xs4all.nl] has quit IRC: Remote host
> > closed the connection
> > [11:44:44] <noland> yeah for sure
> > [11:45:04] <noland> I think the default for testing should be the in
> > process locking
> > [11:45:10] <ecapriolo> Absolutely.
> > [11:45:40] <ecapriolo> The other issue I want to tackle is hive-exec.jar
> > [11:45:54] <ecapriolo> I want to jar-jar all the dependencies.
> > [11:46:46] <ecapriolo> I run into too many conflicts with log4j and guava,
> > and commons-utils. All those things need to be packaged into
> > non-conflicting packages
> > [11:46:58] <noland> I haven't looked at how we build that yet but I agree
> > it'd be nice if we could jar-jar things like guava
> > [11:47:12] <noland> so we can actually use them on server side
> > [11:47:16] <ecapriolo> We don't really need guava. It's probably just used
> > for one tiny thing
> > [11:47:43] <ecapriolo> People are forgetting/do not understand that
> > hive-exec needs to get sent via the distributed cache
> > [11:47:57] <noland> When we implement range joins they have a RangeMap
> > that we'll need.
> > [11:47:57] <ecapriolo> so making it hulkingly fat just slows everything
> > down
> > [11:48:11] <noland> Do we ship it every time?
> > [11:48:25] <noland> Cause we only have to ship it once per version of the
> > jar.
> > [11:48:42] <ecapriolo> Recently you need the jackson jars on the auxlib
> > as well
> > [11:48:46] <ecapriolo> hive will not work without it
> > [11:49:11] <ecapriolo> People are just focused
> > feature-feature-feature...bigger...bigger bigger
> > [11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has quit
> > IRC: Quit: Leaving
> > [11:49:27] <noland> yeah maven modules will definitely help us understand
> > who depends on what.
> > [11:49:28] <ecapriolo> Next up kryo
> > [11:49:51] <noland> I agree there is a lot of tech debt that needs paying
> > [11:50:30] <ecapriolo> So those are all the high level things I want to
> > tackle
> > [11:50:59] <ecapriolo> shims, general cleanup, break out non-essential
> > code, build a better non conflicting hive-exec jar
> > [11:51:10] <noland> That sounds good. Once we hack on github for a while
> > it'd be nice to develop a brief high level plan on how to implement
> > [11:51:26] <ecapriolo> Also get maven artifacts with correct dependency
> > scopes like provided etc
> > [11:51:40] <ecapriolo> Right now pulling a hive jar from maven is like
> > pulling in the world
> > [11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive
> >
> >
> > On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >
> >> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
> >> am growing tired of how long hive's build takes.
> >>
> >> I have started playing with this by creating a simple multi-module
> >> project and copying stuff as I go. I have ported a minimal shims and
> >> common and I have all the tests in common almost running.
> >>
> >> Q. This is going to be ugly hacky work for a while, I was thinking it
> >> should be a branch but it is just going to be a mess of moves and copies
> >> etc. Not really something you can diff etc.
> >>
> >> Is anyone else interested in working on this as well? If so I think we
> >> can just set up a github and I can arrange for anyone to have access to
> >> it.
> >>
> >> Thanks,
> >> Edward
> >>
> >>
> >> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >>
> >>> "Some of the hard part was that some of the test classes are in the
> >>> wrong module that references classes in a later module."
> >>>
> >>> I think the modules will have to be able to reference each other in
> >>> many cases. Serde and QL are tightly coupled. QL is really too large
> >>> and we should find a way to cut that up.
> >>>
> >>> Part of this problem is the q.tests
> >>>
> >>> I think one way to handle this is to only allow unit tests inside the
> >>> module. I imagine running all the q tests would be done in a final
> >>> module hive-qtest. Or possibly two final modules
> >>> hive-qtest
> >>> hive-qtest-extra (tangential things like UDFS and input formats not
> >>> core to hive)
> >>>
> >>>
> >>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley <omal...@apache.org> wrote:
> >>>
> >>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swar...@gmail.com <
> >>>> kulkarni.swar...@gmail.com> wrote:
> >>>>
> >>>> > > I'd like to propose we move towards Maven.
> >>>> >
> >>>> > Big +1 on this. Most of the major apache projects (hadoop, hbase,
> >>>> > avro etc.) are maven based.
> >>>> >
> >>>>
> >>>> A big +1 from me too. I actually took a pass at it a couple of months
> >>>> ago. Some of the hard part was that some of the test classes are in the
> >>>> wrong module that references classes in a later module. Obviously that
> >>>> prevents any kind of modular build.
> >>>>
> >>>> An additional plus of Maven is that it includes tools to correct
> >>>> the project and module dependencies.
> >>>>
> >>>> -- Owen
> >>>>
> >>>
> >>>
> >>
> >
>
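The "like a JDBC driver" idea from the IRC log (separate out the shim interface, then drop in an implementation at runtime) might look roughly like the sketch below. This is only an illustration under stated assumptions: `HadoopShim`, `Shim20`, `Shim23` and `ShimRegistry` are hypothetical names made up here, not existing Hive classes, and in a real build each implementation would presumably live in its own project and be compiled against its own Hadoop version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are taken from the Hive code base.
class ShimSketch {

    /** The stable interface the rest of the build would compile against. */
    interface HadoopShim {
        /** Hadoop release line this shim targets, e.g. "0.20". */
        String hadoopLine();
    }

    /** Stand-ins for shims that would be built in separate projects. */
    static final class Shim20 implements HadoopShim {
        @Override public String hadoopLine() { return "0.20"; }
    }

    static final class Shim23 implements HadoopShim {
        @Override public String hadoopLine() { return "0.23"; }
    }

    /** Registry in the spirit of java.sql.DriverManager. */
    static final class ShimRegistry {
        private static final Map<String, HadoopShim> SHIMS = new LinkedHashMap<>();

        /** Called once per shim implementation found at runtime. */
        static synchronized void register(HadoopShim shim) {
            SHIMS.put(shim.hadoopLine(), shim);
        }

        /** Picks the shim whose release line matches the running Hadoop version. */
        static synchronized HadoopShim forVersion(String hadoopVersion) {
            for (Map.Entry<String, HadoopShim> e : SHIMS.entrySet()) {
                String line = e.getKey();
                if (hadoopVersion.equals(line) || hadoopVersion.startsWith(line + ".")) {
                    return e.getValue();
                }
            }
            throw new IllegalArgumentException("No shim registered for Hadoop " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        ShimRegistry.register(new Shim20());
        ShimRegistry.register(new Shim23());
        // Callers only ever see the interface, never a concrete shim class.
        System.out.println(ShimRegistry.forVersion("0.20.2").hadoopLine());
    }
}
```

With this shape, only the interface and registry would need to be on the compile classpath of the other modules. Whether the implementations are discovered by explicit registration, as here, or by some other mechanism is left open.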
