Hi all,

For the database import tool I'm writing (Sqoop; HADOOP-5815), in addition to uploading data into HDFS and using MapReduce to load/transform the data, I'd like to integrate more closely with Hive. Specifically, I want to run the CREATE TABLE statements needed to automatically inject table definitions into Hive's metastore for the data files that sqoop loads into HDFS. Doing this requires linking against Hive in some way: either directly, by using one of their API libraries, or "loosely," by piping commands into a Hive instance.
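To make the "loose" route concrete, here's a very rough sketch of the kind of thing I have in mind. The class and method names below are made up for illustration (not actual Sqoop or Hive APIs), and it assumes the column layout is already known and that a 'hive' launcher can be found via HIVE_HOME or the PATH:

  // Rough illustration only; names are hypothetical, not real Sqoop code.
  import java.io.BufferedReader;
  import java.io.File;
  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.InputStreamReader;

  public class HiveImportSketch {

    /** Locate the hive launcher script via HIVE_HOME, falling back to the PATH. */
    static String hiveBinary() {
      String hiveHome = System.getenv("HIVE_HOME");
      if (hiveHome != null && hiveHome.length() > 0) {
        return new File(new File(hiveHome, "bin"), "hive").getAbsolutePath();
      }
      return "hive";
    }

    /** Build a CREATE TABLE statement describing data sqoop loaded into HDFS. */
    static String createTableDdl(String table, String hdfsPath) {
      return "CREATE TABLE " + table + " (id INT, msg STRING) "
          + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001' "
          + "STORED AS TEXTFILE LOCATION '" + hdfsPath + "';";
    }

    /** "Loose" integration: write the DDL to a script and feed it to the hive CLI. */
    static void runDdlViaCli(String ddl) throws IOException, InterruptedException {
      File script = File.createTempFile("sqoop-hive-", ".q");
      FileWriter w = new FileWriter(script);
      w.write(ddl);
      w.close();

      Process p = new ProcessBuilder(hiveBinary(), "-f", script.getAbsolutePath())
          .redirectErrorStream(true).start();
      // Drain the child's output so it can't block on a full pipe buffer.
      BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
      String line;
      while ((line = r.readLine()) != null) {
        System.out.println(line);
      }
      int status = p.waitFor();
      if (status != 0) {
        throw new IOException("hive exited with status " + status);
      }
    }
  }

The direct-API route would presumably replace runDdlViaCli() with calls into Hive's own driver/metastore classes, which is exactly where the linking question comes in.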
In either case, there's a dependency there. I was hoping someone on this list with more Ivy experience than I have knows the best way to make this happen. Hive isn't in the maven2 repository that Hadoop pulls most of its dependencies from, so it might be necessary for sqoop to have access to a full build of Hive. Checking that binary distribution into Hadoop svn doesn't seem like a good idea, but I'm not sure what the most expedient alternative is. Is it acceptable to simply require that developers who wish to compile/test/run sqoop have a separate standalone Hive deployment and a proper HIVE_HOME variable? This would keep our source repo "clean." The downside is that it makes it difficult to test Hive-specific integration functionality with Hudson, and it requires extra legwork from developers.

Thanks,
- Aaron Kimball
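P.S. On the Hudson point: if we do go the HIVE_HOME route, one way to keep the build green on machines without Hive would be for the Hive-specific tests to skip themselves when HIVE_HOME isn't set. Again, just a sketch (JUnit 4 style; the test class and method names are made up):

  import static org.junit.Assume.assumeTrue;

  import java.io.File;

  import org.junit.Test;

  public class HiveImportTest {

    @Test
    public void testCreateTableDdl() throws Exception {
      // Skip (rather than fail) on machines without a Hive deployment.
      String hiveHome = System.getenv("HIVE_HOME");
      assumeTrue(hiveHome != null && new File(hiveHome, "bin/hive").exists());

      // ... run the generated CREATE TABLE against the local Hive here ...
    }
  }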