Hi! First of all, let me articulate the particular concern that I have for Hive (and it is Hive-specific: Pig, for example, doesn't suffer from the same issue). I apologize for not clearly articulating it earlier.
Basically, with Hive the fundamental problem is that it relies on the presence of symlinks to hbase.jar under /usr/lib/hive/lib in order for the Hive-HBase integration to work. Unlike Pig, even when HBase is installed on the system under a well-known location, the symlinks have to be there in order for Hive to be able to interface with HBase.

Currently these symlinks get installed by the Hive package. With your proposed change they will be installed by the hive-hbase package. This means a change in behaviour on systems where HBase is present, with or without the hbase dependency coming from the Hive package.

Now, historically Bigtop hasn't really paid much attention to being backwards compatible between releases. Hence this is not a -1, but rather food for thought.

Now to comment on your points:

On Thu, Jun 28, 2012 at 10:01 PM, Bruno Mahé <[email protected]> wrote:
> On 06/28/2012 09:51 PM, Roman Shaposhnik wrote:
>> Personally, I think I'm reasonably fine with #3 (after all it kind of
>> combines #1 with an extra package) with the only concern that I remember
>> being potential combinatoric explosion of these helper packages
>> (e.g. hive-hbase, hive-cassandra, hive-hbase-cassandra, etc).
>>
>> Thanks,
>> Roman.
>
> I am not sure to follow why you would have a combinatorial explosion?
> Following your example you would have the following packages:
> * hive which provides the main non-optional features
> * hive-hbase which provides and pulls everything necessary for an
>   integration with hbase. hive-hbase depending on hive
> * hive-cassandra which provides and pulls everything necessary for an
>   integration with cassandra. hive-cassandra depending on hive
>
> So then depending on your needs you could install (hive), (hive and
> hive-hbase), (hive and hive-cassandra) or (hive and hive-hbase and
> hive-cassandra) if you need both.
> You want subpackages as orthogonal as possible rather than per use cases.
I suppose this could work for as long as the interaction between these types of subpackages is indeed orthogonal (as in -- presence or absence of hive-hbase integration doesn't affect hive-cassandra integration). At this point I can't think of a case where it would be a problem, so I'm +0 on the approach.

That said -- what would really make me an enthusiastic +1 is if we could make Hive behave more like Pig when it comes to HBase integration. Let me take a look at the launcher script and see whether there's any low-hanging fruit in there.

Thanks,
Roman.
