Re: a hive thrift alternative

Alejandro Abdelnur Wed, 02 May 2012 14:02:48 -0700

Hi Ed,

I've checked this with Carl and got the following:


----
HIVE-2503 doesn't really fix the underlying problems. The example that
I gave in that earlier email of HiveServer reusing the same HiveConf
between disconnects is still valid on trunk (i.e. even with
HIVE-2503). If Ed wants to access Hive from Oozie via an API instead
of through the CLI, then I think his best bet is to run the JDBC
driver in embedded (thick-client) mode.
----

Hope this clarifies the current state of things regarding the Thrift server.
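
For anyone following along, here is a rough sketch of what embedded mode could look like from Java. It assumes the current (HiveServer1-era) driver class org.apache.hadoop.hive.jdbc.HiveDriver and the jdbc:hive:// URL with no host or port, which makes the driver run Hive inside the calling JVM instead of talking to a Thrift server. The class name EmbeddedHiveSketch and the helper firstColumn are made up for illustration, and the Hive and Hadoop jars and configuration are assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of Hive or Oozie.
class EmbeddedHiveSketch {
    // Driver class of the HiveServer1-era JDBC driver (assumed to be on the classpath).
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";
    // No host:port after the scheme: the driver runs Hive in this JVM (embedded mode).
    static final String EMBEDDED_URL = "jdbc:hive://";

    // Runs one statement and returns the first column of every row as a string.
    static List<String> firstColumn(Connection conn, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                rows.add(rs.getString(1));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        try {
            Class.forName(DRIVER);
            try (Connection conn = DriverManager.getConnection(EMBEDDED_URL, "", "")) {
                for (String table : firstColumn(conn, "SHOW TABLES")) {
                    System.out.println(table);
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            // Failures surface as exceptions the caller can handle, unlike a CLI exit code.
            System.err.println("Hive query failed: " + e);
        }
    }
}
```

Because each embedded connection lives in the caller's own JVM, it should not be affected by the HiveConf reuse between sessions described above, at the cost of needing the full Hive client setup wherever the code runs.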

Thx


On Wed, May 2, 2012 at 6:12 AM, Edward Capriolo <[email protected]> wrote:
> https://issues.apache.org/jira/browse/HIVE-2503
>
> I believe what you are describing is fixed in trunk.
>
> On Tuesday, May 1, 2012, Alejandro Abdelnur <[email protected]> wrote:
>> Edward,
>>
>> I agree that hive thrift server would be the ideal approach. However,
>> the problem with the thrift server is that it is not
>> multi-user/multi-job friendly:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hive-dev/201204.mbox/%3CCAJqeMKTDOmDZfNUUW8kSgkivZPkC%2BkH9H5D_RL2YhJGhh4rqNQ%40mail.gmail.com%3E
>>
>> Until Hive addresses this I think we are better off with the CLI approach.
>>
>> Thx
>>
>> On Mon, Apr 30, 2012 at 10:03 AM, Edward Capriolo <[email protected]> wrote:
>>> HaHa. I never rejoined the list after it moved from Yahoo.
>>>
>>> I would not describe hive-thrift as horrible but there is some
>>> unpleasantness.
>>>
>>> Near future:
>>> https://issues.apache.org/jira/browse/HIVE-2935
>>>
>>> In any case I am willing to accept the issues. I run multiple
>>> hive-thrift servers behind ha-proxy
>>>
>>>
>>> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/running_a_hive_thrift_cluster
>>>
>>> This cuts down concurrency type problems. It's hive so not sure how
>>> much concurrency is needed there.
>>>
>>> Our group just decided to part ways with programming over the CLI. Too
>>> much stuff like this:
>>>
>>> hive -S -e "select x,y from $TABLE WHERE $STUFF" | awk whatever
>>> or:
>>> my list=`hadoop dfs -ls /bla`
>>>
>>> That was not unit testable and just really ugly. Even if it fails
>>> 1/1000 times we have try/catch, and we have done stuff that can bring
>>> up the entire stack end to end in an IDE now.
>>>
>>> Layering on top of the CLI is a bad idea in the long run; it's like
>>> expect scripting an ssh session. Not that it was a bad design choice
>>> for oozie at the time but it is certainly not the ideal way to handle
>>> it.
>>
>>
>>
>> --
>> Alejandro
>>



-- 
Alejandro
