[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

Gopal V (JIRA) Mon, 13 Feb 2017 13:18:11 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864448#comment-15864448
 ]


Gopal V commented on HIVE-10511:
--------------------------------

[~gss2002]: The big issue with HS2 was primarily with MapredLocalTask (if 
you're going by the Cloudera docs), which effectively burns down the CPU on the 
HS2 box and both uploads data to and downloads data from HDFS to run joins.

Currently, I'm doing ~100 concurrent queries per 16 GB HS2 with LLAP (so, 
roughly 250-500 sessions per box on Tableau). Some part of it still needs to 
improve, particularly when moving >10k rows per query.

bq. how do you plan on stopping folks on using sparkSQL cli as it goes directly 
at metastore and fs 

(off-topic)

Look, we're not in the business of stopping users from doing what they want - 
we're not going to go down that way.

However, some admins and business owners are. When dealing with some groups of 
users, the fact that a SQL user can't just "copy all this data to my laptop and 
sell it somewhere" is an advantage.

Current solutions (where the filesystem is the only permission level) involve 
maintaining different copies of data to keep it safe with raw file permissions. 
Imagine GPS pickup/dropoff, billing address, CC # and real-name in a db - the 
pricing analysis guys need the first three, the billing folks need the last 
three, etc. This is insanity when it comes to ETL scheduling and keeping all parts of 
the system in sync - so people who go down the "maintain different copies" path 
will be carrying a pager daily.
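
To make the contrast concrete, here is a minimal sketch of the alternative to
"maintain different copies": one copy of the data, with each group seeing only
the columns it has been granted. It is an illustration only, not how Hive or
HS2 implements authorization. The column names, the GRANTS table and the
project() helper are all hypothetical, and the sketch assumes "GPS
pickup/dropoff" counts as two columns, so that "the first three" and "the last
three" overlap on the billing address.

```python
# Hypothetical column names, read off the example in the comment above.
COLUMNS = ["gps_pickup", "gps_dropoff", "billing_address", "cc_number", "real_name"]

# Hypothetical per-group column grants: pricing analysis gets the first
# three columns, billing gets the last three.
GRANTS = {
    "pricing": set(COLUMNS[:3]),
    "billing": set(COLUMNS[-3:]),
}


def project(row, group):
    """Return only the columns of `row` that `group` is allowed to read."""
    allowed = GRANTS.get(group, set())  # unknown groups see nothing
    return {name: value for name, value in row.items() if name in allowed}


# A single stored row; the values are made up for the example.
row = {
    "gps_pickup": "37.77,-122.42",
    "gps_dropoff": "37.80,-122.27",
    "billing_address": "1 Example St",
    "cc_number": "0000-0000-0000-0000",
    "real_name": "A. Rider",
}

pricing_view = project(row, "pricing")
billing_view = project(row, "billing")
```

With a single copy behind a gatekeeper like this, there is nothing extra for
ETL to keep in sync; with file permissions alone, each group needs its own
physical copy of the table.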

Not all of that data processing is SQL, at least not the geo-location or 
clustering, so in my view, Hive (as a system of record) needs to make sure 
Spark is not left out of the workflows.

> Replacing the implementation of Hive CLI using Beeline
> ------------------------------------------------------
>
>                 Key: HIVE-10511
>                 URL: https://issues.apache.org/jira/browse/HIVE-10511
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.10.0
>            Reporter: Xuefu Zhang
>            Assignee: Ferdinand Xu
>
> Hive CLI is a legacy tool which had two main use cases: 
> 1. a thick client for SQL on hadoop
> 2. a command line tool for HiveServer1.
> HiveServer1 is already deprecated and removed from the Hive code base, so use 
> case #2 is out of the question. For #1, Beeline provides or is supposed to 
> provide equal functionality, yet is implemented differently from Hive CLI.
> As the Hive community has been recommending the Beeline + HS2 configuration 
> for a while, ideally we should deprecate Hive CLI. Because of the wide 
> use of Hive CLI, we instead propose replacing Hive CLI's implementation with 
> Beeline plus embedded HS2 so that the Hive community only needs to maintain a 
> single code path. In this way, Hive CLI is just an alias to Beeline at either 
> the shell script level or at a high code level. The goal is that no changes or 
> minimal changes are expected from existing user scripts using Hive CLI.
> This is an Umbrella JIRA covering all tasks related to this initiative. Over 
> the last year or two, Beeline has been improved significantly to match what 
> Hive CLI offers. Still, there may be some gaps or deficiencies yet to be 
> discovered and fixed. In the meantime, we also want to make sure that enough 
> tests are included and that any performance impact is identified and addressed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
