[jira] [Commented] (HIVE-3472) Build An Analytical SQL Engine for MapReduce

Jason Dai (JIRA) Mon, 24 Sep 2012 01:05:14 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461651#comment-13461651
 ]


Jason Dai commented on HIVE-3472:
---------------------------------

bq. It's often better to start with a design doc or discussion on the dev lists 
before significant amounts of code are contributed.

Yes, we agree with that, and this is what we plan to do with this JIRA - use it 
as a starting point to discuss the best approach to get SQL support for the 
Hadoop ecosystem (and Hive in particular). The code in panthera on github is 
not supposed to be a complete implementation to be reviewed; instead, it is 
supposed to be an early prototype used as a proof point. After the preferred 
approach and design are agreed upon, we will create several sub-tasks, each 
of which will be a small, manageable unit for review (just as we did with 
HBASE-6805).

bq. The first question that comes to mind is why do you propose a separate 
parser for this?

To provide full SQL support in the parser, there are basically three possible 
approaches:
# Extend the existing Hive parser to support full SQL constructs
# Reuse an existing SQL compliant parser and make it co-exist with the existing 
Hive parser
# Reuse an existing SQL compliant parser and extend it to support Hive 
extensions

The problem with the 1st approach is that SQL is a very complex language, much 
more complex than HiveQL (as a data point, the grammar file of the Hive parser 
is about 61KB with 2487 lines, while the grammar files of the open source SQL 
parser [https://github.com/porcelli/plsql-parser] are about 524KB with 8583 
lines); in addition, some of the existing grammar rules in the Hive parser 
would need to be significantly changed to support more complex SQL constructs. 
Therefore, it would take significant effort to add full SQL features to the 
Hive parser.

The 2nd and 3rd approaches both seem possible, and require significantly less 
effort than the first approach.

bq. Forcing users to think about whether they are in HQL or SQL-92 will cause 
confusion and maintainability problems for them as well (e.g. a .sql file 
written by user1 for HQL will be run in SQL-92 mode by user2, producing either 
errors or wrong results).

I think there are several options to address this issue. In the current 
example, the .sql file itself must specify the mode (hiveql or sql) for the 
queries that follow, so the mode each query runs under is determined by the 
file rather than by whoever runs it. Another option is that, instead of 
allowing two parsers to co-exist with each other, we can build two separate 
jars - effectively two warehouse products (one for HiveQL only and one for SQL 
only).

Of course, another option is to follow the 3rd approach mentioned above: extend 
the SQL parser to support HiveQL extensions.
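
To illustrate the per-file mode switching described above, here is a minimal sketch of how a script could be split into statements, each tagged with the mode it will run under. It is not code from panthera. The "hive.ql.mode" parameter and the mode names (hiveql, sql) come from this discussion; the exact "set hive.ql.mode=...;" syntax, the default mode, and the function name tag_statements are assumptions made for the example.

```python
import re

# Assumed syntax: "set hive.ql.mode=<mode>;" selects the parser for the
# statements that follow it. Only the parameter name and the two mode
# names are taken from the discussion; the rest is hypothetical.
MODE_RE = re.compile(r"^\s*set\s+hive\.ql\.mode\s*=\s*(\w+)\s*$", re.IGNORECASE)
VALID_MODES = {"hiveql", "sql"}


def tag_statements(script, default_mode="hiveql"):
    """Return a list of (mode, statement) pairs for the script.

    Statements are split naively on ';', so semicolons inside string
    literals or comments are not handled.
    """
    mode = default_mode
    tagged = []
    for raw in script.split(";"):
        stmt = raw.strip()
        if not stmt:
            continue
        match = MODE_RE.match(stmt)
        if match:
            new_mode = match.group(1).lower()
            if new_mode not in VALID_MODES:
                raise ValueError("unknown hive.ql.mode: " + new_mode)
            mode = new_mode  # applies to every later statement
            continue
        tagged.append((mode, stmt))
    return tagged
```

Because the mode is recorded in the file, the result does not depend on which user runs the script: a statement placed after "set hive.ql.mode=sql" is always handed to the SQL parser.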

                
> Build An Analytical SQL Engine for MapReduce
> --------------------------------------------
>
>                 Key: HIVE-3472
>                 URL: https://issues.apache.org/jira/browse/HIVE-3472
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Shengsheng Huang
>         Attachments: SQL-design.pdf
>
>
> While there are continuous efforts in extending Hive’s SQL support (e.g., see 
> some recent examples such as HIVE-2005 and HIVE-2810), many widely used SQL 
> constructs are still not supported in HiveQL, such as selecting from multiple 
> tables, subqueries in WHERE clauses, etc.  
> We propose to build a fully SQL-92 compatible engine (for MapReduce based 
> analytical query processing) as an extension to Hive. 
> The SQL frontend will co-exist with the HiveQL frontend; consequently, one 
> can mix SQL and HiveQL statements in their queries (switching between HiveQL 
> mode and SQL-92 mode using a “hive.ql.mode” parameter before each query 
> statement). This way useful Hive extensions are still accessible to users. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
