[ https://issues.apache.org/jira/browse/HIVE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461651#comment-13461651 ]
Jason Dai commented on HIVE-3472:
---------------------------------
bq. It's often better to start with a design doc or discussion on the dev lists
before significant amounts of code are contributed.
Yes, we agree, and that is what we plan to do with this JIRA: use it as a
starting point to discuss the best approach to bringing full SQL support to the
Hadoop ecosystem (and Hive in particular). The code in panthera on GitHub is
not meant to be a complete implementation for review; it is an early prototype
used as a proof point. Once the preferred approach and design are agreed upon,
we will create several sub-tasks, each of which will be a small, manageable
unit for review (as we did with HBase-6805).
bq. The first question that comes to mind is why do you propose a separate
parser for this?
To provide full SQL support in the parser, there are basically three possible
approaches:
# Extend the existing Hive parser to support full SQL constructs
# Reuse an existing SQL compliant parser and make it co-exist with the existing
Hive parser
# Reuse an existing SQL compliant parser and extend it to support Hive
extensions
The problem with the 1st approach is that SQL is a very complex language, much
more complex than HiveQL (as a data point, the grammar file of the Hive parser
is about 61KB with 2487 lines, while the grammar files of the open source SQL
parser [https://github.com/porcelli/plsql-parser] are about 524KB with 8583
lines); in addition, some of the existing grammar rules in the Hive parser would
need to be significantly changed to support more complex SQL constructs.
Therefore, it would take significant effort to add full SQL features to the
Hive parser.
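For scale, the figures above work out to roughly the following ratios (simple arithmetic on the quoted numbers, not an independent measurement of either grammar):

$$
\frac{524\,\mathrm{KB}}{61\,\mathrm{KB}} \approx 8.6,
\qquad
\frac{8583\ \text{lines}}{2487\ \text{lines}} \approx 3.45
$$

That is, the SQL grammar files are nearly nine times larger by size and about three and a half times longer by line count than the Hive grammar file.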
The 2nd and 3rd approaches both seem feasible, and require significantly less
effort than the 1st approach.
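To make the 2nd approach concrete, here is a minimal Python sketch of two parsers co-existing behind an explicit mode switch. Everything in it is hypothetical: `FrontEndRouter`, `parse_hiveql` and `parse_sql92` are illustrative stand-ins (they only tag the statement instead of building an AST), and the mode names `hiveql`/`sql` are borrowed from the .sql example discussed below. It is not Hive's or panthera's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical result type; a real front end would return an AST produced by
# the Hive grammar or by an SQL-92 grammar.
@dataclass
class ParseResult:
    dialect: str
    text: str


def parse_hiveql(stmt: str) -> ParseResult:
    # Stand-in for the existing HiveQL parser (hypothetical).
    return ParseResult("hiveql", stmt)


def parse_sql92(stmt: str) -> ParseResult:
    # Stand-in for a reused SQL-92 compliant parser (hypothetical).
    return ParseResult("sql", stmt)


class FrontEndRouter:
    """Routes each statement to one of several co-existing parsers."""

    def __init__(self, parsers: Dict[str, Callable[[str], ParseResult]], default: str):
        if default not in parsers:
            raise ValueError(f"unknown default mode: {default}")
        self.parsers = parsers
        self.mode = default

    def set_mode(self, mode: str) -> None:
        # Reject unknown modes up front instead of failing at parse time.
        if mode not in self.parsers:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def parse(self, stmt: str) -> ParseResult:
        # The current mode, not the statement text, decides which parser runs.
        return self.parsers[self.mode](stmt)


router = FrontEndRouter({"hiveql": parse_hiveql, "sql": parse_sql92}, default="hiveql")
print(router.parse("SELECT 1").dialect)  # hiveql
router.set_mode("sql")
print(router.parse("SELECT 1").dialect)  # sql
```

Keeping the mode as explicit state, rather than guessing the dialect from the statement text, matches the explicit switch proposed in the issue description and avoids ambiguity for statements that are valid in both dialects.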
bq. Forcing users to think about whether they are in HQL or SQL-92 will cause
confusion and maintainability problems for them as well (e.g. a .sql file
written by user1 for HQL will be run in SQL-92 mode by user2, producing either
errors or wrong results).
I think there are several options to address this issue. In the current
example, the user needs to specify, in the .sql file itself, the mode (hiveql
or sql) under which the subsequent queries will run, so the mode of each query
is predetermined by the file rather than by whoever runs it. Another option is
that, instead of allowing two parsers to co-exist, we could build two separate
jars, effectively two warehouse products (one for HiveQL only and one for SQL
only).
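A minimal sketch of how the mode of every statement can be resolved from the .sql file alone, before anything is executed. It assumes the switch is written as a Hive-style `SET hive.ql.mode=<value>;` statement; the parameter name comes from the issue description, while the accepted values `hiveql` and `sql`, the default, and the helper name `assign_modes` are assumptions for illustration. It splits naively on `;`, so semicolons inside string literals or comments are not handled.

```python
import re
from typing import List, Tuple

# Matches an assumed "SET hive.ql.mode=<value>" directive (case-insensitive).
MODE_DIRECTIVE = re.compile(r"^\s*set\s+hive\.ql\.mode\s*=\s*(\w+)\s*$", re.IGNORECASE)
VALID_MODES = {"hiveql", "sql"}  # assumed mode names


def assign_modes(script: str, default: str = "hiveql") -> List[Tuple[str, str]]:
    """Return (mode, statement) pairs; each statement gets the mode set by the
    most recent directive earlier in the script, or `default` if none."""
    mode = default
    result: List[Tuple[str, str]] = []
    # Naive split: assumes no ';' inside string literals or comments.
    for raw in script.split(";"):
        stmt = raw.strip()
        if not stmt:
            continue
        match = MODE_DIRECTIVE.match(stmt)
        if match:
            value = match.group(1).lower()
            if value not in VALID_MODES:
                raise ValueError(f"unknown hive.ql.mode: {value}")
            mode = value  # applies to all following statements
        else:
            result.append((mode, stmt))
    return result


script = """
set hive.ql.mode=sql;
SELECT a FROM t1, t2 WHERE t1.id = t2.id;
SET hive.ql.mode=hiveql;
SELECT TRANSFORM(a) USING 'cat' FROM t1;
"""
for mode, stmt in assign_modes(script):
    print(mode, "|", stmt)
```

Because the modes are resolved from the file's own directives, user2 running user1's script gets the same mode for each statement that user1 intended, which is the property the reply relies on.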
Of course, another option is to follow the 3rd approach mentioned above: extend
the SQL parser to support HiveQL extensions.
> Build An Analytical SQL Engine for MapReduce
> --------------------------------------------
>
> Key: HIVE-3472
> URL: https://issues.apache.org/jira/browse/HIVE-3472
> Project: Hive
> Issue Type: New Feature
> Affects Versions: 0.10.0
> Reporter: Shengsheng Huang
> Attachments: SQL-design.pdf
>
>
> While there are continuous efforts in extending Hive’s SQL support (e.g., see
> some recent examples such as HIVE-2005 and HIVE-2810), many widely used SQL
> constructs are still not supported in HiveQL, such as selecting from multiple
> tables, subqueries in WHERE clauses, etc.
> We propose to build a fully SQL-92 compatible engine (for MapReduce based
> analytical query processing) as an extension to Hive.
> The SQL frontend will co-exist with the HiveQL frontend; consequently, one
> can mix SQL and HiveQL statements in their queries (switching between HiveQL
> mode and SQL-92 mode using a “hive.ql.mode” parameter before each query
> statement). This way useful Hive extensions are still accessible to users.