[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar updated HIVE-24230:
---------------------------------
    Parent: HIVE-24427
    Issue Type: Sub-task  (was: Bug)

> Integrate HPL/SQL into HiveServer2
> ----------------------------------
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2, hpl/sql
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> HPL/SQL is a standalone command line program that can store and load scripts
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up
> some possibilities which are currently not feasible to implement. For example,
> one might want to use a third party SQL tool to run selects on stored
> procedure (or rather function in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2);
> {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC
> and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used
> with HPL/SQL since it has its own, separate CLI.
>
> To make it easier to implement, we keep things separate on the inside at
> first, by introducing a Hive session level JDBC parameter.
> {code:java}
> jdbc:hive2://localhost:10000/default;hplsqlMode=true
> {code}
>
> The hplsqlMode parameter indicates that we are in procedural SQL mode, where
> the user can create and call stored procedures. HPLSQL allows you to write any
> kind of procedural statement at the top level.
> This patch doesn't limit this, but it might be better to eventually restrict
> which statements are allowed outside of stored procedures.
>
> Since HPLSQL and Hive are running in the same process, there is no need to use
> the JDBC driver between them. The patch adds an abstraction with 2 different
> implementations, one for executing queries on JDBC (for keeping the existing
> behaviour) and another one for directly calling Hive's compiler. In HPLSQL
> mode the latter is used.
> Internally, a new operation (HplSqlOperation) and operation type
> (PROCEDURAL_SQL) were added. The operation works similarly to SQLOperation,
> but it uses the hplsql interpreter to execute arbitrary scripts. This
> operation might spawn new SQLOperations.
> For example, consider the following statement:
> {code:java}
> FOR i in 1..10 LOOP
>   SELECT * FROM table
> END LOOP;
> {code}
> We send this to Beeline while we're in hplsql mode. Hive will create an hplsql
> interpreter and store it in the session state. A new HplSqlOperation is
> created to run the script on the interpreter.
> HPLSQL knows how to execute the for loop, but it'll call Hive to run the
> select statement. The HplSqlOperation is notified when the select reads a
> row and accumulates the rows into a RowSet (memory consumption needs to be
> considered here), which can be retrieved via Thrift from the client side.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
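
The executor abstraction and the loop example described in the issue could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `HplSqlModeSketch`, `QueryExecutor`, `JdbcQueryExecutor`, `DirectQueryExecutor`, `isHplSqlMode` and `runLoop` are all hypothetical names, and the `compileAndRun` function is only a stand-in for Hive's internal compiler. Only the `java.sql` calls are real APIs.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: every type and method name below is hypothetical,
// not the API introduced by the patch.
final class HplSqlModeSketch {

    /** What the interpreter needs from its host: run one declarative statement. */
    interface QueryExecutor {
        List<String> execute(String sql);
    }

    /** Existing behaviour: send the statement to Hive over a JDBC connection. */
    static final class JdbcQueryExecutor implements QueryExecutor {
        private final Connection connection;

        JdbcQueryExecutor(Connection connection) {
            this.connection = connection;
        }

        @Override
        public List<String> execute(String sql) {
            List<String> rows = new ArrayList<>();
            try (Statement stmt = connection.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows.add(rs.getString(1)); // first column only, to keep the sketch short
                }
            } catch (SQLException e) {
                throw new IllegalStateException("query failed: " + sql, e);
            }
            return rows;
        }
    }

    /** In-process path: hand the statement to a stand-in for Hive's compiler. */
    static final class DirectQueryExecutor implements QueryExecutor {
        private final Function<String, List<String>> compileAndRun;

        DirectQueryExecutor(Function<String, List<String>> compileAndRun) {
            this.compileAndRun = compileAndRun;
        }

        @Override
        public List<String> execute(String sql) {
            return compileAndRun.apply(sql);
        }
    }

    /** True when the session-variable part of the URL contains hplsqlMode=true. */
    static boolean isHplSqlMode(String jdbcUrl) {
        String sessionPart = jdbcUrl.split("[?#]", 2)[0];
        String[] pieces = sessionPart.split(";");
        for (int i = 1; i < pieces.length; i++) { // pieces[0] is host, port and database
            String[] kv = pieces[i].split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("hplsqlMode")
                    && kv[1].trim().equalsIgnoreCase("true")) {
                return true;
            }
        }
        return false;
    }

    /** Mimics FOR i IN from..to LOOP select END LOOP: the loop runs here, each SELECT goes to the executor. */
    static List<String> runLoop(QueryExecutor executor, int from, int to, String select) {
        List<String> rowSet = new ArrayList<>(); // grows with every row, so memory use matters
        for (int i = from; i <= to; i++) {
            rowSet.addAll(executor.execute(select));
        }
        return rowSet;
    }

    public static void main(String[] args) {
        String url = "jdbc:hive2://localhost:10000/default;hplsqlMode=true";
        QueryExecutor executor = new DirectQueryExecutor(sql -> List.of("row for: " + sql));
        System.out.println(isHplSqlMode(url));
        System.out.println(runLoop(executor, 1, 10, "SELECT * FROM some_table").size());
    }
}
```

In this sketch the interpreter side (`runLoop`) depends only on the `QueryExecutor` interface, so swapping the JDBC-backed implementation for the in-process one does not change the procedural code. That is the property the issue description relies on when it says the latter implementation is used in HPLSQL mode.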