nrg4878 opened a new pull request, #4720:
URL: https://github.com/apache/hive/pull/4720
… JDBC (Naveen Gangam)
### What changes were proposed in this pull request?
In HIVE-23291 and HIVE-23302, we added support adding a JDBCStorageHandler
based table for remote Hive table. This enables defining a table in hive on one
cluster that points to a table in another cluster. Users can perform queries
(selects, joins, ctas etc) between this JDBC-storage-handler based table and
other local tables.
However, it becomes cumbersome to map each table individually between these
clusters. With HIVE-24396, we added support for connectors that allow us to map
databases instead of tables. The table metadata is mapped at runtime over a
live connector. This allows hive to pull the latest table metadata instead of
having a static metadata in HMS.
This PR does not support Kerberos at this point. But that could be a
separate enhancement. (the mechanics of kerberos auth may not fit into the
current way it works for password-based data sources)
**create connector hive_jdbc
type 'hivejdbc'
url 'jdbc:hive2://nightly.hive.apache.org:10000'
with DCPROPERTIES (
"hive.sql.dbcp.username"="hive",
"hive.sql.dbcp.password"="hive"
);**
**create remote database nightly_default using hive_jdbc with
DBPROPERTIES("connector.remoteDbName"="default");**
Sample working demo:
0: jdbc:hive2://localhost:10000> **_create connector hive_jdbc type
'hivejdbc' url
'jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000' with
DCPROPERTIES ("hive.sql.dbcp.username"="hive",
"hive.sql.dbcp.password"="hive");_**
INFO : Compiling
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63):
create connector hive_jdbc type 'hivejdbc' url
'jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000' with
DCPROPERTIES ("hive.sql.dbcp.username"="hive", "hive.sql.dbcp.password"="hive")
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63);
Time taken: 1.698 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63):
create connector hive_jdbc type 'hivejdbc' url
'jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000' with
DCPROPERTIES ("hive.sql.dbcp.username"="hive", "hive.sql.dbcp.password"="hive")
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63);
Time taken: 0.25 seconds
INFO : OK
No rows affected (2.274 seconds)
0: jdbc:hive2://localhost:10000> **_create remote database nightly_default
using hive_jdbc with DBPROPERTIES("connector.remoteDbName"="default");_**
INFO : Compiling
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6):
create remote database nightly_default using hive_jdbc with
DBPROPERTIES("connector.remoteDbName"="default")
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6);
Time taken: 0.014 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6):
create remote database nightly_default using hive_jdbc with
DBPROPERTIES("connector.remoteDbName"="default")
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6);
Time taken: 0.06 seconds
INFO : OK
No rows affected (0.092 seconds)
0: jdbc:hive2://localhost:10000> **_use nightly_default;_**
INFO : Compiling
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204):
use nightly_default
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204);
Time taken: 0.003 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204):
use nightly_default
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204);
Time taken: 0.158 seconds
INFO : OK
No rows affected (0.176 seconds)
0: jdbc:hive2://localhost:10000> **_show tables;_**
INFO : Compiling
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e):
show tables
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name,
type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e);
Time taken: 0.01 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e):
show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e);
Time taken: 3.8 seconds
INFO : OK
+------------+
| tab_name |
+------------+
| sample_07 |
| sample_08 |
| web_logs |
+------------+
3 rows selected (3.84 seconds)
0: jdbc:hive2://localhost:10000> **_describe formatted web_logs;_**
INFO : Compiling
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23):
describe formatted web_logs
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name,
type:string, comment:from deserializer), FieldSchema(name:data_type,
type:string, comment:from deserializer), FieldSchema(name:comment, type:string,
comment:from deserializer)], properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23);
Time taken: 0.661 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23):
describe formatted web_logs
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23);
Time taken: 2.704 seconds
INFO : OK
+-------------------------------+------------------------------------------------+----------------------------------------------------+
| col_name | data_type
| comment |
+-------------------------------+------------------------------------------------+----------------------------------------------------+
| _version_ | bigint
| from deserializer |
| app | string
| from deserializer |
| bytes | int
| from deserializer |
| city | string
| from deserializer |
| client_ip | string
| from deserializer |
| code | smallint
| from deserializer |
| country_code | string
| from deserializer |
| country_code3 | string
| from deserializer |
| country_name | string
| from deserializer |
| device_family | string
| from deserializer |
| extension | string
| from deserializer |
| latitude | float
| from deserializer |
| longitude | float
| from deserializer |
| method | string
| from deserializer |
| os_family | string
| from deserializer |
| os_major | string
| from deserializer |
| protocol | string
| from deserializer |
| record | string
| from deserializer |
| referer | string
| from deserializer |
| region_code | string
| from deserializer |
| request | string
| from deserializer |
| subapp | string
| from deserializer |
| time | string
| from deserializer |
| url | string
| from deserializer |
| user_agent | string
| from deserializer |
| user_agent_family | string
| from deserializer |
| user_agent_major | string
| from deserializer |
| id | string
| from deserializer |
| date | string
| from deserializer |
| | NULL
| NULL |
| # Detailed Table Information | NULL
| NULL |
| Database: | nightly_default
| NULL |
| OwnerType: | USER
| NULL |
| Owner: | null
| NULL |
| CreateTime: | UNKNOWN
| NULL |
| LastAccessTime: | UNKNOWN
| NULL |
| Retention: | 0
| NULL |
| Location: | file:/tmp/warehouse/external/web_logs
| NULL |
| Table Type: | EXTERNAL_TABLE
| NULL |
| Table Parameters: | NULL
| NULL |
| | EXTERNAL
| TRUE |
| | hive.sql.database.type
| HIVE |
| | hive.sql.dbcp.password
| hive |
| | hive.sql.dbcp.username
| hive |
| | hive.sql.jdbc.driver
| org.apache.hive.jdbc.HiveDriver |
| | hive.sql.jdbc.url
| jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000 |
| | hive.sql.schema
| default |
| | hive.sql.table
| web_logs |
| | storage_handler
| org.apache.hive.storage.jdbc.JdbcStorageHandler |
| | NULL
| NULL |
| # Storage Information | NULL
| NULL |
| SerDe Library: | org.apache.hive.storage.jdbc.JdbcSerDe
| NULL |
| InputFormat: |
org.apache.hive.storage.jdbc.JdbcInputFormat | NULL
|
| OutputFormat: |
org.apache.hive.storage.jdbc.JdbcOutputFormat | NULL
|
| Compressed: | No
| NULL |
| Num Buckets: | 0
| NULL |
| Bucket Columns: | []
| NULL |
| Sort Columns: | []
| NULL |
| Storage Desc Params: | NULL
| NULL |
| | serialization.format
| 1 |
+-------------------------------+------------------------------------------------+----------------------------------------------------+
60 rows selected (3.392 seconds)
0: jdbc:hive2://localhost:10000> **_describe formatted sample_07;_**
INFO : Compiling
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6):
describe formatted sample_07
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name,
type:string, comment:from deserializer), FieldSchema(name:data_type,
type:string, comment:from deserializer), FieldSchema(name:comment, type:string,
comment:from deserializer)], properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6);
Time taken: 0.641 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6):
describe formatted sample_07
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6);
Time taken: 2.532 seconds
INFO : OK
+-------------------------------+------------------------------------------------+----------------------------------------------------+
| col_name | data_type
| comment |
+-------------------------------+------------------------------------------------+----------------------------------------------------+
| code | string
| from deserializer |
| description | string
| from deserializer |
| total_emp | int
| from deserializer |
| salary | int
| from deserializer |
| | NULL
| NULL |
| # Detailed Table Information | NULL
| NULL |
| Database: | nightly_default
| NULL |
| OwnerType: | USER
| NULL |
| Owner: | null
| NULL |
| CreateTime: | UNKNOWN
| NULL |
| LastAccessTime: | UNKNOWN
| NULL |
| Retention: | 0
| NULL |
| Location: | file:/tmp/warehouse/external/sample_07
| NULL |
| Table Type: | EXTERNAL_TABLE
| NULL |
| Table Parameters: | NULL
| NULL |
| | EXTERNAL
| TRUE |
| | hive.sql.database.type
| HIVE |
| | hive.sql.dbcp.password
| hive |
| | hive.sql.dbcp.username
| hive |
| | hive.sql.jdbc.driver
| org.apache.hive.jdbc.HiveDriver |
| | hive.sql.jdbc.url
| jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000 |
| | hive.sql.schema
| default |
| | hive.sql.table
| sample_07 |
| | storage_handler
| org.apache.hive.storage.jdbc.JdbcStorageHandler |
| | NULL
| NULL |
| # Storage Information | NULL
| NULL |
| SerDe Library: | org.apache.hive.storage.jdbc.JdbcSerDe
| NULL |
| InputFormat: |
org.apache.hive.storage.jdbc.JdbcInputFormat | NULL
|
| OutputFormat: |
org.apache.hive.storage.jdbc.JdbcOutputFormat | NULL
|
| Compressed: | No
| NULL |
| Num Buckets: | 0
| NULL |
| Bucket Columns: | []
| NULL |
| Sort Columns: | []
| NULL |
| Storage Desc Params: | NULL
| NULL |
| | serialization.format
| 1 |
+-------------------------------+------------------------------------------------+----------------------------------------------------+
35 rows selected (3.198 seconds)
0: jdbc:hive2://localhost:10000> **_set hive.execution.engine=mr;_**
Error: Error while processing statement: hive execution engine mr is not
supported. (state=42000,code=1)
0: jdbc:hive2://localhost:10000> **_select count(*) from sample_07 where
salary > 50000;_**
INFO : Compiling
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1):
select count(*) from sample_07 where salary > 50000
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:bigint, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1);
Time taken: 5.674 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1):
select count(*) from sample_07 where salary > 50000
WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. tez, impala)
or using Hive 1.X releases.
INFO : Query ID =
ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_local1498163518_0001
INFO : Executing with tokens: []
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2023-09-18 15:27:55,391 Stage-1 map = 0%, reduce = 0%
INFO : 2023-09-18 15:27:59,418 Stage-1 map = 100%, reduce = 100%
INFO : Ended Job = job_local1498163518_0001
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 0 msec
INFO : Completed executing
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1);
Time taken: 7.418 seconds
INFO : OK
+------+
| _c0 |
+------+
| 292 |
+------+
1 row selected (13.157 seconds)
0: jdbc:hive2://localhost:10000>
### Why are the changes needed?
This connector enables users to map databases between 2 hive clusters and
federate queries over JDBC between tables in the 2 clusters.
### Does this PR introduce _any_ user-facing change?
NO
### Is the change a dependency upgrade?
NO
### How was this patch tested?
Manually
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]