nrg4878 opened a new pull request, #4720:
URL: https://github.com/apache/hive/pull/4720

   … JDBC (Naveen Gangam)
   
   ### What changes were proposed in this pull request?
   In HIVE-23291 and HIVE-23302, we added support adding a JDBCStorageHandler 
based table for remote Hive table. This enables defining a table in hive on one 
cluster that points to a table in another cluster. Users can perform queries 
(selects, joins, ctas etc) between this JDBC-storage-handler based table and 
other local tables.
   
   However, it becomes cumbersome to map each table individually between these 
clusters. With HIVE-24396, we added support for connectors that allow us to map 
databases instead of tables. The table metadata is mapped at runtime over a 
live connector. This allows hive to pull the latest table metadata instead of 
having a static metadata in HMS.
   
   This PR does not support Kerberos at this point. But that could be a 
separate enhancement. (the mechanics of kerberos auth may not fit into the 
current way it works for password-based data sources)
   
   **create connector hive_jdbc 
       type 'hivejdbc' 
       url 'jdbc:hive2://nightly.hive.apache.org:10000' 
       with DCPROPERTIES (
           "hive.sql.dbcp.username"="hive", 
           "hive.sql.dbcp.password"="hive"
       );**
   
   **create remote database nightly_default using hive_jdbc with 
DBPROPERTIES("connector.remoteDbName"="default");**
   
   Sample working demo:
   0: jdbc:hive2://localhost:10000> **_create connector hive_jdbc type 
'hivejdbc' url 
'jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000' with 
DCPROPERTIES ("hive.sql.dbcp.username"="hive", 
"hive.sql.dbcp.password"="hive");_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63): 
create connector hive_jdbc type 'hivejdbc' url 
'jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000' with 
DCPROPERTIES ("hive.sql.dbcp.username"="hive", "hive.sql.dbcp.password"="hive")
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63); 
Time taken: 1.698 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63): 
create connector hive_jdbc type 'hivejdbc' url 
'jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000' with 
DCPROPERTIES ("hive.sql.dbcp.username"="hive", "hive.sql.dbcp.password"="hive")
   INFO  : Starting task [Stage-0:DDL] in serial mode
   INFO  : Completed executing 
command(queryId=ngangam_20230918152209_83f300d9-58e1-42fd-a7b4-b5c82e5bbc63); 
Time taken: 0.25 seconds
   INFO  : OK
   No rows affected (2.274 seconds)
   
   0: jdbc:hive2://localhost:10000> **_create remote database nightly_default 
using hive_jdbc with DBPROPERTIES("connector.remoteDbName"="default");_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6): 
create remote database nightly_default using hive_jdbc with 
DBPROPERTIES("connector.remoteDbName"="default")
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6); 
Time taken: 0.014 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6): 
create remote database nightly_default using hive_jdbc with 
DBPROPERTIES("connector.remoteDbName"="default")
   INFO  : Starting task [Stage-0:DDL] in serial mode
   INFO  : Completed executing 
command(queryId=ngangam_20230918152225_58e8bfd4-9cf0-4505-9aa3-012be57e23d6); 
Time taken: 0.06 seconds
   INFO  : OK
   No rows affected (0.092 seconds)
   
   0: jdbc:hive2://localhost:10000> **_use nightly_default;_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204): 
use nightly_default
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204); 
Time taken: 0.003 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204): 
use nightly_default
   INFO  : Starting task [Stage-0:DDL] in serial mode
   INFO  : Completed executing 
command(queryId=ngangam_20230918152528_7af75a5a-54a3-48f0-90cb-209cf236c204); 
Time taken: 0.158 seconds
   INFO  : OK
   No rows affected (0.176 seconds)
   
   0: jdbc:hive2://localhost:10000> **_show tables;_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e): 
show tables
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, 
type:string, comment:from deserializer)], properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e); 
Time taken: 0.01 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e): 
show tables
   INFO  : Starting task [Stage-0:DDL] in serial mode
   INFO  : Completed executing 
command(queryId=ngangam_20230918152530_5b5c754b-db5f-47ae-a880-b181757cfa4e); 
Time taken: 3.8 seconds
   INFO  : OK
   +------------+
   |  tab_name  |
   +------------+
   | sample_07  |
   | sample_08  |
   | web_logs   |
   +------------+
   3 rows selected (3.84 seconds)
   
   0: jdbc:hive2://localhost:10000> **_describe formatted web_logs;_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23): 
describe formatted web_logs
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, 
type:string, comment:from deserializer), FieldSchema(name:data_type, 
type:string, comment:from deserializer), FieldSchema(name:comment, type:string, 
comment:from deserializer)], properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23); 
Time taken: 0.661 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23): 
describe formatted web_logs
   INFO  : Starting task [Stage-0:DDL] in serial mode
   INFO  : Completed executing 
command(queryId=ngangam_20230918152542_170176a0-92ff-408b-9133-eab9a1953c23); 
Time taken: 2.704 seconds
   INFO  : OK
   
+-------------------------------+------------------------------------------------+----------------------------------------------------+
   |           col_name            |                   data_type                
    |                      comment                       |
   
+-------------------------------+------------------------------------------------+----------------------------------------------------+
   | _version_                     | bigint                                     
    | from deserializer                                  |
   | app                           | string                                     
    | from deserializer                                  |
   | bytes                         | int                                        
    | from deserializer                                  |
   | city                          | string                                     
    | from deserializer                                  |
   | client_ip                     | string                                     
    | from deserializer                                  |
   | code                          | smallint                                   
    | from deserializer                                  |
   | country_code                  | string                                     
    | from deserializer                                  |
   | country_code3                 | string                                     
    | from deserializer                                  |
   | country_name                  | string                                     
    | from deserializer                                  |
   | device_family                 | string                                     
    | from deserializer                                  |
   | extension                     | string                                     
    | from deserializer                                  |
   | latitude                      | float                                      
    | from deserializer                                  |
   | longitude                     | float                                      
    | from deserializer                                  |
   | method                        | string                                     
    | from deserializer                                  |
   | os_family                     | string                                     
    | from deserializer                                  |
   | os_major                      | string                                     
    | from deserializer                                  |
   | protocol                      | string                                     
    | from deserializer                                  |
   | record                        | string                                     
    | from deserializer                                  |
   | referer                       | string                                     
    | from deserializer                                  |
   | region_code                   | string                                     
    | from deserializer                                  |
   | request                       | string                                     
    | from deserializer                                  |
   | subapp                        | string                                     
    | from deserializer                                  |
   | time                          | string                                     
    | from deserializer                                  |
   | url                           | string                                     
    | from deserializer                                  |
   | user_agent                    | string                                     
    | from deserializer                                  |
   | user_agent_family             | string                                     
    | from deserializer                                  |
   | user_agent_major              | string                                     
    | from deserializer                                  |
   | id                            | string                                     
    | from deserializer                                  |
   | date                          | string                                     
    | from deserializer                                  |
   |                               | NULL                                       
    | NULL                                               |
   | # Detailed Table Information  | NULL                                       
    | NULL                                               |
   | Database:                     | nightly_default                            
    | NULL                                               |
   | OwnerType:                    | USER                                       
    | NULL                                               |
   | Owner:                        | null                                       
    | NULL                                               |
   | CreateTime:                   | UNKNOWN                                    
    | NULL                                               |
   | LastAccessTime:               | UNKNOWN                                    
    | NULL                                               |
   | Retention:                    | 0                                          
    | NULL                                               |
   | Location:                     | file:/tmp/warehouse/external/web_logs      
    | NULL                                               |
   | Table Type:                   | EXTERNAL_TABLE                             
    | NULL                                               |
   | Table Parameters:             | NULL                                       
    | NULL                                               |
   |                               | EXTERNAL                                   
    | TRUE                                               |
   |                               | hive.sql.database.type                     
    | HIVE                                               |
   |                               | hive.sql.dbcp.password                     
    | hive                                               |
   |                               | hive.sql.dbcp.username                     
    | hive                                               |
   |                               | hive.sql.jdbc.driver                       
    | org.apache.hive.jdbc.HiveDriver                    |
   |                               | hive.sql.jdbc.url                          
    | jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000 |
   |                               | hive.sql.schema                            
    | default                                            |
   |                               | hive.sql.table                             
    | web_logs                                           |
   |                               | storage_handler                            
    | org.apache.hive.storage.jdbc.JdbcStorageHandler    |
   |                               | NULL                                       
    | NULL                                               |
   | # Storage Information         | NULL                                       
    | NULL                                               |
   | SerDe Library:                | org.apache.hive.storage.jdbc.JdbcSerDe     
    | NULL                                               |
   | InputFormat:                  | 
org.apache.hive.storage.jdbc.JdbcInputFormat   | NULL                           
                    |
   | OutputFormat:                 | 
org.apache.hive.storage.jdbc.JdbcOutputFormat  | NULL                           
                    |
   | Compressed:                   | No                                         
    | NULL                                               |
   | Num Buckets:                  | 0                                          
    | NULL                                               |
   | Bucket Columns:               | []                                         
    | NULL                                               |
   | Sort Columns:                 | []                                         
    | NULL                                               |
   | Storage Desc Params:          | NULL                                       
    | NULL                                               |
   |                               | serialization.format                       
    | 1                                                  |
   
+-------------------------------+------------------------------------------------+----------------------------------------------------+
   60 rows selected (3.392 seconds)
   
   0: jdbc:hive2://localhost:10000> **_describe formatted sample_07;_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6): 
describe formatted sample_07
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, 
type:string, comment:from deserializer), FieldSchema(name:data_type, 
type:string, comment:from deserializer), FieldSchema(name:comment, type:string, 
comment:from deserializer)], properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6); 
Time taken: 0.641 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6): 
describe formatted sample_07
   INFO  : Starting task [Stage-0:DDL] in serial mode
   INFO  : Completed executing 
command(queryId=ngangam_20230918152648_6ea35ff8-324c-427a-96a5-9007835202f6); 
Time taken: 2.532 seconds
   INFO  : OK
   
+-------------------------------+------------------------------------------------+----------------------------------------------------+
   |           col_name            |                   data_type                
    |                      comment                       |
   
+-------------------------------+------------------------------------------------+----------------------------------------------------+
   | code                          | string                                     
    | from deserializer                                  |
   | description                   | string                                     
    | from deserializer                                  |
   | total_emp                     | int                                        
    | from deserializer                                  |
   | salary                        | int                                        
    | from deserializer                                  |
   |                               | NULL                                       
    | NULL                                               |
   | # Detailed Table Information  | NULL                                       
    | NULL                                               |
   | Database:                     | nightly_default                            
    | NULL                                               |
   | OwnerType:                    | USER                                       
    | NULL                                               |
   | Owner:                        | null                                       
    | NULL                                               |
   | CreateTime:                   | UNKNOWN                                    
    | NULL                                               |
   | LastAccessTime:               | UNKNOWN                                    
    | NULL                                               |
   | Retention:                    | 0                                          
    | NULL                                               |
   | Location:                     | file:/tmp/warehouse/external/sample_07     
    | NULL                                               |
   | Table Type:                   | EXTERNAL_TABLE                             
    | NULL                                               |
   | Table Parameters:             | NULL                                       
    | NULL                                               |
   |                               | EXTERNAL                                   
    | TRUE                                               |
   |                               | hive.sql.database.type                     
    | HIVE                                               |
   |                               | hive.sql.dbcp.password                     
    | hive                                               |
   |                               | hive.sql.dbcp.username                     
    | hive                                               |
   |                               | hive.sql.jdbc.driver                       
    | org.apache.hive.jdbc.HiveDriver                    |
   |                               | hive.sql.jdbc.url                          
    | jdbc:hive2://nightly7x-us-bs-4.nightly7x-us-bs.root.hwx.site:10000 |
   |                               | hive.sql.schema                            
    | default                                            |
   |                               | hive.sql.table                             
    | sample_07                                          |
   |                               | storage_handler                            
    | org.apache.hive.storage.jdbc.JdbcStorageHandler    |
   |                               | NULL                                       
    | NULL                                               |
   | # Storage Information         | NULL                                       
    | NULL                                               |
   | SerDe Library:                | org.apache.hive.storage.jdbc.JdbcSerDe     
    | NULL                                               |
   | InputFormat:                  | 
org.apache.hive.storage.jdbc.JdbcInputFormat   | NULL                           
                    |
   | OutputFormat:                 | 
org.apache.hive.storage.jdbc.JdbcOutputFormat  | NULL                           
                    |
   | Compressed:                   | No                                         
    | NULL                                               |
   | Num Buckets:                  | 0                                          
    | NULL                                               |
   | Bucket Columns:               | []                                         
    | NULL                                               |
   | Sort Columns:                 | []                                         
    | NULL                                               |
   | Storage Desc Params:          | NULL                                       
    | NULL                                               |
   |                               | serialization.format                       
    | 1                                                  |
   
+-------------------------------+------------------------------------------------+----------------------------------------------------+
   35 rows selected (3.198 seconds)
   
   0: jdbc:hive2://localhost:10000> **_set hive.execution.engine=mr;_**
   Error: Error while processing statement: hive execution engine mr is not 
supported. (state=42000,code=1)
   
   0: jdbc:hive2://localhost:10000> **_select count(*) from sample_07 where 
salary > 50000;_**
   INFO  : Compiling 
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1): 
select count(*) from sample_07 where salary > 50000
   INFO  : Semantic Analysis Completed (retrial = false)
   INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:bigint, comment:null)], properties:null)
   INFO  : Completed compiling 
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1); 
Time taken: 5.674 seconds
   INFO  : Concurrency mode is disabled, not creating a lock manager
   INFO  : Executing 
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1): 
select count(*) from sample_07 where salary > 50000
   WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. tez, impala) 
or using Hive 1.X releases.
   INFO  : Query ID = 
ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1
   INFO  : Total jobs = 1
   INFO  : Launching Job 1 out of 1
   INFO  : Starting task [Stage-1:MAPRED] in serial mode
   INFO  : Number of reduce tasks determined at compile time: 1
   INFO  : In order to change the average load for a reducer (in bytes):
   INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
   INFO  : In order to limit the maximum number of reducers:
   INFO  :   set hive.exec.reducers.max=<number>
   INFO  : In order to set a constant number of reducers:
   INFO  :   set mapreduce.job.reduces=<number>
   INFO  : number of splits:1
   INFO  : Submitting tokens for job: job_local1498163518_0001
   INFO  : Executing with tokens: []
   INFO  : The url to track the job: http://localhost:8080/
   INFO  : Job running in-process (local Hadoop)
   INFO  : 2023-09-18 15:27:55,391 Stage-1 map = 0%,  reduce = 0%
   INFO  : 2023-09-18 15:27:59,418 Stage-1 map = 100%,  reduce = 100%
   INFO  : Ended Job = job_local1498163518_0001
   INFO  : MapReduce Jobs Launched: 
   INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
   INFO  : Total MapReduce CPU Time Spent: 0 msec
   INFO  : Completed executing 
command(queryId=ngangam_20230918152746_0cce0ed6-58ab-477a-805b-a97dca077ff1); 
Time taken: 7.418 seconds
   INFO  : OK
   +------+
   | _c0  |
   +------+
   | 292  |
   +------+
   1 row selected (13.157 seconds)
   
   0: jdbc:hive2://localhost:10000> 
   
   ### Why are the changes needed?
   This connector enables users to map databases between 2 hive clusters and 
federate queries over JDBC between tables in the 2 clusters.
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   ### Is the change a dependency upgrade?
   NO
   
   ### How was this patch tested?
   Manually
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to