[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842441#comment-17842441
 ] 

Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:00 PM:
--------------------------------------------------------------

*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
database.type: IMPALA, MYSQL, POSTGRES
jdbc.url: jdbc connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of jdbc driver
driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: jdbc user name
dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
dbcp.password.key:  key of the keystore
dbcp.password.keystore: keystore URI.
jdbc.properties: additional properties applied to database engine, like Impala 
Query options. Properties are specified as comma-delimited key=value string. 
jdbc.fetch.size: number of rows to fetch in a batch
column.mappting: Mapping of column names between external table and Impala JDBC 
table. See section "Support case-sensitive table/column names" for details. 

*Securing password*
To mitigate the password leak, the value of “dbcp.password” table property is 
masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE 
FORMATTED | EXTENDED table-name”.
In production deployment, it is strongly discouraged to save the jdbc password 
in clear text in table property "dbcp.password". Instead, user can store 
password in a Java keystore file on HDFS by using the command like below to 
create a keystore file:
      hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table property to map column names between Impala JDBC table and the external 
table.

*Add/Drop/Change Columns*

User can add/drop/change columns for existing external JDBC table with alter 
table statement, same syntax as other Impala table:
    ALTER TABLE jdbc_test ADD COLUMN IF NOT EXISTS date_col DATE;  
    ALTER TABLE jdbc_test DROP COLUMN int_col;
    ALTER TABLE jdbc_test CHANGE COLUMN date_col timestamp_col TIMESTAMP;

*Edit table properties*

User can also alter table properties of existing external JDBC table using 
alter table statement, just like other Impala table:    
{code:java}
    ALTER TABLE alltypes_jdbc
    SET TBLPROPERTIES ("dbcp.username"="impala", "dbcp.password"="cloudera");
{code}



*Query options for external JDBC table*

A new query-option 'clean_dbcp_ds_cache' is added for saving DBCP SQL 
DataSource objects in cache for a longer time so that DBCP connection pools 
could be reused across multiple queries. By default, it’s set as true. When 
it's set as false, a DBCP SQL DataSource object will not be closed when its 
reference count equals 0 and will be kept in cache until the SQL DataSource 
object is idle for more than 5 minutes.


was (Author: wzhou):
*Impala SQL syntax to create external JDBC table*

When creating an external JDBC table, the user needs to specify the minimum 
information: database type, jdbc url, driver class, driver file location, user 
name and password for querying database, table name. Here are two samples to 
create JDBC tables for tables on Postgres server and another Impala cluster 
respectively.

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="hiveuser",
  "dbcp.password"="password",
  "table"="alltypes");

*CREATE EXTERNAL TABLE* alltypes_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
*STORED BY JDBC*
*TBLPROPERTIES* (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="hiveuser",
  
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="hiveuser",
  "table"="alltypes");

*Supported data types*
The column data type for an external JDBC table can be:
Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
Decimal with scale and precision
String type: string
Date
Timestamp

Note that following data types are not supported: char, varchar, and binary. 
Complex data type: struct, map, array and nested type are not supported.

*Table Properties specified for external JDBC table*
In the create external JDBC table statement, user is required to specify the 
following table properties:
database.type: IMPALA, MYSQL, POSTGRES
jdbc.url: jdbc connection string, including the database type, IP address, port 
number, and database name. For example, 
“jdbc:impala://10.96.132.138:21050/functional”.
jdbc.driver: class name of jdbc driver
driver.url: driver URL for downloading the Jar file package that is used to 
access the external database.
table: name of the external table to be mapped in Impala.

Besides the above required properties, user can also specify optional 
parameters to use different authentication methods, or to allow case sensitive 
columns names in remote tables, or to specify additional database properties, 
etc:
jdbc.auth: authentication mechanisms of JDBC driver. It's used for 
Impala-Impala federation.
dbcp.username: jdbc user name
dbcp.password: jdbc password in clear text, this parameter is strongly 
discouraged in production environments. The recommended way is to store it in a 
keystore. See section  “securing password” for details.
dbcp.password.key:  key of the keystore
dbcp.password.keystore: keystore URI.
jdbc.properties: additional properties applied to database engine, like Impala 
Query options. Properties are specified as comma-delimited key=value string. 
jdbc.fetch.size: number of rows to fetch in a batch
column.mappting: Mapping of column names between external table and Impala JDBC 
table. See section "Support case-sensitive table/column names" for details. 

*Securing password*
To mitigate the password leak, the value of “dbcp.password” table property is 
masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE 
FORMATTED | EXTENDED table-name”.
In production deployment, it is strongly discouraged to save the jdbc password 
in clear text in table property "dbcp.password". Instead, user can store 
password in a Java keystore file on HDFS by using the command like below to 
create a keystore file:
      hadoop credential create host1.password -provider 
jceks://hdfs/user/test.jceks -v passwd1
Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of 
“dbcp.password” in the create table statement.

*Support case-sensitive table/column names*

Column names of remote tables may be different from the JDBC table schema. For 
example, Postgres allows case-sensitive column names, but Impala always saves 
column names in lowercase. In this case, the user can set the “column.mapping” 
table property to map column names between Impala JDBC table and the external 
table.

*Add/Drop/Change Columns*

User can add/drop/change columns for existing external JDBC table with alter 
table statement, same syntax as other Impala table:
    ALTER TABLE jdbc_test ADD COLUMN IF NOT EXISTS date_col DATE;  
    ALTER TABLE jdbc_test DROP COLUMN int_col;
    ALTER TABLE jdbc_test CHANGE COLUMN date_col timestamp_col TIMESTAMP;

*Edit table properties*

User can also alter table properties of existing external JDBC table using 
alter table statement, just like other Impala table:
    
||Heading 1||Heading 2||
|ALTER TABLE alltypes_jdbc
    SET TBLPROPERTIES ("dbcp.username"="impala", 
"dbcp.password"="cloudera");|Col A2|


*Query options for external JDBC table*

A new query-option 'clean_dbcp_ds_cache' is added for saving DBCP SQL 
DataSource objects in cache for a longer time so that DBCP connection pools 
could be reused across multiple queries. By default, it’s set as true. When 
it's set as false, a DBCP SQL DataSource object will not be closed when its 
reference count equals 0 and will be kept in cache until the SQL DataSource 
object is idle for more than 5 minutes.

> Update Impala document to cover external jdbc table
> ---------------------------------------------------
>
>                 Key: IMPALA-12754
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12754
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Docs
>            Reporter: Wenzhe Zhou
>            Assignee: gaurav singh
>            Priority: Major
>
> We need to document  the SQL syntax to create jdbc table and alter jdbc 
> table, including the table properties to be set for jdbc and DBCP (Database 
> connection pool).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to