[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842441#comment-17842441 ]
Wenzhe Zhou edited comment on IMPALA-12754 at 5/1/24 6:11 PM: -------------------------------------------------------------- *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. {code:java} CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data type for an external JDBC table can be: * Numeric data type: boolean, tinyint, smallint, int, bigint, float, double * Decimal with scale and precision * String type: string * Date * Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: * database.type: IMPALA, MYSQL, POSTGRES * jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. * jdbc.driver: class name of jdbc driver * driver.url: driver URL for downloading the Jar file package that is used to access the external database. * table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: * jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. * dbcp.username: jdbc user name * dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. * dbcp.password.key: key of the keystore * dbcp.password.keystore: keystore URI. * jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. * jdbc.fetch.size: number of rows to fetch in a batch * column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. *Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: {code:java} hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 {code} Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table property to map column names between Impala JDBC table and the external table. *Add/Drop/Change Columns* User can add/drop/change columns for existing external JDBC table with alter table statement, same syntax as other Impala table: {code:java} ALTER TABLE jdbc_test ADD COLUMN IF NOT EXISTS date_col DATE; ALTER TABLE jdbc_test DROP COLUMN int_col; ALTER TABLE jdbc_test CHANGE COLUMN date_col timestamp_col TIMESTAMP; {code} *Edit table properties* User can also alter table properties of existing external JDBC table using alter table statement, just like other Impala table: {code:java} ALTER TABLE alltypes_jdbc SET TBLPROPERTIES ("dbcp.username"="impala", "dbcp.password"="cloudera"); {code} *Query options for external JDBC table* A new query-option 'clean_dbcp_ds_cache' is added for saving DBCP SQL DataSource objects in cache for a longer time so that DBCP connection pools could be reused across multiple queries. By default, it’s set as true. When it's set as false, a DBCP SQL DataSource object will not be closed when its reference count equals 0 and will be kept in cache until the SQL DataSource object is idle for more than 5 minutes. was (Author: wzhou): *Impala SQL syntax to create external JDBC table* When creating an external JDBC table, the user needs to specify the minimum information: database type, jdbc url, driver class, driver file location, user name and password for querying database, table name. Here are two samples to create JDBC tables for tables on Postgres server and another Impala cluster respectively. {code:java} CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://10.96.132.138:5432/functional", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="hiveuser", "dbcp.password"="password", "table"="alltypes"); CREATE EXTERNAL TABLE alltypes_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://10.96.132.138:21050/functional", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="hiveuser", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="hiveuser", "table"="alltypes"); {code} *Supported data types* The column data type for an external JDBC table can be: * Numeric data type: boolean, tinyint, smallint, int, bigint, float, double * Decimal with scale and precision * String type: string * Date * Timestamp Note that following data types are not supported: char, varchar, and binary. Complex data type: struct, map, array and nested type are not supported. *Table Properties specified for external JDBC table* In the create external JDBC table statement, user is required to specify the following table properties: * database.type: IMPALA, MYSQL, POSTGRES * jdbc.url: jdbc connection string, including the database type, IP address, port number, and database name. For example, “jdbc:impala://10.96.132.138:21050/functional”. * jdbc.driver: class name of jdbc driver * driver.url: driver URL for downloading the Jar file package that is used to access the external database. * table: name of the external table to be mapped in Impala. Besides the above required properties, user can also specify optional parameters to use different authentication methods, or to allow case sensitive columns names in remote tables, or to specify additional database properties, etc: * jdbc.auth: authentication mechanisms of JDBC driver. It's used for Impala-Impala federation. * dbcp.username: jdbc user name * dbcp.password: jdbc password in clear text, this parameter is strongly discouraged in production environments. The recommended way is to store it in a keystore. See section “securing password” for details. * dbcp.password.key: key of the keystore * dbcp.password.keystore: keystore URI. * jdbc.properties: additional properties applied to database engine, like Impala Query options. Properties are specified as comma-delimited key=value string. * jdbc.fetch.size: number of rows to fetch in a batch * column.mappting: Mapping of column names between external table and Impala JDBC table. See section "Support case-sensitive table/column names" for details. *Securing password* To mitigate the password leak, the value of “dbcp.password” table property is masked in the output of commands “SHOW CREATE TABLE table-name” and “DESCRIBE FORMATTED | EXTENDED table-name”. In production deployment, it is strongly discouraged to save the jdbc password in clear text in table property "dbcp.password". Instead, user can store password in a Java keystore file on HDFS by using the command like below to create a keystore file: hadoop credential create host1.password -provider jceks://hdfs/user/test.jceks -v passwd1 Then specify “dbcp.password.key” and “dbcp.password.keystore” instead of “dbcp.password” in the create table statement. *Support case-sensitive table/column names* Column names of remote tables may be different from the JDBC table schema. For example, Postgres allows case-sensitive column names, but Impala always saves column names in lowercase. In this case, the user can set the “column.mapping” table property to map column names between Impala JDBC table and the external table. *Add/Drop/Change Columns* User can add/drop/change columns for existing external JDBC table with alter table statement, same syntax as other Impala table: {code:java} ALTER TABLE jdbc_test ADD COLUMN IF NOT EXISTS date_col DATE; ALTER TABLE jdbc_test DROP COLUMN int_col; ALTER TABLE jdbc_test CHANGE COLUMN date_col timestamp_col TIMESTAMP; {code} *Edit table properties* User can also alter table properties of existing external JDBC table using alter table statement, just like other Impala table: {code:java} ALTER TABLE alltypes_jdbc SET TBLPROPERTIES ("dbcp.username"="impala", "dbcp.password"="cloudera"); {code} *Query options for external JDBC table* A new query-option 'clean_dbcp_ds_cache' is added for saving DBCP SQL DataSource objects in cache for a longer time so that DBCP connection pools could be reused across multiple queries. By default, it’s set as true. When it's set as false, a DBCP SQL DataSource object will not be closed when its reference count equals 0 and will be kept in cache until the SQL DataSource object is idle for more than 5 minutes. > Update Impala document to cover external jdbc table > --------------------------------------------------- > > Key: IMPALA-12754 > URL: https://issues.apache.org/jira/browse/IMPALA-12754 > Project: IMPALA > Issue Type: Sub-task > Components: Docs > Reporter: Wenzhe Zhou > Assignee: gaurav singh > Priority: Major > > We need to document the SQL syntax to create external JDBC table and alter > external JDBC table, including the table properties to be set for JDBC and > DBCP (Database Connection Pool). > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org