[jira] [Commented] (IMPALA-7131) Support external data sources in local catalog mode

2023-10-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781114#comment-17781114
 ] 

ASF subversion and git services commented on IMPALA-7131:
-

Commit c77a4575201b23bc44f7c3560a34e3acf2f44b2d in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c77a45752 ]

IMPALA-7131: Support external data sources in LocalCatalog mode

This patch makes external data source working in LocalCatalog mode:
 - Add APIs in CatalogdMetaProvider to fetch DataSource from Catalog
   server through RPC.
 - Add getDataSources() and getDataSource() in LocalCatalog.
 - Add LocalDataSourceTable class for loading DataSource table in
   LocalCatalog.
 - Handle request for loading DataSource in CatalogServiceCatalog on
   Catalog server.
 - Enable tests which are skipped by
   SkipIfCatalogV2.data_sources_unsupported().
   Remove SkipIfCatalogV2.data_sources_unsupported().
 - Add end-to-end tests for LocalCatalog mode.

Testing:
 - Passed core tests

Change-Id: I40841c9be9064ac67771c4d3f5acbb3b552a2e55
Reviewed-on: http://gerrit.cloudera.org:8080/20574
Tested-by: Impala Public Jenkins 
Reviewed-by: Wenzhe Zhou 


> Support external data sources in local catalog mode
> ---
>
> Key: IMPALA-7131
> URL: https://issues.apache.org/jira/browse/IMPALA-7131
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Todd Lipcon
>Assignee: Wenzhe Zhou
>Priority: Minor
>  Labels: catalog-v2
>
> Currently it seems that external data sources are not persisted except in 
> memory on the catalogd. This means that it will be somewhat more difficult to 
> support this feature in the design of impalad without a catalogd.
> This JIRA is to eventually figure out a way to support this feature -- either 
> by supporting in-memory on a per-impalad basis, or perhaps by figuring out a 
> way to register them persistently in a file system directory, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7131) Support external data sources in local catalog mode

2023-10-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773839#comment-17773839
 ] 

ASF subversion and git services commented on IMPALA-7131:
-

Commit c2bd30a1b3b49ccc7770ec5ab4adeb0b75f40240 in impala's branch 
refs/heads/master from Fucun Chu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c2bd30a1b ]

IMPALA-5741: Initial support for reading tiny RDBMS tables

This patch uses the "external data source" mechanism in Impala to
implement data source for querying JDBC.
It has some limitations due to the restrictions of "external data
source":
  - It is not distributed, e.g, fragment is unpartitioned. The queries
are executed on coordinator.
  - Queries which read following data types from external JDBC tables
are not supported:
BINARY, CHAR, DATETIME, and COMPLEX.
  - Only support binary predicates with operators =, !=, <=, >=,
<, > to be pushed to RDBMS.
  - Following data types are not supported for predicates:
DECIMAL, TIMESTAMP, DATE, and BINARY.
  - External tables with complex types of columns are not supported.
  - Support is limited to the following databases:
MySQL, Postgres, Oracle, MSSQL, H2, DB2, and JETHRO_DATA.
  - Catalog V2 is not supported (IMPALA-7131).
  - DataSource objects are not persistent (IMPALA-12375).

Additional fixes are planned on top of this patch.

Source files under jdbc/conf, jdbc/dao and jdbc/exception are
replicated from Hive JDBC Storage Handler.

In order to query the RDBMS tables, the following steps should be
followed (note that existing data source table will be rebuilt):
1. Make sure the Impala cluster has been started.

2. Copy the jar files of JDBC drivers and the data source library into
HDFS.
${IMPALA_HOME}/testdata/bin/copy-ext-data-sources.sh

3. Create an `alltypes` table in the Postgres database.
${IMPALA_HOME}/testdata/bin/load-ext-data-sources.sh

4. Create data source tables (alltypes_jdbc_datasource and
alltypes_jdbc_datasource_2).
${IMPALA_HOME}/bin/impala-shell.sh -f\
  ${IMPALA_HOME}/testdata/bin/create-ext-data-source-table.sql

5. It's ready to run query to access data source tables created
in last step. Don't need to restart Impala cluster.

Testing:
 - Added unit-test for Postgres and ran unit-test with JDBC driver
   postgresql-42.5.1.jar.
 - Ran manual unit-test for MySql with JDBC driver
   mysql-connector-j-8.1.0.jar.
 - Ran core tests successfully.

Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Reviewed-on: http://gerrit.cloudera.org:8080/17842
Reviewed-by: Wenzhe Zhou 
Reviewed-by: Kurt Deschler 
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Support external data sources in local catalog mode
> ---
>
> Key: IMPALA-7131
> URL: https://issues.apache.org/jira/browse/IMPALA-7131
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Todd Lipcon
>Assignee: Wenzhe Zhou
>Priority: Minor
>  Labels: catalog-v2
>
> Currently it seems that external data sources are not persisted except in 
> memory on the catalogd. This means that it will be somewhat more difficult to 
> support this feature in the design of impalad without a catalogd.
> This JIRA is to eventually figure out a way to support this feature -- either 
> by supporting in-memory on a per-impalad basis, or perhaps by figuring out a 
> way to register them persistently in a file system directory, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7131) Support external data sources in local catalog mode

2023-08-17 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755730#comment-17755730
 ] 

Wenzhe Zhou commented on IMPALA-7131:
-

getDataSource() and getDataSources() are defined in FeCatalog interface, but 
they are not implemented by LocalCatalog class, and the cache of data source 
objects is not defined in MetaProvider class. 
addDataSource() and removeDataSource() are not defined in FeCatalog interface.

> Support external data sources in local catalog mode
> ---
>
> Key: IMPALA-7131
> URL: https://issues.apache.org/jira/browse/IMPALA-7131
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Todd Lipcon
>Assignee: Wenzhe Zhou
>Priority: Minor
>  Labels: catalog-v2
>
> Currently it seems that external data sources are not persisted except in 
> memory on the catalogd. This means that it will be somewhat more difficult to 
> support this feature in the design of impalad without a catalogd.
> This JIRA is to eventually figure out a way to support this feature -- either 
> by supporting in-memory on a per-impalad basis, or perhaps by figuring out a 
> way to register them persistently in a file system directory, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7131) Support external data sources in local catalog mode

2023-08-04 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751119#comment-17751119
 ] 

Quanlong Huang commented on IMPALA-7131:


Modified the title to use "local catalog mode".

> Support external data sources in local catalog mode
> ---
>
> Key: IMPALA-7131
> URL: https://issues.apache.org/jira/browse/IMPALA-7131
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog, Frontend
>Reporter: Todd Lipcon
>Priority: Minor
>  Labels: catalog-v2
>
> Currently it seems that external data sources are not persisted except in 
> memory on the catalogd. This means that it will be somewhat more difficult to 
> support this feature in the design of impalad without a catalogd.
> This JIRA is to eventually figure out a way to support this feature -- either 
> by supporting in-memory on a per-impalad basis, or perhaps by figuring out a 
> way to register them persistently in a file system directory, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org