[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833010#comment-17833010 ] Maxwell Guo commented on IMPALA-12771: -- Hi [~hemanth619], thanks for your review comments. I have responded to them and updated the code. Looking forward to your reply. :) > Impala catalogd events-skipped may mark the wrong number > > > Key: IMPALA-12771 > URL: https://issues.apache.org/jira/browse/IMPALA-12771 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > > See the description of the [events-skipped metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]: > > {code:java} > // total number of events which are skipped because of the flag setting or > // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored > // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd. > {code} > > For CREATE and DROP events on a database/table/partition (AddPartition included), when the database or table is not found in the cache, we skip processing the event and increment the events-skipped metric by 1. > However, there are some inconsistencies for ALTER TABLE and Reload events: > * Reload events are not covered by the events-skipped description, but the metric is still incremented by 1 when the event is an old event; > * Besides, if the table is blacklisted, the metric is also incremented by 1. > In summary, I think this description is inconsistent with the actual implementation. > Can we also mark the events-skipped metric for ALTER PARTITION events, and broaden the description to cover all skipped events? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
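A minimal sketch (hypothetical names, not Impala's actual MetastoreEventsProcessor code) of the accounting the reporter is asking for: a single counter incremented on every skipped event, whatever the reason, so the metric's behavior matches its description.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count every skipped event under one metric.
class EventsSkippedSketch {
  private final AtomicLong eventsSkipped = new AtomicLong();

  // Called whenever an event is not applied: CREATE/DROP on an object that
  // is already present/absent, blacklisted tables, stale Reload events,
  // skipped ALTER PARTITION events, and so on.
  void markEventSkipped() {
    eventsSkipped.incrementAndGet();
  }

  long getEventsSkipped() {
    return eventsSkipped.get();
  }
}
{code}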
[jira] [Commented] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it
[ https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832984#comment-17832984 ] halim kim commented on IMPALA-12291: [~fangyurao] Thank you for letting me know. I will check it out. > Insert statement fails even if hdfs ranger policy allows it > --- > > Key: IMPALA-12291 > URL: https://issues.apache.org/jira/browse/IMPALA-12291 > Project: IMPALA > Issue Type: Bug > Components: fe, Security > Environment: - Impala Version (4.1.0) > - Ranger admin version (2.0) > - Hive version (3.1.2) >Reporter: halim kim >Assignee: halim kim >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Ranger is a framework for providing security and authorization in the hadoop > platform. > Impala can also utilize Apache Ranger via the Ranger Hive policy. > The problem is that INSERT and some other queries are not executed even if you > enable the Ranger HDFS plugin and set a proper allow condition for Impala query > execution. > You can see an error log like the one below. > {code:java} > AnalysisException: Unable to INSERT into target table (testdb.testtable) > because Impala does not have WRITE access to HDFS location: > hdfs://testcluster/warehouse/testdb.db/testtable > {code} > This happens when the Ranger HDFS plugin is enabled but Impala does not have the > required HDFS POSIX permissions. > For example, when the DB file owner, group, and permissions are set to > hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and Impala) allow > Impala to execute the query, an INSERT query will fail. > In my opinion, the main cause is that the Impala FE component checks HDFS POSIX > permissions rather than the Ranger policy. > Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272 > I'm working on resolving this issue by adding HDFS Ranger policy checking > code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it
[ https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12291. -- Resolution: Duplicate This seems to be a duplicate of IMPALA-11871. We could probably continue our discussion there. I will also review the patch at https://gerrit.cloudera.org/c/20221/ and see how we could proceed. cc: [~khr9603], [~stigahuang], [~amansinha] > Insert statement fails even if hdfs ranger policy allows it > --- > > Key: IMPALA-12291 > URL: https://issues.apache.org/jira/browse/IMPALA-12291 > Project: IMPALA > Issue Type: Bug > Components: fe, Security > Environment: - Impala Version (4.1.0) > - Ranger admin version (2.0) > - Hive version (3.1.2) >Reporter: halim kim >Assignee: halim kim >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Ranger is a framework for providing security and authorization in the hadoop > platform. > Impala can also utilize Apache Ranger via the Ranger Hive policy. > The problem is that INSERT and some other queries are not executed even if you > enable the Ranger HDFS plugin and set a proper allow condition for Impala query > execution. > You can see an error log like the one below. > {code:java} > AnalysisException: Unable to INSERT into target table (testdb.testtable) > because Impala does not have WRITE access to HDFS location: > hdfs://testcluster/warehouse/testdb.db/testtable > {code} > This happens when the Ranger HDFS plugin is enabled but Impala does not have the > required HDFS POSIX permissions. > For example, when the DB file owner, group, and permissions are set to > hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and Impala) allow > Impala to execute the query, an INSERT query will fail. > In my opinion, the main cause is that the Impala FE component checks HDFS POSIX > permissions rather than the Ranger policy. > Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272 > I'm working on resolving this issue by adding HDFS Ranger policy checking > code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12873) Support password protected keystore
[ https://issues.apache.org/jira/browse/IMPALA-12873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832969#comment-17832969 ] Wenzhe Zhou commented on IMPALA-12873: -- Did not find documentation or samples showing how to use a password to protect jceks files. Checked the source code of the Hive JDBC storage handler; it does not use a password to protect jceks files. It's likely we don't need to do anything. > Support password protected keystore > --- > > Key: IMPALA-12873 > URL: https://issues.apache.org/jira/browse/IMPALA-12873 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > IMPALA-12380 allows users to store the JDBC password in a Java keystore file on > HDFS. > Keystores are generally password protected, so a user needs a password to access the > keystore. (See > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords). > Per the Credential Provider API link, if the keystore has a password, it can be > accessed when the password is provided using either the environment variable > "HADOOP_CREDSTORE_PASSWORD" or a file containing the password, configured in > core-site.xml with the key > hadoop.security.credstore.java-keystore-provider.password-file (See > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ProviderUtils.java#L214) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
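A minimal sketch of the mechanism the description refers to. Configuration.getPassword() is the standard Hadoop Credential Provider API entry point; it transparently honors HADOOP_CREDSTORE_PASSWORD or the configured password file when the keystore itself is password protected. The keystore path and alias below are hypothetical.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CredstoreLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical JCEKS keystore location on HDFS.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://hdfs/user/impala/jdbc.jceks");
    // Resolves the alias from the provider. If the keystore is password
    // protected, Hadoop reads the keystore password from the
    // HADOOP_CREDSTORE_PASSWORD environment variable or from the file named
    // by hadoop.security.credstore.java-keystore-provider.password-file.
    char[] password = conf.getPassword("jdbc.password.alias");
    System.out.println(password == null ? "alias not found"
        : "resolved credential of length " + password.length);
  }
}
{code}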
[jira] [Closed] (IMPALA-12722) Add test cases for MySQL and Postgres to set additional properties with jdbc.properties
[ https://issues.apache.org/jira/browse/IMPALA-12722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou closed IMPALA-12722. Resolution: Won't Do Did not find a way to verify whether the settings take effect in Postgres and MySQL. > Add test cases for MySQL and Postgres to set additional properties with > jdbc.properties > --- > > Key: IMPALA-12722 > URL: https://issues.apache.org/jira/browse/IMPALA-12722 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Affects Versions: Impala 4.4.0 >Reporter: Wenzhe Zhou >Assignee: gaurav singh >Priority: Major > > IMPALA-12642 added support for query options on Impala external JDBC tables. > It uses the JDBC connection string to apply query options to the Impala server by > setting the properties in "jdbc.properties" when creating a JDBC external > DataSource table. > jdbc.properties can also be used for other databases like Postgres and MySQL > to set additional properties. We need to add test cases for Postgres and > MySQL to verify whether the settings take effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
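A hedged sketch of the mechanism under test: entries from a table's "jdbc.properties" end up as driver connection properties. The URL, credentials, and the connectTimeout property below are illustrative assumptions, not values from the ticket.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class JdbcPropsSketch {
  public static void main(String[] args) throws Exception {
    // Stand-ins for entries a table's "jdbc.properties" might carry.
    Properties props = new Properties();
    props.setProperty("user", "testuser");       // assumed credential
    props.setProperty("password", "testpass");   // assumed credential
    props.setProperty("connectTimeout", "10");   // example extra property
    // The driver receives the extra properties along with the credentials.
    try (Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/testdb", props)) {
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
{code}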
[jira] [Assigned] (IMPALA-12909) Generate distributed plan for query accessing multiple JDBC tables
[ https://issues.apache.org/jira/browse/IMPALA-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12909: Assignee: Pranav Yogi Lodha > Generate distributed plan for query accessing multiple JDBC tables > -- > > Key: IMPALA-12909 > URL: https://issues.apache.org/jira/browse/IMPALA-12909 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > For a query which accesses multiple JDBC tables, the planner generates a single-node > plan. It would be better to generate a distributed plan so that Impala could open > multiple JDBC connections in parallel. This restriction is due to the current > design of the external data source framework, where the scan is single threaded and > a DataSourceScanNode cannot run on a node other than the coordinator. > There is no issue for a query joining a JDBC table with a non-JDBC table; > the issue arises only when all scans are JDBC table scans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12583) Support reading hive "information_schema" views in Impala
[ https://issues.apache.org/jira/browse/IMPALA-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12583: Assignee: Pranav Yogi Lodha (was: Wenzhe Zhou) > Support reading hive "information_schema" views in Impala > - > > Key: IMPALA-12583 > URL: https://issues.apache.org/jira/browse/IMPALA-12583 > Project: IMPALA > Issue Type: Sub-task >Reporter: Manish Maheshwari >Assignee: Pranav Yogi Lodha >Priority: Major > Attachments: image-2023-11-30-02-24-18-869.png, information_schema.txt > > > Hive supports an "information_schema" db whose views are all JDBC tables exposed from the > HMS database. The same JDBC source tables should be queryable in Impala too. > > !image-2023-11-30-02-24-18-869.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12873) Support password protected keystore
[ https://issues.apache.org/jira/browse/IMPALA-12873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12873: Assignee: Pranav Yogi Lodha > Support password protected keystore > --- > > Key: IMPALA-12873 > URL: https://issues.apache.org/jira/browse/IMPALA-12873 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > IMPALA-12380 allows users to store the JDBC password in a Java keystore file on > HDFS. > Keystores are generally password protected, so a user needs a password to access the > keystore. (See > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords). > Per the Credential Provider API link, if the keystore has a password, it can be > accessed when the password is provided using either the environment variable > "HADOOP_CREDSTORE_PASSWORD" or a file containing the password, configured in > core-site.xml with the key > hadoop.security.credstore.java-keystore-provider.password-file (See > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ProviderUtils.java#L214) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12789) Fix unit-test code JdbcDataSourceTest.java
[ https://issues.apache.org/jira/browse/IMPALA-12789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12789: Assignee: Pranav Yogi Lodha (was: Wenzhe Zhou) > Fix unit-test code JdbcDataSourceTest.java > -- > > Key: IMPALA-12789 > URL: https://issues.apache.org/jira/browse/IMPALA-12789 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > This JDBC unit test > (java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java) > was implemented against the H2 database. We don't have H2 in our environment, and > the code is out of date. We need to rewrite this unit test against Postgres. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12426) SQL Interface to Completed Queries/DDLs/DMLs
[ https://issues.apache.org/jira/browse/IMPALA-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12426. Fix Version/s: Impala 4.4.0 Resolution: Fixed > SQL Interface to Completed Queries/DDLs/DMLs > > > Key: IMPALA-12426 > URL: https://issues.apache.org/jira/browse/IMPALA-12426 > Project: IMPALA > Issue Type: New Feature > Components: Backend, be >Reporter: Jason Fehr >Assignee: Jason Fehr >Priority: Major > Labels: impala, workload-management > Fix For: Impala 4.4.0 > > > Implement a way of querying (via SQL) information about completed > queries/DDLs/DMLs. Adds coordinator startup flags for users to specify that > Impala will track completed queries in an internal table. > Impala will create and maintain an internal Iceberg table named > "impala_query_log" in the "system database" that contains all completed > queries. This table is automatically created at startup by each coordinator > if it does not exist. Then, each completed query is queued in memory and > flushed to the query history table either at a set interval (user-specified > number of minutes) or when a user-specified number of completed queries are > queued in memory. Partition this table by the hour of the query end time. > Data in this table must match the corresponding data in the query profile. > Develop automated testing that asserts this requirement is true. > Don't write USE, SHOW, or SET queries to this table. > Add the following metrics to the "impala-server" metrics group: > * Number of completed queries queued in memory waiting to be written to the > table. > * Number of completed queries successfully written to the table. > * Number of attempts that failed to write completed queries to the table. > * Number of times completed queries were written at the regularly scheduled > time. > * Number of times completed queries were written before the scheduled time > because the max number of queued records was reached. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
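A minimal sketch (assumed names, not Impala's implementation) of the two flush triggers described above: completed queries queue in memory and are written out either on a fixed schedule or early, once the configured maximum number of queued records is reached.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CompletedQueryBuffer {
  private final Queue<String> queued = new ArrayDeque<>();
  private final int maxQueued;

  CompletedQueryBuffer(int maxQueued, long intervalMinutes) {
    this.maxQueued = maxQueued;
    ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    // Trigger 1: the regularly scheduled flush.
    timer.scheduleAtFixedRate(this::flush, intervalMinutes, intervalMinutes,
        TimeUnit.MINUTES);
  }

  synchronized void add(String completedQueryRecord) {
    queued.add(completedQueryRecord);
    // Trigger 2: early flush when the max number of queued records is hit.
    if (queued.size() >= maxQueued) flush();
  }

  synchronized void flush() {
    if (queued.isEmpty()) return;
    List<String> batch = new ArrayList<>(queued);
    queued.clear();
    // Stand-in for the INSERT into the impala_query_log Iceberg table.
    System.out.println("flushing " + batch.size() + " completed queries");
  }
}
{code}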
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832957#comment-17832957 ] Fang-Yu Rao commented on IMPALA-11871: -- After reading some past JIRAs in this area, I think it should be safe to skip {*}analyzeWriteAccess{*}() for the *INSERT* statement (or add a startup flag to disable it). Before the fix is ready, we could add the following to the *core-site.xml* consumed by the catalog server to allow an authorized user (authorized by Ranger via Impala's frontend) to insert values into an HDFS table in the {*}legacy catalog mode{*}. Recall that the catalog server considers the service user, usually named '{*}impala{*}', a super user as long as the user '{*}impala{*}' belongs to the super group specified by ''. {code:java} dfs.permissions.superusergroup true {code} This is still secure when Ranger is the authorization provider, for the following reasons. # For the INSERT statement, Impala's frontend makes sure the logged-in user (not necessarily the service user '{*}impala{*}') is granted the necessary privilege on the target table. The respective audit log entry is also produced whether or not the query is authorized, even though we skip {*}analyzeWriteAccess{*}(). # For a query that has been authorized by Impala's frontend and sent to the backend for execution, if Impala's backend interacts with the underlying services, e.g., HDFS, as the service user '{*}impala{*}', then this service user should always be considered a super user or a user in a super group. +*Detailed Analysis*+ We started performing such permissions checking in [IMPALA-1279: Check ACLs for INSERT and LOAD statements|https://github.com/cloudera/Impala/commit/0b32bbd899d988f1cd5c526597932b67f4c35cce] when we were using Sentry as the authorization provider. The reason for implementing IMPALA-1279 is given in the description of that JIRA and is excerpted below for easy reference. In short, we would like to fail a query as early as possible if there could be a permissions-related issue. {quote}Impala checks permissions for LOAD and INSERT statements before executing them to allow for early-exit if the query would not succeed. However, it does not take extended ACLs in CDH5 into account. When a directory has restrictive Posix permissions (e.g. 000), but has an ACL allowing writes, Impala should allow INSERTs and LOADs to happen to that directory. Instead, the early check will disallow them. If the checks were disabled, the queries would execute (or not!) correctly, because we delegate to libhdfs or the DistributedFileSystem API to actually perform the operations we need. {quote} We hand-crafted the permissions checker within Impala. Specifically, in our [implementation|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java#L206-L222], Hadoop ACL entries take precedence over the POSIX permissions, and we did *not* take into consideration the policies that could be defined on the HDFS path when the authorization provider is Ranger.
Due to how we implemented [FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java], it's possible that even though a logged-in user has been authorized to execute an INSERT statement against a table via a policy added to Ranger's SQL policy repository, the query could fail during analysis, simply because the service user, usually named '{*}impala{*}', could not pass the permissions checker. For instance, this could occur if the table to insert into was created by another query engine, e.g., Hive Server2 (HS2), and is thus owned by another service user, e.g., '{*}hive{*}'. In addition, we have an ACL entry of "{*}group::r-x{*}" by default when the table is created. The current implementation of Impala's permissions checker would deny the service user '{*}impala{*}' write access to the table even though the user '{*}impala{*}' is in the group '{*}hive{*}', as shown in the following. {code:java}
[r...@ccycloud-4.engesc24485d02.root.comops.site ~]# hdfs dfs -getfacl
# file:
# owner: hive
# group: hive
user::rwx
group::r-x
other::r-x

[r...@ccycloud-4.engesc24485d02.root.comops.site impalad]# groups impala
impala : impala hive
{code} In [IMPALA-3143|https://github.com/apache/impala/commit/a0ad1868bda902fd914bc2be39eb9629a6eceb76], we allowed an administrator to specify the name of the super group (from the catalog server's perspective). Once the *current user* belongs to the super group denoted via '{*}DFS_PERMISSIONS_SUPERUSERGROUP_KEY{*}' ("{*}dfs.permissions.superusergroup{*}"), which defaults to '{*}DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT{*}' ("{*}supergroup{*}"), the catalog server grants the WRITE request against the corresponding table from the current user. Refer t
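A toy illustration (a simplification of my own, not Impala's actual FsPermissionChecker) of the point above: under POSIX-style evaluation, a user in the file's owning group is matched by the group class, so the default "group::r-x" entry denies WRITE to 'impala' even though 'impala' is a member of group 'hive'.

{code:java}
// Toy model: evaluate the group-class ACL entry for a group member.
public class GroupAclToy {
  // entry is like "group::r-x"; action is 'r', 'w', or 'x'.
  static boolean groupClassAllows(String entry, char action) {
    String bits = entry.substring(entry.lastIndexOf(':') + 1);
    int idx = action == 'r' ? 0 : action == 'w' ? 1 : 2;
    return bits.charAt(idx) != '-';
  }

  public static void main(String[] args) {
    // Table dir owned by hive:hive with the default "group::r-x" entry;
    // user 'impala' is in group 'hive', so the group class applies to it.
    System.out.println(groupClassAllows("group::r-x", 'w')); // false: WRITE denied
    System.out.println(groupClassAllows("group::r-x", 'r')); // true: READ allowed
  }
}
{code}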
[jira] [Created] (IMPALA-12965) Add debug query option to skip runtime filter
Riza Suminto created IMPALA-12965: - Summary: Add debug query option to skip runtime filter Key: IMPALA-12965 URL: https://issues.apache.org/jira/browse/IMPALA-12965 Project: IMPALA Issue Type: New Feature Components: Frontend Reporter: Riza Suminto Assignee: Riza Suminto Runtime filters can still have a negative effect in certain scenarios, such as long wait times that delay scans, or a cascading runtime filter chain that prevents parallel execution of fragments. Having a debug query option that simply skips a given runtime filter ID from being scheduled can help us investigate and test a solution like IMPALA-12357 early, before implementing the improvement code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work stopped] (IMPALA-12583) Support reading hive "information_schema" views in Impala
[ https://issues.apache.org/jira/browse/IMPALA-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12583 stopped by Wenzhe Zhou. > Support reading hive "information_schema" views in Impala > - > > Key: IMPALA-12583 > URL: https://issues.apache.org/jira/browse/IMPALA-12583 > Project: IMPALA > Issue Type: Sub-task >Reporter: Manish Maheshwari >Assignee: Wenzhe Zhou >Priority: Major > Attachments: image-2023-11-30-02-24-18-869.png, information_schema.txt > > > Hive supports an "information_schema" db whose views are all JDBC tables exposed from the > HMS database. The same JDBC source tables should be queryable in Impala too. > > !image-2023-11-30-02-24-18-869.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12657) Improve ProcessingCost of ScanNode and NonGroupingAggregator
[ https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto reassigned IMPALA-12657: - Assignee: David Rorke (was: Riza Suminto) > Improve ProcessingCost of ScanNode and NonGroupingAggregator > > > Key: IMPALA-12657 > URL: https://issues.apache.org/jira/browse/IMPALA-12657 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Riza Suminto >Assignee: David Rorke >Priority: Major > Fix For: Impala 4.4.0 > > Attachments: profile_1f4d7a679a3e12d5_42231157.txt > > > Several benchmark runs measuring Impala scan performance indicate costing > improvement opportunities around ScanNode and NonGroupingAggregator. > [^profile_1f4d7a679a3e12d5_42231157.txt] shows an example of a simple > count query. > Key takeaways: > # There is a strong correlation between total materialized bytes (row size * > cardinality) and total materialized tuple time per fragment. Row > materialization cost should be adjusted to be based on this byte count instead > of an equal cost per scan range. > # NonGroupingAggregator should have a much lower cost than GroupingAggregator. > In the example above, the cost of NonGroupingAggregator dominates the scan > fragment even though it only does simple counting instead of hash table > operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
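A hedged sketch of takeaway #1 above: cost scan materialization by total materialized bytes (row size times cardinality) rather than a flat cost per scan range. The method name and the costPerByte constant are assumptions for illustration, not Impala's ProcessingCost code.

{code:java}
public class ScanCostSketch {
  // Hypothetical: cost proportional to total materialized bytes.
  static double materializationCost(double rowSizeBytes, double cardinality,
      double costPerByte) {
    return rowSizeBytes * cardinality * costPerByte;
  }

  public static void main(String[] args) {
    // Two scans with the same number of scan ranges but different row sizes
    // now receive different costs, matching the observed correlation.
    System.out.println(materializationCost(8.0, 1e9, 1e-9));   // narrow rows
    System.out.println(materializationCost(512.0, 1e9, 1e-9)); // wide rows
  }
}
{code}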
[jira] [Created] (IMPALA-12964) Implement aggregation capability
Steve Carlin created IMPALA-12964: - Summary: Implement aggregation capability Key: IMPALA-12964 URL: https://issues.apache.org/jira/browse/IMPALA-12964 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
[ https://issues.apache.org/jira/browse/IMPALA-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832895#comment-17832895 ] Yida Wu commented on IMPALA-12963: -- Hi [~jasonmfehr], assigning this jira to you because the testcase was added in a recent task IMPALA-12426, and you might be familiar with it. > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds > --- > > Key: IMPALA-12963 > URL: https://issues.apache.org/jira/browse/IMPALA-12963 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Yida Wu >Assignee: Jason Fehr >Priority: Major > > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with > following messages: > *Error Message* > {code:java} > test setup failure > {code} > *Stacktrace* > {code:java} > common/custom_cluster_test_suite.py:226: in teardown_method > impalad.wait_for_exit() > common/impala_cluster.py:471: in wait_for_exit > while self.__get_pid() is not None: > common/impala_cluster.py:414: in __get_pid > assert len(pids) < 2, "Expected single pid but found %s" % ", > ".join(map(str, pids)) > E AssertionError: Expected single pid but found 892, 31942 > {code} > *Standard Error* > {code:java} > -- 2024-03-28 04:21:44,105 INFO MainThread: Starting cluster with > command: > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 > --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests > --log_level=1 '--impalad_args=--enable_workload_mgmt > --query_log_write_interval_s=1 --cluster_id=test_max_select > --shutdown_grace_period_s=10 --shutdown_deadline_s=60 > --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' > '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' > --impalad_args=--default_query_options= > 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) > 04:21:44 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 04:21:44 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:47 MainThread: Waiting for num_known_live_backends=3. 
Current value: 0 > 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:48 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:49 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:49 MainThread: Waiting for num_known_live_backends=3. Current value: 2 > 04:21:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:50 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:50 MainThread: num_known_live_backends has reached value: 3 > 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:51 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25001 > 04:21:51 MainThread: num_known_live_backends has reached value: 3 > 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:51 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25002 > 04:21:51 MainThread: num_k
[jira] [Created] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
Yida Wu created IMPALA-12963: Summary: Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds Key: IMPALA-12963 URL: https://issues.apache.org/jira/browse/IMPALA-12963 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Yida Wu Assignee: Jason Fehr Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with following messages: *Error Message* {code:java} test setup failure {code} *Stacktrace* {code:java} common/custom_cluster_test_suite.py:226: in teardown_method impalad.wait_for_exit() common/impala_cluster.py:471: in wait_for_exit while self.__get_pid() is not None: common/impala_cluster.py:414: in __get_pid assert len(pids) < 2, "Expected single pid but found %s" % ", ".join(map(str, pids)) E AssertionError: Expected single pid but found 892, 31942 {code} *Standard Error* {code:java} -- 2024-03-28 04:21:44,105 INFO MainThread: Starting cluster with command: /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests --log_level=1 '--impalad_args=--enable_workload_mgmt --query_log_write_interval_s=1 --cluster_id=test_max_select --shutdown_grace_period_s=10 --shutdown_deadline_s=60 --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' --impalad_args=--default_query_options= 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) 04:21:44 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO 04:21:44 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO 04:21:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO 04:21:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO 04:21:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:47 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:47 MainThread: Waiting for num_known_live_backends=3. Current value: 0 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:49 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:49 MainThread: Waiting for num_known_live_backends=3. 
Current value: 2 04:21:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:50 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:50 MainThread: num_known_live_backends has reached value: 3 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:51 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25001 04:21:51 MainThread: num_known_live_backends has reached value: 3 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:51 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25002 04:21:51 MainThread: num_known_live_backends has reached value: 3 04:21:52 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors). -- 2024-03-28 04:21:52,490 DEBUGMainThread: Found 3 impalad/1 statestored/1 catalogd process(es) -- 2024-03-28 04:21:52,490 INFO MainThread: Getting metric: statestore.live-backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25010 -- 2024-03-28 04:21:52,492 INFO MainThread: Metric 'statestore.live-backends' has reached desired value: 4 -- 2024-03-28 04:21:52,493 DEBUGMainThread: G
[jira] [Created] (IMPALA-12962) Estimated metadata size of a table doesn't match the actual java object size
Quanlong Huang created IMPALA-12962: --- Summary: Estimated metadata size of a table doesn't match the actual java object size Key: IMPALA-12962 URL: https://issues.apache.org/jira/browse/IMPALA-12962 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Quanlong Huang Catalogd shows the top-25 largest tables in its WebUI at the "/catalog" endpoint. The estimated metadata size is computed in HdfsTable#getTHdfsTable(): [https://github.com/apache/impala/blob/0d49c9d6cc7fc0903d60a78d8aaa996af0249c06/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2414-L2451] The current formula is:
* memUsageEstimate = numPartitions * 2KB + numFiles * 500B + numBlocks * 150B + (optional) incrementalStats
* (optional) incrementalStats = numPartitions * numColumns * 200B

It's OK to use this formula to compare tables, but it can't be used to estimate the max heap size of catalogd. E.g. it doesn't consider column comments and tblproperties, which could contain long strings. Column names should also be considered in case the table is a wide table. We can compare the estimated sizes with results from ehcache-sizeof or jamm and update the formula, or use these libraries to estimate the sizes directly if they don't impact performance. CC [~MikaelSmith] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
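A direct transcription of the estimate quoted above into runnable form, useful for sanity-checking it against a sizing library such as jamm; the method name is mine, and 2KB is taken as 2048 bytes.

{code:java}
public class MetadataSizeEstimate {
  // Transcribes: numPartitions*2KB + numFiles*500B + numBlocks*150B
  //              + (optional) numPartitions*numColumns*200B
  static long memUsageEstimate(long numPartitions, long numFiles,
      long numBlocks, long numColumns, boolean hasIncrementalStats) {
    long estimate = numPartitions * 2048L + numFiles * 500L + numBlocks * 150L;
    if (hasIncrementalStats) {
      estimate += numPartitions * numColumns * 200L;
    }
    return estimate;
  }

  public static void main(String[] args) {
    // Example: 1000 partitions, 10k files, 20k blocks, 50 columns, with stats.
    System.out.println(memUsageEstimate(1000, 10_000, 20_000, 50, true));
  }
}
{code}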