[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number
[ https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833010#comment-17833010 ] Maxwell Guo commented on IMPALA-12771: -- Hi [~hemanth619], thanks for your review comments. I have responded to them and updated the code. Looking forward to your reply. :) > Impala catalogd events-skipped may mark the wrong number > > > Key: IMPALA-12771 > URL: https://issues.apache.org/jira/browse/IMPALA-12771 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Minor > > See the description of the [events-skipped metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]: > > {code:java} > // total number of events which are skipped because of the flag setting or > // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored > // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd. > {code} > > For CREATE and DROP events on a database/table/partition (AddPartition included), when the database or table is not found in the cache, we skip processing the event and increment the events-skipped metric by 1. > However, there are some inconsistencies for ALTER TABLE and Reload events: > * Reload events are not covered by the events-skipped description, but the metric is still incremented by 1 when the event is an old event; > * Besides, if the table is blacklisted, the metric is also incremented by 1. > In summary, I think this description is inconsistent with the actual implementation. > Can we also mark the events-skipped metric for ALTER PARTITION events, and broaden the description to cover all skipped events? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
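A minimal sketch (hypothetical names, not Impala's actual MetastoreEventsProcessor code) of the accounting the reporter is asking for: a single counter incremented on every skipped event, whatever the reason, so the metric's behavior matches its description.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count every skipped event under one metric.
class EventsSkippedSketch {
  private final AtomicLong eventsSkipped = new AtomicLong();

  // Called whenever an event is not applied: CREATE/DROP on an object that
  // is already present/absent, blacklisted tables, stale Reload events,
  // skipped ALTER PARTITION events, and so on.
  void markEventSkipped() {
    eventsSkipped.incrementAndGet();
  }

  long getEventsSkipped() {
    return eventsSkipped.get();
  }
}
{code}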
[jira] [Commented] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it
[ https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832984#comment-17832984 ] halim kim commented on IMPALA-12291: [~fangyurao] Thank you for letting me know. I will check it out. > Insert statement fails even if hdfs ranger policy allows it > --- > > Key: IMPALA-12291 > URL: https://issues.apache.org/jira/browse/IMPALA-12291 > Project: IMPALA > Issue Type: Bug > Components: fe, Security > Environment: - Impala Version (4.1.0) > - Ranger admin version (2.0) > - Hive version (3.1.2) >Reporter: halim kim >Assignee: halim kim >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Ranger is a framework for providing security and authorization in the hadoop > platform. > Impala can also utilize Apache Ranger via the Ranger Hive policy. > The problem is that INSERT and some other queries are not executed even if you > enable the Ranger HDFS plugin and set a proper allow condition for Impala query > execution. > You can see an error log like the one below. > {code:java} > AnalysisException: Unable to INSERT into target table (testdb.testtable) > because Impala does not have WRITE access to HDFS location: > hdfs://testcluster/warehouse/testdb.db/testtable > {code} > This happens when the Ranger HDFS plugin is enabled but Impala does not have the > required HDFS POSIX permissions. > For example, when the DB file owner, group, and permissions are set to > hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and Impala) allow > Impala to execute the query, an INSERT query will fail. > In my opinion, the main cause is that the Impala FE component checks HDFS POSIX > permissions rather than the Ranger policy. > Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272 > I'm working on resolving this issue by adding HDFS Ranger policy checking > code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12291) Insert statement fails even if hdfs ranger policy allows it
[ https://issues.apache.org/jira/browse/IMPALA-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-12291. -- Resolution: Duplicate This seems to be a duplicate of IMPALA-11871. We could probably continue our discussion there. I will also review the patch at https://gerrit.cloudera.org/c/20221/ and see how we could proceed. cc: [~khr9603], [~stigahuang], [~amansinha] > Insert statement fails even if hdfs ranger policy allows it > --- > > Key: IMPALA-12291 > URL: https://issues.apache.org/jira/browse/IMPALA-12291 > Project: IMPALA > Issue Type: Bug > Components: fe, Security > Environment: - Impala Version (4.1.0) > - Ranger admin version (2.0) > - Hive version (3.1.2) >Reporter: halim kim >Assignee: halim kim >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Ranger is a framework for providing security and authorization in the hadoop > platform. > Impala can also utilize Apache Ranger via the Ranger Hive policy. > The problem is that INSERT and some other queries are not executed even if you > enable the Ranger HDFS plugin and set a proper allow condition for Impala query > execution. > You can see an error log like the one below. > {code:java} > AnalysisException: Unable to INSERT into target table (testdb.testtable) > because Impala does not have WRITE access to HDFS location: > hdfs://testcluster/warehouse/testdb.db/testtable > {code} > This happens when the Ranger HDFS plugin is enabled but Impala does not have the > required HDFS POSIX permissions. > For example, when the DB file owner, group, and permissions are set to > hdfs:hdfs r-xr-xr-- and the Ranger plugin policies (HDFS, Hive, and Impala) allow > Impala to execute the query, an INSERT query will fail. > In my opinion, the main cause is that the Impala FE component checks HDFS POSIX > permissions rather than the Ranger policy. > Similar issue: https://issues.apache.org/jira/browse/IMPALA-10272 > I'm working on resolving this issue by adding HDFS Ranger policy checking > code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12873) Support password protected keystore
[ https://issues.apache.org/jira/browse/IMPALA-12873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832969#comment-17832969 ] Wenzhe Zhou commented on IMPALA-12873: -- Did not find documentation or samples showing how to use a password to protect jceks files. Checked the source code of the Hive JDBC storage handler; it does not use a password to protect jceks files. It's likely we don't need to do anything. > Support password protected keystore > --- > > Key: IMPALA-12873 > URL: https://issues.apache.org/jira/browse/IMPALA-12873 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > IMPALA-12380 allows users to store the JDBC password in a Java keystore file on > HDFS. > Keystores are generally password protected, so a user needs a password to access the > keystore. (See > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords). > Per the Credential Provider API link, if the keystore has a password, it can be > accessed when the password is provided using either the environment variable > "HADOOP_CREDSTORE_PASSWORD" or a file containing the password, configured in > core-site.xml with the key > hadoop.security.credstore.java-keystore-provider.password-file (See > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ProviderUtils.java#L214) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
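A minimal sketch of the mechanism the description refers to. Configuration.getPassword() is the standard Hadoop Credential Provider API entry point; it transparently honors HADOOP_CREDSTORE_PASSWORD or the configured password file when the keystore itself is password protected. The keystore path and alias below are hypothetical.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CredstoreLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical JCEKS keystore location on HDFS.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://hdfs/user/impala/jdbc.jceks");
    // Resolves the alias from the provider. If the keystore is password
    // protected, Hadoop reads the keystore password from the
    // HADOOP_CREDSTORE_PASSWORD environment variable or from the file named
    // by hadoop.security.credstore.java-keystore-provider.password-file.
    char[] password = conf.getPassword("jdbc.password.alias");
    System.out.println(password == null ? "alias not found"
        : "resolved credential of length " + password.length);
  }
}
{code}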
[jira] [Closed] (IMPALA-12722) Add test cases for MySQL and Postgres to set additional properties with jdbc.properties
[ https://issues.apache.org/jira/browse/IMPALA-12722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou closed IMPALA-12722. Resolution: Won't Do Did not find a way to verify whether the settings take effect in Postgres and MySQL. > Add test cases for MySQL and Postgres to set additional properties with > jdbc.properties > --- > > Key: IMPALA-12722 > URL: https://issues.apache.org/jira/browse/IMPALA-12722 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Affects Versions: Impala 4.4.0 >Reporter: Wenzhe Zhou >Assignee: gaurav singh >Priority: Major > > IMPALA-12642 added support for query options on Impala external JDBC tables. > It uses the JDBC connection string to apply query options to the Impala server by > setting the properties in "jdbc.properties" when creating a JDBC external > DataSource table. > jdbc.properties can also be used for other databases like Postgres and MySQL > to set additional properties. We need to add test cases for Postgres and > MySQL to verify whether the settings take effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
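A hedged sketch of the mechanism under test: entries from a table's "jdbc.properties" end up as driver connection properties. The URL, credentials, and the connectTimeout property below are illustrative assumptions, not values from the ticket.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class JdbcPropsSketch {
  public static void main(String[] args) throws Exception {
    // Stand-ins for entries a table's "jdbc.properties" might carry.
    Properties props = new Properties();
    props.setProperty("user", "testuser");       // assumed credential
    props.setProperty("password", "testpass");   // assumed credential
    props.setProperty("connectTimeout", "10");   // example extra property
    // The driver receives the extra properties along with the credentials.
    try (Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/testdb", props)) {
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
{code}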
[jira] [Assigned] (IMPALA-12909) Generate distributed plan for query accessing multiple JDBC tables
[ https://issues.apache.org/jira/browse/IMPALA-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12909: Assignee: Pranav Yogi Lodha > Generate distributed plan for query accessing multiple JDBC tables > -- > > Key: IMPALA-12909 > URL: https://issues.apache.org/jira/browse/IMPALA-12909 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > For a query which accesses multiple JDBC tables, the planner generates a single-node > plan. It would be better to generate a distributed plan so that Impala could open > multiple JDBC connections in parallel. This restriction is due to the current > design of the external data source framework, where the scan is single threaded and > a DataSourceScanNode cannot run on a node other than the coordinator. > There is no issue for a query joining a JDBC table with a non-JDBC table; > the issue arises only when all scans are JDBC table scans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12583) Support reading hive "information_schema" views in Impala
[ https://issues.apache.org/jira/browse/IMPALA-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12583: Assignee: Pranav Yogi Lodha (was: Wenzhe Zhou) > Support reading hive "information_schema" views in Impala > - > > Key: IMPALA-12583 > URL: https://issues.apache.org/jira/browse/IMPALA-12583 > Project: IMPALA > Issue Type: Sub-task >Reporter: Manish Maheshwari >Assignee: Pranav Yogi Lodha >Priority: Major > Attachments: image-2023-11-30-02-24-18-869.png, information_schema.txt > > > Hive supports an "information_schema" db whose views are all JDBC tables exposed from the > HMS database. The same JDBC source tables should be queryable in Impala too. > > !image-2023-11-30-02-24-18-869.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12873) Support password protected keystore
[ https://issues.apache.org/jira/browse/IMPALA-12873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12873: Assignee: Pranav Yogi Lodha > Support password protected keystore > --- > > Key: IMPALA-12873 > URL: https://issues.apache.org/jira/browse/IMPALA-12873 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > IMPALA-12380 allows users to store the JDBC password in a Java keystore file on > HDFS. > Keystores are generally password protected, so a user needs a password to access the > keystore. (See > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords). > Per the Credential Provider API link, if the keystore has a password, it can be > accessed when the password is provided using either the environment variable > "HADOOP_CREDSTORE_PASSWORD" or a file containing the password, configured in > core-site.xml with the key > hadoop.security.credstore.java-keystore-provider.password-file (See > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ProviderUtils.java#L214) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12789) Fix unit-test code JdbcDataSourceTest.java
[ https://issues.apache.org/jira/browse/IMPALA-12789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-12789: Assignee: Pranav Yogi Lodha (was: Wenzhe Zhou) > Fix unit-test code JdbcDataSourceTest.java > -- > > Key: IMPALA-12789 > URL: https://issues.apache.org/jira/browse/IMPALA-12789 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Wenzhe Zhou >Assignee: Pranav Yogi Lodha >Priority: Major > > This JDBC unit test > (java/ext-data-source/jdbc/src/test/java/org/apache/impala/extdatasource/jdbc/JdbcDataSourceTest.java) > was implemented against the H2 database. We don't have H2 in our environment, and > the code is out of date. We need to rewrite this unit test against Postgres. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12426) SQL Interface to Completed Queries/DDLs/DMLs
[ https://issues.apache.org/jira/browse/IMPALA-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith resolved IMPALA-12426. Fix Version/s: Impala 4.4.0 Resolution: Fixed > SQL Interface to Completed Queries/DDLs/DMLs > > > Key: IMPALA-12426 > URL: https://issues.apache.org/jira/browse/IMPALA-12426 > Project: IMPALA > Issue Type: New Feature > Components: Backend, be >Reporter: Jason Fehr >Assignee: Jason Fehr >Priority: Major > Labels: impala, workload-management > Fix For: Impala 4.4.0 > > > Implement a way of querying (via SQL) information about completed > queries/DDLs/DMLs. Adds coordinator startup flags for users to specify that > Impala will track completed queries in an internal table. > Impala will create and maintain an internal Iceberg table named > "impala_query_log" in the "system database" that contains all completed > queries. This table is automatically created at startup by each coordinator > if it does not exist. Then, each completed query is queued in memory and > flushed to the query history table either at a set interval (user-specified > number of minutes) or when a user-specified number of completed queries are > queued in memory. Partition this table by the hour of the query end time. > Data in this table must match the corresponding data in the query profile. > Develop automated testing that asserts this requirement is true. > Don't write USE, SHOW, or SET queries to this table. > Add the following metrics to the "impala-server" metrics group: > * Number of completed queries queued in memory waiting to be written to the > table. > * Number of completed queries successfully written to the table. > * Number of attempts that failed to write completed queries to the table. > * Number of times completed queries were written at the regularly scheduled > time. > * Number of times completed queries were written before the scheduled time > because the max number of queued records was reached. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
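A minimal sketch (assumed names, not Impala's implementation) of the two flush triggers described above: completed queries queue in memory and are written out either on a fixed schedule or early, once the configured maximum number of queued records is reached.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CompletedQueryBuffer {
  private final Queue<String> queued = new ArrayDeque<>();
  private final int maxQueued;

  CompletedQueryBuffer(int maxQueued, long intervalMinutes) {
    this.maxQueued = maxQueued;
    ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    // Trigger 1: the regularly scheduled flush.
    timer.scheduleAtFixedRate(this::flush, intervalMinutes, intervalMinutes,
        TimeUnit.MINUTES);
  }

  synchronized void add(String completedQueryRecord) {
    queued.add(completedQueryRecord);
    // Trigger 2: early flush when the max number of queued records is hit.
    if (queued.size() >= maxQueued) flush();
  }

  synchronized void flush() {
    if (queued.isEmpty()) return;
    List<String> batch = new ArrayList<>(queued);
    queued.clear();
    // Stand-in for the INSERT into the impala_query_log Iceberg table.
    System.out.println("flushing " + batch.size() + " completed queries");
  }
}
{code}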
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832957#comment-17832957 ] Fang-Yu Rao commented on IMPALA-11871: -- After reading some past JIRAs in this area, I think it should be safe to skip {*}analyzeWriteAccess{*}() for the *INSERT* statement (or add a startup flag to disable it). Before the fix is ready, we could add the following to the *core-site.xml* consumed by the catalog server to allow an authorized user (authorized by Ranger via Impala's frontend) to insert values into an HDFS table in the {*}legacy catalog mode{*}. Recall that the catalog server considers the service user, usually named '{*}impala{*}', a super user as long as the user '{*}impala{*}' belongs to the super group specified by ''. {code:java} dfs.permissions.superusergroup true {code} This is still secure when Ranger is the authorization provider, for the following reasons. # For the INSERT statement, Impala's frontend makes sure the logged-in user (not necessarily the service user '{*}impala{*}') is granted the necessary privilege on the target table. The respective audit log entry is also produced whether or not the query is authorized, even though we skip {*}analyzeWriteAccess{*}(). # For a query that has been authorized by Impala's frontend and sent to the backend for execution, if Impala's backend interacts with the underlying services, e.g., HDFS, as the service user '{*}impala{*}', then this service user should always be considered a super user or a user in a super group. +*Detailed Analysis*+ We started performing such permissions checking in [IMPALA-1279: Check ACLs for INSERT and LOAD statements|https://github.com/cloudera/Impala/commit/0b32bbd899d988f1cd5c526597932b67f4c35cce] when we were using Sentry as the authorization provider. The reason for implementing IMPALA-1279 is given in the description of that JIRA and is excerpted below for easy reference. In short, we would like to fail a query as early as possible if there could be a permissions-related issue. {quote}Impala checks permissions for LOAD and INSERT statements before executing them to allow for early-exit if the query would not succeed. However, it does not take extended ACLs in CDH5 into account. When a directory has restrictive Posix permissions (e.g. 000), but has an ACL allowing writes, Impala should allow INSERTs and LOADs to happen to that directory. Instead, the early check will disallow them. If the checks were disabled, the queries would execute (or not!) correctly, because we delegate to libhdfs or the DistributedFileSystem API to actually perform the operations we need. {quote} We hand-crafted the permissions checker within Impala. Specifically, in our [implementation|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java#L206-L222], Hadoop ACL entries take precedence over the POSIX permissions, and we did *not* take into consideration the policies that could be defined on the HDFS path when the authorization provider is Ranger.
Due to how we implemented [FsPermissionChecker|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java], it's possible that even though a logged-in user has been authorized to execute an INSERT statement against a table via a policy added to Ranger's SQL policy repository, the query could fail during analysis, simply because the service user, usually named '{*}impala{*}', could not pass the permissions checker. For instance, this could occur if the table to insert into was created by another query engine, e.g., Hive Server2 (HS2), and is thus owned by another service user, e.g., '{*}hive{*}'. In addition, we have an ACL entry of "{*}group::r-x{*}" by default when the table is created. The current implementation of Impala's permissions checker would deny the service user '{*}impala{*}' write access to the table even though the user '{*}impala{*}' is in the group '{*}hive{*}', as shown in the following. {code:java}
[r...@ccycloud-4.engesc24485d02.root.comops.site ~]# hdfs dfs -getfacl
# file:
# owner: hive
# group: hive
user::rwx
group::r-x
other::r-x

[r...@ccycloud-4.engesc24485d02.root.comops.site impalad]# groups impala
impala : impala hive
{code} In [IMPALA-3143|https://github.com/apache/impala/commit/a0ad1868bda902fd914bc2be39eb9629a6eceb76], we allowed an administrator to specify the name of the super group (from the catalog server's perspective). Once the *current user* belongs to the super group denoted via '{*}DFS_PERMISSIONS_SUPERUSERGROUP_KEY{*}' ("{*}dfs.permissions.superusergroup{*}"), which defaults to '{*}DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT{*}' ("{*}supergroup{*}"), the catalog server grants the WRITE request against the corresponding table from the current user. Refer t
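A toy illustration (a simplification of my own, not Impala's actual FsPermissionChecker) of the point above: under POSIX-style evaluation, a user in the file's owning group is matched by the group class, so the default "group::r-x" entry denies WRITE to 'impala' even though 'impala' is a member of group 'hive'.

{code:java}
// Toy model: evaluate the group-class ACL entry for a group member.
public class GroupAclToy {
  // entry is like "group::r-x"; action is 'r', 'w', or 'x'.
  static boolean groupClassAllows(String entry, char action) {
    String bits = entry.substring(entry.lastIndexOf(':') + 1);
    int idx = action == 'r' ? 0 : action == 'w' ? 1 : 2;
    return bits.charAt(idx) != '-';
  }

  public static void main(String[] args) {
    // Table dir owned by hive:hive with the default "group::r-x" entry;
    // user 'impala' is in group 'hive', so the group class applies to it.
    System.out.println(groupClassAllows("group::r-x", 'w')); // false: WRITE denied
    System.out.println(groupClassAllows("group::r-x", 'r')); // true: READ allowed
  }
}
{code}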
[jira] [Created] (IMPALA-12965) Add debug query option to skip runtime filter
Riza Suminto created IMPALA-12965: - Summary: Add debug query option to skip runtime filter Key: IMPALA-12965 URL: https://issues.apache.org/jira/browse/IMPALA-12965 Project: IMPALA Issue Type: New Feature Components: Frontend Reporter: Riza Suminto Assignee: Riza Suminto Runtime filters can still have a negative effect in certain scenarios, such as long wait times that delay scans, or a cascading runtime filter chain that prevents parallel execution of fragments. Having a debug query option that simply skips a given runtime filter ID from being scheduled can help us investigate and test a solution like IMPALA-12357 early, before implementing the improvement code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work stopped] (IMPALA-12583) Support reading hive "information_schema" views in Impala
[ https://issues.apache.org/jira/browse/IMPALA-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12583 stopped by Wenzhe Zhou. > Support reading hive "information_schema" views in Impala > - > > Key: IMPALA-12583 > URL: https://issues.apache.org/jira/browse/IMPALA-12583 > Project: IMPALA > Issue Type: Sub-task >Reporter: Manish Maheshwari >Assignee: Wenzhe Zhou >Priority: Major > Attachments: image-2023-11-30-02-24-18-869.png, information_schema.txt > > > Hive supports an "information_schema" db whose views are all JDBC tables exposed from the > HMS database. The same JDBC source tables should be queryable in Impala too. > > !image-2023-11-30-02-24-18-869.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12657) Improve ProcessingCost of ScanNode and NonGroupingAggregator
[ https://issues.apache.org/jira/browse/IMPALA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto reassigned IMPALA-12657: - Assignee: David Rorke (was: Riza Suminto) > Improve ProcessingCost of ScanNode and NonGroupingAggregator > > > Key: IMPALA-12657 > URL: https://issues.apache.org/jira/browse/IMPALA-12657 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Riza Suminto >Assignee: David Rorke >Priority: Major > Fix For: Impala 4.4.0 > > Attachments: profile_1f4d7a679a3e12d5_42231157.txt > > > Several benchmark runs measuring Impala scan performance indicate costing > improvement opportunities around ScanNode and NonGroupingAggregator. > [^profile_1f4d7a679a3e12d5_42231157.txt] shows an example of a simple > count query. > Key takeaways: > # There is a strong correlation between total materialized bytes (row size * > cardinality) and total materialized tuple time per fragment. Row > materialization cost should be adjusted to be based on this byte count instead > of an equal cost per scan range. > # NonGroupingAggregator should have a much lower cost than GroupingAggregator. > In the example above, the cost of NonGroupingAggregator dominates the scan > fragment even though it only does simple counting instead of hash table > operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
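A hedged sketch of takeaway #1 above: cost scan materialization by total materialized bytes (row size times cardinality) rather than a flat cost per scan range. The method name and the costPerByte constant are assumptions for illustration, not Impala's ProcessingCost code.

{code:java}
public class ScanCostSketch {
  // Hypothetical: cost proportional to total materialized bytes.
  static double materializationCost(double rowSizeBytes, double cardinality,
      double costPerByte) {
    return rowSizeBytes * cardinality * costPerByte;
  }

  public static void main(String[] args) {
    // Two scans with the same number of scan ranges but different row sizes
    // now receive different costs, matching the observed correlation.
    System.out.println(materializationCost(8.0, 1e9, 1e-9));   // narrow rows
    System.out.println(materializationCost(512.0, 1e9, 1e-9)); // wide rows
  }
}
{code}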
[jira] [Created] (IMPALA-12964) Implement aggregation capability
Steve Carlin created IMPALA-12964: - Summary: Implement aggregation capability Key: IMPALA-12964 URL: https://issues.apache.org/jira/browse/IMPALA-12964 Project: IMPALA Issue Type: Sub-task Reporter: Steve Carlin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
[ https://issues.apache.org/jira/browse/IMPALA-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832895#comment-17832895 ] Yida Wu commented on IMPALA-12963: -- Hi [~jasonmfehr], assigning this jira to you because the testcase was added in a recent task IMPALA-12426, and you might be familiar with it. > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds > --- > > Key: IMPALA-12963 > URL: https://issues.apache.org/jira/browse/IMPALA-12963 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Yida Wu >Assignee: Jason Fehr >Priority: Major > > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with > following messages: > *Error Message* > {code:java} > test setup failure > {code} > *Stacktrace* > {code:java} > common/custom_cluster_test_suite.py:226: in teardown_method > impalad.wait_for_exit() > common/impala_cluster.py:471: in wait_for_exit > while self.__get_pid() is not None: > common/impala_cluster.py:414: in __get_pid > assert len(pids) < 2, "Expected single pid but found %s" % ", > ".join(map(str, pids)) > E AssertionError: Expected single pid but found 892, 31942 > {code} > *Standard Error* > {code:java} > -- 2024-03-28 04:21:44,105 INFO MainThread: Starting cluster with > command: > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 > --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests > --log_level=1 '--impalad_args=--enable_workload_mgmt > --query_log_write_interval_s=1 --cluster_id=test_max_select > --shutdown_grace_period_s=10 --shutdown_deadline_s=60 > --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' > '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' > --impalad_args=--default_query_options= > 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) > 04:21:44 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 04:21:44 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:47 MainThread: Waiting for num_known_live_backends=3. 
Current value: 0 > 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:48 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:49 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:49 MainThread: Waiting for num_known_live_backends=3. Current value: 2 > 04:21:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:50 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:50 MainThread: num_known_live_backends has reached value: 3 > 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:51 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25001 > 04:21:51 MainThread: num_known_live_backends has reached value: 3 > 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:51 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25002 > 04:21:51 MainThread: num_k
[jira] [Created] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
Yida Wu created IMPALA-12963: Summary: Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds Key: IMPALA-12963 URL: https://issues.apache.org/jira/browse/IMPALA-12963 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Yida Wu Assignee: Jason Fehr Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with following messages: *Error Message* {code:java} test setup failure {code} *Stacktrace* {code:java} common/custom_cluster_test_suite.py:226: in teardown_method impalad.wait_for_exit() common/impala_cluster.py:471: in wait_for_exit while self.__get_pid() is not None: common/impala_cluster.py:414: in __get_pid assert len(pids) < 2, "Expected single pid but found %s" % ", ".join(map(str, pids)) E AssertionError: Expected single pid but found 892, 31942 {code} *Standard Error* {code:java} -- 2024-03-28 04:21:44,105 INFO MainThread: Starting cluster with command: /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests --log_level=1 '--impalad_args=--enable_workload_mgmt --query_log_write_interval_s=1 --cluster_id=test_max_select --shutdown_grace_period_s=10 --shutdown_deadline_s=60 --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' --impalad_args=--default_query_options= 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) 04:21:44 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO 04:21:44 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO 04:21:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO 04:21:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO 04:21:44 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:47 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:47 MainThread: Waiting for num_known_live_backends=3. Current value: 0 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:49 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:49 MainThread: Waiting for num_known_live_backends=3. 
Current value: 2 04:21:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:50 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 04:21:50 MainThread: num_known_live_backends has reached value: 3 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:51 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25001 04:21:51 MainThread: num_known_live_backends has reached value: 3 04:21:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) 04:21:51 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25002 04:21:51 MainThread: num_known_live_backends has reached value: 3 04:21:52 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors). -- 2024-03-28 04:21:52,490 DEBUGMainThread: Found 3 impalad/1 statestored/1 catalogd process(es) -- 2024-03-28 04:21:52,490 INFO MainThread: Getting metric: statestore.live-backends from impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25010 -- 2024-03-28 04:21:52,492 INFO MainThread: Metric 'statestore.live-backends' has reached desired value: 4 -- 2024-03-28 04:21:52,493 DEBUGMainThread: G
[jira] [Created] (IMPALA-12962) Estimated metadata size of a table doesn't match the actual java object size
Quanlong Huang created IMPALA-12962: --- Summary: Estimated metadata size of a table doesn't match the actual java object size Key: IMPALA-12962 URL: https://issues.apache.org/jira/browse/IMPALA-12962 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Quanlong Huang Catalogd shows the top-25 largest tables in its WebUI at the "/catalog" endpoint. The estimated metadata size is computed in HdfsTable#getTHdfsTable(): [https://github.com/apache/impala/blob/0d49c9d6cc7fc0903d60a78d8aaa996af0249c06/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2414-L2451] The current formula is:
* memUsageEstimate = numPartitions * 2KB + numFiles * 500B + numBlocks * 150B + (optional) incrementalStats
* (optional) incrementalStats = numPartitions * numColumns * 200B

It's OK to use this formula to compare tables, but it can't be used to estimate the max heap size of catalogd. E.g. it doesn't consider column comments and tblproperties, which could contain long strings. Column names should also be considered in case the table is a wide table. We can compare the estimated sizes with results from ehcache-sizeof or jamm and update the formula, or use these libraries to estimate the sizes directly if they don't impact performance. CC [~MikaelSmith] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
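A direct transcription of the estimate quoted above into runnable form, useful for sanity-checking it against a sizing library such as jamm; the method name is mine, and 2KB is taken as 2048 bytes.

{code:java}
public class MetadataSizeEstimate {
  // Transcribes: numPartitions*2KB + numFiles*500B + numBlocks*150B
  //              + (optional) numPartitions*numColumns*200B
  static long memUsageEstimate(long numPartitions, long numFiles,
      long numBlocks, long numColumns, boolean hasIncrementalStats) {
    long estimate = numPartitions * 2048L + numFiles * 500L + numBlocks * 150L;
    if (hasIncrementalStats) {
      estimate += numPartitions * numColumns * 200L;
    }
    return estimate;
  }

  public static void main(String[] args) {
    // Example: 1000 partitions, 10k files, 20k blocks, 50 columns, with stats.
    System.out.println(memUsageEstimate(1000, 10_000, 20_000, 50, true));
  }
}
{code}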