[jira] [Comment Edited] (IMPALA-12489) Error when scan kudu-1.17.0
[ https://issues.apache.org/jira/browse/IMPALA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837958#comment-17837958 ] Alexey Serbin edited comment on IMPALA-12489 at 4/17/24 4:49 AM:
---

[~MadBeeDo], it turned out the root cause was a bug in Kudu. [~achenn...@cloudera.com] and I were able to reproduce it with {{kudu table scan}}, so the issue has been localized. The fix is available in Kudu's [master|https://github.com/apache/kudu/commit/946acb711d722b1e6fe27af2c7de92960d724980] and [1.17.x|https://github.com/apache/kudu/commit/0de168f7e0abcf0c29facefcc9c0c9e12b284140] branches. Follow-up patches will introduce test scenarios that reproduce the issue with a minimal amount of data, to catch future regressions, if any. You can find more details in [KUDU-3518|https://issues.apache.org/jira/browse/KUDU-3518]. Thank you for reporting the issue!

was (Author: aserbin): [~MadBeeDo], it turned out the root cause was a bug in Kudu. [~achenn...@cloudera.com] and I were able to reproduce it with {{kudu table scan}}, so the issue has been localized. The fix is available in Kudu's main and 1.17.x branches. Follow-up patches will introduce test scenarios that reproduce the issue with a minimal amount of data, to catch future regressions, if any. You can find more details in [KUDU-3518|https://issues.apache.org/jira/browse/KUDU-3518]. Thank you for reporting the issue!

> Error when scan kudu-1.17.0
> ---
>
> Key: IMPALA-12489
> URL: https://issues.apache.org/jira/browse/IMPALA-12489
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, be
> Affects Versions: Impala 4.3.0
> Environment: centos7.9
> Reporter: Pain Sun
> Priority: Major
> Labels: scankudu
>
> Scanning Kudu with impala-4.3.0, there is a bug when reading a table with an empty string in a primary key field.
>
> SQL:
> select
>   count(distinct thirdnick)
> from
>   member.qyexternaluserdetailinfo_new
> where
>   (
>     mainshopnick = "xxx"
>     and ownercorpid in ("xxx", "")
>     and shoptype not in ("35", "56")
>     and isDelete = 0
>     and thirdnick != ""
>     and thirdnick is not null
>   );
>
> Error:
> ERROR: Unable to open scanner for node with id '1' for Kudu table 'impala::member.qyexternaluserdetailinfo_new': Invalid argument: No such column: shopnick
>
> If the SQL is updated like this, there is no error:
> select
>   count(distinct thirdnick)
> from
>   member.qyexternaluserdetailinfo_new
> where
>   (
>     mainshopnick = "xxx"
>     and ownercorpid in ("xxx", "")
>     and shopnick not in ('')
>     and shoptype not in ("35", "56")
>     and isDelete = 0
>     and thirdnick != ""
>     and thirdnick is not null
>   );
>
> This error appears in kudu-1.17.0, but kudu-1.16.0 is fine.
>
> There are 100 rows in this table; 28 rows contain an empty string.
>
> The CREATE TABLE statement:
> CREATE TABLE member.qyexternaluserdetailinfo_new (
>   mainshopnick STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   shopnick STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   ownercorpid STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   shoptype STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   clientid STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   thirdnick STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   receivermobile STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   thirdrealname STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   remark STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   createtime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   updatetime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   isdelete INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT 0,
>   buyernick STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   PRIMARY KEY (mainshopnick, shopnick, ownercorpid, shoptype, clientid, thirdnick, id)
> ) PARTITION BY HASH (mainshopnick, shopnick, ownercorpid, shoptype, clientid, thirdnick, id) PARTITIONS 10
> STORED AS KUDU TBLPROPERTIES (
>   'kudu.master_addresses' = '192.168.134.132,192.168.134.133,192.168.134.134',
>   'kudu.num_tablet_replicas' = '1'
> );
>
> Table schema:
> |name |type |comment|primary_key|key_unique|nullable|default_value|encoding |compression |block_size|
> |mainshopnick|string| |true |true |false | |AUTO_ENCODING|DEFAULT_COMPRESSION|0 |
> |shopnick |string| |true |true |false | |AUTO_ENCODING|DEFAULT_COMPRESSION|0 |
> |ownercorpid |string| |true |true |false | |AUTO_ENCODING|DEFAULT_COMPRESSION|0 |
> |shoptype |string| |true |true |false | |AUTO_ENCODING|DEFAULT_COMPRESSION|0 |
> |clientid |string| |true |true |false | |AUTO_ENCODING|DEFAULT_COMPRESSION|0 |
> |thirdnick |string| |true |true |false | |AUTO_ENCODING|DEFAULT_COMPRESSION|0 |
> |id |bigint| |true |true |false | | | | |
> (listing truncated in the original during the id row)
[jira] [Commented] (IMPALA-12489) Error when scan kudu-1.17.0
[ https://issues.apache.org/jira/browse/IMPALA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837958#comment-17837958 ] Alexey Serbin commented on IMPALA-12489:
---

[~MadBeeDo], it turned out the root cause was a bug in Kudu. [~achenn...@cloudera.com] and I were able to reproduce it with {{kudu table scan}}, so the issue has been localized. The fix is available in Kudu's main and 1.17.x branches. Follow-up patches will introduce test scenarios that reproduce the issue with a minimal amount of data, to catch future regressions, if any. You can find more details in [KUDU-3518|https://issues.apache.org/jira/browse/KUDU-3518]. Thank you for reporting the issue!

> Error when scan kudu-1.17.0
> ---
>
> Key: IMPALA-12489
> URL: https://issues.apache.org/jira/browse/IMPALA-12489
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, be
> Affects Versions: Impala 4.3.0
> Environment: centos7.9
> Reporter: Pain Sun
> Priority: Major
> Labels: scankudu
[jira] [Resolved] (IMPALA-12489) Error when scan kudu-1.17.0
[ https://issues.apache.org/jira/browse/IMPALA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin resolved IMPALA-12489.
---
Resolution: Duplicate

> Error when scan kudu-1.17.0
> ---
>
> Key: IMPALA-12489
> URL: https://issues.apache.org/jira/browse/IMPALA-12489
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, be
> Affects Versions: Impala 4.3.0
> Environment: centos7.9
> Reporter: Pain Sun
> Priority: Major
> Labels: scankudu
[jira] [Commented] (IMPALA-12443) Add catalog timeline for all DDL profiles
[ https://issues.apache.org/jira/browse/IMPALA-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837912#comment-17837912 ] Quanlong Huang commented on IMPALA-12443:
---

The original commit is

commit d7b5819f906c19281fdf9594ecf4616f6c9f92a0
Author: stiga-huang
Date: Wed Aug 30 18:47:55 2023 +0800

IMPALA-12443: Add catalog timeline for all DDL profiles

This is a follow-up to IMPALA-12024, which added the catalog timeline for CreateTable statements. Using the same mechanism, this patch adds a catalog timeline for all DDL/DML profiles, including REFRESH and INSERT. The goal is to add timeline markers after each step that could block, e.g. acquiring locks or making external RPCs, so we can better debug slow DDLs using the catalog timeline in profiles.

Constant strings were added for widely used events, e.g. "Fetched table from Metastore". This was not done for events that occur only once.

Most of the catalog methods now take a new argument for tracking the execution timeline. To avoid adding null checks everywhere, code paths that don't need a catalog profile, e.g. EventProcessor, use a static no-op EventSequence as the argument. We can replace it in future work, e.g. to expose the execution timeline of slow processing of an HMS event.

This patch also removes some unused overloads of HdfsTable#load() and HdfsTable#reloadPartitionsFromNames().
Example timeline for a REFRESH statement on an unloaded table (IncompleteTable):
Catalog Server Operation: 2s300ms
- Got catalog version read lock: 26.407us (26.407us)
- Start loading table: 314.663us (288.256us)
- Got Metastore client: 629.599us (314.936us)
- Fetched table from Metastore: 7.248ms (6.618ms)
- Loaded table schema: 27.947ms (20.699ms)
- Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
- Got access level: 1s514ms (588.314us)
- Created partition builders: 2s103ms (588.270ms)
- Start loading file metadata: 2s103ms (49.760us)
- Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
- Async loaded table: 2s289ms (6.931ms)
- Loaded table from scratch: 2s289ms (72.038us)
- Got table read lock: 2s289ms (2.289us)
- Finished resetMetadata request: 2s300ms (10.188ms)

Example timeline for an INSERT statement:
Catalog Server Operation: 178.120ms
- Got catalog version read lock: 4.238us (4.238us)
- Got catalog version write lock and table write lock: 52.768us (48.530us)
- Got Metastore client: 15.768ms (15.715ms)
- Fired Metastore events: 156.650ms (140.882ms)
- Got Metastore client: 163.317ms (6.666ms)
- Fetched table from Metastore: 166.561ms (3.244ms)
- Start refreshing file metadata: 167.961ms (1.399ms)
- Loaded file metadata for 24 partitions: 177.679ms (9.717ms)
- Reloaded table metadata: 178.021ms (342.261us)
- Finished updateCatalog request: 178.120ms (98.929us)

Example timeline for a "COMPUTE STATS tpcds_parquet.store_sales":
Catalog Server Operation: 6s737ms
- Got catalog version read lock: 19.971us (19.971us)
- Got catalog version write lock and table write lock: 50.255us (30.284us)
- Got Metastore client: 171.819us (121.564us)
- Updated column stats: 25.560ms (25.388ms)
- Got Metastore client: 69.298ms (43.738ms)
- Altered 500 partitions in Metastore: 1s894ms (1s825ms)
- Altered 1000 partitions in Metastore: 3s558ms (1s664ms)
- Altered 1500 partitions in Metastore: 5s144ms (1s586ms)
- Altered 1824 partitions in Metastore: 6s205ms (1s060ms)
- Got Metastore client: 6s205ms (329.481us)
- Altered table in Metastore: 6s216ms (11.073ms)
- Got Metastore client: 6s216ms (13.377us)
- Fetched table from Metastore: 6s219ms (2.419ms)
- Loaded table schema: 6s223ms (4.130ms)
- Got current Metastore event id 19017: 6s639ms (415.690ms)
- Start loading file metadata: 6s639ms (9.591us)
- Loaded file metadata for 1824 partitions: 6s729ms (90.196ms)
- Reloaded table metadata: 6s735ms (5.865ms)
- DDL finished: 6s737ms (2.255ms)

Example timeline for a global INVALIDATE METADATA:
Catalog Server Operation: 301.618ms
- Got catalog version write lock: 9.908ms (9.908ms)
- Got Metastore client: 9.922ms (14.013us)
- Got database list: 11.396ms (1.473ms)
- Loaded functions of default: 44.919ms (33.523ms)
- Loaded TableMeta of 82 tables in database default: 47.524ms (2.604ms)
- Loaded functions of functional: 50.846ms (3.321ms)
- Loaded TableMeta of 101 tables in database functional: 52.580ms (1.734ms)
- Loaded functions of functional_avro: 54.861ms (2.281ms)
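The timelines above report each marker as a cumulative offset followed by the per-step delta in parentheses. A minimal sketch of parsing that format and finding the slowest step, e.g. to spot what made a DDL slow (the helper names and the marker format assumed from the examples above are illustrative, not part of Impala's code):

```python
import re

# Duration strings in the timelines look like "2s300ms", "179.839ms", "26.407us".
_UNITS = {"s": 1000.0, "ms": 1.0, "us": 0.001, "ns": 0.000001}

def to_ms(dur: str) -> float:
    """Convert a duration such as '1s514ms' or '314.663us' to milliseconds."""
    total = 0.0
    for value, unit in re.findall(r"([0-9.]+)(ns|us|ms|s)", dur):
        total += float(value) * _UNITS[unit]
    return total

def slowest_step(timeline: str):
    """Return (label, delta_ms) for the marker with the largest per-step delta.

    Markers have the form "- <label>: <cumulative> (<delta>)".
    """
    steps = re.findall(r"- (.+?): \S+ \(([^)]+)\)", timeline)
    return max(((label, to_ms(delta)) for label, delta in steps),
               key=lambda pair: pair[1])

timeline = """
- Got catalog version read lock: 26.407us (26.407us)
- Fetched table from Metastore: 7.248ms (6.618ms)
- Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
- Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
"""
print(slowest_step(timeline))
```

On the REFRESH example above, this points at the permissions-cache preload as the dominant step (about 1.5s of the 2.3s total).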
[jira] [Updated] (IMPALA-12737) Include List of Referenced Columns in Query Log Table
[ https://issues.apache.org/jira/browse/IMPALA-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith updated IMPALA-12737:
---
Issue Type: Improvement (was: Bug)

> Include List of Referenced Columns in Query Log Table
> ---
>
> Key: IMPALA-12737
> URL: https://issues.apache.org/jira/browse/IMPALA-12737
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Manish Maheshwari
> Assignee: Michael Smith
> Priority: Major
> Labels: workload-management
>
> In the Impala query log table where completed queries are stored, add lists of the columns that were referenced in the query. The purpose behind this functionality is to know which columns are part of:
> * the SELECT clause
> * the WHERE clause
> * join clauses
> * aggregate clauses
> * the ORDER BY clause
> There should be a column for each type of clause, so that decisions can be made based on specific usage or on the union of those clauses.
> With this information, we can feed the COMPUTE STATS command to collect stats only on the columns that are used in joins, filters, and aggregates, rather than on all the table columns.
> The information can be collected as an array such as [db1.table1.column1,db1.table1.column2]

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
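The COMPUTE STATS use case in IMPALA-12737 boils down to taking the union of the per-clause column lists. A small sketch of that consumption side, with made-up column names and the per-clause arrays assumed to already exist in the proposed [db.table.column] format:

```python
# Hypothetical per-clause column lists, as proposed in the issue. In the real
# feature these would come from the query log table's new columns.
where_cols = ["db1.table1.column2", "db1.table1.column3"]
join_cols = ["db1.table1.column3", "db1.table2.column1"]
aggregate_cols = ["db1.table1.column1"]

# Per the proposal, stats only need to cover columns used in joins, filters,
# and aggregates, so take the union of those three lists.
stats_cols = sorted(set(where_cols) | set(join_cols) | set(aggregate_cols))
print(stats_cols)
```

The resulting list could then drive a per-column COMPUTE STATS invocation instead of collecting stats for every column of the table.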
[jira] [Commented] (IMPALA-8042) Better selectivity estimate for BETWEEN
[ https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837895#comment-17837895 ] David Rorke commented on IMPALA-8042:
---

There are specific cases of BETWEEN expressions where we should be able to make a much more accurate selectivity estimate, in particular date columns (and maybe other column types with discrete values), if we know or strongly suspect the values are all unique (or at least very high NDV) and there are few "missing" values. In cases like this we might simply assume that the number of rows selected is the number of possible distinct values in the range. This can be wrong in a couple of cases:
* Duplicate values. So we should only apply this when we suspect uniqueness or something close to it (very high NDV relative to the total row count).
* Missing values (again, we can probably use some NDV-based heuristics to make a good guess about whether most of the possible values are populated).

Even with some possible inaccuracy from these factors, this approach is likely to be much more accurate in high-NDV situations than the current flat 10 percent selectivity guess.

> Better selectivity estimate for BETWEEN
> ---
>
> Key: IMPALA-8042
> URL: https://issues.apache.org/jira/browse/IMPALA-8042
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Minor
>
> The analyzer rewrites a BETWEEN expression into a pair of inequalities. IMPALA-8037 explains that the planner then groups all such non-equality conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains that the analyzer should handle inequalities better.
> BETWEEN is a special case and informs the final result. If we assume a selectivity of s for an inequality, then BETWEEN should be something like s/2.
> The intuition is that if c >= x includes, say, one third of values, and c <= y includes one third of values, then c BETWEEN x AND y should be a narrower set of values, say one sixth.
> [Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html] recommend 0.4 for BETWEEN, 0.3 for an inequality, and 0.3^2 = 0.09 for the general expression x <= c AND c <= y. Note the discrepancy between the compound-inequality case and the BETWEEN case, likely reflecting the additional information we obtain when the user chooses to use BETWEEN.
> To implement a special BETWEEN selectivity in Impala, we must remember the selectivity of BETWEEN during the rewrite to a compound inequality.
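The arithmetic in the description and the discrete-value heuristic from the comment can be sketched as follows. The constants are the textbook numbers cited above, and the helper is an illustrative assumption, not Impala's implementation:

```python
# Textbook selectivity constants from the cited cost-formula page (these are
# assumptions for illustration; Impala currently uses a flat 0.1 guess for
# the grouped inequality predicates).
INEQUALITY_SEL = 0.3     # a single predicate such as c >= x
BETWEEN_SEL = 0.4        # the special-cased BETWEEN selectivity

# Treating "x <= c AND c <= y" as two independent inequalities multiplies
# their selectivities: 0.3^2 = 0.09, well below the 0.4 given to BETWEEN.
compound_sel = INEQUALITY_SEL ** 2

def discrete_between_selectivity(lo, hi, ndv):
    """High-NDV heuristic from the comment: for discrete, near-unique values
    (e.g. a date column), estimate selectivity as (# values in range) / NDV."""
    return min(1.0, (hi - lo + 1) / ndv)

print(round(compound_sel, 2))
# A 30-day BETWEEN over a year of unique dates selects roughly 30/365 of rows,
# far from the flat 10 percent guess.
print(round(discrete_between_selectivity(0, 29, 365), 4))
```

The gap between `compound_sel` and `BETWEEN_SEL` is exactly the discrepancy the description points out: rewriting BETWEEN into two inequalities discards the hint that the user specified a single contiguous range.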
[jira] [Updated] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException
[ https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith updated IMPALA-13003: --- Labels: iceberg (was: ) > Server exits early failing to create impala_query_log with > AlreadyExistsException > - > > Key: IMPALA-13003 > URL: https://issues.apache.org/jira/browse/IMPALA-13003 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.4.0 >Reporter: Andrew Sherman >Assignee: Michael Smith >Priority: Critical > Labels: iceberg > > At startup workload management tries to create the query log table here: > {code:java} > // The initialization code only works when run in a separate thread for > reasons unknown. > ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name)); > {code} > This code is exiting: > {code:java} > I0413 23:40:05.183876 21006 client-request-state.cc:1348] > 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making > 'createTable' RPC to Hive Metastore: > CAUSED BY: AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection > 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server > internal-server closed. The connection had 1 associated session(s). > I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: > 27432606d99dcdae:218860164eb206bb > I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: > 27432606d99dcdae:218860164eb206bb, client address: . > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. 
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully > unregistered: query_id=1d4878dbc9214c81:6dc8cc2e > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > Wrote minidump to > /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp > {code} > with stack > {code:java} > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. > *** Check failure stack trace: *** > @ 0x8e96a4d google::LogMessage::Fail() > @ 0x8e98984 google::LogMessage::SendToLog() > @ 0x8e9642c google::LogMessage::Flush() > @ 0x8e98ea9 google::LogMessageFatal::~LogMessageFatal() > @ 0x3da3a9a impala::ImpalaServer::CompletedQueriesThread() > @ 0x3a8df93 boost::_mfi::mf0<>::operator()() > @ 0x3a8de97 boost::_bi::list1<>::operator()<>() > @ 0x3a8dd77 boost::_bi::bind_t<>::operator()() > @ 0x3a8d672 > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x301e7d0 boost::function0<>::operator()() > @ 0x43ce415 impala::Thread::SuperviseThread() > @ 0x43e2dc7 boost::_bi::list5<>::operator()<>() > @ 0x43e29e7 boost::_bi::bind_t<>::operator()() > @ 0x43e21c5 boost::detail::thread_data<>::run() > @ 0x7984c37 thread_proxy > @ 0x7f75b6982ea5 start_thread > @ 0x7f75b36a7b0d __clone > Picked up JAVA_TOOL_OPTIONS: > -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n > -Dsun.java.command=impalad > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > {code} > I think the key error is > {code} > CAUSED BY: AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > {code} > which suggests that creating the table with "if not exists" is not sufficient > to protect against concurrent creations. 
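If "if not exists" cannot prevent the race at the HMS level, the usual mitigation is to treat a concurrent AlreadyExistsException as success rather than aborting, since the losing server still ends up with the table it wanted. A hedged sketch of that pattern; the exception class and `execute` callback are hypothetical stand-ins, not Impala's or the Metastore client's actual API:

```python
class AlreadyExistsError(Exception):
    """Stand-in for the Metastore's AlreadyExistsException."""

def ensure_table(execute, ddl):
    """Run CREATE TABLE IF NOT EXISTS, tolerating a concurrent creation."""
    try:
        execute(ddl)
    except AlreadyExistsError:
        # Another server won the createTable race. The table now exists,
        # which is the state we wanted, so don't abort startup.
        pass

# Simulated usage: the second creator hits the race path but still succeeds.
created = set()
def execute(ddl):
    if "sys.impala_query_log" in created:
        raise AlreadyExistsError("Table was created concurrently")
    created.add("sys.impala_query_log")

ensure_table(execute, "CREATE TABLE IF NOT EXISTS sys.impala_query_log (...)")
ensure_table(execute, "CREATE TABLE IF NOT EXISTS sys.impala_query_log (...)")
print(created)
```

The point of the sketch is only the error-handling shape: both racing servers proceed, and neither treats the duplicate-creation error as fatal.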
[jira] [Updated] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException
[ https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Smith updated IMPALA-13003:
---
Labels: iceberg workload-management (was: iceberg)

> Server exits early failing to create impala_query_log with AlreadyExistsException
> ---
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
> Issue Type: Bug
> Components: be
> Affects Versions: Impala 4.4.0
> Reporter: Andrew Sherman
> Assignee: Michael Smith
> Priority: Critical
> Labels: iceberg, workload-management
[jira] [Created] (IMPALA-13008) test_metadata_tables failed in Ubuntu 20 build
Zoltán Borók-Nagy created IMPALA-13008:
---
Summary: test_metadata_tables failed in Ubuntu 20 build
Key: IMPALA-13008
URL: https://issues.apache.org/jira/browse/IMPALA-13008
Project: IMPALA
Issue Type: Bug
Reporter: Zoltán Borók-Nagy
Assignee: Daniel Becker

test_metadata_tables failed in an Ubuntu 20 release test build:
* https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-dockerised-tests%5E1642%5E.log
* https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-from-scratch%5E2363%5E.log

h2. Error
{noformat}
E assert Comparing QueryTestResults (expected vs actual):
E 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"1","total-files-size":"351","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
  != 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"1","total-files-size":"350","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
E 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"2","total-files-size":"702","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
  != 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"2","total-files-size":"700","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
E 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"3","total-files-size":"1053","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
  != 'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"3","total-files-size":"1050","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
E row_regex:'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"[1-9][0-9]*","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"[1-9][0-9]*","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}'
  == 'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"1551","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"2601","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}'
{noformat}
[jira] [Created] (IMPALA-13007) [DOCS] Add description for setting capacity for spilling to s3
Yida Wu created IMPALA-13007: Summary: [DOCS] Add description for setting capacity for spilling to s3 Key: IMPALA-13007 URL: https://issues.apache.org/jira/browse/IMPALA-13007 Project: IMPALA Issue Type: Improvement Components: Docs Reporter: Yida Wu Assignee: Yida Wu The current documentation does not seem to mention the capacity setting in the configuration for spilling to S3; without a proper capacity limit, users can easily run into space usage issues. https://docs.cloudera.com/cdw-runtime/cloud/impala-reference/topics/impala_spill_s3.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-12679) test_rows_sent_counters failed to match RPCCount
[ https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12679 started by Kurt Deschler. -- > test_rows_sent_counters failed to match RPCCount > > > Key: IMPALA-12679 > URL: https://issues.apache.org/jira/browse/IMPALA-12679 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Kurt Deschler >Priority: Major > > {code} > query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | > exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > {code} > failed with > {code} > query_test/test_fetch.py:69: in test_rows_sent_counters > assert re.search("RPCCount: [5-9]", runtime_profile) > E assert None > E+ where None = ('RPCCount: [5-9]', > 'Query (id=c8476e5c065757bf:b4367698):\n DEBUG MODE WARNING: Query > profile created while running a DEBUG buil...: 0.000ns\n - > WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - > WriteIoWaitTime: 0.000ns\n') > E+where = re.search > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12679) test_rows_sent_counters failed to match RPCCount
[ https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837805#comment-17837805 ] Kurt Deschler commented on IMPALA-12679: Updated assert to print Actual RPC Count https://gerrit.cloudera.org/#/c/21310/ > test_rows_sent_counters failed to match RPCCount > > > Key: IMPALA-12679 > URL: https://issues.apache.org/jira/browse/IMPALA-12679 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Kurt Deschler >Priority: Major > > {code} > query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | > exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > {code} > failed with > {code} > query_test/test_fetch.py:69: in test_rows_sent_counters > assert re.search("RPCCount: [5-9]", runtime_profile) > E assert None > E+ where None = ('RPCCount: [5-9]', > 'Query (id=c8476e5c065757bf:b4367698):\n DEBUG MODE WARNING: Query > profile created while running a DEBUG buil...: 0.000ns\n - > WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - > WriteIoWaitTime: 0.000ns\n') > E+where = re.search > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12313) Add support for UPDATE statements for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-12313: --- Issue Type: New Feature (was: Bug) > Add support for UPDATE statements for Iceberg tables > > > Key: IMPALA-12313 > URL: https://issues.apache.org/jira/browse/IMPALA-12313 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.4.0 > > > Add support for UPDATE statements for Iceberg tables. > Initial design doc of DELETEs and UPDATEs: > [https://docs.google.com/document/d/1GuRiJ3jjqkwINsSCKYaWwcfXHzbMrsd3WEMDOB11Xqw/edit#heading=h.5bmfhbmb4qdk] > Limitations: > * only write position delete files -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException
[ https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837796#comment-17837796 ] Michael Smith commented on IMPALA-13003: This looks to be a generic iceberg issue. In this case the AlreadyExistsException comes from https://github.com/apache/iceberg/blob/apache-iceberg-1.4.3/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L201. https://github.com/apache/impala/blob/4.4.0-rc1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4045 handles {{org.apache.hadoop.hive.metastore.api.AlreadyExistsException}}, but not {{org.apache.iceberg.exceptions.AlreadyExistsException}}, so we likely need to add that as well. I should be able to rig up a debug action to reproduce this. > Server exits early failing to create impala_query_log with > AlreadyExistsException > - > > Key: IMPALA-13003 > URL: https://issues.apache.org/jira/browse/IMPALA-13003 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.4.0 >Reporter: Andrew Sherman >Assignee: Michael Smith >Priority: Critical > > At startup workload management tries to create the query log table here: > {code:java} > // The initialization code only works when run in a separate thread for > reasons unknown. > ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name)); > {code} > This code is exiting: > {code:java} > I0413 23:40:05.183876 21006 client-request-state.cc:1348] > 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making > 'createTable' RPC to Hive Metastore: > CAUSED BY: AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection > 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server > internal-server closed. The connection had 1 associated session(s). 
> I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: > 27432606d99dcdae:218860164eb206bb > I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: > 27432606d99dcdae:218860164eb206bb, client address: . > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. > I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully > unregistered: query_id=1d4878dbc9214c81:6dc8cc2e > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > Wrote minidump to > /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp > {code} > with stack > {code:java} > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. > *** Check failure stack trace: *** > @ 0x8e96a4d google::LogMessage::Fail() > @ 0x8e98984 google::LogMessage::SendToLog() > @ 0x8e9642c google::LogMessage::Flush() > @ 0x8e98ea9 google::LogMessageFatal::~LogMessageFatal() > @ 0x3da3a9a impala::ImpalaServer::CompletedQueriesThread() > @ 0x3a8df93 boost::_mfi::mf0<>::operator()() > @ 0x3a8de97 boost::_bi::list1<>::operator()<>() > @ 0x3a8dd77 boost::_bi::bind_t<>::operator()() > @ 0x3a8d672 > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x301e7d0 boost::function0<>::operator()() > @ 0x43ce415 impala::Thread::SuperviseThread() > @ 0x43e2dc7 boost::_bi::list5<>::operator()<>() > @ 0x43e29e7 boost::_bi::bind_t<>::operator()() > @ 0x43e21c5 boost::detail::thread_data<>::run() > @ 0x7984c37 thread_proxy > @ 0x7f75b6982ea5 start_thread > @ 0x7f75b36a7b0d __clone > Picked up JAVA_TOOL_OPTIONS: > -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n > -Dsun.java.command=impalad > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > {code} > I think the key error is > {code} > CAUSED BY: 
AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > {code} > which suggests that creating the table with "if not exists" is not sufficient > to protect against concurrent creations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.ap
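Michael Smith's observation above — two unrelated exception classes sharing the simple name {{AlreadyExistsException}} — can be sketched as follows. This is only an illustration with stand-in exception types (the real ones live in the Hive Metastore and Iceberg jars), not the actual CatalogOpExecutor code:

```java
// Stand-ins for the two unrelated exception types named in the comment:
// org.apache.hadoop.hive.metastore.api.AlreadyExistsException (checked) and
// org.apache.iceberg.exceptions.AlreadyExistsException (unchecked). They share
// a simple name but no common ancestor below Exception, so a handler that
// catches only the HMS one lets the Iceberg one escape.
public class CreateTableRace {
  static class HmsAlreadyExistsException extends Exception {}
  static class IcebergAlreadyExistsException extends RuntimeException {}

  interface TableCreator { void create() throws Exception; }

  /**
   * "CREATE TABLE IF NOT EXISTS" semantics: a concurrent creation surfacing as
   * either flavor of AlreadyExistsException is treated as success, not a crash.
   * Returns true if this call created the table, false if it already existed.
   */
  public static boolean createTable(TableCreator creator, boolean ifNotExists)
      throws Exception {
    try {
      creator.create();
      return true;
    } catch (HmsAlreadyExistsException | IcebergAlreadyExistsException e) {
      if (ifNotExists) return false;  // another writer won the race; table exists
      throw e;
    }
  }
}
```

The multi-catch makes the symmetry explicit: both types must be named, since neither catch clause subsumes the other.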
[jira] [Work started] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException
[ https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13003 started by Michael Smith. -- > Server exits early failing to create impala_query_log with > AlreadyExistsException > - > > Key: IMPALA-13003 > URL: https://issues.apache.org/jira/browse/IMPALA-13003 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.4.0 >Reporter: Andrew Sherman >Assignee: Michael Smith >Priority: Critical > > At startup workload management tries to create the query log table here: > {code:java} > // The initialization code only works when run in a separate thread for > reasons unknown. > ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name)); > {code} > This code is exiting: > {code:java} > I0413 23:40:05.183876 21006 client-request-state.cc:1348] > 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making > 'createTable' RPC to Hive Metastore: > CAUSED BY: AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection > 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server > internal-server closed. The connection had 1 associated session(s). > I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: > 27432606d99dcdae:218860164eb206bb > I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: > 27432606d99dcdae:218860164eb206bb, client address: . > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. 
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully > unregistered: query_id=1d4878dbc9214c81:6dc8cc2e > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > Wrote minidump to > /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp > {code} > with stack > {code:java} > F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out > waiting for results > . Impalad exiting. > *** Check failure stack trace: *** > @ 0x8e96a4d google::LogMessage::Fail() > @ 0x8e98984 google::LogMessage::SendToLog() > @ 0x8e9642c google::LogMessage::Flush() > @ 0x8e98ea9 google::LogMessageFatal::~LogMessageFatal() > @ 0x3da3a9a impala::ImpalaServer::CompletedQueriesThread() > @ 0x3a8df93 boost::_mfi::mf0<>::operator()() > @ 0x3a8de97 boost::_bi::list1<>::operator()<>() > @ 0x3a8dd77 boost::_bi::bind_t<>::operator()() > @ 0x3a8d672 > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x301e7d0 boost::function0<>::operator()() > @ 0x43ce415 impala::Thread::SuperviseThread() > @ 0x43e2dc7 boost::_bi::list5<>::operator()<>() > @ 0x43e29e7 boost::_bi::bind_t<>::operator()() > @ 0x43e21c5 boost::detail::thread_data<>::run() > @ 0x7984c37 thread_proxy > @ 0x7f75b6982ea5 start_thread > @ 0x7f75b36a7b0d __clone > Picked up JAVA_TOOL_OPTIONS: > -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n > -Dsun.java.command=impalad > Minidump in thread [20955]completed-queries running query > :, fragment instance > : > {code} > I think the key error is > {code} > CAUSED BY: AlreadyExistsException: Table was created concurrently: > sys.impala_query_log > {code} > which suggests that creating the table with "if not exists" is not sufficient > to protect against concurrent creations. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12653) Update documentation about the UPDATE statement
[ https://issues.apache.org/jira/browse/IMPALA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-12653. Fix Version/s: Impala 4.4.0 Resolution: Fixed > Update documentation about the UPDATE statement > --- > > Key: IMPALA-12653 > URL: https://issues.apache.org/jira/browse/IMPALA-12653 > Project: IMPALA > Issue Type: Sub-task >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > Fix For: Impala 4.4.0 > > > Update documentation about the UPDATE statement > Also list the limitations of UPDATE/DELETE -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12894) Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files
[ https://issues.apache.org/jira/browse/IMPALA-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-12894. Fix Version/s: Impala 4.4.0 Resolution: Fixed > Optimized count(*) for Iceberg gives wrong results after a Spark > rewrite_data_files > --- > > Key: IMPALA-12894 > URL: https://issues.apache.org/jira/browse/IMPALA-12894 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Gabor Kaszab >Assignee: Zoltán Borók-Nagy >Priority: Critical > Labels: correctness, impala-iceberg > Fix For: Impala 4.4.0 > > Attachments: count_star_correctness_repro.tar.gz > > > Issue was introduced by https://issues.apache.org/jira/browse/IMPALA-11802 > that implemented an optimized way to get results for count(*). However, if > the table was compacted by Spark this optimization can give incorrect results. > The reason is that Spark can[ skip dropping delete > files|https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_position_delete_files] > that are pointing to compacted data files, as a result there might be delete > files after compaction that are no longer applied to any data files. > Repro: > With Impala > {code:java} > create table default.iceberg_testing (id int, j bigint) STORED AS ICEBERG > TBLPROPERTIES('iceberg.catalog'='hadoop.catalog', > 'iceberg.catalog_location'='/tmp/spark_iceberg_catalog/', > 'iceberg.table_identifier'='iceberg_testing', > 'format-version'='2'); > insert into iceberg_testing values > (1, 1), (2, 4), (3, 9), (4, 16), (5, 25); > update iceberg_testing set j = -100 where id = 4; > delete from iceberg_testing where id = 4;{code} > Count * returns 4 at this point. > Run compaction in Spark: > {code:java} > spark.sql(s"CALL local.system.rewrite_data_files(table => > 'default.iceberg_testing', options => map('min-input-files','2') )").show() > {code} > Now count * in Impala returns 8 (might require an IM if in HadoopCatalog). 
> Hive returns correct results. Also a SELECT * returns correct results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12903) Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
[ https://issues.apache.org/jira/browse/IMPALA-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-12903. Fix Version/s: Impala 4.4.0 Resolution: Fixed > Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala > -- > > Key: IMPALA-12903 > URL: https://issues.apache.org/jira/browse/IMPALA-12903 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Fix For: Impala 4.4.0 > > > Repro: > {noformat} > select file__position from functional.alltypes; => CRASH > select file__position from functional_json.alltypes; => CRASH{noformat} > Stack trace: > {noformat} > ... > #6 > #7 0x031a671c in impala::ScannerContext::Stream::file_desc > (this=0x0) at /home/boroknagyz/Impala/be/src/exec/scanner-context.h:157 > #8 0x03351630 in impala::HdfsJsonScanner::Close (this=0xea22d80, > row_batch=0xed63a20) at > /home/boroknagyz/Impala/be/src/exec/json/hdfs-json-scanner.cc:99 > #9 0x031c3eff in impala::HdfsScanner::Close (this=0xea22d80) at > /home/boroknagyz/Impala/be/src/exec/hdfs-scanner.cc:176 > #10 0x032f057f in impala::HdfsScanNode::ProcessSplit > (this=0x14eb9000, filter_ctxs=..., expr_results_pool=0x7fa54b5cf400, > scan_range=0xf2bb680, scanner_thread_reservation=0x7fa54b5cf328) > at /home/boroknagyz/Impala/be/src/exec/hdfs-scan-node.cc:500 > #11 0x032ef94c in impala::HdfsScanNode::ScannerThread > (this=0x14eb9000, first_thread=true, scanner_thread_reservation=131072) at > /home/boroknagyz/Impala/be/src/exec/hdfs-scan-node.cc:422{noformat} > At frame #7 stream is NULL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-11495) Add glibc version and effective locale to the Web UI
[ https://issues.apache.org/jira/browse/IMPALA-11495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837780#comment-17837780 ] ASF subversion and git services commented on IMPALA-11495: -- Commit 0606fc760f21587206cfb4f8256c7cd575050cf2 in impala's branch refs/heads/master from Saurabh Katiyal [ https://gitbox.apache.org/repos/asf?p=impala.git;h=0606fc760 ] IMPALA-11495: Add glibc version and effective locale to the Web UI Added a new section "Other Info" in root page for WebUI, displaying effective locale and glibc version. Change-Id: Ia69c4d63df4beae29f5261691a8dcdd04b931de7 Reviewed-on: http://gerrit.cloudera.org:8080/21252 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add glibc version and effective locale to the Web UI > > > Key: IMPALA-11495 > URL: https://issues.apache.org/jira/browse/IMPALA-11495 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Quanlong Huang >Assignee: Saurabh Katiyal >Priority: Major > Labels: newbie, observability, supportability > > When debugging utf8 mode string functions, it's essential to know the > effective Unicode version and locale. The Unicode standard version can be > deduced from the glibc version which can be got by command "ldd --version". > We need to find a programmatic way to get it. > The effective locale is already logged here: > https://github.com/apache/impala/blob/ba4cb95b6251911fa9e057cea1cb37958d339fed/be/src/common/init.cc#L406 > We just need to show it in impalad's Web UI as well. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
[ https://issues.apache.org/jira/browse/IMPALA-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837779#comment-17837779 ] ASF subversion and git services commented on IMPALA-12963: -- Commit 74ff59b9138f325fd22ce198bd01423abafd3688 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=74ff59b91 ] IMPALA-12963: Return parent PID when children spawned Returns the original PID for a command rather than any children that may be active. This happens during graceful shutdown in UBSAN tests. Also updates 'kill' to use the version of 'get_pid' that logs details to help with debugging. Moves try block in test_query_log.py to after client2 has been initialized. Removes 'drop table' on unique_database, since test suite already handles cleanup. Change-Id: I214e79507c717340863d27f68f6ea54c169e4090 Reviewed-on: http://gerrit.cloudera.org:8080/21278 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds > --- > > Key: IMPALA-12963 > URL: https://issues.apache.org/jira/browse/IMPALA-12963 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.4.0 >Reporter: Yida Wu >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.4.0 > > > Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with > following messages: > *Error Message* > {code:java} > test setup failure > {code} > *Stacktrace* > {code:java} > common/custom_cluster_test_suite.py:226: in teardown_method > impalad.wait_for_exit() > common/impala_cluster.py:471: in wait_for_exit > while self.__get_pid() is not None: > common/impala_cluster.py:414: in __get_pid > assert len(pids) < 2, "Expected single pid but found %s" % ", > ".join(map(str, pids)) > E AssertionError: Expected single pid but found 892, 31942 > {code} > *Standard Error* > {code:java} > -- 2024-03-28 04:21:44,105 INFO 
MainThread: Starting cluster with > command: > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py > '--state_store_args=--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 > --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests > --log_level=1 '--impalad_args=--enable_workload_mgmt > --query_log_write_interval_s=1 --cluster_id=test_max_select > --shutdown_grace_period_s=10 --shutdown_deadline_s=60 > --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' > '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' > --impalad_args=--default_query_options= > 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es) > 04:21:44 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 04:21:44 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 04:21:44 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:47 MainThread: Getting num_known_live_backends from > 
impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:47 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:48 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0 > 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) > 04:21:49 MainThread: Getting num_known_live_backends from > impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000 > 04:21:49 MainThread: Waiting for num_known_live_backends=3. Current value: 2 > 04:21:50 M
[jira] [Commented] (IMPALA-12350) Daemon fails to initialize large catalog
[ https://issues.apache.org/jira/browse/IMPALA-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837712#comment-17837712 ] Quanlong Huang commented on IMPALA-12350: - [~saulius.vl] Thanks for reporting this! There seem to be several issues. The size of a catalog topic message (topic delta) has a limit (configured by thrift_rpc_max_message_size) after upgrading to thrift-0.16 since Impala-4.2. When transferring the whole catalog topic to a newly added/restarted coordinator, the topic message size could hit the limit. {quote}Interestingly the catalog topic increased significantly after upgrading from 3.4.0 to 4.2.0 - from ~800mb to ~3.4gb. {quote} In 4.2.0, catalogd sends catalog updates at the partition level (enable_incremental_metadata_updates=true). In 3.4.0, it sends them at the table level. So there are more topic entries in 4.2.0. On the other hand, the compression of catalog objects is less effective in 4.2.0, since compressing a whole table saves more space than compressing its partitions individually. We recommend switching from the legacy catalog mode to the local catalog mode, so that catalog objects sent to the catalog topics will be pretty small, which will solve this issue. To turn on local catalog mode, set use_local_catalog=true on all coordinators and set catalog_topic_mode=minimal on catalogd. Sorry for the late reply. Any feedback will be appreciated! 
> Daemon fails to initialize large catalog > > > Key: IMPALA-12350 > URL: https://issues.apache.org/jira/browse/IMPALA-12350 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.2.0 >Reporter: Saulius Valatka >Priority: Major > > When the statestored catalog topic is large enough (>2gb) daemons fail to > restart and get stuck in a loop: > {{I0808 13:07:17.702653 3633556 Frontend.java:1618] Waiting for local catalog > to be initialized, attempt: 2068}} > > The statestored reports errors as follows: > {{I0808 13:07:05.587296 2134270 thrift-util.cc:196] TSocket::write_partial() > send() : Broken pipe}} > {{I0808 13:07:05.587356 2134270 client-cache.h:362] RPC Error: Client for > gs1-hdp-data70:23000 hit an unexpected exception: write() send(): Broken > pipe, type: N6apache6thrift9transport19TTransportExceptionE, rpc: > N6impala20TUpdateStateResponseE, send: not done}} > {{I0808 13:07:05.587365 2134270 client-cache.cc:174] Broken Connection, > destroy client for gs1-hdp-data70:23000}} > > If this happens we are forced to restart statestore and thus the whole > cluster, meaning that we can't tolerate failure from even a single daemon. > Interestingly the catalog topic increased significantly after upgrading from > 3.4.0 to 4.2.0 - from ~800mb to ~3.4gb. Invalidate/refresh operations also > became significantly slower (~10ms -> 5s). > Probably related to thrift_rpc_max_message_size? but I see the maximum value > is 2gb. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
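Quanlong's point that per-partition compression is less effective than whole-table compression is a general property of block compressors: redundancy that spans chunk boundaries can only be exploited when the chunks are compressed together. A toy demonstration using plain java.util.zip (nothing Impala-specific; the "partition metadata" strings are made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class CompressionGranularity {
  // Size in bytes of the gzip-compressed form of 'data'.
  static int gzipSize(byte[] data) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    }
    return bos.size();
  }

  /** Returns {size of gzip(whole blob), total size of gzip(each chunk)}. */
  public static int[] compare(byte[] whole, int numChunks) throws Exception {
    int chunkLen = whole.length / numChunks;
    int perChunkTotal = 0;
    for (int i = 0; i < numChunks; i++) {
      // Last chunk absorbs any remainder so every byte is counted once.
      int end = (i == numChunks - 1) ? whole.length : (i + 1) * chunkLen;
      perChunkTotal += gzipSize(Arrays.copyOfRange(whole, i * chunkLen, end));
    }
    return new int[] { gzipSize(whole), perChunkTotal };
  }

  public static void main(String[] args) throws Exception {
    // 200 near-identical "partition" entries: the redundancy crosses chunk
    // boundaries, so compressing the whole blob wins by a wide margin.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200; i++) {
      sb.append("partition metadata: hdfs://nn/warehouse/tbl/p=").append(i).append('\n');
    }
    int[] sizes = compare(sb.toString().getBytes(), 200);
    System.out.println("whole=" + sizes[0] + " bytes, chunked=" + sizes[1] + " bytes");
  }
}
```

Each separately compressed chunk also pays fixed gzip header/trailer overhead, which is exactly the per-entry overhead a partition-granularity catalog topic incurs.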
[jira] [Assigned] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet
[ https://issues.apache.org/jira/browse/IMPALA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noemi Pap-Takacs reassigned IMPALA-13006: - Assignee: Noemi Pap-Takacs (was: Daniel Becker) > Some Iceberg test tables are not restricted to Parquet > -- > > Key: IMPALA-13006 > URL: https://issues.apache.org/jira/browse/IMPALA-13006 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Noemi Pap-Takacs >Priority: Major > Labels: impala-iceberg > > Our Iceberg test tables/views are restricted to the Parquet file format in > functional/schema_constraints.csv except for the following two: > {code:java} > iceberg_query_metadata > iceberg_view{code} > This is not intentional, so we should add the constraint for these tables too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet
Daniel Becker created IMPALA-13006: -- Summary: Some Iceberg test tables are not restricted to Parquet Key: IMPALA-13006 URL: https://issues.apache.org/jira/browse/IMPALA-13006 Project: IMPALA Issue Type: Bug Reporter: Daniel Becker Assignee: Daniel Becker Our Iceberg test tables/views are restricted to the Parquet file format in functional/schema_constraints.csv except for the following two: {code:java} iceberg_query_metadata iceberg_view{code} This is not intentional, so we should add the constraint for these tables too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12997) test_query_log tests get stuck trying to write to the log
[ https://issues.apache.org/jira/browse/IMPALA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837620#comment-17837620 ] Zoltán Borók-Nagy commented on IMPALA-12997: Under the hood Iceberg uses HMS locks for its transactions (if the table is stored in the HiveCatalog): * [https://github.com/apache/iceberg/blob/fc5b2b336c774b0b8b032f7d87a1fb21e76b3f20/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L182] * [https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java] These transactions should be very fast, as they usually just set the table property 'metadata_location' (and 'previous_metadata_location'). Normally the lock is cleaned up at the end of the operation. If the process dies before it can free its locks, HMS will free them up after some time (due to lack of heartbeating). The timeout should be 5 minutes by default: [https://github.com/apache/hive/blob/f06cc2920424817da6405e0efe268ce6cd64a363/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1642] But I have also seen cases where it took much longer than that. 
> test_query_log tests get stuck trying to write to the log > - > > Key: IMPALA-12997 > URL: https://issues.apache.org/jira/browse/IMPALA-12997 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 4.4.0 >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > In some test runs, most tests under test_query_log will start to fail on > various conditions like > {code} > custom_cluster/test_query_log.py:452: in > test_query_log_table_query_select_mt_dop > "impala-server.completed-queries.written", 1, 60) > common/impala_service.py:144: in wait_for_metric_value > self.__metric_timeout_assert(metric_name, expected_value, timeout) > common/impala_service.py:213: in __metric_timeout_assert > assert 0, assert_string > E AssertionError: Metric impala-server.completed-queries.written did not > reach value 1 in 60s. > E Dumping debug webpages in JSON format... > E Dumped memz JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/memz.json > E Dumped metrics JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/metrics.json > E Dumped queries JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/queries.json > E Dumped sessions JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/sessions.json > E Dumped threadz JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/threadz.json > E Dumped rpcz JSON to > $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/rpcz.json > E Dumping minidumps for impalads/catalogds... 
> E Dumped minidump for Impalad PID 3680802 > E Dumped minidump for Impalad PID 3680805 > E Dumped minidump for Impalad PID 3680809 > E Dumped minidump for Catalogd PID 3680732 > {code} > or > {code} > custom_cluster/test_query_log.py:921: in test_query_log_ignored_sqls > assert len(sql_results.data) == 1, "query not found in completed queries > table" > E AssertionError: query not found in completed queries table > E assert 0 == 1 > E+ where 0 = len([]) > E+where [] = object at 0xa00cc350>.data > {code} > One symptom that seems related to this is INSERT operations into > sys.impala_query_log that start "UnregisterQuery()" but never finish (with > "Query successfully unregistered"). > We can identify cases like that with > {code} > for log in $(ag -l 'INSERT INTO sys.impala_query_log' impalad.*); do echo > $log; for qid in $(ag -o '[0-9a-f]*:[0-9a-f]*\] Analyzing query: INSERT INTO > sys.impala_query_log' $log | cut -d']' -f1); do if ! ag "Query successfully > unregistered: query_id=$qid" $log; then echo "$qid not unregistered"; fi; > done; done > {code} > A similar case may occur with creating the table too > {code} > for log in $(ag -l 'CREATE TABLE IF NOT EXISTS sys.impala_query_log' > impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-*); > do QID=$(ag -o '[0-9a-f]*:[0-9a-f]*\] Analyzing query: INSERT INTO > sys.impala_query_log' $log | cut -d']' -f1); echo $log; ag "Query > successfully unregistered: query_id=$QID" $log; done > {code} > although these frequently fail because the test completes and shuts down > Impala before the CREATE TABLE query completes. > Tracking one of those cases led to catalogd errors that repeated for 1m27s > before the test suite restarted catalogd: > {code} > W0410 12:48:05.051760 3681790 Tasks.java:456] > 6647229faf7637d5:3ec7565b] Retrying task after failure:
[jira] [Updated] (IMPALA-12979) Wildcard in CLASSPATH might not work in the RPM package
[ https://issues.apache.org/jira/browse/IMPALA-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12979: Affects Version/s: Impala 3.4.2 > Wildcard in CLASSPATH might not work in the RPM package > --- > > Key: IMPALA-12979 > URL: https://issues.apache.org/jira/browse/IMPALA-12979 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.2 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 3.4.2 > > > I tried deploying the RPM package of Impala-3.4.2 (commit 8e9c5a5) on CentOS > 7.9 and found launching catalogd failed with the following error (in > catalogd.INFO): > {noformat} > Wrote minidump to > /var/log/impala-minidumps/catalogd/5e3c8819-0593-4943-555addbc-665470ad.dmp > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x02baf14c, pid=156082, tid=0x7fec0dce59c0 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_141-b15) (build > 1.8.0_141-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [catalogd+0x27af14c] > llvm::SCEVAddRecExpr::getNumIterationsInRange(llvm::ConstantRange const&, > llvm::ScalarEvolution&) const+0x73c > # > # Core dump written. Default location: /opt/impala/core or core.156082 > # > # An error report file with more information is saved as: > # /tmp/hs_err_pid156082.log > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. 
> # {noformat} > There are other logs in catalogd.ERROR > {noformat} > Log file created at: 2024/04/08 04:49:28 > Running on machine: ccycloud-1.quanlong.root.comops.site > Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg > E0408 04:49:28.979386 158187 logging.cc:146] stderr will be logged to this > file. > Wrote minidump to > /var/log/impala-minidumps/catalogd/6c3f550c-be96-4a5b-61171aac-0de15155.dmp > could not find method getRootCauseMessage from class (null) with signature > (Ljava/lang/Throwable;)Ljava/lang/String; > could not find method getStackTrace from class (null) with signature > (Ljava/lang/Throwable;)Ljava/lang/String; > FileSystem: loadFileSystems failed error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError){noformat} > Resolving the minidump shows me the following stacktrace: > {noformat} > (gdb) bt > #0 0x02baf14c in ?? () > #1 0x02baee24 in getJNIEnv () > #2 0x02bacb71 in hdfsBuilderConnect () > #3 0x012e6ae2 in impala::JniUtil::InitLibhdfs() () > #4 0x012e7897 in impala::JniUtil::Init() () > #5 0x00be9297 in impala::InitCommonRuntime(int, char**, bool, > impala::TestInfo::Mode) () > #6 0x00bb604a in CatalogdMain(int, char**) () > #7 0x00b33f97 in main (){noformat} > It indicates something went wrong while initializing the JVM. Here are the env vars: > {noformat} > Environment Variables: > JAVA_HOME=/usr/java/jdk1.8.0_141 > CLASSPATH=/opt/impala/conf:/opt/impala/jar/* > PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin > LD_LIBRARY_PATH=/opt/impala/lib/:/usr/java/jdk1.8.0_141/jre/lib/amd64/server:/usr/java/jdk1.8.0_141/jre/lib/amd64 > SHELL=/bin/bash{noformat} > We use the wildcard "*" in the classpath, which seems to be the cause. The issue > was resolved after using explicit paths in the classpath. 
Here is what I > changed in bin/impala-env.sh: > {code:bash} > #export CLASSPATH="/opt/impala/conf:/opt/impala/jar/*" > CLASSPATH=/opt/impala/conf > for jar in /opt/impala/jar/*.jar; do > CLASSPATH="$CLASSPATH:$jar" > done > export CLASSPATH > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
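The workaround above expands the jar glob into explicit classpath entries before the JVM starts; a JVM created via JNI (as catalogd does) is not guaranteed to expand `*` entries the way the `java` launcher does. A hypothetical Python equivalent of that bash loop (`expand_classpath` is an invented helper name, not part of Impala):

```python
import glob
import os


def expand_classpath(conf_dir, jar_dir):
    """Build a CLASSPATH string listing every jar explicitly, mirroring the
    bash loop in bin/impala-env.sh. Sorting makes the result deterministic;
    the bash glob is also expanded in sorted order by the shell."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return ":".join([conf_dir] + jars)
```

For example, with jars `a.jar` and `b.jar` in `/opt/impala/jar`, this yields `/opt/impala/conf:/opt/impala/jar/a.jar:/opt/impala/jar/b.jar`, which the embedded JVM can consume without any wildcard expansion.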
[jira] [Resolved] (IMPALA-8778) Support read Apache Hudi Read Optimized tables
[ https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-8778. Fix Version/s: Impala 3.4.0 Resolution: Implemented > Support read Apache Hudi Read Optimized tables > -- > > Key: IMPALA-8778 > URL: https://issues.apache.org/jira/browse/IMPALA-8778 > Project: IMPALA > Issue Type: New Feature >Reporter: Yuanbin Cheng >Assignee: Yanjia Gary Li >Priority: Major > Fix For: Impala 3.4.0 > > > Apache Impala currently does not support Apache Hudi, and cannot even pull metadata > from Hive. > Related issue: > [https://github.com/apache/incubator-hudi/issues/179] > [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues] > 
[jira] [Reopened] (IMPALA-8778) Support read Apache Hudi Read Optimized tables
[ https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reopened IMPALA-8778: > Support read Apache Hudi Read Optimized tables > -- > > Key: IMPALA-8778 > URL: https://issues.apache.org/jira/browse/IMPALA-8778 > Project: IMPALA > Issue Type: New Feature >Reporter: Yuanbin Cheng >Assignee: Yanjia Gary Li >Priority: Major > > Apache Impala currently does not support Apache Hudi, and cannot even pull metadata > from Hive. > Related issue: > [https://github.com/apache/incubator-hudi/issues/179] > [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues] > 
[jira] [Commented] (IMPALA-12997) test_query_log tests get stuck trying to write to the log
[ https://issues.apache.org/jira/browse/IMPALA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837540#comment-17837540 ] Quanlong Huang commented on IMPALA-12997: - It happens while committing the Iceberg transaction. At first glance, I thought it was due to too many concurrent INSERTs into the sys.impala_query_log table. However, while looking into the logs, I saw they all happened in custom-cluster tests, so it is not a concurrency issue. I found it happened in consecutive custom-cluster tests and then recovered at some point after 1h. E.g., in one of the occurrences, the WaitingForLockException log message occurs in the following catalogd.INFO files: {noformat}
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-124753.3680732
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-124935.3682321
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-125217.3683855
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-125529.3685531
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-125838.3687542
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-130107.3689427
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-130450.3691640
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-130759.3693690
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-131108.3695684
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-131417.3697665
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-131726.3699674
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132038.3701683
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132346.3703644
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132555.3705491
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132804.3707310
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-133113.3709349
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-133425.3711438
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-133738.3713509
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-134047.3715490
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-134359.3717486{noformat} They are consecutive custom-cluster tests (based on the timestamps in the filenames). All other custom-cluster tests before or after them are fine. [~boroknagyz] Is it possible that an Iceberg table was in a locked state and got recovered after a timeout of 1h? 
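The "consecutive runs" observation above can be checked mechanically. A hypothetical helper (function names are invented) that extracts the start timestamp glog embeds in each INFO log filename and measures the gaps between consecutive runs:

```python
import re
from datetime import datetime


def log_timestamps(filenames):
    """Extract the YYYYMMDD-HHMMSS start timestamp from glog-style INFO log
    filenames, preserving input order."""
    stamps = []
    for name in filenames:
        m = re.search(r"\.INFO\.(\d{8})-(\d{6})\.", name)
        if m:
            stamps.append(datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S"))
    return stamps


def max_gap_minutes(stamps):
    """Largest gap between consecutive runs, in minutes. A small bound
    suggests back-to-back custom-cluster tests rather than isolated failures."""
    gaps = [(b - a).total_seconds() / 60.0 for a, b in zip(stamps, stamps[1:])]
    return max(gaps) if gaps else 0.0
```

Applied to the list above, every gap is a few minutes, consistent with one custom-cluster test restarting catalogd right after another.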