[jira] [Comment Edited] (IMPALA-12489) Error when scan kudu-1.17.0

2024-04-16 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837958#comment-17837958
 ] 

Alexey Serbin edited comment on IMPALA-12489 at 4/17/24 4:49 AM:
-

[~MadBeeDo],

It turned out the root cause was a bug in Kudu.  [~achenn...@cloudera.com] and 
I were able to reproduce it with {{kudu table scan}}, so the issue has been 
localized.

The fix is available in Kudu's 
[master|https://github.com/apache/kudu/commit/946acb711d722b1e6fe27af2c7de92960d724980]
 and 
[1.17.x|https://github.com/apache/kudu/commit/0de168f7e0abcf0c29facefcc9c0c9e12b284140]
 branches.  Follow-up patches will introduce test scenarios that reproduce the 
issue with a minimal amount of data, to catch any future regressions.

You can find more details in 
[KUDU-3518|https://issues.apache.org/jira/browse/KUDU-3518].

Thank you for reporting the issue!


was (Author: aserbin):
[~MadBeeDo],

It turned out the root cause was a bug in Kudu.  [~achenn...@cloudera.com] and 
I were able to reproduce it with {{kudu table scan}}, so the issue has been 
localized.

The fix is available in Kudu's main and 1.17.x branches.  Follow-up patches are 
going to introduce test scenario(s) to reproduce the issue with minimum amount 
of data to catch future regressions, if any.

You could find more details in 
[KUDU-3518|https://issues.apache.org/jira/browse/KUDU-3518].

Thank you for reporting the issue!

> Error when scan kudu-1.17.0
> ---
>
> Key: IMPALA-12489
> URL: https://issues.apache.org/jira/browse/IMPALA-12489
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, be
>Affects Versions: Impala 4.3.0
> Environment: centos7.9
>Reporter: Pain Sun
>Priority: Major
>  Labels: scankudu
>
> Scanning Kudu with impala-4.3.0, there is a bug when reading a table with an 
> empty string in a primary key field.
> sql:
> select
>     count(distinct thirdnick)
> from
>     member.qyexternaluserdetailinfo_new
> where
>     (
>         mainshopnick = "xxx"
>         and ownercorpid in ("xxx", "")
>         and shoptype not in ("35", "56")
>         and isDelete = 0
>         and thirdnick != ""
>         and thirdnick is not null
>     );
>  
> error:ERROR: Unable to open scanner for node with id '1' for Kudu table 
> 'impala::member.qyexternaluserdetailinfo_new': Invalid argument: No such 
> column: shopnick
>  
> If the SQL is updated like this:
> select
>     count(distinct thirdnick)
> from
>     member.qyexternaluserdetailinfo_new
> where
>     (
>         mainshopnick = "xxx"
>         and ownercorpid in ("xxx", "")
>         and shopnick not in ('')
>         and shoptype not in ("35", "56")
>         and isDelete = 0
>         and thirdnick != ""
>         and thirdnick is not null
>     );
> no error.
>  
> This error appears in kudu-1.17.0, but kudu-1.16.0 works fine.
>  
> There are 100 rows in this table; 28 of them contain an empty string.
>  
> The CREATE TABLE statement:
> CREATE TABLE member.qyexternaluserdetailinfo_new (
>   mainshopnick STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   shopnick STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   ownercorpid STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   shoptype STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   clientid STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   thirdnick STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   receivermobile STRING NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   thirdrealname STRING NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   remark STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>   createtime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   updatetime TIMESTAMP NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   isdelete INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION 
> DEFAULT 0,
>   buyernick STRING NULL ENCODING AUTO_ENCODING COMPRESSION 
> DEFAULT_COMPRESSION,
>   PRIMARY KEY (
>     mainshopnick,
>     shopnick,
>     ownercorpid,
>     shoptype,
>     clientid,
>     thirdnick,
>     id
>   )
> ) PARTITION BY HASH (
>   mainshopnick,
>   shopnick,
>   ownercorpid,
>   shoptype,
>   clientid,
>   thirdnick,
>   id
> ) PARTITIONS 10 STORED AS KUDU TBLPROPERTIES (
>   'kudu.master_addresses' = '192.168.134.132,192.168.134.133,192.168.134.134',
>   'kudu.num_tablet_replicas' = '1'
> );
> table schema like this:
> {+}---{-}{-}{+}-{-}++{-}---{-}{-}---{-}++{-}--{-}{-}---

[jira] [Commented] (IMPALA-12489) Error when scan kudu-1.17.0

2024-04-16 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837958#comment-17837958
 ] 

Alexey Serbin commented on IMPALA-12489:


[~MadBeeDo],

It turned out the root cause was a bug in Kudu.  [~achenn...@cloudera.com] and 
I were able to reproduce it with {{kudu table scan}}, so the issue has been 
localized.

The fix is available in Kudu's main and 1.17.x branches.  Follow-up patches are 
going to introduce test scenario(s) to reproduce the issue with minimum amount 
of data to catch future regressions, if any.

You could find more details in 
[KUDU-3518|https://issues.apache.org/jira/browse/KUDU-3518].

Thank you for reporting the issue!


[jira] [Resolved] (IMPALA-12489) Error when scan kudu-1.17.0

2024-04-16 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin resolved IMPALA-12489.

Resolution: Duplicate

> Error when scan kudu-1.17.0
> ---
>
> Key: IMPALA-12489
> URL: https://issues.apache.org/jira/browse/IMPALA-12489
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, be
>Affects Versions: Impala 4.3.0
> Environment: centos7.9
>Reporter: Pain Sun
>Priority: Major
>  Labels: scankudu
>
> table schema like this:
> {+}---{-}{-}{+}-{-}++{-}---{-}{-}---{-}++{-}--{-}{-}{-}++{-}-{-}{-}-{-}++{-}---{-}{-}---+
> |name          |type      
> |comment|primary_key|key_unique|nullable|default_value|encoding      
> |compression        |block_size|
> {+}---{-}{-}{+}-{-}++{-}---{-}{-}---{-}++{-}--{-}{-}{-}++{-}-{-}{-}-{-}++{-}---{-}{-}---+
> |mainshopnick  |string    |       |true        |true      |false    |         
>     |AUTO_ENCODING|DEFAULT_COMPRESSION|0          |
> |shopnick      |string    |       |true        |true      |false    |         
>     |AUTO_ENCODING|DEFAULT_COMPRESSION|0          |
> |ownercorpid    |string    |       |true        |true      |false    |        
>      |AUTO_ENCODING|DEFAULT_COMPRESSION|0          |
> |shoptype      |string    |       |true        |true      |false    |         
>     |AUTO_ENCODING|DEFAULT_COMPRESSION|0          |
> |clientid      |string    |       |true        |true      |false    |         
>     |AUTO_ENCODING|DEFAULT_COMPRESSION|0          |
> |thirdnick      |string    |       |true        |true      |false    |        
>      |AUTO_ENCODING|DEFAULT_COMPRESSION|0          |
> |id            |bigint    |       |true        |true      |false    |         
>     

[jira] [Commented] (IMPALA-12443) Add catalog timeline for all DDL profiles

2024-04-16 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837912#comment-17837912
 ] 

Quanlong Huang commented on IMPALA-12443:
-

The original commit is

commit d7b5819f906c19281fdf9594ecf4616f6c9f92a0
Author: stiga-huang 
Date:   Wed Aug 30 18:47:55 2023 +0800

IMPALA-12443: Add catalog timeline for all DDL profiles

This is a follow-up work of IMPALA-12024 where we add the catalog
timeline for CreateTable statements. Using the same mechanism, this
patch adds catalog timeline for all DDL/DML profiles, including
REFRESH and INSERT.

The goal is to add timeline markers after each step that could be
blocked, e.g. acquiring locks, external RPCs. So we can better debug
slow DDLs with the catalog timeline in profiles.

Tried to add some constant strings for widely used events, e.g. "Fetched
table from Metastore". Didn't do so for events that only occur once.

Most of the catalog methods now have a new argument for tracking the
execution timeline. To avoid adding null checks everywhere, code paths
that don't need a catalog profile (e.g. EventProcessor) use a static
no-op EventSequence as the argument. We can replace it in future work,
e.g. to expose the execution timeline of slow processing of an HMS
event.

This patch also removes some unused overloads of HdfsTable#load() and
HdfsTable#reloadPartitionsFromNames().
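The static no-op EventSequence mechanism described above can be sketched as follows; the class and method names are illustrative Python, not Impala's actual Java API:

```python
import time

class EventSequence:
    """Minimal sketch of a catalog-operation timeline (names are assumptions)."""

    def __init__(self):
        self.events = []                     # (label, seconds since start)
        self._start = time.monotonic()

    def mark_event(self, label):
        self.events.append((label, time.monotonic() - self._start))

class NoopEventSequence(EventSequence):
    """Swallows events on code paths that don't need a catalog profile."""

    def mark_event(self, label):
        pass

# Shared static instance, so callers never pass (or null-check) None.
NOOP_EVENT_SEQUENCE = NoopEventSequence()

def refresh_table(table_name, timeline=NOOP_EVENT_SEQUENCE):
    """Hypothetical catalog method threading the timeline through each step."""
    timeline.mark_event("Got Metastore client")
    timeline.mark_event("Fetched table from Metastore")
    return table_name
```

A DDL code path passes a real EventSequence and later renders it into the query profile; paths like EventProcessor simply rely on the default no-op argument.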

Example timeline for a REFRESH statement on an unloaded table
(IncompleteTable):
Catalog Server Operation: 2s300ms
   - Got catalog version read lock: 26.407us (26.407us)
   - Start loading table: 314.663us (288.256us)
   - Got Metastore client: 629.599us (314.936us)
   - Fetched table from Metastore: 7.248ms (6.618ms)
   - Loaded table schema: 27.947ms (20.699ms)
   - Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
   - Got access level: 1s514ms (588.314us)
   - Created partition builders: 2s103ms (588.270ms)
   - Start loading file metadata: 2s103ms (49.760us)
   - Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
   - Async loaded table: 2s289ms (6.931ms)
   - Loaded table from scratch: 2s289ms (72.038us)
   - Got table read lock: 2s289ms (2.289us)
   - Finished resetMetadata request: 2s300ms (10.188ms)

Example timeline for an INSERT statement:
Catalog Server Operation: 178.120ms
   - Got catalog version read lock: 4.238us (4.238us)
   - Got catalog version write lock and table write lock: 52.768us 
(48.530us)
   - Got Metastore client: 15.768ms (15.715ms)
   - Fired Metastore events: 156.650ms (140.882ms)
   - Got Metastore client: 163.317ms (6.666ms)
   - Fetched table from Metastore: 166.561ms (3.244ms)
   - Start refreshing file metadata: 167.961ms (1.399ms)
   - Loaded file metadata for 24 partitions: 177.679ms (9.717ms)
   - Reloaded table metadata: 178.021ms (342.261us)
   - Finished updateCatalog request: 178.120ms (98.929us)

Example timeline for a "COMPUTE STATS tpcds_parquet.store_sales":
Catalog Server Operation: 6s737ms
   - Got catalog version read lock: 19.971us (19.971us)
   - Got catalog version write lock and table write lock: 50.255us 
(30.284us)
   - Got Metastore client: 171.819us (121.564us)
   - Updated column stats: 25.560ms (25.388ms)
   - Got Metastore client: 69.298ms (43.738ms)
   - Altered 500 partitions in Metastore: 1s894ms (1s825ms)
   - Altered 1000 partitions in Metastore: 3s558ms (1s664ms)
   - Altered 1500 partitions in Metastore: 5s144ms (1s586ms)
   - Altered 1824 partitions in Metastore: 6s205ms (1s060ms)
   - Got Metastore client: 6s205ms (329.481us)
   - Altered table in Metastore: 6s216ms (11.073ms)
   - Got Metastore client: 6s216ms (13.377us)
   - Fetched table from Metastore: 6s219ms (2.419ms)
   - Loaded table schema: 6s223ms (4.130ms)
   - Got current Metastore event id 19017: 6s639ms (415.690ms)
   - Start loading file metadata: 6s639ms (9.591us)
   - Loaded file metadata for 1824 partitions: 6s729ms (90.196ms)
   - Reloaded table metadata: 6s735ms (5.865ms)
   - DDL finished: 6s737ms (2.255ms)

Example timeline for a global INVALIDATE METADATA:
Catalog Server Operation: 301.618ms
   - Got catalog version write lock: 9.908ms (9.908ms)
   - Got Metastore client: 9.922ms (14.013us)
   - Got database list: 11.396ms (1.473ms)
   - Loaded functions of default: 44.919ms (33.523ms)
   - Loaded TableMeta of 82 tables in database default: 47.524ms (2.604ms)
   - Loaded functions of functional: 50.846ms (3.321ms)
   - Loaded TableMeta of 101 tables in database functional: 52.580ms 
(1.734ms)
   - Loaded functions of functional_avro: 54.861ms (2.281ms)

[jira] [Updated] (IMPALA-12737) Include List of Referenced Columns in Query Log Table

2024-04-16 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-12737:
---
Issue Type: Improvement  (was: Bug)

> Include List of Referenced Columns in Query Log Table
> -
>
> Key: IMPALA-12737
> URL: https://issues.apache.org/jira/browse/IMPALA-12737
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Manish Maheshwari
>Assignee: Michael Smith
>Priority: Major
>  Labels: workload-management
>
> In the Impala query log table where completed queries are stored, add lists 
> of columns that were referenced in the query. The purpose behind this 
> functionality is to know which columns are part of 
>  * Select clause
>  * Where clause
>  * Join clause
>  * Aggregate clause
>  * Order by clause
> There should be a column for each type of clause, so that decisions can be 
> made based on specific usage or on the union of those clauses.
> With this information, we can feed the COMPUTE STATS command to collect 
> stats only on the columns that are used in joins, filters, and aggregates, 
> rather than on all the table columns.
> The information can be collected as an array of 
> [db1.table1.column1,db1.table1.column2]
>  
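The per-clause bookkeeping proposed above can be sketched as follows; the clause names and the array-of-qualified-names format come from the description, while the function itself is hypothetical:

```python
def columns_by_clause(clause_refs):
    """Group fully-qualified column names by the clause that referenced them.

    `clause_refs` maps a clause kind ("select", "where", "join", "aggregate",
    "order_by") to the columns it touched. The result de-duplicates each
    list and also computes the union, mirroring the per-clause columns
    proposed for the query log table.
    """
    grouped = {kind: sorted(set(cols)) for kind, cols in clause_refs.items()}
    grouped["all"] = sorted({c for cols in clause_refs.values() for c in cols})
    return grouped
```

A stats pipeline could then pass `grouped["join"] + grouped["where"] + grouped["aggregate"]` to COMPUTE STATS instead of the full column list.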



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8042) Better selectivity estimate for BETWEEN

2024-04-16 Thread David Rorke (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837895#comment-17837895
 ] 

David Rorke commented on IMPALA-8042:
-

There are specific cases of BETWEEN expressions where we should be able to 
make a much more accurate selectivity estimate, in particular for date columns 
(and maybe other column types with discrete values), if we know or strongly 
suspect the values are all unique (or at least have very high NDV) and there 
are few "missing" values. In cases like this we might simply assume that the 
number of rows selected is the number of possible distinct values in the 
range.  This can be wrong in a couple of cases:
 * Duplicate values. So we should only apply this when we suspect uniqueness or 
something close to uniqueness (very high NDV relative to the total row count).
 * Missing values (again we can probably use some NDV-based heuristics to make 
a good guess about whether most of the possible values are populated).

Even with some possible inaccuracy from these factors, it's likely we can be 
much more accurate using this approach in high-NDV situations than with the 
current 10 percent selectivity guess.
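Under the stated assumptions (near-unique date values, few gaps), the estimate described above can be sketched as follows; the uniqueness threshold and fallback constant are illustrative, not Impala's actual planner code:

```python
from datetime import date

FALLBACK_SELECTIVITY = 0.1   # the current flat guess for such predicates

def estimate_between_rows(low, high, ndv, row_count):
    """Estimate rows selected by `col BETWEEN low AND high` on a date column.

    When the column looks nearly unique (NDV close to the row count), assume
    roughly one row per distinct date in the range; otherwise fall back to
    the flat 10% selectivity guess.
    """
    if row_count > 0 and ndv >= 0.9 * row_count:   # uniqueness heuristic
        days_in_range = (high - low).days + 1
        return min(max(days_in_range, 0), row_count)
    return int(FALLBACK_SELECTIVITY * row_count)
```

For a ten-year daily fact table, a one-month BETWEEN range would be estimated at ~31 rows instead of 10% of the table.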

> Better selectivity estimate for BETWEEN
> ---
>
> Key: IMPALA-8042
> URL: https://issues.apache.org/jira/browse/IMPALA-8042
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> The analyzer rewrites a BETWEEN expression into a pair of inequalities.  
> IMPALA-8037 explains that the planner then groups all such non-equality 
> conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains 
> that the analyzer should handle inequalities better.
> BETWEEN is a special case and informs the final result. If we assume a 
> selectivity of s for inequality, then BETWEEN should be something like s/2. 
> The intuition is that if c >= x includes, say, ⅓ of values, and c <= y 
> includes a third of values, then c BETWEEN x AND y should be a narrower set 
> of values, say ⅙.
> [Ramakrishnan and 
> Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html]
>  recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the 
> general expression x <= c AND c <= y. Note the discrepancy between the 
> compound inequality case and the BETWEEN case, likely reflecting the 
> additional information we obtain when the user chooses to use BETWEEN.
> To implement a special BETWEEN selectivity in Impala, we must remember the 
> selectivity of BETWEEN during the rewrite to a compound inequality.
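The arithmetic quoted from Ramakrishnan and Gehrke, collected in one place as a sketch:

```python
# Textbook default selectivities (Ramakrishnan & Gehrke cost formulas).
S_INEQUALITY = 0.3   # c <= y (or c >= x)
S_BETWEEN = 0.4      # c BETWEEN x AND y, kept as a single predicate

# Rewriting BETWEEN into x <= c AND c <= y and treating the two
# inequalities as independent gives a much lower combined selectivity.
s_compound = S_INEQUALITY * S_INEQUALITY   # 0.3^2 = 0.09
```

The gap between 0.4 and 0.09 is the information lost in the rewrite, which is why the rewrite should carry BETWEEN's own selectivity along.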






[jira] [Updated] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException

2024-04-16 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13003:
---
Labels: iceberg  (was: )

> Server exits early failing to create impala_query_log with 
> AlreadyExistsException
> -
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>  Labels: iceberg
>
> At startup workload management tries to create the query log table here:
> {code:java}
>   // The initialization code only works when run in a separate thread for 
> reasons unknown.
>   ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name));
> {code}
> This code is exiting:
> {code:java}
> I0413 23:40:05.183876 21006 client-request-state.cc:1348] 
> 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making 
> 'createTable' RPC to Hive Metastore:
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection 
> 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server 
> internal-server closed. The connection had 1 associated session(s).
> I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: 
> 27432606d99dcdae:218860164eb206bb
> I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: 
> 27432606d99dcdae:218860164eb206bb, client address: .
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully 
> unregistered: query_id=1d4878dbc9214c81:6dc8cc2e
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> Wrote minidump to 
> /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp
> {code}
> with stack
> {code:java}
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> *** Check failure stack trace: ***
> @  0x8e96a4d  google::LogMessage::Fail()
> @  0x8e98984  google::LogMessage::SendToLog()
> @  0x8e9642c  google::LogMessage::Flush()
> @  0x8e98ea9  google::LogMessageFatal::~LogMessageFatal()
> @  0x3da3a9a  impala::ImpalaServer::CompletedQueriesThread()
> @  0x3a8df93  boost::_mfi::mf0<>::operator()()
> @  0x3a8de97  boost::_bi::list1<>::operator()<>()
> @  0x3a8dd77  boost::_bi::bind_t<>::operator()()
> @  0x3a8d672  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x301e7d0  boost::function0<>::operator()()
> @  0x43ce415  impala::Thread::SuperviseThread()
> @  0x43e2dc7  boost::_bi::list5<>::operator()<>()
> @  0x43e29e7  boost::_bi::bind_t<>::operator()()
> @  0x43e21c5  boost::detail::thread_data<>::run()
> @  0x7984c37  thread_proxy
> @ 0x7f75b6982ea5  start_thread
> @ 0x7f75b36a7b0d  __clone
> Picked up JAVA_TOOL_OPTIONS: 
> -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n   
> -Dsun.java.command=impalad
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> {code}
> I think the key error is 
> {code}
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> {code}
> which suggests that creating the table with "if not exists" is not sufficient 
> to protect against concurrent creations.
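One way to make startup robust to this race, sketched in Python with hypothetical helpers (Impala's actual fix may differ): treat AlreadyExistsException as success when the table is visible afterwards, instead of aborting the server.

```python
class AlreadyExistsException(Exception):
    """Stand-in for the HMS 'createTable' thrift exception."""

def ensure_table(create_fn, exists_fn, attempts=3):
    """Create a table, tolerating a concurrent creation by another daemon.

    `create_fn` issues CREATE TABLE IF NOT EXISTS; `exists_fn` re-checks the
    metastore. Both are hypothetical callables, not Impala APIs. Losing the
    creation race is treated as success rather than a fatal startup error.
    """
    for _ in range(attempts):
        try:
            create_fn()
            return True
        except AlreadyExistsException:
            # Another coordinator won the race; the table existing is the
            # desired outcome, so accept it instead of aborting.
            if exists_fn():
                return True
    return exists_fn()
```

The key design choice is that "someone else created it first" and "we created it" are equivalent end states for workload management.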






[jira] [Updated] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException

2024-04-16 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13003:
---
Labels: iceberg workload-management  (was: iceberg)

> Server exits early failing to create impala_query_log with 
> AlreadyExistsException
> -
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>  Labels: iceberg, workload-management






[jira] [Created] (IMPALA-13008) test_metadata_tables failed in Ubuntu 20 build

2024-04-16 Thread Jira
Zoltán Borók-Nagy created IMPALA-13008:
--

 Summary: test_metadata_tables failed in Ubuntu 20 build
 Key: IMPALA-13008
 URL: https://issues.apache.org/jira/browse/IMPALA-13008
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy
Assignee: Daniel Becker


test_metadata_tables failed in an Ubuntu 20 release test build:
* 
https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-dockerised-tests%5E1642%5E.log
* 
https://jenkins.impala.io/job/parallel-all-tests-ub2004/1059/artifact/https_%5E%5Ejenkins.impala.io%5Ejob%5Eubuntu-20.04-from-scratch%5E2363%5E.log
h2. Error

{noformat}
E   assert Comparing QueryTestResults (expected vs actual):
E 
'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"1","total-files-size":"351","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
 != 
'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"1","total-files-size":"350","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
E 
'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"2","total-files-size":"702","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
 != 
'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"2","total-files-size":"700","total-data-files":"2","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
E 
'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"351","changed-partition-count":"1","total-records":"3","total-files-size":"1053","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
 != 
'append',true,'{"added-data-files":"1","added-records":"1","added-files-size":"350","changed-partition-count":"1","total-records":"3","total-files-size":"1050","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
E 
row_regex:'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"[1-9][0-9]*","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"[1-9][0-9]*","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}'
 == 
'overwrite',true,'{"added-position-delete-files":"1","added-delete-files":"1","added-files-size":"1551","added-position-deletes":"1","changed-partition-count":"1","total-records":"3","total-files-size":"2601","total-data-files":"3","total-delete-files":"1","total-position-deletes":"1","total-equality-deletes":"0"}'

{noformat}







[jira] [Created] (IMPALA-13007) [DOCS] Add description for setting capacity for spilling to s3

2024-04-16 Thread Yida Wu (Jira)
Yida Wu created IMPALA-13007:


 Summary: [DOCS] Add description for setting capacity for spilling 
to s3
 Key: IMPALA-13007
 URL: https://issues.apache.org/jira/browse/IMPALA-13007
 Project: IMPALA
  Issue Type: Improvement
  Components: Docs
Reporter: Yida Wu
Assignee: Yida Wu


The current documentation does not mention the capacity setting in the 
configuration for spilling to S3; without a properly set capacity, users can 
easily run into space usage issues.
https://docs.cloudera.com/cdw-runtime/cloud/impala-reference/topics/impala_spill_s3.html






[jira] [Work started] (IMPALA-12679) test_rows_sent_counters failed to match RPCCount

2024-04-16 Thread Kurt Deschler (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12679 started by Kurt Deschler.
--
> test_rows_sent_counters failed to match RPCCount
> 
>
> Key: IMPALA-12679
> URL: https://issues.apache.org/jira/browse/IMPALA-12679
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Kurt Deschler
>Priority: Major
>
> {code}
> query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> {code}
> failed with
> {code}
> query_test/test_fetch.py:69: in test_rows_sent_counters
> assert re.search("RPCCount: [5-9]", runtime_profile)
> E   assert None
> E+  where None = ('RPCCount: [5-9]', 
> 'Query (id=c8476e5c065757bf:b4367698):\n  DEBUG MODE WARNING: Query 
> profile created while running a DEBUG buil...: 0.000ns\n - 
> WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
> WriteIoWaitTime: 0.000ns\n')
> E+where  = re.search
> {code}






[jira] [Commented] (IMPALA-12679) test_rows_sent_counters failed to match RPCCount

2024-04-16 Thread Kurt Deschler (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837805#comment-17837805
 ] 

Kurt Deschler commented on IMPALA-12679:


Updated assert to print Actual RPC Count https://gerrit.cloudera.org/#/c/21310/

> test_rows_sent_counters failed to match RPCCount
> 
>
> Key: IMPALA-12679
> URL: https://issues.apache.org/jira/browse/IMPALA-12679
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Kurt Deschler
>Priority: Major
>
> {code}
> query_test.test_fetch.TestFetch.test_rows_sent_counters[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> {code}
> failed with
> {code}
> query_test/test_fetch.py:69: in test_rows_sent_counters
> assert re.search("RPCCount: [5-9]", runtime_profile)
> E   assert None
> E+  where None = ('RPCCount: [5-9]', 
> 'Query (id=c8476e5c065757bf:b4367698):\n  DEBUG MODE WARNING: Query 
> profile created while running a DEBUG buil...: 0.000ns\n - 
> WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
> WriteIoWaitTime: 0.000ns\n')
> E+where  = re.search
> {code}






[jira] [Updated] (IMPALA-12313) Add support for UPDATE statements for Iceberg tables

2024-04-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-12313:
---
Issue Type: New Feature  (was: Bug)

> Add support for UPDATE statements for Iceberg tables
> 
>
> Key: IMPALA-12313
> URL: https://issues.apache.org/jira/browse/IMPALA-12313
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.4.0
>
>
> Add support for UPDATE statements for Iceberg tables.
> Initial design doc of DELETEs and UPDATEs: 
> [https://docs.google.com/document/d/1GuRiJ3jjqkwINsSCKYaWwcfXHzbMrsd3WEMDOB11Xqw/edit#heading=h.5bmfhbmb4qdk]
> Limitations:
>  * only write position delete files






[jira] [Commented] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException

2024-04-16 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837796#comment-17837796
 ] 

Michael Smith commented on IMPALA-13003:


This looks to be a generic Iceberg issue. In this case the 
AlreadyExistsException comes from 
https://github.com/apache/iceberg/blob/apache-iceberg-1.4.3/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L201.
 
https://github.com/apache/impala/blob/4.4.0-rc1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4045
 handles {{org.apache.hadoop.hive.metastore.api.AlreadyExistsException}}, but 
not {{org.apache.iceberg.exceptions.AlreadyExistsException}}, so we likely need 
to add that as well. I should be able to rig up a debug action to reproduce 
this.
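The fix described above amounts to treating both exception types as the same "table already exists" condition. A minimal sketch of that shape, using hypothetical stand-in exception classes (the real classes live in hive-metastore and iceberg-core, and the real handler is in CatalogOpExecutor.java):

```java
// Hypothetical stand-ins for the two distinct AlreadyExistsException types;
// the real ones are org.apache.hadoop.hive.metastore.api.AlreadyExistsException
// and org.apache.iceberg.exceptions.AlreadyExistsException.
class HmsAlreadyExistsException extends RuntimeException {}
class IcebergAlreadyExistsException extends RuntimeException {}

public class CreateTableRetry {
  // Treat either exception as "table already exists" and continue,
  // rather than aborting server startup.
  static String createIfNotExists(Runnable createTable) {
    try {
      createTable.run();
      return "created";
    } catch (HmsAlreadyExistsException | IcebergAlreadyExistsException e) {
      return "already-exists";
    }
  }

  public static void main(String[] args) {
    System.out.println(createIfNotExists(() -> {}));
    System.out.println(createIfNotExists(() -> {
      throw new IcebergAlreadyExistsException();
    }));
  }
}
```

The multi-catch clause is the key point: catching only the Hive Metastore variant lets the Iceberg variant escape and kill the server.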

> Server exits early failing to create impala_query_log with 
> AlreadyExistsException
> -
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>
> At startup workload management tries to create the query log table here:
> {code:java}
>   // The initialization code only works when run in a separate thread for 
> reasons unknown.
>   ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name));
> {code}
> This code is exiting:
> {code:java}
> I0413 23:40:05.183876 21006 client-request-state.cc:1348] 
> 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making 
> 'createTable' RPC to Hive Metastore:
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection 
> 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server 
> internal-server closed. The connection had 1 associated session(s).
> I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: 
> 27432606d99dcdae:218860164eb206bb
> I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: 
> 27432606d99dcdae:218860164eb206bb, client address: .
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully 
> unregistered: query_id=1d4878dbc9214c81:6dc8cc2e
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> Wrote minidump to 
> /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp
> {code}
> with stack
> {code:java}
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> *** Check failure stack trace: ***
> @  0x8e96a4d  google::LogMessage::Fail()
> @  0x8e98984  google::LogMessage::SendToLog()
> @  0x8e9642c  google::LogMessage::Flush()
> @  0x8e98ea9  google::LogMessageFatal::~LogMessageFatal()
> @  0x3da3a9a  impala::ImpalaServer::CompletedQueriesThread()
> @  0x3a8df93  boost::_mfi::mf0<>::operator()()
> @  0x3a8de97  boost::_bi::list1<>::operator()<>()
> @  0x3a8dd77  boost::_bi::bind_t<>::operator()()
> @  0x3a8d672  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x301e7d0  boost::function0<>::operator()()
> @  0x43ce415  impala::Thread::SuperviseThread()
> @  0x43e2dc7  boost::_bi::list5<>::operator()<>()
> @  0x43e29e7  boost::_bi::bind_t<>::operator()()
> @  0x43e21c5  boost::detail::thread_data<>::run()
> @  0x7984c37  thread_proxy
> @ 0x7f75b6982ea5  start_thread
> @ 0x7f75b36a7b0d  __clone
> Picked up JAVA_TOOL_OPTIONS: 
> -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n   
> -Dsun.java.command=impalad
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> {code}
> I think the key error is 
> {code}
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> {code}
> which suggests that creating the table with "if not exists" is not sufficient 
> to protect against concurrent creations.




[jira] [Work started] (IMPALA-13003) Server exits early failing to create impala_query_log with AlreadyExistsException

2024-04-16 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13003 started by Michael Smith.
--
> Server exits early failing to create impala_query_log with 
> AlreadyExistsException
> -
>
> Key: IMPALA-13003
> URL: https://issues.apache.org/jira/browse/IMPALA-13003
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.4.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
>
> At startup workload management tries to create the query log table here:
> {code:java}
>   // The initialization code only works when run in a separate thread for 
> reasons unknown.
>   ABORT_IF_ERROR(SetupDbTable(internal_server_.get(), table_name));
> {code}
> This code is exiting:
> {code:java}
> I0413 23:40:05.183876 21006 client-request-state.cc:1348] 
> 1d4878dbc9214c81:6dc8cc2e] ImpalaRuntimeException: Error making 
> 'createTable' RPC to Hive Metastore:
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> I0413 23:40:05.184055 20955 impala-server.cc:2582] Connection 
> 27432606d99dcdae:218860164eb206bb from client in-memory.localhost:0 to server 
> internal-server closed. The connection had 1 associated session(s).
> I0413 23:40:05.184067 20955 impala-server.cc:1780] Closing session: 
> 27432606d99dcdae:218860164eb206bb
> I0413 23:40:05.184083 20955 impala-server.cc:1836] Closed session: 
> 27432606d99dcdae:218860164eb206bb, client address: .
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> I0413 23:40:05.184728 20883 impala-server.cc:1564] Query successfully 
> unregistered: query_id=1d4878dbc9214c81:6dc8cc2e
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> Wrote minidump to 
> /data/jenkins/workspace/impala-cdw-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/minidumps/impalad/402f37cc-4663-4c78-086ca295-a9e5943c.dmp
> {code}
> with stack
> {code:java}
> F0413 23:40:05.184111 20955 workload-management.cc:304] query timed out 
> waiting for results
> . Impalad exiting.
> *** Check failure stack trace: ***
> @  0x8e96a4d  google::LogMessage::Fail()
> @  0x8e98984  google::LogMessage::SendToLog()
> @  0x8e9642c  google::LogMessage::Flush()
> @  0x8e98ea9  google::LogMessageFatal::~LogMessageFatal()
> @  0x3da3a9a  impala::ImpalaServer::CompletedQueriesThread()
> @  0x3a8df93  boost::_mfi::mf0<>::operator()()
> @  0x3a8de97  boost::_bi::list1<>::operator()<>()
> @  0x3a8dd77  boost::_bi::bind_t<>::operator()()
> @  0x3a8d672  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x301e7d0  boost::function0<>::operator()()
> @  0x43ce415  impala::Thread::SuperviseThread()
> @  0x43e2dc7  boost::_bi::list5<>::operator()<>()
> @  0x43e29e7  boost::_bi::bind_t<>::operator()()
> @  0x43e21c5  boost::detail::thread_data<>::run()
> @  0x7984c37  thread_proxy
> @ 0x7f75b6982ea5  start_thread
> @ 0x7f75b36a7b0d  __clone
> Picked up JAVA_TOOL_OPTIONS: 
> -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n   
> -Dsun.java.command=impalad
> Minidump in thread [20955]completed-queries running query 
> :, fragment instance 
> :
> {code}
> I think the key error is 
> {code}
> CAUSED BY: AlreadyExistsException: Table was created concurrently: 
> sys.impala_query_log
> {code}
> which suggests that creating the table with "if not exists" is not sufficient 
> to protect against concurrent creations.






[jira] [Resolved] (IMPALA-12653) Update documentation about the UPDATE statement

2024-04-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-12653.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Update documentation about the UPDATE statement
> ---
>
> Key: IMPALA-12653
> URL: https://issues.apache.org/jira/browse/IMPALA-12653
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.4.0
>
>
> Update documentation about the UPDATE statement
> Also list the limitations of UPDATE/DELETE






[jira] [Resolved] (IMPALA-12894) Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

2024-04-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-12894.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Optimized count(*) for Iceberg gives wrong results after a Spark 
> rewrite_data_files
> ---
>
> Key: IMPALA-12894
> URL: https://issues.apache.org/jira/browse/IMPALA-12894
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Gabor Kaszab
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: correctness, impala-iceberg
> Fix For: Impala 4.4.0
>
> Attachments: count_star_correctness_repro.tar.gz
>
>
> Issue was introduced by https://issues.apache.org/jira/browse/IMPALA-11802 
> that implemented an optimized way to get results for count(*). However, if 
> the table was compacted by Spark this optimization can give incorrect results.
> The reason is that Spark can [skip dropping delete 
> files|https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_position_delete_files]
> that point to compacted data files; as a result there might be delete 
> files after compaction that are no longer applied to any data files.
> Repro:
> With Impala
> {code:java}
> create table default.iceberg_testing (id int, j bigint) STORED AS ICEBERG
> TBLPROPERTIES('iceberg.catalog'='hadoop.catalog',
>               'iceberg.catalog_location'='/tmp/spark_iceberg_catalog/',
>               'iceberg.table_identifier'='iceberg_testing',
>               'format-version'='2');
> insert into iceberg_testing values
> (1, 1), (2, 4), (3, 9), (4, 16), (5, 25);
> update iceberg_testing set j = -100 where id = 4;
> delete from iceberg_testing where id = 4;{code}
> Count(*) returns 4 at this point.
> Run compaction in Spark:
> {code:java}
> spark.sql(s"CALL local.system.rewrite_data_files(table => 
> 'default.iceberg_testing', options => map('min-input-files','2') )").show() 
> {code}
> Now count(*) in Impala returns 8 (might require an INVALIDATE METADATA if in HadoopCatalog). 
> Hive returns correct results. Also a SELECT * returns correct results.






[jira] [Resolved] (IMPALA-12903) Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala

2024-04-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-12903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-12903.

Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Querying virtual column FILE__POSITION for TEXT and JSON tables crashes Impala
> --
>
> Key: IMPALA-12903
> URL: https://issues.apache.org/jira/browse/IMPALA-12903
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Repro:
> {noformat}
> select file__position from functional.alltypes;  => CRASH
> select file__position from functional_json.alltypes; => CRASH{noformat}
> Stack trace:
> {noformat}
> ...
> #6  
> #7  0x031a671c in impala::ScannerContext::Stream::file_desc 
> (this=0x0) at /home/boroknagyz/Impala/be/src/exec/scanner-context.h:157
> #8  0x03351630 in impala::HdfsJsonScanner::Close (this=0xea22d80, 
> row_batch=0xed63a20) at 
> /home/boroknagyz/Impala/be/src/exec/json/hdfs-json-scanner.cc:99
> #9  0x031c3eff in impala::HdfsScanner::Close (this=0xea22d80) at 
> /home/boroknagyz/Impala/be/src/exec/hdfs-scanner.cc:176
> #10 0x032f057f in impala::HdfsScanNode::ProcessSplit 
> (this=0x14eb9000, filter_ctxs=..., expr_results_pool=0x7fa54b5cf400, 
> scan_range=0xf2bb680, scanner_thread_reservation=0x7fa54b5cf328)
>     at /home/boroknagyz/Impala/be/src/exec/hdfs-scan-node.cc:500
> #11 0x032ef94c in impala::HdfsScanNode::ScannerThread 
> (this=0x14eb9000, first_thread=true, scanner_thread_reservation=131072) at 
> /home/boroknagyz/Impala/be/src/exec/hdfs-scan-node.cc:422{noformat}
> At frame #7 stream is NULL.






[jira] [Commented] (IMPALA-11495) Add glibc version and effective locale to the Web UI

2024-04-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837780#comment-17837780
 ] 

ASF subversion and git services commented on IMPALA-11495:
--

Commit 0606fc760f21587206cfb4f8256c7cd575050cf2 in impala's branch 
refs/heads/master from Saurabh Katiyal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0606fc760 ]

IMPALA-11495: Add glibc version and effective locale to the Web UI

Added a new section "Other Info" in root page for WebUI,
displaying effective locale and glibc version.

Change-Id: Ia69c4d63df4beae29f5261691a8dcdd04b931de7
Reviewed-on: http://gerrit.cloudera.org:8080/21252
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add glibc version and effective locale to the Web UI
> 
>
> Key: IMPALA-11495
> URL: https://issues.apache.org/jira/browse/IMPALA-11495
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: newbie, observability, supportability
>
> When debugging utf8 mode string functions, it's essential to know the 
> effective Unicode version and locale. The Unicode standard version can be 
> deduced from the glibc version, which can be obtained with the command "ldd --version". 
> We need to find a programmatic way to get it.
> The effective locale is already logged here:
> https://github.com/apache/impala/blob/ba4cb95b6251911fa9e057cea1cb37958d339fed/be/src/common/init.cc#L406
> We just need to show it in impalad's Web UI as well.
>  






[jira] [Commented] (IMPALA-12963) Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds

2024-04-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837779#comment-17837779
 ] 

ASF subversion and git services commented on IMPALA-12963:
--

Commit 74ff59b9138f325fd22ce198bd01423abafd3688 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=74ff59b91 ]

IMPALA-12963: Return parent PID when children spawned

Returns the original PID for a command rather than any children that may
be active. This happens during graceful shutdown in UBSAN tests. Also
updates 'kill' to use the version of 'get_pid' that logs details to help
with debugging.

Moves try block in test_query_log.py to after client2 has been
initialized. Removes 'drop table' on unique_database, since test suite
already handles cleanup.

Change-Id: I214e79507c717340863d27f68f6ea54c169e4090
Reviewed-on: http://gerrit.cloudera.org:8080/21278
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds
> ---
>
> Key: IMPALA-12963
> URL: https://issues.apache.org/jira/browse/IMPALA-12963
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Yida Wu
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Testcase test_query_log_table_lower_max_sql_plan failed in ubsan builds with 
> following messages:
> *Error Message*
> {code:java}
> test setup failure
> {code}
> *Stacktrace*
> {code:java}
> common/custom_cluster_test_suite.py:226: in teardown_method
> impalad.wait_for_exit()
> common/impala_cluster.py:471: in wait_for_exit
> while self.__get_pid() is not None:
> common/impala_cluster.py:414: in __get_pid
> assert len(pids) < 2, "Expected single pid but found %s" % ", 
> ".join(map(str, pids))
> E   AssertionError: Expected single pid but found 892, 31942
> {code}
> *Standard Error*
> {code:java}
> -- 2024-03-28 04:21:44,105 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--enable_workload_mgmt 
> --query_log_write_interval_s=1 --cluster_id=test_max_select 
> --shutdown_grace_period_s=10 --shutdown_deadline_s=60 
> --query_log_max_sql_length=2000 --query_log_max_plan_length=2000 ' 
> '--state_store_args=None ' '--catalogd_args=--enable_workload_mgmt ' 
> --impalad_args=--default_query_options=
> 04:21:44 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 04:21:44 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:21:44 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:21:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:21:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:21:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-cdw-master-staging-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:47 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:47 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 04:21:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:48 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 04:21:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:21:49 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-174b.vpc.cloudera.com:25000
> 04:21:49 MainThread: Waiting for num_known_live_backends=3. Current value: 2
> 04:21:50 M

[jira] [Commented] (IMPALA-12350) Daemon fails to initialize large catalog

2024-04-16 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837712#comment-17837712
 ] 

Quanlong Huang commented on IMPALA-12350:
-

[~saulius.vl] Thanks for reporting this! There seem to be several issues.

The size of a catalog topic message (topic delta) has a limit (configured by 
thrift_rpc_max_message_size) after the upgrade to thrift-0.16 in Impala 4.2. 
When transferring the whole catalog topic to a newly added or restarted 
coordinator, the topic message size can hit that limit.
{quote}Interestingly the catalog topic increased significantly after upgrading 
from 3.4.0 to 4.2.0 - from ~800mb to ~3.4gb.
{quote}
In 4.2.0, catalogd sends catalog updates at the partition level 
(enable_incremental_metadata_updates=true). In 3.4.0, it sends them at the 
table level, so there are more topic entries in 4.2.0. The compression ratio of 
catalog objects is also lower in 4.2.0, since compressing a whole table saves 
more space than compressing its partitions individually. We recommend switching 
from the legacy catalog mode to the local catalog mode: the catalog objects 
sent to the catalog topics will then be very small, which should resolve this 
issue.

To turn on local catalog mode, set use_local_catalog=true on all coordinators 
and set catalog_topic_mode=minimal on catalogd.

Sorry for the late reply. Any feedback will be appreciated!
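The flag placement described above can be sketched as startup arguments (illustrative only; how flags are passed depends on your deployment's service management):

```shell
# Local catalog mode: set on every coordinator impalad.
impalad --use_local_catalog=true

# Minimal catalog topic mode: set on catalogd.
catalogd --catalog_topic_mode=minimal
```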

> Daemon fails to initialize large catalog
> 
>
> Key: IMPALA-12350
> URL: https://issues.apache.org/jira/browse/IMPALA-12350
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Saulius Valatka
>Priority: Major
>
> When the statestored catalog topic is large enough (>2gb) daemons fail to 
> restart and get stuck in a loop:
> {{I0808 13:07:17.702653 3633556 Frontend.java:1618] Waiting for local catalog 
> to be initialized, attempt: 2068}}
>  
> The statestored reports errors as follows:
> {{I0808 13:07:05.587296 2134270 thrift-util.cc:196] TSocket::write_partial() 
> send() : Broken pipe}}
> {{I0808 13:07:05.587356 2134270 client-cache.h:362] RPC Error: Client for 
> gs1-hdp-data70:23000 hit an unexpected exception: write() send(): Broken 
> pipe, type: N6apache6thrift9transport19TTransportExceptionE, rpc: 
> N6impala20TUpdateStateResponseE, send: not done}}
> {{I0808 13:07:05.587365 2134270 client-cache.cc:174] Broken Connection, 
> destroy client for gs1-hdp-data70:23000}}
>  
> If this happens we are forced to restart statestore and thus the whole 
> cluster, meaning that we can't tolerate failure from even a single daemon.
> Interestingly the catalog topic increased significantly after upgrading from 
> 3.4.0 to 4.2.0 - from ~800mb to ~3.4gb. Invalidate/refresh operations also 
> became significantly slower (~10ms -> 5s).
> Probably related to thrift_rpc_max_message_size? but I see the maximum value 
> is 2gb.
>  






[jira] [Assigned] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet

2024-04-16 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs reassigned IMPALA-13006:
-

Assignee: Noemi Pap-Takacs  (was: Daniel Becker)

> Some Iceberg test tables are not restricted to Parquet
> --
>
> Key: IMPALA-13006
> URL: https://issues.apache.org/jira/browse/IMPALA-13006
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> Our Iceberg test tables/views are restricted to the Parquet file format in 
> functional/schema_constraints.csv except for the following two:
> {code:java}
> iceberg_query_metadata
> iceberg_view{code}
> This is not intentional, so we should add the constraint for these tables too.






[jira] [Created] (IMPALA-13006) Some Iceberg test tables are not restricted to Parquet

2024-04-16 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-13006:
--

 Summary: Some Iceberg test tables are not restricted to Parquet
 Key: IMPALA-13006
 URL: https://issues.apache.org/jira/browse/IMPALA-13006
 Project: IMPALA
  Issue Type: Bug
Reporter: Daniel Becker
Assignee: Daniel Becker


Our Iceberg test tables/views are restricted to the Parquet file format in 
functional/schema_constraints.csv except for the following two:
{code:java}
iceberg_query_metadata
iceberg_view{code}
This is not intentional, so we should add the constraint for these tables too.
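A hedged sketch of the fix; the exact line format, including the table_format value, is an assumption and should be copied from the neighboring Parquet-only entries in functional/schema_constraints.csv:

```csv
table_name:iceberg_query_metadata, constraint:restrict_to, table_format:parquet/none/none
table_name:iceberg_view, constraint:restrict_to, table_format:parquet/none/none
```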






[jira] [Commented] (IMPALA-12997) test_query_log tests get stuck trying to write to the log

2024-04-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837620#comment-17837620
 ] 

Zoltán Borók-Nagy commented on IMPALA-12997:


Under the hood Iceberg uses HMS locks for its transactions (if the table is 
stored in the HiveCatalog):
 * 
[https://github.com/apache/iceberg/blob/fc5b2b336c774b0b8b032f7d87a1fb21e76b3f20/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L182]
 * 
[https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java]

These transactions should be very fast, as they usually just set the 
table property 'metadata_location' (and 'previous_metadata_location').

Normally the lock is cleaned up at the end of the operation. If the 
process dies before it can free the locks, HMS will free them up after 
some time (due to the lack of heartbeating). The timeout is 5 minutes by 
default:
[https://github.com/apache/hive/blob/f06cc2920424817da6405e0efe268ce6cd64a363/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java#L1642]
But I also saw cases when it took much more time than that.
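The timeout referenced above can be tuned on the HMS side; a hypothetical hive-site.xml override (property name taken from the linked MetastoreConf TXN_TIMEOUT entry; verify the name and accepted value format against your HMS version):

```xml
<!-- Locks whose owner stops heartbeating are released after this timeout. -->
<property>
  <name>metastore.txn.timeout</name>
  <value>300s</value> <!-- the 5-minute default mentioned above -->
</property>
```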

> test_query_log tests get stuck trying to write to the log
> -
>
> Key: IMPALA-12997
> URL: https://issues.apache.org/jira/browse/IMPALA-12997
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> In some test runs, most tests under test_query_log will start to fail on 
> various conditions like
> {code}
> custom_cluster/test_query_log.py:452: in 
> test_query_log_table_query_select_mt_dop
> "impala-server.completed-queries.written", 1, 60)
> common/impala_service.py:144: in wait_for_metric_value
> self.__metric_timeout_assert(metric_name, expected_value, timeout)
> common/impala_service.py:213: in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.completed-queries.written did not 
> reach value 1 in 60s.
> E   Dumping debug webpages in JSON format...
> E   Dumped memz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/memz.json
> E   Dumped metrics JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/metrics.json
> E   Dumped queries JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/queries.json
> E   Dumped sessions JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/sessions.json
> E   Dumped threadz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/threadz.json
> E   Dumped rpcz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/rpcz.json
> E   Dumping minidumps for impalads/catalogds...
> E   Dumped minidump for Impalad PID 3680802
> E   Dumped minidump for Impalad PID 3680805
> E   Dumped minidump for Impalad PID 3680809
> E   Dumped minidump for Catalogd PID 3680732
> {code}
> or
> {code}
> custom_cluster/test_query_log.py:921: in test_query_log_ignored_sqls
> assert len(sql_results.data) == 1, "query not found in completed queries 
> table"
> E   AssertionError: query not found in completed queries table
> E   assert 0 == 1
> E+  where 0 = len([])
> E+where [] =  object at 0xa00cc350>.data
> {code}
> One symptom that seems related to this is INSERT operations into 
> sys.impala_query_log that start "UnregisterQuery()" but never finish (with 
> "Query successfully unregistered").
> We can identify cases like that with
> {code}
> for log in $(ag -l 'INSERT INTO sys.impala_query_log' impalad.*); do echo 
> $log; for qid in $(ag -o '[0-9a-f]*:[0-9a-f]*\] Analyzing query: INSERT INTO 
> sys.impala_query_log' $log | cut -d']' -f1); do if ! ag "Query successfully 
> unregistered: query_id=$qid" $log; then echo "$qid not unregistered"; fi; 
> done; done
> {code}
> A similar case may occur with creating the table too
> {code}
> for log in $(ag -l 'CREATE TABLE IF NOT EXISTS sys.impala_query_log' 
> impalad.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-*);
>  do QID=$(ag -o '[0-9a-f]*:[0-9a-f]*\] Analyzing query: INSERT INTO 
> sys.impala_query_log' $log | cut -d']' -f1); echo $log; ag "Query 
> successfully unregistered: query_id=$QID" $log; done
> {code}
> although these frequently fail because the test completes and shuts down 
> Impala before the CREATE TABLE query completes.
> Tracking one of those cases led to catalogd errors that repeated for 1m27s 
> before the test suite restarted catalogd:
> {code}
> W0410 12:48:05.051760 3681790 Tasks.java:456] 
> 6647229faf7637d5:3ec7565b] Retrying task after failure:

[jira] [Updated] (IMPALA-12979) Wildcard in CLASSPATH might not work in the RPM package

2024-04-16 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12979:

Affects Version/s: Impala 3.4.2

> Wildcard in CLASSPATH might not work in the RPM package
> ---
>
> Key: IMPALA-12979
> URL: https://issues.apache.org/jira/browse/IMPALA-12979
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.2
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 3.4.2
>
>
> I tried deploying the RPM package of Impala-3.4.2 (commit 8e9c5a5) on CentOS 
> 7.9 and found launching catalogd failed by the following error (in 
> catalogd.INFO):
> {noformat}
> Wrote minidump to 
> /var/log/impala-minidumps/catalogd/5e3c8819-0593-4943-555addbc-665470ad.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x02baf14c, pid=156082, tid=0x7fec0dce59c0
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_141-b15) (build 
> 1.8.0_141-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [catalogd+0x27af14c]  
> llvm::SCEVAddRecExpr::getNumIterationsInRange(llvm::ConstantRange const&, 
> llvm::ScalarEvolution&) const+0x73c
> #
> # Core dump written. Default location: /opt/impala/core or core.156082
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid156082.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> # {noformat}
> There are other logs in catalogd.ERROR
> {noformat}
> Log file created at: 2024/04/08 04:49:28
> Running on machine: ccycloud-1.quanlong.root.comops.site
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> E0408 04:49:28.979386 158187 logging.cc:146] stderr will be logged to this 
> file.
> Wrote minidump to 
> /var/log/impala-minidumps/catalogd/6c3f550c-be96-4a5b-61171aac-0de15155.dmp
> could not find method getRootCauseMessage from class (null) with signature 
> (Ljava/lang/Throwable;)Ljava/lang/String;
> could not find method getStackTrace from class (null) with signature 
> (Ljava/lang/Throwable;)Ljava/lang/String;
> FileSystem: loadFileSystems failed error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError){noformat}
> Resolving the minidump shows me the following stacktrace:
> {noformat}
> (gdb) bt
> #0  0x02baf14c in ?? ()
> #1  0x02baee24 in getJNIEnv ()
> #2  0x02bacb71 in hdfsBuilderConnect ()
> #3  0x012e6ae2 in impala::JniUtil::InitLibhdfs() ()
> #4  0x012e7897 in impala::JniUtil::Init() ()
> #5  0x00be9297 in impala::InitCommonRuntime(int, char**, bool, 
> impala::TestInfo::Mode) ()
> #6  0x00bb604a in CatalogdMain(int, char**) ()
> #7  0x00b33f97 in main (){noformat}
> It indicates something wrong in initializing the JVM. Here are the env vars:
> {noformat}
> Environment Variables:
> JAVA_HOME=/usr/java/jdk1.8.0_141
> CLASSPATH=/opt/impala/conf:/opt/impala/jar/*
> PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin
> LD_LIBRARY_PATH=/opt/impala/lib/:/usr/java/jdk1.8.0_141/jre/lib/amd64/server:/usr/java/jdk1.8.0_141/jre/lib/amd64
> SHELL=/bin/bash{noformat}
> We use wildcard "*" in the classpath which seems to be the cause. The issue 
> was resolved after using explicit paths in the classpath. Here are what I 
> changed in bin/impala-env.sh:
> {code:bash}
> #export CLASSPATH="/opt/impala/conf:/opt/impala/jar/*"
> CLASSPATH=/opt/impala/conf
> for jar in /opt/impala/jar/*.jar; do
>   CLASSPATH="$CLASSPATH:$jar"
> done
> export CLASSPATH
> {code}






[jira] [Resolved] (IMPALA-8778) Support read Apache Hudi Read Optimized tables

2024-04-16 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-8778.

Fix Version/s: Impala 3.4.0
   Resolution: Implemented

> Support read Apache Hudi Read Optimized tables
> --
>
> Key: IMPALA-8778
> URL: https://issues.apache.org/jira/browse/IMPALA-8778
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Yuanbin Cheng
>Assignee: Yanjia Gary Li
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Apache Impala currently does not support Apache Hudi; it cannot even pull 
> metadata from Hive.
> Related issue: 
> [https://github.com/apache/incubator-hudi/issues/179] 
> [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
>  






[jira] [Reopened] (IMPALA-8778) Support read Apache Hudi Read Optimized tables

2024-04-16 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reopened IMPALA-8778:


> Support read Apache Hudi Read Optimized tables
> --
>
> Key: IMPALA-8778
> URL: https://issues.apache.org/jira/browse/IMPALA-8778
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Yuanbin Cheng
>Assignee: Yanjia Gary Li
>Priority: Major
>
> Apache Impala currently does not support Apache Hudi; it cannot even pull 
> metadata from Hive.
> Related issue: 
> [https://github.com/apache/incubator-hudi/issues/179] 
> [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146|https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146?filter=allopenissues]
>  






[jira] [Commented] (IMPALA-12997) test_query_log tests get stuck trying to write to the log

2024-04-16 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837540#comment-17837540
 ] 

Quanlong Huang commented on IMPALA-12997:
-

It happens when committing the Iceberg transaction. At first glance, I 
thought it was due to too many concurrent INSERTs into the sys.impala_query_log 
table. However, while looking into the logs, I found they all happened in 
custom-cluster tests, so it's not a concurrency issue.

I found it happened in consecutive custom-cluster tests and then recovered at 
some point after 1 hour. E.g. in one of the occurrences, the 
WaitingForLockException log line occurs in the following catalogd.INFO files:
{noformat}
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-124753.3680732
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-124935.3682321
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-125217.3683855
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-125529.3685531
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-125838.3687542
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-130107.3689427
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-130450.3691640
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-130759.3693690
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-131108.3695684
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-131417.3697665
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-131726.3699674
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132038.3701683
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132346.3703644
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132555.3705491
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-132804.3707310
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-133113.3709349
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-133425.3711438
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-133738.3713509
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-134047.3715490
catalogd.impala-ec2-rhel88-m7g-4xlarge-ondemand-0a5a.vpc.cloudera.com.jenkins.log.INFO.20240410-134359.3717486{noformat}
They are consecutive custom-cluster tests (based on the timestamps in the 
filenames). All other custom-cluster tests before or after them are fine.
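The consecutiveness is visible directly in the glog filename timestamps; a small sketch (hostnames abbreviated, and the `.INFO.YYYYMMDD-HHMMSS.PID` suffix format assumed from the listing above):

```python
from datetime import datetime

def parse_start(filename):
    # Extract the YYYYMMDD-HHMMSS start timestamp from a glog INFO filename,
    # e.g. "catalogd.host.log.INFO.20240410-124753.3680732".
    stamp = filename.split(".INFO.")[1].rsplit(".", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S")

logs = [
    "catalogd.host.log.INFO.20240410-124753.3680732",
    "catalogd.host.log.INFO.20240410-124935.3682321",
    "catalogd.host.log.INFO.20240410-125217.3683855",
]
starts = [parse_start(f) for f in logs]
# Consecutive custom-cluster tests: each catalogd restart follows the
# previous one within a few minutes.
gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
print(gaps)  # each gap is a couple of minutes
```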

[~boroknagyz] Is it possible that an Iceberg table is in a locked state and got 
recovered after a timeout of 1 hour?
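One way to check this theory the next time it recurs is to list the outstanding HMS locks directly; a hypothetical diagnostic (assumes a beeline client and a reachable HiveServer2 endpoint):

```shell
# Show HMS locks held on the query-log table; a lock owned by a dead
# process whose heartbeats have stopped would show up here until it expires.
beeline -u "jdbc:hive2://<hs2-host>:10000" -e "SHOW LOCKS sys.impala_query_log;"
```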

> test_query_log tests get stuck trying to write to the log
> -
>
> Key: IMPALA-12997
> URL: https://issues.apache.org/jira/browse/IMPALA-12997
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> In some test runs, most tests under test_query_log will start to fail on 
> various conditions like
> {code}
> custom_cluster/test_query_log.py:452: in 
> test_query_log_table_query_select_mt_dop
> "impala-server.completed-queries.written", 1, 60)
> common/impala_service.py:144: in wait_for_metric_value
> self.__metric_timeout_assert(metric_name, expected_value, timeout)
> common/impala_service.py:213: in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.completed-queries.written did not 
> reach value 1 in 60s.
> E   Dumping debug webpages in JSON format...
> E   Dumped memz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/memz.json
> E   Dumped metrics JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/metrics.json
> E   Dumped queries JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/queries.json
> E   Dumped sessions JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/sessions.json
> E   Dumped threadz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/threadz.json
> E   Dumped rpcz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20240410_12:49:04/json/rpcz.json
> E   Dumping minidumps for