[jira] [Commented] (IMPALA-12693) Typo in link for ltrim in string functions docs

2024-03-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826019#comment-17826019
 ] 

ASF subversion and git services commented on IMPALA-12693:
--

Commit eb2939245f58a8612a4f68c866abefdf42ab6113 in impala's branch 
refs/heads/master from Saurabh Katiyal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=eb2939245 ]

IMPALA-12693: [DOCS] Typo in link for ltrim in string functions docs

Fixed documentation typo for LTRIM string function, from LTRI to LTRIM.

Change-Id: If4345fc6d19f04d0c0c6feef3e0c8598271224fe
Reviewed-on: http://gerrit.cloudera.org:8080/21123
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 


> Typo in link for ltrim in string functions docs
> ---
>
> Key: IMPALA-12693
> URL: https://issues.apache.org/jira/browse/IMPALA-12693
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Saurabh Katiyal
>Priority: Major
>  Labels: newbie
>
> The link text for this URL is wrong:
> {noformat}
>       
>         LTRI 
>       {noformat}
> It should be LTRIM, not LTRI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12899) Temporary workaround for BINARY in complex types

2024-03-13 Thread Daniel Becker (Jira)
Daniel Becker created IMPALA-12899:
--

 Summary: Temporary workaround for BINARY in complex types
 Key: IMPALA-12899
 URL: https://issues.apache.org/jira/browse/IMPALA-12899
 Project: IMPALA
  Issue Type: Improvement
Reporter: Daniel Becker
Assignee: Daniel Becker


The BINARY type is currently not supported inside complex types, and a 
cross-component decision is probably needed to support it (see IMPALA-11491). 
We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables 
(IMPALA-12612), which requires that queries with BINARY inside complex types 
don't fail. Enabling EXPAND_COMPLEX_TYPES has higher priority than 
IMPALA-11491, so we should come up with a temporary solution, e.g. NULLing 
BINARY values in complex types and logging a warning, or setting these BINARY 
values to a warning string.
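The first option (NULLing BINARY values and logging a warning) could look roughly like the sketch below. This is illustrative only: Impala's actual implementation lives in the C++/Java codebase, and the row representation here (arrays as lists, maps/structs as dicts) is a stand-in.

```python
import logging

logger = logging.getLogger("binary_workaround")

def mask_binary(value, path="$"):
    """Recursively replace BINARY (bytes) values nested inside complex
    values with NULL, logging a warning for each replacement."""
    if isinstance(value, bytes):
        logger.warning("BINARY at %s inside a complex type is unsupported "
                       "(IMPALA-11491); substituting NULL", path)
        return None
    if isinstance(value, list):
        return [mask_binary(v, "%s[%d]" % (path, i))
                for i, v in enumerate(value)]
    if isinstance(value, dict):
        return {k: mask_binary(v, "%s.%s" % (path, k))
                for k, v in value.items()}
    return value
```

With this approach, expanding a complex value would yield NULL in place of each BINARY member instead of failing the query.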






[jira] [Updated] (IMPALA-12899) Temporary workaround for BINARY in complex types

2024-03-13 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-12899:
---
Parent: IMPALA-10947
Issue Type: Sub-task  (was: Improvement)

> Temporary workaround for BINARY in complex types
> 
>
> Key: IMPALA-12899
> URL: https://issues.apache.org/jira/browse/IMPALA-12899
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> The BINARY type is currently not supported inside complex types, and a 
> cross-component decision is probably needed to support it (see IMPALA-11491). 
> We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables 
> (IMPALA-12612), which requires that queries with BINARY inside complex types 
> don't fail. Enabling EXPAND_COMPLEX_TYPES has higher priority than 
> IMPALA-11491, so we should come up with a temporary solution, e.g. NULLing 
> BINARY values in complex types and logging a warning, or setting these BINARY 
> values to a warning string.






[jira] [Updated] (IMPALA-12809) Iceberg metadata table scanner should always be scheduled to the coordinator

2024-03-13 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker updated IMPALA-12809:
---
Summary: Iceberg metadata table scanner should always be scheduled to the 
coordinator  (was: Iceberg metadata table scanner can be scheduled to executors)

> Iceberg metadata table scanner should always be scheduled to the coordinator
> 
>
> Key: IMPALA-12809
> URL: https://issues.apache.org/jira/browse/IMPALA-12809
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Tamas Mate
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> On larger clusters the Iceberg metadata scanner can be scheduled to 
> executors, for example during a join. In this case the fragment will fail a 
> precondition check, because either the frontend_ object or the table will not 
> be present. Setting {{exec_at_coord}} to true is not enough; these 
> fragments should be scheduled to the {{coord_only_executor_group}}.
> Additionally, setting NUM_NODES=1 should be a viable workaround.
> Reproducible with the following local dev Impala cluster:
> {{./bin/start-impala-cluster.py --cluster_size=3 --num_coordinators=1 
> --use_exclusive_coordinators}}
> and query:
> {{select count(b.parent_id) from 
> functional_parquet.iceberg_query_metadata.history a}}
> {{join functional_parquet.iceberg_query_metadata.history b on a.snapshot_id = 
> b.snapshot_id;}}






[jira] [Created] (IMPALA-12900) Compile binutils with -O3 in the toolchain

2024-03-13 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-12900:
--

 Summary: Compile binutils with -O3 in the toolchain
 Key: IMPALA-12900
 URL: https://issues.apache.org/jira/browse/IMPALA-12900
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell


Since the toolchain builds binutils with the native compiler (the toolchain 
compiler hasn't been built at that point), no CFLAGS are set. The default 
CFLAGS for binutils use -O2. It's possible that we could get a bit more speed 
by building with -O3. We should set CFLAGS/CXXFLAGS to use -O3.
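A minimal sketch of the change, assuming the toolchain drives configure from a build script; the function name and the prefix path below are hypothetical, not the actual native-toolchain code:

```python
import os
import subprocess

def binutils_configure_env(base_env=None):
    """Return the environment for the binutils configure step with
    CFLAGS/CXXFLAGS forced to -O3 (binutils defaults to -O2 when
    CFLAGS is left unset)."""
    env = dict(os.environ if base_env is None else base_env)
    env["CFLAGS"] = "-O3"
    env["CXXFLAGS"] = "-O3"
    return env

# The configure step would then run as, e.g.:
# subprocess.run(["./configure", "--prefix=/opt/toolchain/binutils"],
#                env=binutils_configure_env(), check=True)
```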






[jira] [Resolved] (IMPALA-12896) Avoid JDBC table to be set as transactional table

2024-03-13 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou resolved IMPALA-12896.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Avoid JDBC table to be set as transactional table
> -
>
> Key: IMPALA-12896
> URL: https://issues.apache.org/jira/browse/IMPALA-12896
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Found the following issues in downstream integration:
> 1) JDBC tables created in some deployment environments were set as 
> transactional tables by default, which caused catalogd to fail to load the 
> metadata for JDBC tables. We have to explicitly set the table property 
> "transactional=false" for JDBC tables.
> 2) The FileSystemUtil.copyFileFromUriToLocal() function wrote a log message 
> only for IOException. We should log all types of exceptions so that we can 
> capture errors that caused failures to load JDBC drivers.
> 3) Operations on JDBC tables are processed only on the coordinator. The 
> planner should estimate the processed rows as 0 for DataSourceScanNode so 
> that coordinator-only query plans are generated for simple queries on JDBC 
> tables and queries can be executed without involving executor nodes.






[jira] [Commented] (IMPALA-12894) Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

2024-03-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826868#comment-17826868
 ] 

ASF subversion and git services commented on IMPALA-12894:
--

Commit ada4090e0989805ed884e135356c6b688e7ccc96 in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ada4090e0 ]

IMPALA-12894: (part 1) Turn off the count(*) optimisation for V2 Iceberg tables

This is a part 1 change that turns off the count(*) optimisation for
V2 tables, as there is a correctness issue with it: Spark compaction
may leave some dangling delete files that mess up the logic in Impala.

Change-Id: Ida9fb04fd076c987b6b5257ad801bf30f5900237
Reviewed-on: http://gerrit.cloudera.org:8080/21139
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Optimized count(*) for Iceberg gives wrong results after a Spark 
> rewrite_data_files
> ---
>
> Key: IMPALA-12894
> URL: https://issues.apache.org/jira/browse/IMPALA-12894
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Gabor Kaszab
>Priority: Critical
>  Labels: correctness, impala-iceberg
> Attachments: count_star_correctness_repro.tar.gz
>
>
> The issue was introduced by https://issues.apache.org/jira/browse/IMPALA-11802, 
> which implemented an optimized way to get results for count(*). However, if 
> the table was compacted by Spark, this optimization can give incorrect results.
> The reason is that Spark can [skip dropping delete 
> files|https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_position_delete_files]
>  that point to compacted data files; as a result, there might be delete 
> files after compaction that are no longer applied to any data files.
> Repro:
> With Impala
> {code:java}
> create table default.iceberg_testing (id int, j bigint) STORED AS ICEBERG
> TBLPROPERTIES('iceberg.catalog'='hadoop.catalog',
>               'iceberg.catalog_location'='/tmp/spark_iceberg_catalog/',
>               'iceberg.table_identifier'='iceberg_testing',
>               'format-version'='2');
> insert into iceberg_testing values
> (1, 1), (2, 4), (3, 9), (4, 16), (5, 25);
> update iceberg_testing set j = -100 where id = 4;
> delete from iceberg_testing where id = 4;{code}
> count(*) returns 4 at this point.
> Run compaction in Spark:
> {code:java}
> spark.sql(s"CALL local.system.rewrite_data_files(table => 
> 'default.iceberg_testing', options => map('min-input-files','2') )").show() 
> {code}
> Now count(*) in Impala returns 8 (might require an INVALIDATE METADATA if 
> using HadoopCatalog). Hive returns correct results. A SELECT * also returns 
> correct results.






[jira] [Commented] (IMPALA-12896) Avoid JDBC table to be set as transactional table

2024-03-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826869#comment-17826869
 ] 

ASF subversion and git services commented on IMPALA-12896:
--

Commit 6c0c26146d956ad771cee27283c1371b9c23adce in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6c0c26146 ]

IMPALA-12896: Avoid JDBC table to be set as transactional table

In some deployment environments, JDBC tables are set as transactional
tables by default. This causes catalogd to fail to load the metadata
for JDBC tables. This patch explicitly adds the table property
"transactional=false" for JDBC tables to avoid them being set as
transactional.

Operations on JDBC tables are processed only on the coordinator. The
planner should estimate the processed rows as 0 for DataSourceScanNode
so that coordinator-only query plans are generated for simple queries
on JDBC tables and queries can be executed without involving executor
nodes. Also adds a Preconditions check to make sure numNodes equals 1
for DataSourceScanNode.

Updates the FileSystemUtil.copyFileFromUriToLocal() function to log
all types of exceptions.

Testing:
 - Fixed planner tests for data source tables.
 - Ran end-to-end tests of JDBC tables with query option
   'exec_single_node_rows_threshold' at its default value of 100.
 - Passed core tests.

Change-Id: I556faeda923a4a11d4bef8c1250c9616f77e6fa6
Reviewed-on: http://gerrit.cloudera.org:8080/21141
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Avoid JDBC table to be set as transactional table
> -
>
> Key: IMPALA-12896
> URL: https://issues.apache.org/jira/browse/IMPALA-12896
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Found following issues in downstream integration.
> 1) JDBC tables created in some deployment environments were set as 
> transactional tables by default, which caused catalogd to fail to load the 
> metadata for JDBC tables. We have to explicitly set the table property 
> "transactional=false" for JDBC tables.
> 2) The FileSystemUtil.copyFileFromUriToLocal() function wrote a log message 
> only for IOException. We should log all types of exceptions so that we can 
> capture errors that caused failures to load JDBC drivers.
> 3) Operations on JDBC tables are processed only on the coordinator. The 
> planner should estimate the processed rows as 0 for DataSourceScanNode so 
> that coordinator-only query plans are generated for simple queries on JDBC 
> tables and queries can be executed without involving executor nodes.






[jira] [Commented] (IMPALA-12896) Avoid JDBC table to be set as transactional table

2024-03-13 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826895#comment-17826895
 ] 

Wenzhe Zhou commented on IMPALA-12896:
--

Planner.checkForSmallQueryOptimization() uses MaxRowsProcessedVisitor to find 
maxRowsProcessed_ from the nodes in the plan tree. For DataSourceScanNode, 
numRows (caller.getInputCardinality()) equals 0; there are no stats, and most 
queries have no simple 'limit', so MaxRowsProcessedVisitor.visit() sets valid_ 
to false. This causes the Planner to create a distributed plan for queries on 
JDBC tables. The merged patch changes MaxRowsProcessedVisitor.visit() to 
estimate numRows as 0 for DataSourceScanNode. If all the scan nodes are 
DataSourceScanNodes, maxRowsProcessed_ will then be determined by the 
non-scan nodes, making it more likely that the Planner creates a 
single-fragment plan.
Should we create a distributed plan for queries that join multiple JDBC 
tables?
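The behavior described above can be sketched as follows; the names loosely mirror the Java ones, but this is an illustrative Python model, not Impala's actual frontend code:

```python
from dataclasses import dataclass

@dataclass
class PlanNode:
    kind: str
    cardinality: int = -1  # -1 means no stats available

class MaxRowsProcessedVisitor:
    """Tracks the maximum rows processed over plan nodes; `valid` stays
    True only if every visited node yields a usable estimate."""
    def __init__(self):
        self.valid = True
        self.max_rows = 0

    def visit(self, node):
        if node.kind == "DATA_SOURCE_SCAN":
            # Patched behavior: estimate 0 rows instead of invalidating
            # the whole plan, so a plan whose scans are all JDBC scans
            # can still qualify for the small-query optimization.
            rows = 0
        elif node.cardinality < 0:
            # Pre-patch outcome for any stat-less scan without a limit.
            self.valid = False
            return
        else:
            rows = node.cardinality
        self.max_rows = max(self.max_rows, rows)
```

With the patched branch, maxRowsProcessed_ ends up determined by the non-scan nodes of the plan, as the comment notes.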

> Avoid JDBC table to be set as transactional table
> -
>
> Key: IMPALA-12896
> URL: https://issues.apache.org/jira/browse/IMPALA-12896
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Found the following issues in downstream integration:
> 1) JDBC tables created in some deployment environments were set as 
> transactional tables by default, which caused catalogd to fail to load the 
> metadata for JDBC tables. We have to explicitly set the table property 
> "transactional=false" for JDBC tables.
> 2) The FileSystemUtil.copyFileFromUriToLocal() function wrote a log message 
> only for IOException. We should log all types of exceptions so that we can 
> capture errors that caused failures to load JDBC drivers.
> 3) Operations on JDBC tables are processed only on the coordinator. The 
> planner should estimate the processed rows as 0 for DataSourceScanNode so 
> that coordinator-only query plans are generated for simple queries on JDBC 
> tables and queries can be executed without involving executor nodes.






[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-03-13 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826933#comment-17826933
 ] 

Maxwell Guo commented on IMPALA-12771:
--

[~stigahuang] [~mylogi...@gmail.com] [~VenuReddy]
Hi, can you help take another look?

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>  // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were 
> ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in 
> the catalogd.
> {code}
>  
> For CREATE and DROP events on Database/Table/Partition (AddPartition is also 
> included), when the database or table is not found in the cache, we skip 
> processing the event and increment the events-skipped metric.
> But I found some inconsistencies here for ALTER TABLE and Reload events:
> * Reload events are not described in the description of events-skipped, but 
> the metric is still incremented when the event is an old event;
> * Besides, if the table is in the blacklist, the metric is also incremented.
> In summary, I think this description is inconsistent with the actual 
> implementation. So can we also mark the events-skipped metric for alter 
> partition events and modify the description to cover all skipped events?
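The skip conditions enumerated above can be summarized in a sketch; the event and catalog classes and their method names are hypothetical stand-ins for the MetastoreEventsProcessor logic, not its real API:

```python
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    target: str       # e.g. "db.table"
    event_id: int = 0

class FakeCatalog:
    """Toy catalog model for the sketch."""
    def __init__(self, objects=(), blacklist=(), last_synced_id=0):
        self.objects = set(objects)
        self.blacklist = set(blacklist)
        self.last_synced_id = last_synced_id
    def contains(self, name): return name in self.objects
    def is_blacklisted(self, name): return name in self.blacklist

def should_skip_event(event, catalog):
    """Each True return increments the events-skipped metric, including
    the Reload and blacklist cases the metric's description omits."""
    if event.type in ("CREATE_DATABASE", "CREATE_TABLE", "ADD_PARTITION"):
        return catalog.contains(event.target)      # already present
    if event.type in ("DROP_DATABASE", "DROP_TABLE", "DROP_PARTITION"):
        return not catalog.contains(event.target)  # already absent
    if event.type == "RELOAD" and event.event_id <= catalog.last_synced_id:
        return True   # old Reload event: skipped but undocumented
    if catalog.is_blacklisted(event.target):
        return True   # blacklisted table: also undocumented
    return False
```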






[jira] [Created] (IMPALA-12901) Add table property to inject delay in event processing

2024-03-13 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12901:
---

 Summary: Add table property to inject delay in event processing
 Key: IMPALA-12901
 URL: https://issues.apache.org/jira/browse/IMPALA-12901
 Project: IMPALA
  Issue Type: Test
  Components: Catalog
Reporter: Quanlong Huang


We have tests that verify behaviors during event processing. We currently use 
a global debug action like 
"--debug_actions=catalogd_event_processing_delay:SLEEP@2000" to inject the 
delay.
It'd be helpful to add a table property that does the same but only affects 
processing events on that table, so we can control the delay more precisely.

This was pointed out by [~csringhofer] during the review of 
https://gerrit.cloudera.org/c/20986/8/tests/custom_cluster/test_web_pages.py#444
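A sketch of the proposed per-table delay; the property name is hypothetical, chosen here purely for illustration:

```python
import time

# Hypothetical property name -- not an existing Impala table property.
DELAY_PROP = "impala.test.event_processing_delay_ms"

def maybe_delay_for_table(tbl_properties, sleep=time.sleep):
    """Before processing an event on a table, sleep for the configured
    delay, mirroring the global
    --debug_actions=catalogd_event_processing_delay:SLEEP@2000 flag but
    scoped to the single table carrying the property."""
    delay_ms = int(tbl_properties.get(DELAY_PROP, "0"))
    if delay_ms > 0:
        sleep(delay_ms / 1000.0)
    return delay_ms
```

A test could then set the property via ALTER TABLE ... SET TBLPROPERTIES on just the table under test, leaving event processing for every other table unaffected.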


