[jira] [Resolved] (IMPALA-12977) add search and pagination to /hadoop-varz

2024-05-07 Thread Saurabh Katiyal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Katiyal resolved IMPALA-12977.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

>  add search and pagination to /hadoop-varz
> --
>
> Key: IMPALA-12977
> URL: https://issues.apache.org/jira/browse/IMPALA-12977
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Saurabh Katiyal
>Assignee: Saurabh Katiyal
>Priority: Minor
>  Labels: newbie
> Fix For: Impala 4.4.0
>
>
> /hadoop-varz has 2000+ configurations. It'd be nice to have some of the 
> tools, like search and pagination, that we get on /varz (flags.tmpl).
> Existing template for /hadoop-varz:
> [https://github.com/apache/impala/blob/ba4cb95b6251911fa9e057cea1cb37958d339fed/www/hadoop-varz.tmpl]
> Existing template for /varz:
> [https://github.com/apache/impala/blob/ba4cb95b6251911fa9e057cea1cb37958d339fed/www/flags.tmpl]
>  
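
The request is for the search box and page controls that /varz already offers. The issue points at the web UI templates, so the real change presumably lives there. Purely to illustrate the logic, here is a minimal Java sketch of case-insensitive filtering followed by fixed-size paging over configuration entries; `ConfigPager` and its methods are hypothetical and not part of Impala:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: filters configuration entries by a search term and
// returns one page of results, as a searchable, paginated table would.
final class ConfigPager {
  private ConfigPager() {}

  /** Returns entries whose key or value contains `term`, ignoring case. */
  static List<Map.Entry<String, String>> search(Map<String, String> conf, String term) {
    String needle = term.toLowerCase(Locale.ROOT);
    List<Map.Entry<String, String>> hits = new ArrayList<>();
    // A TreeMap gives a sorted, stable order so page boundaries are deterministic.
    for (Map.Entry<String, String> e : new TreeMap<>(conf).entrySet()) {
      if (e.getKey().toLowerCase(Locale.ROOT).contains(needle)
          || e.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
        hits.add(e);
      }
    }
    return hits;
  }

  /** Returns page `pageIndex` (0-based) of size `pageSize`; empty if past the end. */
  static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
    if (pageIndex < 0 || pageSize <= 0) {
      throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
    }
    long from = (long) pageIndex * pageSize;
    if (from >= items.size()) {
      return List.of();
    }
    int to = (int) Math.min(items.size(), from + pageSize);
    return items.subList((int) from, to);
  }
}
```

Sorting before paging keeps page contents stable between requests; an empty search term matches every entry.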



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13009) Potential leak of partition deletions in the catalog topic

2024-05-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844278#comment-17844278
 ] 

ASF subversion and git services commented on IMPALA-13009:
--

Commit 5d32919f46117213249c60574f77e3f9bb66ed90 in impala's branch 
refs/heads/branch-4.4.0 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d32919f4 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on
the last sent table snapshot ('maxSentPartitionId_' to be specific).
Partitions dropped since the last catalog update are tracked in
'droppedPartitions_' of HdfsTable. When catalogd collects the next
catalog update, these partitions are collected and HdfsTable then
clears the set.
See details in CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta().
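
A minimal Java sketch of the bookkeeping described above, assuming a simplified model: dropped partitions accumulate in a per-table set until the next catalog update collects them, after which the set is cleared. `TrackedTable` and its methods are hypothetical stand-ins, not Impala's HdfsTable or CatalogServiceCatalog APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model (not Impala's HdfsTable): a table remembers
// which partitions were dropped since the last catalog update was collected.
final class TrackedTable {
  private final Set<String> partitions = new LinkedHashSet<>();
  // Plays the role described for 'droppedPartitions_'.
  private final Set<String> droppedSinceLastUpdate = new LinkedHashSet<>();

  void addPartition(String partName) {
    partitions.add(partName);
  }

  void dropPartition(String partName) {
    // Only partitions that actually existed need a deletion update.
    if (partitions.remove(partName)) {
      droppedSinceLastUpdate.add(partName);
    }
  }

  /** Called while building the next catalog update: returns the set, then clears it. */
  List<String> collectDroppedPartitions() {
    List<String> collected = new ArrayList<>(droppedSinceLastUpdate);
    droppedSinceLastUpdate.clear();
    return collected;
  }
}
```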

If an HdfsTable is invalidated, it's replaced with an IncompleteTable
which doesn't track any partitions. The HdfsTable object is then added
to the deleteLog so catalogd can send deletion updates for all its
partitions. The same happens if the HdfsTable is dropped. However, the
previously dropped partitions are not collected in this case, which
results in a leak in the catalog topic if the partition name is never
reused. Note that in the catalog topic, the key of a partition
update consists of the table name and the partition name. So if the
partition is added back to the table, the topic key is reused, which
resolves the leak.
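
To make the leak concrete, here is a toy Java model of a topic keyed by table name plus partition name. `CatalogTopicModel` and its key format (including the ":" separator) are illustrative assumptions, not the statestore's real data structures; the point is only that a deletion that is never sent leaves its key behind until the same key is written again.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the catalog topic (not Impala's statestore API).
// Partition entries are keyed by table name plus partition name.
final class CatalogTopicModel {
  private final Map<String, String> entries = new TreeMap<>();

  // Illustrative key format; the real topic key layout is not shown here.
  static String partitionKey(String table, String partName) {
    return table + ":" + partName;
  }

  /** Adds or overwrites the entry for this table/partition name. */
  void putPartition(String table, String partName, String payload) {
    entries.put(partitionKey(table, partName), payload);
  }

  /** Deletion update: removes the entry if present. */
  void deletePartition(String table, String partName) {
    entries.remove(partitionKey(table, partName));
  }

  String get(String table, String partName) {
    return entries.get(partitionKey(table, partName));
  }

  int size() {
    return entries.size();
  }
}
```

For example, with p=1 and p=2 in the topic, dropping p=1 without a collected deletion and then removing the table (which deletes only p=2) leaves the p=1 entry behind; writing p=1 again overwrites that same key.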

The leak is observed when a coordinator restarts. In the initial
catalog update sent from the statestore, the coordinator finds some
partition updates that are not referenced by the HdfsTable (assuming the
table is used again after the INVALIDATE). Then a Precondition check
fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when
adding the HdfsTable to the deleteLog. A new field, dropped_partitions,
is added in THdfsTable to collect them. It's only used when catalogd
collects catalog updates.
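
The fix can be sketched as a change to which deletions are emitted when a whole table goes to the deleteLog. In this hypothetical Java model (`RemovedTableSketch` is not Impala code, and it does not model the Thrift `dropped_partitions` field), the pending dropped partitions are simply appended to the live ones:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Impala code) of the deletion updates emitted when a
// whole table is removed by DROP TABLE or INVALIDATE.
final class RemovedTableSketch {
  private final Set<String> livePartitions = new LinkedHashSet<>();
  // Dropped since the last catalog update and not yet collected.
  private final Set<String> droppedNotCollected = new LinkedHashSet<>();

  void addPartition(String name) {
    livePartitions.add(name);
  }

  void dropPartition(String name) {
    if (livePartitions.remove(name)) {
      droppedNotCollected.add(name);
    }
  }

  /** Behaviour described before the fix: only live partitions are deleted. */
  List<String> deletionsBeforeFix() {
    return new ArrayList<>(livePartitions);
  }

  /** Behaviour described after the fix: pending dropped partitions are included too. */
  List<String> deletionsAfterFix() {
    List<String> deletions = new ArrayList<>(livePartitions);
    deletions.addAll(droppedNotCollected);
    return deletions;
  }
}
```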

Removes the Precondition check in the coordinator and just reports the
stale partitions instead, since IMPALA-12831 could also introduce them.
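
A hedged sketch of reporting instead of failing: the check compares the partitions in an update with those the table references and logs the leftovers rather than throwing. The class name, the use of partition IDs, and the log text are all assumptions for illustration, not Impala's coordinator code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch (not Impala's ImpaladCatalog): instead of failing a
// precondition, report partitions that the table does not reference and keep
// going, so the table can still be added to the coordinator's cache.
final class StalePartitionCheck {
  private static final Logger LOG =
      Logger.getLogger(StalePartitionCheck.class.getName());

  /** Returns IDs in `receivedIds` that are not in `referencedIds`, logging each one. */
  static List<Long> reportStale(Set<Long> referencedIds, Collection<Long> receivedIds) {
    List<Long> stale = new ArrayList<>();
    for (Long id : receivedIds) {
      if (!referencedIds.contains(id)) {
        stale.add(id);
        // Illustrative message text, not Impala's actual log line.
        LOG.warning("Ignoring stale partition id=" + id);
      }
    }
    return stale;
  }
}
```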

Also adds a log line in CatalogOpExecutor.alterTableDropPartition() to
show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Reviewed-on: http://gerrit.cloudera.org:8080/21326
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit ee21427d26620b40d38c706b4944d2831f84f6f5)


> Potential leak of partition deletions in the catalog topic
> --
>
> Key: IMPALA-13009
> URL: https://issues.apache.org/jira/browse/IMPALA-13009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, 
> Impala 4.1.2, Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Catalogd might not send partition deletions to the catalog topic in the 
> following scenario:
> * Some partitions of a table are dropped.
> * The HdfsTable object is subsequently removed before catalogd collects the 
> dropped partitions.
> In such a case, catalogd loses track of the dropped partitions, so their updates 
> remain in the catalog topic until the partition names are reused 
> again.
> Note that the HdfsTable object can be removed by commands like DropTable or 
> INVALIDATE.
> The leaked partitions are detected when a coordinator restarts. An 
> IllegalStateException complaining about stale partitions is reported, and 
> the table is not added to the coordinator's catalog cache.
> {noformat}
> E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
> object: Received stale partition in a statestore update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)

[jira] [Commented] (IMPALA-12831) HdfsTable.toMinimalTCatalogObject() should hold table read lock to generate incremental updates

2024-05-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844280#comment-17844280
 ] 

ASF subversion and git services commented on IMPALA-12831:
--

Commit 5d32919f46117213249c60574f77e3f9bb66ed90 in impala's branch 
refs/heads/branch-4.4.0 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d32919f4 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions



> HdfsTable.toMinimalTCatalogObject() should hold table read lock to generate 
> incremental updates
> ---
>
> Key: IMPALA-12831
> URL: https://issues.apache.org/jira/browse/IMPALA-12831
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, 
> Impala 4.1.2, Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.4.0
>
>
> When enable_incremental_metadata_updates=true (default), catalogd sends 
> incremental partition updates to coordinators, which goes into 
> HdfsTable.toMinimalTCatalogObject():
> {code:java}
>   public TCatalogObject toMinimalTCatalogObject() {
>     TCatalogObject catalogObject = super.toMinimalTCatalogObject();
>     if (!BackendConfig.INSTANCE.isIncrementalMetadataUpdatesEnabled()) {
>       return catalogObject;
>     }
>     catalogObject.getTable().setTable_type(TTableType.HDFS_TABLE);
>     THdfsTable hdfsTable = new THdfsTable(hdfsBaseDir_, getColumnNames(),
>         nullPartitionKeyValue_, nullColumnValue_,
>         /*idToPartition=*/ new HashMap<>(),
>         /*prototypePartition=*/ new THdfsPartition());
>     for (HdfsPartition part : partitionMap_.values()) {
>       hdfsTable.partitions.put(part.getId(), part.toMinimalTHdfsPartition());
>     }
>     hdfsTable.setHas_full_partitions(false);
>     // The minimal catalog object of partitions contain the partition names.
>     hdfsTable.setHas_partition_names(true);
>     catalogObject.getTable().setHdfs_table(hdfsTable);
>     return catalogObject;
>   }
> {code}
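
A minimal sketch of the locking pattern the summary asks for, using `java.util.concurrent.locks.ReentrantReadWriteLock` as a stand-in for the table lock: the snapshot method copies partition data under the read lock, while mutations take the write lock. `LockedPartitionMap` is hypothetical and much simpler than HdfsTable. (In general, iterating a plain Java map while another thread modifies it is unsafe and may throw `ConcurrentModificationException`.)

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch (not Impala's HdfsTable): readers that snapshot the
// partition map hold the read lock; DDL-style mutations hold the write lock,
// so a snapshot never observes a half-applied change.
final class LockedPartitionMap {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> partitionMap = new LinkedHashMap<>();

  void addPartition(long id, String name) {
    lock.writeLock().lock();
    try {
      partitionMap.put(id, name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  void dropPartition(long id) {
    lock.writeLock().lock();
    try {
      partitionMap.remove(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Analogue of building a minimal object: copy partition names under the read lock. */
  List<String> toMinimalSnapshot() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(partitionMap.values());
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Copying inside the locked region and returning the copy means callers can use the snapshot after the lock is released.
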
> Accessing table fields without holding the table read lock might fail due to 
> concurrent DDLs. All workloads that use this method (e.g. INVALIDATE 
> commands) could hit this issue. We've seen the event processor fail when 
> processing a RELOAD event that wants to invalidate an HdfsTable:
> {noformat}
> E0216 16:23:44.283689   253 MetastoreEventsProcessor.java:899] Unexpected 
> exception received while processing event
> Java exc

[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

2024-05-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844279#comment-17844279
 ] 

ASF subversion and git services commented on IMPALA-3127:
-

Commit 5d32919f46117213249c60574f77e3f9bb66ed90 in impala's branch 
refs/heads/branch-4.4.0 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d32919f4 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions



> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.2.4
>Reporter: Dimitris Tsirogiannis
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: catalog-server, performance
> Fix For: Impala 4.0.0
>
>
> Currently, partitions are tightly integrated into the HdfsTable objects, 
> making incremental metadata updates difficult to perform. Furthermore, the 
> catalog transmits the entire table metadata even when only a few partitions change, 
> introducing significant latencies, wasting network bandwidth and CPU cycles 
> while updating table metadata at the receiving impalads. As a first step, we 
> should decouple partitions from tables and add them as a separate level in 
> the hierarchy of catalog entities (server-db-table-partition). Subsequently, 
> the catalog should transmit only entities that have changed after DDL/DML 
> statements.
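
The idea of transmitting only changed entities can be illustrated with a small Java sketch in which each partition is its own entity with a version number, so an update carries only the partitions changed since the version the receiver last saw. `VersionedPartitions` is hypothetical and uses a per-partition version counter as a simplification; it is not necessarily how Impala implements incremental updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of decoupled partitions: each partition is a separate
// catalog entity with its own version, so an update can carry only the
// partitions changed since the receiver's last version, not the whole table.
final class VersionedPartitions {
  private final Map<String, Long> versionByPartition = new HashMap<>();
  private long catalogVersion = 0;

  /** Adds or changes a partition, stamping it with a new catalog version. */
  void upsert(String partName) {
    versionByPartition.put(partName, ++catalogVersion);
  }

  long currentVersion() {
    return catalogVersion;
  }

  /** Partitions whose version is newer than `sinceVersion`, in version order. */
  List<String> changedSince(long sinceVersion) {
    List<Map.Entry<String, Long>> changed = new ArrayList<>();
    for (Map.Entry<String, Long> e : versionByPartition.entrySet()) {
      if (e.getValue() > sinceVersion) {
        changed.add(e);
      }
    }
    changed.sort(Map.Entry.comparingByValue());
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, Long> e : changed) {
      names.add(e.getKey());
    }
    return names;
  }
}
```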






[jira] [Commented] (IMPALA-12977) add search and pagination to /hadoop-varz

2024-05-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844281#comment-17844281
 ] 

ASF subversion and git services commented on IMPALA-12977:
--

Commit 74960cca6f8edbda1cca4fbed43a2c75a89a690d in impala's branch 
refs/heads/branch-4.4.0 from Saurabh Katiyal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=74960cca6 ]

IMPALA-12977: add search and pagination to /hadoop-varz

Added search and pagination feature to /hadoop-varz

Change-Id: Ic8cac23b655fa58ce12d9857649705574614a5f0
Reviewed-on: http://gerrit.cloudera.org:8080/21329
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 7c98ebb7b149a03a1fdb10f0da124c3fd2265f5d)








[jira] [Resolved] (IMPALA-13054) EnqueueCompletedQuery very slow on deeply nested plans

2024-05-07 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13054.

Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> EnqueueCompletedQuery very slow on deeply nested plans
> --
>
> Key: IMPALA-13054
> URL: https://issues.apache.org/jira/browse/IMPALA-13054
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> When running deeply nested queries - such as those in 
> tests/query_test/test_nested_types.py::TestMaxNestingDepth::test_max_nesting_depth
>  - with query logging enabled, the UnregisterQuery queue gets backed up due 
> to very slow execution of EnqueueCompletedQuery.






[jira] [Assigned] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure

2024-05-07 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-13018:


Assignee: Wenzhe Zhou  (was: Pranav Yogi Lodha)

> Fix 
> test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a 
> failure
> 
>
> Key: IMPALA-13018
> URL: https://issues.apache.org/jira/browse/IMPALA-13018
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> The returned rows do not match the expected results for some columns of decimal 
> type. 






[jira] [Updated] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure

2024-05-07 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-13018:
-
Summary: Fix 
test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a 
failure  (was: Fix 
test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcdstpcds-decimal_v2-q80a
 failure)

> Fix 
> test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a 
> failure
> 
>
> Key: IMPALA-13018
> URL: https://issues.apache.org/jira/browse/IMPALA-13018
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Pranav Yogi Lodha
>Priority: Major
>
> The returned rows do not match the expected results for some columns of decimal 
> type. 






[jira] [Work started] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure

2024-05-07 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13018 started by Wenzhe Zhou.







[jira] [Updated] (IMPALA-4024) Expose Impala metrics as a table

2024-05-07 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Fehr updated IMPALA-4024:
---
Epic Link: IMPALA-12427

> Expose Impala metrics as a table
> 
>
> Key: IMPALA-4024
> URL: https://issues.apache.org/jira/browse/IMPALA-4024
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: supportability, usability, workload-management
> Attachments: 2cb86db8.diff, ce6933d3.diff, d556ca9.diff
>
>







[jira] [Updated] (IMPALA-12768) Set iceberg.engine.hive.lock-enabled To false By Default

2024-05-07 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Fehr updated IMPALA-12768:

Epic Link: IMPALA-12427

> Set iceberg.engine.hive.lock-enabled To false By Default 
> -
>
> Key: IMPALA-12768
> URL: https://issues.apache.org/jira/browse/IMPALA-12768
> Project: IMPALA
>  Issue Type: Improvement
>  Components: be
>Reporter: Jason Fehr
>Priority: Major
>  Labels: backend, iceberg, impala
>
> Iceberg tables are currently prone to leaving a permanent lock in HMS if 
> Impala shuts down while the lock is held. This was observed intermittently 
> during {{sys.impala_query_log}} testing and noted in 
> https://github.com/apache/iceberg/issues/2301. The lock is held only briefly, so 
> this is fairly rare, but it can happen, and the only recourse is to delete rows 
> from HIVE_LOCKS in the HMS database.
> https://github.com/apache/iceberg/pull/6570 introduced an alternative update 
> mechanism in Iceberg 1.3 that depends on HIVE-26882. [Iceberg 
> documentation|https://iceberg.apache.org/docs/latest/configuration/#hadoop-configuration]
>  documents the property "iceberg.engine.hive.lock-enabled" with this warning:
> {noformat}
> Warn: Setting iceberg.engine.hive.lock-enabled=false will cause HiveCatalog 
> to commit to tables without using Hive locks. This should only be set to 
> false if all following conditions are met:
> * HIVE-26882 is available on the Hive Metastore server
> * All other HiveCatalogs committing to tables that this HiveCatalog commits 
> to are also on Iceberg 1.3 or later
> * All other HiveCatalogs committing to tables that this HiveCatalog commits 
> to have also disabled Hive locks on commit.
> Failing to ensure these conditions risks corrupting the table.
> {noformat}
> Setting this property to false uses a new Hive atomic operation and avoids 
> taking a lock, so it can't get stuck if Impala shuts down at the wrong time.
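
The three conditions in the quoted warning can be expressed as a pre-flight check. The following Java sketch is hypothetical (neither Impala nor Iceberg provides `IcebergLockGuard`); it only encodes those conditions and assumes the caller already knows the metastore's HIVE-26882 status and each other catalog's Iceberg version and lock setting:

```java
import java.util.List;

// Hypothetical pre-flight check (not part of Impala or Iceberg) encoding the
// three conditions from the Iceberg warning before setting
// iceberg.engine.hive.lock-enabled=false.
final class IcebergLockGuard {
  /** What we know about another HiveCatalog committing to the same tables. */
  static final class OtherCatalog {
    final int icebergMajor;
    final int icebergMinor;
    final boolean hiveLocksDisabled;

    OtherCatalog(int icebergMajor, int icebergMinor, boolean hiveLocksDisabled) {
      this.icebergMajor = icebergMajor;
      this.icebergMinor = icebergMinor;
      this.hiveLocksDisabled = hiveLocksDisabled;
    }

    boolean atLeastIceberg13() {
      return icebergMajor > 1 || (icebergMajor == 1 && icebergMinor >= 3);
    }
  }

  static boolean safeToDisableHiveLocks(boolean metastoreHasHive26882,
                                        List<OtherCatalog> otherCatalogs) {
    // Condition 1: HIVE-26882 is available on the Hive Metastore server.
    if (!metastoreHasHive26882) {
      return false;
    }
    for (OtherCatalog c : otherCatalogs) {
      // Conditions 2 and 3: every other committer is on Iceberg 1.3 or later
      // and has also disabled Hive locks on commit.
      if (!c.atLeastIceberg13() || !c.hiveLocksDisabled) {
        return false;
      }
    }
    return true;
  }
}
```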






[jira] [Created] (IMPALA-13062) Add TExprs of the columns to TBinaryPredicate for JDBC table

2024-05-07 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-13062:


 Summary: Add TExprs of the columns to TBinaryPredicate for JDBC 
table
 Key: IMPALA-13062
 URL: https://issues.apache.org/jira/browse/IMPALA-13062
 Project: IMPALA
  Issue Type: Sub-task
  Components: Frontend
Reporter: Wenzhe Zhou


We need to add the TExprs of the columns to TBinaryPredicate for JDBC tables so 
that we can support pushing down casts for JDBC tables in the future, and let the 
JDBC DataSource decide whether predicates with casts can be pushed down. 
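
A rough Java sketch of why the column expression matters: if the predicate records that its column side is wrapped in a CAST, the JDBC data source can render the SQL and decide per cast type whether to push it down. `ColumnPredicate` and `JdbcPushdownPolicy` are hypothetical and do not mirror the Thrift `TBinaryPredicate` or `TExpr` structures.

```java
import java.util.Set;

// Hypothetical model (not Impala's TBinaryPredicate): besides the column name
// and literal, the predicate carries a description of the column-side
// expression, reduced here to an optional CAST target type.
final class ColumnPredicate {
  final String column;
  final String op;
  final String literal;
  final String castTo; // null when the column is compared without a cast

  ColumnPredicate(String column, String op, String literal, String castTo) {
    this.column = column;
    this.op = op;
    this.literal = literal;
    this.castTo = castTo;
  }

  /** SQL text the remote database would receive if the predicate is pushed down. */
  String toSql() {
    String lhs = castTo == null ? column : "CAST(" + column + " AS " + castTo + ")";
    return lhs + " " + op + " " + literal;
  }
}

// Hypothetical data-source policy: push down uncast predicates, and cast
// predicates only for target types the remote database is trusted to handle.
final class JdbcPushdownPolicy {
  private final Set<String> pushableCastTypes;

  JdbcPushdownPolicy(Set<String> pushableCastTypes) {
    this.pushableCastTypes = pushableCastTypes;
  }

  boolean canPushDown(ColumnPredicate p) {
    return p.castTo == null || pushableCastTypes.contains(p.castTo);
  }
}
```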







[jira] [Updated] (IMPALA-13060) TestQueryLogTableBeeswax.test_query_log_table_query_select_dedicate_coordinator flaky

2024-05-07 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Fehr updated IMPALA-13060:

Epic Link: IMPALA-12427

> TestQueryLogTableBeeswax.test_query_log_table_query_select_dedicate_coordinator
>  flaky
> -
>
> Key: IMPALA-13060
> URL: https://issues.apache.org/jira/browse/IMPALA-13060
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Wenzhe Zhou
>Assignee: Jason Fehr
>Priority: Major
>
> The test failed because the expected message "'Query successfully 
> unregistered: query_id=554dcf86d11dbd5f:0ea9f28d'" was not found in 
> the log file. 
> Stacktrace:
> custom_cluster/test_query_log.py:414: in 
> test_query_log_table_query_select_dedicate_coordinator
> client = self.get_client(vector.get_value('protocol'))
> custom_cluster/test_query_log.py:73: in get_client
> self.assert_impalad_log_contains("INFO", finish_re)
> common/impala_test_suite.py:1271: in assert_impalad_log_contains
> "impalad", level, line_regex, expected_count, timeout_s)
> common/impala_test_suite.py:1322: in assert_log_contains
> (expected_count, log_file_path, line_regex, found, line)
> E   AssertionError: Expected 1 lines in file 
> /data0/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad.impala-ec2-centos79-m6i-4xlarge-xldisk-04d5.vpc.cloudera.com.jenkins.log.INFO.20240506-052112.12754
>  matching regex 'Query successfully unregistered: 
> query_id=554dcf86d11dbd5f:0ea9f28d', but found 0 lines. Last line 
> was: 
> E   I0506 05:21:32.810438 14218 TAcceptQueueServer.cpp:355] New connection to 
> server beeswax-frontend from client 
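
The stack trace shows the assertion counting impalad log lines that match a regex, with an expected count and a `timeout_s` argument. A generic Java sketch of that kind of check follows. `LogWaiter` is hypothetical, not the Python helper in `impala_test_suite.py`, and it reads lines through a `Supplier` so the same logic works on a log file or an in-memory list:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical helper: waits until a log contains the expected number of lines
// matching a regex, re-reading the log until a deadline passes.
final class LogWaiter {
  private LogWaiter() {}

  /** Reads the whole log file on each call; IO errors become unchecked. */
  static Supplier<List<String>> fileLines(Path log) {
    return () -> {
      try {
        return Files.readAllLines(log, StandardCharsets.UTF_8);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    };
  }

  static long countMatches(List<String> lines, Pattern regex) {
    return lines.stream().filter(l -> regex.matcher(l).find()).count();
  }

  /** Returns true once exactly `expected` lines match; false after the timeout. */
  static boolean waitForMatches(Supplier<List<String>> lines, Pattern regex,
                                long expected, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (countMatches(lines.get(), regex) == expected) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A check like this can still fail if the expected line is written after the deadline, so the timeout has to cover the slowest path that produces the line.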






[jira] [Work started] (IMPALA-13060) TestQueryLogTableBeeswax.test_query_log_table_query_select_dedicate_coordinator flaky

2024-05-07 Thread Jason Fehr (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13060 started by Jason Fehr.
---






[jira] [Updated] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters

2024-05-07 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-9577:
---
Fix Version/s: Impala 4.0.0
   (was: Impala 3.4.0)

> Use `system_unsync` time for Kudu test clusters
> ---
>
> Key: IMPALA-9577
> URL: https://issues.apache.org/jira/browse/IMPALA-9577
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> Kudu recently enhanced its time source configuration and changed the 
> time source for local clusters/tests to `system_unsync`. Impala should mirror 
> that behavior in its test clusters, since there is no need to require an 
> NTP-synchronized clock for a test where all the participating Kudu masters 
> and tablet servers run on the same node using the same local wallclock.
>  
> See the Kudu commit here for details: 
> [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be]
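
The decision described above can be sketched as a small launcher helper. This Java example is hypothetical (`KuduTestFlags` is not part of Impala's test scripts); it assumes Kudu daemons accept a `--time_source` flag taking `system_unsync`, and otherwise leaves Kudu's default untouched:

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical helper for a test-cluster launcher (not Impala's scripts).
// Assumption: Kudu masters and tablet servers accept --time_source, and
// system_unsync is appropriate only when every daemon shares one clock.
final class KuduTestFlags {
  private KuduTestFlags() {}

  static List<String> timeSourceFlags(List<String> hostsOfAllKuduDaemons) {
    boolean singleNode = new HashSet<>(hostsOfAllKuduDaemons).size() <= 1;
    if (singleNode) {
      // All daemons read the same local wallclock, so NTP sync is unnecessary.
      return List.of("--time_source=system_unsync");
    }
    // Multi-node: add no flag and keep Kudu's default time source.
    return List.of();
  }
}
```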






[jira] [Resolved] (IMPALA-11622) Impala load data command fails when the impala user has access on source file through Ranger policy

2024-05-07 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-11622.
--
Resolution: Duplicate

This is a duplicate of IMPALA-10272, which has already been resolved.

> Impala load data command fails when the impala user has access on source file 
> through Ranger policy
> ---
>
> Key: IMPALA-11622
> URL: https://issues.apache.org/jira/browse/IMPALA-11622
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek
>Priority: Major
>
> When trying to run the load data command in Impala, 
> if the Impala user has access to the source file through a Ranger HDFS policy,
> then the load data command fails.
> If the impala user has access to the source file through HDFS ACLs,
> then the load data command executes successfully.
> Steps to reproduce :-
> Ranger policy setup
> HDFS policies
> Policy 1 :-
> All access policy for HDFS user
> user - hdfs
> resources - * , recursive=true
> access - all access allowed
> Policy 2 :-
> Access for impala user on /root_test_dir/test_dir_2
> user - impala 
> resources - /root_test_dir/test_dir_2 , recursive = true
> access - all access allowed
> Hadoop SQL policies
> Policy 1 : All access policy for hrt_qa, hive and impala user
> users - hrt_qa, impala, hive
> resources - db - *, table - *, column - *
> access - all access allowed
> Policy 2 : Url policy for hrt_qa user
> users - hrt_qa
> resources :- url - *
> access - all access allowed
> Data setup :-
> In HDFS,
> create the following directories as the hdfs user
> {code:java|bgColor=#f4f5f7}
> /root_test_dir
> /root_test_dir/test_dir_1
> /root_test_dir/test_dir_2{code}
> Create a text file temp.txt on the local machine with any content (for example,
> Hello World).
> Then copy temp.txt to the HDFS directories /root_test_dir/test_dir_1 and
> /root_test_dir/test_dir_2.
> Set the permissions of /root_test_dir/test_dir_1 to 777 recursively
> {code:java|bgColor=#f4f5f7}
> hdfs dfs -chmod -R 777 /root_test_dir/test_dir_1 {code}
> 
> Set the permissions of /root_test_dir/test_dir_2 to 000 recursively
> {code:java|bgColor=#f4f5f7}
> hdfs dfs -chmod -R 000 /root_test_dir/test_dir_2{code}
> (Run all the hdfs commands as the hdfs user)
> In impala-shell, as the hrt_qa user,
> create the database test_db and the table test_db.test_table:
> {code:java|bgColor=#f4f5f7}
> CREATE TABLE test_db.test_table(c0 string) STORED AS TEXTFILE 
> TBLPROPERTIES('transactional'='false');{code}
>  
> Run the LOAD DATA command as the hrt_qa user :-
> {code:java|bgColor=#f4f5f7}
> test_db> LOAD DATA INPATH '/root_test_dir/test_dir_1/temp.txt' INTO TABLE 
> test_db.test_table
>                                                            > ;
> Query: LOAD DATA INPATH '/root_test_dir/test_dir_1/temp.txt' INTO TABLE 
> test_db.test_table
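> +----------------------------------------------------------+
> | summary                                                  |
> +----------------------------------------------------------+
> | Loaded 1 file(s). Total files in destination location: 1 |
> +----------------------------------------------------------+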
> Fetched 1 row(s) in 6.56s {code}
> Failing case :-
> {code:java}
> test_db> LOAD DATA INPATH '/root_test_dir/test_dir_2/temp.txt' INTO TABLE test_db.test_table;
> Query: LOAD DATA INPATH '/root_test_dir/test_dir_2/temp.txt' INTO TABLE test_db.test_table
> ERROR: AccessControlException: Permission denied: user=impala, access=READ, 
> inode="/warehouse/tablespace/external/hive/test_db.db/test_table/.tmp_4b9b3a83-f4f9-4363-81ae-21f5c170c1bd/temp.txt":hdfs:supergroup:--
>  {code}
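The reproduction depends on HDFS's POSIX-style permission bits. After `chmod -R 000`, the native permissions deny the impala user, so any access it has to test_dir_2 has to come from the Ranger HDFS policy. After `chmod -R 777`, the native permissions already allow the read. The Java sketch below models only the native check for a READ request. `HdfsReadCheckSketch` and `canRead` are hypothetical names and not Hadoop APIs. Ranger policy evaluation is not modeled. The sketch also assumes that impala is not a member of the superuser group.

```java
// Hypothetical, simplified model of HDFS's native (POSIX-style) READ check.
// Not Hadoop's API; Ranger policy evaluation is deliberately not modeled.
class HdfsReadCheckSketch {

  // Returns true if 'user' may read a file with the given owner and octal mode.
  // 'userInGroup' says whether the user belongs to the file's group.
  static boolean canRead(String user, boolean userInGroup, String owner,
                         int mode, String superuser) {
    if (user.equals(superuser)) {
      return true; // the HDFS superuser bypasses permission checks
    }
    int bits;
    if (user.equals(owner)) {
      bits = (mode >> 6) & 7; // owner class: no fall-through to group/other
    } else if (userInGroup) {
      bits = (mode >> 3) & 7; // group class
    } else {
      bits = mode & 7;        // other class
    }
    return (bits & 4) != 0;   // the 'r' bit
  }

  public static void main(String[] args) {
    // test_dir_1 after chmod -R 777: the native check lets impala read.
    System.out.println(canRead("impala", false, "hdfs", 0777, "hdfs"));
    // test_dir_2 after chmod -R 000: only the superuser passes the native check.
    System.out.println(canRead("impala", false, "hdfs", 0000, "hdfs"));
    System.out.println(canRead("hdfs", false, "hdfs", 0000, "hdfs"));
  }
}
```

Because the owner, group, and other classes are checked in that order without fall-through, mode 000 leaves no native path for a non-superuser. In this setup, the Ranger policy is therefore the only thing granting impala access to test_dir_2.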






[jira] [Commented] (IMPALA-13061) Query Live table fails to load if default_transactional_type=insert_only set globally

2024-05-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844535#comment-17844535
 ] 

ASF subversion and git services commented on IMPALA-13061:
--

Commit 1233ac3c579b5929866dba23debae63e5d2aae90 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1233ac3c5 ]

IMPALA-13061: Create query live as external table

Impala determines whether a managed table is transactional based on the
'transactional' table property. It assumes any managed table with
transactional=true returns non-null getValidWriteIds.

When 'default_transactional_type=insert_only' is set at startup (via
default_query_options), impala_query_live is created as a managed table
with transactional=true, but SystemTables don't implement
getValidWriteIds and are not meant to be transactional.

DataSourceTable has a similar problem: when a JDBC table is created,
setJdbcDataSourceProperties sets transactional=false. This
patch uses CREATE EXTERNAL TABLE sys.impala_query_live so that it is not
created as a managed table and 'transactional' is not set. That avoids
creating a SystemTable that Impala can't read (it encounters an
IllegalStateException).

Change-Id: Ie60a2bd03fabc63c85bcd9fa2489e9d47cd2aa65
Reviewed-on: http://gerrit.cloudera.org:8080/21401
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
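The mechanism described in the commit message can be modeled in a few lines of Java. This is a simplified sketch under stated assumptions and not Impala's implementation. The class and method names are hypothetical. The sketch assumes that a managed table created while default_transactional_type=insert_only receives the Hive insert-only properties ('transactional'='true', 'transactional_properties'='insert_only'), and that an external table receives neither.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Impala's code: a global default_transactional_type
// marks a managed table as transactional, while an external table is unaffected.
class QueryLiveTableSketch {

  // Parses a comma-separated key=value list, e.g. a --default_query_options value.
  static Map<String, String> parseOptions(String flagValue) {
    Map<String, String> opts = new HashMap<>();
    for (String pair : flagValue.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2) {
        opts.put(kv[0].trim().toLowerCase(), kv[1].trim().toLowerCase());
      }
    }
    return opts;
  }

  // Table properties a CREATE [EXTERNAL] TABLE would receive under this model.
  static Map<String, String> tableProperties(boolean external, Map<String, String> opts) {
    Map<String, String> props = new HashMap<>();
    String type = opts.getOrDefault("default_transactional_type", "none");
    // Only managed tables pick up the default transactional type.
    if (!external && type.equals("insert_only")) {
      props.put("transactional", "true");
      props.put("transactional_properties", "insert_only");
    }
    return props;
  }

  // Mirrors the assumption in the commit message: transactional=true means the
  // table is expected to provide valid write ids, which a SystemTable does not.
  static boolean treatedAsTransactional(Map<String, String> props) {
    return "true".equalsIgnoreCase(props.get("transactional"));
  }

  public static void main(String[] args) {
    Map<String, String> opts = parseOptions("default_transactional_type=insert_only");
    System.out.println("managed:  " + treatedAsTransactional(tableProperties(false, opts)));
    System.out.println("external: " + treatedAsTransactional(tableProperties(true, opts)));
  }
}
```

In this model, switching to an EXTERNAL table is enough. The 'transactional' property is never set, so the global query option no longer matters for sys.impala_query_live.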


> Query Live table fails to load if default_transactional_type=insert_only set 
> globally
> -
>
> Key: IMPALA-13061
> URL: https://issues.apache.org/jira/browse/IMPALA-13061
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> If the transactional type defaults to insert_only for all queries via
> {code}
> --default_query_options=default_transactional_type=insert_only
> {code}
> the table definition for {{sys.impala_query_live}} is set to transactional, 
> which causes an exception in catalogd
> {code}
> I0506 22:07:42.808758  3972 jni-util.cc:302] 4547b965aeebc5f0:8ba96c58] java.lang.IllegalStateException
> at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
> at org.apache.impala.catalog.Table.getPartialInfo(Table.java:851)
> at org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3818)
> at org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3714)
> at org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3681)
> at org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$10(JniCatalog.java:431)
> at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
> at org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:253)
> at org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:430)
> {code}
> We need to override that setting while creating {{sys.impala_query_live}}.


