[jira] [Commented] (IMPALA-12771) Impala catalogd events-skipped may mark the wrong number

2024-05-30 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850908#comment-17850908
 ] 

Maxwell Guo commented on IMPALA-12771:
--

Ping again. I have also updated the PR against the latest master branch to 
avoid merge conflicts.  [~mylogi...@gmail.com] [~stigahuang] [~VenuReddy]

> Impala catalogd events-skipped may mark the wrong number
> 
>
> Key: IMPALA-12771
> URL: https://issues.apache.org/jira/browse/IMPALA-12771
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>
> See the description of [event-skipped 
> metric|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L237]
>  
> {code:java}
>   // total number of events which are skipped because of the flag setting or
>   // in case of [CREATE|DROP] events on [DATABASE|TABLE|PARTITION] which were ignored
>   // because the [DATABASE|TABLE|PARTITION] was already [PRESENT|ABSENT] in the catalogd.
> {code}
>  
> For CREATE and DROP events on a database, table, or partition (AddPartition 
> included), when the database or table is not found in the cache, we skip 
> processing the event and increment the events-skipped metric by 1.
> However, I found some inconsistencies for ALTER TABLE and Reload events:
> * Reload events are not mentioned in the description of events-skipped, 
> but the metric is incremented by 1 when the event is an old event;
> * In addition, if the table is in the blacklist, the metric is also 
> incremented by 1.
> In summary, I think this description is inconsistent with the actual 
> implementation.
> So can we also increment the events-skipped metric for alter partition 
> events and update the description to cover all skipped events?
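
The request above amounts to counting every skipped event in one place, whatever the reason for the skip. A minimal sketch of that idea follows; `SkippedEventCounter` and `SkipReason` are hypothetical names for illustration and are not part of Impala's MetastoreEventsProcessor.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration only; these names are not part of Impala.
final class SkippedEventCounter {
  enum SkipReason { FLAG_SET, ALREADY_PRESENT_OR_ABSENT, OLD_EVENT, BLACKLISTED }

  private final Map<SkipReason, Long> perReason = new EnumMap<>(SkipReason.class);
  private long total = 0;

  // Every skip path calls this, so the total covers all skipped events.
  void markSkipped(SkipReason reason) {
    perReason.merge(reason, 1L, Long::sum);
    total++;
  }

  long total() { return total; }

  long count(SkipReason reason) { return perReason.getOrDefault(reason, 0L); }
}
```

Keeping a per-reason breakdown next to the total would also let the metric description list each skip reason explicitly.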



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12705) Add a page to show the catalog's HA information

2024-05-30 Thread Zhi Tang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849608#comment-17849608
 ] 

Zhi Tang edited comment on IMPALA-12705 at 5/31/24 2:41 AM:


Add /catalog_ha_info page on the WebUI of statestored, which will display the 
following information:

!image-2024-05-27-10-57-37-158.png|width=607,height=332!


was (Author: tangzhi):
Add /catalog-ha-info page on the WebUI of statestored, which will display the 
following information:

!image-2024-05-27-10-57-37-158.png|width=607,height=332!

> Add a page to show the catalog's HA information
> ---
>
> Key: IMPALA-12705
> URL: https://issues.apache.org/jira/browse/IMPALA-12705
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 4.3.0
>Reporter: Zhi Tang
>Assignee: Zhi Tang
>Priority: Major
> Attachments: image-2024-05-27-10-57-37-158.png
>
>







[jira] [Commented] (IMPALA-13057) Incorporate tuple/slot information into the tuple cache key

2024-05-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850883#comment-17850883
 ] 

ASF subversion and git services commented on IMPALA-13057:
--

Commit 825900fa6c3a51941b7b90edb8af6f7dba5e5fe8 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=825900fa6 ]

IMPALA-13057: Incorporate tuple/slot information into tuple cache key

The tuple cache keys currently do not include information about
the tuples or slots, as that information is stored outside
the PlanNode thrift structures. The tuple/slot information is
critical to determining which columns are referenced and what
data layout the result tuple has. This adds code to incorporate
the TupleDescriptors and SlotDescriptors into the cache key.

Since the tuple and slot ids are indexes into a global structure
(the descriptor table), they hinder cache key matches across
different queries. If a query has an extra filter, it can shift
all the slot ids. If the query has an extra join, it can
shift all the tuple ids. To eliminate this effect, this adds the
ability to translate tuple and slot ids from global indices to
local indices. The translation only contains information from the
subtree below that point, so it is not influenced by unrelated
parts of the query.

When the code registers a tuple with the TupleCacheInfo, it also
registers a translation from the global index to a local index.
Any code that puts SlotIds or TupleIds into a Thrift data structure
can use the translateTupleId() and translateSlotId() functions to
get the local index. These are exposed on ThriftSerializationCtx
by functions of the same name, but those functions apply the
translation only when working for the tuple cache.
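
As a rough illustration of the global-to-local translation described above, the sketch below assigns local indices in order of first registration. `LocalIdTranslator` is a hypothetical class, not the actual TupleCacheInfo code, and plain `int` ids stand in for TupleId/SlotId.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual TupleCacheInfo implementation.
final class LocalIdTranslator {
  private final Map<Integer, Integer> globalToLocal = new HashMap<>();

  // Registers a global id and assigns the next local index on first sight.
  int register(int globalId) {
    Integer existing = globalToLocal.get(globalId);
    if (existing != null) return existing;
    int local = globalToLocal.size();
    globalToLocal.put(globalId, local);
    return local;
  }

  // Returns the local index for a previously registered global id.
  int translate(int globalId) {
    Integer local = globalToLocal.get(globalId);
    if (local == null) {
      throw new IllegalStateException("unregistered id " + globalId);
    }
    return local;
  }
}
```

With this scheme, a subtree whose global slot ids are 5 and 7 and another whose ids are 12 and 14 both translate to local indices 0 and 1, so an extra filter or join elsewhere in the query no longer changes the ids seen by the cache key.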

This passes the ThriftSerializationCtx into Exprs that have
TupleIds or SlotIds and applies the translation. It also passes
the ThriftSerializationCtx into PlanNode::toThrift(), which is
used to translate TupleIds in HdfsScanNode.

This also adds a way to register a table with the tuple cache
and incorporate information about it. This allows us to mask
out additional fields in PlanNode and enable a test case that
relies on matching with different table aliases.

Testing:
 - This fixes some commented out test cases in TupleCacheTest
   (specifically telling columns apart)
 - This adds new test cases that match due to id translation
   (extra filters, extra joins)
 - This adds a unit test for the id translation to
   TupleCacheInfoTest

Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Reviewed-on: http://gerrit.cloudera.org:8080/21398
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> Incorporate tuple/slot information into the tuple cache key
> ---
>
> Key: IMPALA-13057
> URL: https://issues.apache.org/jira/browse/IMPALA-13057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Since the tuple and slot information is kept separately in the descriptor 
> table, it does not get incorporated into the PlanNode thrift used for the 
> tuple cache key. This means that the tuple cache can't distinguish between 
> these two queries:
> {noformat}
> select int_col1 from table;
> select int_col2 from table;{noformat}
> To solve this, the tuple/slot information needs to be incorporated into the 
> cache key. PlanNode::initThrift() walks through each tuple, so this is a good 
> place to serialize the TupleDescriptor/SlotDescriptors and incorporate it 
> into the hash.
> The tuple ids and slot ids are global ids, so the value is influenced by the 
> entirety of the query. This is a problem for matching cache results across 
> different queries. As part of incorporating the tuple/slot information, we 
> should also add an ability to translate tuple/slot ids into ids local to a 
> subtree.
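
To see why the slot information matters, consider a key built only from the plan shape: it is identical for the two queries above, while folding the referenced columns into the hash separates them. The sketch below is a toy model; `ToyCacheKey`, the plan-shape string, and the choice of SHA-256 are assumptions for illustration, not how Impala computes its tuple cache key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Toy model with hypothetical names; not Impala's tuple cache key code.
final class ToyCacheKey {
  // Hashes the plan shape together with the referenced slot columns.
  static String key(String planShape, List<String> slotColumns) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      md.update(planShape.getBytes(StandardCharsets.UTF_8));
      for (String column : slotColumns) {
        md.update((byte) 0); // separator so field boundaries stay unambiguous
        md.update(column.getBytes(StandardCharsets.UTF_8));
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest()) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```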






[jira] [Assigned] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-05-30 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-12800:
--

Assignee: Michael Smith

> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     at 
> org.apache.impala.analysis.Analyzer.isTrueWithNullSlots(Analyzer.java:2888)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.requiresNullWrapping(TupleIsNullPredicate.java:181)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExpr(TupleIsNullPredicate.java:147)
>     at 
> org.apache.impala.analysis.TupleIsNullPredicate.wrapExprs(TupleIsNullPredicate.java:136){noformat}
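
Location 1 above shows `ArrayList.contains` doing a linear scan that calls `Expr.equals` on each element, which becomes quadratic when it runs once per entry. One common mitigation, sketched here and not necessarily what was adopted for this issue, is to keep a hash index next to the list. `IndexedList` is a hypothetical helper, and it assumes the element type has consistent `equals()` and `hashCode()`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; assumes T has consistent equals() and hashCode().
final class IndexedList<T> {
  private final List<T> items = new ArrayList<>();
  private final Set<T> index = new HashSet<>();

  // Appends the item, preserving insertion order like a plain list.
  void add(T item) {
    items.add(item);
    index.add(item);
  }

  // Expected O(1) lookup instead of ArrayList.contains()'s O(n) scan.
  boolean contains(T item) { return index.contains(item); }

  // Number of items added, duplicates included.
  int size() { return items.size(); }
}
```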






[jira] [Work started] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-05-30 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12800 started by Michael Smith.
--
> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>






[jira] [Created] (IMPALA-13119) CostingSegment.java is initialized with wrong cost

2024-05-30 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-13119:
-

 Summary: CostingSegment.java is initialized with wrong cost
 Key: IMPALA-13119
 URL: https://issues.apache.org/jira/browse/IMPALA-13119
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto


CostingSegment.java has two public constructors: one accepts a PlanNode, while 
the other accepts a DataSink as its parameter. Both call the appendCost method, 
which sums the additionalCost with the segment's current cost_.

However, if cost_ is ProcessingCost.zero(), this can mistakenly set the number 
of rows to consume (setNumRowToConsume) to 0.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/CostingSegment.java#L114]
 

The public constructors should simply initialize cost_ with the ProcessingCost 
of the PlanNode or DataSink passed to the constructor.
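
The difference between the two initialization styles can be shown with a toy model. Everything below is hypothetical: `ToyCost` and `ToySegment` are not Impala's ProcessingCost or CostingSegment, and the sketch assumes that appending keeps the row count of the current cost, which is how a zero starting cost would end up reporting 0 rows to consume.

```java
// Toy model with hypothetical names; not Impala's ProcessingCost or CostingSegment.
final class ToyCost {
  final long total;
  final long numRowToConsume;

  ToyCost(long total, long numRowToConsume) {
    this.total = total;
    this.numRowToConsume = numRowToConsume;
  }

  static ToyCost zero() { return new ToyCost(0, 0); }
}

final class ToySegment {
  private ToyCost cost;

  private ToySegment(ToyCost cost) { this.cost = cost; }

  // Problematic style: start from zero and append, inheriting zero's row count.
  static ToySegment viaAppend(ToyCost nodeCost) {
    ToySegment s = new ToySegment(ToyCost.zero());
    s.appendCost(nodeCost);
    return s;
  }

  // Proposed style: initialize directly from the node's cost.
  static ToySegment viaInit(ToyCost nodeCost) { return new ToySegment(nodeCost); }

  // Assumption: the sum keeps the current cost's numRowToConsume.
  void appendCost(ToyCost additional) {
    cost = new ToyCost(cost.total + additional.total, cost.numRowToConsume);
  }

  ToyCost cost() { return cost; }
}
```

In this model both styles agree on the total, but only direct initialization preserves the node's row count.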






[jira] [Commented] (IMPALA-915) Ability to cancel queries while in the FE.

2024-05-30 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850828#comment-17850828
 ] 

Michael Smith commented on IMPALA-915:
--

We might be able to identify the thread processing query analysis and send an 
interrupt rather than have to add a bunch of cancellation checks.
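
A minimal sketch of the interrupt approach, using only standard `java.lang.Thread` and `java.util.concurrent` calls. `InterruptDemo` is a hypothetical class, and the latch merely stands in for a blocking wait such as waiting for metadata; it is not Impala's frontend code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo; the latch stands in for a blocking wait on metadata.
final class InterruptDemo {
  // Starts a worker that blocks, interrupts it, and reports whether the
  // worker observed the interrupt.
  static boolean runAndInterrupt() {
    CountDownLatch metadataLoaded = new CountDownLatch(1); // never counted down
    AtomicBoolean cancelled = new AtomicBoolean(false);
    Thread analysis = new Thread(() -> {
      try {
        metadataLoaded.await(); // blocking call that responds to interrupts
      } catch (InterruptedException e) {
        cancelled.set(true); // treat the interrupt as a cancellation request
      }
    });
    analysis.start();
    analysis.interrupt();
    try {
      analysis.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the caller's interrupt status
    }
    return cancelled.get();
  }
}
```

Note that an interrupt only takes effect at blocking calls that honor interruption or where code checks `Thread.interrupted()`; purely CPU-bound work would still need explicit checks.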

> Ability to cancel queries while in the FE.
> --
>
> Key: IMPALA-915
> URL: https://issues.apache.org/jira/browse/IMPALA-915
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 1.3, Impala 2.3.0
>Reporter: Alexander Behm
>Assignee: Michael Smith
>Priority: Major
>  Labels: query-lifecycle
>
> We currently can't cancel queries that are being analyzed/planned. In 
> particular, this is undesirable if we have to wait for metadata to be loaded 
> from the CatalogServer.






[jira] [Updated] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions

2024-05-30 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated IMPALA-13117:
---
Labels: catalog-2024  (was: )

> Improve the heap usage during metadata loading and DDL/DML executions
> -
>
> Key: IMPALA-13117
> URL: https://issues.apache.org/jira/browse/IMPALA-13117
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: catalog-2024
>
> The JVM heap of catalogd is not used only by the metadata cache. In-progress 
> metadata loading threads and DDL/DML executions also create temporary 
> objects, which introduce spikes in heap usage. We should improve heap usage 
> in this area, especially when metadata loading is slow due to 
> external slowness (e.g. listing files on S3).
> CC [~mylogi...@gmail.com] 






[jira] [Updated] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated

2024-05-30 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated IMPALA-13116:
---
Labels: catalog-2024  (was: )

> In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if 
> the table is invalidated
> ---
>
> Key: IMPALA-13116
> URL: https://issues.apache.org/jira/browse/IMPALA-13116
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: catalog-2024
>
> A table can be invalidated when there are DDL/DML/REFRESHs running in flight:
>  * A user can explicitly trigger an INVALIDATE METADATA command
>  * The table could be invalidated by CatalogdTableInvalidator when 
> invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned 
> on
> Note that invalidating a table doesn't require holding the lock of the 
> HdfsTable object so it can finish even if there are on-going updates on the 
> table.
> The updated HdfsTable object won't be added to the metadata cache since it 
> has been replaced with an IncompleteTable object. It's only used in the 
> DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal 
> representation which is mostly the table name and catalog version. We don't 
> need the updates on the HdfsTable object to be finished. Thus, we can 
> consider aborting the reloading of such DDL/DML/REFRESH requests.
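
One way to picture the proposed abort is a reload loop that checks an invalidation flag between steps. The sketch below is hypothetical: `AbortableReload`, the step list, and the `AtomicBoolean` flag are stand-ins for illustration and do not reflect catalogd's actual reload path or locking.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not catalogd's actual reload path.
final class AbortableReload {
  // Runs the steps in order, stopping early once the table is invalidated.
  // Returns the number of steps that actually ran.
  static int reload(List<Runnable> steps, AtomicBoolean invalidated) {
    int done = 0;
    for (Runnable step : steps) {
      if (invalidated.get()) break; // the reloaded table would be discarded anyway
      step.run();
      done++;
    }
    return done;
  }
}
```

The flag is read without holding any table lock, which matches the observation that invalidation itself does not need the HdfsTable lock.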






[jira] [Closed] (IMPALA-13118) Removing explicit ascii encoding of kerberos_host_fqdn when -b/--kerberos_host_fqdn is used

2024-05-30 Thread Vincent Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Tran closed IMPALA-13118.
-
Resolution: Duplicate

> Removing explicit ascii encoding of kerberos_host_fqdn when 
> -b/--kerberos_host_fqdn is used
> ---
>
> Key: IMPALA-13118
> URL: https://issues.apache.org/jira/browse/IMPALA-13118
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 4.4.0
>Reporter: Vincent Tran
>Assignee: Vincent Tran
>Priority: Major
>  Labels: Python3, impala-shell
>
> IMPALA-651 added an explicit encoding for sasl_host to ascii before passing 
> it to sasl_client.setAttr(). This is no longer required after sasl was 
> upgraded to 0.2.1 in IMPALA-9719.  This explicit encoding also causes an 
> error in Python 3 when -b is used:
> {noformat}
> Starting Impala Shell with Kerberos authentication using Python 3.6.8
> Using service name 'impala'
> SSL is enabled. Impala server certificates will NOT be verified (set 
> --ca_cert to change)
> Error connecting: TTransportException, Could not start SASL: b'Error in 
> sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Failure: no 
> serverFQDN'
> {noformat}






[jira] [Commented] (IMPALA-13118) Removing explicit ascii encoding of kerberos_host_fqdn when -b/--kerberos_host_fqdn is used

2024-05-30 Thread Vincent Tran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850796#comment-17850796
 ] 

Vincent Tran commented on IMPALA-13118:
---

This duplicates IMPALA-12552.

> Removing explicit ascii encoding of kerberos_host_fqdn when 
> -b/--kerberos_host_fqdn is used
> ---
>
> Key: IMPALA-13118
> URL: https://issues.apache.org/jira/browse/IMPALA-13118
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 4.4.0
>Reporter: Vincent Tran
>Assignee: Vincent Tran
>Priority: Major
>  Labels: Python3, impala-shell
>
> IMPALA-651 added an explicit encoding for sasl_host to ascii before passing 
> it to sasl_client.setAttr(). This is no longer required after sasl was 
> upgraded to 0.2.1 in IMPALA-9719.  This explicit encoding also causes an 
> error in Python 3 when -b is used:
> {noformat}
> Starting Impala Shell with Kerberos authentication using Python 3.6.8
> Using service name 'impala'
> SSL is enabled. Impala server certificates will NOT be verified (set 
> --ca_cert to change)
> Error connecting: TTransportException, Could not start SASL: b'Error in 
> sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Failure: no 
> serverFQDN'
> {noformat}






[jira] [Created] (IMPALA-13118) Removing explicit ascii encoding of kerberos_host_fqdn when -b/--kerberos_host_fqdn is used

2024-05-30 Thread Vincent Tran (Jira)
Vincent Tran created IMPALA-13118:
-

 Summary: Removing explicit ascii encoding of kerberos_host_fqdn 
when -b/--kerberos_host_fqdn is used
 Key: IMPALA-13118
 URL: https://issues.apache.org/jira/browse/IMPALA-13118
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Affects Versions: Impala 4.4.0
Reporter: Vincent Tran
Assignee: Vincent Tran


IMPALA-651 added an explicit encoding for sasl_host to ascii before passing it 
to sasl_client.setAttr(). This is no longer required after sasl was upgraded to 
0.2.1 in IMPALA-9719.  This explicit encoding also causes an error in Python 3 
when -b is used:

{noformat}
Starting Impala Shell with Kerberos authentication using Python 3.6.8
Using service name 'impala'
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert 
to change)
Error connecting: TTransportException, Could not start SASL: b'Error in 
sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Failure: no serverFQDN'
{noformat}






[jira] [Created] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions

2024-05-30 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13117:
---

 Summary: Improve the heap usage during metadata loading and 
DDL/DML executions
 Key: IMPALA-13117
 URL: https://issues.apache.org/jira/browse/IMPALA-13117
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


The JVM heap of catalogd is not used only by the metadata cache. In-progress 
metadata loading threads and DDL/DML executions also create temporary 
objects, which introduce spikes in heap usage. We should improve heap usage 
in this area, especially when metadata loading is slow due to 
external slowness (e.g. listing files on S3).

CC [~mylogi...@gmail.com] 






[jira] [Assigned] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated

2024-05-30 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-13116:
---

Assignee: Quanlong Huang

> In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if 
> the table is invalidated
> ---
>
> Key: IMPALA-13116
> URL: https://issues.apache.org/jira/browse/IMPALA-13116
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> A table can be invalidated when there are DDL/DML/REFRESHs running in flight:
>  * A user can explicitly trigger an INVALIDATE METADATA command
>  * The table could be invalidated by CatalogdTableInvalidator when 
> invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned 
> on
> Note that invalidating a table doesn't require holding the lock of the 
> HdfsTable object so it can finish even if there are on-going updates on the 
> table.
> The updated HdfsTable object won't be added to the metadata cache since it 
> has been replaced with an IncompleteTable object. It's only used in the 
> DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal 
> representation which is mostly the table name and catalog version. We don't 
> need the updates on the HdfsTable object to be finished. Thus, we can 
> consider aborting the reloading of such DDL/DML/REFRESH requests.






[jira] [Created] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated

2024-05-30 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13116:
---

 Summary: In local-catalog mode, abort REFRESH and metadata 
reloading of DDL/DMLs if the table is invalidated
 Key: IMPALA-13116
 URL: https://issues.apache.org/jira/browse/IMPALA-13116
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang


A table can be invalidated when there are DDL/DML/REFRESHs running in flight:
 * A user can explicitly trigger an INVALIDATE METADATA command
 * The table could be invalidated by CatalogdTableInvalidator when 
invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned on

Note that invalidating a table doesn't require holding the lock of the 
HdfsTable object so it can finish even if there are on-going updates on the 
table.

The updated HdfsTable object won't be added to the metadata cache since it has 
been replaced with an IncompleteTable object. It's only used in the 
DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal 
representation which is mostly the table name and catalog version. We don't 
need the updates on the HdfsTable object to be finished. Thus, we can consider 
aborting the reloading of such DDL/DML/REFRESH requests.


