[ 
https://issues.apache.org/jira/browse/IMPALA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522592#comment-17522592
 ] 

Quanlong Huang commented on IMPALA-7168:
----------------------------------------

I think another solution is to set a retry limit for UpdateCatalogCache(). 
E.g. if it fails 10 times in a row, the coordinator should shut itself down.
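
A rough sketch of the idea, not existing Impala code. ConsecutiveFailureTracker 
and OnCatalogUpdate are hypothetical names, `update_cache` stands in for the 
UpdateCatalogCache() call, and `shutdown` stands in for whatever shutdown path 
the coordinator would actually use. The counter resets on any success, so only 
an unbroken run of failures triggers the shutdown.

```cpp
#include <functional>

// Hypothetical helper (not part of Impala): counts consecutive failures of a
// repeated operation and reports when a configured limit has been reached.
class ConsecutiveFailureTracker {
 public:
  explicit ConsecutiveFailureTracker(int max_failures)
      : max_failures_(max_failures) {}

  // Records the outcome of one attempt. A success resets the counter.
  // Returns true once the run of consecutive failures reaches the limit.
  bool Record(bool success) {
    if (success) {
      failures_ = 0;
      return false;
    }
    ++failures_;
    return failures_ >= max_failures_;
  }

  int failures() const { return failures_; }

 private:
  const int max_failures_;
  int failures_ = 0;
};

// Hypothetical stand-in for the part of the catalog update callback that
// applies the update. `update_cache` returns true on success; `shutdown` is
// invoked when the failure limit is hit.
void OnCatalogUpdate(ConsecutiveFailureTracker& tracker,
                     const std::function<bool()>& update_cache,
                     const std::function<void()>& shutdown) {
  if (tracker.Record(update_cache())) {
    shutdown();
  }
}
```

With a limit of 10, the coordinator would keep retrying on each topic update 
and only give up after the tenth failure in a row, which lets the other 
coordinators stop waiting on it.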

> DML query may hang if CatalogUpdateCallback() encounters repeated error
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-7168
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7168
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0
>            Reporter: Pranay Singh
>            Priority: Major
>
> DML queries (e.g. INSERT) will hang if 
> exec_env_->frontend()->UpdateCatalogCache() in 
> ImpalaServer::CatalogUpdateCallback repeatedly fails with an error such as 
> ENOMEM. This happens with SYNC_DDL set to 1 while the coordinator node is 
> waiting for its catalog version to become current.
> The scenario looks like this. Let's say there are two coordinator nodes, 
> Node A and Node B, 
> and catalogd and statestored are running on Node C.
> a) CREATE TABLE is executed on Node A with SYNC_DDL set to 1. The thread 
> running the query blocks in 
> impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for its catalog 
> version to become current.
> b) Meanwhile, statestored running on Node C calls 
> ImpalaServer::CatalogUpdateCallback on Node B via Thrift RPC to apply a delta 
> topic update. The update is not applied if Node B hits repeated errors, say 
> because the frontend is low on memory (low JVM heap).
> c) In that case Node A waits indefinitely for its catalog 
> version to become current, until Node B is shut down manually.
> Note: This is a case where Node B is reachable (heartbeat is fine), but the 
> node is in a bad, non-working state.


