[ 
https://issues.apache.org/jira/browse/IMPALA-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163169#comment-17163169
 ] 

Quanlong Huang commented on IMPALA-9848:
----------------------------------------

The cause is that the table gets invalidated in LocalCatalog when the 
coordinator processes the statestore catalog update. If we run the query twice 
before killing the catalogd, and make sure the second run is triggered after 
the statestore update has been processed (e.g. 2s after the first run), then 
the table is still accessible after the catalogd is killed. I think we can fix 
this by not sending invalidations when an IncompleteTable becomes loaded in 
catalogd.
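
To make the sequence above concrete, here is a minimal toy model in Python. It 
is only a sketch: ToyCatalogd, ToyCoordinator, skip_invalidation_on_load and 
survives_catalogd_crash are hypothetical names for illustration, not Impala 
classes or flags, and the real update path is more involved.

```python
# Toy model only: all names here are hypothetical, not Impala's.
from dataclasses import dataclass, field


@dataclass
class ToyCatalogd:
    """Tracks loaded tables and queues invalidations for coordinators."""
    skip_invalidation_on_load: bool
    loaded: set = field(default_factory=set)
    pending_invalidations: list = field(default_factory=list)
    alive: bool = True

    def load_table(self, name):
        if not self.alive:
            raise ConnectionError("catalogd is down")
        if name not in self.loaded:
            # Incomplete -> loaded transition.
            self.loaded.add(name)
            if not self.skip_invalidation_on_load:
                self.pending_invalidations.append(name)
        return f"metadata({name})"


@dataclass
class ToyCoordinator:
    catalogd: ToyCatalogd
    cache: dict = field(default_factory=dict)

    def process_statestore_update(self):
        # Drop every cached table named in the pending invalidations.
        for name in self.catalogd.pending_invalidations:
            self.cache.pop(name, None)
        self.catalogd.pending_invalidations.clear()

    def query(self, name):
        # A cache miss needs a live catalogd; a hit does not.
        if name not in self.cache:
            self.cache[name] = self.catalogd.load_table(name)
        return self.cache[name]


def survives_catalogd_crash(skip_invalidation_on_load, runs_before_crash=1):
    catalogd = ToyCatalogd(skip_invalidation_on_load)
    coord = ToyCoordinator(catalogd)
    for _ in range(runs_before_crash):
        coord.query("tpch.lineitem")
        coord.process_statestore_update()  # update arrives after each run
    catalogd.alive = False                 # kill the catalogd
    try:
        coord.query("tpch.lineitem")
        return True
    except ConnectionError:
        return False
```

In this model, survives_catalogd_crash(False) fails after one run, 
survives_catalogd_crash(False, runs_before_crash=2) succeeds because the second 
run repopulates the cache, and survives_catalogd_crash(True) succeeds after a 
single run, which is the effect of the proposed fix.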

However, LocalCatalog coordinators are not guaranteed to work without catalogd. 
Only metadata already loaded in LocalCatalog can survive a catalogd crash. For 
instance, queries on unloaded partitions still depend on a live catalogd. So if 
the pattern is 1) select some partitions of a table, 2) catalogd crashes, 3) 
select some other partitions of the same table on the same LocalCatalog 
coordinator, the query will still fail. In addition, cached items age out and 
are evicted from LocalCatalog (after 1h by default), so metadata that hasn't 
been accessed for 1h will depend on a live catalogd too.
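
The two limits described above (unloaded partitions and age-out) can be 
sketched with a small Python model. ToyPartitionCache and its fields are 
hypothetical names, and the expire-after-last-access behaviour is an assumption 
based on the description here, not a statement about the actual cache 
implementation.

```python
# Toy model only: names are hypothetical. The 3600s default mirrors the
# 1h age-out mentioned above.
class ToyPartitionCache:
    def __init__(self, ttl_seconds=3600):
        self.ttl_seconds = ttl_seconds
        self.last_access = {}      # partition -> time of last access
        self.catalogd_alive = True

    def get(self, partition, now):
        last = self.last_access.get(partition)
        if last is not None and now - last < self.ttl_seconds:
            self.last_access[partition] = now  # a hit refreshes the entry
            return "hit"
        # Never loaded, or aged out: the entry must come from catalogd.
        if not self.catalogd_alive:
            raise ConnectionError("catalogd is down")
        self.last_access[partition] = now
        return "fetched"
```

With this model, a partition fetched before the crash keeps returning "hit" 
while it is accessed within the TTL, whereas a different partition, or the same 
partition after an hour without access, raises ConnectionError once 
catalogd_alive is False.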

 

> Coordinator unnecessarily invalidating locally cached table metadata
> --------------------------------------------------------------------
>
>                 Key: IMPALA-9848
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9848
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>            Reporter: Sahil Takiar
>            Priority: Major
>         Attachments: IMPALA-9848-catalogd.INFO, IMPALA-9848-impalad.INFO
>
>
> The following fails when run locally on master:
> {code:java}
> ./bin/start-impala-cluster.py --catalogd_args='--catalog_topic_mode=minimal' \
>   --impalad_args='--use_local_catalog'
> ./bin/impala-shell.sh
> [localhost:21000] default> select count(l_comment) from tpch.lineitem; <--- THIS WORKS
> # kill the catalogd process
> [localhost:21000] default> select count(l_comment) from tpch.lineitem; <--- THIS FAILS
> ERROR: AnalysisException: Failed to load metadata for table: 'tpch.lineitem'
> CAUSED BY: TableLoadingException: Could not load table tpch.lineitem from catalog
> CAUSED BY: TException: org.apache.impala.common.InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused)
> CAUSED BY: InternalException: Couldn't open transport for localhost:26000 (connect() failed: Connection refused
> {code}
> The above experiment works with catalog v1, e.g. if you remove the startup 
> flags from the {{./bin/start-impala-cluster.py}} command, everything works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

