[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868173#comment-16868173 ]

Robbie Zhang commented on IMPALA-7093:
--

Hi [~bharathv], thanks for your comment. My fix only changes the behavior within one catalog update. The risks are:

1) The ImpaladCatalog keeps the removed table/function objects
2) The ImpaladCatalog doesn't replace stale table/function objects

For 1), I think it's alright unless for some reason the catalogd doesn't include the removed table/function objects in deleteLog_. But for 2), I did find a scenario in which it could happen. For example, when the catalogd is restarted while impala daemons are running, the catalog object versions are reset and might be lower than the versions of the objects in the impala daemons. That will definitely break my fix.

So I just came up with an idea to improve my fix. It's not so smart but it should work. I think my fix can be improved as follows:

a) [ImpaladCatalog.addCatalogObject|https://github.com/apache/impala/blob/30c3cd95a42cacbfa2dbb0b29a4757745af942c3/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java#L291] removes the existing table object first and then adds it back, just as we do for function objects;
b) ImpaladCatalog.addCatalogObject adds the names of all updated table/function objects to a list or map, and [ImpaladCatalog.updateCatalog|https://github.com/apache/impala/blob/30c3cd95a42cacbfa2dbb0b29a4757745af942c3/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java#L226] removes all table/function objects which are not in the list or map.

With the above improvement, I believe the fix has no side effects as long as the performance is acceptable. What do you think?
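One way to read proposals (a) and (b) is sketched below. This is only an illustration, not the actual ImpaladCatalog code: the class `UpdateTrackingSketch` and its methods `beginUpdate`, `addTable` and `endUpdate` are hypothetical names, tables are reduced to a name and a version, and the sketch assumes the update carries every surviving table (as after a catalogd restart). How the pruning step would treat ordinary delta updates is not stated in the comment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class UpdateTrackingSketch {
  // Hypothetical local cache: table name -> catalog version.
  final Map<String, Long> tables = new HashMap<>();
  // Names of all tables seen during the current catalog update.
  private final Set<String> touched = new HashSet<>();

  void beginUpdate() {
    touched.clear();
  }

  // (a) Remove the existing object first, then add the incoming one,
  // without comparing versions. A lower version after a catalogd restart
  // therefore still replaces the stale local object.
  void addTable(String name, long version) {
    tables.remove(name);
    tables.put(name, version);
    touched.add(name);
  }

  // (b) At the end of the update, drop every table that was not re-sent.
  // Assumption: the update was a full one.
  void endUpdate() {
    tables.keySet().retainAll(touched);
  }

  public static void main(String[] args) {
    UpdateTrackingSketch cache = new UpdateTrackingSketch();
    // State left over from the old catalogd, with high versions.
    cache.tables.put("t1", 500L);
    cache.tables.put("t2", 501L);

    // The restarted catalogd resends t1 with a reset (lower) version;
    // t2 no longer exists and is not resent.
    cache.beginUpdate();
    cache.addTable("t1", 3L);
    cache.endUpdate();

    System.out.println(cache.tables); // prints {t1=3}
  }
}
```

The unconditional replace in `addTable` addresses risk 2) from the comment, and the pruning in `endUpdate` addresses risk 1) without relying on the delete log.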
> Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-7093
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7093
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.12.0, Impala 2.13.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> I'm doing some stress testing of Impala 2.13 (recent snapshot build) and hit
> the following sequence:
> {code}
> {"query": "SHOW TABLES in consistency_test", "type": "call", "id": 3}
> {"type": "response", "id": 3, "results": [["t1"]]}
> {"query": "INVALIDATE METADATA", "type": "call", "id": 7}
> {"type": "response", "id": 7}
> {"query": "DESCRIBE consistency_test.t1", "type": "call", "id": 9}
> {"type": "response", "id": 9, "error": "AnalysisException: Could not resolve path: 'consistency_test.t1'\n"}
> {code}
> i.e. 'SHOW TABLES' shows that a table exists, but then shortly after an
> INVALIDATE METADATA, an attempt to describe a table indicates that the table
> does not exist. This is a single-threaded test case against a single impalad.
> I also saw a similar behavior that issuing queries to an impalad shortly
> after a catalogd restart could transiently show tables not existing that in
> fact exist.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868033#comment-16868033 ]

bharath v commented on IMPALA-7093:
---

I think the root cause of the problem is that, unlike other DDLs, the result of "invalidate metadata" propagates asynchronously: it is not applied directly on the coordinator and may span multiple statestore updates. Roughly, it goes as follows.

1. Coordinator calls the Catalog Server with execResetMetadata().
2. Catalog server calls reset() and returns the latest version number of the Catalog server after reset()ing.
3. Coordinator waits in WaitForMinCatalogUpdate() until the update version in step 2 is applied locally.

Between (2) and (3) the statestore broadcast loop runs, propagating the changes made in reset(). Depending on when the statestore update propagation is scheduled, the "target version" of the update is already decided, which means that the subsequent update may or may not propagate the entire delta of reset() (which is potentially why the loop in step (3) can span multiple update intervals). If a partial update is applied and new queries are fired, we might see various weird errors like the ones mentioned above. Thoughts?

[~robbiezhang] I'm not super convinced about the fix. I'm wondering if there are any side effects of mixing older versions with newer top-level objects. I see your point that the older objects are removed eventually by subsequent updates, but I'm not convinced that it doesn't break anything else.

I'm also wondering if we should make "invalidate metadata" like any other DDL, basically by propagating the result as the DDL output and applying it right away. (The footprint could be bigger than usual: a bunch of Dbs, functions, privileges, and a ton of IncompleteTables.)
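The three steps above can be illustrated with a small sketch of the version wait. This is not Impala's WaitForMinCatalogUpdate() implementation: `MinCatalogVersionSketch` and `updatesUntilVersion` are hypothetical names, and each statestore update is reduced to the "target version" that was fixed when it was scheduled. All version numbers are made up for the illustration.

```java
class MinCatalogVersionSketch {
  /**
   * Applies statestore updates in order and returns how many were needed
   * before the local catalog version reached minVersion, or -1 if the given
   * updates never reach it.
   */
  static int updatesUntilVersion(long localVersion, long[] updateTargetVersions,
      long minVersion) {
    int applied = 0;
    for (long target : updateTargetVersions) {
      if (localVersion >= minVersion) break;
      // An update only carries changes up to its already decided target version.
      localVersion = Math.max(localVersion, target);
      applied++;
    }
    return localVersion >= minVersion ? applied : -1;
  }

  public static void main(String[] args) {
    // reset() returned version 110, but the first broadcast was scheduled
    // while reset() was still in progress, with target version 105.
    System.out.println(updatesUntilVersion(100L, new long[] {105L, 110L}, 110L)); // prints 2
    // If the broadcast had been scheduled after reset() finished, one update suffices.
    System.out.println(updatesUntilVersion(100L, new long[] {110L}, 110L)); // prints 1
  }
}
```

In the first case the coordinator sits at version 105 for one update interval, which corresponds to the partially applied state in which queries can fail.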
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867278#comment-16867278 ]

Robbie Zhang commented on IMPALA-7093:
--

I added a parameter --running-invalidate-metadata to tests/stress/concurrent_select.py that starts a thread which keeps running 'invalidate metadata'. By running concurrent_select.py with "--running-invalidate-metadata=true", I can reproduce this issue easily:
{code:java}
# tests/stress/concurrent_select.py --minicluster-num-impalads 3 --max-queries=1000 --startup-queries-per-second=10 --tpch-db=tpch_parquet --running-invalidate-metadata=true
Cluster Impalad Version Info:
localhost: impalad version 3.3.0-SNAPSHOT DEBUG (build ab5ee0b7857c6ad19f244dc308210f2809436684)
Built on Tue Jun 18 05:04:27 PDT 2019
localhost: impalad version 3.3.0-SNAPSHOT DEBUG (build ab5ee0b7857c6ad19f244dc308210f2809436684)
Built on Tue Jun 18 05:04:27 PDT 2019
localhost: impalad version 3.3.0-SNAPSHOT DEBUG (build ab5ee0b7857c6ad19f244dc308210f2809436684)
Built on Tue Jun 18 05:04:27 PDT 2019
2019-06-18 06:24:39,494 27754 Thread-8 INFO:cluster[705]:Finding impalad binary location
2019-06-18 06:24:39,494 27754 Thread-7 INFO:cluster[705]:Finding impalad binary location
2019-06-18 06:24:39,494 27754 Thread-9 INFO:cluster[705]:Finding impalad binary location
2019-06-18 06:24:39,843 27754 MainThread INFO:queries[115]:Loading tpch queries
2019-06-18 06:24:39,843 27754 MainThread INFO:test_file_parser[336]:Loading tpch queries
Using 25 queries
2019-06-18 06:24:39,865 27754 MainThread INFO:concurrent_select[1508]:Number of queries in the list: 25
Done | Active | Executing | Mem Lmt Ex | AC Reject | AC Timeout | Cancel | Err | Incorrect | Next Qry Mem Lmt | Tot Qry Mem Lmt | Tracked Mem | RSS Mem
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
5 | 43 | 28 | 0 | 0 | 0 | 3 | 0 | 0 | 92 | 6802 | 617 | 1964
8 | 75 | 39 | 0 | 0 | 0 | 5 | 0 | 0 | 155 | 12286 | 2693 | 2521
Process Process-49:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "tests/stress/concurrent_select.py", line 663, in _start_single_runner
    mesg=error_msg))
Exception: Query tpch_parquet_TPCH-Q16 ID None failed:
AnalysisException: Could not resolve table reference: 'part'
Aborting due to 1 successive errors encountered
{code}
After I changed ImpaladCatalog.java, concurrent_select.py can execute 1 queries without exception:
{code:java}
9998 | 2 | 2 | 0 | 0 | 0 | 1018 | 0 | 0 | 420 | 602 | |
Query runner (16866) exited with exit code 0
Query runner (15888) exited with exit code 0
1 | 0 | 0 | 0 | 0 | 0 | 1018 | 0 | 0 | 420 | 0 | |
2019-06-18 23:11:38,444 15605 MainThread INFO:concurrent_select[844]:Test Duration: 12071 seconds
{code}
I also started another test in which concurrent_select.py starts 10 queries on another cluster. It's still in progress. Nearly 6 queries have been executed without exception so far.
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862718#comment-16862718 ]

Robbie Zhang commented on IMPALA-7093:
--

Thank you, [~tarmstrong]!

I found that the problem is in [ImpaladCatalog.updateCatalog()|https://github.com/apache/impala/blob/ab908d54c22861967f693428ec7d9f6d7008607f/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java#L191]. This function always adds top-level catalog objects first. When we run 'invalidate metadata', it adds the database objects and then the table/view/function objects. But at that time the new database objects are empty, with no table/view/function objects in them. After [ImpaladCatalog.addDb()|https://github.com/apache/impala/blob/ab908d54c22861967f693428ec7d9f6d7008607f/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java#L391] replaces the existing database objects with the new database objects, the existing table/view/function objects are lost until ImpaladCatalog.updateCatalog() adds these objects back. If the impalad compiles a query while the table/view/function objects are missing, the query will fail with an AnalysisException. The error message varies by query type. For example, for 'desc table' we see 'Could not resolve path', for 'select * from table' we see 'Could not resolve table reference', for 'insert into table' we see 'Table does not exist', etc.

I can reproduce this issue by running two scripts. The first script keeps running 'invalidate metadata':
{code:java}
#!/bin/bash
while [ 1 ]
do
  shell/impala-shell -q "invalidate metadata"
done
{code}
After I start the first script, I run the second script, which keeps running a query and exits as soon as a run fails:
{code:java}
#!/bin/bash
while [ 1 ]
do
  #shell/impala-shell -q "desc test" 2>&1| tee test.output
  #shell/impala-shell -q "select * from test" 2>&1| tee test.output
  shell/impala-shell -q "insert overwrite test(i) values(1)" 2>&1| tee test.output
  n=`egrep "Fetched |Modified " test.output | wc -l`
  if [ $n -lt 1 ]; then
    exit
  fi
done
{code}
The more table/view/function objects there are, the longer the objects stay missing, and the more easily the second script hits the AnalysisException. I created thousands of tables on my cluster. Sometimes the second script hits the AnalysisException within a couple of minutes, while sometimes it takes nearly half an hour. Anyway, it's repeatable.

I changed ImpaladCatalog.java as follows. So far, I haven't seen the AnalysisException again. It seems the issue is gone.
{code:java}
diff --git a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
index 13cb620..23a7d68 100644
--- a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
@@ -20,6 +20,8 @@ package org.apache.impala.catalog;
 import java.nio.ByteBuffer;
 import java.util.ArrayDeque;
 import java.util.Set;
+import java.util.Map;
+import java.util.List;
 import java.util.concurrent.atomic.AtomicLong;
 import java.util.concurrent.atomic.AtomicReference;
 
@@ -388,6 +390,19 @@ public class ImpaladCatalog extends Catalog implements FeCatalog {
         existingDb.getCatalogVersion() < catalogVersion) {
       Db newDb = Db.fromTDatabase(thriftDb);
       newDb.setCatalogVersion(catalogVersion);
+      if (existingDb != null) {
+        // Migrate all existing table/view/function to newDb. Otherwise they
+        // will disappear temporarily.
+        for (Table tbl: existingDb.getTables()) {
+          newDb.addTable(tbl);
+        }
+        Map<String, List<Function>> functions = existingDb.getAllFunctions();
+        for (List<Function> fns: existingDb.getAllFunctions().values()) {
+          for (Function f: fns) {
+            newDb.addFunction(f);
+          }
+        }
+      }
       addDb(newDb);
       if (existingDb != null) {
         CatalogObjectVersionSet.INSTANCE.updateVersions(
{code}
Adding a lock to Catalog is another solution, but the change would be more complex.

In my change, one possible problem is that if the new database object has fewer table/view/function objects than the existing database object, the deleted objects might be left in the catalog forever. According to my test, the deleted objects should be in sequencer.getDeletedObjects() and will be removed by [ImpaladCatalog.removeCatalogObject()|https://github.com/apache/impala/blob/ab908d54c22861967f693428ec7d9f6d7008607f/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java#L229]. So I think my change is fine. Please correct me if I'm wrong.
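The window described in this comment can be shown with a stripped-down model. The sketch below is not Impala code: `DbSwapSketch`, `Db`, `Cache` and their methods are hypothetical stand-ins, tables are reduced to a name and a version, and the sketch is single-threaded, so it only exposes the intermediate state that a concurrently compiling query could observe.

```java
import java.util.HashMap;
import java.util.Map;

class DbSwapSketch {
  /** Hypothetical stand-in for a catalog Db: a name plus table name -> version. */
  static final class Db {
    final String name;
    final Map<String, Long> tables = new HashMap<>();
    Db(String name) { this.name = name; }
  }

  /** Hypothetical stand-in for the impalad-side catalog cache. */
  static final class Cache {
    private final Map<String, Db> dbs = new HashMap<>();

    /** Unpatched behavior: the incoming Db replaces the old one and starts out empty. */
    void replaceDb(Db newDb) {
      dbs.put(newDb.name, newDb);
    }

    /** Patched behavior: carry the existing tables over before swapping. */
    void replaceDbKeepingTables(Db newDb) {
      Db existing = dbs.get(newDb.name);
      if (existing != null) newDb.tables.putAll(existing.tables);
      dbs.put(newDb.name, newDb);
    }

    /** Adds or replaces a table; the Db must already be in the cache. */
    void addTable(String db, String table, long version) {
      dbs.get(db).tables.put(table, version);
    }

    boolean tableVisible(String db, String table) {
      Db d = dbs.get(db);
      return d != null && d.tables.containsKey(table);
    }
  }

  public static void main(String[] args) {
    Cache unpatched = new Cache();
    unpatched.replaceDb(new Db("consistency_test"));
    unpatched.addTable("consistency_test", "t1", 10L);
    // The top-level Db object arrives first and replaces the old one.
    unpatched.replaceDb(new Db("consistency_test"));
    // A query compiled at this point cannot resolve the table.
    System.out.println(unpatched.tableVisible("consistency_test", "t1")); // prints false
    // Later in the same update the table object is added back.
    unpatched.addTable("consistency_test", "t1", 20L);
    System.out.println(unpatched.tableVisible("consistency_test", "t1")); // prints true

    Cache patched = new Cache();
    patched.replaceDb(new Db("consistency_test"));
    patched.addTable("consistency_test", "t1", 10L);
    patched.replaceDbKeepingTables(new Db("consistency_test"));
    System.out.println(patched.tableVisible("consistency_test", "t1")); // prints true
  }
}
```

With the carry-over, the old table object stays resolvable until the newer one replaces it, which is the same trade-off the comment raises: stale or deleted objects remain visible until a later step removes them.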
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862590#comment-16862590 ]

Tim Armstrong commented on IMPALA-7093:
---

[~robbie] had a theory that this had the same root cause as IMPALA-5087
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495808#comment-16495808 ]

Todd Lipcon commented on IMPALA-7093:
-

I've now managed to reproduce on trunk as well as some earlier versions (2.12, 2.11). I've also reproed with -load_catalog_in_background set to both false and true. So, I don't think this is a regression after all.
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495543#comment-16495543 ]

Todd Lipcon commented on IMPALA-7093:
-

Been digging a bit more. This seems to affect one cluster which has 'load_catalog_in_background' set, but can't seem to repro on another cluster which does not.
[jira] [Commented] (IMPALA-7093) Tables briefly appear to not exist after INVALIDATE METADATA or catalog restart
[ https://issues.apache.org/jira/browse/IMPALA-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494542#comment-16494542 ]

Todd Lipcon commented on IMPALA-7093:
-

Testing on Impala 2.10 I haven't been able to reproduce this, so appears it might be a regression, though I'll continue to attempt it.