[ 
https://issues.apache.org/jira/browse/IMPALA-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816281#comment-17816281
 ] 

ASF subversion and git services commented on IMPALA-12788:
----------------------------------------------------------

Commit 11d2fe4fc00a1e6ef2d3a45825be9845456adc1d in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=11d2fe4fc ]

IMPALA-12788: Fix HBaseTable still get loaded even if HBase is down

When loading a table backed by HBase, HBaseTable.load() and
loadFromThrift() are intended to check whether the table exists in
HBase. However, the current check just gets the table object and then
closes it, which doesn't fail even if HBase is down. See the JIRA
description for the stacktrace.

This patch fixes the check to fetch the column family names, which is
a lightweight request that fails if HBase is down or the table doesn't
exist in HBase.
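
Below is a rough, illustrative sketch of the difference between the two
kinds of checks against the plain HBase client API. It is not the actual
Impala code; the table name, class name, and setup are assumptions.

{code:java}
// Illustrative sketch only, not the Impala patch. The old-style check merely
// obtained a Table handle and closed it; the new-style check fetches the
// column family names, which forces an RPC and therefore fails if HBase is
// unreachable or the table is missing. Table name and setup are assumptions.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseTableCheck {
  public static void main(String[] args) throws Exception {
    TableName name = TableName.valueOf("functional_hbase.alltypes");
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(name)) {
      // Old check (roughly): obtaining and closing the Table handle. This is
      // purely client-side and succeeds even when HBase is down.
      //
      // New check: fetch the column family names. This issues an RPC, so it
      // throws if HBase is down or the table doesn't exist.
      int families = table.getDescriptor().getColumnFamilyNames().size();
      System.out.println("HBase table reachable, column families: " + families);
    }
  }
}
{code}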

Splits the following tests so the HBase part is skipped when running on S3:
 - TestNestedStructsInSelectList.test_struct_in_select_list
 - TestDdlStatements.test_alter_set_column_stats
 - TestShowCreateTable.test_show_create_table

Tests:
 - Run CORE tests on S3

Change-Id: Ib497f11ecc338d0f84d3d7bd8ccfcf8da4def0cb
Reviewed-on: http://gerrit.cloudera.org:8080/21003
Reviewed-by: Quanlong Huang <huangquanl...@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> HBaseTable still get loaded even if HBase is down
> -------------------------------------------------
>
>                 Key: IMPALA-12788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12788
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> This was identified by an internal S3 build that doesn't launch HBase. Some 
> tests still run queries on HBase tables, e.g. 
> TestDdlStatements::test_alter_set_column_stats, but they don't fail even if 
> the table can't be loaded correctly. Catalogd logs show that the connection 
> failure to HBase is ignored:
> {noformat}
> I0203 14:12:33.687620 20673 TableLoadingMgr.java:71] Loading metadata for table: functional_hbase.alltypes
> I0203 14:12:33.687674 24282 TableLoader.java:76] Loading metadata for: functional_hbase.alltypes (background load)
> I0203 14:12:33.687706 20673 TableLoadingMgr.java:73] Remaining items in queue: 0. Loads in progress: 1
> I0203 14:12:33.690941 26564 JniCatalog.java:257] execDdl request: DROP_DATABASE test_compute_stats_9c95c5d8 issued by jenkins
> I0203 14:12:33.691668 24282 Table.java:218] createEventId_ for table: functional_hbase.alltypes set to: -1
> ......
> W0203 14:13:06.941573  1978 ReadOnlyZKClient.java:193] 0x65bc7c50 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 30, give up
> W0203 14:13:06.947460 24282 ConnectionImplementation.java:641] Retrieve cluster id failed
> Java exception follows:
> java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>         at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
>         at org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
>         at org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
>         at org.apache.impala.catalog.HBaseTable.load(HBaseTable.java:112)
>         at org.apache.impala.catalog.TableLoader.load(TableLoader.java:144)
>         at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
>         at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>         at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
>         at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
>         ... 1 more
> I0203 14:13:07.058998 24282 TableLoader.java:175] Loaded metadata for: functional_hbase.alltypes (33371ms)
> I0203 14:13:07.866829 21368 catalog-server.cc:403] A catalog update with 9 entries is assembled. Catalog version: 6192 Last sent catalog version: 6181
> I0203 14:13:07.870369 21344 catalog-server.cc:816] Collected update: 1:TABLE:functional_hbase.alltypes, version=6193, original size=3855, compressed size=1471
> I0203 14:13:07.872047 21344 catalog-server.cc:816] Collected update: 1:CATALOG_SERVICE_ID, version=6193, original size=60, compressed size=58
> {noformat}
> This is problematic since impalad thinks the table is correctly loaded and 
> will try to load it again when applying the catalog update, which can block 
> the statestore subscriber thread for a long time and cause other DDL queries 
> to be blocked as well, since they can't acquire the catalog update lock. 
> We've seen TestAsyncLoadData.test_async_load time out on S3 (IMPALA-11285), 
> and this is the cause.
> Here are logs showing impalad blocked while applying the catalog update of 
> the HBase table:
> {noformat}
> I0203 14:13:09.359010  3636 Frontend.java:1917] db4f57572baab787:ebdb853600000000] Analyzing query: load data inpath '/test-warehouse/test_load_staging_beeswax_True'           into table test_async_load_898a2f19.test_load_nopart_beeswax_True db: functional
> ...
> I0203 14:13:42.188225  4881 ClientCnxn.java:1246] Socket error occurred: localhost/0:0:0:0:0:0:0:1:2181: Connection refused
> W0203 14:13:42.288529  4880 ReadOnlyZKClient.java:189] 0x43325be0 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 29
> I0203 14:13:43.288617  4881 ClientCnxn.java:1111] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> I0203 14:13:43.288892  4881 ClientCnxn.java:1246] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
> W0203 14:13:43.389173  4880 ReadOnlyZKClient.java:189] 0x43325be0 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 30
> I0203 14:13:44.389231  4881 ClientCnxn.java:1111] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> I0203 14:13:44.389554  4881 ClientCnxn.java:1246] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
> W0203 14:13:44.489856  4880 ReadOnlyZKClient.java:193] 0x43325be0 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 30, give up
> W0203 14:13:44.500921 22023 ConnectionImplementation.java:641] Retrieve cluster id failed
> Java exception follows:
> java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>         at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
>         at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
>         at org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
>         at org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
>         at org.apache.impala.catalog.HBaseTable.loadFromThrift(HBaseTable.java:139)
>         at org.apache.impala.catalog.Table.fromThrift(Table.java:538)
>         at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:474)
>         at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:329)
>         at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:258)
>         at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
>         at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:513)
>         at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:185)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>         at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
>         at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
>         at java.lang.Thread.run(Thread.java:748)
> I0203 14:13:44.585079 22023 impala-server.cc:2060] Catalog topic update applied with version: 6193 new min catalog object version: 2
> ... // After this the table test_load_nopart_beeswax_true from LoadData statement can be added
> I0203 14:13:44.586282  4723 ImpaladCatalog.java:228] db4f57572baab787:ebdb853600000000] Adding: TABLE:test_async_load_898a2f19.test_load_nopart_beeswax_true version: 6207 size: 3866
> {noformat}
> The bug is that loading an HBase table should fail when catalogd can't 
> connect to HBase, but currently it doesn't.


