[ 
https://issues.apache.org/jira/browse/IMPALA-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907704#comment-16907704
 ] 

ASF subversion and git services commented on IMPALA-8856:
---------------------------------------------------------

Commit d5d3ace6c27123f788dc292a651f1df0c8620686 in impala's branch 
refs/heads/master from Hao Hao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d5d3ace ]

IMPALA-8856: Deflake TestKuduHMSIntegration

Tests for Kudu's integration with the Hive Metastore can be flaky.
Since Kudu depends on the HMS, create/drop table requests can time out
when creation/deletion in the HMS takes more than 10s, and the
subsequent retry then fails because the table has already been
created/deleted by the first request.

This patch increases the timeout of individual Kudu client RPCs to
avoid flakiness caused by such cases.

Change-Id: Ib98f34bb831b9255e35b5ef234abe6ceaf261bfd
Reviewed-on: http://gerrit.cloudera.org:8080/14067
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
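
For reference, a minimal sketch of how such a timeout can be raised through the
Python kudu client that the test uses; the parameter names and the values below
are illustrative and not necessarily those chosen in the patch:

{noformat}
import kudu

# Hypothetical timeout values: give admin operations (create/drop table) and
# individual RPCs a generous deadline so that a slow HMS round trip does not
# hit the default deadline and trigger a client-side retry.
client = kudu.connect(
    host='localhost',
    port=7051,
    admin_timeout_ms=5 * 60 * 1000,
    rpc_timeout_ms=5 * 60 * 1000)

# With the larger deadlines, a drop that takes more than 10s in the HMS can
# complete on the first attempt instead of timing out, being retried, and then
# failing with "the table does not exist".
client.delete_table('test_db.example_table')
{noformat}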


> TestKuduHMSIntegration.test_drop_managed_kudu_table failed with "the table 
> does not exist"
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8856
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8856
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Hao Hao
>            Priority: Blocker
>              Labels: flaky
>         Attachments: 
> catalogd.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144934.5595,
>  hive-metastore.log, 
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144935.5670,
>  
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144935.5675,
>  
> kudu-master.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144746.28998,
>  
> kudu-tserver.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.diagnostics.20190812-144746.0.28965,
>  
> kudu-tserver.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144746.28981,
>  
> kudu-tserver.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144746.29022
>
>
> {noformat}
> custom_cluster.test_kudu.TestKuduHMSIntegration.test_drop_managed_kudu_table 
> (from pytest)
> Failing for the past 1 build (Since Failed#31 )
> Took 27 sec.
> Error Message
> KuduNotFound: the table does not exist: table_name: 
> "test_drop_managed_kudu_table_a82c250c.foo"
> Stacktrace
> custom_cluster/test_kudu.py:256: in test_drop_managed_kudu_table
>     kudu_client.delete_table(kudu_tbl_name)
> kudu/client.pyx:392: in kudu.client.Client.delete_table (kudu/client.cpp:7106)
>     ???
> kudu/errors.pyx:56: in kudu.errors.check_status (kudu/errors.cpp:904)
>     ???
> E   KuduNotFound: the table does not exist: table_name: 
> "test_drop_managed_kudu_table_a82c250c.foo"
> Standard Error
> -- 2019-08-12 14:49:32,854 INFO     MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50     
> --statestore_priority_update_frequency_ms=50     
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 --impalad_args=--default_query_options=
> 14:49:33 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 14:49:33 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 14:49:34 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 14:49:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 14:49:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 14:49:35 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 14:49:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 14:49:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 14:49:38 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
> 14:49:38 MainThread: Debug webpage not yet available: ('Connection aborted.', 
> error(111, 'Connection refused'))
> 14:49:40 MainThread: Debug webpage did not become available in expected time.
> 14:49:40 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 14:49:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 14:49:41 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
> 14:49:41 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 14:49:42 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 14:49:42 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
> 14:49:42 MainThread: num_known_live_backends has reached value: 3
> 14:49:43 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 14:49:43 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25001
> 14:49:43 MainThread: num_known_live_backends has reached value: 3
> 14:49:43 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 14:49:43 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25002
> 14:49:43 MainThread: num_known_live_backends has reached value: 3
> 14:49:44 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2019-08-12 14:49:44,203 DEBUG    MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> -- 2019-08-12 14:49:44,203 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25010
> -- 2019-08-12 14:49:44,204 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com
> -- 2019-08-12 14:49:44,206 INFO     MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> -- 2019-08-12 14:49:44,206 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
> -- 2019-08-12 14:49:44,206 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com
> -- 2019-08-12 14:49:44,207 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2019-08-12 14:49:44,207 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25001
> -- 2019-08-12 14:49:44,208 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com
> -- 2019-08-12 14:49:44,209 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2019-08-12 14:49:44,209 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25002
> -- 2019-08-12 14:49:44,210 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com
> -- 2019-08-12 14:49:44,211 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> SET 
> client_identifier=custom_cluster/test_kudu.py::TestKuduHMSIntegration::()::test_drop_managed_kudu_table;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2019-08-12 14:49:44,348 INFO     MainThread: Closing active operation
> -- 2019-08-12 14:49:44,349 ERROR    MainThread: Failed to open transport 
> (tries_left=3)
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/hiveserver2.py",
>  line 961, in _execute
>     return func(request)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py",
>  line 176, in OpenSession
>     return self.recv_OpenSession()
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/infra/python/env/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py",
>  line 188, in recv_OpenSession
>     (fname, mtype, rseqid) = iprot.readMessageBegin()
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/Impala-Toolchain/thrift-0.9.3-p7/python/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
>  line 126, in readMessageBegin
>     sz = self.readI32()
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/Impala-Toolchain/thrift-0.9.3-p7/python/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
>  line 206, in readI32
>     buff = self.trans.readAll(4)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/Impala-Toolchain/thrift-0.9.3-p7/python/lib64/python2.7/site-packages/thrift/transport/TTransport.py",
>  line 58, in readAll
>     chunk = self.read(sz - have)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/Impala-Toolchain/thrift-0.9.3-p7/python/lib64/python2.7/site-packages/thrift/transport/TTransport.py",
>  line 159, in read
>     self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size)))
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/Impala-Toolchain/thrift-0.9.3-p7/python/lib64/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 120, in read
>     message='TSocket read 0 bytes')
> TTransportException: TSocket read 0 bytes
> -- 2019-08-12 14:49:44,350 INFO     MainThread: Using database 
> testkuduhmsintegration_12761_bypi6f as default
> W0812 14:49:44.382771 12761 client-internal.cc:655] Specified master server 
> address 'localhost:7051' resolved to multiple IPs. Using 127.0.0.1:7051
> SET 
> client_identifier=custom_cluster/test_kudu.py::TestKuduHMSIntegration::()::test_drop_managed_kudu_table;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_drop_managed_kudu_table_a82c250c` CASCADE;
> -- 2019-08-12 14:49:44,497 INFO     MainThread: Started query 
> a4467ba0d476591c:e3d6328a00000000
> SET 
> client_identifier=custom_cluster/test_kudu.py::TestKuduHMSIntegration::()::test_drop_managed_kudu_table;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_drop_managed_kudu_table_a82c250c`;
> -- 2019-08-12 14:49:44,882 INFO     MainThread: Started query 
> 96443159258ca73f:52af6c0e00000000
> -- 2019-08-12 14:49:44,884 INFO     MainThread: Created database 
> "test_drop_managed_kudu_table_a82c250c" for test ID 
> "custom_cluster/test_kudu.py::TestKuduHMSIntegration::()::test_drop_managed_kudu_table"
> -- 2019-08-12 14:49:46,129 INFO     MainThread: Starting new HTTP connection 
> (1): localhost
> W0812 14:49:56.135144  6748 master_proxy_rpc.cc:192] Re-attempting 
> DeleteTable request to leader Master (localhost:7051)
> {noformat}
> It looks like there is some kind of race here, since there is an existence 
> check for the table before deleting it:
> {noformat}
>     assert kudu_client.table_exists(kudu_tbl_name)
>     kudu_client.delete_table(kudu_tbl_name)
> {noformat}
> I attached all the logs that included the table name.
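> A possible sequence that would explain this, sketched with hypothetical
> timings:
> {noformat}
>     #  t=0s    client sends DeleteTable; the master forwards the drop to the HMS
>     #  t=10s   the HMS is still working, so the client RPC hits its deadline
>     #          and the request is retried
>     #  t=12s   the first request finally completes and the table is gone
>     #  t=12s+  the retried DeleteTable finds nothing to drop, and the client
>     #          surfaces KuduNotFound even though table_exists() was true earlier
>     assert kudu_client.table_exists(kudu_tbl_name)
>     kudu_client.delete_table(kudu_tbl_name)   # raises KuduNotFound on the retry
> {noformat}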



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
