[jira] [Commented] (IMPALA-10497) test_no_fd_caching_on_cached_data failing

2021-02-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285806#comment-17285806
 ] 

ASF subversion and git services commented on IMPALA-10497:
--

Commit 490aff51b9e3289f2225d3918734821cab7f28c2 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=490aff5 ]

IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data.

test_no_fd_caching_on_cached_data has been flaky because not all of
the data is fully cached in the warm-up phase. There is a limit on
concurrency in writing to the cache, so we may fail to cache data
the first time we read it. This patch fixes the test by repeating the
warm-up query 5 times. This patch also adds proper start_args to the
test so that each impalad writes its data cache file in its own
directory.

Testing:
- Looped the test manually 100 times and saw no more failures.

Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Reviewed-on: http://gerrit.cloudera.org:8080/17054
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
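
The effect of repeating the warm-up query can be illustrated with a toy
model. This is only a sketch, not Impala code: ToyDataCache, its
max_inserts_per_pass cap, and warm_up are hypothetical names, and the cap is
a simplified stand-in for the write-concurrency limit described in the commit
message. Only the repeat count of 5 comes from the commit itself.

```python
class ToyDataCache:
    """Hypothetical cache that accepts only a few inserts per pass."""

    def __init__(self, max_inserts_per_pass):
        self.max_inserts_per_pass = max_inserts_per_pass
        self.entries = set()
        self._inserts_this_pass = 0

    def start_pass(self):
        # A new warm-up query starts; the insert budget is reset.
        self._inserts_this_pass = 0

    def read(self, key):
        """Return True on a hit. On a miss, insert only if budget remains."""
        if key in self.entries:
            return True
        if self._inserts_this_pass < self.max_inserts_per_pass:
            self.entries.add(key)
            self._inserts_this_pass += 1
        return False


def warm_up(cache, keys, repeats):
    """Read every key `repeats` times; return the misses seen in the last pass."""
    misses = 0
    for _ in range(repeats):
        cache.start_pass()
        misses = sum(0 if cache.read(key) else 1 for key in keys)
    return misses


keys = list(range(10))

single = ToyDataCache(max_inserts_per_pass=3)
warm_up(single, keys, repeats=1)
print(len(single.entries))   # 3 of 10 keys cached after one pass

repeated = ToyDataCache(max_inserts_per_pass=3)
print(warm_up(repeated, keys, repeats=5))   # 0 misses in the fifth pass
```

In this model a single pass leaves most keys uncached, while later passes
fill in whatever earlier passes skipped. A fixed repeat count only makes a
fully warm cache very likely; it does not guarantee one.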


> test_no_fd_caching_on_cached_data failing
> -
>
>                 Key: IMPALA-10497
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10497
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Bikramjeet Vig
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: broken-build
>
> {noformat}
> Error Message
> assert 1 == 0  +  where 1 =  >()  +
> where  > = 
>  0x7f22dfe5aa10>.cached_handles
> Stacktrace
> custom_cluster/test_hdfs_fd_caching.py:202: in 
> test_no_fd_caching_on_cached_data
> assert self.cached_handles() == 0
> E   assert 1 == 0
> E+  where 1 =  >()
> E+where  > = 
>  0x7f22dfe5aa10>.cached_handles
> Standard Error
> -- 2021-02-08 06:40:41,413 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--max_cached_file_handles=16 
> --unused_file_handle_timeout_sec=5 --data_cache=/tmp:500MB 
> --always_use_data_cache=true ' '--state_store_args=None ' 
> '--catalogd_args=--load_catalog_in_background=false ' 
> --impalad_args=--default_query_options=
> 06:40:42 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 06:40:42 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 06:40:42 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 06:40:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:45 MainThread: Debug webpage not yet available: ('Connection aborted.', 
> error(111, 'Connection refused'))
> 06:40:47 MainThread: Debug webpage did not become available in expected time.
> 06:40:47 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 06:40:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:48 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 06:40:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:49 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:49 MainThread: num_known_live_backends has reached value: 3
> 06:40:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:49 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25001
> 06:

[jira] [Commented] (IMPALA-10497) test_no_fd_caching_on_cached_data failing

2021-02-10 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282556#comment-17282556
 ] 

Riza Suminto commented on IMPALA-10497:
---

The test is flaky because the data cache is not fully warmed up. Not all of 
the data being read is in the data cache, so a new HDFS fd is opened and 
cached.

I submitted a patch to Gerrit to fix this.
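
To make the failure mode concrete, here is a minimal sketch of the read path
as described above. It is a toy model under stated assumptions, not Impala's
implementation: ToyScanner, read, expire_handles, and the store_in_cache
switch are all hypothetical names, and the switch stands in for a cache
insert that gets skipped during warm-up.

```python
class ToyScanner:
    """Toy model: a data-cache hit needs no HDFS fd; a miss opens and caches one."""

    def __init__(self):
        self.data_cache = set()       # (path, offset) ranges held in the data cache
        self.cached_handles = set()   # paths with a cached HDFS file handle

    def read(self, path, offset, store_in_cache=True):
        key = (path, offset)
        if key in self.data_cache:
            return "data-cache"       # served locally, no fd involved
        # Miss: the read goes to HDFS, which opens and caches a file handle.
        self.cached_handles.add(path)
        if store_in_cache:
            self.data_cache.add(key)
        return "hdfs"

    def expire_handles(self):
        # Stand-in for unused handles timing out between test phases.
        self.cached_handles.clear()


scanner = ToyScanner()

# Warm-up: the second range is read, but its cache insert is skipped.
scanner.read("part-0", 0)
scanner.read("part-0", 1, store_in_cache=False)
scanner.expire_handles()

# Second run: one range misses the data cache, so one handle is cached again.
scanner.read("part-0", 0)
scanner.read("part-0", 1)
print(len(scanner.cached_handles))   # 1
```

The final count of 1 corresponds to the "assert 1 == 0" failure in the quoted
report: a single range missing from the data cache is enough to leave one
cached handle behind.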

> test_no_fd_caching_on_cached_data failing
> -
>
>                 Key: IMPALA-10497
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10497
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Bikramjeet Vig
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: broken-build
>
> {noformat}
> Error Message
> assert 1 == 0  +  where 1 =  >()  +
> where  > = 
>  0x7f22dfe5aa10>.cached_handles
> Stacktrace
> custom_cluster/test_hdfs_fd_caching.py:202: in 
> test_no_fd_caching_on_cached_data
> assert self.cached_handles() == 0
> E   assert 1 == 0
> E+  where 1 =  >()
> E+where  > = 
>  0x7f22dfe5aa10>.cached_handles
> Standard Error
> -- 2021-02-08 06:40:41,413 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--max_cached_file_handles=16 
> --unused_file_handle_timeout_sec=5 --data_cache=/tmp:500MB 
> --always_use_data_cache=true ' '--state_store_args=None ' 
> '--catalogd_args=--load_catalog_in_background=false ' 
> --impalad_args=--default_query_options=
> 06:40:42 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 06:40:42 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 06:40:42 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 06:40:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:45 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:45 MainThread: Debug webpage not yet available: ('Connection aborted.', 
> error(111, 'Connection refused'))
> 06:40:47 MainThread: Debug webpage did not become available in expected time.
> 06:40:47 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 06:40:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:48 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 06:40:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:49 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:49 MainThread: num_known_live_backends has reached value: 3
> 06:40:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:49 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25001
> 06:40:49 MainThread: num_known_live_backends has reached value: 3
> 06:40:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:50 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25002
> 06:40:50 MainThread: num_known_live_backends has reached value: 3
> 06:40:50 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2021-02-08 06:40:51,049 DEBUG MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> -- 2021-02-08 06:40:51,049 INFO MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25010
> -- 2021-02-08 06:40:51,050 IN

[jira] [Commented] (IMPALA-10497) test_no_fd_caching_on_cached_data failing

2021-02-09 Thread Riza Suminto (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282208#comment-17282208
 ] 

Riza Suminto commented on IMPALA-10497:
---

I was able to reproduce this test failure after several retries. I'm still not 
sure what the root cause of this flakiness is, though.
I'll try increasing the sleep time to see if the flakiness goes away.
