[jira] [Commented] (IMPALA-10497) test_no_fd_caching_on_cached_data failing
[ https://issues.apache.org/jira/browse/IMPALA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285806#comment-17285806 ]

ASF subversion and git services commented on IMPALA-10497:
-----------------------------------------------------------

Commit 490aff51b9e3289f2225d3918734821cab7f28c2 in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=490aff5 ]

IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data.

test_no_fd_caching_on_cached_data has been flaky because not all of the data was fully cached in the warm-up phase. There is a limit on concurrency when writing to the cache, so we may fail to cache data the first time we read it. This patch fixes the test by repeating the warm-up query 5 times.

This patch also adds proper start_args to the test so that each impalad writes its data cache file in its own directory.

Testing:
- Loop the test manually 100 times and see no more failures.

Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952
Reviewed-on: http://gerrit.cloudera.org:8080/17054
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> test_no_fd_caching_on_cached_data failing
> -----------------------------------------
>
> Key: IMPALA-10497
> URL: https://issues.apache.org/jira/browse/IMPALA-10497
> Project: IMPALA
> Issue Type: Bug
> Reporter: Bikramjeet Vig
> Assignee: Riza Suminto
> Priority: Major
> Labels: broken-build
>
> {noformat}
> Error Message
> assert 1 == 0 + where 1 = >() + where > = > 0x7f22dfe5aa10>.cached_handles
> Stacktrace
> custom_cluster/test_hdfs_fd_caching.py:202: in test_no_fd_caching_on_cached_data
>     assert self.cached_handles() == 0
> E   assert 1 == 0
> E   + where 1 = >()
> E   + where > = > 0x7f22dfe5aa10>.cached_handles
> Standard Error
> -- 2021-02-08 06:40:41,413 INFO MainThread: Starting cluster with command:
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
> '--state_store_args=--statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
> --log_level=1 '--impalad_args=--max_cached_file_handles=16
> --unused_file_handle_timeout_sec=5 --data_cache=/tmp:500MB
> --always_use_data_cache=true ' '--state_store_args=None '
> '--catalogd_args=--load_catalog_in_background=false '
> --impalad_args=--default_query_options=
> 06:40:42 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 06:40:42 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 06:40:42 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 06:40:42 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 06:40:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:45 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:45 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:45 MainThread: Debug webpage not yet available: ('Connection aborted.', error(111, 'Connection refused'))
> 06:40:47 MainThread: Debug webpage did not become available in expected time.
> 06:40:47 MainThread: Waiting for num_known_live_backends=3. Current value: None
> 06:40:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:48 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:48 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 06:40:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:49 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25000
> 06:40:49 MainThread: num_known_live_backends has reached value: 3
> 06:40:49 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 06:40:49 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r5-4xlarge-ondemand-02df.vpc.cloudera.com:25001
> 06:
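
The commit message attributes the flakiness to a concurrency limit on cache writes, so a single warm-up pass can leave some data uncached. Below is a toy Python model of that effect, not Impala's actual data cache: `ToyDataCache`, `max_concurrent_writes`, and `warm_up` are hypothetical names, and the assumption that inserts beyond the limit are dropped (rather than queued) is made only for illustration.

```python
"""Toy model (hypothetical, not Impala's code) of why one warm-up pass may
leave the data cache incomplete when cache writes are concurrency-limited."""


class ToyDataCache:
    def __init__(self, max_concurrent_writes):
        # Hypothetical knob: how many cache inserts may be in flight at once.
        self.max_concurrent_writes = max_concurrent_writes
        self.entries = set()

    def scan(self, ranges):
        """Read all ranges 'concurrently'; return how many missed the cache."""
        misses = [r for r in ranges if r not in self.entries]
        # Every miss tries to insert at the same time; in this toy, inserts
        # beyond the concurrency limit are simply dropped instead of waiting.
        for r in misses[: self.max_concurrent_writes]:
            self.entries.add(r)
        return len(misses)


def warm_up(cache, ranges, passes):
    """Repeat the warm-up scan, mirroring the fix of running the query 5 times."""
    for _ in range(passes):
        cache.scan(ranges)


if __name__ == "__main__":
    ranges = ["file%d" % i for i in range(10)]

    single = ToyDataCache(max_concurrent_writes=3)
    warm_up(single, ranges, passes=1)
    print("misses after 1 warm-up pass:", single.scan(ranges))  # 7

    repeated = ToyDataCache(max_concurrent_writes=3)
    warm_up(repeated, ranges, passes=5)
    print("misses after 5 warm-up passes:", repeated.scan(ranges))  # 0
```

With 10 ranges and room for 3 inserts per pass, one pass caches only 3 ranges, while 4 passes are enough to cache all 10, so repeating the warm-up 5 times leaves no misses in this model.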
[jira] [Commented] (IMPALA-10497) test_no_fd_caching_on_cached_data failing
[ https://issues.apache.org/jira/browse/IMPALA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282556#comment-17282556 ]

Riza Suminto commented on IMPALA-10497:
---------------------------------------

The reason for the test flakiness is that the data cache is not fully warmed up. Not all of the data is read from the data cache, so a new HDFS file handle (fd) is opened and cached. I have submitted a patch to Gerrit to fix this.
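
This explanation can be illustrated with a small toy model. It is not Impala's scanner code; `ToyScanner` and `num_cached_handles` are hypothetical names. A read served by the data cache needs no HDFS file handle, while a miss opens one and keeps it in the file handle cache, which is how a single uncached file turns the expected 0 into 1.

```python
"""Toy model (hypothetical names, not Impala's code): a data-cache hit needs
no HDFS file handle, but a miss opens one and keeps it in a handle cache."""


class ToyScanner:
    def __init__(self, cached_ranges):
        # Ranges already present in the (toy) data cache after warm-up.
        self.data_cache = set(cached_ranges)
        # path -> handle; a string stands in for a real HDFS file handle.
        self.file_handle_cache = {}

    def read(self, path):
        if path in self.data_cache:
            return "data-cache"
        # Cache miss: open (or reuse) a file handle for this path.
        self.file_handle_cache.setdefault(path, "fd:" + path)
        return "hdfs"

    def num_cached_handles(self):
        return len(self.file_handle_cache)


if __name__ == "__main__":
    # Suppose the warm-up cached only two of the three files.
    scanner = ToyScanner(cached_ranges=["f0", "f1"])
    print([scanner.read(p) for p in ["f0", "f1", "f2"]])
    # ['data-cache', 'data-cache', 'hdfs']
    print(scanner.num_cached_handles())  # 1, like the failing "assert 1 == 0"
```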
[jira] [Commented] (IMPALA-10497) test_no_fd_caching_on_cached_data failing
[ https://issues.apache.org/jira/browse/IMPALA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282208#comment-17282208 ]

Riza Suminto commented on IMPALA-10497:
---------------------------------------

I'm able to reproduce this test failure after several retries. I'm still not sure what the root cause of this flakiness is, though. I'll try increasing the sleep time and see whether the flakiness goes away.
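
A longer sleep is relevant because the test cluster runs with `--unused_file_handle_timeout_sec=5`. The toy model below (hypothetical names; not Impala's implementation, which may evict handles asynchronously) shows the assumed effect: a handle is dropped only once it has been idle longer than the timeout, so a longer sleep could hide an extra handle but would not prevent the cache miss that opened it. Time is passed in explicitly instead of actually sleeping.

```python
"""Toy model (hypothetical, not Impala's code) of a file handle cache that
evicts handles left unused for longer than a timeout."""


class ToyHandleCache:
    def __init__(self, unused_timeout_sec):
        self.unused_timeout_sec = unused_timeout_sec
        self.last_used = {}  # path -> time of last use, in seconds

    def use(self, path, now):
        # Opening or reusing a handle refreshes its last-use time.
        self.last_used[path] = now

    def evict_unused(self, now):
        # Drop every handle that has been idle for longer than the timeout.
        expired = [p for p, t in self.last_used.items()
                   if now - t > self.unused_timeout_sec]
        for p in expired:
            del self.last_used[p]

    def count(self):
        return len(self.last_used)


if __name__ == "__main__":
    cache = ToyHandleCache(unused_timeout_sec=5)
    cache.use("f2", now=0)
    cache.evict_unused(now=3)
    print(cache.count())  # 1: a short wait leaves the handle cached
    cache.evict_unused(now=6)
    print(cache.count())  # 0: waiting past the timeout evicts it
```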