surahman edited a comment on pull request #3735:
URL: https://github.com/apache/incubator-heron/pull/3735#issuecomment-968160231
After a lot of code digging and multithreaded debugging in GDB on the command line, I have isolated and fixed the bug behind the failing test on a 4-core machine. The test also passes on a machine with many more cores.

The issue is backpressure buildup in the three problematic tests. The tests kill the bolt threads and let the spouts build up backpressure, which sets up the scenario for the "event" under test. I have tweaked the high/low backpressure watermarks and the spout counts so that these tests run faster.

Some odd `PEX` errors are still coming up, but if the CI run gets past the 100-105 minute mark, we can safely assume that `stmgr_unittest` now passes in its entirety in the CI pipeline as well.

**_Edit:_** It is flaky but passing locally.
<details><summary>Test results</summary>
```bash
INFO: Elapsed time: 379.150s, Critical Path: 353.90s
INFO: 235 processes: 236 local.
INFO: Build completed successfully, 235 total actions
//heron/api/tests/cpp:serialization_unittest
PASSED in 0.1s
//heron/api/tests/java:BaseWindowedBoltTest
PASSED in 0.5s
//heron/api/tests/java:ConfigTest
PASSED in 0.3s
//heron/api/tests/java:CountStatAndMetricTest
PASSED in 0.3s
//heron/api/tests/java:GeneralReduceByKeyAndWindowOperatorTest
PASSED in 0.5s
//heron/api/tests/java:HeronSubmitterTest
PASSED in 2.0s
//heron/api/tests/java:JoinOperatorTest
PASSED in 0.5s
//heron/api/tests/java:KVStreamletShadowTest
PASSED in 0.4s
//heron/api/tests/java:KeyByOperatorTest
PASSED in 0.4s
//heron/api/tests/java:LatencyStatAndMetricTest
PASSED in 0.4s
//heron/api/tests/java:ReduceByKeyAndWindowOperatorTest
PASSED in 0.5s
//heron/api/tests/java:StreamletImplTest
PASSED in 0.4s
//heron/api/tests/java:StreamletShadowTest
PASSED in 0.4s
//heron/api/tests/java:StreamletUtilsTest
PASSED in 0.3s
//heron/api/tests/java:UtilsTest
PASSED in 0.5s
//heron/api/tests/java:WaterMarkEventGeneratorTest
PASSED in 0.3s
//heron/api/tests/java:WindowManagerTest
PASSED in 0.3s
//heron/api/tests/java:WindowedBoltExecutorTest
PASSED in 0.5s
//heron/api/tests/scala:api-scala-test
PASSED in 0.9s
WARNING: //heron/api/tests/scala:api-scala-test: Test execution time (0.9s
excluding execution overhead) outside of range for MODERATE tests. Consider
setting timeout="short" or size="small".
//heron/ckptmgr/tests/java:CheckpointManagerServerTest
PASSED in 0.6s
//heron/common/tests/cpp/basics:fileutils_unittest
PASSED in 0.0s
//heron/common/tests/cpp/basics:rid_unittest
PASSED in 0.0s
//heron/common/tests/cpp/basics:strutils_unittest
PASSED in 0.0s
//heron/common/tests/cpp/basics:utils_unittest
PASSED in 0.0s
//heron/common/tests/cpp/config:topology-config-helper_unittest
PASSED in 0.0s
//heron/common/tests/cpp/errors:errors_unittest
PASSED in 0.0s
//heron/common/tests/cpp/errors:module_unittest
PASSED in 0.1s
//heron/common/tests/cpp/errors:syserrs_unittest
PASSED in 0.0s
//heron/common/tests/cpp/metrics:count-metric_unittest
PASSED in 0.0s
//heron/common/tests/cpp/metrics:mean-metric_unittest
PASSED in 0.0s
//heron/common/tests/cpp/metrics:multi-count-metric_unittest
PASSED in 0.0s
//heron/common/tests/cpp/metrics:multi-mean-metric_unittest
PASSED in 0.0s
//heron/common/tests/cpp/metrics:time-spent-metric_unittest
PASSED in 1.4s
//heron/common/tests/cpp/network:http_unittest
PASSED in 0.1s
//heron/common/tests/cpp/network:order_unittest
PASSED in 0.1s
//heron/common/tests/cpp/network:packet_unittest
PASSED in 0.0s
//heron/common/tests/cpp/network:piper_unittest
PASSED in 2.1s
//heron/common/tests/cpp/network:rate_limit_unittest
PASSED in 4.1s
//heron/common/tests/cpp/network:switch_unittest
PASSED in 0.2s
//heron/common/tests/cpp/threads:spcountdownlatch_unittest
PASSED in 2.0s
//heron/common/tests/java:ByteAmountTest
PASSED in 0.3s
//heron/common/tests/java:CommunicatorTest
PASSED in 0.3s
//heron/common/tests/java:ConfigReaderTest
PASSED in 0.4s
//heron/common/tests/java:EchoTest
PASSED in 0.6s
//heron/common/tests/java:FileUtilsTest
PASSED in 1.1s
//heron/common/tests/java:HeronServerTest
PASSED in 1.5s
//heron/common/tests/java:PackageTypeTest
PASSED in 0.3s
//heron/common/tests/java:SysUtilsTest
PASSED in 5.1s
//heron/common/tests/java:SystemConfigTest
PASSED in 0.4s
//heron/common/tests/java:TopologyUtilsTest
PASSED in 0.4s
//heron/common/tests/java:WakeableLooperTest
PASSED in 1.3s
//heron/common/tests/python/pex_loader:pex_loader_unittest
PASSED in 0.9s
//heron/downloaders/tests/java:DLDownloaderTest
PASSED in 1.0s
//heron/downloaders/tests/java:ExtractorTests
PASSED in 0.4s
//heron/downloaders/tests/java:RegistryTest
PASSED in 0.4s
//heron/executor/tests/python:executor_unittest
PASSED in 1.3s
//heron/healthmgr/tests/java:BackPressureDetectorTest
PASSED in 0.9s
//heron/healthmgr/tests/java:BackPressureSensorTest
PASSED in 1.0s
//heron/healthmgr/tests/java:BufferSizeSensorTest
PASSED in 0.8s
//heron/healthmgr/tests/java:DataSkewDiagnoserTest
PASSED in 0.6s
//heron/healthmgr/tests/java:ExecuteCountSensorTest
PASSED in 0.6s
//heron/healthmgr/tests/java:GrowingWaitQueueDetectorTest
PASSED in 0.5s
//heron/healthmgr/tests/java:HealthManagerTest
PASSED in 0.9s
//heron/healthmgr/tests/java:HealthPolicyConfigReaderTest
PASSED in 0.5s
//heron/healthmgr/tests/java:LargeWaitQueueDetectorTest
PASSED in 0.6s
//heron/healthmgr/tests/java:MetricsCacheMetricsProviderTest
PASSED in 0.7s
//heron/healthmgr/tests/java:PackingPlanProviderTest
PASSED in 0.6s
//heron/healthmgr/tests/java:ProcessingRateSkewDetectorTest
PASSED in 0.6s
//heron/healthmgr/tests/java:ScaleUpResolverTest
PASSED in 0.7s
//heron/healthmgr/tests/java:SlowInstanceDiagnoserTest
PASSED in 0.6s
//heron/healthmgr/tests/java:UnderProvisioningDiagnoserTest
PASSED in 0.5s
//heron/healthmgr/tests/java:WaitQueueSkewDetectorTest
PASSED in 0.5s
//heron/instance/tests/java:ActivateDeactivateTest
PASSED in 0.5s
//heron/instance/tests/java:BoltInstanceTest
PASSED in 0.5s
//heron/instance/tests/java:BoltStatefulInstanceTest
PASSED in 2.6s
//heron/instance/tests/java:ConnectTest
PASSED in 0.6s
//heron/instance/tests/java:CustomGroupingTest
PASSED in 0.7s
//heron/instance/tests/java:EmitDirectBoltTest
PASSED in 0.6s
//heron/instance/tests/java:EmitDirectSpoutTest
PASSED in 0.7s
//heron/instance/tests/java:GlobalMetricsTest
PASSED in 0.3s
//heron/instance/tests/java:HandleReadTest
PASSED in 0.6s
//heron/instance/tests/java:HandleWriteTest
PASSED in 5.6s
//heron/instance/tests/java:MultiAssignableMetricTest
PASSED in 0.3s
//heron/instance/tests/java:SpoutInstanceTest
PASSED in 2.7s
//heron/instance/tests/java:SpoutStatefulInstanceTest
PASSED in 2.5s
//heron/instance/tests/python/network:event_looper_unittest
PASSED in 3.1s
//heron/instance/tests/python/network:gateway_looper_unittest
PASSED in 11.0s
//heron/instance/tests/python/network:heron_client_unittest
PASSED in 1.1s
//heron/instance/tests/python/network:metricsmgr_client_unittest
PASSED in 1.2s
//heron/instance/tests/python/network:protocol_unittest
PASSED in 1.1s
//heron/instance/tests/python/network:st_stmgrcli_unittest
PASSED in 1.2s
//heron/instance/tests/python/utils:communicator_unittest
PASSED in 1.2s
//heron/instance/tests/python/utils:custom_grouping_unittest
PASSED in 1.1s
//heron/instance/tests/python/utils:global_metrics_unittest
PASSED in 1.2s
//heron/instance/tests/python/utils:log_unittest
PASSED in 1.0s
//heron/instance/tests/python/utils:metrics_helper_unittest
PASSED in 1.1s
//heron/instance/tests/python/utils:outgoing_tuple_helper_unittest
PASSED in 1.1s
//heron/instance/tests/python/utils:pplan_helper_unittest
PASSED in 1.1s
//heron/instance/tests/python/utils:py_metrics_unittest
PASSED in 1.1s
//heron/instance/tests/python/utils:topology_context_impl_unittest
PASSED in 1.1s
//heron/instance/tests/python/utils:tuple_helper_unittest
PASSED in 1.1s
//heron/io/dlog/tests/java:DLInputStreamTest
PASSED in 0.7s
//heron/io/dlog/tests/java:DLOutputStreamTest
PASSED in 0.5s
//heron/metricscachemgr/tests/java:CacheCoreTest
PASSED in 0.4s
//heron/metricscachemgr/tests/java:MetricsCacheQueryUtilsTest
PASSED in 0.4s
//heron/metricscachemgr/tests/java:MetricsCacheTest
PASSED in 0.4s
//heron/metricsmgr/tests/java:FileSinkTest
PASSED in 0.5s
//heron/metricsmgr/tests/java:HandleTManagerLocationTest
PASSED in 0.6s
//heron/metricsmgr/tests/java:MetricsCacheSinkTest
PASSED in 9.5s
//heron/metricsmgr/tests/java:MetricsManagerServerTest
PASSED in 0.6s
//heron/metricsmgr/tests/java:MetricsUtilTests
PASSED in 0.5s
//heron/metricsmgr/tests/java:PrometheusSinkTests
PASSED in 0.4s
//heron/metricsmgr/tests/java:SinkExecutorTest
PASSED in 0.5s
//heron/metricsmgr/tests/java:TManagerSinkTest
PASSED in 9.6s
//heron/metricsmgr/tests/java:WebSinkTest
PASSED in 0.6s
//heron/packing/tests/java:FirstFitDecreasingPackingTest
PASSED in 0.8s
//heron/packing/tests/java:PackingPlanBuilderTest
PASSED in 0.5s
//heron/packing/tests/java:PackingUtilsTest
PASSED in 0.4s
//heron/packing/tests/java:ResourceCompliantRRPackingTest
PASSED in 0.6s
//heron/packing/tests/java:RoundRobinPackingTest
PASSED in 0.5s
//heron/packing/tests/java:ScorerTest
PASSED in 0.3s
//heron/scheduler-core/tests/java:HttpServiceSchedulerClientTest
PASSED in 1.2s
//heron/scheduler-core/tests/java:JsonFormatterUtilsTest
PASSED in 0.4s
//heron/scheduler-core/tests/java:LaunchRunnerTest
PASSED in 1.2s
//heron/scheduler-core/tests/java:LauncherUtilsTest
PASSED in 1.9s
//heron/scheduler-core/tests/java:LibrarySchedulerClientTest
PASSED in 0.4s
//heron/scheduler-core/tests/java:RuntimeManagerMainTest
PASSED in 2.8s
//heron/scheduler-core/tests/java:RuntimeManagerRunnerTest
PASSED in 2.4s
//heron/scheduler-core/tests/java:SchedulerClientFactoryTest
PASSED in 1.4s
//heron/scheduler-core/tests/java:SchedulerMainTest
PASSED in 3.6s
//heron/scheduler-core/tests/java:SchedulerServerTest
PASSED in 0.5s
//heron/scheduler-core/tests/java:SchedulerUtilsTest
PASSED in 1.4s
//heron/scheduler-core/tests/java:SubmitDryRunRenderTest
PASSED in 1.5s
//heron/scheduler-core/tests/java:SubmitterMainTest
PASSED in 1.2s
//heron/scheduler-core/tests/java:UpdateDryRunRenderTest
PASSED in 1.7s
//heron/scheduler-core/tests/java:UpdateTopologyManagerTest
PASSED in 11.8s
//heron/schedulers/tests/java:AuroraCLIControllerTest
PASSED in 0.4s
//heron/schedulers/tests/java:AuroraContextTest
PASSED in 0.4s
//heron/schedulers/tests/java:AuroraLauncherTest
PASSED in 0.8s
//heron/schedulers/tests/java:AuroraSchedulerTest
PASSED in 2.9s
//heron/schedulers/tests/java:HeronExecutorTaskTest
PASSED in 1.4s
//heron/schedulers/tests/java:HeronMasterDriverTest
PASSED in 1.8s
//heron/schedulers/tests/java:KubernetesContextTest
PASSED in 0.4s
//heron/schedulers/tests/java:KubernetesControllerTest
PASSED in 0.3s
//heron/schedulers/tests/java:KubernetesLauncherTest
PASSED in 0.9s
//heron/schedulers/tests/java:KubernetesSchedulerTest
PASSED in 0.8s
//heron/schedulers/tests/java:KubernetesUtilsTest
PASSED in 0.4s
//heron/schedulers/tests/java:LaunchableTaskTest
PASSED in 0.5s
//heron/schedulers/tests/java:LocalLauncherTest
PASSED in 0.9s
//heron/schedulers/tests/java:LocalSchedulerTest
PASSED in 0.6s
//heron/schedulers/tests/java:MarathonControllerTest
PASSED in 1.3s
//heron/schedulers/tests/java:MarathonLauncherTest
PASSED in 0.8s
//heron/schedulers/tests/java:MarathonSchedulerTest
PASSED in 0.5s
//heron/schedulers/tests/java:MesosFrameworkTest
PASSED in 0.7s
//heron/schedulers/tests/java:MesosLauncherTest
PASSED in 0.6s
//heron/schedulers/tests/java:MesosSchedulerTest
PASSED in 0.7s
//heron/schedulers/tests/java:NomadSchedulerTest
PASSED in 2.1s
//heron/schedulers/tests/java:SlurmControllerTest
PASSED in 1.0s
//heron/schedulers/tests/java:SlurmLauncherTest
PASSED in 1.0s
//heron/schedulers/tests/java:SlurmSchedulerTest
PASSED in 1.2s
//heron/schedulers/tests/java:TaskResourcesTest
PASSED in 0.3s
//heron/schedulers/tests/java:TaskUtilsTest
PASSED in 0.3s
//heron/schedulers/tests/java:V1ControllerTest
PASSED in 1.7s
//heron/schedulers/tests/java:VolumesTests
PASSED in 0.4s
//heron/schedulers/tests/java:YarnLauncherTest
PASSED in 0.7s
//heron/schedulers/tests/java:YarnSchedulerTest
PASSED in 0.5s
//heron/simulator/tests/java:AllGroupingTest
PASSED in 0.3s
//heron/simulator/tests/java:CustomGroupingTest
PASSED in 0.3s
//heron/simulator/tests/java:FieldsGroupingTest
PASSED in 0.8s
//heron/simulator/tests/java:InstanceExecutorTest
PASSED in 0.5s
//heron/simulator/tests/java:LowestGroupingTest
PASSED in 0.3s
//heron/simulator/tests/java:RotatingMapTest
PASSED in 0.3s
//heron/simulator/tests/java:ShuffleGroupingTest
PASSED in 0.3s
//heron/simulator/tests/java:SimulatorTest
PASSED in 0.5s
//heron/simulator/tests/java:TopologyManagerTest
PASSED in 0.4s
//heron/simulator/tests/java:TupleCacheTest
PASSED in 0.3s
//heron/simulator/tests/java:XORManagerTest
PASSED in 0.5s
//heron/spi/tests/java:ConfigLoaderTest
PASSED in 1.4s
//heron/spi/tests/java:ConfigTest
PASSED in 1.0s
//heron/spi/tests/java:ContextTest
PASSED in 0.3s
//heron/spi/tests/java:ExceptionInfoTest
PASSED in 0.2s
//heron/spi/tests/java:KeysTest
PASSED in 0.3s
//heron/spi/tests/java:MetricsInfoTest
PASSED in 0.3s
//heron/spi/tests/java:MetricsRecordTest
PASSED in 0.2s
//heron/spi/tests/java:NetworkUtilsTest
PASSED in 1.6s
//heron/spi/tests/java:PackingPlanTest
PASSED in 0.3s
//heron/spi/tests/java:ResourceTest
PASSED in 0.3s
//heron/spi/tests/java:ShellUtilsTest
PASSED in 1.5s
//heron/spi/tests/java:TokenSubTest
PASSED in 0.3s
//heron/spi/tests/java:UploaderUtilsTest
PASSED in 0.4s
//heron/statefulstorages/tests/java:DlogStorageTest
PASSED in 2.0s
//heron/statefulstorages/tests/java:HDFSStorageTest
PASSED in 2.0s
//heron/statefulstorages/tests/java:LocalFileSystemStorageTest
PASSED in 1.1s
//heron/statemgrs/tests/cpp:zk-statemgr_unittest
PASSED in 0.0s
//heron/statemgrs/tests/java:CuratorStateManagerTest
PASSED in 0.6s
//heron/statemgrs/tests/java:LocalFileSystemStateManagerTest
PASSED in 1.3s
//heron/statemgrs/tests/java:ZkUtilsTest
PASSED in 1.1s
//heron/statemgrs/tests/python:configloader_unittest
PASSED in 1.1s
//heron/statemgrs/tests/python:statemanagerfactory_unittest
PASSED in 1.2s
//heron/statemgrs/tests/python:zkstatemanager_unittest
PASSED in 1.1s
//heron/stmgr/tests/cpp/grouping:all-grouping_unittest
PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:custom-grouping_unittest
PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:fields-grouping_unittest
PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:lowest-grouping_unittest
PASSED in 0.0s
//heron/stmgr/tests/cpp/grouping:shuffle-grouping_unittest
PASSED in 0.0s
//heron/stmgr/tests/cpp/server:checkpoint-gateway_unittest
PASSED in 1.4s
WARNING: //heron/stmgr/tests/cpp/server:checkpoint-gateway_unittest: Test
execution time (1.4s excluding execution overhead) outside of range for
MODERATE tests. Consider setting timeout="short" or size="small".
//heron/stmgr/tests/cpp/server:stateful-restorer_unittest
PASSED in 0.0s
WARNING: //heron/stmgr/tests/cpp/server:stateful-restorer_unittest: Test
execution time (0.0s excluding execution overhead) outside of range for
MODERATE tests. Consider setting timeout="short" or size="small".
//heron/stmgr/tests/cpp/util:neighbour_calculator_unittest
PASSED in 0.1s
WARNING: //heron/stmgr/tests/cpp/util:neighbour_calculator_unittest: Test
execution time (0.1s excluding execution overhead) outside of range for
MODERATE tests. Consider setting timeout="short" or size="small".
//heron/stmgr/tests/cpp/util:rotating-map_unittest
PASSED in 0.0s
//heron/stmgr/tests/cpp/util:tuple-cache_unittest
PASSED in 3.7s
//heron/stmgr/tests/cpp/util:xor-manager_unittest
PASSED in 4.1s
//heron/tmanager/tests/cpp/server:stateful_checkpointer_unittest
PASSED in 0.0s
//heron/tmanager/tests/cpp/server:stateful_restorer_unittest
PASSED in 5.1s
//heron/tmanager/tests/cpp/server:tcontroller_unittest
PASSED in 0.0s
//heron/tmanager/tests/cpp/server:tmanager_unittest
PASSED in 26.2s
//heron/tools/apiserver/tests/java:ConfigUtilsTests
PASSED in 0.4s
//heron/tools/apiserver/tests/java:TopologyResourceTests
PASSED in 0.7s
//heron/tools/cli/tests/python:client_command_unittest
PASSED in 1.2s
//heron/tools/cli/tests/python:opts_unittest
PASSED in 1.0s
//heron/tools/explorer/tests/python:explorer_unittest
PASSED in 1.2s
//heron/tools/tracker/tests/python:query_operator_unittest
PASSED in 1.5s
//heron/tools/tracker/tests/python:query_unittest
PASSED in 1.4s
//heron/tools/tracker/tests/python:topology_unittest
PASSED in 1.3s
//heron/tools/tracker/tests/python:tracker_unittest
PASSED in 1.5s
//heron/uploaders/tests/java:DlogUploaderTest
PASSED in 0.7s
//heron/uploaders/tests/java:GcsUploaderTests
PASSED in 0.4s
//heron/uploaders/tests/java:HdfsUploaderTest
PASSED in 0.5s
//heron/uploaders/tests/java:HttpUploaderTest
PASSED in 0.6s
//heron/uploaders/tests/java:LocalFileSystemConfigTest
PASSED in 0.3s
//heron/uploaders/tests/java:LocalFileSystemContextTest
PASSED in 0.3s
//heron/uploaders/tests/java:LocalFileSystemUploaderTest
PASSED in 0.4s
//heron/uploaders/tests/java:S3UploaderTest
PASSED in 1.6s
//heron/uploaders/tests/java:ScpUploaderTest
PASSED in 0.4s
//heron/stmgr/tests/cpp/server:stmgr_unittest
FLAKY, failed in 1 out of 2 in 315.1s
Stats over 2 runs: max = 315.1s, min = 37.0s, avg = 176.0s, dev = 139.0s
/root/.cache/bazel/_bazel_root/f4ab758bd53020512013f7dfa13b6902/execroot/org_apache_heron/bazel-out/k8-fastbuild/testlogs/heron/stmgr/tests/cpp/server/stmgr_unittest/test_attempts/attempt_1.log
```
</details>