[ https://issues.apache.org/jira/browse/MESOS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390279#comment-17390279 ]
Martin Tzvetanov Grigorov commented on MESOS-10226:
---------------------------------------------------

`sudo ./bin/mesos-tests.sh --gtest_filter=*ProvisionerDockerTest.*ROOT_INTERNET_CURL_SimpleCommand* --verbose` didn't hang but failed with:

{code:java}
... 7-b49a-765ac4cd1729/backends/overlay/rootfses/ba3ccd6c-bacf-4d88-a4fc-5104ca45d19e' for container a6a3e1b5-4322-4b07-b49a-765ac4cd1729
I0730 05:32:29.134213 2249744 master.cpp:1149] Master terminating
I0730 05:32:29.134589 2249739 hierarchical.cpp:1232] Removed all filters for agent 4c43d934-41d8-4159-9b03-2dfdeee3f386-S0
I0730 05:32:29.134629 2249739 hierarchical.cpp:1108] Removed agent 4c43d934-41d8-4159-9b03-2dfdeee3f386-S0
[ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/4, where GetParam() = "quay.io/coreos/alpine-sh" (3751 ms)
[----------] 5 tests from ContainerImage/ProvisionerDockerTest (38953 ms total)
[----------] Global test environment tear-down
[==========] 5 tests from 1 test case ran. (38966 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 5 tests, listed below:
[ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/0, where GetParam() = "alpine"
[ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/1, where GetParam() = "library/alpine"
[ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/2, where GetParam() = "gcr.io/google-containers/busybox:1.24"
[ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/3, where GetParam() = "gcr.io/google-containers/busybox:1.27"
[ FAILED ] ContainerImage/ProvisionerDockerTest.ROOT_INTERNET_CURL_SimpleCommand/4, where GetParam() = "quay.io/coreos/alpine-sh"
5 FAILED TESTS
I0730 05:32:29.176168 2249746 process.cpp:935] Stopped the socket accept loop
{code}

> test suite hangs on ARM64
> -------------------------
>
> Key: MESOS-10226
> URL: https://issues.apache.org/jira/browse/MESOS-10226
> Project: Mesos
> Issue Type: Bug
> Reporter: Charles Natali
> Assignee: Charles Natali
> Priority: Major
> Attachments: gdb-thread-apply-bt-all-29.07.2021-2.txt, gdb-thread-apply-bt-all-29.07.2021.txt
>
>
> Reported by [~mgrigorov].
>
> {noformat}
> [ RUN ] NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
> sh: 1: hadoop: not found
> Marked '/' as rslave
> I0726 11:59:17.812630 32 exec.cpp:164] Version: 1.12.0
> I0726 11:59:17.827512 31 exec.cpp:237] Executor registered on agent 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
> I0726 11:59:17.830999 36 executor.cpp:190] Received SUBSCRIBED event
> I0726 11:59:17.832351 36 executor.cpp:194] Subscribed executor on martin-arm64
> I0726 11:59:17.832775 36 executor.cpp:190] Received LAUNCH event
> I0726 11:59:17.834415 36 executor.cpp:722] Starting task d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
> I0726 11:59:17.839910 36 executor.cpp:740] Forked command at 38
> Preparing rootfs at '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
> Changing root to /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
> Failed to execute 'sh': Exec format error
> I0726 11:59:18.113488 33 executor.cpp:1041] Command exited with status 1 (pid: 38)
> ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1111: Failure
> Mock function called more times than expected - returning directly.
> Function call: statusUpdate(0xffffc28527f0, @0xffffa2cf3a60 136-byte object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>)
> Expected: to be called twice
> Actual: called 3 times - over-saturated and active
> I0726 11:59:19.117401 37 process.cpp:935] Stopped the socket accept loop
> {noformat}
>
> I asked him to provide a gdb traceback and we can see the following:
>
> {noformat}
> Thread 1 (Thread 0xffffa3bc2c60 (LWP 173475)):
> #0 0x0000ffffa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:48
> #1 0x0000ffffa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=8, is32not64=<optimized out>) at fileops.c:189
> #2 0x0000ffffa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized out>, mode@entry=0xaaaad762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
> #3 0x0000ffffa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xaaaad762f3c8 "r", is32=1) at iofopen.c:75
> #4 0x0000aaaad54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136
> #5 0x0000aaaad74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
> {noformat}
>
>
> Basically the test uses a named pipe to synchronize with the task being started, and if the task fails to start - in this case because we're trying to launch an x86 container on an arm64 host - the test will just hang reading from the pipe.
> I sent Martin a tentative fix for him to test, and I'll open an MR if successful.
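The hang described in the issue can be illustrated outside Mesos. The backtrace shows the test blocked in `fopen()` on the named pipe: a blocking open of a FIFO for reading waits until some process opens the write end, which never happens when the task fails to start. Below is a minimal sketch of one way to bound that wait, assuming Linux FIFO and `poll()` semantics. The names `readFifoWithTimeout` and `demoFifoTimeout` are hypothetical; this is not the tentative fix mentioned in the issue, and it is not code from Mesos or stout.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper, not part of Mesos or stout. Reads everything a writer
// sends through the FIFO at `path` until the writer closes its end. Returns
// false if a single wait exceeds `timeoutMs` or an error occurs.
bool readFifoWithTimeout(
    const std::string& path, int timeoutMs, std::string* out)
{
  // A negative timeout would make poll() wait forever, defeating the purpose.
  assert(timeoutMs >= 0 && out != nullptr);

  // With O_NONBLOCK, open() returns at once even when no writer exists.
  // A blocking open (what fopen(path, "r") does) would wait here instead.
  int fd = ::open(path.c_str(), O_RDONLY | O_NONBLOCK);
  if (fd < 0) {
    return false;
  }

  out->clear();
  struct pollfd pfd = {fd, POLLIN, 0};
  bool sawEof = false;

  while (true) {
    int ready = ::poll(&pfd, 1, timeoutMs);
    if (ready < 0 && errno == EINTR) {
      continue; // Interrupted by a signal; wait again.
    }
    if (ready <= 0) {
      break; // Timed out (0) or failed (-1): give up instead of hanging.
    }

    char buffer[4096];
    ssize_t n = ::read(fd, buffer, sizeof(buffer));
    if (n > 0) {
      out->append(buffer, static_cast<size_t>(n));
    } else if (n == 0) {
      sawEof = true; // The writer closed its end.
      break;
    } else if (errno != EAGAIN && errno != EINTR) {
      break; // Unexpected read error.
    }
  }

  ::close(fd);
  return sawEof;
}

// Exercises both outcomes: no writer at all, then a child that writes "ok".
bool demoFifoTimeout()
{
  char dir[] = "/tmp/fifo-sketch-XXXXXX";
  if (::mkdtemp(dir) == nullptr) {
    return false;
  }
  const std::string path = std::string(dir) + "/pipe";
  if (::mkfifo(path.c_str(), 0600) != 0) {
    return false;
  }

  // Nobody opens the write end, like a task that failed to start.
  std::string data;
  const bool timedOut = !readFifoWithTimeout(path, 200, &data);

  bool received = false;
  pid_t child = ::fork();
  if (child == 0) {
    int fd = ::open(path.c_str(), O_WRONLY); // Blocks until a reader opens.
    if (fd >= 0 && ::write(fd, "ok", 2) == 2) {
      ::close(fd);
    }
    ::_exit(0);
  }
  if (child > 0) {
    received = readFifoWithTimeout(path, 5000, &data) && data == "ok";
    ::waitpid(child, nullptr, 0);
  }

  ::unlink(path.c_str());
  ::rmdir(dir);
  return timedOut && received;
}
```

Calling `demoFifoTimeout()` from a `main` should return true on Linux: the first read gives up after about 200 ms because no writer ever appears, and the second returns the child's data. Note that the timeout in this sketch applies to each wait rather than to the whole read, and that a real test would more likely also watch for the task's terminal status update instead of relying on a timeout alone.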
--
This message was sent by Atlassian Jira
(v8.3.4#803005)