tags 594777 + patch pending thanks Hi,
After investigating this, I came to the conclusion that the test case that produced this FTBFS is prone to race conditions, unrelated to powerpc. After playing with it for a while, I could reliably (100%) reproduce the problem under certain conditions. Among others, pescetti.debian.org (the PowerPC porterbox) had the problem but only if varnishtest was called with -v (but not -vv!) on the case. The build-deps are still there and the source is in my home directory, if you want to see for yourself. Since this is RC, I've uploaded an NMU fixing this to DELAYED/2. The gory details are in the diff. I'm attaching both the actual patch that fixes this (from debian/patches) and the whole NMU debdiff. As indicated at the patch headers, I haven't forwarded the patch to upstream; I presume you are in a better position to do so. I included a fairly good description plus comments in the test case itself, but if you or upstream need me to clarify, I'd be happy to do so. Note that upstream's SVN history showed that this has been a PITA race-wise for quite some time but, unfortunately, all of their attempts weren't good enough. Regards, Faidon
diff -Nru varnish-2.1.3/debian/changelog varnish-2.1.3/debian/changelog --- varnish-2.1.3/debian/changelog 2010-08-28 01:44:04.000000000 +0300 +++ varnish-2.1.3/debian/changelog 2010-09-08 01:35:53.000000000 +0300 @@ -1,3 +1,14 @@ +varnish (2.1.3-6.1) unstable; urgency=high + + * Non-maintainer upload. + * Urgency high because of a squeeze-targeted RC bugfix. + * Fix powerpc FTBFS caused by a race condition in a test suite case. + (Closes: #594777) + * Rename patch debian-changes-2.1.3-6 to fix-changelog-typo and fix + its documentation. + + -- Faidon Liambotis <parav...@debian.org> Wed, 08 Sep 2010 01:23:28 +0300 + varnish (2.1.3-6) unstable; urgency=low * Install only libvarnishapi.so (Closes: #592244) diff -Nru varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 --- varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 2010-08-28 01:48:39.000000000 +0300 +++ varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 1970-01-01 02:00:00.000000000 +0200 @@ -1,48 +0,0 @@ -Description: Upstream changes introduced in version 2.1.3-6 - This patch has been created by dpkg-source during the package build. - Here's the last changelog entry, hopefully it gives details on why - those changes were made: - . - varnish (2.1.3-6) unstable; urgency=low - . - * Install only libvarnishapi.so (Closes: #592244) - . - The person named in the Author field signed this changelog entry. -Author: Stig Sandbeck Mathisen <s...@debian.org> -Bug-Debian: http://bugs.debian.org/592244 - ---- -The information above should follow the Patch Tagging Guidelines, please -checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here -are templates for supplementary fields that you might want to add: - -Origin: <vendor|upstream|other>, <url of original patch> -Bug: <url in upstream bugtracker> -Bug-Debian: http://bugs.debian.org/<bugnumber> -Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber> -Forwarded: <no|not-needed|url proving that it has been forwarded> -Reviewed-By: <name and email of someone who approved the patch> -Last-Update: <YYYY-MM-DD> - ---- varnish-2.1.3.orig/doc/changes-2.1.1.html -+++ varnish-2.1.3/doc/changes-2.1.1.html -@@ -74,7 +74,7 @@ - <ul> - <li> - <p><span class="code">varnishsizes</span>, which is -- like <span class="code">varnishhost</span>, but for the length of objects, -+ like <span class="code">varnishhist</span>, but for the length of objects, - has been added..</p> - </li> - </ul> ---- varnish-2.1.3.orig/doc/changes-2.1.0-2.1.1.xml -+++ varnish-2.1.3/doc/changes-2.1.0-2.1.1.xml -@@ -86,7 +86,7 @@ - - <change type="enh"> - <para><code>varnishsizes</code>, which is -- like <code>varnishhost</code>, but for the length of objects, -+ like <code>varnishhist</code>, but for the length of objects, - has been added..</para> - </change> - </subsystem> diff -Nru varnish-2.1.3/debian/patches/fix-changelog-typo varnish-2.1.3/debian/patches/fix-changelog-typo --- varnish-2.1.3/debian/patches/fix-changelog-typo 1970-01-01 02:00:00.000000000 +0200 +++ varnish-2.1.3/debian/patches/fix-changelog-typo 2010-09-07 22:55:34.000000000 +0300 @@ -0,0 +1,25 @@ +Author: Stig Sandbeck Mathisen <s...@debian.org> +Origin: vendor + +--- varnish-2.1.3.orig/doc/changes-2.1.1.html ++++ varnish-2.1.3/doc/changes-2.1.1.html +@@ -74,7 +74,7 @@ + <ul> + <li> + <p><span class="code">varnishsizes</span>, which is +- like <span class="code">varnishhost</span>, but for the length of objects, ++ like <span class="code">varnishhist</span>, but for the length of objects, + has been added..</p> + </li> + </ul> +--- varnish-2.1.3.orig/doc/changes-2.1.0-2.1.1.xml ++++ varnish-2.1.3/doc/changes-2.1.0-2.1.1.xml +@@ -86,7 +86,7 @@ + + <change type="enh"> + <para><code>varnishsizes</code>, which is +- like <code>varnishhost</code>, but for the length of objects, ++ like <code>varnishhist</code>, but for the length of objects, + has been added..</para> + </change> + </subsystem> diff -Nru varnish-2.1.3/debian/patches/fix-race-test-v00014 varnish-2.1.3/debian/patches/fix-race-test-v00014 --- varnish-2.1.3/debian/patches/fix-race-test-v00014 1970-01-01 02:00:00.000000000 +0200 +++ varnish-2.1.3/debian/patches/fix-race-test-v00014 2010-09-08 01:35:22.000000000 +0300 @@ -0,0 +1,85 @@ +Author: Faidon Liambotis <parav...@debian.org> +Bug-Debian: http://bugs.debian.org/594777 +Forwarded: no +Last-Update: 2010-09-07 + +The test case in question is testing the backend polling functionality of +varnish. It does so by spawning an extra server{} that is intended to be +polled by the probe instead of by a client{}. + +Unfortunately, the way this is implemented, a race exists in which the client +thread runs after the server thread has run but before the probe had the time +to actually poll the backend and verify its healthiness. + +This resulted in a 100% reproducible test case failure but under very certain +conditions. For example, in a particular machine, it succeded with no +verbosity, failed with verbosity 1 (-v) and succeeded with verbosity > 1 (-vv). + +SVN history shows that races have been a problem forever and various solutions +have been attempted in the past. This patch tries to solve the problem once and +for all along with properly documenting how the test case works so that it +isn't easily broken again in the future. As a side goal, it tries to fix the +case to run faster by using multiples of 0.1s instead of 1s. + +--- a/bin/varnishtest/tests/v00014.vtc ++++ b/bin/varnishtest/tests/v00014.vtc +@@ -2,6 +2,7 @@ + + test "Check req.backend.healthy" + ++# probe will hit this to check healthiness + server s1 { + rxreq + expect req.url == "/" +@@ -16,8 +17,8 @@ varnish v1 -vcl { + .max_connections = 1; + .probe = { + .url = "/"; +- .timeout = 1s; +- .interval = 1s; ++ .timeout = 0.1s; ++ .interval = 0.1s; + .window = 3; + .threshold = 2; + .initial = 0; +@@ -33,25 +34,33 @@ varnish v1 -vcl { + } + } -start + ++# probe runs every 0.1s and needs 2 (threshold) out of the latest 3 (window) to ++# be successful. Make sure this does *not* happen: the first probe would be ++# successful (s1 was started and finished), but let the next one or two to fail ++# (and hence making the backend unhealthy) by idling. ++server s1 -wait ++delay 0.3 ++ ++# verify that backend is unhealthy as expected + client c1 { + txreq + rxresp + expect resp.status == 500 + } -run + +-server s1 { +- rxreq +- expect req.url == "/" +- txresp -body "slash" +-} -start ++# now let three consecutive probe checks to succeed by starting them and ++# waiting for them to finish. This will lead varnish to believe that the ++# backend is healthly. (note that s1 ports are reused) ++server s1 -repeat 3 -start -wait + +-server s1 { ++# do an actual check with a healthy backend, expecting to get a 200 back ++server s2 { + rxreq + expect req.url == "/foo" + txresp -body "foobar" + } -start + +-client c1 { ++client c2 { + txreq -url "/foo" + rxresp + expect resp.status == 200 diff -Nru varnish-2.1.3/debian/patches/series varnish-2.1.3/debian/patches/series --- varnish-2.1.3/debian/patches/series 2010-08-28 01:44:36.000000000 +0300 +++ varnish-2.1.3/debian/patches/series 2010-09-08 01:50:32.000000000 +0300 @@ -1 +1,2 @@ -debian-changes-2.1.3-6 +fix-changelog-typo +fix-race-test-v00014
Author: Faidon Liambotis <parav...@debian.org> Bug-Debian: http://bugs.debian.org/594777 Forwarded: no Last-Update: 2010-09-07 The test case in question is testing the backend polling functionality of varnish. It does so by spawning an extra server{} that is intended to be polled by the probe instead of by a client{}. Unfortunately, the way this is implemented, a race exists in which the client thread runs after the server thread has run but before the probe had the time to actually poll the backend and verify its healthiness. This resulted in a 100% reproducible test case failure but under very certain conditions. For example, in a particular machine, it succeded with no verbosity, failed with verbosity 1 (-v) and succeeded with verbosity > 1 (-vv). SVN history shows that races have been a problem forever and various solutions have been attempted in the past. This patch tries to solve the problem once and for all along with properly documenting how the test case works so that it isn't easily broken again in the future. As a side goal, it tries to fix the case to run faster by using multiples of 0.1s instead of 1s. --- a/bin/varnishtest/tests/v00014.vtc +++ b/bin/varnishtest/tests/v00014.vtc @@ -2,6 +2,7 @@ test "Check req.backend.healthy" +# probe will hit this to check healthiness server s1 { rxreq expect req.url == "/" @@ -16,8 +17,8 @@ varnish v1 -vcl { .max_connections = 1; .probe = { .url = "/"; - .timeout = 1s; - .interval = 1s; + .timeout = 0.1s; + .interval = 0.1s; .window = 3; .threshold = 2; .initial = 0; @@ -33,25 +34,33 @@ varnish v1 -vcl { } } -start +# probe runs every 0.1s and needs 2 (threshold) out of the latest 3 (window) to +# be successful. Make sure this does *not* happen: the first probe would be +# successful (s1 was started and finished), but let the next one or two to fail +# (and hence making the backend unhealthy) by idling. +server s1 -wait +delay 0.3 + +# verify that backend is unhealthy as expected client c1 { txreq rxresp expect resp.status == 500 } -run -server s1 { - rxreq - expect req.url == "/" - txresp -body "slash" -} -start +# now let three consecutive probe checks to succeed by starting them and +# waiting for them to finish. This will lead varnish to believe that the +# backend is healthly. (note that s1 ports are reused) +server s1 -repeat 3 -start -wait -server s1 { +# do an actual check with a healthy backend, expecting to get a 200 back +server s2 { rxreq expect req.url == "/foo" txresp -body "foobar" } -start -client c1 { +client c2 { txreq -url "/foo" rxresp expect resp.status == 200