tags 594777 + patch pending
thanks

Hi,

After investigating this, I came to the conclusion that the test case
that produced this FTBFS is prone to race conditions, unrelated to
powerpc. After playing with it for a while, I could reliably (100%)
reproduce the problem under certain conditions.

Among others, pescetti.debian.org (the PowerPC porterbox) had the
problem but only if varnishtest was called with -v (but not -vv!) on the
case. The build-deps are still there and the source is in my home
directory, if you want to see for yourself.

Since this is RC, I've uploaded an NMU fixing this to DELAYED/2.

The gory details are in the diff. I'm attaching both the actual patch
that fixes this (from debian/patches) and the whole NMU debdiff.

As indicated at the patch headers, I haven't forwarded the patch to
upstream; I presume you are in a better position to do so. I included a
fairly good description plus comments in the test case itself, but if
you or upstream need me to clarify, I'd be happy to do so.

Note that upstream's SVN history showed that this has been a PITA
race-wise for quite some time but, unfortunately, all of their attempts
weren't good enough.

Regards,
Faidon
diff -Nru varnish-2.1.3/debian/changelog varnish-2.1.3/debian/changelog
--- varnish-2.1.3/debian/changelog      2010-08-28 01:44:04.000000000 +0300
+++ varnish-2.1.3/debian/changelog      2010-09-08 01:35:53.000000000 +0300
@@ -1,3 +1,14 @@
+varnish (2.1.3-6.1) unstable; urgency=high
+
+  * Non-maintainer upload.
+  * Urgency high because of a squeeze-targeted RC bugfix.
+  * Fix powerpc FTBFS caused by a race condition in a test suite case.
+    (Closes: #594777)
+  * Rename patch debian-changes-2.1.3-6 to fix-changelog-typo and fix
+    its documentation.
+
+ -- Faidon Liambotis <parav...@debian.org>  Wed, 08 Sep 2010 01:23:28 +0300
+
 varnish (2.1.3-6) unstable; urgency=low
 
   * Install only libvarnishapi.so (Closes: #592244)
diff -Nru varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 
varnish-2.1.3/debian/patches/debian-changes-2.1.3-6
--- varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 2010-08-28 
01:48:39.000000000 +0300
+++ varnish-2.1.3/debian/patches/debian-changes-2.1.3-6 1970-01-01 
02:00:00.000000000 +0200
@@ -1,48 +0,0 @@
-Description: Upstream changes introduced in version 2.1.3-6
- This patch has been created by dpkg-source during the package build.
- Here's the last changelog entry, hopefully it gives details on why
- those changes were made:
- .
- varnish (2.1.3-6) unstable; urgency=low
- .
-   * Install only libvarnishapi.so (Closes: #592244)
- .
- The person named in the Author field signed this changelog entry.
-Author: Stig Sandbeck Mathisen <s...@debian.org>
-Bug-Debian: http://bugs.debian.org/592244
-
----
-The information above should follow the Patch Tagging Guidelines, please
-checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here
-are templates for supplementary fields that you might want to add:
-
-Origin: <vendor|upstream|other>, <url of original patch>
-Bug: <url in upstream bugtracker>
-Bug-Debian: http://bugs.debian.org/<bugnumber>
-Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber>
-Forwarded: <no|not-needed|url proving that it has been forwarded>
-Reviewed-By: <name and email of someone who approved the patch>
-Last-Update: <YYYY-MM-DD>
-
---- varnish-2.1.3.orig/doc/changes-2.1.1.html
-+++ varnish-2.1.3/doc/changes-2.1.1.html
-@@ -74,7 +74,7 @@
-     <ul>
-       <li>
-         <p><span class="code">varnishsizes</span>, which is
--      like <span class="code">varnishhost</span>, but for the length of 
objects,
-+      like <span class="code">varnishhist</span>, but for the length of 
objects,
-       has been added..</p>
-       </li>
-     </ul>
---- varnish-2.1.3.orig/doc/changes-2.1.0-2.1.1.xml
-+++ varnish-2.1.3/doc/changes-2.1.0-2.1.1.xml
-@@ -86,7 +86,7 @@
- 
-     <change type="enh">
-       <para><code>varnishsizes</code>, which is
--      like <code>varnishhost</code>, but for the length of objects,
-+      like <code>varnishhist</code>, but for the length of objects,
-       has been added..</para>
-     </change>
-   </subsystem>
diff -Nru varnish-2.1.3/debian/patches/fix-changelog-typo 
varnish-2.1.3/debian/patches/fix-changelog-typo
--- varnish-2.1.3/debian/patches/fix-changelog-typo     1970-01-01 
02:00:00.000000000 +0200
+++ varnish-2.1.3/debian/patches/fix-changelog-typo     2010-09-07 
22:55:34.000000000 +0300
@@ -0,0 +1,25 @@
+Author: Stig Sandbeck Mathisen <s...@debian.org>
+Origin: vendor
+
+--- varnish-2.1.3.orig/doc/changes-2.1.1.html
++++ varnish-2.1.3/doc/changes-2.1.1.html
+@@ -74,7 +74,7 @@
+     <ul>
+       <li>
+         <p><span class="code">varnishsizes</span>, which is
+-      like <span class="code">varnishhost</span>, but for the length of 
objects,
++      like <span class="code">varnishhist</span>, but for the length of 
objects,
+       has been added..</p>
+       </li>
+     </ul>
+--- varnish-2.1.3.orig/doc/changes-2.1.0-2.1.1.xml
++++ varnish-2.1.3/doc/changes-2.1.0-2.1.1.xml
+@@ -86,7 +86,7 @@
+ 
+     <change type="enh">
+       <para><code>varnishsizes</code>, which is
+-      like <code>varnishhost</code>, but for the length of objects,
++      like <code>varnishhist</code>, but for the length of objects,
+       has been added..</para>
+     </change>
+   </subsystem>
diff -Nru varnish-2.1.3/debian/patches/fix-race-test-v00014 
varnish-2.1.3/debian/patches/fix-race-test-v00014
--- varnish-2.1.3/debian/patches/fix-race-test-v00014   1970-01-01 
02:00:00.000000000 +0200
+++ varnish-2.1.3/debian/patches/fix-race-test-v00014   2010-09-08 
01:35:22.000000000 +0300
@@ -0,0 +1,85 @@
+Author: Faidon Liambotis <parav...@debian.org>
+Bug-Debian: http://bugs.debian.org/594777
+Forwarded: no
+Last-Update: 2010-09-07
+
+The test case in question is testing the backend polling functionality of
+varnish. It does so by spawning an extra server{} that is intended to be
+polled by the probe instead of by a client{}.
+
+Unfortunately, the way this is implemented, a race exists in which the client
+thread runs after the server thread has run but before the probe had the time
+to actually poll the backend and verify its healthiness.
+
+This resulted in a 100% reproducible test case failure but under very certain
+conditions. For example, in a particular machine, it succeded with no
+verbosity, failed with verbosity 1 (-v) and succeeded with verbosity > 1 (-vv).
+
+SVN history shows that races have been a problem forever and various solutions
+have been attempted in the past. This patch tries to solve the problem once and
+for all along with properly documenting how the test case works so that it
+isn't easily broken again in the future. As a side goal, it tries to fix the
+case to run faster by using multiples of 0.1s instead of 1s.
+
+--- a/bin/varnishtest/tests/v00014.vtc
++++ b/bin/varnishtest/tests/v00014.vtc
+@@ -2,6 +2,7 @@
+ 
+ test "Check req.backend.healthy"
+ 
++# probe will hit this to check healthiness
+ server s1 {
+       rxreq
+       expect req.url == "/"
+@@ -16,8 +17,8 @@ varnish v1 -vcl {
+               .max_connections = 1;
+               .probe = {
+                       .url = "/";
+-                      .timeout = 1s;
+-                      .interval = 1s;
++                      .timeout = 0.1s;
++                      .interval = 0.1s;
+                       .window = 3;
+                       .threshold = 2;
+                       .initial = 0;
+@@ -33,25 +34,33 @@ varnish v1 -vcl {
+       }
+ } -start
+ 
++# probe runs every 0.1s and needs 2 (threshold) out of the latest 3 (window) 
to
++# be successful. Make sure this does *not* happen: the first probe would be
++# successful (s1 was started and finished), but let the next one or two to 
fail
++# (and hence making the backend unhealthy) by idling.
++server s1 -wait
++delay 0.3
++
++# verify that backend is unhealthy as expected
+ client c1 {
+       txreq
+       rxresp
+       expect resp.status == 500
+ } -run
+ 
+-server s1 {
+-      rxreq
+-      expect req.url == "/"
+-      txresp -body "slash"
+-} -start
++# now let three consecutive probe checks to succeed by starting them and
++# waiting for them to finish. This will lead varnish to believe that the
++# backend is healthly. (note that s1 ports are reused)
++server s1 -repeat 3 -start -wait
+ 
+-server s1 {
++# do an actual check with a healthy backend, expecting to get a 200 back
++server s2 {
+       rxreq
+       expect req.url == "/foo"
+       txresp -body "foobar"
+ } -start
+ 
+-client c1 {
++client c2 {
+       txreq -url "/foo"
+       rxresp
+       expect resp.status == 200
diff -Nru varnish-2.1.3/debian/patches/series 
varnish-2.1.3/debian/patches/series
--- varnish-2.1.3/debian/patches/series 2010-08-28 01:44:36.000000000 +0300
+++ varnish-2.1.3/debian/patches/series 2010-09-08 01:50:32.000000000 +0300
@@ -1 +1,2 @@
-debian-changes-2.1.3-6
+fix-changelog-typo
+fix-race-test-v00014
Author: Faidon Liambotis <parav...@debian.org>
Bug-Debian: http://bugs.debian.org/594777
Forwarded: no
Last-Update: 2010-09-07

The test case in question is testing the backend polling functionality of
varnish. It does so by spawning an extra server{} that is intended to be
polled by the probe instead of by a client{}.

Unfortunately, the way this is implemented, a race exists in which the client
thread runs after the server thread has run but before the probe had the time
to actually poll the backend and verify its healthiness.

This resulted in a 100% reproducible test case failure but under very certain
conditions. For example, in a particular machine, it succeded with no
verbosity, failed with verbosity 1 (-v) and succeeded with verbosity > 1 (-vv).

SVN history shows that races have been a problem forever and various solutions
have been attempted in the past. This patch tries to solve the problem once and
for all along with properly documenting how the test case works so that it
isn't easily broken again in the future. As a side goal, it tries to fix the
case to run faster by using multiples of 0.1s instead of 1s.

--- a/bin/varnishtest/tests/v00014.vtc
+++ b/bin/varnishtest/tests/v00014.vtc
@@ -2,6 +2,7 @@
 
 test "Check req.backend.healthy"
 
+# probe will hit this to check healthiness
 server s1 {
 	rxreq
 	expect req.url == "/"
@@ -16,8 +17,8 @@ varnish v1 -vcl {
 		.max_connections = 1;
 		.probe = {
 			.url = "/";
-			.timeout = 1s;
-			.interval = 1s;
+			.timeout = 0.1s;
+			.interval = 0.1s;
 			.window = 3;
 			.threshold = 2;
 			.initial = 0;
@@ -33,25 +34,33 @@ varnish v1 -vcl {
 	}
 } -start
 
+# probe runs every 0.1s and needs 2 (threshold) out of the latest 3 (window) to
+# be successful. Make sure this does *not* happen: the first probe would be
+# successful (s1 was started and finished), but let the next one or two to fail
+# (and hence making the backend unhealthy) by idling.
+server s1 -wait
+delay 0.3
+
+# verify that backend is unhealthy as expected
 client c1 {
 	txreq
 	rxresp
 	expect resp.status == 500
 } -run
 
-server s1 {
-	rxreq
-	expect req.url == "/"
-	txresp -body "slash"
-} -start
+# now let three consecutive probe checks to succeed by starting them and
+# waiting for them to finish. This will lead varnish to believe that the
+# backend is healthly. (note that s1 ports are reused)
+server s1 -repeat 3 -start -wait
 
-server s1 {
+# do an actual check with a healthy backend, expecting to get a 200 back
+server s2 {
 	rxreq
 	expect req.url == "/foo"
 	txresp -body "foobar"
 } -start
 
-client c1 {
+client c2 {
 	txreq -url "/foo"
 	rxresp
 	expect resp.status == 200

Reply via email to