Ok, so this is probably as old as Jenkins itself, and probably ready to become (if it hasn't already) its business card: what's that Jenkins thing? Ah, it does some CI and leaves a bunch of servers hanging around. And the biggest problem is not that the issue described here happens; it's what is shown by this log:

01:53:56.709 Archiving artifacts
12:02:21.094 ERROR: Failed to archive artifacts: *.sum
12:02:21.096 hudson.util.IOException2: hudson.util.IOException2: Failed to extract /home/buildslave/workspace/cbuild/transfer of 1 files
12:02:21.096 Caused by: java.io.IOException
12:02:21.096 at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)

So, it hung the build server for 10 hours. As usually happens, more jobs accumulate for this build slave and other builds wait for this build to complete, so the entire CI system comes to a halt, followed by kaboom. And if nobody's looking, Jenkins will happily and shamelessly lock up your servers for days.

So, why does this happen? Surely there's no single reason, but rather a carefully selected assortment of toxic stuff:

1. Bugs in the JVM/Java: if only because there are comments above saying that switching to another JDK version/provider seems to alleviate it.
2. Bugs in Jenkins: if only because every new release brings only security fixes, as if the previous version were something like 0.0.1, so you can imagine there's a lot left to fix.
3. But most importantly, that's the way Jenkins is written. To illustrate that, let's look at the linked JENKINS-11586, with a comment straight from Kohsuke: "The ping currently is supposed to wait for 4 minutes before it gives up and kills the channel. But I have a hard time believing that the channel did really clog for 4 minutes." Here we go. The network over which that channel goes is UNRELIABLE. Everything you find hard to believe about it is actually true. It may fail to deliver stuff, it may clog for 4 or 40 minutes, and the RIAA may knock on your door telling you that you downloaded forbidden torrents, which you never did. Or take another example, straight from FastPipedInputStream.java as quoted in the stacktrace above. In its header, it confesses to be a java.io.PipedInputStream equivalent which just "uses proper synchronization" and "doesn't rely on polling". A seasoned engineer would recognize the smartness and foresight of the Java engineers who did not trust Java to do synchronization and instead used the strings-and-sticks method of polling. Then some smart kids came along who thought they could do better, and now their code locks up servers throughout the world.
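Kohsuke's comment describes a ping that waits a fixed time and then gives up and kills the channel. Below is a minimal sketch of that idea. It assumes a hypothetical `isAlive` helper built on `CompletableFuture`, and it is not the actual Jenkins remoting ping code. A reply that does not arrive within the timeout counts as a dead channel. That way a clogged network produces a prompt, explicit failure instead of an assumption that it "can't really clog".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class PingSketch {
    /**
     * Hypothetical liveness probe (not Jenkins code): runs the ping asynchronously
     * and reports the channel dead if no reply arrives within the timeout.
     */
    public static boolean isAlive(Supplier<Boolean> ping, long timeout, TimeUnit unit) {
        CompletableFuture<Boolean> reply = CompletableFuture.supplyAsync(ping);
        try {
            return reply.get(timeout, unit);
        } catch (TimeoutException e) {
            // Stop waiting. Note: this does not interrupt the ping task itself;
            // the caller should treat the channel as dead and close it.
            reply.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // the ping itself threw
        }
    }

    public static void main(String[] args) {
        // A ping that answers immediately counts as alive.
        System.out.println(isAlive(() -> true, 1, TimeUnit.SECONDS));
        // A ping stuck on a "clogged" channel (simulated by sleeping) counts as dead.
        System.out.println(isAlive(() -> {
            try {
                Thread.sleep(2_000);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            return true;
        }, 200, TimeUnit.MILLISECONDS));
    }
}
```

The design choice is that silence past the deadline is treated as failure. It is never treated as "probably fine, keep waiting".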

Ok, enough of an intro to the problem area. Let's look straight at FastPipedInputStream.java:175, which was caught red-handed in the stacktrace above: https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/FastPipedInputStream.java#L163
And what we see immediately is an infinite loop. See for yourself. It blocks in wait() on the buffer for 10s. Then it does some liveness checks (which, as we have already learned, will regularly fail to detect any issues). Then it checks for updates from outside, and if nothing has happened, it goes around again, so it will hang there forever. Here it is, Jenkins coding style.
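For contrast with that loop, here is a minimal sketch of a read that still waits in slices (at most 10 s each) and re-checks its conditions. It differs in also enforcing an overall deadline and throwing once that passes. The class `BoundedPipe` and its `readTimeout` parameter are hypothetical. This is neither the code of FastPipedInputStream nor the contents of the linked patch.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

/** Hypothetical byte pipe (not Jenkins code): read() gives up after a deadline. */
public class BoundedPipe {
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private final long readTimeoutNanos;
    private boolean closed;

    public BoundedPipe(long readTimeout, TimeUnit unit) {
        this.readTimeoutNanos = unit.toNanos(readTimeout);
    }

    public synchronized void write(int b) {
        buffer.addLast(b & 0xff);
        notifyAll(); // wake up a blocked reader
    }

    public synchronized void close() {
        closed = true;
        notifyAll();
    }

    /** Returns the next byte, -1 at end of stream, or throws if the writer goes silent. */
    public synchronized int read() throws IOException {
        final long deadline = System.nanoTime() + readTimeoutNanos;
        while (buffer.isEmpty()) {
            if (closed) {
                return -1; // writer closed the pipe: normal end of stream
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                // Fail fast: report the stall instead of looping forever.
                throw new IOException("pipe read timed out after "
                        + TimeUnit.NANOSECONDS.toMillis(readTimeoutNanos) + " ms");
            }
            try {
                // Wait one slice (never 0, which would mean "forever"), then re-check.
                long sliceMillis = Math.min(TimeUnit.NANOSECONDS.toMillis(remaining) + 1, 10_000);
                wait(sliceMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for data");
            }
        }
        return buffer.removeFirst();
    }

    public static void main(String[] args) throws Exception {
        BoundedPipe pipe = new BoundedPipe(300, TimeUnit.MILLISECONDS);
        pipe.write(42);
        System.out.println(pipe.read()); // data available: returns at once
        try {
            pipe.read(); // nothing written and not closed: times out
        } catch (IOException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
        pipe.close();
        System.out.println(pipe.read()); // closed and empty: end of stream
    }
}
```

The loop structure stays the same: wait, re-check, go around. The only real change is that "nothing happened" eventually becomes an error the caller can act on, rather than another trip around the loop.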

Here's the patch: https://github.com/pfalcon/remoting/commit/239b3dcf26498ff296fabf770dffc7a456b2878c

So, why am I writing all this? Over time, I learned that if people come to me with artifact archiving issues, the best advice I can give them is: AVOID NATIVE JENKINS ARTIFACT ARCHIVING LIKE THE PLAGUE. People listen, and the job whose stack trace is quoted above no longer uses it (the rest of our jobs haven't used it for years). So, while I hacked up the patch above, I don't really have a sandbox to test it with. If you experience this issue, I encourage you to try that patch and share the results. Just to clarify: the patch above is not going to fix this issue (if you want it not to fail, then Java and Jenkins are the wrong technologies). Instead, it makes it fail fast and not waste resources (it also addresses only one infinite loop; I'm sure there are dozens more).

The stacktrace above happened with Jenkins 1.532.1 on a Linux/Ubuntu master and slaves (x86, x64 and ARM slaves affected).

Thanks for listening to the rant, and happy (or sour) Jenkinsing!