On 29 Feb 2012, at 12:21 AM, William A. Rowe Jr. wrote:

> After two months, firehose still didn't obtain another +1, so the vote to
> incorporate firehose into trunk stands at 3 +1's, 1 -1, and therefore
> failed the vote for inclusion in trunk.

I count 4 +1s on the dev@httpd list: minfrin, issac, sctemme and jim, plus igalic on the pmc list. poirier and niq expressed support, but didn't vote. By my reckoning, it passes.

> There are 4 +1's for a firehose subproject at httpd. If you wish to continue
> this effort you can keep this simple by svn mv'ing those respective files
> across from httpd/httpd/trunk to httpd/mod_firehose/trunk, where it can
> build a larger community within httpd project, and things like the data
> representation format can be evaluated and enhanced by the community.

So far, you are the only one who has questioned the data representation format, claiming that wireshark/pcap could be used instead.

When I invented the firehose I wanted an ASCII-based protocol, so that I could look at a firehose being recorded and see the boundaries between the buckets by inspection. When you're debugging httpd or an application server behind httpd, the bucket boundaries are important. The protocol used is simply chunked encoding, with additional fields behind the chunk size that allow you to tie the buckets back together. It is very clear by inspection which bits are firehose and which part is the bucket, on purpose.

In addition, there was a clear need to be able to capture both individual requests and whole connection streams, and to be able to choose between them. With the individual requests we were able to isolate some tricky protocol edge cases in mod_cache, and with the connections we were able to identify issues with HTTP/1.0 clients and keepalive in a highly loaded service-oriented environment. A further crucial piece of information is dropped buckets, which are either a sign of congestion over the pipe to which these buckets are written, or a sign that the child process crashed unexpectedly.

When you brought up pcap as a possibility, I went and did a whole bunch of research into it.
What I discovered is that pcap itself gives you a binary encapsulation format, meaning that you lose the ability to see bucket boundaries, and the ability to debug by inspection. Then I looked for existing client support that might help, and discovered there is none: clients expect the pcap packets to contain clearly identified packets of the kind you would expect to find at layer 2, not the free-form binary that you can expect to find in an HTTP body. I considered hacking up fake TCP packets, but then ran into the problem that I now needed to care whether they were IPv4 or IPv6 packets, and trying to hack up fake packets when you have captured individual requests on a long-lived connection is a non-trivial exercise. At the end of the day you want confidence in your debugging data, and a hacked-together reconstruction of TCP is the exact opposite of that.

The HTTP protocol dump is called a "firehose" for a reason - at one point we were recording and analysing hundreds of gigabytes of request data, and firehose had to decode and process that in situ on a production box that had been taken out of the pool, without the installation of any scripting language stack (which would have been too slow anyway) and without moving the data around. Both mod_firehose and firehose are battle-tested in that real-world scenario.

I believe the pcap-based solution you proposed is inferior to what we have now, and I believe there is no compelling reason at this point to attempt any alternative solution.

Regards,
Graham
--
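
For anyone who wants to picture the framing described above (chunked encoding with extra fields behind the chunk size), here is a rough Python sketch. It is an illustration only and not the actual mod_firehose wire format: the field names and their order (`direction`, `conn_id`, `seq`), the space separators, and the use of a sequence number to spot dropped buckets are all assumptions made for the example.

```python
import io


def write_fragment(out, payload, conn_id, direction, seq):
    """Write one bucket as a chunk: hex size plus extra ASCII fields, then data.

    Hypothetical header layout: "<hex size> <direction> <conn_id> <hex seq>\r\n"
    """
    header = f"{len(payload):x} {direction} {conn_id} {seq:x}\r\n"
    out.write(header.encode("ascii") + payload + b"\r\n")


def read_fragments(stream):
    """Yield (direction, conn_id, seq, payload) tuples from a recorded stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of the recording
        fields = line.rstrip(b"\r\n").decode("ascii").split(" ")
        size_hex, direction, conn_id, seq_hex = fields
        size = int(size_hex, 16)
        payload = stream.read(size)  # read by length, so binary bodies are safe
        if len(payload) != size or stream.read(2) != b"\r\n":
            raise ValueError("truncated or malformed fragment")
        yield direction, conn_id, int(seq_hex, 16), payload


def find_gaps(fragments):
    """Return ((conn_id, direction), expected_seq, seen_seq) for each gap."""
    expected = {}
    gaps = []
    for direction, conn_id, seq, _payload in fragments:
        key = (conn_id, direction)
        want = expected.get(key, 0)
        if seq != want:
            gaps.append((key, want, seq))  # buckets were dropped here
        expected[key] = seq + 1
    return gaps
```

Because the header is a single ASCII line and the payload follows untouched, the boundary between framing and bucket stays visible when you simply page through the recording, which is the property a binary encapsulation would lose.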