On octidi 8 Floréal, year CCXXIII, Vincent Lefevre wrote:
> The CPU time is OK.

Good to know.

> I don't understand the point. Accumulating in strings (which involves
> copies and possible reallocations) and doing a split is much slower
> than reading lines one by one and treating them separately.

First: not necessarily, because once the header is loaded in a string,
you can apply regexps to the whole header at once instead of using a
loop. This may prove faster.

Second: not slower, only more CPU-intensive, and your program seems
IO-bound. You stated it above: "CPU time is OK".

The gist of it is the usual saying: "profile, don't speculate". You had
a particular issue that made your program immensely slower. Now that
this problem is resolved and your program's run time is acceptable, you
may want to trade a bit of CPU consumption for simplicity: having the
whole header in a string makes a lot of things easier and/or more
robust, especially everything that has to do with folded headers.

And remember you already traded A LOT of CPU for simplicity: you are
using Perl, not assembly.

(
As an amusing note, I had an issue that was similar in essence some
sixteen years ago. It was an NNTP server written in OCaml. To avoid
copying data around, and also because strings in OCaml were shameful for
a functional language (and still are: the latest release states "In a
FIRST STEP towards making strings immutable, a type BYTES of mutable
byte arrays and a supporting library module Bytes were introduced."
(emphasis mine)), I used a list of strings for the output buffer, not a
single string.

It was a terrible idea. For starters, the extra syscalls would
completely outweigh the little bit of CPU saved by avoiding copying and
reallocation; but that is negligible. What is not negligible is that
when making a three-line reply, the server would make a short write for
the first line, then a second one for the second line and... Nagle.
More clearly: the first short write is sent, the second one is delayed
until either the first one is ACKed or the outgoing buffer holds enough
data to fill a maximum-size segment. On the other side, the client was
delaying its ACK a little bit on the chance that it could be bundled
with the ACK for the next packets or with the client's next request;
neither of those was going to happen until the rest of the reply was
sent.

http://en.wikipedia.org/wiki/Nagle%27s_algorithm
http://en.wikipedia.org/wiki/TCP_delayed_acknowledgment

I identified the problem by comparing tcpdump output with that of a
normally fast server and could fix it; it only required concatenating
the output buffer into a single string before writing it. It was only a
few years later, when I read Stevens, that I actually understood why it
made that much of a difference.

Of course, for high-performance servers, Unix kernels have the writev()
syscall to write several buffers with a single syscall.

Personally, I consider Nagle's algorithm to be a bad design decision.
The kernel has buffering for TCP sockets anyway, even with TCP_NODELAY.
Some kind of MSG_FLUSH flag to send() would have been better than an
implicit flush at the end of each write.
)

Regards,

-- 
  Nicolas George
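
PS: to make the two fixes mentioned in the aside concrete, here is a
minimal sketch in Python (not the OCaml of the original server). It
assumes a connected stream socket and a reply already split into byte
strings; the function names send_reply_single_write and
send_reply_gather are made up for the illustration. Both variants hand
the whole reply to the kernel in one call, so there is no run of
successive short writes for Nagle's algorithm to hold back.

```python
import os
import socket


def send_reply_single_write(sock, lines):
    """Concatenate the reply lines, then send them with one call.

    This mirrors the "concatenate the output buffer into a single
    string before writing it" fix: one copy, one buffer handed over.
    """
    payload = b"".join(lines)
    sock.sendall(payload)
    return len(payload)


def send_reply_gather(sock, lines):
    """Pass the buffers unjoined and let writev() gather them.

    os.writev() is Unix-only and returns the number of bytes written.
    Like write(), it may write less than requested; this sketch does
    not retry, which is only reasonable for short replies.
    """
    return os.writev(sock.fileno(), lines)
```

With a hypothetical three-line reply such as
[b"one\r\n", b"two\r\n", b"three\r\n"], either function produces the
same bytes on the wire; the difference is only whether the copy happens
in user space (join) or is avoided (writev).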
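
PPS: on the point about folded headers being easier to handle once the
whole header is in one string, a minimal sketch, again in Python rather
than the Perl under discussion. It assumes the header block has already
been read into a single string; the names FOLD, FIELD, unfold and
parse_headers are illustrative only. A folded header is taken to be one
whose continuation lines start with a space or a tab, as in RFC 5322.

```python
import re

# A line break followed by a space or tab marks a folded continuation.
FOLD = re.compile(r"\r?\n(?=[ \t])")

# "Name: value" on one (already unfolded) line; tolerates CRLF endings.
FIELD = re.compile(r"^([^:\s]+):[ \t]*(.*?)\r?$", re.MULTILINE)


def unfold(header_block):
    """Remove the line breaks of folded headers in a single pass."""
    return FOLD.sub("", header_block)


def parse_headers(header_block):
    """Return (name, value) pairs using one regexp over the whole block."""
    return FIELD.findall(unfold(header_block))
```

For instance, parse_headers("Subject: a long\n subject line\nFrom:
someone@example.org\n") would give the Subject value as one string,
"a long subject line", with no per-line state to track. Whether this is
actually faster than a line-by-line loop is exactly the kind of thing to
profile rather than guess.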