On Octidi 8 Floréal, year CCXXIII, Vincent Lefevre wrote:
> The CPU time is OK.

Good to know.

> I don't understand the point. Accumulating in strings (which involves
> copies and possible reallocations) and doing a split is much slower
> than reading lines one by one and treating them separately.

First: not necessarily, because once the header is loaded in a string, you
can apply regexps to the whole header at once instead of using a loop. This
may prove faster.
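
As a rough illustration of that point (sketched in Python rather than Perl,
with a made-up sample header and a hypothetical helper name parse_fields),
a single multi-line regexp can pull every field out of the header block in
one call:

```python
import re

# Hypothetical sample header, already read into a single string.
header = (
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.org>\n"
    "Subject: Profiling\n"
    "Message-ID: <1234@example.org>\n"
)

# With re.MULTILINE, ^ and $ match at every line boundary, so one
# findall() call scans the whole block without a per-line read loop.
FIELD_RE = re.compile(r"^([A-Za-z0-9-]+):[ \t]*(.*)$", re.MULTILINE)

def parse_fields(block):
    """Return a dict mapping lower-cased field names to their values."""
    return {name.lower(): value for name, value in FIELD_RE.findall(block)}

fields = parse_fields(header)
```

Whether this actually beats a line-by-line loop depends on the data and the
regexp engine, so it is something to measure, not to assume.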

Second: it is not slower, it is more CPU-intensive, and your program seems
to be I/O-bound. You stated it above: "CPU time is OK".

The gist of it is the usual saying: "profile, don't speculate". You had a
particular issue that made your program immensely slower. Now that this
problem is resolved and your program run-time is acceptable, you may want to
trade a bit of CPU consumption for simplicity: having the whole header in a
string makes a lot of things easier and/or more robust, especially
everything that has to do with folded headers. And remember you already
traded A LOT of CPU for simplicity: you are using Perl, not assembly.
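
To make the folded-header point concrete, here is a minimal sketch (Python
for brevity; the function name unfold and the sample text are made up for
the example). With the whole header in one string, unfolding is a single
substitution: drop every line break that is followed by a space or a tab.

```python
import re

def unfold(block):
    """Join continuation lines (starting with space or tab) to the previous line."""
    # Remove the line break itself and keep the leading whitespace of the
    # continuation line, so the words stay separated.
    return re.sub(r"\r?\n(?=[ \t])", "", block)

folded = (
    "Subject: a rather long subject\n"
    " that continues on a second line\n"
    "To: bob@example.org\n"
)
unfolded = unfold(folded)
```

Doing the same while reading line by line requires keeping track of the
previous line, which is exactly the kind of bookkeeping that tends to go
wrong.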

(

As an amusing note, I had an issue that was similar in essence some sixteen
years ago.

It was an NNTP server written in OCaml. To avoid copying data around, and
also because strings in OCaml were shameful for a functional language (and
still are, the latest release states "In a FIRST STEP towards making strings
immutable, a type BYTES of mutable byte arrays and a supporting library
module Bytes were introduced." (emphasis is mine)), I used a list of strings
for the output buffer, not a single string.

It was a terrible idea. For starters, the cost of the extra syscalls would
completely outweigh the little bit of CPU saved by avoiding copying and
reallocation; but that is negligible. What is not negligible is that when
making a three-line reply, the server would make a short write for the first
line, then another one for the second line and... Nagle.

More clearly: the first short write is sent immediately, but the second one
is delayed until either the first one is ACKed or the outgoing buffer holds
enough data to fill a maximum-size segment. On the other side, the client
was delaying its ACK a little, on the chance that it could be bundled with
the ACK for the next packets or with the client's next request; neither of
those was going to happen until the rest of the reply was sent.
http://en.wikipedia.org/wiki/Nagle%27s_algorithm
http://en.wikipedia.org/wiki/TCP_delayed_acknowledgment

I identified the problem by comparing the tcpdump output with that of a
server that was fast as expected, and I could fix it: it only required
concatenating the output buffer into a single string before writing it. It
was only a few years later, when I read Stevens, that I actually understood
why it made that much of a difference.
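
The fix amounts to something like the following sketch (in Python, not the
original OCaml; the helper name send_reply and the reply lines are invented
for the example, and a local socket pair stands in for the TCP connection):

```python
import socket

def send_reply(sock, lines):
    """Send all reply lines with one sendall() call instead of one per line."""
    # Build the complete reply first, so the kernel sees one large write
    # and not a series of short ones.
    payload = b"".join(line + b"\r\n" for line in lines)
    sock.sendall(payload)
    return len(payload)

# Demonstration over a local socket pair (no network needed).
server, client = socket.socketpair()
sent = send_reply(server, [b"first line", b"second line", b"third line"])

received = b""
while len(received) < sent:
    received += client.recv(4096)

server.close()
client.close()
```

The copy made by the join is the small CPU cost; avoiding the interaction
between short writes and delayed ACKs is the benefit.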

Of course, for high-performance servers, Unix kernels have the writev()
syscall to write several buffers with a single syscall.
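
For illustration, Python exposes that syscall on Unix as os.writev(). The
sketch below is only a demonstration under simplifying assumptions: the
wrapper name write_all_buffers is made up, a pipe stands in for the socket,
and short writes are not handled.

```python
import os

def write_all_buffers(fd, buffers):
    """Write a list of bytes objects with a single writev() call.

    Returns the number of bytes written. A real server would have to
    handle a short write by retrying with the remaining data.
    """
    return os.writev(fd, buffers)

# Demonstration on a pipe, so it runs without any network setup.
read_fd, write_fd = os.pipe()
written = write_all_buffers(
    write_fd, [b"line one\r\n", b"line two\r\n", b"line three\r\n"]
)
os.close(write_fd)
data = os.read(read_fd, 4096)
os.close(read_fd)
```

The buffers are never concatenated in user space, yet the kernel receives
them as a single write.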

Personally, I consider Nagle's algorithm to be a bad design decision. The
kernel has buffering for TCP sockets anyway, even with TCP_NODELAY. Some
kind of MSG_FLUSH flag to send() would have been better than an implicit
flush at the end of each write.
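
For reference, this is how TCP_NODELAY is set on a socket, sketched in
Python. It only disables Nagle's algorithm; it is not the explicit flush
flag wished for above, and MSG_FLUSH remains a hypothetical name.

```python
import socket

# Create a TCP socket and disable Nagle's algorithm on it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm that it is set (non-zero means enabled).
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```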

)

Regards,

-- 
  Nicolas George
