Guy Harris <[EMAIL PROTECTED]> - Wed, Oct 08, 2003:

> One problem with "tvb_find_guint8()", at least as you're using it, is
> that it assumes that lines end with CR-LF. Perhaps they *should*, but
> that doesn't mean that they necessarily *will*.
> "tvb_find_line_end()" doesn't care whether the line ends with CR, LF,
> CR-LF, or LF-CR.
>
> How would "tvb_find_line_end()" have more problems with malformed
> headers than "tvb_find_guint8()"?
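To illustrate the point about line terminators, here is a minimal stand-alone sketch of a scanner that accepts CR, LF, CR-LF, or LF-CR. It works on a plain byte buffer rather than a tvbuff, and "find_line_end()" is a hypothetical name chosen for this example; it is not Ethereal's "tvb_find_line_end()" and its return conventions are an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical stand-alone helper, NOT Ethereal's tvb_find_line_end().
 * Scans buf[offset..len) for the end of a line, accepting CR, LF,
 * CR-LF or LF-CR as the terminator.
 *
 * Returns the length of the line without its terminator and stores the
 * offset of the following line in *next_offset.  If no terminator is
 * found, returns -1 and leaves *next_offset untouched, which a caller
 * could take as "more data is needed".
 */
long find_line_end(const unsigned char *buf, size_t len, size_t offset,
                   size_t *next_offset)
{
    size_t i;

    for (i = offset; i < len; i++) {
        if (buf[i] == '\r' || buf[i] == '\n') {
            size_t eol = i + 1;

            /* Swallow the second byte of a CR-LF or LF-CR pair.
             * Two identical bytes (LF LF) are two separate lines. */
            if (eol < len && buf[eol] != buf[i] &&
                (buf[eol] == '\r' || buf[eol] == '\n'))
                eol++;
            *next_offset = eol;
            return (long)(i - offset);
        }
    }
    return -1;
}
```

Calling it repeatedly with "offset" set to the previous "next_offset" walks the header lines one by one; a returned length of 0 marks the blank line that ends the headers, whichever terminator the sender used.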
Yes, I've spotted that point too, and that's why I switched to
"tvb_find_guint8()": to be sure to match exactly the bytes at the end
of the headers. Now I see how "tvb_find_line_end()" can be useful; I
had not understood the real meaning of the "next_offset" it returns.
I should rework my code to use "tvb_find_line_end()" again, sorry.

> An alternative would be to have a state variable for the conversation,
> indicating whether we're processing the request/reply line, the
> headers, or the body, along with another state variable giving the
> content length, and just do enough reassembly to reassemble a single
> header line.

I don't see what you mean, could you (or possibly someone else :) point
me to some code doing that in Ethereal?

> >Possibly fourth, I read in RFC2616 the Content-Length isn't always
> > present, but should be for backward compatibility with HTTP 1.0.
>
> If you're referring to section 4.4 "Message Length", then, if
> Content-Length is missing, either
>
> 1) the message is one that's not allowed to have a message-body, in
> which case Ethereal shouldn't even try to reassemble the message body;
>
> 2) the message has a Transfer-Encoding field other than
> Transfer-Encoding: identity, in which case Ethereal would have to
> handle chunked encoding, which is probably something that would be
> worth doing eventually, but it's probably not something that needs to
> be done now;

I did some additional captures, and it seems "chunked" is quite common,
whereas gzip/deflate/compress/whatever never shows up (although I send
"Accept-Encoding: gzip,deflate").

> Ethereal should, as noted, not even try to process a message-body for
> response messages that "MUST NOT" include a message body (although to
> tell whether something is a response to a HEAD request we'd have to see
> the request, so that might be difficult to handle...). In the case of
> non-identity transfer encodings, or multipart/byteranges, it should
> probably not reassemble traffic *or* hand it to a subdissector (as it's
> not raw data).
> Otherwise, it could probably assume that the transfer
> finishes when the connection is closed, although there's *currently* no
> way for TCP to send a "connection closed" indication to the
> subdissector.

I suggest I rework the desegmentation to use "tvb_find_line_end()", and
maybe add chunked Transfer-Encoding.

Some thoughts I have that you or the audience could possibly answer:
- has anyone heard about the difference between the "TE" header and
  the "Transfer-Encoding" header?
- does anyone have some captures of gzipped/deflated HTTP conversations?

There are a lot of possible enhancements for the packet-http.c routines.
I think Ethereal should provide common lzw-compression functions or use
a third-party lib to deal with all the compressed data that we can come
across; for example, in HTTP each "part" of a message (when parsing a
multipart message) could be decompressed and passed to an appropriate
dissector.

-- 
Loïc Minier <[EMAIL PROTECTED]>
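
Since chunked Transfer-Encoding comes up above, here is a rough stand-alone sketch of decoding a fully captured chunked body as laid out in RFC 2616 section 3.6.1 (hex chunk-size line, chunk data, CR-LF, ending with a zero-size chunk). It operates on a plain buffer, not on tvbuffs; "dechunk()" and "hex_value()" are hypothetical names for this example and are not existing Ethereal functions. Trailer headers after the last chunk are simply ignored.

```c
#include <stddef.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: decodes a complete chunked body from in[0..in_len)
 * into out[0..out_cap).  Returns the decoded length, or -1 if the input
 * is malformed, incomplete, or does not fit into the output buffer.
 */
long dechunk(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, out_len = 0;

    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* Chunk-size line: hex digits, then an optional ";extension". */
        while (pos < in_len && (v = hex_value(in[pos])) >= 0) {
            if (++digits > 8)
                return -1;          /* refuse absurdly large sizes */
            size = size * 16 + (size_t)v;
            pos++;
        }
        if (digits == 0)
            return -1;

        /* Skip extensions and the CR; the line ends at the LF. */
        while (pos < in_len && in[pos] != '\n')
            pos++;
        if (pos == in_len)
            return -1;              /* size line not terminated */
        pos++;

        if (size == 0)
            return (long)out_len;   /* last-chunk; trailers ignored */

        if (size > in_len - pos || size > out_cap - out_len)
            return -1;
        memcpy(out + out_len, in + pos, size);
        out_len += size;
        pos += size;

        /* The data of each chunk is followed by CR-LF. */
        if (in_len - pos < 2 || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

A dissector would more likely process chunks incrementally as segments arrive; the -1 return for truncated input is where such code would ask for more data instead of giving up.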
