On Wed, Jun 21, 2017 at 1:08 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote: > > But there were only two questionable values for \0, and in this case the > answer is obvious. Invert the rule to a TOKEN char from the rather dubious > TOKEN_STOP definition. Solved.
... for trunk, IMO. I don't suggest we touch this code again on 2.4 branch. For trunk, when I encounter the concept of a STOP I envision there is a set of defined delimiters (e.g. punctuation/ctrls) which serve as terminators. What HTTP defined was the set of token characters in the bnf. T_HTTP_NOT_TOKEN would have been a better name for the existing rule. T_HTTP_TOKEN is simply a better understood rule. On to whether element [NUL] tests aught to be trustable or unused; My research (unsure if this was shared earlier) starts with tests of isfn('\0'); isalnum false isalpha false iscntrl true isdigit false isgraph false islower false isprint false ispunct false isspace false isupper false isxdigit false isascii true isblank false In each case above, the isfn('\0') results are plainly obvious, right down to iscntrl(NUL)'s and isascii(NUL)'s result. Our resulting test_char table values include element [NUL] as well, and it needs to be sensible for the many purposes... Our declarations; T_ESCAPE_SHELL_CMD false (correct, \0 is EOS, not a char) T_ESCAPE_PATH_SEGMENT false (correct, \0 is EOS, not a char) T_OS_ESCAPE_PATH false (correct, \0 is EOS, not a char) T_HTTP_TOKEN_STOP true (correct, \0 is a non-token byte) T_ESCAPE_LOGITEM false (correct, \0 is EOS not a char) T_ESCAPE_FORENSIC true (interesting/debatable) T_ESCAPE_URLENCODED false (correct, \0 is EOS not a char) T_HTTP_CTRLS true (correct, \0 is NUL, non VCHAR) T_VCHAR_OBSTEXT false (correct, \0 is NUL < 0x80) In the _ESCAPE's we are escaping a string, and NUL's terminate a string and are not escaped. Any NUL byte traveling the wire needs to be evaluated by the appropriate TOKEN, CTRLS or OBSTEXT rule by HTTP RFC. The T_ESCAPE_FORENSIC case inverts the general rule for command lines and paths we are constructing, but as a symbol over the wire, I can agree with the idea of logging an escaped \0 in the case of a NUL input byte. This differs from logging in that our logging API has no counted-string semantic (if it did, I guess we would encrypt \0's.)