On Wed, 3 Nov 2010, Heinbockel, Bill wrote:
For things like the RFC5424 structured data, the name in name=value
pairs cannot contain characters such as NUL.
In my opinion, what to do if you receive a Syslog message with a NUL in
the name is the bigger issue.
Fortunately not. In RFC5424, full UTF-8 is permitted in values. However, the
name alphabet is much more restricted. Off the top of my head, I think it is
US-ASCII minus the control character set. During syslog standardization, we did not
see a need to support a larger alphabet (though I admit it is debatable whether
national characters should be supported in names; after a long discussion we
said "no").
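For reference, the ABNF in section 6 of RFC 5424 defines SD-NAME as 1 to 32 characters of PRINTUSASCII (%d33-126), excluding '=', space, ']' and '"'. That is close to the recollection above, with a length limit and a few extra exclusions. A minimal Python check, where `is_valid_sd_name` is a hypothetical helper name:

```python
# RFC 5424, section 6:
#   SD-NAME      = 1*32PRINTUSASCII ; except '=', SP, ']', %d34 (")
#   PRINTUSASCII = %d33-126
FORBIDDEN = {ord("="), ord(" "), ord("]"), ord('"')}


def is_valid_sd_name(name: bytes) -> bool:
    """Return True if `name` is a syntactically valid RFC 5424 SD-NAME."""
    if not 1 <= len(name) <= 32:
        return False
    # Every byte must be printable US-ASCII and not one of the excluded characters
    return all(33 <= b <= 126 and b not in FORBIDDEN for b in name)
```

Anything outside that range (NUL, other control characters, or any byte of a multi-byte UTF-8 sequence) makes the name invalid.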
While it is against the specification, how do you handle the case where an invalid
character is included in the name? This is where most programs run into trouble;
they assume that the message will conform to the specification. What to do when an
illegal character is encountered should be standardized as well...
(though this applies more generally to "what do you do if you receive an
invalid name in structured data")
Off the top of my head, you have only one of four choices:
(1) strip the offending characters
(2) strip the whole name=value pair
(3) drop the entire message
(4) change the offending characters in the message
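A rough sketch of how a receiver might implement the four choices listed above as a configurable policy. `Policy`, `DropMessage`, `is_name_byte` and `apply_policy` are hypothetical names, not part of any syslog library, and the sketch checks only the character set, not the 32-character length limit:

```python
from enum import Enum


class Policy(Enum):
    STRIP_CHARS = 1    # (1) strip the offending characters
    DROP_PARAM = 2     # (2) strip the whole name=value pair
    DROP_MESSAGE = 3   # (3) drop the entire message
    REPLACE_CHARS = 4  # (4) change the offending characters


class DropMessage(Exception):
    """Raised when the policy says the whole message must be discarded."""


def is_name_byte(b: int) -> bool:
    # Printable US-ASCII minus '=', ']' and '"' (space is already outside 33..126)
    return 33 <= b <= 126 and b not in b'=]"'


def apply_policy(name: bytes, policy: Policy, replacement: bytes = b"X"):
    """Return the cleaned name, None to drop the pair, or raise DropMessage."""
    if all(is_name_byte(b) for b in name):
        return name  # valid names pass through untouched
    if policy is Policy.STRIP_CHARS:
        return bytes(b for b in name if is_name_byte(b))
    if policy is Policy.DROP_PARAM:
        return None
    if policy is Policy.DROP_MESSAGE:
        raise DropMessage(name)
    # REPLACE_CHARS: substitute each bad byte with the replacement marker
    return b"".join(bytes([b]) if is_name_byte(b) else replacement for b in name)
```

Note that stripping characters can leave an empty name, which a real implementation would also have to treat as invalid.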
I've seen some software that just replaces all invalid characters with 'X'.
rsyslog has a couple of ways to escape such characters (replacing them with
#xxx, for example).
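One way to picture the #xxx style of escaping: replace each control byte with '#' followed by its three-digit octal code. The octal format is an assumption based on how rsyslog's control-character escaping is commonly described (for example, LF becoming #012); check the rsyslog documentation for the exact behavior. `escape_control` is a hypothetical helper:

```python
def escape_control(data: bytes) -> bytes:
    """Replace each control byte (0-31 and 127) with '#' plus its 3-digit octal code."""
    out = bytearray()
    for b in data:
        if b < 32 or b == 127:
            out += b"#%03o" % b  # e.g. NUL -> b"#000", LF -> b"#012"
        else:
            out.append(b)
    return bytes(out)
```

Unlike a blanket 'X', the escape records which byte was there, so someone reading the log can still tell what the garbage was. As written, a literal '#' in the input is not escaped, so the transformation is not fully reversible.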
Dropping the message, dropping the element, or dropping the characters all
lose significant data.
Replacing the character with something else at least lets someone looking
at it figure out that there was some garbage in the message (and if it's
an ongoing issue, they can sniff the network to see the raw data if nothing
else).
Going back to the discussion on UTF data that we had several months ago,
most of the time you can treat the data as an opaque octet stream, but when you
parse or display the data you need to worry about this sort of
thing.
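A minimal sketch of that split, assuming the message is kept as raw bytes for storage and forwarding and only decoded at display time. Invalid UTF-8 becomes visible \xNN escapes, and control characters (including NUL) are shown as escape sequences instead of being passed to a terminal. `for_display` is a hypothetical name:

```python
def for_display(raw: bytes) -> str:
    """Render raw syslog octets for humans without losing or hiding bad bytes."""
    # Invalid UTF-8 sequences become literal \xNN text instead of raising an error
    text = raw.decode("utf-8", errors="backslashreplace")
    # Non-printable characters (NUL, TAB, LF, ...) are shown via their repr escape
    return "".join(c if c.isprintable() else repr(c)[1:-1] for c in text)
```

The raw bytes themselves are never modified, so they can still be forwarded or inspected byte for byte.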
When parsing a message, I think it's probably sanest to consider a NUL to
be an end-of-message character in the datastream (unless the user
explicitly configures the receiving library to allow NULs and escape
them).
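A sketch of that receive-side rule for a byte stream, assuming LF-delimited framing as commonly used for plain TCP syslog. By default a NUL ends the current message; the hypothetical `allow_nul` option keeps NULs but escapes them as #000 so no raw NUL reaches downstream code. `split_messages` is likewise a hypothetical name:

```python
import re
from typing import List


def split_messages(stream: bytes, allow_nul: bool = False) -> List[bytes]:
    """Split a received octet stream into messages.

    Assumes LF-delimited framing. By default a NUL also ends the current
    message; with allow_nul=True, NULs stay inside the message but are
    escaped as b"#000".
    """
    if allow_nul:
        stream = stream.replace(b"\x00", b"#000")
        delimiters = rb"\n"
    else:
        delimiters = rb"[\n\x00]"
    # Drop empty pieces produced by adjacent or trailing delimiters
    return [m for m in re.split(delimiters, stream) if m]
```

Treating NUL as a delimiter rather than discarding everything after it means the bytes that follow still arrive as a (suspicious) message of their own instead of silently vanishing.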
NUL is a special case since it is used to mark the end of a string
in C and, by extension, in many languages and tools built on top of C. As a
result, the probability of it causing problems in other tools is
_extremely_ high, much higher than for any escape sequence or other control
character. Everything downstream should therefore be protected UNLESS
the user explicitly says to pass it through (and even then I would want
the documentation to strongly recommend against doing so).
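The truncation is easy to demonstrate without writing C. This sketch uses Python's standard ctypes module as a stand-in for a C consumer: `create_string_buffer` copies the bytes into a C char array, whose `.raw` attribute exposes every byte while `.value` reads it as a NUL-terminated C string:

```python
import ctypes

# A message with an embedded NUL, as it might arrive off the wire
raw = b"app: login ok\x00 extra data"

# Copy it into a C char array (ctypes appends one terminating NUL)
buf = ctypes.create_string_buffer(raw)

# .raw returns every byte in the array, including the appended terminator
full = buf.raw

# .value reads the array as a C string and stops at the first NUL
c_view = buf.value

print(c_view)  # b'app: login ok'
```

Any C-based tool downstream sees only `c_view`; the rest of the message is silently gone.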
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com