On Sat, Dec 02, 2006 at 01:17:28AM -0500, Chris Hanson wrote:
> The reason it doesn't match is that the "R in a circle" character is
> encoded in the log file as using the ISO 8859-1 code 0xae, but this
> isn't a valid first byte of a UTF-8 code.  Consequently, the "."
> pattern doesn't match it.  In fact, I don't think there's _any_ way to
> match this byte sequence in a UTF-8 locale.

I guess [eg]libc's regex functions are a bit strict about their input.
However, grep also comes with its own DFA-based functions, which are
more lax about encoding errors; they are normally skipped for multibyte
encodings, but can be forced with GREP_USE_DFA=1.
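
For anyone who wants to see the encoding problem itself, here is a rough
sketch. It assumes an iconv binary is installed and uses printf's octal
escape \256 to produce the byte 0xae; the variable names are only for
illustration.

```shell
# 0xae is the "R in a circle" in ISO 8859-1; \256 is the same byte in octal.
latin1_hex=$(printf '\256' | od -An -tx1 | tr -d ' \n')

# Converted to UTF-8, the character becomes the two-byte sequence c2 ae.
utf8_hex=$(printf '\256' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# A lone 0xae has the bit pattern of a UTF-8 continuation byte, so a strict
# decoder refuses it as the start of a character.
if printf '\256' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    lone_byte_valid=yes
else
    lone_byte_valid=no
fi

echo "latin1=$latin1_hex utf8=$utf8_hex valid_utf8=$lone_byte_valid"
```

A regex engine that insists on well-formed input has no character to hand
to "." at that position, which would explain the failed match.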

> Unfortunately I'm not sure what to do about this, because it's not
> obvious how the log-file messages relate to the locale.  This message

They don't, at least not reliably.  There's stuff in there, like ssh
usernames, that comes directly from nefarious people who don't give a
rat's ass about your particular selection of encoding.

> One thing that works in this case is to set "LC_ALL=C" prior to
> calling grep.  But if the log files sometimes contain UTF-8 coding,
> this will mess that up

I doubt this would be a problem.  Pretty much everything that is matched
explicitly in any rule (hostname, IP address, process ID) is in ASCII.
Any chunk of arbitrary data should be matched with something like .* or
[^[:space:]]+, which will work whether it was decoded or not.

Now, it's true that POSIX restricts the "C" locale to 7-bit characters,
but both grep and elibc appear to deal with binary characters just fine.
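
As a quick check of the above, something like the following should do.
The log line and the pattern are made up for illustration (they are not
taken from any logcheck rule file); the only point is that the raw 0xae
byte sits inside a chunk matched by [^[:space:]]+.

```shell
# Hypothetical log line with a raw ISO 8859-1 byte 0xae (octal 256) in the
# user name, as an attacker might send it.
line=$(printf 'Dec  2 01:17:28 host sshd[1234]: Invalid user foo\256bar from 10.0.0.1')

# Hypothetical rule: the explicit parts are plain ASCII, the arbitrary part
# is matched with [^[:space:]]+.
pattern='sshd\[[0-9]+\]: Invalid user [^[:space:]]+ from [0-9.]+$'

# In the C locale every byte is a character, so the stray byte is matched
# like any other non-space character.
matches=$(printf '%s\n' "$line" | LC_ALL=C grep -c -E "$pattern")

echo "matching lines: $matches"
```

Replacing [^[:space:]]+ with .* should behave the same way here.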


One unfortunate side-effect of setting LC_ALL=C is that any error
messages from grep will be in English, but that's probably the lesser evil.
(LC_MESSAGES cannot be left as is, since mixing different encodings is
not supported.)
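
A small sketch of that side-effect, assuming /nonexistent/logfile does
not exist; de_DE.UTF-8 is just an arbitrary example locale and does not
need to be installed for this to run.

```shell
# LC_ALL takes precedence over LC_MESSAGES, so the requested message
# locale is ignored and grep reports the error in English.
err=$(LC_MESSAGES=de_DE.UTF-8 LC_ALL=C grep foo /nonexistent/logfile 2>&1)

echo "$err"
```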


-- 
Never trust an operating system you don't have sources for. ;-)
                -- Unknown source



_______________________________________________
Logcheck-devel mailing list
Logcheck-devel@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/logcheck-devel
