Re: [Syslog] #5 - character encoding (was: Consensus?)
Not sure which bits of MIME you have in mind, but I like the term
Content-Transfer-Encoding, I like the list of such encodings, I like the list
of charsets, and I like the way that the user/application gets to choose a
suitable delimiter for the various parts rather than have the protocol
designer impose an unsuitable one.

What don't I like? The term charset, but I think that is too well embedded to
avoid.

Oh, and I love the way that the protocol proper is ASCII encoded, so easy to
read and display, unlike markup languages, binary encodings, etc.

And I think its guidance about what to do when fields are absent or corrupt
is good, leading to a good chance of interoperability.

Tom Petch

- Original Message -
From: "Rainer Gerhards" <[EMAIL PROTECTED]>
To: "Tom Petch" <[EMAIL PROTECTED]>; "Chris Lonvick" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, December 01, 2005 2:25 PM
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Well... let me rephrase it slightly ;)

After -protocol is finished, we could actually do something like syslog-mime,
which could then describe this as an optional feature. It might not even be
as crazy as it sounds, at least if I look at what has been suggested so far.
syslog-mime might be a solution for some of these needs.

Rainer

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Thursday, December 01, 2005 12:45 PM
> To: Tom Petch; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Tom, WG
>
> I am *not* kidding. If we go for an encoding header, why not use MIME?
>
> Rainer

___
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog
RE: [Syslog] #5 - character encoding (was: Consensus?)
I suggest including wording to the effect "if no SD-ID encoding element is
specified, then the encoding of the content is implementation specific and it
is RECOMMENDED that no assumption be made about the encoding of the content."

dbh

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Wednesday, November 30, 2005 6:24 AM
> To: [EMAIL PROTECTED]; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Andrew,
>
> > >Hi Rainer,
> > >
> > >Why don't we look at it from the other direction? We could state that
> > >any encoding is acceptable - for ease-of-use/migration with existing
> > >syslog implementations. It is RECOMMENDED that UTF-8 be used. When it
> > >is used, an SD-ID element will be REQUIRED. e.g. -
> > >[enc="utf-8" lang="en"]
> >
> > I like that idea too.
> >
> > So, if no SD-ID encoding element is specified, then we must assume
> > US-ASCII and deal with it accordingly??
>
> I think not. If it is not present, we know that we do not know it. If
> it is US-ASCII, I would expect something like
>
> [enc="us-ascii" lang="en"]
>
> Of course, we could also say that if it is not present, we can assume
> US-ASCII. But then we would need to introduce
>
> [enc="unknown"]
>
> for the (common) case where we simply do not know it (again: think
> POSIX). I find this somewhat confusing.
>
> Rainer
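
The rule proposed above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the [enc="..." lang="..."] syntax is
copied from the examples in this thread rather than from a finished
specification, and the pattern and function names are made up for the sketch.

```python
import re

# Assumed element syntax, taken from the thread's examples:
#   [enc="utf-8" lang="en"]
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def declared_encoding(structured_data):
    """Return the declared encoding in lower case, or None if none is given."""
    for name, value in _PARAM.findall(structured_data):
        if name == "enc":
            return value.lower()
    # No "enc" parameter: the receiver knows that it does not know.
    return None


def describe(structured_data):
    """Summarise what a receiver may conclude about the content encoding."""
    enc = declared_encoding(structured_data)
    if enc is None:
        return "encoding unknown: make no assumption"
    return "encoding declared: " + enc
```

Returning None rather than a default such as "us-ascii" keeps "not declared"
distinct from "declared as US-ASCII", which is the distinction Rainer asks for
in the quoted message.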
RE: [Syslog] #5 - character encoding (was: Consensus?)
Well... let me rephrase it slightly ;)

After -protocol is finished, we could actually do something like syslog-mime,
which could then describe this as an optional feature. It might not even be
as crazy as it sounds, at least if I look at what has been suggested so far.
syslog-mime might be a solution for some of these needs.

Rainer

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Thursday, December 01, 2005 12:45 PM
> To: Tom Petch; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Tom, WG
>
> I am *not* kidding. If we go for an encoding header, why not use MIME?
>
> Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Tom,

I appreciate your point. My intention is this: -15 specifies that MSG must
contain UTF-8 encoding exclusively (full character set). During
implementation, I have seen that I cannot obtain the encoding information for
to-be-sent messages under Unix. In the meantime, Balazs Scheidler has
suggested a potential way to do that, in which case this would probably be a
non-issue. For the time being, let's assume it cannot be obtained.

In many cases, I have different encodings, like ISO 8859-1 or EUC, at least
in parts of the message. As I do not know the encoding, I cannot properly
convert it to Unicode. So the syslogd would send a non-compliant message. As
there is a high chance of invalid UTF-8 sequences (as it is not UTF-8), a
compliant receiving syslogd must drop this message because it is invalid. My
point here is that I am not really interested in whether the sending syslogd
is to blame or not. My point is that the message cannot be received.

My proposal was to recommend UTF-8 whenever possible, but to allow MSGs with
unknown encoding when we cannot obtain encoding information. To
differentiate, I suggested that the Unicode BOM be used if it is UTF-8.
Though there might still be a small window for misinterpretation, I'd expect
that a UTF-8 encoded BOM is very unlikely to appear in the first three octets
of an ordinary syslog message. I found this easy and acceptable.

If the syslogd can reliably obtain the provided encoding - as Balazs
thankfully mentioned - we could stick with UTF-8 only, as it would now be no
issue. The only remaining issue would be whether we could expect implementors
to implement a converter from any given character set to Unicode - but that's
a different story.

The ever-changing, fragile WG consensus at this time of the year seems to be
that we are back to supporting all possible encodings to address the need I
mentioned. While I do not really like this approach, it will allow me to do
what I need to do. So I do not object to it.

I agree with you that we should not focus too much on backwards
compatibility. But on the other hand, Vancouver told us people would like to
see it. The list then said "oh no". A few days later we have multiple voices
saying we must support this and that. I have to admit that I lose my sense of
a stable consensus the longer I discuss this now. For me, I have decided to
only voice my concerns if I believe something will be broken. Field order,
field semantics and a lot of the other issues currently being
re-re-re-re-considered are not really that important. Even if we end up with
something totally horrible, I am sure it is possible to program a parser that
handles it. After all, our parsers handle today's syslog - can it really
become worse? I think there would be huge value in a syslog standard, no
matter how ugly the details may look to some of us. After all, beauty is a
very subjective concept ;)

I hope I have been able to convey my root concern on the encoding. On the
other issues, I am waiting for WG consensus to be declared and then I will
include that consensus, whatever it is, into the I-D. I just hope it'll stay
stable long enough so that the I-D can proceed...

Rainer

> -Original Message-
> From: Tom Petch [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 01, 2005 9:25 AM
> To: Rainer Gerhards; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Syslog] #5 - character encoding (was: Consensus?)
>
> Rainer
>
> I think I detect an approach I do not agree with, in this and perhaps
> other issues.
>
> You seem to be saying that the (eg POSIX) syslogd must emit perfect
> syslog messages and is responsible for anything that is wrong with them
> no matter what it received from the application (I exaggerate slightly).
>
> I would say that if the application passes incomprehensible garbage,
> something criminal or illegal, then it is the application that is at
> fault; syslogd can only be held responsible if it produces messages that
> are invalid for the parts over which it has control, eg header syntax.
>
> So if syslogd has no idea what the transfer encoding is because the rest
> of the system does not tell it, then syslogd cannot be held responsible
> for the absence of a field saying what the transfer encoding actually is.
> Or put differently, if our RFC specify what the application MUST or
> SHOULD do, as well as syslogd, then that is ok with me.
>
> What syslogd would be responsible for, IMO, would be allowing characters
> that have a special meaning in the syntax (eg NUL is end of message)
> appearing unescaped (or otherwise encoded). Whether we have such problems
> depends on the resolution of other issues, not saying that we have at
> present.
>
> Tom Petch
>
> - Original Messag
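
The BOM proposal in this message can be sketched as follows, in Python. This
is a minimal illustration, not text from any draft: the function name is
hypothetical, and falling back to "unknown" when a BOM is followed by invalid
UTF-8 is an assumption of the sketch (the message does not say what a
receiver should do in that case).

```python
import codecs


def classify_msg(msg):
    """Split a received MSG (bytes) into (content, encoding).

    Returns (str, "utf-8") when the MSG starts with the UTF-8 encoded BOM
    (the three octets EF BB BF) and the rest decodes cleanly.
    Returns (bytes, None) otherwise: the encoding is unknown and the
    octets are passed on untouched instead of the message being dropped.
    """
    bom = codecs.BOM_UTF8
    if msg.startswith(bom):
        try:
            return msg[len(bom):].decode("utf-8"), "utf-8"
        except UnicodeDecodeError:
            # Assumption of this sketch: a BOM followed by invalid UTF-8
            # is treated like any other MSG of unknown encoding.
            pass
    return msg, None
```

A sender that knows its content is UTF-8 would prepend codecs.BOM_UTF8; a
POSIX syslogd that cannot learn the encoding would send the octets as they
are, and the receiver would report the encoding as unknown.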
RE: [Syslog] #5 - character encoding (was: Consensus?)
Tom, WG

I am *not* kidding. If we go for an encoding header, why not use MIME?

Rainer

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Petch
> Sent: Thursday, December 01, 2005 10:05 AM
> To: Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Syslog] #5 - character encoding (was: Consensus?)
>
> - Original Message -
> From: "Chris Lonvick" <[EMAIL PROTECTED]>
> To: "Rainer Gerhards" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Wednesday, November 30, 2005 2:18 PM
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> > Hi Rainer,
> >
> > Let's use this email as an example. :) There is no indication that I'm
> > using US-ASCII encoding or that I'm writing in English.
>
> Actually, Chris, there is; when I receive this e-mail, the header contains
>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> so implicitly or explicitly you are telling me it is US-ASCII
>
> By contrast, e-mails from Rainer, contain
>
> Content-Type: text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
>
> This reply is in charset="iso-8859-1" but by default, my Windows MUA
> replies in the charset the message came in, so that replying to you I
> could not then spell Müller properly which I could do to Rainer. On the
> other hand, when replying to you, Windows inserts > to denote the
> incoming text which it suppresses when I reply to Rainer. And some
> e-mails I receive are not in US-ASCII but lack the charset= in which case
> the display on screen is somewhat or totally corrupted.
>
> So MIME does an ok job but can be fooled by the rest of the system; if we
> can do that well with syslog, we should be proud of ourselves.
>
> Tom Petch
Re: [Syslog] #5 - character encoding (was: Consensus?)
- Original Message -
From: "Chris Lonvick" <[EMAIL PROTECTED]>
To: "Rainer Gerhards" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 30, 2005 2:18 PM
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

> Hi Rainer,
>
> Let's use this email as an example. :) There is no indication that I'm
> using US-ASCII encoding or that I'm writing in English.

Actually, Chris, there is; when I receive this e-mail, the header contains

Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

so implicitly or explicitly you are telling me it is US-ASCII.

By contrast, e-mails from Rainer contain

Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

This reply is in charset="iso-8859-1" but by default, my Windows MUA replies
in the charset the message came in, so that replying to you I could not then
spell Müller properly, which I could do to Rainer. On the other hand, when
replying to you, Windows inserts > to denote the incoming text, which it
suppresses when I reply to Rainer. And some e-mails I receive are not in
US-ASCII but lack the charset=, in which case the display on screen is
somewhat or totally corrupted.

So MIME does an ok job but can be fooled by the rest of the system; if we can
do that well with syslog, we should be proud of ourselves.

Tom Petch
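
How a receiver uses these two MIME headers can be sketched with Python's
standard email package. The header values are the ones quoted in this
message; the function name, the sample body "M=FCller" and the decision to
hand back raw bytes when no charset is declared are assumptions made for the
illustration.

```python
from email import message_from_string


def decode_body(raw):
    """Return (charset, body) for a single-part message given as text.

    charset is the lower-cased charset parameter of Content-Type, or None
    when the header does not carry one. In that case the body is returned
    as undecoded bytes, because the receiver can only guess.
    """
    msg = message_from_string(raw)
    charset = msg.get_content_charset()
    # decode=True undoes the Content-Transfer-Encoding (e.g. quoted-printable).
    payload = msg.get_payload(decode=True)
    if charset is None:
        return None, payload
    return charset, payload.decode(charset)


# Headers as quoted in the message; the body is a made-up example.
RAINER_STYLE = (
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "M=FCller\n"
)
CHRIS_STYLE = (
    "Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed\n"
    "\n"
    "Hi Rainer,\n"
)
```

With the first sample, the quoted-printable sequence =FC is turned back into
the octet 0xFC and then read as ISO 8859-1, which is how the name Müller
survives a 7-bit transport.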
Re: [Syslog] #5 - character encoding (was: Consensus?)
Rainer

I think I detect an approach I do not agree with, in this and perhaps other
issues.

You seem to be saying that the (eg POSIX) syslogd must emit perfect syslog
messages and is responsible for anything that is wrong with them no matter
what it received from the application (I exaggerate slightly).

I would say that if the application passes incomprehensible garbage,
something criminal or illegal, then it is the application that is at fault;
syslogd can only be held responsible if it produces messages that are invalid
for the parts over which it has control, eg header syntax.

So if syslogd has no idea what the transfer encoding is because the rest of
the system does not tell it, then syslogd cannot be held responsible for the
absence of a field saying what the transfer encoding actually is. Or put
differently, if our RFCs specify what the application MUST or SHOULD do, as
well as syslogd, then that is ok with me.

What syslogd would be responsible for, IMO, would be allowing characters that
have a special meaning in the syntax (eg NUL is end of message) to appear
unescaped (or not otherwise encoded). Whether we have such problems depends
on the resolution of other issues; I am not saying that we have any at
present.

Tom Petch

- Original Message -
From: "Rainer Gerhards" <[EMAIL PROTECTED]>
To: "Chris Lonvick" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 30, 2005 2:48 PM
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Chris,

I fully agree - thanks ;)

Rainer
[Fwd: RE: [Syslog] #5 - character encoding (was: Consensus?)]
Missed reply-all...

Forwarded Message
> From: Balazs Scheidler <[EMAIL PROTECTED]>
> To: Rainer Gerhards <[EMAIL PROTECTED]>
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> Date: Thu, 01 Dec 2005 10:55:42 +0100
>
> On Wed, 2005-11-30 at 09:01 +0100, Rainer Gerhards wrote:
> > Sheran,
> >
> > > Also want to clarify that you suggest that if the message is in ASCII,
> > > it will not require SD-ID, but for all other encodings, SD-ID will be
> > > required.
> >
> > Unfortunately, we can not do this. If we knew the encoding, we could
> > translate it to UTF-8, as so far is required by syslog-protocol.
> > However, we often do not know which encoding it is. The reason is that
> > the POSIX syslog API does not tell us. So if we want to support POSIX
> > (which I think we must), we must allow a syslog sender to send messages
> > without telling the encoding - simply because it has no way to obtain
> > that knowledge.
> >
> > A syslog sender embedded e.g. in a device does probably not have this
> > restriction. So it SHOULD encode in UTF-8. That will ensure the receiver
> > can understand it. If the sender has absolutely no idea of how to do
> > that, but knows the encoding, then (and only then) it SHOULD specify the
> > encoding.
>
> Just a small note, there is a way in the syslog() libc function to
> recover current encoding information based on the contents of the
> LC_CTYPE (or LANG) environment variable. So although the API does not
> explicitly contain parameters to specify encoding, the program
> environment contains this information. You are right that the standard
> POSIX API without any changes will send unfiltered/unconverted strings
> to syslog without any encoding information, but it is not impossible to
> create a replacement for syslog(3) that actually delivers this
> information while staying compatible with the POSIX API.
>
> The way I see it:
> - have the SD-ID to specify encoding and use that if available
> - if there is no SD-ID (legacy applications) then assume US-ASCII and
>   let the administrator override this on a per-source basis (using a
>   SHOULD clause)
> - implementation SHOULD validate (and possibly convert) incoming
>   messages and SHOULD allow the administrator to choose what to do with
>   non-conforming characters (drop, substitute, leave it as is)
>
> -- Bazsi
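
The three-point receiver policy at the end of the forwarded message can be
sketched in Python as below. The function and parameter names are
hypothetical, and the policy names "drop", "substitute" and "leave" simply
mirror the wording in the list; nothing here comes from a draft.

```python
def decode_incoming(msg, declared=None, source_default="us-ascii",
                    on_invalid="substitute"):
    """Apply the receiver policy to one MSG given as bytes.

    declared       - encoding named in the SD-ID, or None if absent
    source_default - per-source fallback set by the administrator
                     (US-ASCII unless overridden)
    on_invalid     - "drop", "substitute" or "leave"
    """
    # 1. Use the declared encoding if available, 2. else the per-source default.
    encoding = declared or source_default
    try:
        # 3. Validate (and convert) the incoming octets.
        return msg.decode(encoding)
    except UnicodeDecodeError:
        if on_invalid == "drop":
            return None
        if on_invalid == "substitute":
            # Non-conforming octets become U+FFFD REPLACEMENT CHARACTER.
            return msg.decode(encoding, errors="replace")
        # "leave": hand the octets on exactly as received.
        return msg
```

An encoding name that Python does not know raises LookupError, which the
sketch leaves to the caller. A real receiver would also need a rule for
that case.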
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris:

I agree with all your points. Recommend an encoding and standard lang tag,
and accept all other encoding and lang specification.

Regards,
Sheran

-Original Message-
From: Chris Lonvick (clonvick)
Sent: Wednesday, November 30, 2005 5:06 AM
To: Shyyunn Lin (sheranl)
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Sheran,

On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote:

> Chris:
>
> I think having SD-ID with [enc="utf-8" lang="English"] may be a good
> approach. If different language use utf-8 encoding, then "lang=" can
> distinguish it.

We _should_ be using language codes from RFC 3066. That specifies ISO 639
language tags. 639-1 has 2 character codes ("en" is English) and 639-2 has 3
characters ("eng" is English). RFC 3066 will likely be replaced by the works
of the Language Tag Registry Update (ltru) Working Group.
http://www.ietf.org/html.charters/ltru-charter.html
They have IDs in the works. Until those become RFCs we should continue to
reference RFC 3066.

> Also want to clarify that you suggest that if the message is in ASCII,
> it will not required SD-ID, but for all other encodings, SD-ID will be
> required.

Yes - that's my suggestion.

> Note most other encoding methods already imply the language used, for
> example, in Chinese, there are several encoding methods, Traditional
> Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese
> used in Mainland China is GBK, so if the message is in traditional
> Chinese char, it will be shown as [enc="Big5", lang="Traditional
> Chinese"], a little bit redundant. The Big5 also includes all English
> char so it can be a mix of Chinese and English.

Good point. As far as I can tell, "Big5" is not recognized by any accredited
standards developing organization. It is recognized by the Ideographic
Rapporteur Group (IRG) which reports to the Unicode consortium. The
recognized way to represent Chinese characters, traditional and simplified,
is through ISO 639-2 with the subcodes to indicate traditional and simplified
for the "zh" _language_. The ID on "Tags for Identifying Languages"
http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt
identifies simplified Chinese as "zh-Hans" and traditional Chinese as
"zh-Hant". Additional subtags could identify a locale such as "zh-Hant-TW"
for Taiwan Chinese in traditional script. This is from the "Initial Language
Subtag Registry" ID.
http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt

I think that we should specify encoding and language tags as
straightforwardly as possible and let others augment syslog-protocol (in the
future) with other encoding mechanisms. We can RECOMMEND that encoding be in
UTF-8 and language tags come from RFC 3066. We can allow that other encoding
and language identifications are acceptable. In the worst case, a vendor will
have the option of [EMAIL PROTECTED]"something" [EMAIL PROTECTED]"piglatin"].

Does this work for you?

Thanks,
Chris

> Regards,
>
> Sheran
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick)
> Sent: Tuesday, November 29, 2005 10:22 AM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> Why don't we look at it from the other direction? We could state that
> any encoding is acceptable - for ease-of-use/migration with existing
> syslog implementations. It is RECOMMENDED that UTF-8 be used. When
> it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8"
> lang="en"]
>
> Thoughts?
>
> All: Let's discuss this and close this issue.
>
> Thanks,
> Chris
>
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
>
>> Chris & WG,
>>
>>>> #5 Character encoding in MSG: due to my proof-of-concept
>>>> implementation, I have raised the (ugly) question if we need
>>>> to allow encodings other than UTF-8. Please note that this
>>>> question arises from needs introduced by e.g. POSIX. So we
>>>> can't easily argue them away by wishful thinking ;)
>>>>
>>>> Not even discussed yet.
>>>
>>> I haven't reviewed that yet. However, I'll note that allowing
>>> different encoding can be accomplished in the future as long as we
>>> establish a default encoding and a way to identify it in our current
>>> work.
>>
>> I have read a little in the mailing archive. Please note that in 2000
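
The language tags Chris cites ("en", "zh-Hans", "zh-Hant-TW") can be taken
apart with a few lines of Python. This is a deliberately simplified sketch:
the function name is hypothetical, and it recognises only a language, an
optional four-letter script and an optional region, not the full tag grammar
of the drafts referenced above.

```python
def split_language_tag(tag):
    """Split a tag such as "zh-Hant-TW" into language, script and region.

    Simplifying assumptions: the first subtag is the language, a
    four-letter subtag is a script, and a two-letter or three-digit
    subtag is a region. Any other subtag is ignored.
    """
    subtags = tag.split("-")
    parts = {"language": subtags[0].lower(), "script": None, "region": None}
    for sub in subtags[1:]:
        is_script = len(sub) == 4 and sub.isalpha()
        is_region = (len(sub) == 2 and sub.isalpha()) or (
            len(sub) == 3 and sub.isdigit()
        )
        if is_script and parts["script"] is None:
            parts["script"] = sub.title()   # conventional casing, e.g. "Hant"
        elif is_region and parts["region"] is None:
            parts["region"] = sub.upper()   # conventional casing, e.g. "TW"
    return parts
```

With tags like these, the script subtag carries the traditional/simplified
distinction, so the lang value no longer has to repeat what an encoding name
such as Big5 already implies.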
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris,

I fully agree - thanks ;)

Rainer

> -Original Message-
> From: Chris Lonvick [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 30, 2005 2:39 PM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> I believe that we are saying the same thing. :)
>
> If there is no indicator of encoding or language then a receiver will
> not know what it is receiving - just like receivers don't know what they
> are receiving today. They MAY make an assumption that it is something in
> US-ASCII (but may be disappointed).
>
> If there is an indicator of the encoding and language then the receiver
> will know exactly what it is. Having an indicator should be RECOMMENDED
> but not REQUIRED for ease of migration.
>
> Is that what we're all saying?
>
> Thanks,
> Chris
>
> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
>
> > Chris,
> >
> >> Let's use this email as an example. :) There is no indication that
> >> I'm using US-ASCII encoding or that I'm writing in English.
> >
> > I think there actually is. If I am right, the SMTP RFCs require mail
> > text to be US-ASCII. Only via MIME and/or escape characters you can
> > include 8-bit data. For example Müller and Möller might create some
> > problems in some mailers (But I guess my Mail system will encode them
> > with =). Dropping messages with octets > 127 in the subject is a
> > common spam protection setting...
> >
> >> However, you're able to receive this and read it. Similarly, you
> >> could write an email in German and send it to me. I would still be
> >> able to receive it but I'd have a difficult time parsing the meaning.
> >>
> >> I'm suggesting that same approach for the transmission of the syslog
> >> content. If I really wanted you to know what encoding and language
> >> I'm using in an email, I would specify a mime header. syslog senders
> >> will continue to pump out whatever encoding and language they've been
> >> using and receivers will continue to do their best to parse them. If
> >> a vendor wants to get very specific about that, then they will have
> >> to use an SD-ID to identify the contents of the message.
> >
> > Here I agree with you. What I was saying is that IF the header says it
> > is US-ASCII, only then we should assume it actually is. If there is no
> > "enc" SD-ID, then we do not know what it is but can assume ...
> > whatever we assume. Let me phrase it that way:
> >
> > If the message contains
> >
> > [enc="us-ascii" lang="en"]
> >
> > then the receiver can honestly expect it to be US-ASCII. But if it
> > does not contain any "enc" the receiver does not know exactly and can
> > assume anything it finds useful (may be ASCII, may not).
> >
> > Does this clarify? I somehow have the impression we mean the same
> > thing and I simply do not manage to convey what I intend to ;)
> >
> > Rainer
> >
> >> Mit Aufrichtigkeit,
> >> Chris
> >>
> >> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
> >>
> >>> Andrew,
> >>>
> >>>>> Hi Rainer,
> >>>>>
> >>>>> Why don't we look at it from the other direction? We could state
> >>>>> that any encoding is acceptable - for ease-of-use/migration with
> >>>>> existing syslog implementations. It is RECOMMENDED that UTF-8 be
> >>>>> used. When it is used, an SD-ID element will be REQUIRED. e.g. -
> >>>>> [enc="utf-8" lang="en"]
> >>>>
> >>>> I like that idea too.
> >>>>
> >>>> So, if no SD-ID encoding element is specified, then we must
> >>>> assume US-ASCII and deal with it accordingly??
> >>>
> >>> I think not. If it is not present, we know that we do not know it.
> >>> If it is US-ASCII, I would expect something like
> >>>
> >>> [enc="us-ascii" lang="en"]
> >>>
> >>> Of course, we could also say if it is non-present, we can assume
> >>> US-ASCII. But then we would need to introduce
> >>>
> >>> [enc="unknown"]
> >>>
> >>> for the (common) case where we simply do not know it (again: think
> >>> POSIX). I find this somewhat confusing.
> >>>
> >>> Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Rainer,

I believe that we are saying the same thing. :)

If there is no indicator of encoding or language then a receiver will not know what it is receiving - just like receivers don't know what they are receiving today. They MAY make an assumption that it is something in US-ASCII (but may be disappointed).

If there is an indicator of the encoding and language then the receiver will know exactly what it is. Having an indicator should be RECOMMENDED but not REQUIRED for ease of migration.

Is that what we're all saying?

Thanks,
Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote:

> Chris,
>
>> Let's use this email as an example. :) There is no indication that I'm using US-ASCII encoding or that I'm writing in English.
>
> I think there actually is. If I am right, the SMTP RFCs require mail text to be US-ASCII. You can include 8-bit data only via MIME and/or escape characters. For example Müller and Möller might create some problems in some mailers (But I guess my Mail system will encode them with =). Dropping messages with octets > 127 in the subject is a common spam protection setting...
>
>> However, you're able to receive this and read it. Similarly, you could write an email in German and send it to me. I would still be able to receive it but I'd have a difficult time parsing the meaning.
>>
>> I'm suggesting that same approach for the transmission of the syslog content. If I really wanted you to know what encoding and language I'm using in an email, I would specify a mime header. syslog senders will continue to pump out whatever encoding and language they've been using and receivers will continue to do their best to parse them. If a vendor wants to get very specific about that, then they will have to use an SD-ID to identify the contents of the message.
>
> Here I agree with you. What I was saying is that IF the header says it is US-ASCII, only then we should assume it actually is. If there is no "enc" SD-ID, then we do not know what it is but can assume ... whatever we assume. Let me phrase it that way:
>
> If the message contains
>
> [enc="us-ascii" lang="en"]
>
> then the receiver can honestly expect it to be US-ASCII. But if it does not contain any "enc" the receiver does not know exactly and may assume anything it finds useful (may be ASCII, may not).
>
> Does this clarify? I somehow have the impression we mean the same thing and I simply do not manage to convey what I intend to ;)
>
> Rainer
>
>> Mit Aufrichtigkeit,
>> Chris
>>
>> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
>>
>>> Andrew,
>>>
>>>>> Hi Rainer,
>>>>>
>>>>> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>>>>
>>>> I like that idea too.
>>>>
>>>> So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly??
>>>
>>> I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like
>>>
>>> [enc="us-ascii" lang="en"]
>>>
>>> Of course, we could also say if it is non-present, we can assume US-ASCII. But then we would need to introduce
>>>
>>> [enc="unknown"]
>>>
>>> for the (common) case where we simply do not know it (again: think POSIX). I find this somewhat confusing.
>>>
>>> Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris,

> Let's use this email as an example. :) There is no indication that I'm using US-ASCII encoding or that I'm writing in English.

I think there actually is. If I am right, the SMTP RFCs require mail text to be US-ASCII. You can include 8-bit data only via MIME and/or escape characters. For example Müller and Möller might create some problems in some mailers (But I guess my Mail system will encode them with =). Dropping messages with octets > 127 in the subject is a common spam protection setting...

> However, you're able to receive this and read it. Similarly, you could write an email in German and send it to me. I would still be able to receive it but I'd have a difficult time parsing the meaning.
>
> I'm suggesting that same approach for the transmission of the syslog content. If I really wanted you to know what encoding and language I'm using in an email, I would specify a mime header. syslog senders will continue to pump out whatever encoding and language they've been using and receivers will continue to do their best to parse them. If a vendor wants to get very specific about that, then they will have to use an SD-ID to identify the contents of the message.

Here I agree with you. What I was saying is that IF the header says it is US-ASCII, only then we should assume it actually is. If there is no "enc" SD-ID, then we do not know what it is but can assume ... whatever we assume. Let me phrase it that way:

If the message contains

[enc="us-ascii" lang="en"]

then the receiver can honestly expect it to be US-ASCII. But if it does not contain any "enc" the receiver does not know exactly and may assume anything it finds useful (may be ASCII, may not).

Does this clarify? I somehow have the impression we mean the same thing and I simply do not manage to convey what I intend to ;)

Rainer

> Mit Aufrichtigkeit,
> Chris
>
> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
>
>> Andrew,
>>
>>>> Hi Rainer,
>>>>
>>>> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>>>
>>> I like that idea too.
>>>
>>> So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly??
>>
>> I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like
>>
>> [enc="us-ascii" lang="en"]
>>
>> Of course, we could also say if it is non-present, we can assume US-ASCII. But then we would need to introduce
>>
>> [enc="unknown"]
>>
>> for the (common) case where we simply do not know it (again: think POSIX). I find this somewhat confusing.
>>
>> Rainer
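The remark above about a mail system encoding Müller "with =" presumably refers to MIME quoted-printable. A minimal sketch of that transformation, using Python's standard `quopri` module; the helper names and the default charset are assumptions chosen for illustration, not anything specified in the thread:

```python
import quopri


def to_quoted_printable(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode text in the given charset, then make it 7-bit safe with '=XX' escapes."""
    raw = text.encode(charset)
    return quopri.encodestring(raw)


def from_quoted_printable(data: bytes, charset: str = "iso-8859-1") -> str:
    """Reverse the transformation; the receiver must already know the charset."""
    return quopri.decodestring(data).decode(charset)


if __name__ == "__main__":
    wire = to_quoted_printable("Müller")
    print(wire)                         # only octets below 128 remain
    print(from_quoted_printable(wire))  # readable again once the charset is known
```

The transfer encoding only makes the octets safe to transport. The charset label is what lets the receiver turn them back into the right characters, which is the same gap the "enc" proposal is trying to close for syslog.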
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris,

I agree to all but one point - only that one is quoted here...

>> Also want to clarify that you suggest that if the message is in ASCII, it will not require SD-ID, but for all other encodings, SD-ID will be required.
>
> Yes - that's my suggestion.

I am sorry, we cannot do this. The whole issue is rooted in the POSIX APIs. You need to look at why it is such a problem. On Windows, you know what character encodings you are dealing with. On Unix, you actually just get a bunch of octets - and nobody tells you what it is. So the poor Unix syslogd actually has no idea of what it handles and likewise does not know what to place in that field ;)

If it knew it were this or that encoding, I would be very tempted to request it to convert to UTF-8. But the need behind this encoding is *NOT* to allow the multitude of whatever currently is in existence but rather to provide a way to let a syslogd that needs to emit a "bunch of octets" do that.

Does this clarify? I can provide code if that would be helpful...

Rainer
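To illustrate the "bunch of octets" problem described above, here is a small sketch of what a receiver can and cannot learn from the raw bytes alone. The function name and the choice of sample codecs are assumptions made for the example:

```python
def describe_octets(octets: bytes) -> dict:
    """Report the few facts that can be derived from raw octets without a label."""
    facts = {"ascii_only": all(b < 0x80 for b in octets)}
    try:
        octets.decode("utf-8")
        facts["valid_utf8"] = True
    except UnicodeDecodeError:
        facts["valid_utf8"] = False
    return facts


if __name__ == "__main__":
    sample = b"M\xfcller"  # "Müller" as an ISO-8859-1 application would write it
    print(describe_octets(sample))
    # The same octets give different text under different single-byte codecs,
    # and nothing in the data says which reading the application intended.
    print(sample.decode("iso-8859-1"))
    print(sample.decode("cp437"))
```

A syslogd can rule some encodings out (here, UTF-8), but it cannot positively identify the one the application used, so it has nothing trustworthy to put into an encoding field.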
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Rainer,

Let's use this email as an example. :) There is no indication that I'm using US-ASCII encoding or that I'm writing in English. However, you're able to receive this and read it. Similarly, you could write an email in German and send it to me. I would still be able to receive it but I'd have a difficult time parsing the meaning.

I'm suggesting that same approach for the transmission of the syslog content. If I really wanted you to know what encoding and language I'm using in an email, I would specify a mime header. syslog senders will continue to pump out whatever encoding and language they've been using and receivers will continue to do their best to parse them. If a vendor wants to get very specific about that, then they will have to use an SD-ID to identify the contents of the message.

Mit Aufrichtigkeit,
Chris

On Wed, 30 Nov 2005, Rainer Gerhards wrote:

> Andrew,
>
>>> Hi Rainer,
>>>
>>> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>>
>> I like that idea too.
>>
>> So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly??
>
> I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like
>
> [enc="us-ascii" lang="en"]
>
> Of course, we could also say if it is non-present, we can assume US-ASCII. But then we would need to introduce
>
> [enc="unknown"]
>
> for the (common) case where we simply do not know it (again: think POSIX). I find this somewhat confusing.
>
> Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Sheran,

On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote:

> Chris:
>
> I think having SD-ID with [enc="utf-8" lang="English"] may be a good approach. If different languages use utf-8 encoding, then "lang=" can distinguish them.

We _should_ be using language codes from RFC 3066. That specifies ISO 639 language tags. 639-1 has 2-character codes ("en" is English) and 639-2 has 3-character codes ("eng" is English). RFC 3066 will likely be replaced by the work of the Language Tag Registry Update (ltru) Working Group.

http://www.ietf.org/html.charters/ltru-charter.html

They have Internet-Drafts (IDs) in the works. Until those become RFCs we should continue to reference RFC 3066.

> Also want to clarify that you suggest that if the message is in ASCII, it will not require SD-ID, but for all other encodings, SD-ID will be required.

Yes - that's my suggestion.

> Note most other encoding methods already imply the language used, for example, in Chinese, there are several encoding methods, Traditional Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese used in Mainland China is GBK, so if the message is in traditional Chinese char, it will be shown as [enc="Big5", lang="Traditional Chinese"], a little bit redundant. The Big5 also includes all English char so it can be a mix of Chinese and English.

Good point. As far as I can tell, "Big5" is not recognized by any accredited standards developing organization. It is recognized by the Ideographic Rapporteur Group (IRG) which reports to the Unicode consortium. The recognized way to represent Chinese characters, traditional and simplified, is through ISO 639-2 with the subcodes to indicate traditional and simplified for the "zh" _language_. The ID on "Tags for Identifying Languages"

http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt

identifies simplified Chinese as "zh-Hans" and traditional Chinese as "zh-Hant". Additional subtags could identify a locale such as "zh-Hant-TW" for Taiwan Chinese in traditional script. This is from the "Initial Language Subtag Registry" ID.

http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt

I think that we should specify encoding and language tags as straightforwardly as possible and let others augment syslog-protocol (in the future) with other encoding mechanisms. We can RECOMMEND that encoding be in UTF-8 and language tags come from RFC 3066. We can allow that other encoding and language identifications are acceptable. In the worst case, a vendor will have the option of [EMAIL PROTECTED]"something" [EMAIL PROTECTED]"piglatin"].

Does this work for you?

Thanks,
Chris

> Regards,
> Sheran
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick)
> Sent: Tuesday, November 29, 2005 10:22 AM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>
> Thoughts?
>
> All: Let's discuss this and close this issue.
>
> Thanks,
> Chris
>
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
>
>> Chris & WG,
>>
>>>> #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question if we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;)
>>>>
>>>> Not even discussed yet.
>>>
>>> I haven't reviewed that yet. However, I'll note that allowing different encoding can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.
>>
>> I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread:
>>
>> http://www.syslog.cc/ietf/autoarc/msg00127.html
>>
>> This discussion led to RFC 3164 saying "other encodings MAY be used". While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places the restriction on us that we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively.
>>
>> I propose that we RECOMMEND UTF-8, which, if used, MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be any encoding just as in RFC 3164. I do not see any alternative to this.
>>
>> Rainer
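The tags mentioned above ("en", "zh-Hans", "zh-Hant", "zh-Hant-TW") follow a language-script-region shape. As a rough sketch of how a receiver might pull such a tag apart, the following helper classifies subtags purely by their length; the function name is hypothetical and the logic deliberately ignores variants, extensions and registry validation, so it is not a complete implementation of RFC 3066 or its successors:

```python
def split_language_tag(tag: str) -> dict:
    """Split a tag like 'zh-Hant-TW' into language, script and region by subtag shape."""
    subtags = tag.split("-")
    parts = {"language": subtags[0].lower()}
    for sub in subtags[1:]:
        is_script = len(sub) == 4 and sub.isalpha()
        is_region = (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        if is_script and "script" not in parts and "region" not in parts:
            parts["script"] = sub.title()   # conventional casing, e.g. 'Hant'
        elif is_region and "region" not in parts:
            parts["region"] = sub.upper()   # conventional casing, e.g. 'TW'
        else:
            # Anything this sketch does not understand is kept, not dropped.
            parts.setdefault("other", []).append(sub)
    return parts


if __name__ == "__main__":
    for tag in ("en", "zh-Hans", "zh-Hant-TW"):
        print(tag, split_language_tag(tag))
```

With a structured tag like this, the script distinction travels in "lang" rather than being implied by a legacy encoding name such as Big5 or GBK.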
RE: [Syslog] #5 - character encoding (was: Consensus?)
Andrew,

>> Hi Rainer,
>>
>> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>
> I like that idea too.
>
> So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly??

I think not. If it is not present, we know that we do not know it. If it is US-ASCII, I would expect something like

[enc="us-ascii" lang="en"]

Of course, we could also say if it is non-present, we can assume US-ASCII. But then we would need to introduce

[enc="unknown"]

for the (common) case where we simply do not know it (again: think POSIX). I find this somewhat confusing.

Rainer
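The receiver-side rule argued for above (a declared "enc" can be trusted, an absent one means "unknown" rather than US-ASCII) might be sketched as follows. The element syntax is copied from the examples in this thread and is not a finished wire format; the function names and the regular expression are assumptions for illustration only:

```python
import re
from typing import Optional, Tuple, Union

# Matches name="value" pairs as written in the thread's examples.
PARAM_RE = re.compile(r'([A-Za-z0-9_-]+)="([^"]*)"')


def parse_element(element: str) -> dict:
    """Extract the name="value" parameters from a bracketed element."""
    return dict(PARAM_RE.findall(element))


def interpret_msg(element: Optional[str], msg: bytes) -> Tuple[Optional[str], Union[str, bytes]]:
    """Decode MSG only when an encoding was declared; otherwise keep the raw octets."""
    params = parse_element(element) if element else {}
    enc = params.get("enc")
    if enc is None:
        # No declaration: we know that we do not know. No US-ASCII assumption.
        return None, msg
    return enc, msg.decode(enc)


if __name__ == "__main__":
    print(interpret_msg('[enc="us-ascii" lang="en"]', b"link up"))
    print(interpret_msg(None, b"M\xfcller"))
```

Returning the untouched bytes in the undeclared case makes an explicit "unknown" value unnecessary, which matches the objection to [enc="unknown"] above.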
RE: [Syslog] #5 - character encoding (was: Consensus?)
> Hi Rainer,
>
> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]

I like that idea too.

So, if no SD-ID encoding element is specified, then we must assume US-ASCII and deal with it accordingly??

Cheers
Andrew
RE: [Syslog] #5 - character encoding (was: Consensus?)
Sheran,

> Also want to clarify that you suggest that if the message is in ASCII, it will not require SD-ID, but for all other encodings, SD-ID will be required.

Unfortunately, we cannot do this. If we knew the encoding, we could translate it to UTF-8, as is so far required by syslog-protocol. However, we often do not know which encoding it is. The reason is that the POSIX syslog API does not tell us. So if we want to support POSIX (which I think we must), we must allow a syslog sender to send messages without telling the encoding - simply because it has no way to obtain that knowledge.

A syslog sender embedded e.g. in a device probably does not have this restriction. So it SHOULD encode in UTF-8. That will ensure the receiver can understand it. If the sender has absolutely no idea of how to do that, but knows the encoding, then (and only then) it SHOULD specify the encoding.

Rainer

> Note most other encoding methods already imply the language used, for example, in Chinese, there are several encoding methods, Traditional Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese used in Mainland China is GBK, so if the message is in traditional Chinese char, it will be shown as [enc="Big5", lang="Traditional Chinese"], a little bit redundant. The Big5 also includes all English char so it can be a mix of Chinese and English.
>
> Regards,
> Sheran
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick)
> Sent: Tuesday, November 29, 2005 10:22 AM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>
> Thoughts?
>
> All: Let's discuss this and close this issue.
>
> Thanks,
> Chris
>
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
>
>> Chris & WG,
>>
>>>> #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question if we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;)
>>>>
>>>> Not even discussed yet.
>>>
>>> I haven't reviewed that yet. However, I'll note that allowing different encoding can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.
>>
>> I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread:
>>
>> http://www.syslog.cc/ietf/autoarc/msg00127.html
>>
>> This discussion led to RFC 3164 saying "other encodings MAY be used". While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places the restriction on us that we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively.
>>
>> I propose that we RECOMMEND UTF-8, which, if used, MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be any encoding just as in RFC 3164. I do not see any alternative to this.
>>
>> Rainer
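The sender-side preference order described in this message (convert to UTF-8 when the encoding is known, merely declare it when conversion is not possible, and say nothing when the encoding is unknown) can be sketched like this. The function name, the `can_transcode` flag and the exact element text are assumptions for illustration; labelling the UTF-8 case with [enc="utf-8"] follows Chris's suggestion elsewhere in the thread rather than anything stated in this message:

```python
from typing import Optional, Tuple


def prepare_msg(octets: bytes,
                known_encoding: Optional[str],
                can_transcode: bool = True) -> Tuple[Optional[str], bytes]:
    """Return (encoding element or None, payload) following the preference order above."""
    if known_encoding is None:
        # POSIX-style case: the application handed over octets with no label.
        return None, octets
    if can_transcode:
        # Preferred: hand the receiver UTF-8 it can always understand.
        text = octets.decode(known_encoding)
        return '[enc="utf-8"]', text.encode("utf-8")
    # Fallback: cannot convert, but can at least say what the octets are.
    return '[enc="%s"]' % known_encoding, octets


if __name__ == "__main__":
    latin1 = "Müller".encode("iso-8859-1")
    print(prepare_msg(latin1, "iso-8859-1"))
    print(prepare_msg(latin1, "iso-8859-1", can_transcode=False))
    print(prepare_msg(latin1, None))
```

Only the last branch is forced on a *nix syslogd; an embedded sender that controls its own strings should normally end up in the first one.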
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris:

I think having SD-ID with [enc="utf-8" lang="English"] may be a good approach. If different languages use utf-8 encoding, then "lang=" can distinguish them.

Also want to clarify that you suggest that if the message is in ASCII, it will not require SD-ID, but for all other encodings, SD-ID will be required.

Note most other encoding methods already imply the language used, for example, in Chinese, there are several encoding methods, Traditional Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese used in Mainland China is GBK, so if the message is in traditional Chinese char, it will be shown as [enc="Big5", lang="Traditional Chinese"], a little bit redundant. The Big5 also includes all English char so it can be a mix of Chinese and English.

Regards,
Sheran

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick)
Sent: Tuesday, November 29, 2005 10:22 AM
To: Rainer Gerhards
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer,

Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]

Thoughts?

All: Let's discuss this and close this issue.

Thanks,
Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote:

> Chris & WG,
>
>>> #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question if we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;)
>>>
>>> Not even discussed yet.
>>
>> I haven't reviewed that yet. However, I'll note that allowing different encoding can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.
>
> I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread:
>
> http://www.syslog.cc/ietf/autoarc/msg00127.html
>
> This discussion led to RFC 3164 saying "other encodings MAY be used". While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places the restriction on us that we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively.
>
> I propose that we RECOMMEND UTF-8, which, if used, MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be any encoding just as in RFC 3164. I do not see any alternative to this.
>
> Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris,

I think that is a good compromise. It would also enable us to convey the encoding information if we have it (anyhow, in that case it would be smarter to convert to UTF-8, but that's not yet important).

Rainer

> -Original Message-
> From: Chris Lonvick [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 29, 2005 7:22 PM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>
> Thoughts?
>
> All: Let's discuss this and close this issue.
>
> Thanks,
> Chris
>
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
>
>> Chris & WG,
>>
>>>> #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question if we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;)
>>>>
>>>> Not even discussed yet.
>>>
>>> I haven't reviewed that yet. However, I'll note that allowing different encoding can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.
>>
>> I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread:
>>
>> http://www.syslog.cc/ietf/autoarc/msg00127.html
>>
>> This discussion led to RFC 3164 saying "other encodings MAY be used". While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places the restriction on us that we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively.
>>
>> I propose that we RECOMMEND UTF-8, which, if used, MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be any encoding just as in RFC 3164. I do not see any alternative to this.
>>
>> Rainer
Re: [Syslog] #5 - character encoding (was: Consensus?)
> Hi Rainer,
>
> Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]
>
> Thoughts?

I think this is a very sensible approach.

Darren
RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Rainer,

Why don't we look at it from the other direction? We could state that any encoding is acceptable - for ease-of-use/migration with existing syslog implementations. It is RECOMMENDED that UTF-8 be used. When it is used, an SD-ID element will be REQUIRED. e.g. - [enc="utf-8" lang="en"]

Thoughts?

All: Let's discuss this and close this issue.

Thanks,
Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote:

> Chris & WG,
>
>>> #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question if we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;)
>>>
>>> Not even discussed yet.
>>
>> I haven't reviewed that yet. However, I'll note that allowing different encoding can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.
>
> I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread:
>
> http://www.syslog.cc/ietf/autoarc/msg00127.html
>
> This discussion led to RFC 3164 saying "other encodings MAY be used". While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places the restriction on us that we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively.
>
> I propose that we RECOMMEND UTF-8, which, if used, MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be any encoding just as in RFC 3164. I do not see any alternative to this.
>
> Rainer
RE: [Syslog] #5 - character encoding (was: Consensus?)
Chris & WG,

>> #5 Character encoding in MSG: due to my proof-of-concept implementation, I have raised the (ugly) question if we need to allow encodings other than UTF-8. Please note that this question arises from needs introduced by e.g. POSIX. So we can't easily argue them away by wishful thinking ;)
>>
>> Not even discussed yet.
>
> I haven't reviewed that yet. However, I'll note that allowing different encoding can be accomplished in the future as long as we establish a default encoding and a way to identify it in our current work.

I have read a little in the mailing archive. Please note that in 2000 it was consensus that the MSG part may contain encodings other than US-ASCII. Follow this thread:

http://www.syslog.cc/ietf/autoarc/msg00127.html

This discussion led to RFC 3164 saying "other encodings MAY be used". While this was observed behaviour, we still need to be aware that the POSIX (and glibc) API places the restriction on us that we simply do not know the character encoding used by the application. As such, no *nix syslogd can be programmed to be compliant with syslog-protocol if we demand UTF-8 exclusively.

I propose that we RECOMMEND UTF-8, which, if used, MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does not start with the BOM, it may be any encoding just as in RFC 3164. I do not see any alternative to this.

Rainer
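The BOM rule proposed above is straightforward to check on the receiving side. A minimal sketch, assuming the MSG part is available as raw bytes; the function name is hypothetical, and `codecs.BOM_UTF8` is the three-octet sequence EF BB BF from Python's standard library:

```python
import codecs
from typing import Optional, Tuple, Union


def classify_msg(msg: bytes) -> Tuple[Optional[str], Union[str, bytes]]:
    """Treat MSG as UTF-8 only if it starts with the UTF-8 BOM; otherwise leave it opaque."""
    if msg.startswith(codecs.BOM_UTF8):
        body = msg[len(codecs.BOM_UTF8):]
        return "utf-8", body.decode("utf-8")
    # No BOM: any encoding is possible, as under RFC 3164.
    return None, msg


if __name__ == "__main__":
    print(classify_msg(codecs.BOM_UTF8 + "Müller logged in".encode("utf-8")))
    print(classify_msg(b"M\xfcller logged in"))
```

A sender that does not know its encoding simply never emits the BOM, so a POSIX-based syslogd can remain compliant without guessing.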