Re: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread Tom Petch
Not sure which bits of MIME you have in mind but I like the term
Content-Transfer-Encoding, I like the list of such encodings, I like the list of
charsets and I like the way that the user/application gets to choose a suitable
delimiter for the various parts rather than have the protocol designer impose an
unsuitable one.  What don't I like?  The term charset, but I think that is too
well embedded to avoid.

Oh, and I love the way that the protocol proper is ASCII encoded, so easy to read
and display, unlike markup languages, binary encodings etc etc.

And I think its guidance about what to do when fields are absent or corrupt is
good, leading to a good chance of interoperability.

Tom Petch
----- Original Message -----
From: "Rainer Gerhards" <[EMAIL PROTECTED]>
To: "Tom Petch" <[EMAIL PROTECTED]>; "Chris Lonvick"
<[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, December 01, 2005 2:25 PM
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)


Well... let me rephrase it slightly ;) After -protocol is finished, we could
actually do something like syslog-mime, which could then describe this as an
optional feature. Might not even be as crazy as it sounds - at least if I look
at what has been suggested so far. syslog-mime might be a solution for some of
these needs.

Rainer

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Thursday, December 01, 2005 12:45 PM
> To: Tom Petch; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Tom, WG
>
> I am *not* kidding. If we go for an encoding header, why not
> use MIME?
>
> Rainer
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Tom Petch
> > Sent: Thursday, December 01, 2005 10:05 AM
> > To: Chris Lonvick
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [Syslog] #5 - character encoding (was: Consensus?)
> >
> > - Original Message -
> > From: "Chris Lonvick" <[EMAIL PROTECTED]>
> > To: "Rainer Gerhards" <[EMAIL PROTECTED]>
> > Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > Sent: Wednesday, November 30, 2005 2:18 PM
> > Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> >
> > > Hi Rainer,
> > >
> > > Let's use this email as an example.  :)  There is no
> > indication that I'm
> > > using US-ASCII encoding or that I'm writing in English.
> >
> > Actually, Chris, there is; when I receive this e-mail, the
> > header contains
> >
> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> >
> > so implicitly or explicitly you are telling me it is US-ASCII
> >
> > By contrast, e-mails from Rainer, contain
> >
> > Content-Type: text/plain; charset="iso-8859-1"
> > Content-Transfer-Encoding: quoted-printable
> >
> > This reply is in charset="iso-8859-1" but by default, my
> > Windows MUA replies in
> > the charset the message came in, so that
> > replying to you I could not then spell Müller properly which
> > I could do to
> > Rainer.  On the
> > other hand, when replying to you, Windows inserts > to denote
> > the incoming text
> > which it suppresses when I reply to Rainer.  And some e-mails
> > I receive are not
> > in US-ASCII but lack the charset= in which case the display
> > on screen is
> > somewhat or totally corrupted.
> >
> > So MIME does an ok job but can be fooled by the rest of the
> > system; if we can do
> > that well with syslog, we should be proud of ourselves.
> >
> > Tom Petch
> >
> >
> > ___
> > Syslog mailing list
> > Syslog@lists.ietf.org
> > https://www1.ietf.org/mailman/listinfo/syslog
> >


RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread David B Harrington
I suggest including wording to the effect:

"if no SD-ID encoding element is specified, then the encoding of the
content is implementation specific and it is RECOMMENDED that no
assumption be made about the encoding of the content." 
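
A minimal sketch of how a receiver might apply this wording, assuming the
bracketed [enc="..." lang="..."] shape used in the examples in this thread
(that layout is a proposal under discussion, not a finished syntax, and the
function name is hypothetical). It returns None when no encoding is declared,
so the caller is never handed an assumed encoding:

```python
import re
from typing import Optional

# Hypothetical pattern for the element shape used in this thread,
# e.g. [enc="utf-8" lang="en"]. It works on bytes because the encoding
# of the rest of the message is not known at this point.
_ENC_PARAM = re.compile(rb'\[[^\]]*\benc="([^"]*)"[^\]]*\]')


def declared_encoding(message: bytes) -> Optional[str]:
    """Return the declared encoding name, or None if nothing is declared."""
    match = _ENC_PARAM.search(message)
    if match is None:
        return None  # nothing declared: make no assumption about the content
    return match.group(1).decode("ascii", "replace").lower()
```

Returning None rather than a default such as "us-ascii" keeps "unknown"
distinguishable from "declared as US-ASCII".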

dbh

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Wednesday, November 30, 2005 6:24 AM
> To: [EMAIL PROTECTED]; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> 
> Andrew,
> 
> > >Hi Rainer,
> > >
> > >Why don't we look at it from the other direction?  We could 
> > state that any 
> > >encoding is acceptable - for ease-of-use/migration with 
> > existing syslog 
> > >implementations.  It is RECOMMENDED that UTF-8 be used.  
> When it is 
> > >used, an SD-ID element will be REQUIRED.  e.g. - 
> > [enc="utf-8" lang="en"]
> > 
> > I like that idea too.
> > 
> > So, if no SD-ID encoding element is specified, then we must 
> > assume US-ASCII
> > and deal with it accordingly??
> 
> I think not. If it is not present, we know that we do not know it. If
> it is US-ASCII, I would expect something like
> 
> [enc="us-ascii" lang="en"]
> 
> Of course, we could also say if it is non-present, we can assume
> US-ASCII. But then we would need to introduce
> 
> [enc="unknown"]
> 
> for the (common) case where we simply do not know it (again: think
> POSIX). I find this somehwat confusing.
> 
> Rainer
> 


RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread Rainer Gerhards
Well... let me rephrase it slightly ;) After -protocol is finished, we could 
actually do something like syslog-mime, which could then describe this as an 
optional feature. Might not even be as crazy as it sounds - at least if I look 
at what has been suggested so far. syslog-mime might be a solution for some of 
these needs.

Rainer 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Thursday, December 01, 2005 12:45 PM
> To: Tom Petch; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> 
> Tom, WG
> 
> I am *not* kidding. If we go for an encoding header, why not 
> use MIME? 
> 
> Rainer 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On Behalf Of Tom Petch
> > Sent: Thursday, December 01, 2005 10:05 AM
> > To: Chris Lonvick
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [Syslog] #5 - character encoding (was: Consensus?)
> > 
> > - Original Message -
> > From: "Chris Lonvick" <[EMAIL PROTECTED]>
> > To: "Rainer Gerhards" <[EMAIL PROTECTED]>
> > Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > Sent: Wednesday, November 30, 2005 2:18 PM
> > Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> > 
> > > Hi Rainer,
> > >
> > > Let's use this email as an example.  :)  There is no 
> > indication that I'm
> > > using US-ASCII encoding or that I'm writing in English.
> > 
> > Actually, Chris, there is; when I receive this e-mail, the 
> > header contains
> > 
> > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> > 
> > so implicitly or explicitly you are telling me it is US-ASCII
> > 
> > By contrast, e-mails from Rainer, contain
> > 
> > Content-Type: text/plain; charset="iso-8859-1"
> > Content-Transfer-Encoding: quoted-printable
> > 
> > This reply is in charset="iso-8859-1" but by default, my 
> > Windows MUA replies in
> > the charset the message came in, so that
> > replying to you I could not then spell Müller properly which 
> > I could do to
> > Rainer.  On the
> > other hand, when replying to you, Windows inserts > to denote 
> > the incoming text
> > which it suppresses when I reply to Rainer.  And some e-mails 
> > I receive are not
> > in US-ASCII but lack the charset= in which case the display 
> > on screen is
> > somewhat or totally corrupted.
> > 
> > So MIME does an ok job but can be fooled by the rest of the 
> > system; if we can do
> > that well with syslog, we should be proud of ourselves.
> > 
> > Tom Petch
> > 
> > 


RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread Rainer Gerhards
Tom,

I appreciate your point. My intention is:

-15 specifies that MSG must contain UTF-8 encoding exclusively (full character 
set). During implementation, I have seen that I can not obtain the encoding 
information for to-be-sent messages under Unix. In the meantime, Balazs 
Scheidler has suggested a potential way to do that; if it works, this would 
probably be a non-issue. For the time being, let's assume it can not be 
obtained. In many cases, I have different encodings, like ISO 8859-1 or EUC, 
at least in parts of the message. As I do not know the encoding, I can not 
properly convert it to Unicode. So the syslogd would send a non-compliant 
message. As there is a high chance of invalid UTF-8 sequences (as it is not 
UTF-8), a compliant receiving syslogd must drop this message because it is 
invalid. My point here is that I am not really interested in whether the 
sending syslogd is to blame or not. My point is that the message can not be 
received.
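
To make the failure mode concrete, here is a small sketch (the function name 
is hypothetical) showing that text encoded in ISO 8859-1 will typically not 
pass a strict UTF-8 check, which is the check a compliant receiver would be 
applying:

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode as strict UTF-8."""
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same name in two encodings: only the second is valid UTF-8.
latin1_octets = "Müller".encode("iso-8859-1")  # b'M\xfcller'
utf8_octets = "Müller".encode("utf-8")         # b'M\xc3\xbcller'
```

The octet 0xFC can never occur in well-formed UTF-8, so a strict receiver 
would reject the ISO 8859-1 form regardless of who is to blame for it.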

My proposal was to recommend UTF-8 whenever possible, but allow MSGs with 
unknown encoding when we can not obtain encoding information. To differentiate, 
I suggested that the Unicode BOM be used if it is UTF-8. Though there might 
still be a small window of misinterpretation, I'd expect that a UTF-8 encoded 
BOM is very unlikely to appear in the first three octets of an ordinary syslog 
message. I'd find this easy and acceptable.
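
A sketch of the BOM test described above, assuming the receiver already has 
the MSG part as raw octets (the function name is hypothetical):

```python
import codecs
from typing import Optional, Tuple


def split_bom(msg: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding, payload); encoding is None when no BOM is present."""
    # codecs.BOM_UTF8 is the three octets EF BB BF.
    if msg.startswith(codecs.BOM_UTF8):
        return "utf-8", msg[len(codecs.BOM_UTF8):]
    return None, msg  # encoding unknown, payload left untouched
```

A message without the BOM is passed through unchanged, which matches the 
"unknown encoding" case rather than forcing a guess.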

If the syslogd can reliably obtain the provided encoding - as Balazs thankfully 
mentioned - we could stick with UTF-8 only, as it would then be a non-issue. The 
only remaining issue would be whether we can expect implementors to 
implement a converter from any given character set to Unicode - but that's a 
different story.

The ever-changing fragile WG consensus at this time of the year seems to be 
that we are back to supporting all possible encodings to address the need I 
mentioned. While I do not really like this approach, it will allow me to do 
what I need to do. So I do not object to it.

I agree with you that we should not try to focus too much on backwards 
compatibility. But on the other hand, Vancouver told us people would like to 
see it. The list then said "oh no". A few days later we have multiple voices 
saying we must support this and that. I have to admit that I lose my sense of 
a stable consensus the longer I discuss this.

For me, I have decided to only voice my concerns if I believe something will be 
broken. Field order, field semantics and a lot of the other issues currently 
being re-re-re-re-considered are not really that important. Even if we end up 
with something totally horrible, I am sure it is possible to program a parser 
that handles it. After all, our parsers handle today's syslog - can it really 
become worse? I think there would be huge value in a syslog standard, no matter 
how ugly the details may look to some of us. After all, beauty is a very 
subjective concept ;)

I hope I have been able to convey my root concern on the encoding. On the other 
issues, I am waiting for WG consensus to be declared and then I will include 
that consensus, whatever it is, into the I-D. I just hope it'll stay stable 
long enough so that the I-D can proceed...

Rainer

> -Original Message-
> From: Tom Petch [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, December 01, 2005 9:25 AM
> To: Rainer Gerhards; Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Syslog] #5 - character encoding (was: Consensus?)
> 
> Rainer
> 
> I think I detect an approach I do not agree with, in this and 
> perhaps other
> issues.
> 
> You seem to be saying that the (eg POSIX) syslogd must emit 
> perfect syslog
> messages and is responsible for anything that is wrong with 
> them no matter what
> it received from the application (I exaggerate slightly).
> 
> I would say that if the application passes incomprehensible 
> garbage, something
> criminal or illegal, then it is the application that is at 
> fault; syslogd can
> only be held responsible if it produces messages that are 
> invalid for the parts
> over which it has control, eg header syntax.
> 
> So if syslogd has no idea what the transfer encoding is 
> because the rest of the
> system does not tell it, then syslogd cannot be held 
> responsible for the absence
> of a field saying what the transfer encoding actually is.  Or 
> put differently,
> if our RFC specify what the application MUST or SHOULD do, as 
> well as syslogd,
> then that is ok with me.
> 
> What syslogd would be responsible for, IMO, would be allowing 
> characters that
> have a special meaning in the syntax (eg NUL is end of 
> message) appearing
> unescaped (or otherwise encoded).  Whether we have such 
> problems depends on the
> resolution of other issues, not saying that we have at present.
> 
> Tom Petch
> 
> - Original Messag

RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread Rainer Gerhards
Tom, WG

I am *not* kidding. If we go for an encoding header, why not use MIME? 

Rainer 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Petch
> Sent: Thursday, December 01, 2005 10:05 AM
> To: Chris Lonvick
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Syslog] #5 - character encoding (was: Consensus?)
> 
> - Original Message -
> From: "Chris Lonvick" <[EMAIL PROTECTED]>
> To: "Rainer Gerhards" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Wednesday, November 30, 2005 2:18 PM
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> 
> > Hi Rainer,
> >
> > Let's use this email as an example.  :)  There is no 
> indication that I'm
> > using US-ASCII encoding or that I'm writing in English.
> 
> Actually, Chris, there is; when I receive this e-mail, the 
> header contains
> 
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> 
> so implicitly or explicitly you are telling me it is US-ASCII
> 
> By contrast, e-mails from Rainer, contain
> 
> Content-Type: text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
> 
> This reply is in charset="iso-8859-1" but by default, my 
> Windows MUA replies in
> the charset the message came in, so that
> replying to you I could not then spell Müller properly which 
> I could do to
> Rainer.  On the
> other hand, when replying to you, Windows inserts > to denote 
> the incoming text
> which it suppresses when I reply to Rainer.  And some e-mails 
> I receive are not
> in US-ASCII but lack the charset= in which case the display 
> on screen is
> somewhat or totally corrupted.
> 
> So MIME does an ok job but can be fooled by the rest of the 
> system; if we can do
> that well with syslog, we should be proud of ourselves.
> 
> Tom Petch
> 
> 


Re: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread Tom Petch
----- Original Message -----
From: "Chris Lonvick" <[EMAIL PROTECTED]>
To: "Rainer Gerhards" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 30, 2005 2:18 PM
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

> Hi Rainer,
>
> Let's use this email as an example.  :)  There is no indication that I'm
> using US-ASCII encoding or that I'm writing in English.

Actually, Chris, there is; when I receive this e-mail, the header contains

Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

so implicitly or explicitly you are telling me it is US-ASCII.

By contrast, e-mails from Rainer contain

Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

This reply is in charset="iso-8859-1", but by default my Windows MUA replies in
the charset the message came in, so when replying to you I could not spell
Müller properly, which I could do when replying to Rainer.  On the other hand,
when replying to you, Windows inserts > to denote the incoming text, which it
suppresses when I reply to Rainer.  And some e-mails I receive are not in
US-ASCII but lack the charset= parameter, in which case the display on screen
is somewhat or totally corrupted.

So MIME does an ok job but can be fooled by the rest of the system; if we can do
that well with syslog, we should be proud of ourselves.
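
For readers who want to see the two headers at work, here is a small sketch
using Python's standard email package. The sample message and the function
name are made up for illustration, and falling back to US-ASCII with
replacement characters when charset= is missing is an assumption of this
sketch, not something MIME mandates:

```python
import email
from typing import Optional, Tuple

# Illustrative message: "Müller" in ISO 8859-1, quoted-printable encoded.
RAW_MAIL = (
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "M=FCller"
)


def read_body(raw: str) -> Tuple[Optional[str], str]:
    """Return (declared charset or None, decoded body text)."""
    msg = email.message_from_string(raw)
    charset = msg.get_content_charset()    # None when charset= is absent
    octets = msg.get_payload(decode=True)  # undoes the transfer encoding
    # Assumption of this sketch: with no charset, guess US-ASCII and mark
    # anything else with replacement characters.
    text = octets.decode(charset or "us-ascii", errors="replace")
    return charset, text
```

With the sample above the charset comes back as iso-8859-1 and the body as
Müller; without the charset parameter the receiver is left guessing, which is
the corrupted-display case described above.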

Tom Petch




Re: [Syslog] #5 - character encoding (was: Consensus?)

2005-12-01 Thread Tom Petch
Rainer

I think I detect an approach I do not agree with, in this and perhaps other
issues.

You seem to be saying that the (eg POSIX) syslogd must emit perfect syslog
messages and is responsible for anything that is wrong with them no matter what
it received from the application (I exaggerate slightly).

I would say that if the application passes incomprehensible garbage, something
criminal or illegal, then it is the application that is at fault; syslogd can
only be held responsible if it produces messages that are invalid for the parts
over which it has control, eg header syntax.

So if syslogd has no idea what the transfer encoding is because the rest of the
system does not tell it, then syslogd cannot be held responsible for the absence
of a field saying what the transfer encoding actually is.  Or put differently,
if our RFC specifies what the application MUST or SHOULD do, as well as syslogd,
then that is ok with me.

What syslogd would be responsible for, IMO, would be allowing characters that
have a special meaning in the syntax (eg NUL is end of message) to appear
without being escaped (or otherwise encoded).  Whether we have such problems
depends on the resolution of other issues; I am not saying that we have any at
present.

Tom Petch

----- Original Message -----
From: "Rainer Gerhards" <[EMAIL PROTECTED]>
To: "Chris Lonvick" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 30, 2005 2:48 PM
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)


Chris,

I fully agree - thanks ;)

Rainer

> -Original Message-
> From: Chris Lonvick [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 30, 2005 2:39 PM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> I believe that we are saying the same thing.  :)
>
> If there is no indicator of encoding or language then a
> receiver will not
> know what it is receiving - just like receivers don't know
> what they are
> receiving today.  They MAY make an assumption that it is something in
> US-ASCII (but may be disappointed).
>
> If there is an indicator of the encoding and language then
> the receiver
> will know exactly what it is.  Having an indicator should be
> RECOMMENDED
> but not REQUIRED for ease of migration.
>
> Is that what we're all saying?
>
> Thanks,
> Chris
>
>
>
> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
>
> > Chris,
> >
> >> Let's use this email as an example.  :)  There is no
> >> indication that I'm
> >> using US-ASCII encoding or that I'm writing in English.
> >
> > I think there actually is. If I am right, the SMTP RFCs
> require mail text to be US-ASCII. Only via MIME and/or escape
> characters you can include 8-bit data. For example Müller and
> Möller might create some problems in some mailers (But I
> guess my Mail system will encode them with =).
> Dropping messages with octets > 127 in the subject is a
> common spam protection setting...
> >
> >> However, you're
> >> able to receive this and read it.  Similarly, you could write
> >> an email in
> >> German and send it to me.  I would still be able to receive
> >> it but I'd
> >> have a difficult time parsing the meaning.
> >>
> >> I'm suggesting that same approach for the transmission of
> the syslog
> >> content.  If I really wanted you to know what encoding and
> >> language I'm
> >> using in an email, I would specify a mime header.  syslog
> >> senders will
> >> continue to pump out whatever encoding and language they've
> >> been using
> >> and receivers will continue to do their best to parse them.
> >> If a vendor
> >> wants to get very specific about that, then they will have to
> >> use an SD-ID
> >> to identify the contents of the message.
> >
> > Here I agree with you. What I was saying is that IF the
> header says it is US-ASCII, only then we should assume it
> actually is. If there is no "enc" SD-ID, then we do not know
> what it is but can assume ... whatever we assume. Let me
> phrase it that way:
> >
> > If the message contains
> >
> > [enc="us-ascii" lang="en"]
> >
> > then the receiver can honestly expect it to be US-ASCII.
> But if it does not contain any "enc" the receiver does not
> know exactly and assume anything it finds useful (may be
> ASCII, may not).
> >
> > Does this clarify? I somehow have the impression we mean
> the same thing and I simply do not manage to c

[Fwd: RE: [Syslog] #5 - character encoding (was: Consensus?)]

2005-12-01 Thread Balazs Scheidler
Missed reply-all...


-------- Forwarded Message --------
> From: Balazs Scheidler <[EMAIL PROTECTED]>
> To: Rainer Gerhards <[EMAIL PROTECTED]>
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> Date: Thu, 01 Dec 2005 10:55:42 +0100
> 
> On Wed, 2005-11-30 at 09:01 +0100, Rainer Gerhards wrote:
> > Sheran, 
> > 
> > > Also want to clarify that you suggest that if the message is in ASCII,
> > > it will not require an SD-ID, but for all other encodings, an SD-ID will be
> > > required.
> > 
> > Unfortunately, we can not do this. If we would know the encoding, we
> > could translate it to UTF-8, as so far is required by syslog-protocol.
> > However, we often do not know which encoding it is. The reason is that
> > the POSIX syslog API does not tell us. So if we want to support POSIX
> > (which I think we must), we must allow a syslog sender to send messages
> > without telling the encoding - simply because it has no way to obtain
> > that knowledge.
> > 
> > A syslog sender embedded e.g. in a device does probably not have this
> > restriction. So it SHOULD encode in UTF-8. That will ensure the receiver
> > can understand it. If the sender has absolutely no idea of how to do
> > that, but knows the encoding, then (and only then) it SHOULD specify the
> > encoding.
> 
> Just a small note, there is a way in the syslog() libc function to
> recover current encoding information based on the contents of the
> LC_CTYPE (or LANG) environment variable. So although the API does not
> explicitly contain parameters to specify encoding, the program
> environment contains this information. You are right that the standard
> POSIX API without any changes will send unfiltered/unconverted strings
> to syslog without any encoding information, but it is not impossible to
> create a replacement for syslog(3) that actually delivers this
> information while staying compatible with the POSIX API.
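
A rough sketch of the environment lookup described above, written in Python
for brevity rather than as a libc replacement. The function name is
hypothetical. Checking LC_ALL first is an addition based on the usual POSIX
precedence (LC_ALL, then LC_CTYPE, then LANG); the mail itself only mentions
LC_CTYPE and LANG:

```python
from typing import Mapping, Optional


def codeset_from_env(env: Mapping[str, str]) -> Optional[str]:
    """Return the codeset named by the locale environment, or None."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if not value:
            continue  # unset or empty: fall through to the next variable
        # Locale names look like language[_territory][.codeset][@modifier].
        _, dot, rest = value.partition(".")
        if not dot:
            return None  # e.g. "C" or "de_DE": no codeset is stated
        codeset = rest.split("@", 1)[0]
        return codeset or None
    return None
```

A caller would typically pass os.environ. A None result corresponds to the
case where the sender still does not know the encoding.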
> 
> The way I see it:
> - have the SD-ID to specify encoding and use that if available
> - if there is no SD-ID (legacy applications) then assume US-ASCII and
> let the administrator override this on a per-source basis (using a
> SHOULD clause)
> - implementation SHOULD validate (and possibly convert) incoming
> messages and SHOULD allow the administrator to choose what to do with
> non-conforming characters (drop, substitute, leave it as is)
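
The three administrator choices in the last item (drop, substitute, leave it
as is) could be sketched as follows. The function and policy names are
hypothetical, and mapping "leave it as is" onto Python's surrogateescape
handler is one possible reading, chosen because it lets the original octets be
recovered later:

```python
from typing import Optional


def apply_policy(octets: bytes, encoding: str = "us-ascii",
                 policy: str = "substitute") -> Optional[str]:
    """Validate octets against an encoding; None means the message is dropped."""
    if policy == "drop":
        try:
            return octets.decode(encoding, errors="strict")
        except UnicodeDecodeError:
            return None
    if policy == "substitute":
        # Each non-conforming octet becomes U+FFFD.
        return octets.decode(encoding, errors="replace")
    if policy == "leave":
        # Non-conforming octets are carried through and can be re-encoded.
        return octets.decode(encoding, errors="surrogateescape")
    raise ValueError(f"unknown policy: {policy!r}")
```

The encoding argument stands in for the per-source override: an administrator
who knows a source sends ISO 8859-1 would pass that instead of the US-ASCII
default.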
> 
> 
-- 
Bazsi




RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Shyyunn Lin (sheranl)
Chris:

I agree with all your points. Recommend an encoding and standard lang
tag, and accept all other encoding and lang specification.

Regards,
 
Sheran

-----Original Message-----
From: Chris Lonvick (clonvick) 
Sent: Wednesday, November 30, 2005 5:06 AM
To: Shyyunn Lin (sheranl)
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Sheran,

On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote:

> Chris:
>
> I think having SD-ID with [enc="utf-8" lang="English"] may be a good 
> approach. If different languages use utf-8 encoding, then "lang=" can 
> distinguish them.

We _should_ be using language codes from RFC 3066.  That specifies ISO
639 language tags.  639-1 has 2 character codes ("en" is English) and
639-2 has 3 characters ("eng" is English).  RFC 3066 will likely be
replaced by the works of the Language Tag Registry Update (ltru) Working
Group.
   http://www.ietf.org/html.charters/ltru-charter.html
They have IDs in the works.  Until those become RFCs we should continue
to reference RFC 3066.

>
> Also want to clarify that you suggest that if the message is in ASCII,
> it will not require an SD-ID, but for all other encodings, an SD-ID will be
> required.

Yes - that's my suggestion.

>
> Note most other encoding methods already imply the language used, for 
> example, in Chinese, there are several encoding methods, Traditional 
> Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese 
> used in Mainland China is GBK, so if the message is in traditional 
> Chinese char, it will be shown as [enc="Big5", lang="Traditional 
> Chinese"], a little bit redundant. The Big5 also includes all English 
> char so it can be a mix of Chinese and English.

Good point.  As far as I can tell, "Big5" is not recognized by any
accredited standards developing organization.  It is recognized by the
Ideographic Rapporteur Group (IRG) which reports to the Unicode
consortium.  The recognized way to represent Chinese characters,
traditional and simplified, is through ISO 639-2 with the subcodes to
indicate traditional and simplified for the "zh" _language_.  The ID on
"Tags for Identifying Languages"

   http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt

identifies simplified Chinese as "zh-Hans" and traditional Chinese as
"zh-Hant".  Additional subtags could identify a locale such as
"zh-Hant-TW" for Taiwan Chinese in traditional script.  This is from the
"Initial Language Subtag Registry" ID.

http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt
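
To illustrate how tags such as "zh-Hans" and "zh-Hant-TW" break down, here is
a deliberately simplified sketch. It only handles the language, script and
region positions shown in the examples above and is not a full parser for the
ltru drafts; the function name is hypothetical:

```python
from typing import Dict, Optional


def parse_language_tag(tag: str) -> Dict[str, Optional[str]]:
    """Split a tag like 'zh-Hant-TW' into language, script and region."""
    subtags = tag.split("-")
    result: Dict[str, Optional[str]] = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
    }
    rest = subtags[1:]
    # A four-letter subtag directly after the language is a script.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0).title()
    # A two-letter (or three-digit) subtag is a region.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        result["region"] = rest.pop(0).upper()
    return result
```

For "zh-Hant-TW" this yields language "zh", script "Hant" and region "TW";
a bare "en" yields only the language.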

I think that we should specify encoding and language tags as
straightforwardly as possible and let others augment syslog-protocol (in
the future) with other encoding mechanisms.  We can RECOMMEND that encoding
be in UTF-8 and language tags come from RFC 3066.  We can allow that
other encoding and language identifications are acceptable.  In the
worst case, a vendor will have the option of [EMAIL PROTECTED]"something"
[EMAIL PROTECTED]"piglatin"].

Does this work for you?

Thanks,
Chris

>
>
>
> Regards,
>
> Sheran
>
> -Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick
> (clonvick)
> Sent: Tuesday, November 29, 2005 10:22 AM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
>
> Hi Rainer,
>
> Why don't we look at it from the other direction?  We could state that
> any encoding is acceptable - for ease-of-use/migration with existing 
> syslog implementations.  It is RECOMMENDED that UTF-8 be used.  When 
> it is used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8"
> lang="en"]
>
> Thoughts?
>
> All:  Let's discuss this and close this issue.
>
> Thanks,
> Chris
>
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
>
>> Chris & WG,
>>
>>>> #5 Character encoding in MSG: due to my proof-of-concept
>>>>   implementation, I have raised the (ugly) question if we need
>>>>   to allow encodings other than UTF-8. Please note that this
>>>>   question arises from needs introduced by e.g. POSIX. So we
>>>>   can't easily argue them away by wishful thinking ;)
>>>>
>>>> Not even discussed yet.
>>>
>>> I haven't reviewed that yet.  However, I'll note that allowing 
>>> different encoding can be accomplished in the future as long as we 
>>> establish a default encoding and a way to identify it in our current
>>> work.
>>
>> I have read a little in the mailing archive. Please note that in 2000

RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Rainer Gerhards
Chris,

I fully agree - thanks ;)

Rainer 

> -Original Message-
> From: Chris Lonvick [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, November 30, 2005 2:39 PM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> 
> Hi Rainer,
> 
> I believe that we are saying the same thing.  :)
> 
> If there is no indicator of encoding or language then a 
> receiver will not 
> know what it is receiving - just like receivers don't know 
> what they are 
> receiving today.  They MAY make an assumption that it is something in 
> US-ASCII (but may be disappointed).
> 
> If there is an indicator of the encoding and language then 
> the receiver 
> will know exactly what it is.  Having an indicator should be 
> RECOMMENDED 
> but not REQUIRED for ease of migration.
> 
> Is that what we're all saying?
> 
> Thanks,
> Chris
> 
> 
> 
> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
> 
> > Chris,
> >
> >> Let's use this email as an example.  :)  There is no
> >> indication that I'm
> >> using US-ASCII encoding or that I'm writing in English.
> >
> > I think there actually is. If I am right, the SMTP RFCs 
> require mail text to be US-ASCII. Only via MIME and/or escape 
> characters you can include 8-bit data. For example Müller and 
> Möller might create some problems in some mailers (But I 
> guess my Mail system will encode them with =). 
> Dropping messages with octets > 127 in the subject is a 
> common spam protection setting...
> >
> >> However, you're
> >> able to receive this and read it.  Similarly, you could write
> >> an email in
> >> German and send it to me.  I would still be able to receive
> >> it but I'd
> >> have a difficult time parsing the meaning.
> >>
> >> I'm suggesting that same approach for the transmission of 
> the syslog
> >> content.  If I really wanted you to know what encoding and
> >> language I'm
> >> using in an email, I would specify a mime header.  syslog
> >> senders will
> >> continue to pump out whatever encoding and language they've
> >> been using
> >> and receivers will continue to do their best to parse them.
> >> If a vendor
> >> wants to get very specific about that, then they will have to
> >> use an SD-ID
> >> to identify the contents of the message.
> >
> > Here I agree with you. What I was saying is that IF the 
> header says it is US-ASCII, only then we should assume it 
> actually is. If there is no "enc" SD-ID, then we do not know 
> what it is but can assume ... whatever we assume. Let me 
> phrase it that way:
> >
> > If the message contains
> >
> > [enc="us-ascii" lang="en"]
> >
> > then the receiver can honestly expect it to be US-ASCII. 
> But if it does not contain any "enc" the receiver does not 
> know exactly and assume anything it finds useful (may be 
> ASCII, may not).
> >
> > Does this clarify? I somehow have the impression we mean 
> the same thing and I simply do not manage to convey what I 
> intend to ;)
> >
> > Rainer
> >
> >>
> >> Mit Aufrichtigkeit,
> >> Chris
> >>
> >>
> >>
> >>
> >> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
> >>
> >>> Andrew,
> >>>
> >>>>> Hi Rainer,
> >>>>>
> >>>>> Why don't we look at it from the other direction?  We could state
> >>>>> that any encoding is acceptable - for ease-of-use/migration with
> >>>>> existing syslog implementations.  It is RECOMMENDED that UTF-8 be
> >>>>> used.  When it is used, an SD-ID element will be REQUIRED.
> >>>>> e.g. - [enc="utf-8" lang="en"]
> >>>>
> >>>> I like that idea too.
> >>>>
> >>>> So, if no SD-ID encoding element is specified, then we must assume
> >>>> US-ASCII and deal with it accordingly??
> >>>
> >>> I think not. If it is not present, we know that we do not know it.
> >>> If it is US-ASCII, I would expect something like
> >>>
> >>> [enc="us-ascii" lang="en"]
> >>>
> >>> Of course, we could also say if it is non-present, we can assume
> >>> US-ASCII. But then we would need to introduce
> >>>
> >>> [enc="unknown"]
> >>>
> >>> for the (common) case where we simply do not know it (again: think
> >>> POSIX). I find this somewhat confusing.
> >>>
> >>> Rainer
> >>>
> >>
> >
> 

___
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog


RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Chris Lonvick

Hi Rainer,

I believe that we are saying the same thing.  :)

If there is no indicator of encoding or language then a receiver will not
know what it is receiving - just like receivers don't know what they are
receiving today.  They MAY assume that it is US-ASCII (but may be
disappointed).


If there is an indicator of the encoding and language then the receiver 
will know exactly what it is.  Having an indicator should be RECOMMENDED 
but not REQUIRED for ease of migration.


Is that what we're all saying?

Thanks,
Chris



On Wed, 30 Nov 2005, Rainer Gerhards wrote:


> Chris,
>
>> Let's use this email as an example.  :)  There is no indication that
>> I'm using US-ASCII encoding or that I'm writing in English.
>
> I think there actually is. If I am right, the SMTP RFCs require mail
> text to be US-ASCII. Only via MIME and/or escape characters can you
> include 8-bit data. For example, Müller and Möller might create
> problems in some mailers (but I guess my mail system will encode them
> with =). Dropping messages with octets > 127 in the subject is a
> common spam protection setting...
>
>> However, you're able to receive this and read it.  Similarly, you
>> could write an email in German and send it to me.  I would still be
>> able to receive it but I'd have a difficult time parsing the meaning.
>>
>> I'm suggesting that same approach for the transmission of the syslog
>> content.  If I really wanted you to know what encoding and language
>> I'm using in an email, I would specify a MIME header.  syslog senders
>> will continue to pump out whatever encoding and language they've been
>> using and receivers will continue to do their best to parse them.  If
>> a vendor wants to get very specific about that, then they will have
>> to use an SD-ID to identify the contents of the message.
>
> Here I agree with you. What I was saying is that only IF the header
> says it is US-ASCII should we assume it actually is. If there is no
> "enc" SD-ID, then we do not know what it is but can assume ...
> whatever we assume. Let me phrase it this way:
>
> If the message contains
>
> [enc="us-ascii" lang="en"]
>
> then the receiver can honestly expect it to be US-ASCII. But if it
> does not contain any "enc", the receiver does not know exactly and
> can assume anything it finds useful (may be ASCII, may not).
>
> Does this clarify? I somehow have the impression we mean the same
> thing and I simply do not manage to convey what I intend to ;)
>
> Rainer
>
>> Mit Aufrichtigkeit,
>> Chris
>>
>> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
>>
>>> Andrew,
>>>
>>>>> Hi Rainer,
>>>>>
>>>>> Why don't we look at it from the other direction?  We could state
>>>>> that any encoding is acceptable - for ease-of-use/migration with
>>>>> existing syslog implementations.  It is RECOMMENDED that UTF-8 be
>>>>> used.  When it is used, an SD-ID element will be REQUIRED.
>>>>> e.g. - [enc="utf-8" lang="en"]
>>>>
>>>> I like that idea too.
>>>>
>>>> So, if no SD-ID encoding element is specified, then we must assume
>>>> US-ASCII and deal with it accordingly??
>>>
>>> I think not. If it is not present, we know that we do not know it.
>>> If it is US-ASCII, I would expect something like
>>>
>>> [enc="us-ascii" lang="en"]
>>>
>>> Of course, we could also say if it is non-present, we can assume
>>> US-ASCII. But then we would need to introduce
>>>
>>> [enc="unknown"]
>>>
>>> for the (common) case where we simply do not know it (again: think
>>> POSIX). I find this somewhat confusing.
>>>
>>> Rainer





RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Rainer Gerhards
Chris,

> Let's use this email as an example.  :)  There is no 
> indication that I'm 
> using US-ASCII encoding or that I'm writing in English.  

I think there actually is. If I am right, the SMTP RFCs require mail text to be
US-ASCII. Only via MIME and/or escape characters can you include 8-bit data.
For example, Müller and Möller might create problems in some mailers (but I
guess my mail system will encode them with =). Dropping messages with
octets > 127 in the subject is a common spam protection setting...
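
To illustrate the MIME escape mentioned above, here is a minimal Python sketch using the standard library's email.header module. It shows how a name such as Müller can travel in an ASCII-only header as an RFC 2047 encoded-word. The helper names and the choice of ISO-8859-1 are assumptions made for the example; the thread does not specify them.

```python
from email.header import Header, decode_header


def encode_header_value(text: str, charset: str = "iso-8859-1") -> str:
    """Return an ASCII-only encoded-word form of text (hypothetical helper)."""
    return Header(text, charset).encode()


def decode_header_value(value: str) -> str:
    """Reverse the encoding, using the charset named inside each encoded-word."""
    parts = []
    for raw, charset in decode_header(value):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "ascii"))
        else:
            # Plain, unencoded header text comes back as str.
            parts.append(raw)
    return "".join(parts)


encoded = encode_header_value("Müller")
print(encoded)                       # ASCII-only wire form
print(decode_header_value(encoded))  # back to the original text
```

The charset name is carried inside the header value itself, which is the property the "enc" idea for syslog tries to borrow.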

> However, you're able to receive this and read it.  Similarly, you
> could write an email in German and send it to me.  I would still be
> able to receive it but I'd have a difficult time parsing the meaning.
>
> I'm suggesting that same approach for the transmission of the syslog
> content.  If I really wanted you to know what encoding and language
> I'm using in an email, I would specify a MIME header.  syslog senders
> will continue to pump out whatever encoding and language they've been
> using and receivers will continue to do their best to parse them.  If
> a vendor wants to get very specific about that, then they will have
> to use an SD-ID to identify the contents of the message.

Here I agree with you. What I was saying is that only IF the header says it is
US-ASCII should we assume it actually is. If there is no "enc" SD-ID, then we
do not know what it is but can assume ... whatever we assume.
Let me phrase it this way:

If the message contains

[enc="us-ascii" lang="en"]

then the receiver can honestly expect it to be US-ASCII. But if it does not
contain any "enc", the receiver does not know exactly and can assume anything
it finds useful (may be ASCII, may not).
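
The receiver behaviour described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element syntax is the bracketed form used in this thread, the function names are hypothetical, and falling back to ASCII with replacement characters is one possible "whatever we assume" policy, not something the draft mandates.

```python
import re
from typing import Dict, Optional, Tuple

# Matches name="value" pairs inside an element such as [enc="utf-8" lang="en"].
PARAM_RE = re.compile(r'([A-Za-z0-9_-]+)="([^"]*)"')


def sd_params(element: str) -> Dict[str, str]:
    """Extract the parameters of one bracketed element (hypothetical helper)."""
    return dict(PARAM_RE.findall(element))


def decode_msg(msg: bytes, element: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (text, trusted_encoding); trusted_encoding is None when unknown."""
    enc = sd_params(element).get("enc") if element else None
    if enc is not None:
        try:
            # The sender declared an encoding, so decode strictly.
            return msg.decode(enc), enc
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name or a false declaration: treat as undeclared
    # No usable declaration: assumed policy is ASCII, marking everything else.
    return msg.decode("ascii", errors="replace"), None


print(decode_msg(b"disk full", '[enc="us-ascii" lang="en"]'))
print(decode_msg(b"M\xfcller"))
```

Returning None for the encoding keeps "we know that we do not know it" visible to the caller instead of silently pretending the content was ASCII.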

Does this clarify? I somehow have the impression we mean the same thing and I 
simply do not manage to convey what I intend to ;)

Rainer

> 
> Mit Aufrichtigkeit,
> Chris
> 
> 
> 
> 
> On Wed, 30 Nov 2005, Rainer Gerhards wrote:
> 
> > Andrew,
> >
> >>> Hi Rainer,
> >>>
> >>> Why don't we look at it from the other direction?  We could state
> >>> that any encoding is acceptable - for ease-of-use/migration with
> >>> existing syslog implementations.  It is RECOMMENDED that UTF-8 be
> >>> used.  When it is used, an SD-ID element will be REQUIRED.
> >>> e.g. - [enc="utf-8" lang="en"]
> >>
> >> I like that idea too.
> >>
> >> So, if no SD-ID encoding element is specified, then we must
> >> assume US-ASCII
> >> and deal with it accordingly??
> >
> > I think not. If it is not present, we know that we do not know it.
> > If it is US-ASCII, I would expect something like
> >
> > [enc="us-ascii" lang="en"]
> >
> > Of course, we could also say if it is non-present, we can assume
> > US-ASCII. But then we would need to introduce
> >
> > [enc="unknown"]
> >
> > for the (common) case where we simply do not know it (again: think
> > POSIX). I find this somewhat confusing.
> >
> > Rainer
> >
> 



RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Rainer Gerhards
Chris,

I agree to all but one point - only that one quoted here...


> > Also want to clarify that you suggest that if the message is in
> > ASCII, it will not require an SD-ID, but for all other encodings, an
> > SD-ID will be required.
> 
> Yes - that's my suggestion.

I am sorry, we cannot do this.  The whole issue is rooted in POSIX
APIs. You need to look at why this is such a problem. On Windows, you
know what character encodings you are dealing with. On Unix, you
actually just get a bunch of octets - and nobody tells you what it is.
So the poor Unix syslogd actually has no idea of what it handles and
likewise does not know what to place in that field ;) If it knew it was
this or that encoding, I would be very tempted to require it to convert
to UTF-8. But the need behind this encoding is *NOT* to allow the
multitude of whatever currently is in existence but rather to provide a
way to let a syslogd that needs to emit a "bunch of octets" do that.
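
The "bunch of octets" problem can be made concrete with a short Python sketch. It is an illustration only: the function name and the list of candidate encodings are assumptions, and it does not call the POSIX syslog API itself. It simply shows that the same octets decode without error under more than one encoding, so the octets alone cannot tell a syslogd which one the application meant.

```python
from typing import Dict, Iterable


def plausible_decodings(
    octets: bytes,
    candidates: Iterable[str] = ("utf-8", "iso-8859-1", "cp1252"),
) -> Dict[str, str]:
    """Return every candidate encoding under which the octets decode cleanly."""
    results = {}
    for name in candidates:
        try:
            results[name] = octets.decode(name)
        except UnicodeDecodeError:
            # These octets are not valid in this encoding; rule it out.
            pass
    return results


# "Müller" as written by a UTF-8 application and by an ISO-8859-1 application.
for sample in (b"M\xc3\xbcller", b"M\xfcller"):
    print(sample, plausible_decodings(sample))
```

The UTF-8 sample is accepted by all three candidates (with different resulting text), while the ISO-8859-1 sample is rejected only by UTF-8. A daemon can therefore sometimes rule an encoding out, but it cannot positively identify one.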

Does this clarify? I can provide code if that would be helpful...

Rainer



RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Chris Lonvick

Hi Rainer,

Let's use this email as an example.  :)  There is no indication that I'm
using US-ASCII encoding or that I'm writing in English.  However, you're
able to receive this and read it.  Similarly, you could write an email in
German and send it to me.  I would still be able to receive it but I'd
have a difficult time parsing the meaning.


I'm suggesting that same approach for the transmission of the syslog
content.  If I really wanted you to know what encoding and language I'm
using in an email, I would specify a MIME header.  syslog senders will
continue to pump out whatever encoding and language they've been using
and receivers will continue to do their best to parse them.  If a vendor
wants to get very specific about that, then they will have to use an SD-ID
to identify the contents of the message.


Mit Aufrichtigkeit,
Chris




On Wed, 30 Nov 2005, Rainer Gerhards wrote:


> Andrew,
>
>>> Hi Rainer,
>>>
>>> Why don't we look at it from the other direction?  We could state
>>> that any encoding is acceptable - for ease-of-use/migration with
>>> existing syslog implementations.  It is RECOMMENDED that UTF-8 be
>>> used.  When it is used, an SD-ID element will be REQUIRED.
>>> e.g. - [enc="utf-8" lang="en"]
>>
>> I like that idea too.
>>
>> So, if no SD-ID encoding element is specified, then we must assume
>> US-ASCII and deal with it accordingly??
>
> I think not. If it is not present, we know that we do not know it. If
> it is US-ASCII, I would expect something like
>
> [enc="us-ascii" lang="en"]
>
> Of course, we could also say if it is non-present, we can assume
> US-ASCII. But then we would need to introduce
>
> [enc="unknown"]
>
> for the (common) case where we simply do not know it (again: think
> POSIX). I find this somewhat confusing.
>
> Rainer





RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Chris Lonvick

Hi Sheran,

On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote:


> Chris:
>
> I think having SD-ID with [enc="utf-8" lang="English"] may be a good
> approach. If different languages use UTF-8 encoding, then "lang=" can
> distinguish them.


We _should_ be using language codes from RFC 3066.  That specifies ISO 639 
language tags.  639-1 has 2 character codes ("en" is English) and 639-2 
has 3 characters ("eng" is English).  RFC 3066 will likely be replaced by 
the works of the Language Tag Registry Update (ltru) Working Group.

  http://www.ietf.org/html.charters/ltru-charter.html
They have IDs in the works.  Until those become RFCs we should continue to 
reference RFC 3066.
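
A value such as "en" or "eng" can be checked against the RFC 3066 tag syntax (a primary subtag of 1 to 8 letters, followed by optional subtags of 1 to 8 letters or digits). The Python sketch below is a minimal illustration with a hypothetical function name. It checks the shape of the tag only, not whether the code is actually registered, so a string like "English" passes the syntax check even though it is not an ISO 639 code.

```python
import re

# RFC 3066 syntax: 1*8ALPHA followed by zero or more "-" 1*8(ALPHA / DIGIT).
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_wellformed_language_tag(tag: str) -> bool:
    """True if tag has the RFC 3066 shape; registry lookup is out of scope."""
    return TAG_RE.fullmatch(tag) is not None


for candidate in ("en", "eng", "zh-Hant-TW", "Traditional Chinese"):
    print(candidate, is_wellformed_language_tag(candidate))
```

A receiver that wants to reject unregistered values would need a copy of the ISO 639 code lists in addition to this check.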




> Also want to clarify that you suggest that if the message is in ASCII,
> it will not require an SD-ID, but for all other encodings, an SD-ID
> will be required.


Yes - that's my suggestion.



> Note that most other encoding methods already imply the language used.
> For example, there are several encoding methods for Chinese:
> Traditional Chinese, used in Taiwan and Hong Kong, is Big5, and
> Simplified Chinese, used in Mainland China, is GBK. So if the message
> is in Traditional Chinese characters, it will be shown as [enc="Big5",
> lang="Traditional Chinese"], which is a little redundant. Big5 also
> includes all English characters, so a message can be a mix of Chinese
> and English.


Good point.  As far as I can tell, "Big5" is not recognized by any 
accredited standards developing organization.  It is recognized by the 
Ideographic Rapporteur Group (IRG) which reports to the Unicode 
consortium.  The recognized way to represent Chinese characters, 
traditional and simplified, is through ISO 639-2 with the subcodes to 
indicate traditional and simplified for the "zh" _language_.  The ID on 
"Tags for Identifying Languages"


  http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt

identifies simplified Chinese as "zh-Hans" and traditional Chinese as 
"zh-Hant".  Additional subtags could identify a locale such as 
"zh-Hant-TW" for Taiwan Chinese in traditional script.  This is from the 
"Initial Language Subtag Registry" ID.


http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt
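
The subtag structure of tags such as "zh-Hant-TW" can be sketched as a small parser. This is an illustration under assumptions: the function name is hypothetical, and the length-based rules (four letters for a script, two letters or three digits for a region) follow the convention of the ltru drafts rather than anything syslog-protocol defines.

```python
from typing import Dict, Optional


def split_language_tag(tag: str) -> Dict[str, Optional[str]]:
    """Split a tag into language, script and region by subtag length."""
    parts = tag.split("-")
    result: Dict[str, Optional[str]] = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
    }
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha() and result["script"] is None:
            # Four letters: script subtag, conventionally written in title case.
            result["script"] = sub.title()
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            # Two letters or three digits: region subtag, conventionally upper case.
            result["region"] = sub.upper()
    return result


for tag in ("zh-Hans", "zh-Hant", "zh-Hant-TW", "en"):
    print(tag, split_language_tag(tag))
```

Because the script is carried in the language tag, a message encoded in UTF-8 can still say whether it uses traditional or simplified characters.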

I think that we should specify encoding and language tags as
straightforwardly as possible and let others augment syslog-protocol (in the
future) with other encoding mechanisms.  We can RECOMMEND that encoding be 
in UTF-8 and language tags come from RFC 3066.  We can allow that other 
encoding and language identifications are acceptable.  In the worst case, 
a vendor will have the option of [EMAIL PROTECTED]"something" [EMAIL PROTECTED]"piglatin"].


Does this work for you?

Thanks,
Chris





Regards,

Sheran

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick
(clonvick)
Sent: Tuesday, November 29, 2005 10:22 AM
To: Rainer Gerhards
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer,

Why don't we look at it from the other direction?  We could state that
any encoding is acceptable - for ease-of-use/migration with existing
syslog implementations.  It is RECOMMENDED that UTF-8 be used.  When it
is used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8"
lang="en"]

Thoughts?

All:  Let's discuss this and close this issue.

Thanks,
Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote:


Chris & WG,


#5 Character encoding in MSG: due to my proof-of-concept
  implementation, I have raised the (ugly) question if we need
  to allow encodings other than UTF-8. Please note that this
  question arises from needs introduced by e.g. POSIX. So we
  can't easily argue them away by wishful thinking ;)

Not even discussed yet.


I haven't reviewed that yet.  However, I'll note that allowing
different encoding can be accomplished in the future as long as we
establish a default encoding and a way to identify it in our current
work.


I have read a little in the mailing archive. Please note that in 2000
there was consensus that the MSG part may contain encodings other than
US-ASCII. Follow this thread:

http://www.syslog.cc/ietf/autoarc/msg00127.html

This discussion led to RFC 3164 saying "other encodings MAY be used".
While this was observed behaviour, we still need to be aware that the
POSIX (and glibc) API places the restriction on us that we simply do
not know the character encoding used by the application. As such, no
*nix syslogd can be programmed to be compliant with syslog-protocol if
we demand UTF-8 exclusively.

I propose that we RECOMMEND UTF-8 that MUST start with the Unicode
Byte Order Mark (BOM) if used. If the MSG part does not start with the
BOM, it may be any encoding just as in RFC 3164. I do not see any
alternative to this.

Rainer


RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Rainer Gerhards
Andrew,

> >Hi Rainer,
> >
> >Why don't we look at it from the other direction?  We could state
> >that any encoding is acceptable - for ease-of-use/migration with
> >existing syslog implementations.  It is RECOMMENDED that UTF-8 be
> >used.  When it is used, an SD-ID element will be REQUIRED.
> >e.g. - [enc="utf-8" lang="en"]
>
> I like that idea too.
>
> So, if no SD-ID encoding element is specified, then we must assume
> US-ASCII and deal with it accordingly??

I think not. If it is not present, we know that we do not know it. If
it is US-ASCII, I would expect something like

[enc="us-ascii" lang="en"]

Of course, we could also say if it is non-present, we can assume
US-ASCII. But then we would need to introduce

[enc="unknown"]

for the (common) case where we simply do not know it (again: think
POSIX). I find this somewhat confusing.

Rainer



RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Andrew Ross

>Hi Rainer,
>
>Why don't we look at it from the other direction?  We could state that any 
>encoding is acceptable - for ease-of-use/migration with existing syslog 
>implementations.  It is RECOMMENDED that UTF-8 be used.  When it is 
>used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8" lang="en"]

I like that idea too.

So, if no SD-ID encoding element is specified, then we must assume US-ASCII
and deal with it accordingly??

Cheers

Andrew






RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-30 Thread Rainer Gerhards
Sheran, 

> Also want to clarify that you suggest that if the message is in ASCII,
> it will not require an SD-ID, but for all other encodings, an SD-ID
> will be required.

Unfortunately, we cannot do this. If we knew the encoding, we could
translate it to UTF-8, as syslog-protocol has required so far.
However, we often do not know which encoding it is. The reason is that
the POSIX syslog API does not tell us. So if we want to support POSIX
(which I think we must), we must allow a syslog sender to send messages
without declaring the encoding - simply because it has no way to obtain
that knowledge.

A syslog sender embedded in, e.g., a device probably does not have this
restriction. So it SHOULD encode in UTF-8. That will ensure the receiver
can understand it. If the sender has absolutely no idea of how to do
that, but knows the encoding, then (and only then) it SHOULD specify the
encoding.
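
The three sender cases above can be sketched as one small decision function in Python. This is a sketch under assumptions, not the behaviour of any real syslogd: the function and parameter names are hypothetical, and attaching [enc="utf-8"] to converted messages follows the proposal quoted in this thread that an element is REQUIRED when UTF-8 is used.

```python
from typing import Optional, Tuple


def prepare_msg(
    raw: bytes,
    known_encoding: Optional[str],
    can_convert: bool = True,
) -> Tuple[Optional[str], bytes]:
    """Return (element, octets) for the MSG part of an outgoing message."""
    if known_encoding is None:
        # POSIX case: the API gave us octets only, so declare nothing.
        return None, raw
    if can_convert:
        # Preferred case: the encoding is known, so convert to UTF-8.
        text = raw.decode(known_encoding)
        return '[enc="utf-8"]', text.encode("utf-8")
    # Last resort: the encoding is known but cannot be converted, so name it.
    return f'[enc="{known_encoding}"]', raw


print(prepare_msg(b"M\xfcller", None))
print(prepare_msg(b"M\xfcller", "iso-8859-1"))
print(prepare_msg(b"M\xfcller", "iso-8859-1", can_convert=False))
```

Only the first branch sends undeclared octets, which keeps the "unknown" case limited to senders that genuinely cannot know.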

Rainer

> 
> Note most other encoding methods already imply the language used, for
> example, in Chinese, there are several encoding methods, Traditional
> Chinese used in Taiwan and Hong Kong is Big5, and simplified Chinese
> used in Mainland China is GBK, so if the message is in traditional
> Chinese char, it will be shown as [enc="Big5", lang="Traditional
> Chinese"], a little bit redundant. The Big5 also includes all English
> char so it can be a mix of Chinese and English.  
> 
> 
> 
> Regards,
>  
> Sheran
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick
> (clonvick)
> Sent: Tuesday, November 29, 2005 10:22 AM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> 
> Hi Rainer,
> 
> Why don't we look at it from the other direction?  We could state that
> any encoding is acceptable - for ease-of-use/migration with existing
> syslog implementations.  It is RECOMMENDED that UTF-8 be used.  When it
> is used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8"
> lang="en"]
> 
> Thoughts?
> 
> All:  Let's discuss this and close this issue.
> 
> Thanks,
> Chris
> 
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
> 
> > Chris & WG,
> >
> >>> #5 Character encoding in MSG: due to my proof-of-concept
> >>>   implementation, I have raised the (ugly) question if we need
> >>>   to allow encodings other than UTF-8. Please note that this
> >>>   question arises from needs introduced by e.g. POSIX. So we
> >>>   can't easily argue them away by wishful thinking ;)
> >>>
> >>> Not even discussed yet.
> >>
> >> I haven't reviewed that yet.  However, I'll note that allowing
> >> different encoding can be accomplished in the future as long as we
> >> establish a default encoding and a way to identify it in our current
> >> work.
> >
> > I have read a little in the mailing archive. Please note that in 2000
> > there was consensus that the MSG part may contain encodings other
> > than US-ASCII. Follow this thread:
> >
> > http://www.syslog.cc/ietf/autoarc/msg00127.html
> >
> > This discussion led to RFC 3164 saying "other encodings MAY be used".
> > While this was observed behaviour, we still need to be aware that the
> > POSIX (and glibc) API places the restriction on us that we simply do
> > not know the character encoding used by the application. As such, no
> > *nix syslogd can be programmed to be compliant with syslog-protocol
> > if we demand UTF-8 exclusively.
> >
> > I propose that we RECOMMEND UTF-8 that MUST start with the Unicode
> > Byte Order Mark (BOM) if used. If the MSG part does not start with
> > the BOM, it may be any encoding just as in RFC 3164. I do not see any
> > alternative to this.
> >
> > Rainer
> >



RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-29 Thread Shyyunn Lin (sheranl)
Chris:

I think having SD-ID with [enc="utf-8" lang="English"] may be a good
approach. If different languages use UTF-8 encoding, then "lang=" can
distinguish them.

Also want to clarify that you suggest that if the message is in ASCII,
it will not require an SD-ID, but for all other encodings, an SD-ID
will be required.

Note that most other encoding methods already imply the language used.
For example, there are several encoding methods for Chinese:
Traditional Chinese, used in Taiwan and Hong Kong, is Big5, and
Simplified Chinese, used in Mainland China, is GBK. So if the message
is in Traditional Chinese characters, it will be shown as [enc="Big5",
lang="Traditional Chinese"], which is a little redundant. Big5 also
includes all English characters, so a message can be a mix of Chinese
and English.
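
The point about mixed content can be illustrated with Python's built-in "big5" and "gbk" codecs. This is a minimal sketch: the helper name and the sample strings are made up for the example, and the bracketed element follows the notation used in this thread.

```python
from typing import Tuple


def encode_with_declaration(text: str, enc: str) -> Tuple[str, bytes]:
    """Return an (element, octets) pair for text in the named encoding."""
    return f'[enc="{enc}"]', text.encode(enc)


# Traditional characters in Big5, simplified characters in GBK, each mixed
# with plain English text in the same message.
sd_big5, big5_octets = encode_with_declaration("繁體中文 disk full", "big5")
sd_gbk, gbk_octets = encode_with_declaration("简体中文 disk full", "gbk")

print(sd_big5, big5_octets)
print(sd_gbk, gbk_octets)

# The English part is carried as ordinary single-byte ASCII in both encodings.
print(big5_octets.endswith(b" disk full"), gbk_octets.endswith(b" disk full"))
```

Each Chinese character takes two octets here while the English text stays one octet per character, which is why a receiver that assumes pure ASCII would still display the English part correctly.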



Regards,
 
Sheran

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick
(clonvick)
Sent: Tuesday, November 29, 2005 10:22 AM
To: Rainer Gerhards
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)

Hi Rainer,

Why don't we look at it from the other direction?  We could state that
any encoding is acceptable - for ease-of-use/migration with existing
syslog implementations.  It is RECOMMENDED that UTF-8 be used.  When it
is used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8"
lang="en"]

Thoughts?

All:  Let's discuss this and close this issue.

Thanks,
Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote:

> Chris & WG,
>
>>> #5 Character encoding in MSG: due to my proof-of-concept
>>>   implementation, I have raised the (ugly) question if we need
>>>   to allow encodings other than UTF-8. Please note that this
>>>   question arises from needs introduced by e.g. POSIX. So we
>>>   can't easily argue them away by wishful thinking ;)
>>>
>>> Not even discussed yet.
>>
>> I haven't reviewed that yet.  However, I'll note that allowing 
>> different encoding can be accomplished in the future as long as we 
>> establish a default encoding and a way to identify it in our current 
>> work.
>
> I have read a little in the mailing archive. Please note that in 2000
> there was consensus that the MSG part may contain encodings other than
> US-ASCII. Follow this thread:
>
> http://www.syslog.cc/ietf/autoarc/msg00127.html
>
> This discussion led to RFC 3164 saying "other encodings MAY be used".
> While this was observed behaviour, we still need to be aware that the
> POSIX (and glibc) API places the restriction on us that we simply do
> not know the character encoding used by the application. As such, no
> *nix syslogd can be programmed to be compliant with syslog-protocol if
> we demand UTF-8 exclusively.
>
> I propose that we RECOMMEND UTF-8 that MUST start with the Unicode
> Byte Order Mark (BOM) if used. If the MSG part does not start with the
> BOM, it may be any encoding just as in RFC 3164. I do not see any
> alternative to this.
>
> Rainer
>


RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-29 Thread Rainer Gerhards
Chris,

I think that is a good compromise. It would also enable us to convey the
encoding information if we have it (anyhow, in that case it would be
smarter to convert to UTF-8, but that's not yet important).

Rainer

> -Original Message-
> From: Chris Lonvick [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, November 29, 2005 7:22 PM
> To: Rainer Gerhards
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
> 
> 
> Hi Rainer,
> 
> Why don't we look at it from the other direction?  We could state that
> any encoding is acceptable - for ease-of-use/migration with existing
> syslog implementations.  It is RECOMMENDED that UTF-8 be used.  When it
> is used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8"
> lang="en"]
> 
> Thoughts?
> 
> All:  Let's discuss this and close this issue.
> 
> Thanks,
> Chris
> 
> On Tue, 29 Nov 2005, Rainer Gerhards wrote:
> 
> > Chris & WG,
> >
> >>> #5 Character encoding in MSG: due to my proof-of-concept
> >>>   implementation, I have raised the (ugly) question if we need
> >>>   to allow encodings other than UTF-8. Please note that this
> >>>   question arises from needs introduced by e.g. POSIX. So we
> >>>   can't easily argue them away by wishful thinking ;)
> >>>
> >>> Not even discussed yet.
> >>
> >> I haven't reviewed that yet.  However, I'll note that allowing
> >> different encoding can be accomplished in the future as long as we
> >> establish a default encoding and a way to identify it in our current
> >> work.
> >
> > I have read a little in the mailing archive. Please note that in 2000
> > there was consensus that the MSG part may contain encodings other
> > than US-ASCII. Follow this thread:
> >
> > http://www.syslog.cc/ietf/autoarc/msg00127.html
> >
> > This discussion led to RFC 3164 saying "other encodings MAY be used".
> > While this was observed behaviour, we still need to be aware that the
> > POSIX (and glibc) API places the restriction on us that we simply do
> > not know the character encoding used by the application. As such, no
> > *nix syslogd can be programmed to be compliant with syslog-protocol
> > if we demand UTF-8 exclusively.
> >
> > I propose that we RECOMMEND UTF-8 that MUST start with the Unicode
> > Byte Order Mark (BOM) if used. If the MSG part does not start with
> > the BOM, it may be any encoding just as in RFC 3164. I do not see any
> > alternative to this.
> >
> > Rainer
> >
> >
> 



Re: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-29 Thread Darren Reed
> Hi Rainer,
> 
> Why don't we look at it from the other direction?  We could state that any 
> encoding is acceptable - for ease-of-use/migration with existing syslog 
> implementations.  It is RECOMMENDED that UTF-8 be used.  When it is 
> used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8" lang="en"]
> 
> Thoughts?

I think this is a very sensible approach.

Darren



RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-29 Thread Chris Lonvick

Hi Rainer,

Why don't we look at it from the other direction?  We could state that any 
encoding is acceptable - for ease-of-use/migration with existing syslog 
implementations.  It is RECOMMENDED that UTF-8 be used.  When it is 
used, an SD-ID element will be REQUIRED.  e.g. - [enc="utf-8" lang="en"]


Thoughts?

All:  Let's discuss this and close this issue.

Thanks,
Chris

On Tue, 29 Nov 2005, Rainer Gerhards wrote:


> Chris & WG,
>
>>> #5 Character encoding in MSG: due to my proof-of-concept
>>>   implementation, I have raised the (ugly) question if we need
>>>   to allow encodings other than UTF-8. Please note that this
>>>   question arises from needs introduced by e.g. POSIX. So we
>>>   can't easily argue them away by wishful thinking ;)
>>>
>>> Not even discussed yet.
>>
>> I haven't reviewed that yet.  However, I'll note that allowing
>> different encoding can be accomplished in the future as long as we
>> establish a default encoding and a way to identify it in our current
>> work.
>
> I have read a little in the mailing archive. Please note that in 2000
> there was consensus that the MSG part may contain encodings other than
> US-ASCII. Follow this thread:
>
> http://www.syslog.cc/ietf/autoarc/msg00127.html
>
> This discussion led to RFC 3164 saying "other encodings MAY be used".
> While this was observed behaviour, we still need to be aware that the
> POSIX (and glibc) API places the restriction on us that we simply do
> not know the character encoding used by the application. As such, no
> *nix syslogd can be programmed to be compliant with syslog-protocol if
> we demand UTF-8 exclusively.
>
> I propose that we RECOMMEND UTF-8 that MUST start with the Unicode
> Byte Order Mark (BOM) if used. If the MSG part does not start with the
> BOM, it may be any encoding just as in RFC 3164. I do not see any
> alternative to this.
>
> Rainer



RE: [Syslog] #5 - character encoding (was: Consensus?)

2005-11-29 Thread Rainer Gerhards
Chris & WG,

> > #5 Character encoding in MSG: due to my proof-of-concept
> >   implementation, I have raised the (ugly) question if we need
> >   to allow encodings other than UTF-8. Please note that this
> >   question arises from needs introduced by e.g. POSIX. So we
> >   can't easily argue them away by wishful thinking ;)
> >
> > Not even discussed yet.
> 
> I haven't reviewed that yet.  However, I'll note that allowing different
> encoding can be accomplished in the future as long as we establish a
> default encoding and a way to identify it in our current work.

I have read a little in the mailing archive. Please note that in 2000
there was consensus that the MSG part may contain encodings other than
US-ASCII. Follow this thread:

http://www.syslog.cc/ietf/autoarc/msg00127.html

This discussion led to RFC 3164 saying "other encodings MAY be used".
While this was observed behaviour, we still need to be aware that the
POSIX (and glibc) API places the restriction on us that we simply do
not know the character encoding used by the application. As such, no
*nix syslogd can be programmed to be compliant with syslog-protocol if
we demand UTF-8 exclusively.

I propose that we RECOMMEND UTF-8 that MUST start with the Unicode Byte
Order Mark (BOM) if used. If the MSG part does not start with the BOM,
it may be any encoding just as in RFC 3164. I do not see any alternative
to this.
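
The BOM rule proposed here is simple to apply on the receiving side. The Python sketch below is an illustration with a hypothetical function name: a MSG part that starts with the UTF-8 BOM (octets EF BB BF, available as codecs.BOM_UTF8) is decoded as UTF-8, and anything else is handed back as raw octets of unknown encoding.

```python
import codecs
from typing import Optional, Tuple, Union


def classify_msg(msg: bytes) -> Tuple[Optional[str], Union[str, bytes]]:
    """Return ("utf-8", text) if msg starts with the BOM, else (None, msg)."""
    if msg.startswith(codecs.BOM_UTF8):
        # Strip the three BOM octets, then decode the remainder strictly.
        body = msg[len(codecs.BOM_UTF8):]
        return "utf-8", body.decode("utf-8")
    # No BOM: the octets may be in any encoding, as in RFC 3164.
    return None, msg


print(classify_msg(codecs.BOM_UTF8 + "Müller".encode("utf-8")))
print(classify_msg(b"M\xfcller"))
```

A sender following the same rule would prepend codecs.BOM_UTF8 only when it is certain the content is UTF-8.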

Rainer
