On Fri, Apr 6, 2012 at 1:06 PM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:

> There is a problem with the way logging.handlers.SysLogHandler works
> when presented with Unicode messages. According to RFC 5424, Unicode
> is supposed to be sent encoded as UTF-8 and preceded by a BOM.
> However, the current handler implementation puts the BOM at the start
> of the formatted message, and this is wrong in scenarios where you
> want to put some additional structured data in front of the
> unstructured message part; the BOM is supposed to go after the
> structured part (which, therefore, has to be ASCII) and before the
> unstructured part. In that scenario, the handler's current behaviour
> does not strictly conform to RFC 5424.
>
> The issue is described in [1]. The BOM was originally added / position
> changed in response to [2] and [3].
>
> It is not possible to achieve conformance with the current
> implementation of the handler, unless you subclass the handler and
> override the whole emit() method. This is not ideal. For 3.3, I will
> refactor the implementation to expose a method which creates the byte
> string which is sent over the wire to the syslog daemon. This method
> can then be overridden for specific use cases where needed.
>
> However, for 2.7 and 3.2, removing the BOM insertion would bring the
> implementation into conformance to the RFC, though the entire message
> would have to be regarded as just a set of octets. A Unicode message
> would still be encoded using UTF-8, but the BOM would be left out.
>
> I am thinking of removing the BOM insertion in 2.7 and 3.2 - although
> it is a change in behaviour, the current behaviour does seem broken
> with regard to RFC 5424 conformance. However, as some might disagree
> with that assessment and view it as a backwards-incompatible behaviour
> change, I thought I should post this to get some opinions about
> whether this change is viewed as objectionable.
>

Given the existing brokenness, I personally think that removing the
(incorrect) BOM insertion in 2.7 and 3.2 is fine if you cannot find a
way to make it correct there without breaking existing APIs.

Could a private method that creates the byte string, and puts the BOM in
the correct place, not be added and used in 2.7 and 3.2?
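
To make the question concrete, here is a rough sketch of the kind of helper
I mean. It is not the handler's actual code: the function name and its
parameters are made up for illustration, and it assumes the caller already
has the ASCII-only leading part separated from the free-form message.

```python
import codecs


def build_syslog_payload(pri, header, msg):
    """Hypothetical helper: return the bytes for one syslog message.

    pri    -- integer priority value (facility * 8 + severity)
    header -- ASCII-only text placed before the free-form message
    msg    -- Unicode free-form message
    """
    # The leading part has to be plain ASCII, so encode it strictly;
    # non-ASCII input here raises UnicodeEncodeError instead of being
    # sent in a form the RFC does not allow.
    prefix = ('<%d>%s' % (pri, header)).encode('ascii')
    # The BOM goes after the ASCII part and directly before the
    # UTF-8 encoded free-form message.
    return prefix + codecs.BOM_UTF8 + msg.encode('utf-8')
```

emit() could call something like this internally, and a subclass could
override it, without any change to the public API.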


> Regards,
>
> Vinay Sajip
>
> [1] http://bugs.python.org/issue14452
> [2] http://bugs.python.org/issue7077
> [3] http://bugs.python.org/issue8795
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>