Neil Hodgson wrote:
> Fuzzyman:
>
> > Thanks - so I need to decode to unicode and *then* split on line
> > endings. Problem is, that means I can't use Python to handle line
> > endings where I don't know the encoding in advance.
> >
> > In another thread I've posted a small function that *guesses* line
> > endings in use.
>
>    You can normalise line endings:
>
> >>> x = "a\r\nb\rc\nd\n\re"
> >>> y = x.replace("\r\n", "\n").replace("\r","\n")
> >>> y
> 'a\nb\nc\nd\n\ne'
> >>> print y
> a
> b
> c
> d
>
> e
>
>    The empty line is because "\n\r" is 2 line ends.
>
Thanks - that works, but it replaces *all* instances of '\r' with '\n',
even ones that aren't being used as line terminators (unlikely, perhaps).
It also doesn't tell me which line ending was used.

Apparently files opened in universal newline mode - 'rU' - have a
'newlines' attribute. That makes it a bit easier. :-)

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

> Neil
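
For what it's worth, here is a minimal sketch of the universal-newline idea applied to data already in memory. It assumes Python 3, where text mode translates newlines by default and the attribute is called 'newlines'. It also assumes the encoding is known, which is exactly the limitation raised above. The function name detect_newlines is hypothetical, not something from the standard library.

```python
import io

def detect_newlines(data, encoding="utf-8"):
    """Decode bytes with universal newline translation.

    Returns (text, newlines): text has every line ending normalised to
    '\n', and newlines reports which endings were actually seen (None,
    a single string, or a tuple of strings).
    """
    # newline=None turns on universal newline translation when reading.
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None)
    text = wrapper.read()
    return text, wrapper.newlines

text, seen = detect_newlines(b"a\r\nb\r\nc\r\n")
print(repr(text))   # 'a\nb\nc\n'
print(repr(seen))   # '\r\n'
```

If the data mixes conventions, 'newlines' comes back as a tuple listing each ending seen, so you can at least tell that the file is inconsistent. Note that a stray '\r' is still treated as a line end here, just as with the replace() approach.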