Re: detecting newline character

Daniel Geržo Sun, 24 Apr 2011 00:47:30 -0700

On 23.4.2011 21:18, Thomas 'PointedEars' Lahn wrote:

Daniel Geržo wrote:

I need to detect the newline characters used in the file I am reading.
For this purpose I am using the following code:

def _read_lines(self):
    with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
        fobj.readlines()
        if isinstance(fobj.newlines, tuple):
            self.newline = fobj.newlines[0]
        else:
            self.newline = fobj.newlines

This works fine if I call codecs.open() without the encoding argument; I am
testing with an ASCII English text file, and in that case
fobj.newlines is correctly detected as '\r\n'. However, when I
call codecs.open() with the encoding='ascii' argument, fobj.newlines is
None and I can't figure out why that is the case. Reading the PEP at
http://www.python.org/dev/peps/pep-0278/ I don't see any reason why
I would end up with newlines being None after I call readlines().

Does anyone have an idea? You can fetch the file I am testing with from
http://danger.rulez.sk/subrip_ascii.srt


I see nothing suspicious in your .srt *after* downloading it.  file -i
confirms that it only contains US-ASCII characters (but see below).


That is indeed the case in my environment too.

danger@[danger-mbp ~/devel/pysublib/pysublib/test/files]> file -i subrip_ascii.srt

subrip_ascii.srt: regular file

danger@[danger-mbp ~/devel/pysublib/pysublib/test/files]> file subrip_ascii.srt

subrip_ascii.srt: ASCII English text, with CRLF line terminators
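
The same check can be repeated from Python by counting terminators in the raw bytes, which sidesteps any newline translation. This is only a minimal sketch; count_terminators is a hypothetical helper name, not something from pysublib or the standard library:

```python
def count_terminators(path):
    """Count CRLF, lone LF, and lone CR terminators in the raw bytes of a file."""
    # Binary mode: no decoding and no newline translation takes place.
    with open(path, "rb") as fobj:
        data = fobj.read()
    crlf = data.count(b"\r\n")
    return {
        "\r\n": crlf,
        # Every CRLF pair also contains one LF and one CR, so subtract the pairs.
        "\n": data.count(b"\n") - crlf,
        "\r": data.count(b"\r") - crlf,
    }
```

A file with only CRLF terminators should give zero for the lone "\n" and lone "\r" entries.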

The only reason I can think of for this not working ATM comes from the
documentation, where it says that 'U' requires Python to be built with
universal newline support; that it is *usually* so, but might not be so in
your case (but then the question remains: How could it be not None without
`encoding' argument?)

Yes, this is what does not make sense. If I didn't have universal newline support enabled, I wouldn't have the newlines attribute at all.

<http://docs.python.org/library/codecs.html?highlight=codecs.open#codecs.open>
<http://docs.python.org/library/functions.html#open>

WFM with and without `encoding' argument in python-2.7.1-8 (CPython), Debian
GNU/Linux 6.0.1, Linux 2.6.35.5-pe (custom) SMP i686.

Which Python implementation and version are you using on which system?

This is a standard Python installation from MacPorts. The system is OS X 10.6.7. I have now tried both Python 2.7.1 and Python 2.6.6 from MacPorts, and also 2.6.6 on FreeBSD. All fail for me when I set encoding.

On which system has the "ASCII" file been created and how?  Note that both
uploading the file with FTP in ASCII mode and downloading over HTTP might
have removed the problem Python has with it.

Unfortunately I am not 100% sure where I created the file; it was quite some time ago, but it was either WinXP or OS X Leopard. The source code can be found at https://bitbucket.org/danger/pysublib/src - I noticed the subtitle file tests (e.g. test/test_subripfile.py) are failing for me and I have identified the problem with newlines being None after calling read().
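
One possible explanation, offered as a guess rather than a confirmed diagnosis: the codecs.open() documentation notes that when an encoding is given the file is always opened in binary mode and no automatic conversion of '\n' is done, which would leave nothing to record in newlines. Assuming that is what happens here, io.open() (available since Python 2.6) may be a workaround, since it decodes and still tracks newlines. A minimal sketch; detect_newline is a hypothetical helper name, not part of pysublib:

```python
import io


def detect_newline(path, encoding="ascii"):
    """Return the line terminator seen while reading, or None if there was none."""
    # io.open() defaults to universal newline mode even when an encoding is set.
    with io.open(path, "r", encoding=encoding) as fobj:
        fobj.read()
        newlines = fobj.newlines
    if isinstance(newlines, tuple):
        # A tuple means mixed terminators were seen; its order should not be
        # assumed to match the order of appearance in the file.
        return newlines[0]
    return newlines
```

For a file with CRLF terminators throughout, the expectation is that this returns '\r\n' whether or not the encoding argument is passed.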


--
S pozdravom / Best regards
  Daniel Gerzo
