Re: Newby: how to transform text into lines of text

2009-01-26 Thread Tim Rowe
2009/1/25 Tim Chase python.l...@tim.thechases.com:

 (again, a malformed text-file with no terminal '\n' may cause it
 to be absent from the last line)

Ahem. That may be malformed for some specific file specification,
but it is only malformed in general if you are using an operating
system that treats '\n' as a terminator (eg, Linux) rather than as a
separator (eg, MS DOS/Windows).

Perhaps what you don't /really/ want to be reminded of is the
existence of operating systems other than your preferred one?
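
The terminator/separator distinction shows up in how Python's own string methods treat a trailing '\n'. A minimal sketch in Python 3 syntax (the sample strings are made up for illustration):

```python
terminated = "alpha\nbeta\n"   # '\n' ends every line, including the last
separated = "alpha\nbeta"      # '\n' only sits between lines

# split('\n') treats '\n' as a separator, so a final '\n' leaves an empty tail.
split_terminated = terminated.split("\n")
split_separated = separated.split("\n")

# splitlines() treats '\n' as a terminator and gives the same list for both.
lines_terminated = terminated.splitlines()
lines_separated = separated.splitlines()

print(split_terminated, split_separated)
print(lines_terminated, lines_separated)
```

Neither input needs to be called malformed here; the two methods simply encode the two conventions.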

-- 
Tim Rowe
--
http://mail.python.org/mailman/listinfo/python-list


Re: Newby: how to transform text into lines of text

2009-01-26 Thread Sion Arrowsmith
Diez B. Roggisch de...@nospam.web.de wrote:
 [ ... ] Your approach of reading the full contents can be 
used like this:

content = a.read()
for line in content.split('\n'):
    print line


Or if you want the full content in memory but only ever access it on a
line-by-line basis:

content = a.readlines()

(Just because we can now write "for line in file" doesn't mean that
readlines() is *totally* redundant.)

-- 
\S -- si...@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
   Frankly I have no feelings towards penguins one way or the other
-- Arthur C. Clarke
   her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump


Re: Newby: how to transform text into lines of text

2009-01-26 Thread Marc 'BlackJack' Rintsch
On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:

 content = a.readlines()
 
 (Just because we can now write for line in file doesn't mean that
 readlines() is *totally* redundant.)

But ``content = list(a)`` is shorter.  :-)
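
A quick check that the two spellings give the same list, sketched in Python 3 syntax. The temporary file and its contents are assumptions made only so the example can run on its own:

```python
import os
import tempfile

# Create a small throwaway file; the contents are invented for the demo.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("first\nsecond\nthird\n")

with open(path) as a:
    via_readlines = a.readlines()   # explicit: a list of lines

with open(path) as a:
    via_list = list(a)              # relies on the file iterating over its lines

os.remove(path)
print(via_readlines == via_list)
```

Both forms keep the trailing '\n' on each line.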

Ciao,
Marc 'BlackJack' Rintsch


Re: Newby: how to transform text into lines of text

2009-01-26 Thread Andreas Waldenburger
On 26 Jan 2009 14:51:33 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net
wrote:

 On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:
 
  content = a.readlines()
  
  (Just because we can now write for line in file doesn't mean that
  readlines() is *totally* redundant.)
 
 But ``content = list(a)`` is shorter.  :-)
 
But much less clear, wouldn't you say?

content is now what? A list of lines? Characters? Bytes? I-Nodes?
Dates? Granted, it can be inferred from the fact that a file is its
own iterator over its lines, but that is a mental step that readlines()
frees you from doing.

My ~0.0154 €.

/W

-- 
My real email address is constructed by swapping the domain with the
recipient (local part).



Re: Newby: how to transform text into lines of text

2009-01-26 Thread J. Cliff Dyer

On Sun, 2009-01-25 at 18:23 -0800, John Machin wrote:
 On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar
 wrote:
  On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase
  python.l...@tim.thechases.com wrote:
 
 
 
   Unfortunately, a raw rstrip() eats other whitespace that may be  
   important.  I frequently get tab-delimited files, using the following  
   pseudo-code:
 
  def clean_line(line):
      return line.rstrip('\r\n').split('\t')

  f = file('customer_x.txt')
  headers = clean_line(f.next())
  for line in f:
      field1, field2, field3 = clean_line(line)
      do_stuff()
 
   if field3 is empty in the source-file, using rstrip(None) as you suggest  
   triggers errors on the tuple assignment because it eats the tab that  
   defined it.
 
   I suppose if I were really smart, I'd dig a little deeper in the CSV  
   module to sniff out the right way to parse tab-delimited files.
 
  It's so easy that not doing it is just inexcusable laziness :)
  Your own example, written using the csv module:
 
  import csv
 
  f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
  headers = f.next()
  for line in f:
   field1, field2, field3 = line
   do_stuff()
 
 
 And where in all of that do you recommend that .decode(some_encoding)
 be inserted?
 

If encoding is an issue for your application, then I'd recommend you use
codecs.open('customer_x.txt', 'rb', encoding='ebcdic') instead of open()

 



Re: Newby: how to transform text into lines of text

2009-01-26 Thread Gabriel Genellina
On Mon, 26 Jan 2009 13:35:39 -0200, J. Cliff Dyer j...@sdf.lonestar.org
wrote:

On Sun, 2009-01-25 at 18:23 -0800, John Machin wrote:

On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar
wrote:
 On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase
 python.l...@tim.thechases.com wrote:



  I suppose if I were really smart, I'd dig a little deeper in the CSV
  module to sniff out the right way to parse tab-delimited files.

 It's so easy that not doing it is just inexcusable laziness :)
 Your own example, written using the csv module:

 import csv

 f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
 headers = f.next()
 for line in f:
  field1, field2, field3 = line
  do_stuff()


And where in all of that do you recommend that .decode(some_encoding)
be inserted?


If encoding is an issue for your application, then I'd recommend you use
codecs.open('customer_x.txt', 'rb', encoding='ebcdic') instead of open()


This would be the best way *if* the csv module could handle Unicode input,  
but unfortunately this is not the case. See my other reply.


--
Gabriel Genellina



Re: Newby: how to transform text into lines of text

2009-01-26 Thread Marc 'BlackJack' Rintsch
On Mon, 26 Jan 2009 16:10:11 +0100, Andreas Waldenburger wrote:

 On 26 Jan 2009 14:51:33 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net
 wrote:
 
 On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:
 
  content = a.readlines()
  
  (Just because we can now write for line in file doesn't mean that
  readlines() is *totally* redundant.)
 
 But ``content = list(a)`` is shorter.  :-)
 
 But much less clear, wouldn't you say?

Okay, so let's make it clearer and even shorter: ``lines = list(a)``.  :-)

Ciao,
Marc 'BlackJack' Rintsch


Re: Newby: how to transform text into lines of text

2009-01-26 Thread Andreas Waldenburger
On 26 Jan 2009 22:12:43 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net
wrote:

 On Mon, 26 Jan 2009 16:10:11 +0100, Andreas Waldenburger wrote:
 
  On 26 Jan 2009 14:51:33 GMT Marc 'BlackJack' Rintsch
  bj_...@gmx.net wrote:
  
  On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:
  
   content = a.readlines()
   
   (Just because we can now write for line in file doesn't mean
   that readlines() is *totally* redundant.)
  
  But ``content = list(a)`` is shorter.  :-)
  
  But much less clear, wouldn't you say?
 
 Okay, so let's make it clearer and even shorter: ``lines =
 list(a)``.  :-)
 
OK, you win. :)

/W

-- 
My real email address is constructed by swapping the domain with the
recipient (local part).


Newby: how to transform text into lines of text

2009-01-25 Thread vsoler
Hello,

I've read a text file into variable a

 a=open('FicheroTexto.txt','r')
 a.read()

a contains all the lines of the text separated by '\n' characters.

Now, I want to work with each line separately, without the '\n'
character.

How can I get variable b as a list of such lines?

Thank you for your help


Re: Newby: how to transform text into lines of text

2009-01-25 Thread Diez B. Roggisch

vsoler wrote:

Hello,

I've read a text file into variable a

 a=open('FicheroTexto.txt','r')
 a.read()

a contains all the lines of the text separated by '\n' characters.


No, it doesn't. a.read() *returns* the contents, but you don't assign 
it, so it is discarded.
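
A small sketch of that point in Python 3 syntax. io.StringIO stands in for the opened file so the example runs without FicheroTexto.txt, and the sample text is invented:

```python
import io

# Stand-in for open('FicheroTexto.txt', 'r'); contents are hypothetical.
a = io.StringIO("uno\ndos\ntres\n")

a.read()               # the returned string is thrown away
leftover = a.read()    # the file position is now at the end, so this is ''

a.seek(0)              # rewind, then keep the result this time
content = a.read()
b = content.splitlines()   # list of lines without the '\n'

print(repr(leftover), b)
```

Here `a` is still the file object throughout; only `content` and `b` hold the text.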



Now, I want to work with each line separately, without the '\n'
character.

How can I get variable b as a list of such lines?



The idiomatic way would be iterating over the file-object itself - which 
will get you the lines:


with open('foo.txt') as inf:
    for line in inf:
        print line


The advantage is that this works even for large files that otherwise 
won't fit into memory. Your approach of reading the full contents can be 
used like this:


content = a.read()
for line in content.split('\n'):
    print line


Diez


Re: Newby: how to transform text into lines of text

2009-01-25 Thread Tim Chase
The idiomatic way would be iterating over the file-object itself - which 
will get you the lines:


with open('foo.txt') as inf:
    for line in inf:
        print line


In versions of Python before the with statement was introduced (as in
the 2.4 installations I've got at both home and work), this can simply be


  for line in open('foo.txt'):
      print line

If you are processing lots of files, you can close each one explicitly:

  f = open('foo.txt')
  for line in f:
      print line
  f.close()

One other caveat here, line contains the newline at the end, so 
you might have


 print line.rstrip('\r\n')

to remove them.



content = a.read()
for line in content.split('\n'):
    print line


Strings have a splitlines() method for this purpose:

  content = a.read()
  for line in content.splitlines():
      print line
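
To see what splitlines() buys over split('\n') when the line endings are mixed, here is a short sketch in Python 3 syntax (the sample string is invented):

```python
content = "one\r\ntwo\rthree\n"   # DOS, classic Mac and Unix endings mixed

# split('\n') only knows about '\n': stray '\r' characters stay in the data
# and the final '\n' produces an empty last element.
by_split = content.split("\n")

# splitlines() recognises '\r\n', '\r' and '\n' and drops them all.
by_splitlines = content.splitlines()

# Passing True keeps the original endings attached to each line.
with_ends = content.splitlines(True)

print(by_split)
print(by_splitlines)
print(with_ends)
```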

-tkc





Re: Newby: how to transform text into lines of text

2009-01-25 Thread vsoler
On 25 Jan, 14:36, Diez B. Roggisch de...@nospam.web.de wrote:
 vsoler wrote:

  Hello,

  I've read a text file into variable a

       a=open('FicheroTexto.txt','r')
       a.read()

  a contains all the lines of the text separated by '\n' characters.

 No, it doesn't. a.read() *returns* the contents, but you don't assign
 it, so it is discarded.

  Now, I want to work with each line separately, without the '\n'
  character.

  How can I get variable b as a list of such lines?

 The idiomatic way would be iterating over the file-object itself - which
 will get you the lines:

 with open('foo.txt') as inf:
      for line in inf:
          print line

 The advantage is that this works even for large files that otherwise
 won't fit into memory. Your approach of reading the full contents can be
 used like this:

 content = a.read()
 for line in content.split('\n'):
      print line

 Diez

Thanks a lot. Very quick and clear


Re: Newby: how to transform text into lines of text

2009-01-25 Thread John Machin
On Jan 26, 12:54 am, Tim Chase python.l...@tim.thechases.com wrote:

 One other caveat here, line contains the newline at the end, so
 you might have

   print line.rstrip('\r\n')

 to remove them.

I don't understand the presence of the '\r' there. Any '\x0d' that
remains after reading the file in text mode and is removed by that
rstrip would be a strange occurrence in the data which the OP may
prefer to find out about and deal with; it is not part of the
newline. Why suppress one particular data character in preference to
others?

The same applies in any case to the use of rstrip('\n'); if that finds
more than one occurrence of '\x0a' to remove, it has exceeded the
mandate of removing the newline (if any).

So, we are left with the unfortunately awkward
    if line.endswith('\n'):
        line = line[:-1]
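
The difference between the two approaches, sketched in Python 3 syntax. The helper name chomp is hypothetical (borrowed from Perl's vocabulary), not something from the standard library:

```python
def chomp(line):
    # Remove at most one trailing newline and leave every other character alone.
    if line.endswith("\n"):
        return line[:-1]
    return line

sample = "data\n\n"                  # a string with two trailing newlines
stripped_all = sample.rstrip("\n")   # rstrip removes every trailing '\n'
stripped_one = chomp(sample)         # chomp removes exactly one
unterminated = chomp("data3")        # no newline present: returned unchanged

print(repr(stripped_all), repr(stripped_one), repr(unterminated))
```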

Cheers,
John


Re: Newby: how to transform text into lines of text

2009-01-25 Thread Tim Chase

One other caveat here, line contains the newline at the end, so
you might have

  print line.rstrip('\r\n')

to remove them.


I don't understand the presence of the '\r' there. Any '\x0d' that
remains after reading the file in text mode and is removed by that
rstrip would be a strange occurrence in the data which the OP may
prefer to find out about and deal with; it is not part of the
newline. Why suppress one particular data character in preference to
others?


In an ideal world where everybody knew how to make a proper 
text-file, it wouldn't be an issue.  Recreating the form of some 
of the data I get from customers/providers:


  f = file('tmp/x.txt', 'wb')
  f.write('headers\n')  # headers in Unix format
  f.write('data1\r\n')  # data in Dos format
  f.write('data2\r\n')
  f.write('data3')   # no trailing newline of any sort
  f.close()

Then reading it back in:

  for line in file('tmp/x.txt'): print repr(line)
 ...
 'headers\n'
 'data1\r\n'
 'data2\r\n'
 'data3'
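
For readers on Python 3, a rough equivalent of the experiment above. It is only a sketch: the temporary file replaces tmp/x.txt, and newline='' is used because Python 3 text mode otherwise translates '\r\n' to '\n' on every platform:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"headers\n")    # header in Unix format
    f.write(b"data1\r\n")    # data in DOS format
    f.write(b"data2\r\n")
    f.write(b"data3")        # no trailing newline of any sort

# newline='' still splits on '\n', '\r' and '\r\n' but returns them untranslated.
with open(path, newline="") as f:
    raw_lines = list(f)

# The default text mode translates every recognised ending to '\n'.
with open(path) as f:
    translated = list(f)

os.remove(path)

cleaned = [line.rstrip("\r\n") for line in raw_lines]
print(raw_lines, translated, cleaned)
```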

As for wanting to know about stray '\r' characters, I only want 
the data -- I don't particularly like to be reminded of the 
incompetence of those who send me malformed text-files ;-)



The same applies in any case to the use of rstrip('\n'); if that finds
more than one occurrence of '\x0a' to remove, it has exceeded the
mandate of removing the newline (if any).


I believe that using the formulaic "for line in file(FILENAME)"
iteration guarantees that each line will have at most one '\n',
and that it will be at the end (again, a malformed text-file with
no terminal '\n' may cause it to be absent from the last line).



So, we are left with the unfortunately awkward
    if line.endswith('\n'):
        line = line[:-1]


You're welcome to it, but I'll stick with my more DWIM solution
of "get rid of anything that resembles an attempt at a CR/LF".


Thank goodness I haven't found any of my data-sources using
'\n\r' instead, which would require me to left-strip '\r'
characters as well.  Sigh.  My kingdom for competency. :-/


-tkc







Re: Newby: how to transform text into lines of text

2009-01-25 Thread John Machin

On 26/01/2009 10:34 AM, Tim Chase wrote:

I believe that using the formulaic for line in file(FILENAME) 
iteration guarantees that each line will have at most only one '\n' 
and it will be at the end (again, a malformed text-file with no terminal 
'\n' may cause it to be absent from the last line)


It seems that you are right -- not that I can find such a guarantee
written anywhere. I had armchair-philosophised that writing
'foo\n\r\nbar\r\n' to a file in binary mode and reading it on Windows in
text mode would be strict and report the first line as 'foo\n\n'; I was
wrong.





So, we are left with the unfortunately awkward
    if line.endswith('\n'):
        line = line[:-1]


You're welcome to it, but I'll stick with my more DWIM solution of get 
rid of anything that resembles an attempt at a CR/LF.


Thanks, but I don't want it. My point was that you didn't TTOPEWYM (tell 
the OP exactly what you meant).


My approach to DWIM with data is, given
   norm_space = lambda s: u' '.join(s.split())
to break up the line into fields first (just in case the field delimiter 
== '\t') then apply norm_space to each field. This gets rid of your '\r' 
at end (or start!) of line, and multiple whitespace characters are 
replaced by a single space. Whitespace includes NBSP (U+00A0) as an 
added bonus for being righteous and using Unicode :-)
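
A sketch of that approach in Python 3 syntax, where every str is already Unicode. The sample record is invented, and norm_space is written as a def rather than a lambda purely for readability:

```python
def norm_space(s):
    # Collapse every run of whitespace to one space and trim both ends.
    return " ".join(s.split())

# Hypothetical tab-delimited record with doubled spaces, an NBSP (U+00A0)
# and a DOS line ending.
line = "  Acme  Corp\t\u00a0 12  Main St \tSpringfield\r\n"

# Split on the delimiter first, then normalise each field, so the tabs that
# separate fields are not swallowed by the whitespace clean-up.
fields = [norm_space(field) for field in line.split("\t")]

print(fields)
```

Splitting first matters: calling norm_space on the whole line would turn the tab delimiters into ordinary spaces.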


Thank goodness I haven't found any of my data-sources using \n\r 
instead, which would require me to left-strip '\r' characters as well.  
Sigh.  My kingdom for competency. :-/


Indeed. I actually got data in that format once from a *x programmer who 
was so kind as to do it that way just for me because he knew that I use 
Windows and he thought that's what Windows text files looked like. No 
kidding.


Cheers,
John


Re: Newby: how to transform text into lines of text

2009-01-25 Thread Scott David Daniels

John Machin wrote:

On 26/01/2009 10:34 AM, Tim Chase wrote:

I believe that using the formulaic for line in file(FILENAME) 
iteration guarantees that each line will have at most only one '\n' 
and it will be at the end (again, a malformed text-file with no 
terminal '\n' may cause it to be absent from the last line)


It seems that you are right -- not that I can find such a guarantee 
written anywhere. I had armchair-philosophised that writing 
foo\n\r\nbar\r\n to a file in binary mode and reading it on Windows in 
text mode would be strict and report the first line as foo\n\n; I was 
wrong.


Here's how I'd do it:
    with open('deheap/deheap.py', 'rU') as source:
        for line in source:
            print line.rstrip()  # Avoid trailing spaces as well.

This should handle \n, \r\n, and \n\r lines.
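
How universal-newline mode treats each of those endings can be checked directly. The sketch below uses Python 3, where universal newlines are the default for text mode and no 'U' flag is needed; the file contents are invented:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    # One Unix ending, one DOS ending, one reversed '\n\r' ending.
    f.write(b"unix\ndos\r\nodd\n\rlast")

with open(path) as source:
    lines = [line.rstrip() for line in source]

os.remove(path)
print(lines)
```

In this run the '\n' and '\r\n' endings each end one line, while '\n\r' is read as two separate endings, so an extra empty line appears after 'odd'. Whether that counts as "handled" depends on whether blank lines matter to the caller.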

--Scott David Daniels
scott.dani...@acm.org


Re: Newby: how to transform text into lines of text

2009-01-25 Thread Steven D'Aprano
On Sun, 25 Jan 2009 17:34:18 -0600, Tim Chase wrote:

 Thank goodness I haven't found any of my data-sources using \n\r
 instead, which would require me to left-strip '\r' characters as well. 
 Sigh.  My kingdom for competency. :-/

If I recall correctly, one of the accounting systems I used eight years 
ago gave you the option of exporting text files with either \r\n or \n\r 
as the end-of-line mark. Neither \n nor \r (POSIX or classic Mac) line 
endings were supported, as that would have been useful.

(It may have been Arrow Accounting, but don't quote me on that.)

I can only imagine the developer couldn't remember which order the 
characters were supposed to go, so rather than look it up, he made it 
optional.



-- 
Steven


Re: Newby: how to transform text into lines of text

2009-01-25 Thread Tim Chase

Scott David Daniels wrote:

Here's how I'd do it:
 with open('deheap/deheap.py', 'rU') as source:
     for line in source:
         print line.rstrip()  # Avoid trailing spaces as well.

This should handle \n, \r\n, and \n\r lines.



Unfortunately, a raw rstrip() eats other whitespace that may be 
important.  I frequently get tab-delimited files, using the 
following pseudo-code:


  def clean_line(line):
      return line.rstrip('\r\n').split('\t')

  f = file('customer_x.txt')
  headers = clean_line(f.next())
  for line in f:
      field1, field2, field3 = clean_line(line)
      do_stuff()

If field3 is empty in the source-file, using rstrip(None) as you
suggest triggers errors on the tuple assignment because it eats
the tab that defined it.
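
The failure is easy to reproduce. A minimal sketch in Python 3 syntax, using a made-up record whose third field is empty:

```python
line = "alice\t42\t\r\n"   # third field is empty, so the line ends in a tab

# Stripping only CR/LF keeps the trailing tab, so three fields come out.
safe = line.rstrip("\r\n").split("\t")

# A bare rstrip() also removes the tab, and the empty field disappears.
greedy = line.rstrip().split("\t")

try:
    field1, field2, field3 = greedy
    unpack_failed = False
except ValueError:
    unpack_failed = True   # only two values are available to unpack

print(safe, greedy, unpack_failed)
```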


I suppose if I were really smart, I'd dig a little deeper in the 
CSV module to sniff out the right way to parse tab-delimited files.


-tkc





Re: Newby: how to transform text into lines of text

2009-01-25 Thread Gabriel Genellina
On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase
python.l...@tim.thechases.com wrote:


Unfortunately, a raw rstrip() eats other whitespace that may be  
important.  I frequently get tab-delimited files, using the following  
pseudo-code:


   def clean_line(line):
       return line.rstrip('\r\n').split('\t')

   f = file('customer_x.txt')
   headers = clean_line(f.next())
   for line in f:
       field1, field2, field3 = clean_line(line)
       do_stuff()

if field3 is empty in the source-file, using rstrip(None) as you suggest  
triggers errors on the tuple assignment because it eats the tab that  
defined it.


I suppose if I were really smart, I'd dig a little deeper in the CSV  
module to sniff out the right way to parse tab-delimited files.


It's so easy that not doing it is just inexcusable laziness :)
Your own example, written using the csv module:

import csv

f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
headers = f.next()
for line in f:
    field1, field2, field3 = line
    do_stuff()
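
For comparison, a sketch of the same idea in Python 3 syntax, where the reader is advanced with next() and the input is opened in text mode with newline='' rather than 'rb'. io.StringIO stands in for customer_x.txt, and the sample rows are invented:

```python
import csv
import io

# Stand-in for open('customer_x.txt', newline=''); contents are hypothetical.
source = io.StringIO(
    "name\tcity\tnote\r\n"
    "alice\tParis\t\r\n"       # empty third field
    "bob\tOslo\tvip\r\n",
    newline="",
)

reader = csv.reader(source, delimiter="\t")
headers = next(reader)          # first row holds the column names
rows = list(reader)

for field1, field2, field3 in rows:
    print(field1, field2, repr(field3))
```

The empty third field survives as '' without any manual stripping.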

--
Gabriel Genellina



Re: Newby: how to transform text into lines of text

2009-01-25 Thread John Machin
On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar
wrote:
 On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase
 python.l...@tim.thechases.com wrote:



  Unfortunately, a raw rstrip() eats other whitespace that may be  
  important.  I frequently get tab-delimited files, using the following  
  pseudo-code:

     def clean_line(line):
       return line.rstrip('\r\n').split('\t')

     f = file('customer_x.txt')
     headers = clean_line(f.next())
     for line in f:
       field1, field2, field3 = clean_line(line)
       do_stuff()

  if field3 is empty in the source-file, using rstrip(None) as you suggest  
  triggers errors on the tuple assignment because it eats the tab that  
  defined it.

  I suppose if I were really smart, I'd dig a little deeper in the CSV  
  module to sniff out the right way to parse tab-delimited files.

 It's so easy that not doing it is just inexcusable laziness :)
 Your own example, written using the csv module:

 import csv

 f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
 headers = f.next()
 for line in f:
      field1, field2, field3 = line
      do_stuff()


And where in all of that do you recommend that .decode(some_encoding)
be inserted?



Re: Newby: how to transform text into lines of text

2009-01-25 Thread Gabriel Genellina
On Mon, 26 Jan 2009 00:23:30 -0200, John Machin sjmac...@lexicon.net
wrote:

On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar
wrote:



It's so easy that not doing it is just inexcusable laziness :)
Your own example, written using the csv module:

import csv

f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
headers = f.next()
for line in f:
     field1, field2, field3 = line
     do_stuff()


And where in all of that do you recommend that .decode(some_encoding)
be inserted?


For encodings that don't use embedded NUL bytes (latin1, utf8) I'd decode  
the fields right when extracting them:


field1, field2, field3 = (field.decode('utf8') for field in line)

For encodings that allow NUL bytes, I'd use any of the recipes in the csv  
module documentation.


(That is, if I care about the encoding at all. Perhaps the file contains  
only numbers. Perhaps it contains only ASCII characters. Perhaps I'm only  
interested in some fields for which the encoding is irrelevant. Perhaps it  
is an internally generated file and it doesn't matter as long as I use the  
same encoding on output)
But I admit that in general, "decode input early when reading, work in
Unicode, encode output late when writing" is the best practice.
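
A sketch of that decode-early, encode-late pattern in Python 3 syntax, where the csv module works on text rather than bytes. The sample data and the choice of latin-1 for input and utf-8 for output are assumptions made for illustration:

```python
import csv
import io

# Bytes as they might arrive from a latin-1 encoded, tab-delimited file.
raw = "nombre\tciudad\nJosé\tMálaga\n".encode("latin-1")

# Decode once, at the boundary, then work with text only.
text = raw.decode("latin-1")
rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))

# ... any processing happens here on str objects ...

# Encode once, at the other boundary, when writing the result out.
out = io.StringIO(newline="")
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
encoded = out.getvalue().encode("utf-8")

print(rows)
```

With a real file, passing encoding= to open() does the decoding step as the data is read.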


--
Gabriel Genellina
