Re: Newby: how to transform text into lines of text
2009/1/25 Tim Chase python.l...@tim.thechases.com:

> (again, a malformed text-file with no terminal '\n' may cause it to be absent from the last line)

Ahem. That may be malformed for some specific file specification, but it is only malformed in general if you are using an operating system that treats '\n' as a terminator (e.g., Linux) rather than as a separator (e.g., MS DOS/Windows). Perhaps what you don't /really/ want to be reminded of is the existence of operating systems other than your preferred one?

-- 
Tim Rowe
-- http://mail.python.org/mailman/listinfo/python-list
Re: Newby: how to transform text into lines of text
Diez B. Roggisch de...@nospam.web.de wrote:

> [ ... ]
> Your approach of reading the full contents can be used like this:
>
>     content = a.read()
>     for line in content.split('\n'):
>         print line

Or if you want the full content in memory but only ever access it on a line-by-line basis:

    content = a.readlines()

(Just because we can now write "for line in file" doesn't mean that readlines() is *totally* redundant.)

-- 
\S -- si...@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other" -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Re: Newby: how to transform text into lines of text
On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:

>     content = a.readlines()
>
> (Just because we can now write "for line in file" doesn't mean that readlines() is *totally* redundant.)

But ``content = list(a)`` is shorter. :-)

Ciao,
Marc 'BlackJack' Rintsch
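The two spellings discussed here produce the same list. A minimal sketch, assuming Python 3 and using io.StringIO as a stand-in for an open text file (the sample text is made up for illustration):

```python
import io

# Made-up sample text; io.StringIO stands in for an open text file.
sample = "first\nsecond\nthird\n"

# Both calls consume the whole file and keep the trailing '\n' on each line.
via_readlines = io.StringIO(sample).readlines()
via_list = list(io.StringIO(sample))

print(via_readlines == via_list)
print(via_list)
```

Which one reads better is a matter of taste; the result is the same list of newline-terminated strings either way.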
Re: Newby: how to transform text into lines of text
On 26 Jan 2009 14:51:33 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:

> On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:
>
>>     content = a.readlines()
>>
>> (Just because we can now write "for line in file" doesn't mean that readlines() is *totally* redundant.)
>
> But ``content = list(a)`` is shorter. :-)

But much less clear, wouldn't you say? content is now what? A list of lines? Characters? Bytes? I-Nodes? Dates? Granted, it can be inferred from the fact that a file is its own iterator over its lines, but that is a mental step that readlines() frees you from doing.

My ~0.0154 €.

/W

-- 
My real email address is constructed by swapping the domain with the recipient (local part).
Re: Newby: how to transform text into lines of text
On Sun, 2009-01-25 at 18:23 -0800, John Machin wrote:

> On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
>
>> On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase python.l...@tim.thechases.com wrote:
>>
>>> Unfortunately, a raw rstrip() eats other whitespace that may be important. I frequently get tab-delimited files, using the following pseudo-code:
>>>
>>>     def clean_line(line):
>>>         return line.rstrip('\r\n').split('\t')
>>>
>>>     f = file('customer_x.txt')
>>>     headers = clean_line(f.next())
>>>     for line in f:
>>>         field1, field2, field3 = clean_line(line)
>>>         do_stuff()
>>>
>>> If field3 is empty in the source-file, using rstrip(None) as you suggest triggers errors on the tuple assignment because it eats the tab that defined it.
>>>
>>> I suppose if I were really smart, I'd dig a little deeper in the CSV module to sniff out the right way to parse tab-delimited files.
>>
>> It's so easy that not doing it is just inexcusable laziness :)
>> Your own example, written using the csv module:
>>
>>     import csv
>>
>>     f = csv.reader(open('customer_x.txt', 'rb'), delimiter='\t')
>>     headers = f.next()
>>     for line in f:
>>         field1, field2, field3 = line
>>         do_stuff()
>
> And where in all of that do you recommend that .decode(some_encoding) be inserted?

If encoding is an issue for your application, then I'd recommend you use codecs.open('customer_x.txt', 'rb', encoding='ebcdic') instead of open().
Re: Newby: how to transform text into lines of text
On Mon, 26 Jan 2009 13:35:39 -0200, J. Cliff Dyer j...@sdf.lonestar.org wrote:

> On Sun, 2009-01-25 at 18:23 -0800, John Machin wrote:
>
>> On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
>>
>>> On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase python.l...@tim.thechases.com wrote:
>>>
>>>> I suppose if I were really smart, I'd dig a little deeper in the CSV module to sniff out the right way to parse tab-delimited files.
>>>
>>> It's so easy that not doing it is just inexcusable laziness :)
>>> Your own example, written using the csv module:
>>>
>>>     import csv
>>>
>>>     f = csv.reader(open('customer_x.txt', 'rb'), delimiter='\t')
>>>     headers = f.next()
>>>     for line in f:
>>>         field1, field2, field3 = line
>>>         do_stuff()
>>
>> And where in all of that do you recommend that .decode(some_encoding) be inserted?
>
> If encoding is an issue for your application, then I'd recommend you use codecs.open('customer_x.txt', 'rb', encoding='ebcdic') instead of open().

This would be the best way *if* the csv module could handle Unicode input, but unfortunately this is not the case. See my other reply.

-- 
Gabriel Genellina
Re: Newby: how to transform text into lines of text
On Mon, 26 Jan 2009 16:10:11 +0100, Andreas Waldenburger wrote:

> On 26 Jan 2009 14:51:33 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
>
>> On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:
>>
>>>     content = a.readlines()
>>>
>>> (Just because we can now write "for line in file" doesn't mean that readlines() is *totally* redundant.)
>>
>> But ``content = list(a)`` is shorter. :-)
>
> But much less clear, wouldn't you say?

Okay, so let's make it clearer and even shorter: ``lines = list(a)``. :-)

Ciao,
Marc 'BlackJack' Rintsch
Re: Newby: how to transform text into lines of text
On 26 Jan 2009 22:12:43 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:

> On Mon, 26 Jan 2009 16:10:11 +0100, Andreas Waldenburger wrote:
>
>> On 26 Jan 2009 14:51:33 GMT Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
>>
>>> On Mon, 26 Jan 2009 12:22:18 +, Sion Arrowsmith wrote:
>>>
>>>>     content = a.readlines()
>>>>
>>>> (Just because we can now write "for line in file" doesn't mean that readlines() is *totally* redundant.)
>>>
>>> But ``content = list(a)`` is shorter. :-)
>>
>> But much less clear, wouldn't you say?
>
> Okay, so let's make it clearer and even shorter: ``lines = list(a)``. :-)

OK, you win. :)

/W

-- 
My real email address is constructed by swapping the domain with the recipient (local part).
Newby: how to transform text into lines of text
Hello,

I've read a text file into variable "a":

    a=open('FicheroTexto.txt','r')
    a.read()

"a" contains all the lines of the text separated by '\n' characters.

Now, I want to work with each line separately, without the '\n' character.

How can I get variable "b" as a list of such lines?

Thank you for your help
Re: Newby: how to transform text into lines of text
vsoler wrote:

> Hello,
>
> I've read a text file into variable "a":
>
>     a=open('FicheroTexto.txt','r')
>     a.read()
>
> "a" contains all the lines of the text separated by '\n' characters.

No, it doesn't. "a.read()" *returns* the contents, but you don't assign it, so it is discarded.

> Now, I want to work with each line separately, without the '\n' character.
>
> How can I get variable "b" as a list of such lines?

The idiomatic way would be iterating over the file-object itself, which will get you the lines:

    with open('foo.txt') as inf:
        for line in inf:
            print line

The advantage is that this works even for large files that otherwise won't fit into memory. Your approach of reading the full contents can be used like this:

    content = a.read()
    for line in content.split('\n'):
        print line

Diez
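The difference between the two approaches is easy to see side by side. A minimal sketch, assuming Python 3 (the thread itself uses Python 2 syntax) and using io.StringIO in place of a real file; the sample text is made up:

```python
import io

# Made-up sample text; note the final newline.
text = "alpha\nbeta\ngamma\n"

# Iterating over the file object yields one line at a time, newline included,
# so the whole file never has to be held in memory at once.
lines_from_iteration = [line for line in io.StringIO(text)]

# Reading everything and splitting drops the newlines, but leaves a trailing
# empty string when the text ends with '\n'.
lines_from_split = io.StringIO(text).read().split("\n")

print(lines_from_iteration)
print(lines_from_split)
```

So neither approach gives the "list of lines without '\n'" directly: iteration keeps the newline, and split('\n') may add an empty last element.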
Re: Newby: how to transform text into lines of text
> The idiomatic way would be iterating over the file-object itself, which will get you the lines:
>
>     with open('foo.txt') as inf:
>         for line in inf:
>             print line

In versions of Python before the "with" statement was introduced (as in the 2.4 installations I've got at both home and work), this can simply be

    for line in open('foo.txt'):
        print line

If you are processing lots of files, you can close each one explicitly:

    f = open('foo.txt')
    for line in f:
        print line
    f.close()

One other caveat here: "line" contains the newline at the end, so you might use

    print line.rstrip('\r\n')

to remove it.

>     content = a.read()
>     for line in content.split('\n'):
>         print line

Strings have a splitlines() method for this purpose:

    content = a.read()
    for line in content.splitlines():
        print line

-tkc
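To make the comparison concrete, here is a small sketch (Python 3 assumed, sample data made up) of how split('\n'), splitlines() and rstrip('\r\n') treat text with mixed line endings:

```python
# Made-up text mixing Unix and DOS line endings, with no final newline.
text = "headers\ndata1\r\ndata2\r\ndata3"

# split('\n') leaves the stray '\r' characters attached to the data.
by_split = text.split("\n")

# splitlines() recognises '\n', '\r\n' and '\r' and removes all of them.
by_splitlines = text.splitlines()

# Keeping the line ends and then stripping them gives the same result.
stripped = [line.rstrip("\r\n") for line in text.splitlines(keepends=True)]

print(by_split)
print(by_splitlines)
print(stripped)
```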
Re: Newby: how to transform text into lines of text
On 25 Jan, 14:36, Diez B. Roggisch de...@nospam.web.de wrote:

> vsoler wrote:
>
>> Hello,
>>
>> I've read a text file into variable "a":
>>
>>     a=open('FicheroTexto.txt','r')
>>     a.read()
>>
>> "a" contains all the lines of the text separated by '\n' characters.
>
> No, it doesn't. "a.read()" *returns* the contents, but you don't assign it, so it is discarded.
>
>> Now, I want to work with each line separately, without the '\n' character.
>>
>> How can I get variable "b" as a list of such lines?
>
> The idiomatic way would be iterating over the file-object itself, which will get you the lines:
>
>     with open('foo.txt') as inf:
>         for line in inf:
>             print line
>
> The advantage is that this works even for large files that otherwise won't fit into memory. Your approach of reading the full contents can be used like this:
>
>     content = a.read()
>     for line in content.split('\n'):
>         print line
>
> Diez

Thanks a lot. Very quick and clear
Re: Newby: how to transform text into lines of text
On Jan 26, 12:54 am, Tim Chase python.l...@tim.thechases.com wrote:

> One other caveat here: "line" contains the newline at the end, so you might use
>
>     print line.rstrip('\r\n')
>
> to remove it.

I don't understand the presence of the '\r' there. Any '\x0d' that remains after reading the file in text mode and is removed by that rstrip would be a strange occurrence in the data which the OP may prefer to find out about and deal with; it is not part of the newline. Why suppress one particular data character in preference to others?

The same applies in any case to the use of rstrip('\n'); if that finds more than one occurrence of '\x0a' to remove, it has exceeded the mandate of removing the newline (if any).

So, we are left with the unfortunately awkward

    if line.endswith('\n'):
        line = line[:-1]

Cheers,
John
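The distinction being drawn here can be shown in a few lines. A minimal sketch, assuming Python 3; chop_newline is a hypothetical helper name, not something from the standard library:

```python
def chop_newline(line):
    """Remove at most one trailing '\\n' (hypothetical helper)."""
    if line.endswith("\n"):
        return line[:-1]
    return line


# A string that ends in two newline characters.
sample = "abc\n\n"

# rstrip('\n') removes every trailing '\n', not just one.
print(repr(sample.rstrip("\n")))

# The endswith/slice form removes exactly one.
print(repr(chop_newline(sample)))

# A line without a newline is returned unchanged.
print(repr(chop_newline("abc")))
```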
Re: Newby: how to transform text into lines of text
>> One other caveat here: "line" contains the newline at the end, so you might use
>>
>>     print line.rstrip('\r\n')
>>
>> to remove it.
>
> I don't understand the presence of the '\r' there. Any '\x0d' that remains after reading the file in text mode and is removed by that rstrip would be a strange occurrence in the data which the OP may prefer to find out about and deal with; it is not part of the newline. Why suppress one particular data character in preference to others?

In an ideal world where everybody knew how to make a proper text-file, it wouldn't be an issue. Recreating the form of some of the data I get from customers/providers:

    f = file('tmp/x.txt', 'wb')
    f.write('headers\n')  # headers in Unix format
    f.write('data1\r\n')  # data in Dos format
    f.write('data2\r\n')
    f.write('data3')      # no trailing newline of any sort
    f.close()

Then reading it back in:

    for line in file('tmp/x.txt'):
        print repr(line)

prints

    'headers\n'
    'data1\r\n'
    'data2\r\n'
    'data3'

As for wanting to know about stray '\r' characters, I only want the data -- I don't particularly like to be reminded of the incompetence of those who send me malformed text-files ;-)

> The same applies in any case to the use of rstrip('\n'); if that finds more than one occurrence of '\x0a' to remove, it has exceeded the mandate of removing the newline (if any).

I believe that using the formulaic "for line in file(FILENAME)" iteration guarantees that each "line" will have at most one '\n' and it will be at the end (again, a malformed text-file with no terminal '\n' may cause it to be absent from the last line).

> So, we are left with the unfortunately awkward
>
>     if line.endswith('\n'):
>         line = line[:-1]

You're welcome to it, but I'll stick with my more DWIM solution of "get rid of anything that resembles an attempt at a CR/LF".

Thank goodness I haven't found any of my data-sources using '\n\r' instead, which would require me to left-strip '\r' characters as well. Sigh. My kingdom for competency. :-/

-tkc
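The experiment above can be repeated on a current interpreter. A minimal sketch, assuming Python 3, where passing newline='' to open() returns line endings untranslated; the temporary file stands in for 'tmp/x.txt':

```python
import os
import tempfile

# Same bytes as in the message: Unix header, DOS data lines, no final newline.
raw = b"headers\ndata1\r\ndata2\r\ndata3"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# newline='' splits on '\n', '\r' and '\r\n' but returns the endings unchanged.
with open(path, newline="") as f:
    raw_lines = list(f)
os.remove(path)

# Each line carries at most one line ending, which rstrip('\r\n') removes.
cleaned = [line.rstrip("\r\n") for line in raw_lines]

print(raw_lines)
print(cleaned)
```

With the default newline=None, Python 3 would translate '\r\n' to '\n' on its own, so the stray '\r' would not show up at all.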
Re: Newby: how to transform text into lines of text
On 26/01/2009 10:34 AM, Tim Chase wrote:

> I believe that using the formulaic "for line in file(FILENAME)" iteration guarantees that each "line" will have at most one '\n' and it will be at the end (again, a malformed text-file with no terminal '\n' may cause it to be absent from the last line).

It seems that you are right -- not that I can find such a guarantee written anywhere. I had armchair-philosophised that writing 'foo\n\r\nbar\r\n' to a file in binary mode and reading it on Windows in text mode would be strict and report the first line as 'foo\n\n'; I was wrong.

>> So, we are left with the unfortunately awkward
>>
>>     if line.endswith('\n'):
>>         line = line[:-1]
>
> You're welcome to it, but I'll stick with my more DWIM solution of "get rid of anything that resembles an attempt at a CR/LF".

Thanks, but I don't want it. My point was that you didn't TTOPEWYM (tell the OP exactly what you meant).

My approach to DWIM with data is, given

    norm_space = lambda s: u' '.join(s.split())

to break up the line into fields first (just in case the field delimiter == '\t'), then apply norm_space to each field. This gets rid of your '\r' at end (or start!) of line, and multiple whitespace characters are replaced by a single space. Whitespace includes NBSP (U+00A0) as an added bonus for being righteous and using Unicode :-)

> Thank goodness I haven't found any of my data-sources using '\n\r' instead, which would require me to left-strip '\r' characters as well. Sigh. My kingdom for competency. :-/

Indeed. I actually got data in that format once from a *x programmer who was so kind as to do it that way just for me because he knew that I use Windows and he thought that's what Windows text files looked like. No kidding.

Cheers,
John
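The "split into fields first, then normalise" order can be sketched as follows. This assumes Python 3, where every str is Unicode and str.split() with no argument treats NBSP as whitespace; the sample line is made up:

```python
def norm_space(s):
    """Collapse runs of whitespace to one space and trim both ends."""
    return " ".join(s.split())


# Made-up tab-delimited line with stray spaces, an NBSP and a DOS line ending.
line = "  Acme\u00a0 Corp \tLondon  \t42\r\n"

# Split on the delimiter first, so the tabs are not swallowed as whitespace,
# then normalise each field separately.
fields = [norm_space(field) for field in line.split("\t")]

print(fields)
```

Doing it in the other order would turn the tabs into spaces and lose the field boundaries.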
Re: Newby: how to transform text into lines of text
John Machin wrote:

> On 26/01/2009 10:34 AM, Tim Chase wrote:
>
>> I believe that using the formulaic "for line in file(FILENAME)" iteration guarantees that each "line" will have at most one '\n' and it will be at the end (again, a malformed text-file with no terminal '\n' may cause it to be absent from the last line).
>
> It seems that you are right -- not that I can find such a guarantee written anywhere. I had armchair-philosophised that writing 'foo\n\r\nbar\r\n' to a file in binary mode and reading it on Windows in text mode would be strict and report the first line as 'foo\n\n'; I was wrong.

Here's how I'd do it:

    with open('deheap/deheap.py', 'rU') as source:
        for line in source:
            print line.rstrip()  # Avoid trailing spaces as well.

This should handle \n, \r\n, and \n\r lines.

--Scott David Daniels
scott.dani...@acm.org
Re: Newby: how to transform text into lines of text
On Sun, 25 Jan 2009 17:34:18 -0600, Tim Chase wrote:

> Thank goodness I haven't found any of my data-sources using '\n\r' instead, which would require me to left-strip '\r' characters as well. Sigh. My kingdom for competency. :-/

If I recall correctly, one of the accounting systems I used eight years ago gave you the option of exporting text files with either \r\n or \n\r as the end-of-line mark. Neither \n nor \r (POSIX or classic Mac) line endings were supported, as that would have been useful.

(It may have been Arrow Accounting, but don't quote me on that.)

I can only imagine the developer couldn't remember which order the characters were supposed to go, so rather than look it up, he made it optional.

-- 
Steven
Re: Newby: how to transform text into lines of text
Scott David Daniels wrote:

> Here's how I'd do it:
>
>     with open('deheap/deheap.py', 'rU') as source:
>         for line in source:
>             print line.rstrip()  # Avoid trailing spaces as well.
>
> This should handle \n, \r\n, and \n\r lines.

Unfortunately, a raw rstrip() eats other whitespace that may be important. I frequently get tab-delimited files, using the following pseudo-code:

    def clean_line(line):
        return line.rstrip('\r\n').split('\t')

    f = file('customer_x.txt')
    headers = clean_line(f.next())
    for line in f:
        field1, field2, field3 = clean_line(line)
        do_stuff()

If field3 is empty in the source-file, using rstrip(None) as you suggest triggers errors on the tuple assignment because it eats the tab that defined it.

I suppose if I were really smart, I'd dig a little deeper in the CSV module to sniff out the right way to parse tab-delimited files.

-tkc
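The failure described here is easy to reproduce. A minimal sketch, assuming Python 3 and a made-up three-field line whose last field is empty:

```python
# Made-up tab-delimited line: the third field is empty.
line = "alice\t30\t\n"

# A bare rstrip() treats the trailing tab as whitespace and removes it,
# so the empty third field disappears.
with_bare_rstrip = line.rstrip().split("\t")

# Stripping only '\r' and '\n' keeps the tab, and with it the empty field.
with_newline_rstrip = line.rstrip("\r\n").split("\t")

print(with_bare_rstrip)
print(with_newline_rstrip)
```

Unpacking the first result into three names raises ValueError, which is the error on the tuple assignment mentioned above.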
Re: Newby: how to transform text into lines of text
On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase python.l...@tim.thechases.com wrote:

> Unfortunately, a raw rstrip() eats other whitespace that may be important. I frequently get tab-delimited files, using the following pseudo-code:
>
>     def clean_line(line):
>         return line.rstrip('\r\n').split('\t')
>
>     f = file('customer_x.txt')
>     headers = clean_line(f.next())
>     for line in f:
>         field1, field2, field3 = clean_line(line)
>         do_stuff()
>
> If field3 is empty in the source-file, using rstrip(None) as you suggest triggers errors on the tuple assignment because it eats the tab that defined it.
>
> I suppose if I were really smart, I'd dig a little deeper in the CSV module to sniff out the right way to parse tab-delimited files.

It's so easy that not doing it is just inexcusable laziness :)
Your own example, written using the csv module:

    import csv

    f = csv.reader(open('customer_x.txt', 'rb'), delimiter='\t')
    headers = f.next()
    for line in f:
        field1, field2, field3 = line
        do_stuff()

-- 
Gabriel Genellina
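For readers on a current interpreter, the same idea can be sketched in Python 3, where the reader is advanced with next() and works on text rather than bytes. The data is made up and io.StringIO stands in for the opened file:

```python
import csv
import io

# Made-up tab-delimited data; the second row has an empty third field.
data = "name\tage\tcity\nalice\t30\t\nbob\t25\tOslo\n"

# csv.reader accepts any iterable of text lines; delimiter='\t' selects tabs.
reader = csv.reader(io.StringIO(data), delimiter="\t")

headers = next(reader)  # first row
rows = list(reader)     # remaining rows, empty fields preserved

print(headers)
print(rows)
```

With a real file in Python 3, the csv documentation recommends opening it with newline='' so the module can handle line endings itself.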
Re: Newby: how to transform text into lines of text
On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:

> On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase python.l...@tim.thechases.com wrote:
>
>> Unfortunately, a raw rstrip() eats other whitespace that may be important. I frequently get tab-delimited files, using the following pseudo-code:
>>
>>     def clean_line(line):
>>         return line.rstrip('\r\n').split('\t')
>>
>>     f = file('customer_x.txt')
>>     headers = clean_line(f.next())
>>     for line in f:
>>         field1, field2, field3 = clean_line(line)
>>         do_stuff()
>>
>> If field3 is empty in the source-file, using rstrip(None) as you suggest triggers errors on the tuple assignment because it eats the tab that defined it.
>>
>> I suppose if I were really smart, I'd dig a little deeper in the CSV module to sniff out the right way to parse tab-delimited files.
>
> It's so easy that not doing it is just inexcusable laziness :)
> Your own example, written using the csv module:
>
>     import csv
>
>     f = csv.reader(open('customer_x.txt', 'rb'), delimiter='\t')
>     headers = f.next()
>     for line in f:
>         field1, field2, field3 = line
>         do_stuff()

And where in all of that do you recommend that .decode(some_encoding) be inserted?
Re: Newby: how to transform text into lines of text
On Mon, 26 Jan 2009 00:23:30 -0200, John Machin sjmac...@lexicon.net wrote:

> On Jan 26, 1:03 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
>
>> It's so easy that not doing it is just inexcusable laziness :)
>> Your own example, written using the csv module:
>>
>>     import csv
>>
>>     f = csv.reader(open('customer_x.txt', 'rb'), delimiter='\t')
>>     headers = f.next()
>>     for line in f:
>>         field1, field2, field3 = line
>>         do_stuff()
>
> And where in all of that do you recommend that .decode(some_encoding) be inserted?

For encodings that don't use embedded NUL bytes (latin1, utf8) I'd decode the fields right when extracting them:

    field1, field2, field3 = (field.decode('utf8') for field in line)

For encodings that allow NUL bytes, I'd use any of the recipes in the csv module documentation.

(That is, if I care about the encoding at all. Perhaps the file contains only numbers. Perhaps it contains only ASCII characters. Perhaps I'm only interested in some fields for which the encoding is irrelevant. Perhaps it is an internally generated file and it doesn't matter as long as I use the same encoding on output.)

But I admit that in general, "decode input early when reading, work in unicode, encode output late when writing" is the best practice.

-- 
Gabriel Genellina
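The "decode early, work in unicode, encode late" rule can be sketched as below. This assumes Python 3 and UTF-8 on both sides; the byte buffers stand in for real input and output files, and the upper-casing is only a placeholder for real processing:

```python
import io

# Made-up UTF-8 input; io.BytesIO stands in for a file opened in binary mode.
source = io.BytesIO("café\tzürich\n".encode("utf-8"))
sink = io.BytesIO()

for raw_line in source:
    # Decode as soon as the bytes arrive ...
    line = raw_line.decode("utf-8")
    fields = line.rstrip("\r\n").split("\t")

    # ... do all the work on text (placeholder processing) ...
    fields = [field.upper() for field in fields]

    # ... and encode only at the moment of writing.
    sink.write(("\t".join(fields) + "\n").encode("utf-8"))

result = sink.getvalue()
print(result.decode("utf-8"))
```

In Python 3, opening the files with open(..., encoding='utf-8') does the decoding and encoding at the file boundary, which amounts to the same practice.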