Re: Record seperator

Roy Smith Sat, 27 Aug 2011 10:53:16 -0700

In article <[email protected]>,
 Steven D'Aprano <[email protected]> wrote:


> open("file.txt")   # opens the file
>  .read()           # reads the contents of the file
>  .split("\n\n")    # splits the text on double-newlines.

The biggest problem with this code is that read() slurps the entire file 
into a string.  That's fine for moderately sized files, but will fail 
(or at least be grossly inefficient) for very large files.

It's always annoyed me a little that while it's easy to iterate over the 
lines of a file, it's more complicated to iterate over a file character 
by character.  You could write your own generator to do that:

def getchar(f):
   # Walk the file line by line, then yield each character of the line.
   for line in f:
      for c in line:
         yield c

for c in getchar(open("file.txt")):
   pass  # whatever you need to do with c

but that's annoyingly verbose (and probably not hugely efficient).
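
A somewhat shorter alternative, offered only as a sketch: the two-argument form of iter() keeps calling a function until it returns a sentinel value, so you can read fixed-size chunks and flatten them into characters without going through lines at all. The name iter_chars and the default chunk size are my own illustrative choices, and I haven't measured whether this is actually faster.

```python
import io
from functools import partial
from itertools import chain

def iter_chars(f, chunk_size=8192):
    """Return an iterator over the characters of text file f."""
    # iter(callable, sentinel) calls f.read(chunk_size) until it returns "".
    chunks = iter(partial(f.read, chunk_size), "")
    # Each chunk is a string; chain.from_iterable flattens them into characters.
    return chain.from_iterable(chunks)

# An in-memory file stands in for open("file.txt") here.
chars = list(iter_chars(io.StringIO("ab\ncd"), chunk_size=2))
```

Memory use is bounded by chunk_size rather than by the length of the longest line.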

Of course, the next obstacle for the specific problem at hand is that 
even with an iterator over the characters of a file, split() only works 
on strings.  It would be nice to have a version of split which took an 
iterable and returned an iterator over the split components.  Maybe 
there is such a thing and I'm just missing it?
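
For the blank-line-separated records in this thread, a small generator over lines gets most of the way there without a general-purpose iterable split. This is a sketch under stated assumptions, not an existing library function: the name iter_records is hypothetical, a line containing only whitespace is treated as a separator, and runs of blank lines are collapsed, so unlike split("\n\n") it never yields empty records.

```python
import io

def iter_records(lines):
    """Yield records from an iterable of lines, splitting on blank lines."""
    record = []
    for line in lines:
        if line.strip():
            # Non-blank line: it belongs to the current record.
            record.append(line)
        elif record:
            # Blank line after some content: the record is complete.
            yield "".join(record).rstrip("\n")
            record = []
    if record:
        # The last record may not be followed by a blank line.
        yield "".join(record).rstrip("\n")

# An in-memory file stands in for open("file.txt") here.
sample = io.StringIO("a\nb\n\nc\n\n\nd\n")
records = list(iter_records(sample))
```

Only one record is held in memory at a time, so this works on files too large for read().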
-- 
http://mail.python.org/mailman/listinfo/python-list
