On 17.09.2012 04:28, Jadhav, Alok wrote:
Thanks, Dave, for the clear explanation. I clearly understand what is going on
now. I still need some suggestions from you on this.
There are 2 reasons why I was using self.rawfile.read().split('|\n')
instead of self.rawfile.readlines():
- As you have seen, the line separator is not '\n' but '|\n'.
Sometimes the data itself has '\n' characters in the middle of a line,
and the only way to find the true end of a line is to check that the
previous character is a bar '|'. I was not able to specify the end of line
using the readlines() function, but I could do it using the split() function.
(One hack would be to call readlines() and combine the lines until I find
'|\n'. Is there a cleaner way to do this?)
- Reading the whole file at once and processing it line by line was much
faster. Speed is not a very important issue here, but I think the
time it took to parse the complete file was reduced to one third of the
original time.
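The "read lines and combine them until '|\n'" idea from the first point could look roughly like the sketch below. The names iter_records and terminator are made up for illustration, and io.StringIO with invented sample data stands in for the real file. This is a sketch only, not the code from the original script.

```python
import io

def iter_records(f, terminator='|\n'):
    """Yield logical records, joining physical lines until one ends with terminator."""
    parts = []
    for line in f:
        parts.append(line)
        if line.endswith(terminator):
            # A physical line ending in '|\n' closes the current record.
            yield ''.join(parts)
            parts = []
    if parts:
        # Trailing data without a final terminator is still returned.
        yield ''.join(parts)

# Hypothetical sample: the first record contains an embedded '\n'.
sample = io.StringIO('a|b\nc|\nd|e|\n')
records = list(iter_records(sample))
```

This keeps only one record in memory at a time, at the cost of a Python-level loop over every physical line.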
With
    def itersep(f, sep='\0', buffering=1024, keepsep=True):
        if keepsep:
            keepsep = sep
        else:
            keepsep = ''
        data = f.read(buffering)
        next_line = data  # empty? -> end.
        while next_line:  # -> data is empty as well.
            lines = data.split(sep)
            for line in lines[:-1]:
                yield line + keepsep
            next_line = f.read(buffering)
            data = lines[-1] + next_line
        # keepsep: only if we have something.
        if (not keepsep) or data:
            yield data

you can iterate over everything you want without needing too much
memory. Using a larger "buffering" might improve speed a little bit.
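
For illustration, here is a small usage sketch with the '|\n' separator from the question. The generator is repeated so the snippet runs on its own. The sample data is invented, io.StringIO stands in for the real file, and buffering is set to a tiny value only to show that a separator split across two reads is still found.

```python
import io

def itersep(f, sep='\0', buffering=1024, keepsep=True):
    # Same generator as above, repeated so this snippet is self-contained.
    keepsep = sep if keepsep else ''
    data = f.read(buffering)
    next_chunk = data
    while next_chunk:
        lines = data.split(sep)
        for line in lines[:-1]:
            yield line + keepsep
        next_chunk = f.read(buffering)
        # The unfinished tail is carried over and joined with the next chunk.
        data = lines[-1] + next_chunk
    if (not keepsep) or data:
        yield data

text = 'a|b\nc|\nd|e|\n'  # hypothetical data with an embedded '\n'

# keepsep=True: each record keeps its '|\n' terminator.
records = list(itersep(io.StringIO(text), sep='|\n', buffering=4))

# keepsep=False: same pieces as text.split('|\n'), including the
# trailing empty string after the last separator.
stripped = list(itersep(io.StringIO(text), sep='|\n', buffering=4,
                        keepsep=False))
```

With keepsep=False the result matches read().split('|\n') piece for piece, so it should be usable as a drop-in replacement in a loop over the records.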
Thomas
--
http://mail.python.org/mailman/listinfo/python-list