Deleting lines from a file
Hi,

I need to write a program which reads an external text file. Each time it reads the file, it needs to delete some lines, for instance from the second line to the 55th line. The file is really big, so what do you think is the fastest method to delete specific lines in a text file?

Thanks

-- http://mail.python.org/mailman/listinfo/python-list
Re: Deleting lines from a file
Horacius ReX wrote:
> Hi, I need to write a program which reads an external text file. Each time it reads, then it needs to delete some lines, for instance from second line to 55th line. The file is really big, so what do you think is the fastest method to delete specific lines in a text file?

Don't use a file; use a database instead. If that's not possible, you can't do anything but open/read/filter/write: filesystems (at least the commonly known ones) don't support random deletion.

Diez
Re: Deleting lines from a file
Horacius ReX wrote:
> Hi, I need to write a program which reads an external text file. Each time it reads, then it needs to delete some lines, for instance from second line to 55th line. The file is really big, so what do you think is the fastest method to delete specific lines in a text file?
>
> Thanks

One way would be to mark the lines as being deleted by either:

1) replacing them with some known character sequence that you treat as deleted. This assumes that the lines are long enough.

or

2) keeping a separate dictionary that holds line numbers and a deleted flag. Pickle and dump this dictionary before program execution ends. Load it when program execution begins.

    import pickle

    pFiles = 'deletedLines.toc'

    # line number -> deleted flag, one entry per line of the data file
    deletedFlags = {1: False, 2: True}

    def load():
        # Read back the flags saved by the previous run.
        with open(pFiles, 'rb') as fp:
            return pickle.load(fp)

    def dump(deletedFlags):
        # Save the flags before the program exits.
        with open(pFiles, 'wb') as fp:
            pickle.dump(deletedFlags, fp)

Caveats:

1) You must write EXACTLY the same number of bytes (padded with spaces, etc.) on top of deleted lines. This method doesn't work if any of the lines are so short that they can't hold your DELETED flag string.

2) You must be very careful to keep the deletedFlags dictionary and the data file consistent (by using try/except/finally around your entire process).

Personally I would employ method #2 and periodically pack the file with a separate process. That could run unattended (e.g. at night). Or, if I did this a lot, I would use a database instead.

-Larry
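Method 1 and its first caveat can be sketched roughly as follows. This is only an illustration, not Larry's code: the function name mark_deleted and the '#' marker are made up for the example, and it assumes readers of the file agree to skip lines that start with the marker.

```python
def mark_deleted(path, line_numbers, mark=b"#"):
    """Overwrite the given 1-based lines in place with a same-length marker.

    Hypothetical helper: the name and the default marker are assumptions.
    """
    wanted = set(line_numbers)
    spans = []  # (byte offset, length of the line without its newline)

    # Pass 1: find where each wanted line starts and how long it is.
    with open(path, "rb") as fp:
        offset = 0
        for number, raw in enumerate(fp, start=1):
            if number in wanted:
                body_length = len(raw.rstrip(b"\r\n"))
                if body_length < len(mark):
                    raise ValueError("line %d is too short for the marker" % number)
                spans.append((offset, body_length))
            offset += len(raw)

    # Pass 2: write the marker padded with spaces, so the byte count
    # (and therefore every later offset in the file) stays the same.
    with open(path, "r+b") as fp:
        for offset, body_length in spans:
            fp.seek(offset)
            fp.write(mark + b" " * (body_length - len(mark)))
```

A call such as mark_deleted('data.txt', range(2, 56)) would blank out lines 2 through 55 without changing the file size. All lines are checked before anything is written, so a line that is too short leaves the file untouched.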
Re: Deleting lines from a file
On Dec 17, 2007, at 5:34 AM, Horacius ReX wrote:
> I need to write a program which reads an external text file. Each time it reads, then it needs to delete some lines, for instance from second line to 55th line. The file is really big, so what do you think is the fastest method to delete specific lines in a text file?

AFAIK, there really isn't much you can do to *speed* the reading and writing of the large text file. But maybe you can avoid doing it too much. If you must make many changes it might help to just keep a list of lines to consider deleted -- and write the modified file out later.

hth,
Michael

---
I use tuples simply because of their mellifluous appellation. --Neil Cerutti
Re: Deleting lines from a file
And regardless of the speed, what do you think would be the best method to do this?

Michael Bentley wrote:
> On Dec 17, 2007, at 5:34 AM, Horacius ReX wrote:
>> I need to write a program which reads an external text file. Each time it reads, then it needs to delete some lines, for instance from second line to 55th line. The file is really big, so what do you think is the fastest method to delete specific lines in a text file?
>
> AFAIK, there really isn't much you can do to *speed* the reading and writing of the large text file. But maybe you can avoid doing it too much. If you must make many changes it might help to just keep a list of lines to consider deleted -- and write the modified file out later.
>
> hth,
> Michael
Re: Deleting lines from a file
> I need to write a program which reads an external text file. Each time it reads, then it needs to delete some lines, for instance from second line to 55th line. The file is really big, so what do you think is the fastest method to delete specific lines in a text file?

Generally, with files that are really big, you either want to edit them in place (which takes a database-type structure), or you need to stream through the file a line/window at a time, dumping the output to a temporary output file.

The *nix tool for this job is sed:

    sed '2,55d' infile.txt > outfile.txt

(it doesn't get much more concise than this).

That's about the same as the following in Python:

    out = open('outfile.txt', 'w')
    for i, line in enumerate(open('infile.txt')):
        # enumerate() counts from 0, so lines 2-55 are indices 1-54
        if 1 <= i <= 54:
            continue
        out.write(line)
    out.close()

If you want it in place, sed will do the output file and renaming for you with

    sed -i '2,55d' file.txt

whereas in the Python variant, you'd then have to use the os.rename call to move outfile.txt to infile.txt.

The Python version is a bit more flexible, as you can add other logic to change your bounds. Not that sed isn't flexible, but it starts getting unreadable very quickly as the logic grows.

-tkc
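The in-place Python variant described above (stream to a temporary file, then rename it over the original) might look something like this. It is a sketch under assumptions: delete_line_range is a made-up name, and it uses os.replace rather than os.rename because os.replace also overwrites an existing target on Windows.

```python
import os
import tempfile


def delete_line_range(path, first, last):
    """Remove lines first..last (1-based, inclusive) from the file at path.

    Hypothetical helper: streams through the file once, so memory use
    stays small no matter how big the file is.
    """
    # Create the temporary file next to the original so that the final
    # rename stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            for number, line in enumerate(src, start=1):
                if first <= number <= last:
                    continue  # skip the lines being deleted
                out.write(line)
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temporary file behind
        raise
```

delete_line_range('file.txt', 2, 55) corresponds to sed -i '2,55d' file.txt. One difference worth knowing: mkstemp creates the temporary file with owner-only permissions, so the replaced file does not automatically keep the original's mode.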
Re: Deleting lines from a file
Horacius ReX wrote:
> and regardless of the speed, what do you think would be the best method to do this?

Without more information about the contents of the file and who is reading it, we can't say more. If the reader is not under your control and doesn't deal with deletion marks or anything similar in the file, you can't do anything but really delete the lines.

If you can control it, it depends on how you process the file: whether it has a fixed line length or not, and so forth. You need to use seek to position the file pointer at the proper location in the file to write a deletion mark, but to do so you of course need to determine that location first, and that will most probably need a two-pass approach.

Diez
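For the fixed-line-length case Diez mentions, the position can be computed directly, so no first pass is needed. A minimal sketch, assuming every line (record) occupies exactly record_size bytes including its newline; mark_fixed_record and the '#' marker are invented for the example:

```python
import os


def mark_fixed_record(path, record_number, record_size, mark=b"#"):
    """Write a deletion mark at the start of a 1-based fixed-size record.

    Hypothetical helper: assumes each line is exactly record_size bytes,
    newline included, so record n starts at byte (n - 1) * record_size.
    """
    if not 0 < len(mark) < record_size:
        raise ValueError("mark must fit inside the record, before the newline")
    offset = (record_number - 1) * record_size
    if record_number < 1 or offset + record_size > os.path.getsize(path):
        raise IndexError("no such record: %d" % record_number)
    with open(path, "r+b") as fp:
        fp.seek(offset)  # jump straight to the record
        fp.write(mark)   # overwrite its first bytes in place
```

With variable-length lines the offset cannot be computed this way, which is why a scan of the file is needed first.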
Re: Deleting lines from a file
On 12/17/07, Horacius ReX wrote:
> and regardless of the speed, what do you think would be the best method to do this?

Use sqlite.

--
Vladimir Rusinov
GreenMice Solutions: IT solutions based on Linux
http://greenmice.info/
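As a rough idea of what the sqlite suggestion could look like with Python's standard sqlite3 module: the table name lines, its columns, and the three helper functions below are assumptions made for this sketch, not something proposed in the thread.

```python
import sqlite3


def load_lines(conn, lines):
    """Store each line under its 1-based line number (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines (lineno INTEGER PRIMARY KEY, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO lines (lineno, text) VALUES (?, ?)",
        enumerate(lines, start=1),
    )
    conn.commit()


def delete_range(conn, first, last):
    """Delete lines first..last (inclusive); the database handles the storage."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM lines WHERE lineno BETWEEN ? AND ?", (first, last)
        )


def remaining(conn):
    """Return the surviving lines in their original order."""
    rows = conn.execute("SELECT text FROM lines ORDER BY lineno")
    return [text for (text,) in rows]
```

After loading the file once with load_lines(conn, open('infile.txt')), each later deletion is a single delete_range(conn, 2, 55) call instead of a rewrite of the whole file.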
Re: Deleting lines from a file
On Dec 17, 2007, at 6:25 AM, Horacius ReX wrote:
> and regardless of the speed, what do you think would be the best method to do this?

The first thing I'd look into is reading the whole file into memory, making all the deletions, and finally writing it out. But you said the file is big, so here's a quick stab at it (with multiple read passes and a single write):

    import string

    rm = set()  # 0-based numbers of the lines to drop

    # first pass through file -- mark some lines for deletion
    for line, text in enumerate(open('words')):
        if text[0] in string.ascii_uppercase:
            rm.add(line)

    # second pass -- mark lines with 'e' for deletion
    for line, text in enumerate(open('words')):
        if line in rm:
            print('skipping %s' % line)
            continue
        if 'e' in text:
            rm.add(line)

    # now write the modified file
    total = line + 1  # enumerate() counts from 0
    print('Writing %d of %d lines' % (total - len(rm), total))
    outFile = open('newWords', 'w')
    for line, text in enumerate(open('words')):
        if line not in rm:
            outFile.write(text)
    outFile.close()

hth,
Michael

---
Simplicity is the ultimate sophistication. -Leonardo da Vinci
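For comparison, the "read the whole file into memory" option Michael mentions first could be as short as the sketch below. The function name is made up, and it is only suitable when the file comfortably fits in RAM.

```python
def delete_lines_in_memory(path, first, last):
    """Delete lines first..last (1-based, inclusive) by rewriting the file.

    Hypothetical helper: loads every line into a list, so memory use
    grows with the size of the file.
    """
    with open(path, "rb") as fp:
        lines = fp.readlines()
    # Slice indices are 0-based and exclude the end, so lines 2..55
    # correspond to lines[1:55].
    del lines[first - 1:last]
    with open(path, "wb") as fp:
        fp.writelines(lines)
```

Note that the file is rewritten in place here, so an interruption during the final write can leave it truncated; writing to a second file and renaming it avoids that.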