On Mon, Jul 28, 2014 at 4:08 AM, CM <cmpyt...@gmail.com> wrote:
> I can go through it with opening the text file and reading in the lines, and 
> if the first character is a "-" then count that as the start of a bug block, 
> but I am not sure how to find the last line of a bug block...it would be the 
> line before the first line of the next bug block, but not sure the best way 
> to go about it.
>
> There must be a rather standard way to do something like this in Python, and 
> I'm requesting pointers toward that standard way (or what this type of task 
> is usually called).  Thanks.

This is a fairly standard sort of job, but there's not really a
ready-to-go bit of code. This is just straightforward text
processing.

What I'd do is a stateful parser. Something like this:

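block = None  # accumulator; None means no bug block has started yet
with open("bugs.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("- "):
            # New block starts here: emit the previous one, if any.
            if block: save_to_database(block)
            block = line
        elif block is not None:
            # Each line already ends with its own newline, so just append.
            # (Lines before the first "- " header are skipped.)
            block += line
if block: save_to_database(block) # don't forget to grab that last one!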

This is extremely simple, and you might want to use a regex to look
for the upper-case word and date as well (as written, this would
falsely treat any description line that happens to begin with a hyphen
and a space as the start of a new block).
But the basic idea is: initialize an accumulator to a null state;
whenever you find the beginning of something, emit the previous and
reset the accumulator; otherwise, add to the accumulator. At the end,
emit any current block.
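
Here's a rough sketch of that same pattern with the regex check added,
written as a generator so the caller decides what to do with each block.
I'm guessing at the header format here: the pattern assumes a header
looks like "- WORD YYYY-MM-DD ...", and the names HEADER, iter_blocks
and the sample lines are all made up for illustration. Adjust the
pattern to whatever the real file uses.

```python
import re

# Assumed header shape: "- WORD YYYY-MM-DD ..." (hypothetical; adapt to the real file).
HEADER = re.compile(r"- [A-Z]+ \d{4}-\d{2}-\d{2}\b")

def iter_blocks(lines):
    """Yield each bug block as one string; a block starts at a HEADER line."""
    block = None  # accumulator in its null state
    for line in lines:
        if HEADER.match(line):
            # Beginning of something: emit the previous block, then reset.
            if block is not None:
                yield block
            block = line
        elif block is not None:
            # Continuation line: add it to the accumulator.
            block += line
    # At the end, emit any current block.
    if block is not None:
        yield block

# Made-up sample data; the third line starts with "- " but is not a header.
sample = [
    "- FIXED 2014-07-28 Crash on startup\n",
    "Happens when the config file is missing.\n",
    "- not a header, just a description line\n",
    "- OPEN 2014-07-29 Typo in menu\n",
    "Should read 'Preferences'.\n",
]
blocks = list(iter_blocks(sample))
```

Since an open file iterates line by line, you could pass the file
object straight to iter_blocks() and call your database function on
each block it yields.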

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list