On 7/27/2014 2:08 PM, CM wrote:
I have a big text file of bugs that I want to use Python to parse
such that the bugs can be neatly filed into a database. I can bumble
toward a solution with looping but feel this is a classic example of
reinventing the wheel, and yet I'm finding it hard to Google for.

Basically the file is structured like this (silly examples, of
course); let's call each of these three a "bug block":


- BUG 2.13.14  When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat", the application becomes
available again.

- ISSUE 2.13.14  During thunderstorms, the application runs
backwards.

- BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.
That's too bad.


Generally, every bug block starts with a "-" as the first character,

I will assume 'always'

then some words in all caps, a date in that format, and then the
descriptive text. There is always a blank line in between bug blocks,
but sometimes there may be a blank line within the bug description as
well.

The goal is to grab each bug block, clean up that text (there are CRs
in it, etc., but I can do that), and dump it into a database record
(the db stuff I can do).  Grabbing the date along the way would be
wonderful as well.

I can go through it with opening the text file and reading in the
lines, and if the first character is a "-" then count that as the
start of a bug block, but I am not sure how to find the last line of
a bug block...it would be the line before the first line of the next
bug block, but not sure the best way to go about it.

There must be a rather standard way to do something like this in
Python, and I'm requesting pointers toward that standard way (or what
this type of task is usually called).  Thanks.

Split the processing into two phases: generating individual bugs and processing each bug. Here is a prototype.

with open(bugfile) as f:
    for bug in bugs(f):
        process(bug)

Here are two examples of the first phase. Use the second for a big file, since it works line by line instead of needing the whole text in memory at once. (If individual bugs are more than a few lines, I would collect lines in the generator in a list and use ''.join(<list>).)

bugtext = '''\
- BUG 2.13.14  When you wear a purple hat, the application locks up.
If you sing the theme to "The Love Boat",
the application becomes available again.

- ISSUE 2.13.14  During thunderstorms, the application runs backwards.

- BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.
That's too bad
'''

buglist1 = [bug.strip().replace('\n', '') for bug in bugtext[1:].split('\n-')]
for bug in buglist1: print(bug)

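def bugs(lines):
    "Yield each bug block as one string; a block starts at a '-' line."
    lines = iter(lines)
    try:
        bug = next(lines)[1:]  # drop the leading '-' of the first block
    except StopIteration:
        return  # empty input: no bugs (a bare next() would raise RuntimeError on 3.7+)
    for line in lines:
        if line[:1] != '-':
            bug += line  # continuation (or blank) line of the current block
        else:
            yield bug.strip()
            bug = line[1:]  # start the next block
    yield bug.strip()  # the last block has no following '-' line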


buglist2 = [bug for bug in bugs(bugtext.splitlines())]
for bug in buglist2: print(bug)
print(buglist1 == buglist2)

>>>
BUG 2.13.14  When you wear a purple hat, the application locks up.If you sing the theme to "The Love Boat",the application becomes available again.
ISSUE 2.13.14  During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.That's too bad
BUG 2.13.14  When you wear a purple hat, the application locks up.If you sing the theme to "The Love Boat",the application becomes available again.
ISSUE 2.13.14  During thunderstorms, the application runs backwards.
BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.That's too bad
True
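
For longer bug blocks, the list-collecting variant of the generator might look like the sketch below. The name bugs_joined is made up for illustration. Joining with a single space instead of '' is an assumption on my part, so that wrapped lines do not run together ("locks up.If"); lines before the first '-' line are silently ignored.

```python
def bugs_joined(lines):
    """Yield each bug block as a single string.

    A block starts at a line whose first character is '-'.
    """
    parts = []       # stripped, non-empty lines of the current block
    started = False  # becomes True once the first '-' line is seen
    for line in lines:
        if line[:1] == '-':
            if started:
                yield ' '.join(parts)  # finish the previous block
            parts = []
            started = True
            line = line[1:]  # drop the leading '-'
        if started and line.strip():
            parts.append(line.strip())  # blank lines are skipped
    if started:
        yield ' '.join(parts)  # the last block


sample = [
    '- BUG 2.13.14  When you wear a purple hat, the application locks up.\n',
    'If you sing the theme to "The Love Boat",\n',
    'the application becomes available again.\n',
    '\n',
    '- ISSUE 2.13.14  During thunderstorms, the application runs backwards.\n',
]
for bug in bugs_joined(sample):
    print(bug)
```

Because it only iterates, it can be fed the open file object directly, just like bugs(f) in the prototype.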

Now write process(bug)
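
As a starting point, here is one possible sketch of process(bug) that pulls out the labels and the date with a regular expression. It rests on assumptions taken only from the three sample bugs: the labels are capital letters optionally separated by '/', the date is month.day.two-digit-year, and a ':' may follow the date. The HEADER pattern and the returned tuple are my own choices, not anything the real file guarantees.

```python
import re
from datetime import datetime

# Header pattern (an assumption based on the sample bugs): capitalized
# labels separated by '/', a dotted date, an optional ':', then the text.
HEADER = re.compile(
    r'([A-Z]+(?:/[A-Z]+)*)\s+(\d{1,2}\.\d{1,2}\.\d{2}):?\s*(.*)',
    re.DOTALL)


def process(bug):
    """Split one bug string into (labels, date, description)."""
    m = HEADER.match(bug)
    if m is None:
        raise ValueError('unrecognized bug header: %r' % bug[:40])
    labels = m.group(1).split('/')
    # Assumption: dates are month.day.two-digit-year, e.g. 2.13.14.
    date = datetime.strptime(m.group(2), '%m.%d.%y').date()
    return labels, date, m.group(3)


print(process('BUG/OPTIMIZE 11.12.12:  Sometimes the application is really slow.'))
```

Raising ValueError on a header that does not match makes oddly formatted blocks show up immediately instead of landing in the database half-parsed.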

--
Terry Jan Reedy
