David Huard wrote:
Would everyone be satisfied with a solution using regular expressions ?

Maybe it's because regular expressions make me itch, but I think it's overkill for this.

The issue here is a result of what I consider a wart in Python's string methods -- string.find() returns a valid index (-1) when it fails to find anything. The usual way to work with this is to test for it:

print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line

which does seem like a lot of extra code.

In this case, that wasn't done, as most of the time there is a newline at the end that can be thrown away anyway, so the -1 index is OK. That inspired the following solution -- just add an extra space every time:

print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:(line).find(comments)].strip()
    print line

an extra string creation, but simple.

pattern = re.compile(r"""
    ^\s* # leading white space
    (.*) # Data
    %s?  # Zero or one comment character
    (.*) # Comments
    \s*$ # Trailing white space
    """%comments, re.VERBOSE)

This pattern fails if the last character of the line is a comment character, and if it is a comment-only line, though I'm sure that could be fixed. I still prefer the Python string methods approaches, though.
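
For the record, here is one way the pattern might be fixed, offered as a sketch rather than a tested patch. Two things seem to go wrong in the pattern above: under re.VERBOSE an unescaped "#" starts a regex comment, so the substituted "#?" never reaches the matcher; and even when escaped, the greedy first (.*) swallows the comment character because everything after it is optional. The version below escapes the marker with re.escape, drops re.VERBOSE, and makes the data group non-greedy. The names make_comment_pattern and split_comment are hypothetical.

```python
import re


def make_comment_pattern(comments="#"):
    """Compile a pattern that splits a single line into (data, comment).

    Hypothetical helper; `comments` is the comment marker string.
    """
    marker = re.escape(comments)  # match "#" (or e.g. "//") literally
    return re.compile(
        r"^\s*(.*?)\s*"      # data, non-greedy, surrounding whitespace trimmed
        r"(?:%s(.*?))?"      # optional marker followed by the comment text
        r"\s*$" % marker
    )


def split_comment(line, pattern):
    """Return (data, comment); comment is None when there is no marker.

    Assumes `line` is a single line, with at most a trailing newline.
    """
    return pattern.match(line).groups()


pattern = make_comment_pattern("#")
for sample in [" 1 2 3 4 5\n", " 1 2 3 4 5", " 1 2 3 4 5#", "  # 1 2 3 4 5"]:
    data, comment = split_comment(sample, pattern)
    print(repr(data))
```

The non-greedy data group stops at the first marker, so a line with several comment characters keeps everything after the first one in the comment group.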

I've enclosed a little test code, that gives these results:

old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5

with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5
simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

My suggestions work on all my test cases. We really should put these, and others, into a real unit test when this fix is added.
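
As a starting point for such a unit test, something along these lines might do. It is only a sketch: strip_comment is a hypothetical stand-in for whatever the fixed code ends up being called, and the cases are just the sample lines from my test script.

```python
import unittest


def strip_comment(line, comments="#"):
    """Hypothetical helper: the 'test for comment not found' approach."""
    i = line.find(comments)
    if i == -1:
        return line.strip()
    return line[:i].strip()


class StripCommentTest(unittest.TestCase):
    # (input line, expected data) pairs taken from SampleLines
    cases = [
        (" 1 2 3 4 5\n", "1 2 3 4 5"),
        (" 1 2 3 4 5", "1 2 3 4 5"),
        (" 1 2 3 4 5#", "1 2 3 4 5"),
        ("  # 1 2 3 4 5", ""),
    ]

    def test_sample_lines(self):
        for line, expected in self.cases:
            self.assertEqual(strip_comment(line), expected)

    def test_old_way_drops_last_character(self):
        # Documents the original bug: with no comment and no newline,
        # slicing at find()'s -1 loses the final "5".
        line = " 1 2 3 4 5"
        self.assertEqual(line[:line.find("#")].strip(), "1 2 3 4")
```

Saved in a module, this can be run with "python -m unittest" followed by the module name.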

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[EMAIL PROTECTED]
#!/usr/bin/env python

"""
test of loadtext issue
"""

comments = "#"

SampleLines = [" 1 2 3 4 5\n",
               " 1 2 3 4 5",
               " 1 2 3 4 5#",
               "  # 1 2 3 4 5",
               ]


#SampleLines = ["a line with a comment # this is the comment",
#               "# a comment-only line",
#               " a line with no comment, and no newline",
#               " a line with a trailing comment character, and no newline#",
#               ]

print "old way -- this fails with no comment or newline"
for line in SampleLines: 
    line = line[:line.find(comments)].strip()
    print line

print "with regular expression:"
import re
pattern = re.compile(r"""
    ^\s* # leading white space
    (.*) # Data
    %s?  # Zero or one comment character
    (.*) # Comments
    \s*$ # Trailing white space
    """%comments, re.VERBOSE)

for line in SampleLines:
    match = pattern.search(line)
    line, comment = match.groups()
    print line

print "simply pad the line with a space:"
for line in SampleLines: 
    line += " "
    line = line[:(line).find(comments)].strip()
    print line

print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip() 
    else:
        line = line[:i].strip()
    print line

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
