David Huard wrote:
> Would everyone be satisfied with a solution using regular expressions?

Maybe it's because regular expressions make me itch, but I think it's
overkill for this.
The issue here is a result of what I consider a wart in Python's string
methods -- string.find() returns a valid index (-1) when it fails to
find anything. The usual way to work with this is to test for it:

print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line

which does seem like a lot of extra code.
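
For comparison, the same result can be had without the explicit -1 test by using str.partition (available since Python 2.5), which returns the whole string as its first element when the separator is absent. This is only a minimal sketch and was not proposed in this thread; the helper name strip_comment is made up for illustration, and the print calls use Python 3 syntax.

```python
comments = "#"
SampleLines = [" 1 2 3 4 5\n",
               " 1 2 3 4 5",
               " 1 2 3 4 5#",
               " # 1 2 3 4 5",
               ]


def strip_comment(line, comments="#"):
    """Return the data part of a line, stripped of comment and whitespace.

    Hypothetical helper: str.partition returns (line, "", "") when the
    comment string is absent, so there is no -1 index to special-case.
    """
    data, _, _ = line.partition(comments)
    return data.strip()


for line in SampleLines:
    # repr() makes the empty result for the comment-only line visible
    print(repr(strip_comment(line, comments)))
```

The trade-off is that partition builds a 3-tuple per line, which is comparable in cost to the extra string created by the padding approach.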
In this case, that wasn't done, since most of the time there is a newline
at the end that can be thrown away anyway, so the -1 index is OK. That
inspired the following solution -- just add an extra space every time:

print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print line

It costs an extra string creation, but it's simple.
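
To make the reason the padding works concrete, here is a small sketch using one sample line that has neither a comment nor a trailing newline. The variable names are chosen only for this illustration, and the print calls use Python 3 syntax.

```python
comments = "#"
line = " 1 2 3 4 5"           # no comment and no trailing newline

i = line.find(comments)       # -1 means "not found", but is also a valid index
unpadded = line[:i].strip()   # line[:-1] silently drops the final "5"

padded_line = line + " "      # sacrificial character for the -1 slice to remove
padded = padded_line[:padded_line.find(comments)].strip()

print(i)
print(repr(unpadded))
print(repr(padded))
```

When a comment character is present, find() returns its real index and the padding is simply cut off along with the comment, so the extra space is harmless in both cases.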

pattern = re.compile(r"""
    ^\s*    # leading white space
    (.*)    # Data
    %s?     # Zero or one comment character
    (.*)    # Comments
    \s*$    # Trailing white space
    """%comments, re.VERBOSE)

This pattern fails if the last character of the line is a comment
character, and if it is a comment-only line, though I'm sure that could
be fixed. I still prefer the Python string methods approaches, though.
I've enclosed a little test code that gives these results:
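
A note on why the pattern behaves this way, with one possible repair. With comments = "#", the substituted text is "#?", and under re.VERBOSE an unescaped "#" starts a regex comment, so that line of the pattern is ignored entirely; the first greedy (.*) would take the rest of the line in any case. The following is only a sketch, assuming each input is a single line: it escapes the comment string with re.escape and makes the data group non-greedy. The names fixed_pattern and strip_comment_re are invented for this example, and the print calls use Python 3 syntax.

```python
import re

comments = "#"

# re.escape turns "#" into "\#", so re.VERBOSE no longer reads it as the
# start of a regex comment.
fixed_pattern = re.compile(r"""
    ^\s*        # leading white space
    (.*?)       # data, non-greedy so it stops before the first comment character
    \s*         # white space between the data and the comment or end of line
    (?:%s.*)?   # optional comment: the comment string and everything after it
    $           # end of line (also matches just before a trailing newline)
    """ % re.escape(comments), re.VERBOSE)


def strip_comment_re(line):
    """Hypothetical helper: return only the data group of the match."""
    return fixed_pattern.match(line).group(1)


for sample in [" 1 2 3 4 5\n", " 1 2 3 4 5", " 1 2 3 4 5#", " # 1 2 3 4 5"]:
    # repr() makes the empty result for the comment-only line visible
    print(repr(strip_comment_re(sample)))
```

Even repaired, this is more machinery than the string-method versions, which is consistent with the preference stated above.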
old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5
with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5
simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
My suggestions work on all my test cases. We really should put these,
and others, into a real unit test when this fix is added.
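
As a starting point for such a unit test, here is a minimal sketch built on the standard unittest module. It wraps the explicit-test approach in a function; the function name strip_comment, the test class, and the choice of cases are assumptions for illustration, not part of numpy's actual test suite.

```python
import unittest


def strip_comment(line, comments="#"):
    """Explicit-test approach: only slice when a comment string is found."""
    i = line.find(comments)
    if i == -1:
        return line.strip()
    return line[:i].strip()


class TestStripComment(unittest.TestCase):
    def test_trailing_newline_no_comment(self):
        self.assertEqual(strip_comment(" 1 2 3 4 5\n"), "1 2 3 4 5")

    def test_no_comment_no_newline(self):
        # The case the "old way" gets wrong: the last character must survive.
        self.assertEqual(strip_comment(" 1 2 3 4 5"), "1 2 3 4 5")

    def test_trailing_comment_character(self):
        self.assertEqual(strip_comment(" 1 2 3 4 5#"), "1 2 3 4 5")

    def test_comment_only_line(self):
        self.assertEqual(strip_comment(" # 1 2 3 4 5"), "")


# Run the cases directly instead of calling unittest.main(), so the script
# does not exit the interpreter when it finishes.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStripComment)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Any of the other approaches could be dropped in as the body of strip_comment and checked against the same cases.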
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
[EMAIL PROTECTED]

#!/usr/bin/env python
"""
test of loadtext issue
"""

comments = "#"

SampleLines = [" 1 2 3 4 5\n",
               " 1 2 3 4 5",
               " 1 2 3 4 5#",
               " # 1 2 3 4 5",
               ]

#SampleLines = ["a line with a comment # this is the comment",
#               "# a comment-only line",
#               " a line with no comment, and no newline",
#               " a line with a trailing comment character, and no newline#",
#               ]

print "old way -- this fails with no comment or newline"
for line in SampleLines:
    line = line[:line.find(comments)].strip()
    print line

print "with regular expression:"
import re
pattern = re.compile(r"""
    ^\s*    # leading white space
    (.*)    # Data
    %s?     # Zero or one comment character
    (.*)    # Comments
    \s*$    # Trailing white space
    """%comments, re.VERBOSE)
for line in SampleLines:
    match = pattern.search(line)
    line, comment = match.groups()
    print line

print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print line

print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion