On Thu, 06 Jul 2006 03:34:32 -0700, manstey wrote:

> Hi,
> 
> I have a text file called a.txt:
> 
> # comments
> [('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
> [('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
> [('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]
> 
> I read it using this:
> 
> filAnsMorph = codecs.open('a.txt', 'r', 'utf-8') # Initialise input file
> dicAnsMorph = {}
> for line in filAnsMorph:
>     if line[0] != '#': # Get rid of comment lines
>         x = eval(line)
>         dicAnsMorph[x[0][1]] = x[1][1] # recid is key, parse dict is value
> 
> But it crashes every time on x = eval(line). Why is this?

Some people have incorrectly suggested the solution is to remove the
newline from the end of the line. Others have already pointed out one
possible solution.

I'd like to ask, why are you using eval in the first place?

The problem with eval is that it is simultaneously too finicky and too
powerful. It is finicky: it has problems with lines ending in a carriage
return, with empty lines, and probably with other things. But it is also
too powerful. Your program wants one specific shape of data, but eval
will accept any string that is a valid Python expression. eval is quite
capable of handing you a dictionary, an int, or just about anything else,
and, depending on your code, you might not find out for a long time, which
leads to hard-to-debug bugs.
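To make both complaints concrete, here is a small sketch, assuming Python 3 (the helper name try_eval is made up for illustration). In the loop quoted above, a blank line in a.txt reaches eval as "\n", because line[0] is then '\n' rather than '#', so the comment check does not skip it.

```python
def try_eval(text):
    """Return eval(text), or the name of the SyntaxError it raised."""
    try:
        return eval(text)
    except SyntaxError as exc:
        return type(exc).__name__

# Finicky: a blank line read from the file is not a valid expression.
blank = try_eval("\n")

# Too powerful: eval returns whatever type the text describes, so the
# reading loop gets no guarantee about the shape of the data.
record = try_eval("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]")
number = try_eval("42")
mapping = try_eval("{'recId': 3}")

print(blank)
print(type(record).__name__, type(number).__name__, type(mapping).__name__)
```

With a line like "42", the later x[0][1] lookup fails with a TypeError far from the real cause; with some other shapes it might silently store the wrong thing.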

Is your data under your control? Could some malicious person inject data
into your file a.txt? If so, you should be aware of the security
implications:

# comment
[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]
[('recId', 5), ('parse', {'pos': u'np', 'gen': u'm'})]
# line injected by a malicious user
__import__('os').system('echo if I were bad I could do worse')
[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]

Now, if the malicious user can only damage their own system, maybe you
don't care -- but the security hole is there. Are you sure that no
malicious third party, given *only* write permission to the file a.txt,
could compromise your entire system?
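For comparison, a minimal sketch of the same reading loop using ast.literal_eval from the standard library. This is a swapped-in technique, not the hand-written parser discussed here, and literal_eval only appeared in Python 2.6, after this thread; the sketch assumes Python 3. The file contents are a hypothetical hard-coded list, with the injected line written as a bare function call:

```python
import ast

# Hypothetical contents of a.txt, including an injected function call
# and a blank line.
lines = [
    "# comment\n",
    "[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]\n",
    "__import__('os').system('echo if I were bad I could do worse')\n",
    "\n",
    "[('recId', 7 ), ('parse', {'pos': u'np', 'gen': u'm'})]\n",
]

dicAnsMorph = {}
rejected = []
for line in lines:
    line = line.strip()
    if not line or line.startswith("#"):
        continue  # skip blank lines and comment lines
    try:
        # literal_eval only builds literals (strings, numbers, tuples,
        # lists, dicts, ...); a function call raises ValueError instead
        # of being executed.
        x = ast.literal_eval(line)
    except (ValueError, SyntaxError):
        rejected.append(line)
        continue
    dicAnsMorph[x[0][1]] = x[1][1]  # recid is key, parse dict is value
```

The injected line ends up in rejected and nothing is run. Note that this still trusts the shape of every literal it accepts; it removes the security hole, not the "too powerful" problem.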

Personally, I would never use eval on any string I didn't write myself. If
I were thinking about evaluating a user-supplied string, I would always
write a function to parse the string and accept only the specific sort of
data I expected. In your case, a quick-and-dirty, untested function might be:

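def parse(s):
    """Parse string s, and return a two-item list like this:

    [tuple(string, integer), tuple(string, dict(string: string))]

    Raises ValueError if s does not have exactly that shape.
    """

    def split_top(s, sep):
        """Split s on sep, ignoring any sep nested inside (), [] or {}.
        (Separators inside quoted strings are not handled.)"""
        parts, depth, start = [], 0, 0
        for i, c in enumerate(s):
            if c in "([{":
                depth += 1
            elif c in ")]}":
                depth -= 1
            elif c == sep and depth == 0:
                parts.append(s[start:i])
                start = i + 1
        parts.append(s[start:])
        return parts

    def parse_pair(s, sep):
        """Split s into exactly two stripped items on a top-level sep."""
        items = split_top(s, sep)
        if len(items) != 2:
            raise ValueError("expected two items in %r" % s)
        return items[0].strip(), items[1].strip()

    def unwrap(s, open_, close):
        """Strip whitespace, check the delimiters, and return the inside.
        (Unlike assert, this check is not disabled by python -O.)"""
        s = s.strip()
        if not (s.startswith(open_) and s.endswith(close)):
            raise ValueError("expected %s...%s, got %r" % (open_, close, s))
        return s[1:-1]

    def parse_string(s):
        """Parse a quoted string literal such as 'np' or u'np'."""
        s = s.strip()
        if s.startswith(("u", "U")):
            s = s[1:]
        if len(s) < 2 or s[0] not in "'\"" or s[-1] != s[0]:
            raise ValueError("expected a quoted string, got %r" % s)
        return s[1:-1]

    def parse_tuple(s):
        """Parse a tuple with two items exactly."""
        return parse_pair(unwrap(s, "(", ")"), ",")

    def parse_dict(s):
        """Parse a dict with two string: string items exactly."""
        a, b = parse_pair(unwrap(s, "{", "}"), ",")
        key1, value1 = parse_pair(a, ":")
        key2, value2 = parse_pair(b, ":")
        return {parse_string(key1): parse_string(value1),
                parse_string(key2): parse_string(value2)}

    def parse_list(s):
        """Parse a list with two items exactly."""
        return list(parse_pair(unwrap(s, "[", "]"), ","))

    # Expected format is something like:
    # [tuple(string, integer), tuple(string, dict(string: string))]
    L = parse_list(s)
    T0 = parse_tuple(L[0])
    T1 = parse_tuple(L[1])
    T0 = (parse_string(T0[0]), int(T0[1]))
    T1 = (parse_string(T1[0]), parse_dict(T1[1]))
    return [T0, T1]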


That's a bit more work than eval, but I believe it is worth it.
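If a hand-written parser feels like too much, one middle ground (a different technique from the one above, and it assumes Python 3 with ast.literal_eval) is to let literal_eval do the tokenising and then check the result against the expected shape explicitly. parse_record is a hypothetical helper name:

```python
import ast

def parse_record(line):
    """Return (recId, parse_dict) from one line of a.txt, or raise ValueError.

    Accepts only [tuple(string, integer), tuple(string, dict(string: string))].
    """
    try:
        # literal_eval never calls functions; non-literals raise ValueError.
        x = ast.literal_eval(line)
    except SyntaxError:
        raise ValueError("not a Python literal: %r" % (line,))
    ok = (
        isinstance(x, list) and len(x) == 2
        and all(isinstance(t, tuple) and len(t) == 2 for t in x)
        and isinstance(x[0][0], str)
        and type(x[0][1]) is int          # rejects bool as well as str
        and isinstance(x[1][0], str)
        and isinstance(x[1][1], dict)
        and all(isinstance(k, str) and isinstance(v, str)
                for k, v in x[1][1].items())
    )
    if not ok:
        raise ValueError("unexpected record: %r" % (line,))
    return x[0][1], x[1][1]
```

For example, parse_record("[('recId', 3), ('parse', {'pos': u'np', 'gen': u'm'})]") gives (3, {'pos': 'np', 'gen': 'm'}), while "42", "{'recId': 3}" or an injected call all raise ValueError at the point of reading.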

-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list
