Brendon> Turns out that the website in question stores its data in the
Brendon> format of a Python list
Brendon> (http://quotes.nasdaq.com/quote.dll?page=nasdaq100, search the
Brendon> source for "var table_body").  So, the part of my code that
Brendon> extracts the data looks something like this:
Brendon>     ...
Brendon>     return eval(data[pos1+len(START_MARKER):END_MARKER])

Brendon> My question is: what's the safe way to do this?

At the top level the lines look like a Python list.  On a line-by-line
basis they also have consistent structure.  Read the data line by line,
parse each line (using regular expressions or whatever), then append the
parsed values to a list, something like (untested):

    import re
    import urllib

    symbolinfo = []
    # One row per line: two quoted strings, five unquoted values,
    # then two more quoted strings.
    sympat = re.compile(
        r'\['
        r'"(?P<sym>[^"]+)",'
        r' *"(?P<name>[^"]+)",'
        r' *(?P<n1>[^,]+),'
        r' *(?P<n2>[^,]+),'
        r' *(?P<n3>[^,]+),'
        r' *(?P<n4>[^,]+),'
        r' *(?P<n5>[^,]+),'
        r' *"(?P<s1>[^"]*)",'
        r' *"(?P<s2>[^"]*)"'
        r'\]')
    for line in urllib.urlopen("http://..."):
        mat = sympat.match(line)
        if mat is not None:
            symbolinfo.append(mat.groupdict())

The regular expression is fairly fragile, but that's okay.  If their format
changed from a list of ten elements to a list of eight or twelve elements,
you'd probably be interested in knowing about that asap.  eval(), by
contrast, probably wouldn't fail unless they completely butchered the table
syntax, so a change like that could slip by unnoticed.

With a small amount of input massaging, you could do this more cleanly with
the csv module.  That's left as an exercise for the reader.

Skip
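One possible take on the csv exercise mentioned above, as a rough sketch only (written for Python 3). The "input massaging" here is assumed to mean stripping the enclosing brackets and any trailing comma from each row. The sample rows, the field names in FIELDS, and the helper names massage() and parse_rows() are all hypothetical; the real page layout may differ.

```python
import csv

# Hypothetical sample input; the real page's exact layout is an assumption.
SAMPLE = '''\
var table_body = [
["ABCD", "Example Corp, Inc.", 10.5, 0.5, 1.3, 10.9, 10.1, "Mar 4", "up"],
["WXYZ", "Sample Holdings", 25.17, -0.1, -0.4, 25.3, 25.0, "Mar 4", "down"],
];
'''

# Field names borrowed from the group names in the regular expression.
FIELDS = ["sym", "name", "n1", "n2", "n3", "n4", "n5", "s1", "s2"]


def massage(line):
    """Return the text between [ and ] for a row line, else None."""
    line = line.strip().rstrip(",")
    if line.startswith("[") and line.endswith("]"):
        return line[1:-1]
    return None


def parse_rows(lines):
    """Parse row lines into dicts, complaining if the field count changes."""
    cleaned = [m for m in map(massage, lines) if m is not None]
    rows = []
    # skipinitialspace lets a quoted field follow ", " rather than ","
    for row in csv.reader(cleaned, skipinitialspace=True):
        if len(row) != len(FIELDS):
            raise ValueError("expected %d fields, got %d: %r"
                             % (len(FIELDS), len(row), row))
        rows.append(dict(zip(FIELDS, row)))
    return rows


symbolinfo = parse_rows(SAMPLE.splitlines())
```

All values come back as strings, so the numeric fields would still need float() applied. Like the regular expression, this raises as soon as a row stops having the expected number of fields, which is the early warning eval() would not give.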