mlimber wrote:
> I'm writing a text processing program to process some survey results.
> I'm familiar with C++ and could write it in that, but I thought I'd
> try out Python. I've got a handle on the file I/O and regular
> expression processing,
FWIW, and depending on your text format, there may be better solutions than regexps.

> but I'm wondering about building my array of
> classes (I'd probably use a struct in C++ since there are no methods,
> just data).

If you have no methods and you're sure you won't ever need any, then just use a dict (name-indexed record) or a tuple (position-indexed record).

> I want something like (C++ code):
>
> struct Response
> {
>     std::string name;
>     int age;
>     int iData[ 10 ];
>     std::string sData;
> };
>
> // Prototype
> void Process( const std::vector<Response>& );
>
> int main()
> {
>     std::vector<Response> responses;
>
>     while( /* not end of file */ )
>     {
>         Response r;
>
>         // Fill struct from file
>         r.name = /* get the data from the file */;
>         r.age = /* ... */;
>         r.iData[0] = /* ... */;
>         // ...
>         r.sData = /* ... */;
>         responses.push_back( r );
>     }
>
>     // Do some processing on the responses
>     Process( responses );
> }
>
> What is the preferred way to do this sort of thing in Python?

# assuming you're using a line-oriented format, and not
# worrying about exception handling etc...
import sys

def extract(line):
    # build one name-indexed record (a dict) from one line of the file
    data = dict()
    data['name'] = None  # get the name from the line
    data['age'] = None   # get the age
    data['data'] = None  # etc...
    return data

def process(responses):
    pass  # code here

if __name__ == '__main__':
    path = sys.argv[1]
    with open(path) as f:
        responses = [extract(line) for line in f]
    process(responses)

If you have a very large dataset, you may want to use tuples instead of dicts (less overhead) and/or a more stream-oriented approach using generators, if applicable of course (that is, if you don't need to extract all the results before processing them).

HTH
-- 
http://mail.python.org/mailman/listinfo/python-list
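On the point that regexps may not be the best tool: if the survey file happened to be comma-separated (an assumption; the post never shows the actual format), the standard csv module can do the splitting, including quoted fields that contain commas. The sketch below assumes a hypothetical layout of name, age, ten integer answers and one free-text field per line, and builds a dict record shaped like the C++ `Response` struct. `extract_fields` and `read_responses` are illustrative names, not part of the original answer.

```python
import csv
import io

# Hypothetical layout (an assumption, not from the survey): one response per
# line, comma-separated: name, age, ten integer answers, free-text comment.
N_IDATA = 10

def extract_fields(fields):
    """Turn one already-split row into a name-indexed record (a dict)."""
    return {
        'name': fields[0],
        'age': int(fields[1]),
        'idata': [int(x) for x in fields[2:2 + N_IDATA]],
        'sdata': fields[2 + N_IDATA],
    }

def read_responses(f):
    # csv.reader does the splitting (and handles quoted commas), so no
    # regular expression is needed for this layout; blank rows are skipped.
    return [extract_fields(row) for row in csv.reader(f) if row]

# In-memory stand-in for a survey file, using the assumed layout.
sample = io.StringIO(
    "Alice,34,1,2,3,4,5,6,7,8,9,10,likes python\n"
    'Bob,29,0,0,1,1,2,2,3,3,4,4,"says ""hi"", thanks"\n'
)
responses = read_responses(sample)
print(responses[0]['name'], responses[0]['age'], responses[0]['idata'][9])
# Alice 34 10
```

With a real file, the csv module documentation recommends opening it as `open(path, newline='')` and passing that file object to the reader.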
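The stream-oriented approach suggested for very large datasets can be sketched as a generator that yields plain tuples, so only one record is held in memory at a time and nothing is parsed until the consumer asks for it. This again assumes a hypothetical comma-separated layout (name, age, ten integer answers, free text); `iter_responses` and `mean_age` (a stand-in for `process`) are illustrative names only.

```python
import csv
import io

# Hypothetical layout (an assumption): name, age, ten integer answers,
# free-text comment, comma-separated, one response per line.
N_IDATA = 10

def iter_responses(f):
    """Yield one position-indexed record (a plain tuple) per non-blank line."""
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        name, age = row[0], int(row[1])
        idata = tuple(int(x) for x in row[2:2 + N_IDATA])
        sdata = row[2 + N_IDATA]
        yield (name, age, idata, sdata)

def mean_age(responses):
    # Consumes the records lazily, one at a time; never builds a list.
    total = count = 0
    for name, age, idata, sdata in responses:
        total += age
        count += 1
    return total / count if count else 0.0

sample = io.StringIO(
    "Alice,34,1,2,3,4,5,6,7,8,9,10,likes python\n"
    "\n"
    "Bob,29,0,0,1,1,2,2,3,3,4,4,no comment\n"
)
records = iter_responses(sample)  # a generator: nothing parsed yet
print(mean_age(records))  # 31.5
```

The trade-off is that a generator can only be walked once; if the processing needs several passes over the data, building the list first is simpler.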