Stéfan van der Walt wrote:
> 2008/9/9 Christopher Barker <[EMAIL PROTECTED]>:
>> Anyone want to help with improvements to fromfile() for text files?
>
> This is low hanging fruit for anyone with some experience in C.  We
> can definitely get it done for 1.3.  Chris, would you file a ticket
> and add the detail from your mailing list posts, if that hasn't
> already been done?

Done:

http://scipy.org/scipy/numpy/ticket/909

(By the way, is there a way to fix the typo in the ticket title? --oops!)

There are a few fromfile() related tickets that I referenced as well.

It's not totally straightforward what should be done, so I've included the text of the ticket here to start a discussion:

Proposed Enhancements and bug fixes for fromfile() and fromstring() text handling:

Motivation:

The goal of the fromfile() text file handling capability is to enable users to write code that can read a lot of numbers from a text file into an array. Python provides a lot of nifty text processing capabilities, and there are a number of higher level facilities for reading blocks of data (including numpy.loadtxt). These are very capable, but there really is a significant performance hit, at least when loading 10s of thousands of numbers from a file.

We don't want to write all of loadtxt() and friends in C. Rather, the goal is to allow the simple cases to be done very efficiently, and hopefully fancier text reading packages can build on it to add more features.

Unfortunately, the current (numpy version 1.2) version has a few bugs and limitations that keep it from being nearly as useful as it could be.

Possible features:

* Create fromtextfile() and fromtextstring() functions, distinct from fromfile() and fromstring(). It really is a different functionality. fromfile() could still call fromtextfile() for backward compatibility.

* Allow more than one separator? For example, a comma or whitespace? In the general case, the user could perhaps specify any number of separators, though I doubt that would be useful in practice.
  At the very least, however, fromtextfile() should support reading files that look like:

  43.5, 345.6, 123.456, 234.33
  34.5, 22.57, 2345, 2345, 252
  ...

  That is, comma separated, but being able to read multiple lines in one shot. The easiest way to support that would probably be to always allow whitespace as a separator, and add the one passed in. I can't think of a reason not to do this, but maybe I'm not very imaginative.

* Allow the user to specify a shape for the output array. There may be little point, as all this does is save a call to reshape(), but it may be another way to support the above, i.e. you could read that data with:

  a = np.fromtextfile(infile, dtype=np.float, sep=',', shape=(-1, 4))

  Then it would know to skip the newlines every 4 elements.

* Allow the user to specify a comment string. The reader would then skip everything in the file between the comment string and a newline. Maybe Universal newline -- any of \r, \n or \r\n. Or simply expect that the user has opened the file with mode 'U' if they want that.

  This could also be extended to support C-style comments with an opening and closing character sequence, but that's a lot less common.

* Allow the user to specify a Locale. It may be best to be able to specify a locale, rather than relying on the system one (whether '.' or ',' is the decimal separator, for instance). (ticket #884)

* Parsing of "Inf" and the like that doesn't depend on the system (ticket #510). This would be nice, but maybe too difficult -- would we need to write our own scanf?

Bugs to be fixed:

* fromfile() and fromstring() handling malformed data poorly: ticket #883

* Any others?

NOTE: my C is pretty lame, or I'd do some of this. I could help out with writing tests, etc. though.

Thanks all,

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[EMAIL PROTECTED]
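
To make the separator, comment and shape ideas from the ticket text concrete, here is a rough pure-Python sketch of the proposed semantics. It is only an illustration for discussion, not numpy code: parse_numbers() and reshape_rows() are hypothetical names that exist nowhere in numpy, and a real implementation would presumably be written in C and return an array rather than a list.

```python
import re


def parse_numbers(text, sep=",", comment=None):
    """Hypothetical helper: parse floats separated by `sep` and/or whitespace."""
    # Whitespace is always a separator; `sep` (if given) is accepted in addition.
    if sep:
        pattern = r"(?:\s|" + re.escape(sep) + r")+"
    else:
        pattern = r"\s+"
    values = []
    # Universal-newline style splitting: \r\n, \r or \n all end a line.
    for line in re.split(r"\r\n|\r|\n", text):
        if comment is not None:
            # Skip everything between the comment string and the newline.
            line = line.split(comment, 1)[0]
        tokens = re.split(pattern, line.strip())
        values.extend(float(tok) for tok in tokens if tok)
    return values


def reshape_rows(values, ncols):
    """Hypothetical helper: group a flat list into rows, like shape=(-1, ncols)."""
    if len(values) % ncols:
        raise ValueError("number of values is not a multiple of ncols")
    return [values[i:i + ncols] for i in range(0, len(values), ncols)]


# Comma-separated data spread over several lines, with a trailing comment.
sample = "43.5, 345.6, 123.456, 234.33\n34.5, 22.57, 2345, 2345  # note\r\n"
flat = parse_numbers(sample, sep=",", comment="#")
rows = reshape_rows(flat, 4)
```

One design choice worth noting: in this sketch a run of separators such as ",," collapses into a single separator, so empty fields are silently dropped. Whether a real fromtextfile() should do that, or treat an empty field as malformed data (see ticket #883), is exactly the kind of thing that would need deciding.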