Like others on this list, I've also been a bit confused by the proliferation of numpy interfaces for reading text. Would it be an idea to create some sort of object-oriented solution for this purpose?
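To make the idea a bit more concrete, here is a minimal runnable sketch of two of the methods from the wish list in this mail, the lazy reader and the line-by-line converter. Everything in it is an assumption made for illustration: FileReader is a hypothetical class, not an existing numpy API, and the separator argument, the float-only parsing and the '#' comment handling are choices made just for this example. Only the loadtxt method touches numpy, and it simply delegates to the existing np.loadtxt.

```python
import itertools


class FileReader:
    """Hypothetical reader object; not part of numpy."""

    def __init__(self, filename, separator=None):
        self.filename = filename
        # None means "split on any run of whitespace", as str.split does.
        self.separator = separator

    def loadtxt(self, **kwargs):
        # Backwards-compatible path: hand the work to the real np.loadtxt.
        # numpy is imported here so the pure-Python methods below can be
        # tried out without it.
        import numpy as np
        kwargs.setdefault("delimiter", self.separator)
        return np.loadtxt(self.filename, **kwargs)

    def lazy_read(self):
        # Generator: parses and yields one row at a time, so the whole
        # file never has to be held in memory.
        with open(self.filename) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comment lines
                yield [float(field) for field in line.split(self.separator)]

    def convert_line_by_line(self, func):
        # Call func once per parsed row (e.g. to append the row to an
        # HDF5 or netCDF file) and return the number of rows processed.
        count = 0
        for row in self.lazy_read():
            func(row)
            count += 1
        return count
```

With such an object, itertools.islice(reader.lazy_read(), 1000, 1010) would pull ten rows out of a huge file without reading past row 1010. The skipped rows are still parsed, which is the inefficiency mentioned for lazy_read.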

    reader = np.FileReader('my_file.txt')

    # For backwards compatibility; np.loadtxt could instantiate a reader
    # and call this function if one wants to keep the interface.
    reader.loadtxt()

    reader.very_general_and_typically_slow_reading(missing_data=True)

    reader.my_files_look_like_this_plz_be_fast(fmt='%20.8e', separator=',', ncol=2)

    # Same as above, but with sensible defaults.
    reader.csv_read()

    # Returns a generator/iterator, so you can slice out a small part of a
    # huge array, for instance, even when working with text (yes, inefficient).
    reader.lazy_read()

    # Calls myfunc line by line, letting the user convert easily to his/her
    # format of choice: netcdf, hdf5, ... Not fast, but convenient.
    reader.convert_line_by_line(myfunc)

Another option is to create a hierarchy of readers implemented as classes. Not sure if the benefits outweigh the disadvantages.

Just a crazy idea - it would at least gather all the file reading interfaces into one place (or one object hierarchy) so folks know where to look. The whole numpy namespace is a bit cluttered, imho, and for newbies it would be beneficial to use submodules to a greater extent than today - but that's a more long-term discussion.

Paul

On 23. feb. 2012, at 21:08, Travis Oliphant wrote:

> This is actually on my short-list as well --- it just didn't make it to the
> list.
>
> In fact, we have someone starting work on it this week. It is his first
> project so it will take him a little time to get up to speed on it, but he
> will contact Wes and work with him and report progress to this list.
>
> Integration with np.loadtxt is a high-priority. I think loadtxt is now the
> 3rd or 4th "text-reading" interface I've seen in NumPy. I have no interest
> in making a new one if we can avoid it. But, we do need to make it faster
> with less memory overhead for simple cases like Wes describes.
>
> -Travis
>
> On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:
>
>> Hi,
>>
>> 23.02.2012 20:32, Wes McKinney wrote:
>> [clip]
>>> To be clear: I'm going to do this eventually whether or not it
>>> happens in NumPy because it's an existing problem for heavy
>>> pandas users. I see no reason why the code can't emit structured
>>> arrays, too, so we might as well have a common library component
>>> that I can use in pandas and specialize to the DataFrame internal
>>> structure.
>>
>> If you do this, one useful aim could be to design the code such that it
>> can be used in loadtxt, at least as a fast path for common cases. I'd
>> really like to avoid increasing the number of APIs for text file loading.
>>
>> --
>> Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion