Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

Paul Anton Letnes Thu, 23 Feb 2012 21:46:10 -0800

Like others on this list, I've been a bit confused by the proliferation of 
numpy interfaces for reading text. Would it be an idea to create some sort of 
object-oriented solution for this purpose?

reader = np.FileReader('my_file.txt')
# For backwards compat.; np.loadtxt could instantiate a reader and call this
# function if one wants to keep the interface.
reader.loadtxt()
reader.very_general_and_typically_slow_reading(missing_data=True)
reader.my_files_look_like_this_plz_be_fast(fmt='%20.8e', separator=',', ncol=2)
reader.csv_read()  # same as above, but with sensible defaults
# Returns a generator/iterator, so you can slice out a small part of a huge
# array, for instance, even when working with text (yes, inefficient).
reader.lazy_read()
# Calls myfunc line by line, letting the user convert easily to his/her format
# of choice: netcdf, hdf5, ... Not fast, but convenient.
reader.convert_line_by_line(myfunc)
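
To make the idea a bit more concrete, here is a minimal sketch of what such a 
reader object could look like. Everything in it is an assumption for 
illustration: FileReader does not exist in numpy, and the constructor 
arguments (separator, comment) are made up. The only real numpy call is 
np.loadtxt, which the backwards-compatible method simply delegates to.

```python
from typing import Callable, Iterator, List, Optional


class FileReader:
    """Hypothetical reader object; NOT part of numpy."""

    def __init__(self, path: str, separator: Optional[str] = None,
                 comment: str = "#"):
        self.path = path
        self.separator = separator  # None means: split on any whitespace
        self.comment = comment      # assumed to be a non-empty string

    def loadtxt(self, **kwargs):
        # Backwards-compatible path: hand the work to the existing np.loadtxt.
        # numpy is imported lazily so the other methods need only the stdlib.
        import numpy as np
        return np.loadtxt(self.path, delimiter=self.separator,
                          comments=self.comment, **kwargs)

    def lazy_read(self) -> Iterator[List[float]]:
        # Yield one parsed row at a time, so the whole file is never held
        # in memory. Comments and blank lines are skipped.
        with open(self.path) as handle:
            for line in handle:
                line = line.split(self.comment, 1)[0].strip()
                if line:
                    yield [float(field) for field in line.split(self.separator)]

    def convert_line_by_line(self, func: Callable[[List[float]], object]) -> list:
        # Apply a user-supplied function to each parsed row, e.g. to write
        # the row out in another format. Not fast, but convenient.
        return [func(row) for row in self.lazy_read()]
```

With something like this, itertools.islice(reader.lazy_read(), 10, 20) would 
pick ten rows out of a huge file without parsing the rest of it.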

Another option is to create a hierarchy of readers implemented as classes. Not 
sure if the benefits outweigh the disadvantages.
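
A rough sketch of the hierarchy variant, again purely hypothetical (none of 
these class names exist in numpy): a base class owns the file handling, and 
each subclass only decides how a single line is parsed.

```python
from abc import ABC, abstractmethod
from typing import List


class TextReader(ABC):
    """Hypothetical base class: owns file handling, delegates line parsing."""

    def __init__(self, path: str):
        self.path = path

    @abstractmethod
    def parse_line(self, line: str) -> List[float]:
        """Turn one non-blank line of text into a row of floats."""

    def read(self) -> List[List[float]]:
        # Shared logic: open the file, skip blank lines, parse the rest.
        with open(self.path) as handle:
            return [self.parse_line(line) for line in handle if line.strip()]


class WhitespaceReader(TextReader):
    def parse_line(self, line: str) -> List[float]:
        return [float(field) for field in line.split()]


class CsvReader(TextReader):
    def parse_line(self, line: str) -> List[float]:
        return [float(field) for field in line.split(",")]
```

The upside is that a new file format only needs a new parse_line; the 
downside is yet more names for users to learn.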

Just a crazy idea - it would at least gather all the file reading interfaces 
into one place (or one object hierarchy) so folks know where to look. The whole 
numpy namespace is a bit cluttered, imho, and for newbies it would be 
beneficial to use submodules to a greater extent than today - but that's a more 
long-term discussion.

Paul

On 23. feb. 2012, at 21:08, Travis Oliphant wrote:

> This is actually on my short-list as well --- it just didn't make it to the 
> list. 
> 
> In fact, we have someone starting work on it this week.  It is his first 
> project so it will take him a little time to get up to speed on it, but he 
> will contact Wes and work with him and report progress to this list. 
> 
> Integration with np.loadtxt is a high priority.  I think loadtxt is now the 
> 3rd or 4th "text-reading" interface I've seen in NumPy.  I have no interest 
> in making a new one if we can avoid it.   But, we do need to make it faster 
> with less memory overhead for simple cases like Wes describes.
> 
> -Travis
> 
> 
> 
> On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:
> 
>> Hi,
>> 
>> 23.02.2012 20:32, Wes McKinney wrote:
>> [clip]
>>> To be clear: I'm going to do this eventually whether or not it
>>> happens in NumPy because it's an existing problem for heavy
>>> pandas users. I see no reason why the code can't emit structured
>>> arrays, too, so we might as well have a common library component
>>> that I can use in pandas and specialize to the DataFrame internal
>>> structure.
>> 
>> If you do this, one useful aim could be to design the code such that it
>> can be used in loadtxt, at least as a fast path for common cases. I'd
>> really like to avoid increasing the number of APIs for text file loading.
>> 
>> -- 
>> Pauli Virtanen
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion

