FWIW, when I needed a fast fixed-width reader for a very large dataset last year, I found that np.genfromtxt() was faster than pandas' read_fwf(). IIRC, pandas' text reading code fell back to pure Python for fixed-width scenarios.
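
For anyone who wants to try the same thing, here is a minimal sketch of reading fixed-width text with np.genfromtxt(), which accepts a sequence of integer field widths as `delimiter`. The sample data and the column widths (4, 6, 5) are made up for illustration and are not from my original dataset.

```python
import io

import numpy as np

# Hypothetical fixed-width sample: fields are 4, 6 and 5 characters wide.
text = (
    "   1  2.50   10\n"
    "  12 13.75  200\n"
    " 123  0.25 3000\n"
)

# A sequence of ints passed as `delimiter` is read as per-field widths,
# so no separator character is needed between columns.
data = np.genfromtxt(io.StringIO(text), delimiter=(4, 6, 5), dtype=float)

print(data.shape)
print(data)
```

The pandas counterpart would be read_fwf() with its `widths` argument. Which one is faster will depend on the versions and the data, so it is worth timing both on a real file.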

On Fri, Oct 23, 2015 at 8:22 PM, Chris Barker - NOAA Federal <chris.bar...@noaa.gov> wrote:

> Grabbing the pandas csv reader would be great, and I hope it happens
> sooner than later, though alas, I haven't the spare cycles for it either.
>
> In the meantime though, can we put a deprecation Warning in when using
> fromstring() on text files? It's really pretty broken.
>
> -Chris
>
> On Oct 23, 2015, at 4:02 PM, Jeff Reback <jeffreb...@gmail.com> wrote:
>
> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Oct 23, 2015 3:30 PM, "Jeff Reback" <jeffreb...@gmail.com> wrote:
> >
> > On Oct 23, 2015, at 6:13 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> >
> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal <chris.bar...@noaa.gov> wrote:
> >>>
> >>>> I think it would be good to keep the usage to read binary data at least.
> >>>
> >>> Agreed -- it's only the text file reading I'm proposing to deprecate.
> >>> It was kind of weird to cram it in there in the first place.
> >>>
> >>> Oh, fromfile() has the same issues.
> >>>
> >>> Chris
> >>>
> >>>> Or is there a good alternative to `np.fromstring(<bytes>, dtype=...)`? -- Marten
> >>>>
> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
> >>>>>
> >>>>> There was just a question about a bug/issue with scipy.fromstring
> >>>>> (which is numpy.fromstring) when used to read integers from a text file.
> >>>>>
> >>>>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html
> >>>>>
> >>>>> fromstring() is buggy and inflexible for reading text files -- and
> >>>>> it is a very, very ugly mess of code. I dug into it a while back, and
> >>>>> gave up -- just too much of a mess!
> >>>>>
> >>>>> So we really should completely re-implement it, or deprecate it. I
> >>>>> doubt anyone is going to do a big refactor, so that means deprecating it.
> >>>>>
> >>>>> Also -- if we do want a fast read numbers from text files function
> >>>>> (which would be nice, actually), it really should get a new name anyway.
> >>>>>
> >>>>> (and the hopefully coming new dtype system would make it easier to
> >>>>> write cleanly)
> >>>>>
> >>>>> I'm not sure what deprecating something means, though -- have it
> >>>>> raise a deprecation warning in the next version?
> >>
> >> There was discussion at SciPy 2015 of separating out the text reading
> >> abilities of Pandas so that numpy could include it. We should contact
> >> Jeff Reback and see about moving that forward.
> >
> > IIRC Thomas Caswell was interested in doing this :)
>
> When he was in Berkeley a few weeks ago he assured me that every night
> since SciPy he has dutifully been feeling guilty about not having done it
> yet. I think this week his paltry excuse is that he's "on his honeymoon" or
> something.
>
> ...which is to say that if someone has some spare cycles to take this over
> then I think that might be a nice wedding present for him :-).
>
> (The basic idea is to take the text reading backend behind pandas.read_csv
> and extract it into a standalone package that pandas could depend on, and
> that could also be used by other packages like numpy (among others -- I
> think dato's SFrame package has a fork of this code as well?))
>
> -n
>
> I can certainly provide guidance on how/what to extract but don't have
> spare cycles myself for this :(
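
On Marten's question in the quoted thread about an alternative for the binary case: a rough sketch, assuming the split being discussed is "binary buffers" versus "numbers written as text". np.frombuffer() handles the former and np.loadtxt() the latter. The sample values are made up for illustration.

```python
import io

import numpy as np

# Binary case: reinterpret raw bytes as a typed array.
# frombuffer() does not copy, so the result is a read-only view of `raw`.
raw = np.arange(4, dtype=np.int32).tobytes()
binary = np.frombuffer(raw, dtype=np.int32)

# Text case: parse whitespace-separated numbers with a text reader
# rather than relying on fromstring(..., sep=" ").
text = np.loadtxt(io.StringIO("1 2 3 4"), dtype=np.int64)

print(binary)
print(text)
```

If a writable array is needed in the binary case, calling .copy() on the frombuffer() result gives one.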
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion