On Wed, Dec 7, 2011 at 8:45 PM, Bruce Southey <bsout...@gmail.com> wrote:

> On Tue, Dec 6, 2011 at 4:13 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > On Tue, Dec 6, 2011 at 4:11 PM, Ralf Gommers
> > <ralf.gomm...@googlemail.com> wrote:
> >>
> >>
> >> On Mon, Dec 5, 2011 at 8:43 PM, Ralf Gommers
> >> <ralf.gomm...@googlemail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> It's been a little over 6 months since the release of 1.6.0 and the NA
> >>> debate has quieted down, so I'd like to ask your opinion on the timing
> >>> of 1.7.0. It looks to me like we have a healthy amount of bug fixes and
> >>> small improvements, plus three larger chunks of work:
> >>>
> >>> - datetime
> >>> - NA
> >>> - Bento support
> >>>
> >>> My impression is that both datetime and NA are releasable, but should
> >>> be labeled "tech preview" or something similar, because they may still
> >>> see significant changes. Please correct me if I'm wrong.
> >>>
> >>> There's still some maintenance work to do and pull requests to merge,
> >>> but a beta release by Christmas should be feasible.
> >>
> >>
> >> To be a bit more detailed here, these are the most significant pull
> >> requests / patches that I think can be merged with a limited amount of
> >> work:
> >> meshgrid enhancements: http://projects.scipy.org/numpy/ticket/966
> >> sample_from function: https://github.com/numpy/numpy/pull/151
> >> loadtable function: https://github.com/numpy/numpy/pull/143
> >>
> >> Other maintenance things:
> >> - un-deprecate putmask
> >> - clean up causes of "DType strings 'O4' and 'O8' are deprecated..."
> >> - fix failing einsum and polyfit tests
> >> - update release notes
> >>
> >> Cheers,
> >> Ralf
> >>
> >>
> >>> What do you all think?
> >>>
> >>>
> >>> Cheers,
> >>> Ralf
> >>
> >>
> >>
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
> >
> > This isn't the place for this discussion but we should start talking
> > about building a *high performance* flat file loading solution with
> > good column type inference and sensible defaults, etc. It's clear that
> > loadtable is aiming for highest compatibility-- for example I can read
> > a 2800x30 file in < 50 ms with the read_table / read_csv functions I
> > wrote myself recently in Cython (compared with loadtable taking > 1s as
> > quoted in the pull request), but I don't handle European decimal
> > formats and lots of other sources of unruliness. I personally don't
> > believe in sacrificing an order of magnitude of performance in the 90%
> > case for the 10% case-- so maybe it makes sense to have two functions
> > around: a superfast custom CSV reader for well-behaved data, and a
> > slower, but highly flexible, function like loadtable to fall back on.
> > I think R has two functions read.csv and read.csv2, where read.csv2 is
> > capable of dealing with things like European decimal format.
> >
> > - Wes
>
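For illustration only, a minimal sketch of the "fast reader plus flexible fallback" idea above, assuming nothing beyond the existing NumPy functions: `np.loadtxt` stands in for the strict reader for well-behaved numeric data and `np.genfromtxt` for the more tolerant fallback. This is not Wes's Cython reader, the helper name `read_numeric_table` is hypothetical, and it says nothing about the timings quoted in the thread.

```python
import io

import numpy as np


def read_numeric_table(text, delimiter=","):
    """Hypothetical helper: strict path first, flexible fallback second.

    np.loadtxt handles clean, rectangular numeric data; if it raises
    ValueError (e.g. on an empty field), the text is re-parsed with
    np.genfromtxt, which fills missing float values with nan by default.
    """
    try:
        return np.loadtxt(io.StringIO(text), delimiter=delimiter)
    except ValueError:
        return np.genfromtxt(io.StringIO(text), delimiter=delimiter)


clean = read_numeric_table("1,2,3\n4,5,6")   # parsed by loadtxt
messy = read_numeric_table("1,2,3\n4,,6")    # empty field: genfromtxt fallback
print(clean)
print(messy)
```

Catching `ValueError` keeps the common case on the simpler reader and only pays for the flexible parser when the strict one gives up.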
> I do not agree with the loadtable request, simply because I do not want
> to have functions that do virtually the same thing (see the comments on
> the pull request and Chris's email on 'Fast Reading of ASCII files').
> I would like to see a valid user-space justification for including it,
> because just using regexes is not a suitable justification (but I
> agree it is an interesting feature):
>

There are a number of features listed in the pull request message and Chris'
first comment, so I won't repeat those here. That it's close to being ready
is just my personal impression. There are seven participants in the pull
request, including Pierre and Derek, who have both done significant work on
loadtxt / genfromtxt, and yourself. So loadtable certainly won't be merged
without the questions you raise here being resolved.


> If loadtable will be a complete replacement for genfromtxt, then there
> needs to be a plan towards supporting all the features of genfromtxt,
> like 'skip_footer', and genfromtxt then needs to be set on the path to
> deprecation.
> If loadtable is an intermediate between loadtxt and genfromtxt, then
> it needs to be clear exactly what loadtable does not do that
> genfromtxt does (anything that loadtable does and genfromtxt does not
> do should be filed as a bug against genfromtxt).
>
> Knowing the case makes it easier to provide help by directing users to
> the appropriate function, and to know which function bug reports should
> be filed against. For example, loadtxt requires 'Each row in the text
> file must have the same number of values', so one can direct a user to
> genfromtxt for that case rather than filing a bug report against loadtxt.
>
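To make the loadtxt / genfromtxt distinction concrete, a small sketch using only documented parameters of the two existing functions (the sample data is made up for illustration): a row with too few values makes `np.loadtxt` raise `ValueError`, while `np.genfromtxt` can skip such rows with `invalid_raise=False` and can drop trailing lines with `skip_footer`, the parameter mentioned above.

```python
import io
import warnings

import numpy as np

ragged = "1,2,3\n4,5\n6,7,8"  # second row has only two values

# loadtxt requires every row to have the same number of values.
try:
    np.loadtxt(io.StringIO(ragged), delimiter=",")
    loadtxt_ok = True
except ValueError:
    loadtxt_ok = False

# genfromtxt raises by default too, but with invalid_raise=False it
# skips the short row and emits a warning (silenced here) instead.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    table = np.genfromtxt(io.StringIO(ragged), delimiter=",",
                          invalid_raise=False)

# skip_footer drops the given number of lines from the end of the input;
# here the last line stands in for a column-totals row.
footer_dropped = np.genfromtxt(io.StringIO("1,2\n3,4\n4,6"),
                               delimiter=",", skip_footer=1)

print(loadtxt_ok)
print(table)
print(footer_dropped)
```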
> I am also somewhat concerned regarding the NA object because of the
> limited implementation available. For example, numpy.dot is not
> implemented for it. Also, there appears to be no plan to extend the
> implementation across numpy or to support it long term.


I have the vague impression that there is such a plan, or at least the
intention to support it better over time. But it would be good if someone
could spell this out.

Ralf


> So while I have
> no problem with it being included, I do think there must be a serious
> commitment to having it fully supported in the near future as well as
> providing a suitable long-term roadmap. Otherwise it will just be a
> problematic code dump that will be difficult to support.
>