Re: [Numpy-discussion] Missing data again

Mark Wiebe Tue, 06 Mar 2012 08:38:11 -0800

Hi Pierre,

On Tue, Mar 6, 2012 at 5:48 AM, Pierre Haessig <pierre.haes...@crans.org> wrote:


> Hi Mark,
>
> I went through the NA NEP a few days ago, but only quickly, so my
> question is probably a rather dumb one. It's about the usability of
> bitpattern-based NAs, based on your recent post:
>
> On 03/03/2012 22:46, Mark Wiebe wrote:
> > Also, here's a thought for the usability of NA-float64. As much as
> > global state is a bad idea, something which determines whether
> > implicit float dtypes are NA-float64 or float64 could help. In
> > IPython, "pylab" mode would default to float64, and "statlab" or
> > "pystat" would default to NA-float64. One way to write this might be:
> >
> > >>> np.set_default_float(np.nafloat64)
> > >>> np.array([1.0, 2.0, 3.0])
> > array([ 1.,  2.,  3.], dtype=nafloat64)
> > >>> np.set_default_float(np.float64)
> > >>> np.array([1.0, 2.0, 3.0])
> > array([ 1.,  2.,  3.], dtype=float64)
>
> Q: Is it an *absolute* necessity to have two separate dtypes "nafloatNN"
> and "floatNN" to enable NA bitpattern storage?
>
> From a potential user's perspective, I feel it would be nice to have the NA
> and non-NA cases look as similar as possible. Your code example is
> particularly striking: two different dtypes to store (from a user's
> perspective) the exact same content! If this *could* be avoided, it
> would be great...
>

The biggest reason to keep the two types separate is performance. The
straight float dtypes map directly to hardware floating-point operations,
which can be very fast. The NA-float dtypes have to use additional logic to
handle the NA values correctly. NA is treated as a particular NaN, and if
the hardware float operations were used directly, NA would turn into NaN.
This additional logic usually means more branches, so it is slower.
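
To make the extra logic concrete, here is a rough sketch in plain NumPy. It
is only an illustration, not the NEP implementation: the NA_BITS payload and
the is_na and na_add helpers are hypothetical names I made up for this
example. NA is stored as one specific NaN bit pattern, and the add has to
check for it and restore it explicitly rather than trusting whatever NaN the
hardware produces.

```python
import numpy as np

# Hypothetical NA encoding for this sketch: a quiet NaN with a fixed payload.
NA_BITS = np.uint64(0x7FF80000000007A2)


def is_na(a):
    """Return a boolean mask of elements whose bits match the NA pattern."""
    a = np.ascontiguousarray(a, dtype=np.float64)
    # Compare raw bits: NaN != NaN as floats, so a float comparison cannot work.
    return a.view(np.uint64) == NA_BITS


def na_add(a, b):
    """Add two float64 arrays, forcing NA wherever either input is NA."""
    a = np.atleast_1d(np.asarray(a, dtype=np.float64))
    b = np.atleast_1d(np.asarray(b, dtype=np.float64))
    na_mask = is_na(a) | is_na(b)      # the extra work a plain float add skips
    out = a + b                        # hardware add; NaN payload not guaranteed
    out.view(np.uint64)[na_mask] = NA_BITS  # rewrite the NA bit pattern explicitly
    return out


x = np.array([1.0, 2.0, 3.0])
x.view(np.uint64)[1] = NA_BITS         # mark the second element as NA
y = np.array([10.0, 20.0, np.nan])     # an ordinary NaN, which is not NA
result = na_add(x, y)
print(is_na(result))
```

The mask computation and the write-back are the overhead: a plain float64
add is just the `a + b` line.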

One possibility we could consider is to automatically convert an array's
dtype from "float64" to "nafloat64" the first time an NA is assigned. This
would have good performance when there are no NAs, but would transparently
switch on NA support when it's needed.
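
A toy sketch of that idea, under loud assumptions: AutoNAArray, the NA
sentinel, and the na_enabled flag are all hypothetical and exist only in this
example. A real version would switch the array's dtype; here a flag stands in
for the dtype change, so that the fast path is taken until the first NA is
assigned.

```python
import numpy as np

NA = object()  # hypothetical NA sentinel, not a NumPy object
_NA_BITS = np.uint64(0x7FF80000000007A2)  # hypothetical NA bit pattern


class AutoNAArray:
    """float64 storage that turns on NA handling at the first NA assignment."""

    def __init__(self, values):
        self.data = np.array(values, dtype=np.float64)
        self.na_enabled = False  # stands in for "dtype is still float64"

    def __setitem__(self, index, value):
        if value is NA:
            # First NA assignment switches the array to the NA-aware path.
            self.na_enabled = True
            self.data.view(np.uint64)[index] = _NA_BITS
        else:
            self.data[index] = value

    def isna(self):
        return self.data.view(np.uint64) == _NA_BITS

    def sum(self):
        if not self.na_enabled:
            return self.data.sum()  # fast path: no NA checks at all
        # Slow path: NA propagates through the reduction.
        return NA if self.isna().any() else self.data.sum()


arr = AutoNAArray([1.0, 2.0, 3.0])
fast_total = arr.sum()   # computed on the plain float64 path
arr[1] = NA              # NA support switches on here
slow_total = arr.sum()   # now propagates NA
```

One cost this sketch hides is that every assignment has to check whether the
value is NA, and other views of the same data would need to learn about the
switch as well.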


> I don't know how the NA machinery works in R. Does it work with a
> kind of "nafloat64" all the time, or is there some type inference
> mechanism involved in choosing the appropriate type?
>

My understanding is that R works with the equivalent of "nafloat64" for all
its operations, yes.

Cheers,
Mark


> Best,
> Pierre
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

