On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <pierre.haes...@crans.org> wrote:
> Hi,
>
> Thank you very much for your insights!
>
> On 06/03/2012 21:59, Nathaniel Smith wrote:
> > Right -- R has a very impoverished type system as compared to numpy.
> > There are basically four types: "numeric" (meaning double precision
> > float), "integer", "logical" (boolean), and "character" (string). And
> > in practice the integer type is essentially unused, because R parses
> > numbers like "1" as being floating point, not integer; the only way to
> > get an integer value is to explicitly cast to it. Each of these types
> > has a specific bit-pattern set aside for representing NA. And...
> > that's it. It's very simple when it works, but also very limited.
> I also suspected R to be less powerful in terms of types.
> However, I think the fact that "It's very simple when it works" is
> important to take into account. At the end of the day, when using all
> the fanciness, it is not only about "can I have some NAs in my array?"
> but also "how *easily* can I have some NAs in my array?". It's about
> balancing the "how easy" and the "how powerful".
>
> Ease of use is the reason for my concern about having separate
> types "nafloatNN" and "floatNN". Of course, I won't argue that "not
> breaking everything" is even more important!
>
> Coming back to Travis's proposition, "bit-pattern approaches to missing
> data (*at least* for float64 and int32) need to be implemented.", I
> wonder how much extra work it takes to go from nafloat64 to
> nafloat32/16? Is there hardware support for NaN payloads with these
> smaller floats? If not, or if it is too complicated, I feel it is
> acceptable to say "it's too complicated" and fall back to masks. One may
> have to choose between fancy types and fancy NAs...
>

I'm in agreement here, and that was a major consideration in making a 'masked' implementation first. Also, different folks adopt different values for 'missing' data, and distributing one or several masks along with the data is another common practice.
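
On the question of NaN payloads in the smaller floats, the bit layouts at least are easy to check. The sketch below is only an illustration, not anything from the NumPy code base: it uses the IEEE 754 field widths for binary16/32/64 to show how many fraction bits remain for a payload once the quiet-NaN bit is set, and builds one candidate NA bit pattern per width. The names `payload_bits`, `quiet_nan_pattern` and the marker value `NA_PAYLOAD` are made up for the example, and nothing here says how well hardware or libraries preserve the payload through arithmetic or conversions.

```python
import math
import struct

# IEEE 754 binary interchange formats: (exponent bits, fraction bits).
FORMATS = {"float16": (5, 10), "float32": (8, 23), "float64": (11, 52)}


def payload_bits(name):
    """Fraction bits left for a payload once the quiet-NaN bit is taken."""
    _, frac = FORMATS[name]
    return frac - 1


def quiet_nan_pattern(name, payload):
    """Integer bit pattern of a quiet NaN carrying `payload` (sign bit clear)."""
    exp, frac = FORMATS[name]
    if not 0 <= payload < (1 << (frac - 1)):
        raise ValueError("payload does not fit in this format")
    exponent_all_ones = ((1 << exp) - 1) << frac  # marks Inf/NaN
    quiet_bit = 1 << (frac - 1)                   # top fraction bit
    return exponent_all_ones | quiet_bit | payload


NA_PAYLOAD = 0x1  # hypothetical marker value, chosen only for this sketch

for name, (exp, frac) in FORMATS.items():
    hex_digits = (1 + exp + frac) // 4
    pattern = quiet_nan_pattern(name, NA_PAYLOAD)
    print(f"{name}: {payload_bits(name)} payload bits, "
          f"NA pattern 0x{pattern:0{hex_digits}x}")

# For float64 the pattern can be reinterpreted as a Python float via struct;
# it is still a NaN as far as ordinary floating-point tests are concerned.
bits = quiet_nan_pattern("float64", NA_PAYLOAD)
value = struct.unpack("<d", struct.pack("<Q", bits))[0]
print("is NaN:", math.isnan(value))
```

So even float16 leaves a few payload bits in the encoding; whether that payload survives real computations is a separate question that this sketch does not address.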
One inconvenience I have run into with the current API is that it should be easier to clear the mask from an "ignored" value without taking a new view or assigning known data. So maybe two types of masks (different payloads), or an additional flag, could be helpful. The process of assigning masks could also be made a bit easier than using fancy indexing.

Chuck
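
To make the "two types of masks" idea concrete, here is a toy sketch in plain Python. It is not the NumPy API, current or proposed: `MaskState`, `MaskedVector` and all of its methods are hypothetical names invented for the illustration. The only point is that a per-element state with separate IGNORED and MISSING values lets an ignored entry be unmasked in place, with its stored value intact, while a truly missing entry stays masked.

```python
from enum import IntEnum


class MaskState(IntEnum):
    VALID = 0    # value is usable
    IGNORED = 1  # value is known but temporarily hidden
    MISSING = 2  # value is unknown (NA)


class MaskedVector:
    """Hypothetical toy container; not a NumPy class."""

    def __init__(self, data):
        self._data = list(data)
        self._mask = [MaskState.VALID] * len(self._data)

    def ignore(self, index):
        """Hide a known value without destroying it."""
        if self._mask[index] == MaskState.VALID:
            self._mask[index] = MaskState.IGNORED

    def mark_missing(self, index):
        """Flag an element as NA; its stored value is no longer trusted."""
        self._mask[index] = MaskState.MISSING

    def unignore(self, index):
        """Clear an IGNORED flag in place; MISSING entries stay masked."""
        if self._mask[index] == MaskState.IGNORED:
            self._mask[index] = MaskState.VALID

    def visible(self):
        """Values with masked entries shown as None."""
        return [v if m == MaskState.VALID else None
                for v, m in zip(self._data, self._mask)]

    def mean(self):
        """Mean over VALID entries only, or None if there are none."""
        vals = [v for v, m in zip(self._data, self._mask)
                if m == MaskState.VALID]
        return sum(vals) / len(vals) if vals else None


v = MaskedVector([1.0, 2.0, 3.0, 4.0])
v.ignore(1)
v.mark_missing(3)
print(v.visible())  # elements 1 and 3 are hidden
v.unignore(1)       # restores the stored 2.0, no new view or assignment
v.unignore(3)       # no effect: element 3 is MISSING, not IGNORED
print(v.visible(), v.mean())
```

A single boolean mask cannot make this distinction, which is why an extra flag or a second mask payload comes up.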