Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Charles R Harris
2011/10/28 Stéfan van der Walt 

> On Fri, Oct 28, 2011 at 4:19 PM, Charles R Harris
>  wrote:
> > Memory use is a known problem. One way to start addressing it might be to
> > implement a "bit" arraytype. It might even be possible to prototype that on
> > top of the existing types. Views make bit arrays a bit more interesting ;)
>
> Since 1/8 can be represented exactly in floating point, I guess it's
> technically possible to support non-integer strides?
>

I think the same effect could be obtained with fixed point integers, i.e.,
the last three bits are the fractional part.
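
A minimal sketch of the fixed-point idea, assuming offsets and strides are
plain integers counted in bits, so that the low three bits pick the bit inside
a byte and the remaining bits pick the byte. The helper name `get_bit` and the
least-significant-bit-first order are assumptions made for illustration; NumPy
itself offers no such stride support.

```python
import numpy as np

def get_bit(buf, index, stride_bits=1, offset_bits=0):
    """Read element `index` of a bit array stored in a uint8 buffer.

    Positions are fixed-point byte addresses with three fractional bits,
    which is the same thing as an integer position counted in bits.
    """
    pos = offset_bits + index * stride_bits
    byte, bit = pos >> 3, pos & 7      # integer part, fractional part
    return (int(buf[byte]) >> bit) & 1

buf = np.array([0b00000101, 0b00000001], dtype=np.uint8)

# Contiguous read of the first nine bits.
bits = [get_bit(buf, i) for i in range(9)]

# A "view" of every other bit is only a different stride over the same buffer.
every_other = [get_bit(buf, i, stride_bits=2) for i in range(5)]
```

A stride of one element then corresponds to 1/8 of a byte, which is the
non-integer stride asked about above, expressed without floating point.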

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Stéfan van der Walt
On Fri, Oct 28, 2011 at 4:19 PM, Charles R Harris
 wrote:
> Memory use is a known problem. One way to start addressing it might be to
> implement a "bit" arraytype. It might even be possible to prototype that on
> top of the existing types. Views make bit arrays a bit more interesting ;)

Since 1/8 can be represented exactly in floating point, I guess it's
technically possible to support non-integer strides?

Stéfan


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Charles R Harris
On Fri, Oct 28, 2011 at 5:05 PM, Chris.Barker  wrote:

> On 10/28/11 11:37 AM, Matthew Brett wrote:
> > The main motivation for the alterNEP was our strong feeling that
> > separating ABSENT and IGNORE was easier to comprehend and cleaner.
>
> I don't know about easier to comprehend, or cleaner, but it is more
> feature-full.
>
> I see two issues here:
>
> 1) being able to distinguish between "ignore" and "not valid"
>   -- and being able to stop ignoring an ignored value.
>
> This could quite easily be accomplished with a mask approach -- indeed
> with 8 bits, you could have 8 different possible masked states (not that
> I'm suggesting that, at least not in core numpy.)
>
> However, with a bit-pattern approach, you simply can't implement
> "ignore". Once it's been set, the previous value is lost.
>
>
> 2) data size: A full mask takes extra space, sometimes a substantial
> amount -- so a bit-pattern approach would be nice.
>
>
> I like the idea (that I think Mark attempted to implement) that the
> implementation should be hidden from the user - not necessarily entirely
> hidden, but subtle enough that the casual user wouldn't need to care
> about it.
>
>
I believe the main reason it is hidden from the user is so that the
implementation can be changed without impacting existing applications.

What I would like to see at this point is folks trying out the software and
asking questions on the list like: "I want to do A and tried B, which didn't
work. Any suggestions?" In short, I want people to actually use the software
to see what issues arise so that we can fix things up.

Memory use is a known problem. One way to start addressing it might be to
implement a "bit" arraytype. It might even be possible to prototype that on
top of the existing types. Views make bit arrays a bit more interesting ;)

> In that case, I think if we could decide that we want both "ignore" and
> "not valid" (and it seems there is a fair bit of interest in that), then
> we can proceed with a mask-based approach, and develop an API that makes
> as little reference to the mask as possible.
>
>
> Then a bit-pattern approach could be developed that uses the same API --
> it would not have the "ignore" option at all, but would be the same for
> the "not valid" option.
>
> When I write this it seems entirely too complicated for both the
> developers and users, but maybe it's not -- it could be analogous to
> what we have now: arrays can be Fortran or C ordered, contiguous or not,
> be views on other arrays or not. To really make numpy dance, you need to
> understand all that, but you can also do a whole lot, and write a lot of
> generic code, without digging into that.
>
> If we do all that, maybe there could be a sparse mask implementation,
> etc. as well.
>
>
Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Chris.Barker
On 10/28/11 11:37 AM, Matthew Brett wrote:
> The main motivation for the alterNEP was our strong feeling that
> separating ABSENT and IGNORE was easier to comprehend and cleaner.

I don't know about easier to comprehend, or cleaner, but it is more 
feature-full.

I see two issues here:

1) being able to distinguish between "ignore" and "not valid"
   -- and being able to stop ignoring an ignored value.

This could quite easily be accomplished with a mask approach -- indeed 
with 8 bits, you could have 8 different possible masked states (not that 
I'm suggesting that, at least not in core numpy.)

However, with a bit-pattern approach, you simply can't implement 
"ignore". Once it's been set, the previous value is lost.
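
The difference can be sketched with plain NumPy arrays. This is only an
illustration under assumptions: the flag names `IGNORED` and `ABSENT` and the
one-byte-per-element flag array are hypothetical, and NaN stands in for a
bit-pattern NA.

```python
import numpy as np

IGNORED = np.uint8(1)   # hypothetical flag: hidden for now, value kept
ABSENT = np.uint8(2)    # hypothetical flag: no valid value exists

data = np.array([99.0, 100.0, 3.0])
flags = np.zeros(data.shape, dtype=np.uint8)   # one byte of state per element

# Mask approach: ignoring touches only the flags, never the data.
flags[1] |= IGNORED
visible_while_ignored = data[flags == 0]

flags[1] &= ~IGNORED                            # stop ignoring the value
visible_after_unignore = data[flags == 0]

# Bit-pattern approach: the sentinel overwrites the stored value.
sentinel = data.copy()
sentinel[1] = np.nan                            # the stored 100.0 is gone
```

With the separate flag array the second value comes back once the `IGNORED`
bit is cleared; with the sentinel there is nothing left to restore.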


2) data size: A full mask takes extra space, sometimes a substantial 
amount -- so a bit-pattern approach would be nice.


I like the idea (that I think Mark attempted to implement) that the 
implementation should be hidden from the user - not necessarily entirely 
hidden, but subtle enough that the casual user wouldn't need to care 
about it.

In that case, I think if we could decide that we want both "ignore" and 
"not valid" (and it seems there is a fair bit of interest in that), then 
we can proceed with a mask-based approach, and develop an API that makes 
as little reference to the mask as possible.

Then a bit-pattern approach could be developed that uses the same API -- 
it would not have the "ignore" option at all, but would be the same for 
the "not valid" option.

When I write this it seems entirely too complicated for both the 
developers and users, but maybe it's not -- it could be analogous to 
what we have now: arrays can be Fortran or C ordered, contiguous or not, 
be views on other arrays or not. To really make numpy dance, you need to 
understand all that, but you can also do a whole lot, and write a lot of 
generic code, without digging into that.

If we do all that, maybe there could be a sparse mask implementation, 
etc. as well.

Maybe I'm dreaming, though...

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Matthew Brett
Hi,

On Fri, Oct 28, 2011 at 1:52 PM, Benjamin Root  wrote:
>
>
> On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root  wrote:
>> >
>> >
>> > On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett 
>> > wrote:
>> >>
>> >> You and I know that I've got an array with values [99, 100, 3] and a
>> >> mask with values [False, False, True].  So maybe I'd like to see what
>> >> happens if I take off the mask from the second value.   I know that's
>> >> what I want to do, but I don't know how to do it, because you won't
>> >> let me manipulate the mask, because I'm not allowed to know that the
>> >> NA values come from the mask.
>> >>
>> >> The alterNEP is just saying - please - be straight with me.   If
>> >> you're doing masking, show me the mask, and don't try and hide that
>> >> there are stored values underneath.
>> >>
>> >
>> > Considering that you have admitted before to not regularly using masked
>> > arrays, I seriously doubt that you would be able to judge whether this
>> > is a significant detriment or not.  My entire point that I have been
>> > making is that Mark's implementation is not the same as the current
>> > masked arrays.  Instead, it is a cleaner, more mature implementation
>> > that gets rid of extraneous "features".
>>
>> This may explain why we don't seem to be getting anywhere.  I am sure
>> that Mark's implementation of masking is great.   We're not talking
>> about that.  We're talking about whether it's a good idea to make
>> masking look as though it is implementing the ABSENT idea.   That's
>> what I think is confusing, and that's the conversation I have been
>> trying to pursue.
>>
>> Best,
>>
>> Matthew
>
> Sorry if I came across too strongly there. No disrespect was intended.

I wasn't worried about the disrespect.  It's just I feel the
discussion has not been to the point.

> Personally, I think we are getting somewhere.  We have been whittling away
> what it is that we do agree upon, and have begun to specify *exactly* what
> it is that we disagree on.  I understand your concern, and -- like I
> said in my previous email -- it makes sense from the perspective that
> numpy.ma users have had up to now.

But I'm not a numpy.ma user, I'm just someone who knows that what you
are doing is masking out values.  The fact that I do not use numpy.ma
points out that it's possible to find this highly counter-intuitive
without prior bias.

> But, I re-raise my point that I have been making
> about the need to re-think masked arrays.  If we consider masks as advanced
> slicing or boolean indexing, then being unable to access the underlying
> values actually makes a lot of sense.
>
> Consider it a contract when I pass a set of data with only certain values
> exposed.  Because I passed the data with only those values exposed, then it
> must have been entirely my intention to let the function know of only those
> values.  It would be a violation of that contract if the function obtained
> those masked values.  If I want to communicate both the original values and
> a particular mask, then I pass the array and a view with a particular mask.

This is the old discussion about what Python users expect.  I think
they expect to be treated as adults.  That is, breaking the contract
should not be easy to do by accident, but it should be allowed.

> Maybe it would be helpful that an array can never have its own mask, but
> rather, only views can carry masks?
>
> In conclusion, I submit that this is largely a problem that can be solved
> with the proper documentation.  New users who never used numpy.ma before do
> not have to concern themselves with the old way of thinking and are just
> simply taught what masked arrays "are".  Meanwhile, a special section of the
> documentation should be made that teaches numpy.ma users how masked arrays
> "should be".

I don't think documentation will solve it.  In a way, the ideal user
is someone who doesn't know what's going on, because, for a while,
they may not realize that when they thought they were doing
assignment, in fact they are doing masking.  Unfortunately, I suspect
almost everyone using these things will start to realize that, and
then they will start getting confused.  I find it confusing, and I
believe myself to understand the issues pretty well, and be of
numpy-user-range comprehension powers.

See you,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Benjamin Root
On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett wrote:

> Hi,
>
> On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root  wrote:
> >
> >
> > On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett 
> > wrote:
> >>
> >> You and I know that I've got an array with values [99, 100, 3] and a
> >> mask with values [False, False, True].  So maybe I'd like to see what
> >> happens if I take off the mask from the second value.   I know that's
> >> what I want to do, but I don't know how to do it, because you won't
> >> let me manipulate the mask, because I'm not allowed to know that the
> >> NA values come from the mask.
> >>
> >> The alterNEP is just saying - please - be straight with me.   If
> >> you're doing masking, show me the mask, and don't try and hide that
> >> there are stored values underneath.
> >>
> >
> > Considering that you have admitted before to not regularly using masked
> > arrays, I seriously doubt that you would be able to judge whether this is
> > a significant detriment or not.  My entire point that I have been making is
> > that Mark's implementation is not the same as the current masked arrays.
> > Instead, it is a cleaner, more mature implementation that gets rid of
> > extraneous "features".
>
> This may explain why we don't seem to be getting anywhere.  I am sure
> that Mark's implementation of masking is great.   We're not talking
> about that.  We're talking about whether it's a good idea to make
> masking look as though it is implementing the ABSENT idea.   That's
> what I think is confusing, and that's the conversation I have been
> trying to pursue.
>
> Best,
>
> Matthew
>

Sorry if I came across too strongly there. No disrespect was intended.

Personally, I think we are getting somewhere.  We have been whittling away
what it is that we do agree upon, and have begun to specify *exactly* what
it is that we disagree on.  I understand your concern, and -- like I
said in my previous email -- it makes sense from the perspective that
numpy.ma users have had up to now.  But, I re-raise my point that I have
been making
about the need to re-think masked arrays.  If we consider masks as advanced
slicing or boolean indexing, then being unable to access the underlying
values actually makes a lot of sense.

Consider it a contract when I pass a set of data with only certain values
exposed.  Because I passed the data with only those values exposed, then it
must have been entirely my intention to let the function know of only those
values.  It would be a violation of that contract if the function obtained
those masked values.  If I want to communicate both the original values and
a particular mask, then I pass the array and a view with a particular mask.

Maybe it would be helpful that an array can never have its own mask, but
rather, only views can carry masks?

In conclusion, I submit that this is largely a problem that can be solved
with the proper documentation.  New users who never used numpy.ma before do
not have to concern themselves with the old way of thinking and are just
simply taught what masked arrays "are".  Meanwhile, a special section of the
documentation should be made that teaches numpy.ma users how masked arrays
"should be".

Cheers!
Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Charles R Harris
2011/10/28 Stéfan van der Walt 

> On Fri, Oct 28, 2011 at 12:47 PM, Benjamin Root  wrote:
> >
> > 2011/10/28 Stéfan van der Walt 
> >> The
> >> implementation as it stands essentially gives us a faster and more
> >> integrated version of numpy.ma; but it has become clear from this
> >> conversation that such an approach overlooks a very common subset of
> >> masked-related problems.
> >>
> > Which are...? (given the history of this discussion, let's not assume
> > anything is clear).
>
> The case where the number of elements in the array vastly outnumbers
> the number of masked elements.  (Images, 3D volumes, large
> time-series, tables, etc.)
>
> E.g., if you are taking measurements from a sensor, but once in a blue
> moon the sensor messes up, you simply want to mark those values as
> missing, but you do not want to allocate a whole new chunk of memory
> to do so.
>
> I had a chat with JB Poline this morning, who mentioned that sparse
> matrix storage of the mask may also be an option.  Those containers
> typically trade off insertion vs. lookup speeds, so I'm not sure
> whether it'd be feasible, but I like the idea.
>
>
I think simple run length encoding might work well with masks.
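
A minimal sketch of what run-length encoding a mask could look like, assuming
a 1-D boolean mask. The function names `rle_encode` and `rle_decode` are
hypothetical and not part of NumPy.

```python
import numpy as np

def rle_encode(mask):
    """Return (first_value, run_lengths) for a 1-D boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        return False, np.array([], dtype=np.intp)
    # Positions where the value changes, i.e. where a new run starts.
    starts = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    edges = np.concatenate(([0], starts, [mask.size]))
    return bool(mask[0]), np.diff(edges)

def rle_decode(first_value, run_lengths):
    """Rebuild the full boolean mask from its run lengths."""
    values = np.arange(len(run_lengths)) % 2 == 0   # True, False, True, ...
    if not first_value:
        values = ~values
    return np.repeat(values, run_lengths)

mask = np.array([False, False, False, True, True,
                 False, False, False, False, False])
first, runs = rle_encode(mask)
roundtrip = rle_decode(first, runs)
```

The encoding is compact when masked elements are rare or clustered, but a
random-access lookup needs a search over the cumulative run lengths, and
flipping a single element can split one run into three.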

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Matthew Brett
Hi,

On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root  wrote:
>
>
> On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett 
> wrote:
>>
>> You and I know that I've got an array with values [99, 100, 3] and a
>> mask with values [False, False, True].  So maybe I'd like to see what
>> happens if I take off the mask from the second value.   I know that's
>> what I want to do, but I don't know how to do it, because you won't
>> let me manipulate the mask, because I'm not allowed to know that the
>> NA values come from the mask.
>>
>> The alterNEP is just saying - please - be straight with me.   If
>> you're doing masking, show me the mask, and don't try and hide that
>> there are stored values underneath.
>>
>
> Considering that you have admitted before to not regularly using masked
> arrays, I seriously doubt that you would be able to judge whether this is a
> significant detriment or not.  My entire point that I have been making is
> that Mark's implementation is not the same as the current masked arrays.
> Instead, it is a cleaner, more mature implementation that gets rid of
> extraneous "features".

This may explain why we don't seem to be getting anywhere.  I am sure
that Mark's implementation of masking is great.   We're not talking
about that.  We're talking about whether it's a good idea to make
masking look as though it is implementing the ABSENT idea.   That's
what I think is confusing, and that's the conversation I have been
trying to pursue.

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Stéfan van der Walt
On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root  wrote:
> Considering that you have admitted before to not regularly using masked
> arrays, I seriously doubt that you would be able to judge whether this is a
> significant detriment or not.

Let's not be unreasonable; Matthew has a valid concern (maybe from
experience in teaching numpy): once the machinery under the hood
becomes opaque, it becomes much harder to use numpy intuitively.

Stéfan


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Benjamin Root
On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:

>
> You and I know that I've got an array with values [99, 100, 3] and a
> mask with values [False, False, True].  So maybe I'd like to see what
> happens if I take off the mask from the second value.   I know that's
> what I want to do, but I don't know how to do it, because you won't
> let me manipulate the mask, because I'm not allowed to know that the
> NA values come from the mask.
>
> The alterNEP is just saying - please - be straight with me.   If
> you're doing masking, show me the mask, and don't try and hide that
> there are stored values underneath.
>
>
Considering that you have admitted before to not regularly using masked
arrays, I seriously doubt that you would be able to judge whether this is a
significant detriment or not.  My entire point that I have been making is
that Mark's implementation is not the same as the current masked arrays.
Instead, it is a cleaner, more mature implementation that gets rid of
extraneous "features".  Instead of fussing around with a mask directly in
the array, the user of masked arrays should now consider the use of views as
the masks.  It works beautifully because it works off a well-documented and
well-understood feature of numpy.

Of course, when you look at the feature in your way, with those
expectations, then I would agree that it might be confusing.  But given that
this is a completely new feature, then we have the opportunity to properly
document and show how to rethink a user's pre-conceptions of masked arrays.

Users can keep the original array as a plain array and have mask1, mask2,
mask3, etc as being separate views.  It is a completely different way to
think of masked arrays, and considering that masked arrays are not widely
used in other toolkits, I think we can be free to change the paradigm.
Further, there is no reason why we can't keep numpy.ma around for backwards
compatibility and for those who "just don't get it".
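
The "plain array plus several masked views" pattern can be approximated with
the existing numpy.ma, which is used here only as a stand-in for the maskna
branch under discussion. Note that numpy.ma uses True to mean "masked". The
names `view1` and `view2` are illustrative.

```python
import numpy as np

data = np.array([99, 100, 3])

# Two masked views over the same buffer; numpy.ma does not copy the data
# by default, so each view only adds its own mask.
view1 = np.ma.masked_array(data, mask=[True, True, False])
view2 = np.ma.masked_array(data, mask=[False, False, True])

data[0] = -10      # a write to the plain array is seen through both views

shared = (np.shares_memory(data, view1.data)
          and np.shares_memory(data, view2.data))
total1 = view1.sum()    # only the unmasked element of view1
total2 = view2.sum()    # the two unmasked elements of view2
```

Each consumer receives only the view it is meant to see, while the owner of
`data` keeps access to every stored value.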

Cheers,
Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Stéfan van der Walt
On Fri, Oct 28, 2011 at 12:47 PM, Benjamin Root  wrote:
>
> 2011/10/28 Stéfan van der Walt 
>> The
>> implementation as it stands essentially gives us a faster and more
>> integrated version of numpy.ma; but it has become clear from this
>> conversation that such an approach overlooks a very common subset of
>> masked-related problems.
>>
> Which are...? (given the history of this discussion, let's not assume
> anything is clear).

The case where the number of elements in the array vastly outnumbers
the number of masked elements.  (Images, 3D volumes, large
time-series, tables, etc.)

E.g., if you are taking measurements from a sensor, but once in a blue
moon the sensor messes up, you simply want to mark those values as
missing, but you do not want to allocate a whole new chunk of memory
to do so.

I had a chat with JB Poline this morning, who mentioned that sparse
matrix storage of the mask may also be an option.  Those containers
typically trade off insertion vs. lookup speeds, so I'm not sure
whether it'd be feasible, but I like the idea.
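
A rough sketch of one possible sparse layout, assuming the mask is stored as
a sorted array of the flat indices of masked elements. The class `SparseMask`
is hypothetical and only meant to make the insertion vs. lookup trade-off
concrete.

```python
import numpy as np

class SparseMask:
    """Hypothetical sparse mask: stores only the indices of masked elements."""

    def __init__(self, size):
        self.size = size
        self.indices = np.empty(0, dtype=np.intp)   # kept sorted

    def mask(self, i):
        # Insertion shifts the tail of the index array: O(k) for k masked
        # elements.
        pos = np.searchsorted(self.indices, i)
        if pos == len(self.indices) or self.indices[pos] != i:
            self.indices = np.insert(self.indices, pos, i)

    def is_masked(self, i):
        # Lookup is a binary search: O(log k).
        pos = np.searchsorted(self.indices, i)
        return bool(pos < len(self.indices) and self.indices[pos] == i)

    def to_dense(self):
        # Materialize a full boolean mask only on request.
        dense = np.zeros(self.size, dtype=bool)
        dense[self.indices] = True
        return dense

m = SparseMask(1_000_000)
m.mask(123_456)   # the sensor messed up here
m.mask(42)        # and here
```

For a million-element array with two bad samples this stores two integers
rather than a million mask bytes, at the cost of a search on every lookup.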

Stéfan


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Matthew Brett
Hi,

On Fri, Oct 28, 2011 at 12:15 PM, Lluís  wrote:

> Summarizing: let's forget for a moment that "mask" has a meaning in English:

This is at the core of the problem.  You and I know what's really
going on - there's a mask over the data.   But in what follows we're
going to try and pretend that is not what is going on.  The result is
something that is rather hard to understand, and, when you do
understand it, it's surprising and inconvenient.   This is all because
we tried to conceal what was really going on.

>             - "maskna" corresponds to ABSENT
>             - "ownmaskna" corresponds to IGNORED
>
> The problem here is that of the two implementation mechanisms (masks and
> bitpatterns), only the first can provide both semantics.

But let's be clear.   The current masked array implementation is made
so it looks like ABSENT, and makes IGNORED hard to get to.

> Let's start with an array that already supports NAs:
>
> In [1]: a = np.array([1, 2, 3], maskna = True)
>
>
>
> ABSENT (destructive NA assignment)
> ----------------------------------
>
> Once you assign NA, even if you're using NA masks, the value seems to be lost
> forever (i.e., the assignment is destructive regardless of the value):
>
> In [2]: b = a.view()
> In [3]: c = a.view(maskna = True)
> In [4]: b[0] = np.NA
> In [5]: a
> Out[5]: array([NA, 2, 3])
> In [6]: b
> Out[6]: array([NA, 2, 3])
> In [7]: c
> Out[7]: array([NA, 2, 3])

Right - the mask (fundamentally an IGNORED signal) is pretending to
implement ABSENT.  But - as you point out below - I'm pasting it here
- in fact it's IGNORED.

> In [21]: a = np.array([1, 2, 3])
> Out[21]: array([1, 2, 3])
> In [22]: b = a.view(maskna = True)
> In [23]: b[0] = np.NA
> In [24]: a
> Out[24]: array([1, 2, 3])
> In [25]: b
> Out[25]: array([NA, 2, 3])

But now - I've done this:

>>> a = np.array([99, 100, 3], maskna=True)
>>> a[0:2] = np.NA

You and I know that I've got an array with values [99, 100, 3] and a
mask with values [False, False, True].  So maybe I'd like to see what
happens if I take off the mask from the second value.   I know that's
what I want to do, but I don't know how to do it, because you won't
let me manipulate the mask, because I'm not allowed to know that the
NA values come from the mask.

The alterNEP is just saying - please - be straight with me.   If
you're doing masking, show me the mask, and don't try and hide that
there are stored values underneath.
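
For comparison, this is roughly what the requested operation looks like with
the existing numpy.ma, where the mask is a visible attribute. It is a sketch
of the behaviour being asked for, not of the maskna branch; numpy.ma uses True
for "masked", the inverse of the [False, False, True] mask described above.

```python
import numpy as np

a = np.ma.masked_array([99, 100, 3], mask=[True, True, False])

mask_before = a.mask.tolist()   # the mask can be inspected directly
a.mask[1] = False               # take the mask off the second value

second = a[1]                   # the stored value is visible again
stored = a.data.tolist()        # underlying values, masked or not
```

The stored 100 was never lost; unmasking only changes which values are
exposed.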

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Benjamin Root
2011/10/28 Stéfan van der Walt 

> On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root  wrote:
> > this by making missing data front-and-center.  However, my belief is that
> > Mark's approach is easier to comprehend and is cleaner.  Cleaner features
> > means that it is more likely to be used.
>
> Cleaner features may be easier to adopt, but whether they are used or
> not depends on whether they address the problem in hand.  The
> implementation as it stands essentially gives us a faster and more
> integrated version of numpy.ma; but it has become clear from this
> conversation that such an approach overlooks a very common subset of
> masked-related problems.
>
>
Which are...? (given the history of this discussion, let's not assume
anything is clear).


> We should be concerned about memory use; we often don't have too much
> of it, and accessing it is slow.
>
> Would it be workable to store 8 mask bits per byte instead?  I don't
> think it should impact on the speed much, and we can always generate a
> full mask for the user on request.
>
>
I suggested such an idea a while back.  This is part of the reason why Mark
decided that the masks should not be exposed for direct access in case it is
decided that masks could be implemented that way.  I have a vague
recollection of him commenting about some tests he did along that route, but
I don't remember it.

Cheers,
Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Stéfan van der Walt
On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root  wrote:
> this by making missing data front-and-center.  However, my belief is that
> Mark's approach is easier to comprehend and is cleaner.  Cleaner features
> means that it is more likely to be used.

Cleaner features may be easier to adopt, but whether they are used or
not depends on whether they address the problem in hand.  The
implementation as it stands essentially gives us a faster and more
integrated version of numpy.ma; but it has become clear from this
conversation that such an approach overlooks a very common subset of
masked-related problems.

We should be concerned about memory use; we often don't have too much
of it, and accessing it is slow.

Would it be workable to store 8 mask bits per byte instead?  I don't
think it should impact on the speed much, and we can always generate a
full mask for the user on request.
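
A minimal sketch of that storage scheme using functions NumPy already
provides, `np.packbits` and `np.unpackbits`. The variable names are
illustrative, and this says nothing about how the maskna branch stores its
mask.

```python
import numpy as np

full_mask = np.array([True, False, False, True, True,
                      False, False, False, True, False])

packed = np.packbits(full_mask)     # 8 mask bits per byte, zero-padded

# Generate the full one-byte-per-element mask again on request.
restored = np.unpackbits(packed, count=full_mask.size).astype(bool)

bytes_full = full_mask.nbytes
bytes_packed = packed.nbytes
```

Ten mask elements fit in two bytes instead of ten. The price is that every
element access needs a shift and a mask, and that views starting in the middle
of a byte are no longer simple pointer offsets.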

Regards
Stéfan


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Lluís
I haven't actually tested the code, but AFAIK the following is a short overview
with examples of how the two orthogonal feature axes (ABSENT/IGNORE and
PROPAGATE/SKIP) are related and how it all is supposed to work.

I have never talked to Mark or anybody else in this list (that is, outside of
this list), so I may well be mistaken. Thus, sorry if there are any inaccuracies
and/or if you are already aware of what I'm describing here.

So please tell me if this has helped clarify why I (and I hope others) think the
implementation mechanism is independent of the semantics.


Lluis



ABSENT vs IGNORE
================


Travis Oliphant writes:
> As I mentioned.   I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing.  In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but provides a
> useful concept for multiple use-cases.

I think it's an implementation detail as long as you have two clear ways of
separating them.

Summarizing: let's forget for a moment that "mask" has a meaning in English:
 - "maskna" corresponds to ABSENT
 - "ownmaskna" corresponds to IGNORED

The problem here is that of the two implementation mechanisms (masks and
bitpatterns), only the first can provide both semantics.


Let's start with an array that already supports NAs:

In [1]: a = np.array([1, 2, 3], maskna = True)



ABSENT (destructive NA assignment)
----------------------------------

Once you assign NA, even if you're using NA masks, the value seems to be lost
forever (i.e., the assignment is destructive regardless of the value):

In [2]: b = a.view()
In [3]: c = a.view(maskna = True)
In [4]: b[0] = np.NA
In [5]: a
Out[5]: array([NA, 2, 3])
In [6]: b
Out[6]: array([NA, 2, 3])
In [7]: c
Out[7]: array([NA, 2, 3])


This is the default behaviour, and is probably what the regular user expects by
what has been learned from previous uses of the "view" method.

Note that here "maskna" acts as an idempotent operation. Once an array has the
"maskna" property, all its views will transitively (and destructively) use it.

Also note that an array copy will make a copy of both "regular" data and NA
values, as expected.



IGNORED (non-destructive NA assignment)
---------------------------------------

But you can also have non-destructive NA assignments, although *only* if you
explicitly (and thus purposefully) ask for it -> ownmaskna

In [8]: b = a.view(ownmaskna = True)
In [9]: b[1] = np.NA
In [10]: a
Out[10]: array([NA, 2, 3])
In [11]: b
Out[11]: array([NA, NA, 3])
In [12]: a[2] = np.NA
In [13]: a
Out[13]: array([NA, 2, NA])
In [14]: b
Out[14]: array([NA, NA, 3])


The mask is a copy:

In [15]: a[0] = 1
In [16]: a
Out[16]: array([1, 2, 3], maskna = True)
In [17]: b
Out[17]: array([NA, NA, 3])


But the data itself is not (aka, non-NA values are *always* destructive, but I
think this is out of the scope of this discussion):

In [17]: a[0] = -10
In [18]: a[2] = -30
In [19]: a
Out[19]: array([-10, 2, -30], maskna = True)
In [20]: b
Out[20]: array([NA, NA, -30])



The dark corner
---------------

The only potential misunderstanding can be the creation of a NA-masked array
from a "regular" array.

This is precisely why I put this case at the end, as it seems to break the
intuition some people have about assignment being always destructive (unless you
explicitly ask for IGNORED, which is not the case):

In [21]: a = np.array([1, 2, 3])
Out[21]: array([1, 2, 3])
In [22]: b = a.view(maskna = True)
In [23]: b[0] = np.NA
In [24]: a
Out[24]: array([1, 2, 3])
In [25]: b
Out[25]: array([NA, 2, 3])


This is in fact a corner case, and there is no obvious (and efficient!) way to
handle it. As "a" is just a "regular" array, and has no support for any type of
NA values (neither masks nor bit-patterns), assignments to any of its views
cannot, in any case, be destructive.

Note that the previous holds true because it currently is a design decision to
forbid the in-flight conversion from "regular" to "NA-enabled" arrays.


In fact I forgot that, when reading the docs in [1], I thought that a slight
change could make it all feel more consistent: the view of a regular array can
have NA values only if "ownmaskna" is used (IGNORED/non-destructive NA
assignments), and will give an error if "maskna" is used in entry number 19.

[1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views



PROPAGATE vs SKIP
=

I've also read some comments regarding this. Maybe I didn't explain myself
correctly in previous mails, or maybe I just misunderstood other people's mails
(which might not be about this at all).


PROPAGATE
-

All ufuncs in ndarray propagate NA values.

Note that ABSENT (destructive NA-assignment) is also a default, so we could say
that the default is R-like behaviour (AFAIK).


SKIP


You have a different array type (let's call it skip_array), where all ufuncs do
*not* propagate NA value

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Matthew Brett
Hi,

On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root  wrote:
> On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root  wrote:
>> >
>> >
>> > On Thursday, October 27, 2011, Charles R Harris
>> > 
>> > wrote:
>> >>
>> >>
>> >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant
>> >> 
>> >> wrote:
>> >>>
>> >>> That is a pretty good explanation.   I find myself convinced by
>> >>> Matthew's
>> >>> arguments.    I think that being able to separate ABSENT from IGNORED
>> >>> is a
>> >>> good idea.   I also like being able to control SKIP and PROPAGATE (but
>> >>> I
>> >>> think the current implementation allows this already).
>> >>>
>> >>> What is the counter-argument to this proposal?
>> >>>
>> >>
>> >> What exactly do you find convincing? The current masks propagate by
>> >> default:
>> >>
>> >> In [1]: a = ones(5, maskna=1)
>> >>
>> >> In [2]: a[2] = NA
>> >>
>> >> In [3]: a
>> >> Out[3]: array([ 1.,  1.,  NA,  1.,  1.])
>> >>
>> >> In [4]: a + 1
>> >> Out[4]: array([ 2.,  2.,  NA,  2.,  2.])
>> >>
>> >> In [5]: a[2] = 10
>> >>
>> >> In [5]: a
>> >> Out[5]: array([  1.,   1.,  10.,   1.,   1.], maskna=True)
>> >>
>> >>
>> >> I don't see an essential difference between the implementation using
>> >> masks
>> >> and one using bit patterns, the mask when attached to the original
>> >> array
>> >> just adds a bit pattern by extending all the types by one byte, an
>> >> approach
>> >> that easily extends to all existing and future types, which is why Mark
>> >> went
>> >> that way for the first implementation given the time available. The
>> >> masks
>> >> are hidden because folks wanted something that behaved more like R and
>> >> also
>> >> because of the desire to combine the missing, ignore, and later
>> >> possibly bit
>> >> patterns in a unified manner. Note that the pseudo assignment was also
>> >> meant
>> >> to look like R. Adding true bit patterns to numpy isn't trivial and I
>> >> believe Mark was thinking of parametrized types for that.
>> >>
>> >> The main problems I see with masks are unified storage and possibly
>> >> memory
>> >> use. The rest is just behavior and desired API and that can be adjusted
>> >> within the current implementation. There is nothing essentially masky
>> >> about
>> >> masks.
>> >>
>> >> Chuck
>> >>
>> >>
>> >
>> > I  think chuck sums it up quite nicely.  The implementation detail about
>> > using mask versus bit patterns can still be discussed and addressed.
>> > Personally, I just don't see how parameterized dtypes would be easier to
>> > use
>> > than the pseudo assignment.
>> >
>> > The elegance of mark's solution was to consider the treatment of missing
>> > data in a unified manner.  This puts missing data in a more prominent
>> > spot
>> > for extension builders, which should greatly improve support throughout
>> > the
>> > ecosystem.
>>
>> Are extension builders then required to use the numpy C API to get
>> their data?  Speaking as an extension builder, I would rather you gave
>> me the mask and the bitpattern information and let me do that myself.
>>
>
> Forgive me, I wasn't clear.  What I am speaking of is more about a typical
> human failing.  If a programmer for a module never encounters masked arrays,
> then when they code up a function to operate on numpy data, it is quite
> likely that they would never take it into consideration.  Notice the
> prolific use of "np.asarray()" even within the numpy codebase, which
> destroys masked arrays.

Hmm - that sounds like it could cause some surprises.

So, what you were saying was just that it was good that masked arrays
were now closer to the core?   That's reasonable, but I don't think
it's relevant to the current discussion.  I think we all agree it is
nice to have masked arrays in the core.

> However, by making missing data support more integral to the core of
> numpy, it is far more likely that a programmer would take it into
> consideration when designing their algorithm, or at least explicitly
> document that their module does not support missing data.  Both NEPs do
> this by making missing data front-and-center.  However, my belief is that
> Mark's approach is easier to comprehend and is cleaner.  Cleaner features
> mean that they are more likely to be used.

The main motivation for the alterNEP was our strong feeling that
separating ABSENT and IGNORE was easier to comprehend and cleaner.  I
think it would be hard to argue that the alterNEP idea is not more
explicit.

>> > By letting there be a single missing data framework (instead of
>> > two) all that users need to figure out is when they want nan-like
>> > behavior
>> > (propagate) or to be more like masks (skip).  Numpy takes care of the
>> > rest.
>> >  There is a reason why I like using masked arrays because I don't have
>> > to
>> > use nansum in my library functions to guard against the possibility of
>> > receiving nans.  Duck-typing is a good thing.
>> >
>> > My argument against s

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Benjamin Root
On Fri, Oct 28, 2011 at 12:58 PM, Gary Strangman  wrote:

>
> >> I wonder if that might be handled as a scikits-image extension, rather
> >> than core numpy?
> >
> > I think Stefan and Nathaniel and Gary Strangman and others are saying
> > we don't want to pay the price of a large memory hike for masking.   I
> > suspect that Nathaniel is right, and that a large majority of those of
> > us who want 'missing data' functionality, also want what we've called
> > ABSENT missing values, and care about memory.
>
> FWIW, Matthew correctly interprets my concerns. I also have very large
> non-image datasets, so pushing the problem into a more custom extension
> (esp. one focused on images) doesn't help me much.
>
> -best
> Gary
>
>
I would wonder if the masks could benefit from the approach used for the
"carray" (compressed arrays) project?

Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Benjamin Root
On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett wrote:

> Hi,
>
> On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root  wrote:
> >
> >
> > On Thursday, October 27, 2011, Charles R Harris <
> charlesr.har...@gmail.com>
> > wrote:
> >>
> >>
> >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant <
> oliph...@enthought.com>
> >> wrote:
> >>>
> >>> That is a pretty good explanation.   I find myself convinced by
> Matthew's
> >>> arguments.  I think that being able to separate ABSENT from IGNORED
> is a
> >>> good idea.   I also like being able to control SKIP and PROPAGATE (but
> I
> >>> think the current implementation allows this already).
> >>>
> >>> What is the counter-argument to this proposal?
> >>>
> >>
> >> What exactly do you find convincing? The current masks propagate by
> >> default:
> >>
> >> In [1]: a = ones(5, maskna=1)
> >>
> >> In [2]: a[2] = NA
> >>
> >> In [3]: a
> >> Out[3]: array([ 1.,  1.,  NA,  1.,  1.])
> >>
> >> In [4]: a + 1
> >> Out[4]: array([ 2.,  2.,  NA,  2.,  2.])
> >>
> >> In [5]: a[2] = 10
> >>
> >> In [5]: a
> >> Out[5]: array([  1.,   1.,  10.,   1.,   1.], maskna=True)
> >>
> >>
> >> I don't see an essential difference between the implementation using
> masks
> >> and one using bit patterns, the mask when attached to the original array
> >> just adds a bit pattern by extending all the types by one byte, an
> approach
> >> that easily extends to all existing and future types, which is why Mark
> went
> >> that way for the first implementation given the time available. The
> masks
> >> are hidden because folks wanted something that behaved more like R and
> also
> >> because of the desire to combine the missing, ignore, and later possibly
> bit
> >> patterns in a unified manner. Note that the pseudo assignment was also
> meant
> >> to look like R. Adding true bit patterns to numpy isn't trivial and I
> >> believe Mark was thinking of parametrized types for that.
> >>
> >> The main problems I see with masks are unified storage and possibly
> memory
> >> use. The rest is just behavior and desired API and that can be adjusted
> >> within the current implementation. There is nothing essentially masky
> about
> >> masks.
> >>
> >> Chuck
> >>
> >>
> >
> > I  think chuck sums it up quite nicely.  The implementation detail about
> > using mask versus bit patterns can still be discussed and addressed.
> > Personally, I just don't see how parameterized dtypes would be easier to
> use
> > than the pseudo assignment.
> >
> > The elegance of mark's solution was to consider the treatment of missing
> > data in a unified manner.  This puts missing data in a more prominent
> spot
> > for extension builders, which should greatly improve support throughout
> the
> > ecosystem.
>
> Are extension builders then required to use the numpy C API to get
> their data?  Speaking as an extension builder, I would rather you gave
> me the mask and the bitpattern information and let me do that myself.
>
>
Forgive me, I wasn't clear.  What I am speaking of is more about a typical
human failing.  If a programmer for a module never encounters masked arrays,
then when they code up a function to operate on numpy data, it is quite
likely that they would never take it into consideration.  Notice the
prolific use of "np.asarray()" even within the numpy codebase, which
destroys masked arrays.
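
A minimal sketch of the np.asarray() problem, using numpy.ma (the existing
masked array module) as the example. The array values are made up for
illustration:

```python
import numpy as np

masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# np.asarray() returns a plain ndarray: the mask is silently dropped and
# the value hidden under it (2.0) takes part in the computation again.
plain = np.asarray(masked)

# np.asanyarray() passes ndarray subclasses through, so the mask survives.
kept = np.asanyarray(masked)

print(type(plain).__name__, plain.sum())
print(type(kept).__name__, kept.sum())
```

A library function that starts with np.asarray() therefore computes on the
masked value without any warning.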

However, by making missing data support more integral to the core of
numpy, it is far more likely that a programmer would take it into
consideration when designing their algorithm, or at least explicitly
document that their module does not support missing data.  Both NEPs do
this by making missing data front-and-center.  However, my belief is that
Mark's approach is easier to comprehend and is cleaner.  Cleaner features
mean that they are more likely to be used.



> > By letting there be a single missing data framework (instead of
> > two) all that users need to figure out is when they want nan-like
> behavior
> > (propagate) or to be more like masks (skip).  Numpy takes care of the
> rest.
> >  There is a reason why I like using masked arrays because I don't have to
> > use nansum in my library functions to guard against the possibility of
> > receiving nans.  Duck-typing is a good thing.
> >
> > My argument against separating IGNORE and PROPAGATE is that it becomes
> too
> > tempting to want to mix these in an array, but the desired behavior would
> > likely become ambiguous.
> >
> > There is one other problem that I just thought of that I don't think has
> > been outlined in either NEP.  What if I perform an operation between an
> > array set up with propagate NAs and an array with skip NAs?
>
> These are explicitly covered in the alterNEP:
>
> https://gist.github.com/1056379/
>
>
Sort of.  You speak of reduction operations for a single array with a mix of
NA and IGNOREs.  I guess in that case, it wouldn't make a difference for
element-wise operations between two arrays (plus adding the NAs propagate
harder rule).  Alth

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Gary Strangman

>> I wonder if that might be handled as a scikits-image extension, rather
>> than core numpy?
>
> I think Stefan and Nathaniel and Gary Strangman and others are saying
> we don't want to pay the price of a large memory hike for masking.   I
> suspect that Nathaniel is right, and that a large majority of those of
> us who want 'missing data' functionality, also want what we've called
> ABSENT missing values, and care about memory.

FWIW, Matthew correctly interprets my concerns. I also have very large 
non-image datasets, so pushing the problem into a more custom extension 
(esp. one focused on images) doesn't help me much.

-best
Gary




Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Matthew Brett
Hi,

On Fri, Oct 28, 2011 at 9:21 AM, Chris.Barker  wrote:
> On 10/27/11 7:51 PM, Travis Oliphant wrote:
>> As I mentioned. I find the ability to separate an ABSENT idea from an
>> IGNORED idea convincing. In other words, I think distinguishing between
>> masks and bit-patterns is not just an implementation detail, but
>> provides a useful concept for multiple use-cases.
>
> Exactly -- while one can implement ABSENT with a mask, one can not
> implement IGNORE with a bit-pattern. So it is not an implementation detail.
>
> I also think bit-patterns are a bit of a dead end:
>
> - there is only a standard for one data type family: i.e. NaN for ieee
> float types
>
> - So we would be coming up with our own standard (or adopting an
> existing one, but I don't think there is one widely supported) for other
> types. This means:
>   1) a lot of work to do

Largest possible negative integer for ints / largest integer for uints
/ not allowed for bool?

>   2) a binary format incompatible with other code, compilers, etc. This
> is a BIG deal -- a major strength of numpy is that it serves as a
> wrapper for a data block that is compatible with C, Fortran or whatever
> code -- special bit patterns would make this a lot harder.

Extension code is going to get harder.   At the moment, as far as I
understand it, our extension code can receive a masked array and
(without an explicit check from us) ignore the mask and process all
the values.  Then you're in the unfortunate situation of caring what's
under the mask.

Bitpatterns would - I imagine - be safer in that respect in that they
would be new dtypes and thus extension code would by default reject
them as unknown.

> We also talked about the fact that an 8-bit mask provides the ability to
> carry other information in the mask -- not just "missing" or "ignored",
> but a handful of other possible reasons for masking. I think that has a
> lot of possibilities.
>
> On 10/28/11 2:11 AM, Stéfan van der Walt wrote:
>> Another data point:  I've been spending some time on scikits-image
>> recently, and although masked values would be highly useful in that
>> context, the cost of doubling memory use (for uint8 images, e.g.) is
>> too high.
>
>> 2) that we make a concerted effort to implement the bitmask mode of
>> operation as soon as possible.
>
> I wonder if that might be handled as a scikits-image extension, rather
> than core numpy?

I think Stefan and Nathaniel and Gary Strangman and others are saying
we don't want to pay the price of a large memory hike for masking.   I
suspect that Nathaniel is right, and that a large majority of those of
us who want 'missing data' functionality, also want what we've called
ABSENT missing values, and care about memory.

See you,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Matthew Brett
Hi,

On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root  wrote:
>
>
> On Thursday, October 27, 2011, Charles R Harris 
> wrote:
>>
>>
>> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant 
>> wrote:
>>>
>>> That is a pretty good explanation.   I find myself convinced by Matthew's
>>> arguments.    I think that being able to separate ABSENT from IGNORED is a
>>> good idea.   I also like being able to control SKIP and PROPAGATE (but I
>>> think the current implementation allows this already).
>>>
>>> What is the counter-argument to this proposal?
>>>
>>
>> What exactly do you find convincing? The current masks propagate by
>> default:
>>
>> In [1]: a = ones(5, maskna=1)
>>
>> In [2]: a[2] = NA
>>
>> In [3]: a
>> Out[3]: array([ 1.,  1.,  NA,  1.,  1.])
>>
>> In [4]: a + 1
>> Out[4]: array([ 2.,  2.,  NA,  2.,  2.])
>>
>> In [5]: a[2] = 10
>>
>> In [5]: a
>> Out[5]: array([  1.,   1.,  10.,   1.,   1.], maskna=True)
>>
>>
>> I don't see an essential difference between the implementation using masks
>> and one using bit patterns, the mask when attached to the original array
>> just adds a bit pattern by extending all the types by one byte, an approach
>> that easily extends to all existing and future types, which is why Mark went
>> that way for the first implementation given the time available. The masks
>> are hidden because folks wanted something that behaved more like R and also
>> because of the desire to combine the missing, ignore, and later possibly bit
>> patterns in a unified manner. Note that the pseudo assignment was also meant
>> to look like R. Adding true bit patterns to numpy isn't trivial and I
>> believe Mark was thinking of parametrized types for that.
>>
>> The main problems I see with masks are unified storage and possibly memory
>> use. The rest is just behavior and desired API and that can be adjusted
>> within the current implementation. There is nothing essentially masky about
>> masks.
>>
>> Chuck
>>
>>
>
> I  think chuck sums it up quite nicely.  The implementation detail about
> using mask versus bit patterns can still be discussed and addressed.
> Personally, I just don't see how parameterized dtypes would be easier to use
> than the pseudo assignment.
>
> The elegance of mark's solution was to consider the treatment of missing
> data in a unified manner.  This puts missing data in a more prominent spot
> for extension builders, which should greatly improve support throughout the
> ecosystem.

Are extension builders then required to use the numpy C API to get
their data?  Speaking as an extension builder, I would rather you gave
me the mask and the bitpattern information and let me do that myself.

> By letting there be a single missing data framework (instead of
> two) all that users need to figure out is when they want nan-like behavior
> (propagate) or to be more like masks (skip).  Numpy takes care of the rest.
>  There is a reason why I like using masked arrays because I don't have to
> use nansum in my library functions to guard against the possibility of
> receiving nans.  Duck-typing is a good thing.
>
> My argument against separating IGNORE and PROPAGATE is that it becomes too
> tempting to want to mix these in an array, but the desired behavior would
> likely become ambiguous.
>
> There is one other problem that I just thought of that I don't think has
> been outlined in either NEP.  What if I perform an operation between an
> array set up with propagate NAs and an array with skip NAs?

These are explicitly covered in the alterNEP:

https://gist.github.com/1056379/

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Chris.Barker
On 10/27/11 7:51 PM, Travis Oliphant wrote:
> As I mentioned. I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but
> provides a useful concept for multiple use-cases.

Exactly -- while one can implement ABSENT with a mask, one can not 
implement IGNORE with a bit-pattern. So it is not an implementation detail.
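
To illustrate the distinction with a sketch (numpy.ma stands in for the mask
approach and NaN for the bit-pattern approach; neither is the NA API under
discussion here):

```python
import numpy as np

# Bit-pattern style: the missing marker overwrites the payload.
x = np.array([1.0, 2.0, 3.0])
x[1] = np.nan            # the original 2.0 is gone for good

# Mask style: the marker lives beside the payload.
m = np.ma.array([1.0, 2.0, 3.0])
m[1] = np.ma.masked      # the element is ignored...
total_ignoring = m.sum()
m.mask = np.ma.nomask    # ...and can be exposed again later
total_restored = m.sum()

print(x, total_ignoring, total_restored)
```

Once the NaN is written there is nothing left to "stop ignoring", which is
why IGNORE needs storage outside the value itself.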

I also think bit-patterns are a bit of a dead end:

- there is only a standard for one data type family: i.e. NaN for ieee 
float types

- So we would be coming up with our own standard (or adopting an 
existing one, but I don't think there is one widely supported) for other 
types. This means:
   1) a lot of work to do
   2) a binary format incompatible with other code, compilers, etc. This 
is a BIG deal -- a major strength of numpy is that it serves as a 
wrapper for a data block that is compatible with C, Fortran or whatever 
code -- special bit patterns would make this a lot harder.

We also talked about the fact that an 8-bit mask provides the ability to
carry other information in the mask -- not just "missing" or "ignored",
but a handful of other possible reasons for masking. I think that has a
lot of possibilities.

On 10/28/11 2:11 AM, Stéfan van der Walt wrote:
> Another data point:  I've been spending some time on scikits-image
> recently, and although masked values would be highly useful in that
> context, the cost of doubling memory use (for uint8 images, e.g.) is
> too high.

> 2) that we make a concerted effort to implement the bitmask mode of
> operation as soon as possible.

I wonder if that might be handled as a scikits-image extension, rather 
than core numpy?

Is there a standard bit pattern for missing data in images? -- it's
presumably quite important to maintain binary compatibility with image
formats, processing tools, etc.

I guess what I'm getting at is that special bit-pattern implementations 
may be domain specific.

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Charles R Harris
On Thu, Oct 27, 2011 at 8:51 PM, Travis Oliphant wrote:

> As I mentioned.   I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing.  In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but provides a
> useful concept for multiple use-cases.
>
> I understand exactly what it would take to add bit-patterns to NumPy.  I
> also understand what Mark did and agree that it is possible to add Matthew's
> idea to the current code-base.  I think it is worth exploring
>
>

A masked view can be considered as simply a mask on the viewed data. I agree
that in that case it might be nicer to have some operations that are only
allowed for views, such as taking a view with a mask from somewhere else
rather than having to set it up with assignments. It might also be useful if
masked values in a view could be exposed without assigning to the underlying
value, perhaps with a np.EXPOSE assignment. But I think these operations
could be implemented on top of the current code, although we might want an
additional flag.

Space saving can be addressed with bit masks. Unified storage can be
addressed by bit patterns that get translated between stored data and numpy
arrays with NA. So on and so forth. As people begin to use the current
implementation I hope that they offer feedback as to what they discover so
that the API and implementation can mature into something widely useful.
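
As a rough sketch of the translation idea, assuming the stored format marks
missing integers with the most negative int32 (the convention floated
elsewhere in this thread, used here only as an assumption) and using numpy.ma
in place of NA-enabled arrays:

```python
import numpy as np

# Assumed storage convention: most negative int32 means "missing".
SENTINEL = np.iinfo(np.int32).min

stored = np.array([10, SENTINEL, 30], dtype=np.int32)

# Load: translate the bit pattern into a mask.
loaded = np.ma.masked_equal(stored, SENTINEL)

# Save: translate the mask back into the bit pattern.
saved = loaded.filled(SENTINEL)

print(loaded.sum(), saved)
```

The round trip keeps the stored layout unchanged, so external C or Fortran
code that knows the sentinel can still read the data.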



Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Gael Varoquaux
On Fri, Oct 28, 2011 at 10:45:09AM +0200, Han Genuit wrote:
> Also, I like the short and concise abbreviation for 'Not Applicable',
> NA. It has more common uses than IGNORE.
> (See also here:
> http://www.johndcook.com/R_language_for_programmers.html#missing)

That's a very R-centric point of view: you know what NA stands for, thus
you find it meaningful. I can tell you that when I work with naive users,
I keep having to explain that NA stands for 'not available', whereas
IGNORE is at least somewhat explicit.

Acronyms are a curse for communication, and they tend to be very domain
specific.

My two euro-cents (a rising currency, now that it has been saved by our
generous leaders)

G


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Stéfan van der Walt
Hi all,

On Thu, Oct 27, 2011 at 7:51 PM, Travis Oliphant  wrote:
> I understand exactly what it would take to add bit-patterns to NumPy.  I
> also understand what Mark did and agree that it is possible to add Matthew's
> idea to the current code-base.  I think it is worth exploring

Another data point:  I've been spending some time on scikits-image
recently, and although masked values would be highly useful in that
context, the cost of doubling memory use (for uint8 images, e.g.) is
too high.  Many users with large data sets (and I think almost all
researchers working on >2D data would be included here as well) may
have the same problem.
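
A back-of-the-envelope sketch of that cost. The image size is arbitrary, and
np.packbits is only used to show what a one-bit-per-element mask would occupy,
not how NumPy would implement it:

```python
import numpy as np

image = np.zeros((1024, 1024), dtype=np.uint8)

# One mask byte per element: as large as the uint8 data itself.
byte_mask = np.zeros(image.shape, dtype=np.bool_)

# One mask bit per element: eight flags packed into each byte.
bit_mask = np.packbits(byte_mask)

print(image.nbytes, byte_mask.nbytes, bit_mask.nbytes)
```

For uint8 data the byte mask doubles the footprint, while the packed bit mask
adds one eighth.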

So, while I applaud the efforts made to include a masked array
implementation, I'd like to ask that:

1) We are mindful that any design decisions taken before the next
release should not *preclude* the implementation of bit-masks (with,
hopefully, a shared interface) and
2) that we make a concerted effort to implement the bitmask mode of
operation as soon as possible.

The NEP stated that both would be implemented, and I understand that
due to lack of time a pragmatic call had to be made--but that was, in
my opinion, one of its strong features.

Regards
Stéfan


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-28 Thread Han Genuit
Yes, to further iterate on that, you can also create multiple masked
views, each with its own mask properties. It would be ambiguous to mix
a bit-pattern NA together with standard NAs in the same mask, but you
can make different specialized masked views on the same data.

Also, I like the short and concise abbreviation for 'Not Applicable',
NA. It has more common uses than IGNORE.
(See also here:
http://www.johndcook.com/R_language_for_programmers.html#missing)

Concerning the assignment, it is a bit implicit, I agree, but the
representation and application of masks is also implicit. I think you
only have to know that NA will be a mask assignment and not a data
assignment.


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Benjamin Root
> It should be possible to remove a mask when copying an array.
>

This was a concession on the part of those pushing for masks.  Eventually, I
ended up realizing that it resulted in a stronger design.

Consider the following:

foo(a[4:10])

Should function foo be able to access the rest of array "a", even though it
has a part of it?  Of course not!

Now, if one considers masking as a form of advanced slicing, then it wouldn't
make sense for foo() to be able to access parts it wasn't given.

That being said, this is where NumPy array views come into play.  You can
create a view of the original data, add masks to the view, and still have
access to all of the original data, unmasked.
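
With the existing numpy.ma module this idea can be sketched as follows (an
illustration only; the NA-mask views in this thread have their own syntax):

```python
import numpy as np

a = np.arange(10)

# A masked view shares a's data buffer but carries its own mask.
v = a.view(np.ma.MaskedArray)
v[4:10] = np.ma.masked     # hide part of the data in the view only

print(v.sum())   # only the unmasked elements 0..3 are summed
print(a.sum())   # the original array still sees everything
```

A function handed v sees only the unmasked part, while the owner of a keeps
full access to the data.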

Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Han Genuit
There is a way to assign whole masks in the current implementation:

>>> a = np.arange(9, maskna=True).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> mask = np.array([[False, False, True],
...                  [False, True, False],
...                  [True, False, True]])
>>> np.copyto(a, np.NA, where=mask)
>>> a
array([[0, 1, NA],
       [3, NA, 5],
       [NA, 7, NA]])

I think the "ValueError: Cannot assign NA to an array which does not
support NAs" when trying to copy an array with a mask to an array
without a mask is a bug.

>>> a = np.arange(9, maskna=True).reshape((3,3))
>>> a.flags.maskna
True
>>> b = a.copy(maskna=False)
>>> b.flags.maskna
False

It should be possible to remove a mask when copying an array.
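
For comparison, the same two operations with the existing numpy.ma module
look roughly like this (a sketch, not the maskna API; the fill value -1 is an
arbitrary choice):

```python
import numpy as np

data = np.arange(9).reshape((3, 3))
mask = np.array([[False, False, True],
                 [False, True, False],
                 [True, False, True]])

# Assign a whole mask in one step.
m = np.ma.array(data, mask=mask)

# Copy to a plain array without a mask; masked slots need some fill value.
b = m.filled(-1)

print(m)
print(type(b).__name__, b.tolist())
```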


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Benjamin Root
On Thursday, October 27, 2011, Charles R Harris 
wrote:
>
>
> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant 
wrote:
>>
>> That is a pretty good explanation.   I find myself convinced by Matthew's
arguments.  I think that being able to separate ABSENT from IGNORED is a
good idea.   I also like being able to control SKIP and PROPAGATE (but I
think the current implementation allows this already).
>>
>> What is the counter-argument to this proposal?
>>
>
> What exactly do you find convincing? The current masks propagate by
default:
>
> In [1]: a = ones(5, maskna=1)
>
> In [2]: a[2] = NA
>
> In [3]: a
> Out[3]: array([ 1.,  1.,  NA,  1.,  1.])
>
> In [4]: a + 1
> Out[4]: array([ 2.,  2.,  NA,  2.,  2.])
>
> In [5]: a[2] = 10
>
> In [5]: a
> Out[5]: array([  1.,   1.,  10.,   1.,   1.], maskna=True)
>
>
> I don't see an essential difference between the implementation using masks
and one using bit patterns, the mask when attached to the original array
just adds a bit pattern by extending all the types by one byte, an approach
that easily extends to all existing and future types, which is why Mark went
that way for the first implementation given the time available. The masks
are hidden because folks wanted something that behaved more like R and also
because of the desire to combine the missing, ignore, and later possibly bit
patterns in a unified manner. Note that the pseudo assignment was also meant
to look like R. Adding true bit patterns to numpy isn't trivial and I
believe Mark was thinking of parametrized types for that.
>
> The main problems I see with masks are unified storage and possibly memory
use. The rest is just behavior and desired API and that can be adjusted
within the current implementation. There is nothing essentially masky about
masks.
>
> Chuck
>
>

I think Chuck sums it up quite nicely.  The implementation detail about
using mask versus bit patterns can still be discussed and addressed.
Personally, I just don't see how parameterized dtypes would be easier to use
than the pseudo assignment.

The elegance of mark's solution was to consider the treatment of missing
data in a unified manner.  This puts missing data in a more prominent spot
for extension builders, which should greatly improve support throughout the
ecosystem.  By letting there be a single missing data framework (instead of
two) all that users need to figure out is when they want nan-like behavior
(propagate) or to be more like masks (skip).  Numpy takes care of the rest.
 There is a reason why I like using masked arrays because I don't have to
use nansum in my library functions to guard against the possibility of
receiving nans.  Duck-typing is a good thing.
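
The two behaviours can be seen with existing tools (a sketch using NaN and
numpy.ma, not the NA implementation):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

propagated = np.sum(x)               # NaN propagates through the reduction
skipped = np.nansum(x)               # caller must remember the nan-aware variant
masked = np.ma.masked_invalid(x)     # mask the NaN once...
duck_typed = masked.sum()            # ...and the ordinary method skips it

print(propagated, skipped, duck_typed)
```

With the masked array, library code can keep calling .sum() and does not need
a special nan-aware function.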

My argument against separating IGNORE and PROPAGATE is that it becomes too
tempting to want to mix these in an array, but the desired behavior would
likely become ambiguous.

There is one other problem that I just thought of that I don't think has
been outlined in either NEP.  What if I perform an operation between an
array set up with propagate NAs and an array with skip NAs?

cheers,
Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Travis Oliphant
As I mentioned, I find the ability to separate an ABSENT idea from an IGNORED
idea convincing.  In other words, I think distinguishing between masks and
bit-patterns is not just an implementation detail, but provides a useful 
concept for multiple use-cases.

I understand exactly what it would take to add bit-patterns to NumPy.  I also 
understand what Mark did and agree that it is possible to add Matthew's idea to 
the current code-base.  I think it is worth exploring.
 
-Travis



On Oct 27, 2011, at 9:08 PM, Charles R Harris wrote:

> 
> 
> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant  
> wrote:
> That is a pretty good explanation.   I find myself convinced by Matthew's 
> arguments.  I think that being able to separate ABSENT from IGNORED is a
> good idea.   I also like being able to control SKIP and PROPAGATE (but I 
> think the current implementation allows this already).
> 
> What is the counter-argument to this proposal?
> 
> 
> What exactly do you find convincing? The current masks propagate by default:
> 
> In [1]: a = ones(5, maskna=1)
> 
> In [2]: a[2] = NA
> 
> In [3]: a
> Out[3]: array([ 1.,  1.,  NA,  1.,  1.])
> 
> In [4]: a + 1
> Out[4]: array([ 2.,  2.,  NA,  2.,  2.])
> 
> In [5]: a[2] = 10
> 
> In [5]: a
> Out[5]: array([  1.,   1.,  10.,   1.,   1.], maskna=True)
> 
> 
> I don't see an essential difference between the implementation using masks 
> and one using bit patterns, the mask when attached to the original array just 
> adds a bit pattern by extending all the types by one byte, an approach that 
> easily extends to all existing and future types, which is why Mark went that 
> way for the first implementation given the time available. The masks are 
> hidden because folks wanted something that behaved more like R and also 
> because of the desire to combine the missing, ignore, and later possibly bit 
> patterns in a unified manner. Note that the pseudo assignment was also meant 
> to look like R. Adding true bit patterns to numpy isn't trivial and I believe 
> Mark was thinking of parametrized types for that. 
> 
> The main problems I see with masks are unified storage and possibly memory 
> use. The rest is just behavior and desired API and that can be adjusted within
> the current implementation. There is nothing essentially masky about masks.
> 
> Chuck
> 

---
Travis Oliphant
Enthought, Inc.
oliph...@enthought.com
1-512-536-1057
http://www.enthought.com





Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Charles R Harris
On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote:

> That is a pretty good explanation.   I find myself convinced by Matthew's
> arguments.  I think that being able to separate ABSENT from IGNORED is a
> good idea.   I also like being able to control SKIP and PROPAGATE (but I
> think the current implementation allows this already).
>
> What is the counter-argument to this proposal?
>
>
What exactly do you find convincing? The current masks propagate by default:

In [1]: a = ones(5, maskna=1)

In [2]: a[2] = NA

In [3]: a
Out[3]: array([ 1.,  1.,  NA,  1.,  1.])

In [4]: a + 1
Out[4]: array([ 2.,  2.,  NA,  2.,  2.])

In [5]: a[2] = 10

In [5]: a
Out[5]: array([  1.,   1.,  10.,   1.,   1.], maskna=True)


I don't see an essential difference between the implementation using masks
and one using bit patterns, the mask when attached to the original array
just adds a bit pattern by extending all the types by one byte, an approach
that easily extends to all existing and future types, which is why Mark went
that way for the first implementation given the time available. The masks
are hidden because folks wanted something that behaved more like R and also
because of the desire to combine the missing, ignore, and later possibly bit
patterns in a unified manner. Note that the pseudo assignment was also meant
to look like R. Adding true bit patterns to numpy isn't trivial and I
believe Mark was thinking of parametrized types for that.

The main problems I see with masks are unified storage and possibly memory
use. The rest is just behavior and desired API and that can be adjusted
within the current implementation. There is nothing essentially masky about
masks.
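
A rough illustration of the memory point, assuming a separate one-byte
boolean mask per element and using NaN as a stand-in for a float bit
pattern (neither is claimed to be the exact layout of Mark's
implementation):

```python
import numpy as np

n = 1_000_000
data = np.ones(n, dtype=np.float64)

# Mask approach: one extra boolean (1 byte) per element, stored separately.
mask = np.zeros(n, dtype=bool)
mask_total = data.nbytes + mask.nbytes

# Bit-pattern approach (NaN as the sentinel here): no extra storage.
sentinel = data.copy()
sentinel[2] = np.nan
sentinel_total = sentinel.nbytes

print(mask_total / sentinel_total)  # 1.125 for float64
```

For one-byte types such as int8 the same one-byte mask would double the
storage, which is where the memory concern bites hardest.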

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Travis Oliphant
That is a pretty good explanation.   I find myself convinced by Matthew's 
arguments.  I think that being able to separate ABSENT from IGNORED is a good
idea.   I also like being able to control SKIP and PROPAGATE (but I think the 
current implementation allows this already). 

What is the counter-argument to this proposal?  

-Travis




On Oct 27, 2011, at 7:31 PM, Matthew Brett wrote:

> Hi,
> 
> On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant  
> wrote:
>> So, I am very interested in making sure I remember the details of the 
>> counterproposal.  What I recall is that you wanted to be able to
>> differentiate between a "bit-pattern" mask and a boolean-array mask in the 
>> API.   I believe currently even when bit-pattern masks are implemented the 
>> difference will be "hidden" from the user on the Python level.
>> 
>> I am sure to be missing other parts of the discussion as I have been in and 
>> out of it.
> 
> The ideas
> ---------
> 
> The question that we were addressing in the alter-NEP was: should
> missing values implemented as bitpatterns appear to be the same as
> missing values implemented with masks?  We said no, and Mark said yes.
> 
> To restate the argument in brief; Nathaniel and I and some others
> thought that there were two separable ideas in play:
> 
> 1) A value that is finally and completely missing. == ABSENT
> 2) A value that we would like to ignore for the moment but might want
> back at some future time == IGNORED
> 
> (I'm using the adjectives ABSENT and IGNORED here to be short for the
> objects 'absent value'  and 'ignored value'.  This is to distinguish
> from the verbs below).
> 
> We thought bitpatterns were a good match for the former, and masking
> was a good match for the latter.
> 
> We all agreed there were two things you might like to do with values
> that were missing in both senses above:
> 
> A) PROPAGATE; V + 1 == V
> B) SKIP; K + 1 == 1
> 
> (Note verbs for the behaviors).
> 
> I believe the original np.ma masked arrays always SKIP.
> 
> In [2]: a = np.ma.masked_array?
> In [3]: a = np.ma.masked_array([99, 2], mask=[True, False])
> In [4]: a
> Out[4]:
> masked_array(data = [-- 2],
> mask = [ True False],
>   fill_value = 99)
> In [5]: a.sum()
> Out[5]: 2
> 
> There was some discussion as to whether there was a reason to think
> that ABSENT should always or by default PROPAGATE, and IGNORED should
> always or by default SKIP.  Chuck is referring to this idea when he
> said further up this thread:
> 
>> For instance, I'm thinking skipna=1 is the natural default for the masked 
>> arrays.
> 
> The current implementation
> --------------------------
> 
> What we have now is an implementation of masked arrays, but more
> tightly integrated into the numpy core.  In our language we have an
> implementation of IGNORED that is tuned to be nearly indistinguishable
> from the behavior we are expecting of ABSENT.
> 
> Specifically, once you have done this:
> 
> In [9]: a = np.array([99, 2], maskna=True)
> 
> you can get something representing the mask:
> 
> In [11]: np.isna(a)
> Out[11]: array([False, False], dtype=bool)
> 
> but I believe there is no way of setting the mask directly.  In order
> to set the mask, you have to do what looks like an assignment:
> 
> In [12]: a[0] = np.NA
> In [14]: a
> Out[14]: array([NA, 2])
> 
> In fact, what has happened is the mask has changed, but the underlying
> value has not:
> 
> In [18]: orig = np.array([99, 2])
> 
> In [19]: a = orig.view(maskna=True)
> 
> In [20]: a[0] = np.NA
> 
> In [21]: a
> Out[21]: array([NA, 2])
> 
> In [22]: orig
> Out[22]: array([99,  2])
> 
> This is different from real assignment:
> 
> In [23]: a[0] = 0
> 
> In [24]: a
> Out[24]: array([0, 2], maskna=True)
> 
> In [25]: orig
> Out[25]: array([0, 2])
> 
> Some effort has gone into making it difficult to pull off the mask:
> 
> In [30]: a.view(np.int64)
> Out[30]: array([NA, 2])
> 
> In [31]: a.view(np.int64).flags
> Out[31]:
>  C_CONTIGUOUS : True
>  F_CONTIGUOUS : True
>  OWNDATA : False
>  MASKNA : True
>  OWNMASKNA : False
>  WRITEABLE : True
>  ALIGNED : True
>  UPDATEIFCOPY : False
> 
> In [32]: a.astype(np.int64)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> /home/mb312/ in ()
> ----> 1 a.astype(np.int64)
> 
> ValueError: Cannot assign NA to an array which does not support NAs
> 
> The default behavior of the masked values is PROPAGATE, but they can
> be individually made to SKIP:
> 
> In [28]: a.sum() # PROPAGATE
> Out[28]: NA(dtype='int64')
> 
> In [29]: a.sum(skipna=True) # SKIP
> Out[29]: 2
> 
> Where's the beef?
> -----------------
> 
> I personally still think that it is confusing to fuse the concept of:
> 
> 1) Masked arrays
> 2) Arrays with bitpattern codes for missing
> 
> and the concepts of
> 
> A) ABSENT and
> B) IGNORED
> 
> Consequences for current code
> -----------------------------

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-27 Thread Matthew Brett
Hi,

On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant  wrote:
> So, I am very interested in making sure I remember the details of the 
> counterproposal.    What I recall is that you wanted to be able to 
> differentiate between a "bit-pattern" mask and a boolean-array mask in the 
> API.   I believe currently even when bit-pattern masks are implemented the 
> difference will be "hidden" from the user on the Python level.
>
> I am sure to be missing other parts of the discussion as I have been in and 
> out of it.

The ideas
---------

The question that we were addressing in the alter-NEP was: should
missing values implemented as bitpatterns appear to be the same as
missing values implemented with masks?  We said no, and Mark said yes.

To restate the argument in brief; Nathaniel and I and some others
thought that there were two separable ideas in play:

1) A value that is finally and completely missing. == ABSENT
2) A value that we would like to ignore for the moment but might want
back at some future time == IGNORED

(I'm using the adjectives ABSENT and IGNORED here to be short for the
objects 'absent value'  and 'ignored value'.  This is to distinguish
from the verbs below).

We thought bitpatterns were a good match for the former, and masking
was a good match for the latter.
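
A minimal sketch of the distinction, using released NumPy only: NaN stands
in for a bitpattern (ABSENT) and an np.ma mask stands in for IGNORED. The
values are illustrative assumptions:

```python
import numpy as np

original = np.array([1.0, 99.0, 3.0])

# ABSENT via a bit pattern (NaN as stand-in): the old value is overwritten.
absent = original.copy()
absent[1] = np.nan

# IGNORED via a mask: the value is hidden but still stored.
ignored = np.ma.masked_array(original.copy(), mask=[False, True, False])
sum_while_ignored = ignored.sum()   # 4.0, the masked 99.0 is skipped

# Stop ignoring: clear the mask and the value is back.
ignored.mask = [False, False, False]
sum_after_unmask = ignored.sum()    # 103.0

print(sum_while_ignored, sum_after_unmask)
```

There is no equivalent way to get 99.0 back out of `absent`.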

We all agreed there were two things you might like to do with values
that were missing in both senses above:

A) PROPAGATE; V + 1 == V
B) SKIP; K + 1 == 1

(Note verbs for the behaviors).
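
The two behaviors can be sketched with plain floats, using NaN as a
stand-in for the missing value (an assumption for illustration; the NA
discussed in this thread is a different object):

```python
import numpy as np

V = np.nan                     # stand-in for a missing value

propagated = V + 1             # PROPAGATE: the result is still missing
skipped = np.nansum([V, 1])    # SKIP: the missing value contributes nothing

print(propagated)  # nan
print(skipped)     # 1.0
```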

I believe the original np.ma masked arrays always SKIP.

In [2]: a = np.ma.masked_array?
In [3]: a = np.ma.masked_array([99, 2], mask=[True, False])
In [4]: a
Out[4]:
masked_array(data = [-- 2],
 mask = [ True False],
   fill_value = 99)
In [5]: a.sum()
Out[5]: 2

There was some discussion as to whether there was a reason to think
that ABSENT should always or by default PROPAGATE, and IGNORED should
always or by default SKIP.  Chuck is referring to this idea when he
said further up this thread:

> For instance, I'm thinking skipna=1 is the natural default for the masked 
> arrays.

The current implementation
--------------------------

What we have now is an implementation of masked arrays, but more
tightly integrated into the numpy core.  In our language we have an
implementation of IGNORED that is tuned to be nearly indistinguishable
from the behavior we are expecting of ABSENT.

Specifically, once you have done this:

In [9]: a = np.array([99, 2], maskna=True)

you can get something representing the mask:

In [11]: np.isna(a)
Out[11]: array([False, False], dtype=bool)

but I believe there is no way of setting the mask directly.  In order
to set the mask, you have to do what looks like an assignment:

In [12]: a[0] = np.NA
In [14]: a
Out[14]: array([NA, 2])

In fact, what has happened is the mask has changed, but the underlying
value has not:

In [18]: orig = np.array([99, 2])

In [19]: a = orig.view(maskna=True)

In [20]: a[0] = np.NA

In [21]: a
Out[21]: array([NA, 2])

In [22]: orig
Out[22]: array([99,  2])

This is different from real assignment:

In [23]: a[0] = 0

In [24]: a
Out[24]: array([0, 2], maskna=True)

In [25]: orig
Out[25]: array([0, 2])
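
For comparison, roughly the same distinction can be reproduced with the
released np.ma module, where the mask is explicit. This is an analogue
only, not the maskna implementation shown above, which may not be present
in a released NumPy:

```python
import numpy as np

orig = np.array([99, 2])
# By default np.ma.masked_array does not copy, so `a` shares data with `orig`.
a = np.ma.masked_array(orig, mask=[False, False])

a[0] = np.ma.masked               # changes only the mask
orig_after_masking = orig.copy()  # snapshot: still [99, 2]
print(a)                          # [-- 2]

a[0] = 0                          # a real assignment writes through and unmasks
print(a)                          # [0 2]
print(orig)                       # [0 2]
```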

Some effort has gone into making it difficult to pull off the mask:

In [30]: a.view(np.int64)
Out[30]: array([NA, 2])

In [31]: a.view(np.int64).flags
Out[31]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  MASKNA : True
  OWNMASKNA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [32]: a.astype(np.int64)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/mb312/ in ()
----> 1 a.astype(np.int64)

ValueError: Cannot assign NA to an array which does not support NAs

The default behavior of the masked values is PROPAGATE, but they can
be individually made to SKIP:

In [28]: a.sum() # PROPAGATE
Out[28]: NA(dtype='int64')

In [29]: a.sum(skipna=True) # SKIP
Out[29]: 2

Where's the beef?
-----------------

I personally still think that it is confusing to fuse the concept of:

1) Masked arrays
2) Arrays with bitpattern codes for missing

and the concepts of

A) ABSENT and
B) IGNORED

Consequences for current code
-----------------------------

Specifically, it still seems to me to make sense to prefer this:

>> a = np.array([99, 2], masking=True)
>> a.mask
[ True, True ]
>> a.sum()
101
>> a.mask[0] = False
>> a.sum()
2

It might make sense, as Chuck suggests, to change the default to
'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED
and 'skipna' to 'skipignored' for clarity.
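
A toy sketch of how such an interface might behave. The class
`MaskingSketch`, the sentinel `IGNORED` and the `skipignored` keyword are
all hypothetical names for illustration; none of them exists in NumPy:

```python
import numpy as np

IGNORED = object()  # hypothetical sentinel standing in for a proposed np.IGNORED


class MaskingSketch:
    """Hypothetical illustration of an exposed mask; not part of NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data)
        # As in the example above, True means "visible".
        self.mask = np.ones(self.data.shape, dtype=bool)

    def sum(self, skipignored=True):
        if skipignored:
            return self.data[self.mask].sum()   # SKIP
        if self.mask.all():
            return self.data.sum()
        return IGNORED                          # PROPAGATE


a = MaskingSketch([99, 2])
total_before = a.sum()
a.mask[0] = False                 # hide the first value; the data is untouched
total_after = a.sum()
propagated = a.sum(skipignored=False)

print(total_before, total_after)  # 101 2
```

The point of the sketch is only that the mask is an ordinary, visible
attribute, so hiding and revealing a value never looks like an assignment.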

I still think the pseudo-assignment:

In [20]: a[0] = np.NA

is confusing, and should be removed.

Later, should we ever have bitpatterns, there would be something like
np.ABSENT.  This of course would make sense for assignment:

In [20]: a[0] = np.ABSENT

There would be another keyword argument 'skipabsent=F

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-26 Thread Matthew Brett
Hi,

On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant  wrote:
> So, I am very interested in making sure I remember the details of the 
> counterproposal.    What I recall is that you wanted to be able to 
> differentiate between a "bit-pattern" mask and a boolean-array mask in the 
> API.   I believe currently even when bit-pattern masks are implemented the 
> difference will be "hidden" from the user on the Python level.
>
> I am sure to be missing other parts of the discussion as I have been in and 
> out of it.

Nathaniel - are you online today?  Do you have time to review the
current implementation and see if it affects the initial discussion?

I'm running around most of today but I should have time to do some
thinking later this afternoon CA time.

See you,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Han Genuit
There is also:

Missing/accumulating data
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057406.html

An NA compromise idea -- many-NA
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057408.html

NEPaNEP lessons - was: alterNEP
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057435.html

NA/Missing Data Conference Call Summary
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057474.html

HPC missing data - was: NA/Missing Data Conference Call Summary
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057482.html

using the same vocabulary for missing value ideas
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057485.html

towards a more productive missing values/masked arrays discussion...
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057511.html

miniNEP1: where= argument for ufuncs
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057513.html

miniNEP 2: NA support via special dtypes
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057542.html

Missing Data development plan
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057567.html

Missing Values Discussion
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057579.html

NA masks for NumPy are ready to test
http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058103.html


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Eric Firing
On 10/25/2011 04:56 PM, Travis Oliphant wrote:
> So, I am very interested in making sure I remember the details of the
> counterproposal.  What I recall is that you wanted to be able to
> differentiate between a "bit-pattern" mask and a boolean-array mask
> in the API.   I believe currently even when bit-pattern masks are
> implemented the difference will be "hidden" from the user on the
> Python level.
>
> I am sure to be missing other parts of the discussion as I have been
> in and out of it.
>
> Thanks,
>
> -Travis

The alternative-NEP is here: https://gist.github.com/1056379/

One thread of discussion is here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg32268.html

and continued here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg32371.html

Eric


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Travis Oliphant
So, I am very interested in making sure I remember the details of the 
counterproposal.  What I recall is that you wanted to be able to
differentiate between a "bit-pattern" mask and a boolean-array mask in the API. 
  I believe currently even when bit-pattern masks are implemented the 
difference will be "hidden" from the user on the Python level.  

I am sure to be missing other parts of the discussion as I have been in and out 
of it. 

Thanks,

-Travis





On Oct 25, 2011, at 7:02 PM, Matthew Brett wrote:

> Hi,
> 
> Thank you for your gracious email.
> 
> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant  
> wrote:
>> It is a shame that Nathaniel and perhaps Matthew do not feel like their
>> voice was heard.   I wish I could have participated more fully in some of
>> the discussions.  I don't know if I could have really helped, but I would
>> have liked to have tried to perhaps work alongside Mark to integrate some of
>> the other ideas that had been expressed during the discussion.
>> Unfortunately,  I was traveling in NYC most of the time that Mark was
>> working on this project and did not get a chance to interact with him as
>> much as I would have liked.
>> My view is that we didn't get quite to where I thought we would get, nor
>> where I think we could be.  I think Nathaniel and Matthew provided very
>> specific feedback that was helpful in understanding other perspectives of a
>> difficult problem. In particular, I really wanted bit-patterns
>> implemented.  However, I also understand that Mark did quite a bit of work
>> and altered his original designs quite a bit in response to community
>> feedback.   I wasn't a major part of the pull request discussion, nor did I
>> merge the changes, but I support Charles if he reviewed the code and felt
>> like it was the right thing to do.  I likely would have done the same thing
>> rather than let Mark Wiebe's work languish.
>> Merging Mark's code does not mean there is not more work to be done, but it
>> is consistent with the reality that currently development on NumPy happens
>> when people have the time to do it.  I have not seen anything to convince
>> me that there is not still time to make specific API changes that address
>> some of the concerns.
>> Perhaps Nathaniel and/or Matthew could summarize their concerns again and
>> if desired submit a pull request to revert the changes.   However, there is
>> a definite bias against removing working code unless the arguments are very
>> strong and receive a lot of support from others.
> 
> Honestly - I am not sure whether there is any interest now, in the
> arguments we made before.   If there is, who is interested?  I mean,
> past politeness.
> 
> I wasn't trying to restart that discussion, because I didn't know what
> good it could do.   At first I was hoping that we could ask whether
> there was a better way of dealing with disagreements like this.
> Later it seemed to me that the atmosphere was getting bad, and I
> wanted to say that because I thought it was important.
> 
>> Thank you for continuing to voice your opinions even when it may feel that
>> the tide is against you.   My view is that we only learn from people who
>> disagree with us.
> 
> Thank you for saying that.   I hope that y'all will tell me if I am
> making it harder for you to disagree,  and I am sorry if I did so
> here.
> 
> Best,
> 
> Matthew

---
Travis Oliphant
Enthought, Inc.
oliph...@enthought.com
1-512-536-1057
http://www.enthought.com





Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Matthew Brett
Hi,

Thank you for your gracious email.

On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant  wrote:
> It is a shame that Nathaniel and perhaps Matthew do not feel like their
> voice was heard.   I wish I could have participated more fully in some of
> the discussions.  I don't know if I could have really helped, but I would
> have liked to have tried to perhaps work alongside Mark to integrate some of
> the other ideas that had been expressed during the discussion.
> Unfortunately,  I was traveling in NYC most of the time that Mark was
> working on this project and did not get a chance to interact with him as
> much as I would have liked.
> My view is that we didn't get quite to where I thought we would get, nor
> where I think we could be.  I think Nathaniel and Matthew provided very
> specific feedback that was helpful in understanding other perspectives of a
> difficult problem.     In particular, I really wanted bit-patterns
> implemented.    However, I also understand that Mark did quite a bit of work
> and altered his original designs quite a bit in response to community
> feedback.   I wasn't a major part of the pull request discussion, nor did I
> merge the changes, but I support Charles if he reviewed the code and felt
> like it was the right thing to do.  I likely would have done the same thing
> rather than let Mark Wiebe's work languish.
> Merging Mark's code does not mean there is not more work to be done, but it
> is consistent with the reality that currently development on NumPy happens
> when people have the time to do it.    I have not seen anything to convince
> me that there is not still time to make specific API changes that address
> some of the concerns.
> Perhaps Nathaniel and/or Matthew could summarize their concerns again and
> if desired submit a pull request to revert the changes.   However, there is
> a definite bias against removing working code unless the arguments are very
> strong and receive a lot of support from others.

Honestly - I am not sure whether there is any interest now, in the
arguments we made before.   If there is, who is interested?  I mean,
past politeness.

I wasn't trying to restart that discussion, because I didn't know what
good it could do.   At first I was hoping that we could ask whether
there was a better way of dealing with disagreements like this.
Later it seemed to me that the atmosphere was getting bad, and I
wanted to say that because I thought it was important.

> Thank you for continuing to voice your opinions even when it may feel that
> the tide is against you.   My view is that we only learn from people who
> disagree with us.

Thank you for saying that.   I hope that y'all will tell me if I am
making it harder for you to disagree,  and I am sorry if I did so
here.

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Travis Oliphant
It is a shame that Nathaniel and perhaps Matthew do not feel like their voice 
was heard.   I wish I could have participated more fully in some of the 
discussions.  I don't know if I could have really helped, but I would have 
liked to have tried to perhaps work alongside Mark to integrate some of the 
other ideas that had been expressed during the discussion.   Unfortunately,  I 
was traveling in NYC most of the time that Mark was working on this project and 
did not get a chance to interact with him as much as I would have liked. 

My view is that we didn't get quite to where I thought we would get, nor where 
I think we could be.  I think Nathaniel and Matthew provided very specific 
feedback that was helpful in understanding other perspectives of a difficult 
problem. In particular, I really wanted bit-patterns implemented.
However, I also understand that Mark did quite a bit of work and altered his 
original designs quite a bit in response to community feedback.   I wasn't a 
major part of the pull request discussion, nor did I merge the changes, but I 
support Charles if he reviewed the code and felt like it was the right thing to 
do.  I likely would have done the same thing rather than let Mark Wiebe's work 
languish. 

Merging Mark's code does not mean there is not more work to be done, but it is 
consistent with the reality that currently development on NumPy happens when 
people have the time to do it.  I have not seen anything to convince me that
there is not still time to make specific API changes that address some of the 
concerns.   

Perhaps Nathaniel and/or Matthew could summarize their concerns again and if
desired submit a pull request to revert the changes.   However, there is a 
definite bias against removing working code unless the arguments are very 
strong and receive a lot of support from others. 

Thank you for continuing to voice your opinions even when it may feel that the 
tide is against you.   My view is that we only learn from people who disagree 
with us.  

Best regards,

-Travis
 

On Oct 25, 2011, at 1:24 PM, Benjamin Root wrote:

> On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett  
> wrote:
> Hi,
> 
> On Tue, Oct 25, 2011 at 8:04 AM, Lluís  wrote:
> > Matthew Brett writes:
> >> I'm afraid I find this whole thread very unpleasant.
> >
> >> I have the odd impression of being back at high school.  Some of the
> >> big kids are pushing me around and then the other kids join in.
> >
> >> It didn't have to be this way.
> >
> >> Someone could have replied like this to Nathaniel:
> >
> >> "Oh - yes - I'm sorry -  we actually had the discussion on the pull
> >> request.  Looking back, I see that we didn't flag this up on the
> >> mailing list and maybe we should have.  Thanks for pointing that out.
> >>  Maybe we could start another discussion of the API in view of the
> >> changes that have gone in".
> >
> >> But that didn't happen.
> >
> > Well, I really thought that all the interested parties would take a look at 
> > [1].
> >
> > While it's true that the pull requests are not obvious if you're not using 
> > the
> > functionalities of the github web (or unless announced in this list), I 
> > think
> > that Mark's announcement was precisely directed at having a new round of
> > discussions after having some code to play around with and see how 
> > intuitive or
> > counter-intuitive the implemented concepts could be.
> 
> I just wanted to be clear what I meant.
> 
> The key point is not whether or not the pull-request or request for
> testing was in fact the right place for the discussion that Travis
> suggested.   I guess you can argue that either way.   I'd say no, but
> I can see how you would disagree on that.
> 
> 
> This is getting very meta... a disagreement about the disagreement.
>  
> The key point is - how much do we value constructive disagreement?
> 
> 
> Personally, I value it very much.  My impression of the discussion we all had 
> at the beginning was that the needs of the two distinct communities (R-users 
> and masked array users) were both heard and largely addressed.  Aspects of 
> both approaches were used, and the final result is, IMHO, inspired and 
> elegant.  Is it perfect? No.  Are there ways to improve it? Absolutely, and I 
> fully expect that to happen.
>  
> If we do value constructive disagreement then we'll go out of our way
> to talk through the points of contention, and make sure that the
> people who disagree, especially the minority, feel that they have been
> fully heard.
> 
> If we don't value constructive disagreement then we'll let the other
> side know that further disagreement will be taken as a sign of bad
> faith.
> 
> Now - what do you see here?  I see the second and that worries me.
> 
> 
> It is disappointing that you choose not to participate in the thread linked 
> above or in the pull request itself.  If I remember correctly, you were 
> working on finishing up your dissertation, so I fully understand the time 
> 

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Lluís
Matthew Brett writes:
[...]
>>> If we do value constructive disagreement then we'll go out of our way
>>> to talk through the points of contention, and make sure that the
>>> people who disagree, especially the minority, feel that they have been
>>> fully heard.
>>> 
>>> If we don't value constructive disagreement then we'll let the other
>>> side know that further disagreement will be taken as a sign of bad
>>> faith.
>>> 
>>> Now - what do you see here?  I see the second and that worries me.
>>> 
>> 
>> It is disappointing that you choose not to participate in the thread linked
>> above or in the pull request itself.  If I remember correctly, you were
>> working on finishing up your dissertation, so I fully understand the time
>> constraints involved there.  However, the pull request and the email
>> notification is the de facto method of staging and discussing changes in any
>> development project.  No objections were raised in that pull request, so it
>> went in after some time passed.  To hold off the merge, all one would need
>> to do is fire off a quick comment requesting a delay to have a chance to
>> review the pull request.

> I think the pull-request was not the right vehicle for the discussion,
> you think it was, that's fine, I don't think we need to rehearse that.

> My question (if you are answering my question) is: if you put yourself
> in my or Nathaniel's shoes, would you feel that you had been warmly
> encouraged to express disagreement, or would you feel something else.

I sense (bear with me, my senses are not very sharp) that you feel your concerns
have not been addressed, and thus the sensation that features you disagreed upon
were sneaked through a silent pull request.

And yes, the initial discussions were too heated at some moments (me included),
but that does not imply that the current state is ignoring the concerns
everybody raised.


>> Luckily, git is a VCS, so we are fully capable of reverting any necessary
>> changes if warranted.  If you have any concerns or suggestions for changes
>> in the current implementation, feel free to raise them and open additional
>> pull requests.  There is no "ganging up" here or any other subterfuge.  Tell
>> us exactly what are your issues with the current setup, provide example code
>> demonstrating the issues, and we can certainly discuss ways to improve this.

> Has the situation changed since the counter-NEP that Nathaniel and I wrote up?

I couldn't find the link, but AFAIR the main concerns were:

- Using bit patterns as a more efficient missing data mechanism that is
  compatible with third-party binary libraries.

  As the NEP says, although not implemented (due to lack of time), bit patterns
  are a desirable extension that will be able to coexist with masks while
  providing a single and consistent Python and C API for both bit patterns and
  masks.

- Being able to expose the non-destructive nature of masks.

  There is only one very specific path leading to such behaviour [1], so users
  not interested in it should never inadvertently fall into its use (aka, they
  don't even need to know about it).

[1] 
http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views
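
To make the contrast between the two concerns concrete, here is a minimal
sketch. It does not use the NA-mask implementation under discussion: it assumes
NaN as a stand-in for a bit-pattern marker and numpy.ma as a stand-in for a
mask, and the variable names are only illustrative.

```python
import numpy as np

data = np.array([1.0, 5.0, 3.0])

# Bit-pattern style (NaN as the stand-in marker): the marker overwrites the
# value in place, so the original 5.0 is gone, but no extra storage is needed.
in_band = data.copy()
in_band[1] = np.nan
in_band_mean = np.nanmean(in_band)        # mean of 1.0 and 3.0

# Mask style (numpy.ma as the stand-in): a separate boolean array marks the
# element as missing, and the underlying value is left untouched.
masked = np.ma.masked_array(data, mask=[False, True, False])
masked_mean = masked.mean()               # also ignores the masked element
hidden_value = masked.data[1]             # the 5.0 is still stored

# Non-destructive: clearing the mask brings the value back.
masked.mask = np.ma.nomask
unmasked_mean = masked.mean()             # mean of 1.0, 5.0 and 3.0

print(in_band_mean, masked_mean, hidden_value, unmasked_mean)
```

The mask costs one extra byte per element here, which is the memory trade-off
the bit-pattern approach avoids.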


If we agree that it is reasonable to think that the concerns in the
"counter-NEP" have been addressed in the current implementation, then I think it
is not unreasonable to take the silence that followed Mark's mail and the pull
request as a green light.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Matthew Brett
Hi,

On Tue, Oct 25, 2011 at 11:24 AM, Benjamin Root  wrote:
> On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Tue, Oct 25, 2011 at 8:04 AM, Lluís  wrote:
>> > Matthew Brett writes:
>> >> I'm afraid I find this whole thread very unpleasant.
>> >
>> >> I have the odd impression of being back at high school.  Some of the
>> >> big kids are pushing me around and then the other kids join in.
>> >
>> >> It didn't have to be this way.
>> >
>> >> Someone could have replied like this to Nathaniel:
>> >
>> >> "Oh - yes - I'm sorry -  we actually had the discussion on the pull
>> >> request.  Looking back, I see that we didn't flag this up on the
>> >> mailing list and maybe we should have.  Thanks for pointing that out.
>> >>  Maybe we could start another discussion of the API in view of the
>> >> changes that have gone in".
>> >
>> >> But that didn't happen.
>> >
>> > Well, I really thought that all the interested parties would take a look
>> > at [1].
>> >
>> > While it's true that the pull requests are not obvious if you're not
>> > using the
>> > functionalities of the github web (or unless announced in this list), I
>> > think
>> > that Mark's announcement was precisely directed at having a new round of
>> > discussions after having some code to play around with and see how
>> > intuitive or
>> > counter-intuitive the implemented concepts could be.
>>
>> I just wanted to be clear what I meant.
>>
>> The key point is not whether or not the pull-request or request for
>> testing was in fact the right place for the discussion that Travis
>> suggested.   I guess you can argue that either way.   I'd say no, but
>> I can see how you would disagree on that.
>>
>
> This is getting very meta... a disagreement about the disagreement.

Yes, the important point is a social one.  The other points are details.

>> The key point is - how much do we value constructive disagreement?
>>
>
> Personally, I value it very much.

Well - I think everyone believes that they value constructive
discussion, but the question is, what happens when people really
disagree?

> My impression of the discussion we all
> had at the beginning was that the needs of the two distinct communities
> (R-users and masked array users) were both heard and largely addressed.
> Aspects of both approaches were used, and the final result is, IMHO,
> inspired and elegant.  Is it perfect? No.  Are there ways to improve it?
> Absolutely, and I fully expect that to happen.

To be clear once more, I personally feel we don't need to discuss:

1) Whether Mark did a good job on the code (I am strongly inclined to assume so).
2) Whether something along these lines would be good to have in numpy.

>> If we do value constructive disagreement then we'll go out of our way
>> to talk through the points of contention, and make sure that the
>> people who disagree, especially the minority, feel that they have been
>> fully heard.
>>
>> If we don't value constructive disagreement then we'll let the other
>> side know that further disagreement will be taken as a sign of bad
>> faith.
>>
>> Now - what do you see here?  I see the second and that worries me.
>>
>
> It is disappointing that you choose not to participate in the thread linked
> above or in the pull request itself.  If I remember correctly, you were
> working on finishing up your dissertation, so I fully understand the time
> constraints involved there.  However, the pull request and the email
> notification is the de facto method of staging and discussing changes in any
> development project.  No objections were raised in that pull request, so it
> went in after some time passed.  To hold off the merge, all one would need
> to do is fire off a quick comment requesting a delay to have a chance to
> review the pull request.

I think the pull-request was not the right vehicle for the discussion,
you think it was, that's fine, I don't think we need to rehearse that.

My question (if you are answering my question) is: if you put yourself
in my or Nathaniel's shoes, would you feel that you had been warmly
encouraged to express disagreement, or would you feel something else.

> Luckily, git is a VCS, so we are fully capable of reverting any necessary
> changes if warranted.  If you have any concerns or suggestions for changes
> in the current implementation, feel free to raise them and open additional
> pull requests.  There is no "ganging up" here or any other subterfuge.  Tell
> us exactly what are your issues with the current setup, provide example code
> demonstrating the issues, and we can certainly discuss ways to improve this.

Has the situation changed since the counter-NEP that Nathaniel and I wrote up?

> Remember, we *all* have a common agreement here.  NumPy needs better support
> for missing data (in whatever form).  Let's work from that assumption and
> make NumPy a better library to use for everybody!

I remember walking past a church in a small town in the California
desert.  It ha

Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Benjamin Root
On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett wrote:

> Hi,
>
> On Tue, Oct 25, 2011 at 8:04 AM, Lluís  wrote:
> > Matthew Brett writes:
> >> I'm afraid I find this whole thread very unpleasant.
> >
> >> I have the odd impression of being back at high school.  Some of the
> >> big kids are pushing me around and then the other kids join in.
> >
> >> It didn't have to be this way.
> >
> >> Someone could have replied like this to Nathaniel:
> >
> >> "Oh - yes - I'm sorry -  we actually had the discussion on the pull
> >> request.  Looking back, I see that we didn't flag this up on the
> >> mailing list and maybe we should have.  Thanks for pointing that out.
> >>  Maybe we could start another discussion of the API in view of the
> >> changes that have gone in".
> >
> >> But that didn't happen.
> >
> > Well, I really thought that all the interested parties would take a look
> at [1].
> >
> > While it's true that the pull requests are not obvious if you're not
> using the
> > functionalities of the github web (or unless announced in this list), I
> think
> > that Mark's announcement was precisely directed at having a new round of
> > discussions after having some code to play around with and see how
> intuitive or
> > counter-intuitive the implemented concepts could be.
>
> I just wanted to be clear what I meant.
>
> The key point is not whether or not the pull-request or request for
> testing was in fact the right place for the discussion that Travis
> suggested.   I guess you can argue that either way.   I'd say no, but
> I can see how you would disagree on that.
>
>
This is getting very meta... a disagreement about the disagreement.


> The key point is - how much do we value constructive disagreement?
>
>
Personally, I value it very much.  My impression of the discussion we all
had at the beginning was that the needs of the two distinct communities
(R-users and masked array users) were both heard and largely addressed.
Aspects of both approaches were used, and the final result is, IMHO,
inspired and elegant.  Is it perfect? No.  Are there ways to improve it?
Absolutely, and I fully expect that to happen.


> If we do value constructive disagreement then we'll go out of our way
> to talk through the points of contention, and make sure that the
> people who disagree, especially the minority, feel that they have been
> fully heard.
>
> If we don't value constructive disagreement then we'll let the other
> side know that further disagreement will be taken as a sign of bad
> faith.
>
> Now - what do you see here?  I see the second and that worries me.
>
>
It is disappointing that you choose not to participate in the thread linked
above or in the pull request itself.  If I remember correctly, you were
working on finishing up your dissertation, so I fully understand the time
constraints involved there.  However, the pull request and the email
notification is the de facto method of staging and discussing changes in any
development project.  No objections were raised in that pull request, so it
went in after some time passed.  To hold off the merge, all one would need
to do is fire off a quick comment requesting a delay to have a chance to
review the pull request.

Luckily, git is a VCS, so we are fully capable of reverting any necessary
changes if warranted.  If you have any concerns or suggestions for changes
in the current implementation, feel free to raise them and open additional
pull requests.  There is no "ganging up" here or any other subterfuge.  Tell
us exactly what are your issues with the current setup, provide example code
demonstrating the issues, and we can certainly discuss ways to improve this.

Remember, we *all* have a common agreement here.  NumPy needs better support
for missing data (in whatever form).  Let's work from that assumption and
make NumPy a better library to use for everybody!

Cheers!
Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Matthew Brett
Hi,

On Tue, Oct 25, 2011 at 8:04 AM, Lluís  wrote:
> Matthew Brett writes:
>> I'm afraid I find this whole thread very unpleasant.
>
>> I have the odd impression of being back at high school.  Some of the
>> big kids are pushing me around and then the other kids join in.
>
>> It didn't have to be this way.
>
>> Someone could have replied like this to Nathaniel:
>
>> "Oh - yes - I'm sorry -  we actually had the discussion on the pull
>> request.  Looking back, I see that we didn't flag this up on the
>> mailing list and maybe we should have.  Thanks for pointing that out.
>>  Maybe we could start another discussion of the API in view of the
>> changes that have gone in".
>
>> But that didn't happen.
>
> Well, I really thought that all the interested parties would take a look at 
> [1].
>
> While it's true that the pull requests are not obvious if you're not using the
> functionalities of the github web (or unless announced in this list), I think
> that Mark's announcement was precisely directed at having a new round of
> discussions after having some code to play around with and see how intuitive 
> or
> counter-intuitive the implemented concepts could be.

I just wanted to be clear what I meant.

The key point is not whether or not the pull-request or request for
testing was in fact the right place for the discussion that Travis
suggested.   I guess you can argue that either way.   I'd say no, but
I can see how you would disagree on that.

The key point is - how much do we value constructive disagreement?

If we do value constructive disagreement then we'll go out of our way
to talk through the points of contention, and make sure that the
people who disagree, especially the minority, feel that they have been
fully heard.

If we don't value constructive disagreement then we'll let the other
side know that further disagreement will be taken as a sign of bad
faith.

Now - what do you see here?  I see the second and that worries me.

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Lluís
Matthew Brett writes:
> I'm afraid I find this whole thread very unpleasant.

> I have the odd impression of being back at high school.  Some of the
> big kids are pushing me around and then the other kids join in.

> It didn't have to be this way.

> Someone could have replied like this to Nathaniel:

> "Oh - yes - I'm sorry -  we actually had the discussion on the pull
> request.  Looking back, I see that we didn't flag this up on the
> mailing list and maybe we should have.  Thanks for pointing that out.
>  Maybe we could start another discussion of the API in view of the
> changes that have gone in".

> But that didn't happen.

Well, I really thought that all the interested parties would take a look at [1].

While it's true that pull requests are easy to miss if you're not using the
GitHub web interface (or unless they are announced on this list), I think
that Mark's announcement was aimed precisely at starting a new round of
discussions once there was some code to play around with, to see how intuitive
or counter-intuitive the implemented concepts turn out to be.


[1] http://old.nabble.com/NA-masks-for-NumPy-are-ready-to-test-td32291024.html


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-25 Thread Matthew Brett
Hi,

On Mon, Oct 24, 2011 at 11:44 PM, Han Genuit  wrote:
> Well, if I may have a say, I think that an open source project is
> especially open when users as developers can contribute to the code
> base and can participate in discussions on how to improve the existing
> designs and ideas. I do not think a project is open when it crumbles
> down into politics.. I have seen a lot of work done by Mark especially
> to ensure that everyone had a say in what he was doing, up to the
> point where this might not be fun anymore. And from what I can see at
> the time, which was back in August, everyone has had plenty of
> opportunity to discuss or contribute to the specific changes that were
> made.
>
> This was an open contribution to the NumPy code, not some cooked up
> shady business by high and mighty developers and I, for one, am happy
> with how it turned out.

I'm afraid I find this whole thread very unpleasant.

I have the odd impression of being back at high school.  Some of the
big kids are pushing me around and then the other kids join in.

It didn't have to be this way.

Someone could have replied like this to Nathaniel:

"Oh - yes - I'm sorry -  we actually had the discussion on the pull
request.  Looking back, I see that we didn't flag this up on the
mailing list and maybe we should have.  Thanks for pointing that out.
 Maybe we could start another discussion of the API in view of the
changes that have gone in".

But that didn't happen.

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Han Genuit
Well, if I may have a say, I think that an open source project is
especially open when users, as developers, can contribute to the code
base and can participate in discussions on how to improve the existing
designs and ideas. I do not think a project is open when it crumbles
down into politics. I have seen a lot of work done by Mark especially
to ensure that everyone had a say in what he was doing, up to the
point where this might not be fun anymore. And from what I could see at
the time, which was back in August, everyone had plenty of
opportunity to discuss or contribute to the specific changes that were
made.

This was an open contribution to the NumPy code, not some cooked up
shady business by high and mighty developers and I, for one, am happy
with how it turned out.


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Lluís
Charles R Harris writes:
[...]
> It might useful to have a way of setting global defaults, or something like a
> with statement. These are the sort of things that can be adjusted based on
> experience. For instance, I'm thinking skipna=1 is the natural default for the
> masked arrays.

I already raised this concern during the initial discussions, and Mark came up
with a nice solution.

Instead of having an additional stateful global interface that code would have
to check in addition to the "skipna" argument, you can have a simple function
that takes and/or constructs an ndarray and redefines its ufunc wrapper to
always set the "skipna = True" argument.
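
A minimal sketch of that idea follows. It is not Mark's code and does not
rely on the "skipna" keyword from the NA-mask branch: it assumes NaN as the
missing-value marker and np.nanmean as the skipping reduction, and the names
SkipNAArray and skipna_view are hypothetical.

```python
import numpy as np


class SkipNAArray(np.ndarray):
    """Hypothetical ndarray subclass whose mean() skips NaN by default."""

    def mean(self, *args, skipna=True, **kwargs):
        # Work on a plain ndarray view so we do not recurse into this method.
        plain = np.asarray(self)
        if skipna:
            return np.nanmean(plain, *args, **kwargs)
        return np.mean(plain, *args, **kwargs)


def skipna_view(obj):
    """Take or construct a float array and return it as a SkipNAArray view."""
    return np.asarray(obj, dtype=float).view(SkipNAArray)


a = skipna_view([0.0, 1.0, np.nan, 3.0, 4.0])
skipping_mean = a.mean()                 # NaN is skipped by default
strict_mean = a.mean(skipna=False)       # NaN propagates when asked to
print(skipping_mean, strict_mean)
```

The point of the design is that the default lives on the array object itself,
so library code never has to consult any global state.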


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Charles R Harris
On Mon, Oct 24, 2011 at 11:12 AM, Wes McKinney  wrote:

> On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris
>  wrote:
> >
> >
> > On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris
> >  wrote:
> >>
> >>
> >> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney 
> >> wrote:
> >>>
> >>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing 
> wrote:
> >>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
> >>> >
> >>> >> like. And in this case I do think we can come up with an API that
> will
> >>> >> make everyone happy, but that Mark's current API probably can't be
> >>> >> incrementally evolved to become that API.)
> >>> >>
> >>> >
> >>> > No one could object to coming up with an API that makes everyone
> happy,
> >>> > provided that it actually gets coded up, tested, and is found to be
> >>> > fast
> >>> > and maintainable.  When you say the API probably can't be evolved, do
> >>> > you mean that the underlying implementation also has to be redone?
>  And
> >>> > if so, who will do it, and when?
> >>> >
> >>> > Eric
> >>> > ___
> >>> > NumPy-Discussion mailing list
> >>> > NumPy-Discussion@scipy.org
> >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>> >
> >>>
> >>> I personally am a bit apprehensive as I am worried about the masked
> >>> array abstraction "leaking" through to users of pandas, something
> >>> which I simply will not accept (why I decided against using numpy.ma
> >>> early on, that + performance problems). Basically if having an
> >>> understanding of masked arrays is a prerequisite for using pandas, the
> >>> whole thing is DOA to me as it undermines the usability arguments I've
> >>> been making about switching to Python (from R) for data analysis and
> >>> statistical computing.
> >>
> >> The missing data functionality looks far more like R than numpy.ma.
> >>
> >
> > For instance
> >
> > In [8]: a = arange(5, maskna=1)
> >
> > In [9]: a[2] = np.NA
> >
> > In [10]: a.mean()
> > Out[10]: NA(dtype='float64')
> >
> > In [11]: a.mean(skipna=1)
> > Out[11]: 2.0
> >
> > In [12]: a = arange(5)
> >
> > In [13]: b = a.view(maskna=1)
> >
> > In [14]: a.mean()
> > Out[14]: 2.0
> >
> > In [15]: b[2] = np.NA
> >
> > In [16]: b.mean()
> > Out[16]: NA(dtype='float64')
> >
> > In [17]: b.mean(skipna=1)
> > Out[17]: 2.0
> >
> > Chuck
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>
> I don't really agree with you.
>
> some sample R code
>
> > arr <- rnorm(10)
> > arr[5:8] <- NA
> > arr
>  [1]  0.6451460 -1.1285552  0.6869828  0.4018868 NA NA
>  [7] NA NA  0.3322803 -1.9201257
>
> In your examples you had to pass maskna=True-- I suppose that my only
> recourse would be to make sure that every array inside a DataFrame,
> for example, has maskna=True set. I'll have to look in more detail and
> see if it's feasible/desirable. There's a memory cost to pay, but you
> can't get the functionality for free. I may just end up sticking with
> NaN as it's worked pretty well so far the last few years-- it's an
> impure solution but one with reasonably good performance
> characteristics in the places that matter.
>

It might be useful to have a way of setting global defaults, or something like
a with statement. These are the sort of things that can be adjusted based on
experience. For instance, I'm thinking skipna=1 is the natural default for
the masked arrays.

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Wes McKinney
On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris
 wrote:
>
>
> On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris
>  wrote:
>>
>>
>> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney 
>> wrote:
>>>
>>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing  wrote:
>>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
>>> >
>>> >> like. And in this case I do think we can come up with an API that will
>>> >> make everyone happy, but that Mark's current API probably can't be
>>> >> incrementally evolved to become that API.)
>>> >>
>>> >
>>> > No one could object to coming up with an API that makes everyone happy,
>>> > provided that it actually gets coded up, tested, and is found to be
>>> > fast
>>> > and maintainable.  When you say the API probably can't be evolved, do
>>> > you mean that the underlying implementation also has to be redone?  And
>>> > if so, who will do it, and when?
>>> >
>>> > Eric
>>> > ___
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>>
>>> I personally am a bit apprehensive as I am worried about the masked
>>> array abstraction "leaking" through to users of pandas, something
>>> which I simply will not accept (why I decided against using numpy.ma
>>> early on, that + performance problems). Basically if having an
>>> understanding of masked arrays is a prerequisite for using pandas, the
>>> whole thing is DOA to me as it undermines the usability arguments I've
>>> been making about switching to Python (from R) for data analysis and
>>> statistical computing.
>>
>> The missing data functionality looks far more like R than numpy.ma.
>>
>
> For instance
>
> In [8]: a = arange(5, maskna=1)
>
> In [9]: a[2] = np.NA
>
> In [10]: a.mean()
> Out[10]: NA(dtype='float64')
>
> In [11]: a.mean(skipna=1)
> Out[11]: 2.0
>
> In [12]: a = arange(5)
>
> In [13]: b = a.view(maskna=1)
>
> In [14]: a.mean()
> Out[14]: 2.0
>
> In [15]: b[2] = np.NA
>
> In [16]: b.mean()
> Out[16]: NA(dtype='float64')
>
> In [17]: b.mean(skipna=1)
> Out[17]: 2.0
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

I don't really agree with you.

Some sample R code:

> arr <- rnorm(10)
> arr[5:8] <- NA
> arr
 [1]  0.6451460 -1.1285552  0.6869828  0.4018868         NA         NA
 [7]         NA         NA  0.3322803 -1.9201257

In your examples you had to pass maskna=True-- I suppose that my only
recourse would be to make sure that every array inside a DataFrame,
for example, has maskna=True set. I'll have to look in more detail and
see if it's feasible/desirable. There's a memory cost to pay, but you
can't get the functionality for free. I may just end up sticking with
NaN as it's worked pretty well so far the last few years-- it's an
impure solution but one with reasonably good performance
characteristics in the places that matter.
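
For comparison, a minimal sketch of the NaN-based approach mirroring the R
snippet above. It uses only plain NumPy; the seed and variable names are
arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal(10)

# R's arr[5:8] is 1-based and inclusive; the NumPy equivalent is arr[4:8].
arr[4:8] = np.nan

n_missing = int(np.isnan(arr).sum())   # count of missing entries
plain_mean = arr.mean()                # NaN propagates through the reduction
valid_mean = np.nanmean(arr)           # mean over the non-missing entries

# The "impure" part: integer arrays cannot hold NaN, so they have to be
# converted to float before a value can be marked as missing.
ints = np.arange(5)
as_float = ints.astype(float)
as_float[2] = np.nan

print(n_missing, plain_mean, valid_mean, as_float.dtype)
```

No mask is allocated, so the only memory cost is the float conversion for
integer data.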


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Pauli Virtanen
24.10.2011 16:40, Charles R Harris kirjoitti:
[clip]
> The missing data functionality looks far more like R than numpy.ma

... and masked arrays must be explicitly requested by the user [1].

The MA stuff can "leak through" only if the user makes use of a library
that returns masked results (or explicitly creates masked arrays), but
as far as I understand that's about the same situation as with np.ma.


.. [1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

-- 
Pauli Virtanen



Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Charles R Harris
On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris  wrote:

>
>
> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney wrote:
>
>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing  wrote:
>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
>> >
>> >> like. And in this case I do think we can come up with an API that will
>> >> make everyone happy, but that Mark's current API probably can't be
>> >> incrementally evolved to become that API.)
>> >>
>> >
>> > No one could object to coming up with an API that makes everyone happy,
>> > provided that it actually gets coded up, tested, and is found to be fast
>> > and maintainable.  When you say the API probably can't be evolved, do
>> > you mean that the underlying implementation also has to be redone?  And
>> > if so, who will do it, and when?
>> >
>> > Eric
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>>
>> I personally am a bit apprehensive as I am worried about the masked
>> array abstraction "leaking" through to users of pandas, something
>> which I simply will not accept (why I decided against using numpy.ma
>> early on, that + performance problems). Basically if having an
>> understanding of masked arrays is a prerequisite for using pandas, the
>> whole thing is DOA to me as it undermines the usability arguments I've
>> been making about switching to Python (from R) for data analysis and
>> statistical computing.
>>
>
> The missing data functionality looks far more like R than numpy.ma.
>
>
For instance

In [8]: a = arange(5, maskna=1)

In [9]: a[2] = np.NA

In [10]: a.mean()
Out[10]: NA(dtype='float64')

In [11]: a.mean(skipna=1)
Out[11]: 2.0

In [12]: a = arange(5)

In [13]: b = a.view(maskna=1)

In [14]: a.mean()
Out[14]: 2.0

In [15]: b[2] = np.NA

In [16]: b.mean()
Out[16]: NA(dtype='float64')

In [17]: b.mean(skipna=1)
Out[17]: 2.0

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Charles R Harris
On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney  wrote:

> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing  wrote:
> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
> >
> >> like. And in this case I do think we can come up with an API that will
> >> make everyone happy, but that Mark's current API probably can't be
> >> incrementally evolved to become that API.)
> >>
> >
> > No one could object to coming up with an API that makes everyone happy,
> > provided that it actually gets coded up, tested, and is found to be fast
> > and maintainable.  When you say the API probably can't be evolved, do
> > you mean that the underlying implementation also has to be redone?  And
> > if so, who will do it, and when?
> >
> > Eric
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
> I personally am a bit apprehensive as I am worried about the masked
> array abstraction "leaking" through to users of pandas, something
> which I simply will not accept (why I decided against using numpy.ma
> early on, that + performance problems). Basically if having an
> understanding of masked arrays is a prerequisite for using pandas, the
> whole thing is DOA to me as it undermines the usability arguments I've
> been making about switching to Python (from R) for data analysis and
> statistical computing.
>

The missing data functionality looks far more like R than numpy.ma.

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-24 Thread Lluís
Nathaniel Smith writes:
[...]
> Is the idea to continue the discussion and rework the API while it is in
> master, delaying the next release for as long as it takes to achieve
> consensus?

Well, for those who missed it, I think the first thing to do should be to
carefully read and discuss the contents of:

   https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Wes McKinney
On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing  wrote:
> On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
>
>> like. And in this case I do think we can come up with an API that will
>> make everyone happy, but that Mark's current API probably can't be
>> incrementally evolved to become that API.)
>>
>
> No one could object to coming up with an API that makes everyone happy,
> provided that it actually gets coded up, tested, and is found to be fast
> and maintainable.  When you say the API probably can't be evolved, do
> you mean that the underlying implementation also has to be redone?  And
> if so, who will do it, and when?
>
> Eric
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

I personally am a bit apprehensive, as I am worried about the masked
array abstraction "leaking" through to users of pandas, something
which I simply will not accept (that, plus performance problems, is why
I decided against using numpy.ma early on). Basically, if having an
understanding of masked arrays is a prerequisite for using pandas, the
whole thing is DOA to me, as it undermines the usability arguments I've
been making about switching to Python (from R) for data analysis and
statistical computing.

Performance is also a concern, but based on prior discussions it seems
a great deal can be done there.

- Wes


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Eric Firing
On 10/23/2011 12:34 PM, Nathaniel Smith wrote:

> like. And in this case I do think we can come up with an API that will
> make everyone happy, but that Mark's current API probably can't be
> incrementally evolved to become that API.)
>

No one could object to coming up with an API that makes everyone happy, 
provided that it actually gets coded up, tested, and is found to be fast 
and maintainable.  When you say the API probably can't be evolved, do 
you mean that the underlying implementation also has to be redone?  And 
if so, who will do it, and when?

Eric


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Nathaniel Smith
On Sun, Oct 23, 2011 at 2:29 PM, Eric Firing  wrote:
> Ultimately, though, the numpy core developers must decide what goes in
> and what does not.  Consensus is desirable but may not always be
> possible or optimal, especially if "consensus" is interpreted as
> "unanimity".  There is a risk in deciding to accept a major change, but
> it is mitigated by the ability to make future changes, and it is a risk
> that must be taken if progress is to be made.  As a numpy user, I was
> pleased to see Travis make the decision that Mark should get on with the
> coding, and I was pleased to see Charles make the decision to merge the
> pull request.

Well, let's not jump to conclusions -- this is why I wrote an email
asking questions in the first place :-).

Consensus certainly does not mean unanimity, but yes, of course,
sometimes disagreements are irreconcilable. As a benevolent
dictator[1] on other projects I've been stuck dealing with some of
these myself. But of the two core numpy developers who have been most
involved in this, Charles has just stated that he thought there had
been more discussion than had actually occurred, and Travis described
a "reasonable opposition", so it's not at all clear to me that the
core developers have decided that no consensus is possible and they
simply have to step in. (And in general, irreconcilable differences
are quite rare in FOSS projects... e.g., I remember the Subversion
folks set up a voting procedure to handle these cases, and then the
only time they used it in like a 5 year period was to settle an
argument about code formatting. Insisting on consensus really does
mostly work, even though it does often take longer than one would
like. And in this case I do think we can come up with an API that will
make everyone happy, but that Mark's current API probably can't be
incrementally evolved to become that API.)

So if there's been an executive decision then I can live with it, but
I'd like to see someone say that before I assume it's true. It's just
as likely that there was confusion, or Charles jumped the gun, or
whatever, and that consensus is still useful and desired in this case.
I hope so.

[1] https://secure.wikimedia.org/wikipedia/en/wiki/Benevolent_Dictator_For_Life

-- Nathaniel


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Charles R Harris
On Sun, Oct 23, 2011 at 3:28 PM, Benjamin Root  wrote:

>
>
> On Sunday, October 23, 2011, Nathaniel Smith  wrote:
> > On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
> >  wrote:
> >> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett <matthew.br...@gmail.com>
> >> wrote:
> >>> I think this email might be a plea to the numpy steering group, and to
> >>> Travis in particular, to see if we can use a discussion of this series
> >>> of events to decide on a good way to proceed in future.
> >>
> >> Oh come, people had plenty to say, you and Nathaniel in particular. Mark
> >> pointed to the pull request, anyone who was interested could comment on it,
> >
> > Ah, this helps answer my initial question -- I can see how you might
> > have thought things were more resolved if you thought that we were
> > aware of the pull request and chose not to participate. That's a
> > reasonable source of confusion.
> >
> > But I (and presumably others) were unaware of the pull request,
> > because it turns out that actually Mark did *not* point to the pull
> > request, at least in email to either me or numpy-discussion. As far as
> > I can tell, the first time that pull request has ever been mentioned
> > on the list is in Pauli's email today. (I did worry I might have
> > missed it, so I just double-checked the archives for August 18-August
> > 27, which is the time period the pull request was open, and couldn't
> > find anything there.)
> >
> > (Also, for the record, I'd ask that next time you want to make sure
> > that there has been sufficient discussion on a controversial feature
> > that has "strong and reasonable opposition", you make more of an
> > effort to make sure that the relevant stakeholders are aware...?)
> >
> >> Benjamin Root did so, for instance. The fact things didn't go the way you
> >> wanted doesn't indicate insufficient discussion. And you are certainly
> >> welcome to put together an alternative and put up a pull request.
> >
> > In the interests of not turning this into a game of procedural
> > brinksmanship, can we agree that the point of pull requests and such
> > is to make sure that code which ends up in numpy releases generally
> > matches what the community wants? Obviously the community has not
> > reached a consensus on this code and API, so I'll prepare a pull
> > request to temporarily revert the change, and we can work from there.
> >
> > -- Nathaniel
> >
>
> The discussion started on Mark's branches, which were referred to several
> times in emails (that's how I started).  When it reached a particular level
> of maturity, a pull request was made and additional work went into it.  The
> initial discussion happened for quite a while.
>
> Plus, my understanding is that it isn't the full NEP, but the core parts
> (but I haven't checked in a while).
>
>
In its current state, it is a working implementation that can be used to
explore the API. Bit patterns are missing and the masks are handled at the
iterator level rather than in the low level ufunc loops, so it isn't
particularly fast. IIRC, Mark was careful to leave some hooks for further
development and also set things up so that in the future masks could be
adapted to allow different mask values with different interpretations.
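
For anyone following along who hasn't looked at the two storage strategies,
here is a toy sketch in plain Python. It is only an illustration under
simplifying assumptions, not Mark's implementation and not the numpy API;
the names masked_sum and nan_skipping_sum and the sample data are made up
for the example.

```python
import math

# Hypothetical toy data: the third reading should be treated as missing.
values = [1.0, 2.0, 3.0, 4.0]

# Mask approach: a parallel list of flags, True marks a missing element.
# The payload underneath stays untouched, so clearing the flag recovers it.
mask = [False, False, True, False]


def masked_sum(vals, msk):
    """Sum only the elements whose mask flag is False."""
    return sum(v for v, m in zip(vals, msk) if not m)


# Bit-pattern approach: reserve one value of the type (here NaN) to mean
# "missing". No extra storage is needed, but the original payload is gone.
nan_values = [1.0, 2.0, math.nan, 4.0]


def nan_skipping_sum(vals):
    """Sum only the elements that are not the NaN sentinel."""
    return sum(v for v in vals if not math.isnan(v))


print(masked_sum(values, mask))
print(nan_skipping_sum(nan_values))
```

The trade-off the sketch is meant to show: the mask costs extra memory per
element but can be lifted again, while the bit pattern is free in storage but
destroys the value it replaces.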

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Eric Firing
On 10/23/2011 10:49 AM, Nathaniel Smith wrote:
> But I (and presumably others) were unaware of the pull request,
> because it turns out that actually Mark did *not* point to the pull
> request, at least in email to either me or numpy-discussion. As far as
> I can tell, the first time that pull request has ever been mentioned
> on the list is in Pauli's email today. (I did worry I might have
> missed it, so I just double-checked the archives for August 18-August
> 27, which is the time period the pull request was open, and couldn't
> find anything there.)

Ideally, Mark's message announcing that his branch was ready for testing 
(a message that started a thread of constructive comment) would have 
mentioned the pull request:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg33151.html

Ultimately, though, the numpy core developers must decide what goes in 
and what does not.  Consensus is desirable but may not always be 
possible or optimal, especially if "consensus" is interpreted as 
"unanimity".  There is a risk in deciding to accept a major change, but 
it is mitigated by the ability to make future changes, and it is a risk 
that must be taken if progress is to be made.  As a numpy user, I was 
pleased to see Travis make the decision that Mark should get on with the 
coding, and I was pleased to see Charles make the decision to merge the 
pull request.

Eric


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Benjamin Root
On Sunday, October 23, 2011, Nathaniel Smith  wrote:
> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
>  wrote:
>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett 
>> wrote:
>>> I think this email might be a plea to the numpy steering group, and to
>>> Travis in particular, to see if we can use a discussion of this series
>>> of events to decide on a good way to proceed in future.
>>
>> Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
>> pointed to the pull request, anyone who was interested could comment on it,
>
> Ah, this helps answer my initial question -- I can see how you might
> have thought things were more resolved if you thought that we were
> aware of the pull request and chose not to participate. That's a
> reasonable source of confusion.
>
> But I (and presumably others) were unaware of the pull request,
> because it turns out that actually Mark did *not* point to the pull
> request, at least in email to either me or numpy-discussion. As far as
> I can tell, the first time that pull request has ever been mentioned
> on the list is in Pauli's email today. (I did worry I might have
> missed it, so I just double-checked the archives for August 18-August
> 27, which is the time period the pull request was open, and couldn't
> find anything there.)
>
> (Also, for the record, I'd ask that next time you want to make sure
> that there has been sufficient discussion on a controversial feature
> that has "strong and reasonable opposition", you make more of an
> effort to make sure that the relevant stakeholders are aware...?)
>
>> Benjamin Root did so, for instance. The fact things didn't go the way you
>> wanted doesn't indicate insufficient discussion. And you are certainly
>> welcome to put together an alternative and put up a pull request.
>
> In the interests of not turning this into a game of procedural
> brinksmanship, can we agree that the point of pull requests and such
> is to make sure that code which ends up in numpy releases generally
> matches what the community wants? Obviously the community has not
> reached a consensus on this code and API, so I'll prepare a pull
> request to temporarily revert the change, and we can work from there.
>
> -- Nathaniel
>

The discussion started on Mark's branches, which were referred to several
times in emails (that's how I started).  When it reached a particular level
of maturity, a pull request was made and additional work went into it.  The
initial discussion happened for quite a while.

Plus, my understanding is that it isn't the full NEP, but the core parts (but
I haven't checked in a while).

Ben Root


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Nathaniel Smith
On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
 wrote:
> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett 
> wrote:
>> I think this email might be a plea to the numpy steering group, and to
>> Travis in particular, to see if we can use a discussion of this series
>> of events to decide on a good way to proceed in future.
>
> Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
> pointed to the pull request, anyone who was interested could comment on it,

Ah, this helps answer my initial question -- I can see how you might
have thought things were more resolved if you thought that we were
aware of the pull request and chose not to participate. That's a
reasonable source of confusion.

But I (and presumably others) were unaware of the pull request,
because it turns out that actually Mark did *not* point to the pull
request, at least in email to either me or numpy-discussion. As far as
I can tell, the first time that pull request has ever been mentioned
on the list is in Pauli's email today. (I did worry I might have
missed it, so I just double-checked the archives for August 18-August
27, which is the time period the pull request was open, and couldn't
find anything there.)

(Also, for the record, I'd ask that next time you want to make sure
that there has been sufficient discussion on a controversial feature
that has "strong and reasonable opposition", you make more of an
effort to make sure that the relevant stakeholders are aware...?)

> Benjamin Root did so, for instance. The fact things didn't go the way you
> wanted doesn't indicate insufficient discussion. And you are certainly
> welcome to put together an alternative and put up a pull request.

In the interests of not turning this into a game of procedural
brinksmanship, can we agree that the point of pull requests and such
is to make sure that code which ends up in numpy releases generally
matches what the community wants? Obviously the community has not
reached a consensus on this code and API, so I'll prepare a pull
request to temporarily revert the change, and we can work from there.

-- Nathaniel


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Scott Ransom
On 10/23/2011 04:07 PM, Robert Kern wrote:
> On Sun, Oct 23, 2011 at 20:58, Matthew Brett  wrote:
>> Hi,
>>
>> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
>>   wrote:
>>>
>>>
>>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith  wrote:
>>>> > Hi all,
>>>> >
>>>> > I was surprised today to notice that Mark's NA mask support appears to
>>>> > have been merged into numpy master and is described in the draft
>>>> > release notes[1]. My surprise is because merging it to mainline
>>>> > without any discussion on the list seems to contradict what
>>>> > Travis wrote in July, that it was being developed as an experiment and
>>>> > explicitly *not* intended to be merged without further discussion:
>>>> >
>>>> > "Basically, because there is not consensus and in fact a strong and
>>>> > reasonable opposition to specific points, Mark's NEP as proposed
>>>> > cannot be accepted in its entirety right now. However,  I believe an
>>>> > implementation of his NEP is useful and will be instructive in
>>>> > resolving the issues and so I have instructed him to spend Enthought
>>>> > time on the implementation. Any changes that need to be made to the
>>>> > API before it is accepted into a released form of NumPy can still be
>>>> > made even after most of the implementation is completed as far as I
>>>> > understand it."[2]
>>>> >
>>>> > Can anyone explain what the plan is here? Is the idea to continue the
>>>> > discussion and rework the API while it is in master, delaying the next
>>>> > release for as long as it takes to achieve consensus? Or is there some
>>>> > mysterious git thing going on where "master" is actually an
>>>> > experimental branch and the real mainline development is happening
>>>> > somewhere else? Or something else I'm not thinking of? Please help me
>>>> > understand.
>>>>
>>>> I don't know about you, but watching the development from a distance
>>>> it became increasingly clear to me that this would happen.  I'm sure
>>>> you've had the experience as I have, of mixing several desirable
>>>> changes into the same set of commits, and it's hard work to avoid
>>>> this.  I imagine this is what happened with Mark's MA changes.
>>>>
>>>> The result is actually an extension of the problems of the original
>>>> discussion, which is a feeling that we the community do not have a say
>>>> in the development.
>>>>
>>>> I think this email might be a plea to the numpy steering group, and to
>>>> Travis in particular, to see if we can use a discussion of this series
>>>> of events to decide on a good way to proceed in future.
>>>>
>>>
>>> Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
>>> pointed to the pull request, anyone who was interested could comment on it,
>>> Benjamin Root did so, for instance. The fact things didn't go the way you
>>> wanted doesn't indicate insufficient discussion. And you are certainly
>>> welcome to put together an alternative and put up a pull request.
>>
>> I was also guessing that something like this would be the reply to
>> Nathaniel's post.
>
> But it wasn't. It was a reply to your message.
>
>> I think this reply is rude because it implies some sort of sour-grapes
>> from Nathaniel, when he is politely referring back to an explicit
>> reassurance from Travis.
>
> What Travis assured did happen, just on the pull request (on which
> everyone's input was requested and where most "should this be merged?"
> discussions are *meant* to happen) rather than on the mailing list.

Except that for a project with a large user community (like numpy), you 
will _not_ get the feedback you are looking for on github pull-request 
pages.  That's because most users do not look at detailed developer 
related things like pull requests.  But they do read the mailing list.

I don't use these features so I don't have a dog in this fight.  But 
potentially controversial changes really should be discussed on the 
mailing list rather than on pull requests (and yes, I know that there 
was a lot of discussion about this stuff some months ago).

Scott



-- 
Scott M. Ransom                  Address:  NRAO
Phone:  (434) 296-0320                     520 Edgemont Rd.
email:  sran...@nrao.edu                   Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Matthew Brett
Hi,

On Sun, Oct 23, 2011 at 1:07 PM, Robert Kern  wrote:
> On Sun, Oct 23, 2011 at 20:58, Matthew Brett  wrote:
>> Hi,
>>
>> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
>>  wrote:
>>>
>>>
>>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett 
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith  wrote:
>>>> > Hi all,
>>>> >
>>>> > I was surprised today to notice that Mark's NA mask support appears to
>>>> > have been merged into numpy master and is described in the draft
>>>> > release notes[1]. My surprise is because merging it to mainline
>>>> > without any discussion on the list seems to contradict what
>>>> > Travis wrote in July, that it was being developed as an experiment and
>>>> > explicitly *not* intended to be merged without further discussion:
>>>> >
>>>> > "Basically, because there is not consensus and in fact a strong and
>>>> > reasonable opposition to specific points, Mark's NEP as proposed
>>>> > cannot be accepted in its entirety right now. However,  I believe an
>>>> > implementation of his NEP is useful and will be instructive in
>>>> > resolving the issues and so I have instructed him to spend Enthought
>>>> > time on the implementation. Any changes that need to be made to the
>>>> > API before it is accepted into a released form of NumPy can still be
>>>> > made even after most of the implementation is completed as far as I
>>>> > understand it."[2]
>>>> >
>>>> > Can anyone explain what the plan is here? Is the idea to continue the
>>>> > discussion and rework the API while it is in master, delaying the next
>>>> > release for as long as it takes to achieve consensus? Or is there some
>>>> > mysterious git thing going on where "master" is actually an
>>>> > experimental branch and the real mainline development is happening
>>>> > somewhere else? Or something else I'm not thinking of? Please help me
>>>> > understand.
>>>>
>>>> I don't know about you, but watching the development from a distance
>>>> it became increasingly clear to me that this would happen.  I'm sure
>>>> you've had the experience as I have, of mixing several desirable
>>>> changes into the same set of commits, and it's hard work to avoid
>>>> this.  I imagine this is what happened with Mark's MA changes.
>>>>
>>>> The result is actually an extension of the problems of the original
>>>> discussion, which is a feeling that we the community do not have a say
>>>> in the development.
>>>>
>>>> I think this email might be a plea to the numpy steering group, and to
>>>> Travis in particular, to see if we can use a discussion of this series
>>>> of events to decide on a good way to proceed in future.
>>>>
>>>
>>> Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
>>> pointed to the pull request, anyone who was interested could comment on it,
>>> Benjamin Root did so, for instance. The fact things didn't go the way you
>>> wanted doesn't indicate insufficient discussion. And you are certainly
>>> welcome to put together an alternative and put up a pull request.
>>
>> I was also guessing that something like this would be the reply to
>> Nathaniel's post.
>
> But it wasn't. It was a reply to your message.

If you read the message again I think you will see that, although it
is addressed to me, it is referring to Nathaniel's question which was,
'Why was this not discussed as promised'.  My post was 'This was
obviously going to happen and that is a problem, do you all agree and
what can we do about it?'.

>> I think this reply is rude because it implies some sort of sour-grapes
>> from Nathaniel, when he is politely referring back to an explicit
>> reassurance from Travis.
>
> What Travis assured did happen, just on the pull request (on which
> everyone's input was requested and where most "should this be merged?"
> discussions are *meant* to happen) rather than on the mailing list.

It just isn't reasonable to ask for high-level API discussions on the
pull-request in this situation.  Unless Travis tells me he did mean
that, I can only assume that he didn't and he meant that we would
revisit the high-level mailing list discussions - on the mailing list.

Best,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Robert Kern
On Sun, Oct 23, 2011 at 20:58, Matthew Brett  wrote:
> Hi,
>
> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
>  wrote:
>>
>>
>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett 
>> wrote:
>>>
>>> Hi,
>>>
>>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith  wrote:
>>> > Hi all,
>>> >
>>> > I was surprised today to notice that Mark's NA mask support appears to
>>> > have been merged into numpy master and is described in the draft
>>> > release notes[1]. My surprise is because merging it to mainline
>>> > without any discussion on the list seems to contradict what
>>> > Travis wrote in July, that it was being developed as an experiment and
>>> > explicitly *not* intended to be merged without further discussion:
>>> >
>>> > "Basically, because there is not consensus and in fact a strong and
>>> > reasonable opposition to specific points, Mark's NEP as proposed
>>> > cannot be accepted in its entirety right now. However,  I believe an
>>> > implementation of his NEP is useful and will be instructive in
>>> > resolving the issues and so I have instructed him to spend Enthought
>>> > time on the implementation. Any changes that need to be made to the
>>> > API before it is accepted into a released form of NumPy can still be
>>> > made even after most of the implementation is completed as far as I
>>> > understand it."[2]
>>> >
>>> > Can anyone explain what the plan is here? Is the idea to continue the
>>> > discussion and rework the API while it is in master, delaying the next
>>> > release for as long as it takes to achieve consensus? Or is there some
>>> > mysterious git thing going on where "master" is actually an
>>> > experimental branch and the real mainline development is happening
>>> > somewhere else? Or something else I'm not thinking of? Please help me
>>> > understand.
>>>
>>> I don't know about you, but watching the development from a distance
>>> it became increasingly clear to me that this would happen.  I'm sure
>>> you've had the experience as I have, of mixing several desirable
>>> changes into the same set of commits, and it's hard work to avoid
>>> this.  I imagine this is what happened with Mark's MA changes.
>>>
>>> The result is actually an extension of the problems of the original
>>> discussion, which is a feeling that we the community do not have a say
>>> in the development.
>>>
>>> I think this email might be a plea to the numpy steering group, and to
>>> Travis in particular, to see if we can use a discussion of this series
>>> of events to decide on a good way to proceed in future.
>>>
>>
>> Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
>> pointed to the pull request, anyone who was interested could comment on it,
>> Benjamin Root did so, for instance. The fact things didn't go the way you
>> wanted doesn't indicate insufficient discussion. And you are certainly
>> welcome to put together an alternative and put up a pull request.
>
> I was also guessing that something like this would be the reply to
> Nathaniel's post.

But it wasn't. It was a reply to your message.

> I think this reply is rude because it implies some sort of sour-grapes
> from Nathaniel, when he is politely referring back to an explicit
> reassurance from Travis.

What Travis assured did happen, just on the pull request (on which
everyone's input was requested and where most "should this be merged?"
discussions are *meant* to happen) rather than on the mailing list.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Charles R Harris
On Sun, Oct 23, 2011 at 12:21 PM, Nathaniel Smith  wrote:

> Hi all,
>
> I was surprised today to notice that Mark's NA mask support appears to
> have been merged into numpy master and is described in the draft
> release notes[1]. My surprise is because merging it to mainline
> without any discussion on the list seems to contradict what
> Travis wrote in July, that it was being developed as an experiment and
> explicitly *not* intended to be merged without further discussion:
>
> "Basically, because there is not consensus and in fact a strong and
> reasonable opposition to specific points, Mark's NEP as proposed
> cannot be accepted in its entirety right now. However,  I believe an
> implementation of his NEP is useful and will be instructive in
> resolving the issues and so I have instructed him to spend Enthought
> time on the implementation. Any changes that need to be made to the
> API before it is accepted into a released form of NumPy can still be
> made even after most of the implementation is completed as far as I
> understand it."[2]
>
> Can anyone explain what the plan is here? Is the idea to continue the
> discussion and rework the API while it is in master, delaying the next
> release for as long as it takes to achieve consensus? Or is there some
> mysterious git thing going on where "master" is actually an
> experimental branch and the real mainline development is happening
> somewhere else? Or something else I'm not thinking of? Please help me
> understand.
>
>
No, it's in and has been for a while. You should spend some time with it and
make specific suggestions for improvement.

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Matthew Brett
Hi,

On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
 wrote:
>
>
> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett 
> wrote:
>>
>> Hi,
>>
>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith  wrote:
>> > Hi all,
>> >
>> > I was surprised today to notice that Mark's NA mask support appears to
>> > have been merged into numpy master and is described in the draft
>> > release notes[1]. My surprise is because merging it to mainline
>> > without any discussion on the list seems to contradict what
>> > Travis wrote in July, that it was being developed as an experiment and
>> > explicitly *not* intended to be merged without further discussion:
>> >
>> > "Basically, because there is not consensus and in fact a strong and
>> > reasonable opposition to specific points, Mark's NEP as proposed
>> > cannot be accepted in its entirety right now. However,  I believe an
>> > implementation of his NEP is useful and will be instructive in
>> > resolving the issues and so I have instructed him to spend Enthought
>> > time on the implementation. Any changes that need to be made to the
>> > API before it is accepted into a released form of NumPy can still be
>> > made even after most of the implementation is completed as far as I
>> > understand it."[2]
>> >
>> > Can anyone explain what the plan is here? Is the idea to continue the
>> > discussion and rework the API while it is in master, delaying the next
>> > release for as long as it takes to achieve consensus? Or is there some
>> > mysterious git thing going on where "master" is actually an
>> > experimental branch and the real mainline development is happening
>> > somewhere else? Or something else I'm not thinking of? Please help me
>> > understand.
>>
>> I don't know about you, but watching the development from a distance
>> it became increasingly clear to me that this would happen.  I'm sure
>> you've had the experience as I have, of mixing several desirable
>> changes into the same set of commits, and it's hard work to avoid
>> this.  I imagine this is what happened with Mark's MA changes.
>>
>> The result is actually an extension of the problems of the original
>> discussion, which is a feeling that we the community do not have a say
>> in the development.
>>
>> I think this email might be a plea to the numpy steering group, and to
>> Travis in particular, to see if we can use a discussion of this series
>> of events to decide on a good way to proceed in future.
>>
>
> Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
> pointed to the pull request, anyone who was interested could comment on it,
> Benjamin Root did so, for instance. The fact things didn't go the way you
> wanted doesn't indicate insufficient discussion. And you are certainly
> welcome to put together an alternative and put up a pull request.

I was also guessing that something like this would be the reply to
Nathaniel's post.

I think this reply is rude because it implies some sort of sour-grapes
from Nathaniel, when he is politely referring back to an explicit
reassurance from Travis.

I was trying to avoid this sort of thing by concentrating on thinking
about what to do in future.

Best,

Matthew



Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Charles R Harris
On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett wrote:

> Hi,
>
> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith  wrote:
> > Hi all,
> >
> > I was surprised today to notice that Mark's NA mask support appears to
> > have been merged into numpy master and is described in the draft
> > release notes[1]. My surprise is because merging it to mainline
> > without any discussion on the list seems to contradict what
> > Travis wrote in July, that it was being developed as an experiment and
> > explicitly *not* intended to be merged without further discussion:
> >
> > "Basically, because there is not consensus and in fact a strong and
> > reasonable opposition to specific points, Mark's NEP as proposed
> > cannot be accepted in its entirety right now. However,  I believe an
> > implementation of his NEP is useful and will be instructive in
> > resolving the issues and so I have instructed him to spend Enthought
> > time on the implementation. Any changes that need to be made to the
> > API before it is accepted into a released form of NumPy can still be
> > made even after most of the implementation is completed as far as I
> > understand it."[2]
> >
> > Can anyone explain what the plan is here? Is the idea to continue the
> > discussion and rework the API while it is in master, delaying the next
> > release for as long as it takes to achieve consensus? Or is there some
> > mysterious git thing going on where "master" is actually an
> > experimental branch and the real mainline development is happening
> > somewhere else? Or something else I'm not thinking of? Please help me
> > understand.
>
> I don't know about you, but watching the development from a distance
> it became increasingly clear to me that this would happen.  I'm sure
> you've had the experience as I have, of mixing several desirable
> changes into the same set of commits, and it's hard work to avoid
> this.  I imagine this is what happened with Mark's MA changes.
>
> The result is actually an extension of the problems of the original
> discussion, which is a feeling that we the community do not have a say
> in the development.
>
> I think this email might be a plea to the numpy steering group, and to
> Travis in particular, to see if we can use a discussion of this series
> of events to decide on a good way to proceed in future.
>
>
Oh come, people had plenty to say, you and Nathaniel in particular.  Mark
pointed to the pull request, anyone who was interested could comment on it,
Benjamin Root did so, for instance. The fact things didn't go the way you
wanted doesn't indicate insufficient discussion. And you are certainly
welcome to put together an alternative and put up a pull request.

Chuck


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Matthew Brett
Hi,

On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
> Hi all,
>
> I was surprised today to notice that Mark's NA mask support appears to
> have been merged into numpy master and is described in the draft
> release notes[1]. My surprise is because merging it to mainline
> without any discussion on the list seems to contradict what
> Travis wrote in July, that it was being developed as an experiment and
> explicitly *not* intended to be merged without further discussion:
>
> "Basically, because there is not consensus and in fact a strong and
> reasonable opposition to specific points, Mark's NEP as proposed
> cannot be accepted in its entirety right now. However,  I believe an
> implementation of his NEP is useful and will be instructive in
> resolving the issues and so I have instructed him to spend Enthought
> time on the implementation. Any changes that need to be made to the
> API before it is accepted into a released form of NumPy can still be
> made even after most of the implementation is completed as far as I
> understand it."[2]
>
> Can anyone explain what the plan is here? Is the idea to continue the
> discussion and rework the API while it is in master, delaying the next
> release for as long as it takes to achieve consensus? Or is there some
> mysterious git thing going on where "master" is actually an
> experimental branch and the real mainline development is happening
> somewhere else? Or something else I'm not thinking of? Please help me
> understand.

I don't know about you, but watching the development from a distance
it became increasingly clear to me that this would happen.  I'm sure
you've had the experience as I have, of mixing several desirable
changes into the same set of commits, and it's hard work to avoid
this.  I imagine this is what happened with Mark's MA changes.

The result is actually an extension of the problems of the original
discussion, which is a feeling that we the community do not have a say
in the development.

I think this email might be a plea to the numpy steering group, and to
Travis in particular, to see if we can use a discussion of this series
of events to decide on a good way to proceed in future.

See you,

Matthew


Re: [Numpy-discussion] NA masks in the next numpy release?

2011-10-23 Thread Pauli Virtanen
On 23.10.2011 20:21, Nathaniel Smith wrote:
> I was surprised today to notice that Mark's NA mask support appears to
> have been merged into numpy master and is described in the draft
> release notes[1]. My surprise is because merging it to mainline
> without any discussion on the list seems to contradict what
> Travis wrote in July, that it was being developed as an experiment and
> explicitly *not* intended to be merged without further discussion:

FWIW, the changes did not go in through a "back door", but through
a pull request:

 https://github.com/numpy/numpy/pull/141

Whether issues with the API were resolved or not before merging, I don't 
know.

(One can also ask whether it would be a good idea to forward noise from 
the pull requests to the ML.)

[clip]
> Can anyone explain what the plan is here? Is the idea to continue the
> discussion and rework the API while it is in master, delaying the next
> release for as long as it takes to achieve consensus? Or is there some
> mysterious git thing going on where "master" is actually an
> experimental branch and the real mainline development is happening
> somewhere else? Or something else I'm not thinking of? Please help me
> understand.

No, master is supposed to be the integration branch with only finished 
stuff in it.

-- 
Pauli Virtanen
