Re: [Numpy-discussion] What is consensus anyway
I've given several talks on the subject, but I don't think I've ever written a blog-post about it. A reasonable history does exist in the beginning of the "Guide to NumPy" which is still available for free at http://www.tramy.us/numpybook.pdf

-Travis

On Apr 25, 2012, at 12:18 AM, Fernando Perez wrote:

> On Tue, Apr 24, 2012 at 10:02 PM, wrote:
>> Sorry that I missed this part of numpy history, I always had the
>> impression that numpy is run by a community led by Chuck and the young
>> guys, David, Pauli, Stefan, Pierre; and Robert on the mailing list.
>> (But I came late, and am just a balcony muppet.)
>
> Travis, when you have a free minute (ha :) it would be very nice if
> you wrote up a blog post with some of the history from say the 2000s
> with Numeric, through Numarray and into Numpy. Some of us saw all
> that happen first hand and know it well, but since most of it simply
> happened on mailing lists, conferences and assorted meetings, it's
> actually quite hard to understand that history if you arrive now.
> It's not really written up anywhere, and nobody is going to read 10
> years' worth of email archives :)
>
> Guido a while back wrote a fantastic set of posts on the history of
> python itself that I've greatly enjoyed:
>
> http://python-history.blogspot.com/
>
> something similar for numpy would be nice to have...
>
> Though thinking more about it, perhaps a better alternative could be a
> 'history of the scipy world' where multiple people could write guest
> posts about each project they've had a part of. I think something
> like that could be a lot of fun, and also useful :)
>
> Cheers,
>
> f

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 05:59:09PM -0600, Charles R Harris wrote:
> Travis, if you are playing the BDFL role, then just make the darn decision
> and remove the code so we can get on with life. As it is you go back and
> forth and that does none of us any good, you're a big guy and you're
> rocking the boat. I don't agree with that decision, I'd rather evolve the
> code we have, but I'm willing to compromise with your decision in this
> matter.

I think that Chuck's point here, in a thread on consensus, is very important: sometimes design discussions stall. If, in such a situation, a BDFL makes a decision, acknowledging that he has no divine power to see the best of all options but needs to move on, it can help the project go forward. As long as nobody's feelings are hurt, a bit of dictatorship well used moves a project forward. Of course, as with any leadership, it only works because we as a community trust the leader.

My 2 cents,

Gael
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 10:02 PM, wrote:
> Sorry that I missed this part of numpy history, I always had the
> impression that numpy is run by a community led by Chuck and the young
> guys, David, Pauli, Stefan, Pierre; and Robert on the mailing list.
> (But I came late, and am just a balcony muppet.)

Travis, when you have a free minute (ha :) it would be very nice if you wrote up a blog post with some of the history from say the 2000s with Numeric, through Numarray and into Numpy. Some of us saw all that happen first hand and know it well, but since most of it simply happened on mailing lists, conferences and assorted meetings, it's actually quite hard to understand that history if you arrive now. It's not really written up anywhere, and nobody is going to read 10 years' worth of email archives :)

Guido a while back wrote a fantastic set of posts on the history of python itself that I've greatly enjoyed:

http://python-history.blogspot.com/

something similar for numpy would be nice to have...

Though thinking more about it, perhaps a better alternative could be a 'history of the scipy world' where multiple people could write guest posts about each project they've had a part of. I think something like that could be a lot of fun, and also useful :)

Cheers,

f
Re: [Numpy-discussion] What is consensus anyway
On Apr 25, 2012, at 12:02 AM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 10:25 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > >> >> >> On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: >> On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris >> wrote: >> > Fernando, I'm not checking credentials, I'm curious. >> >> Well, at least I think that an inquisitive query about someone's >> background, phrased like that, can be very easily misread. I can only >> speak for myself, but I immediately had the impression that you were >> indeed trying to validate his background as a proxy for the >> discussion, and suggesting that others had the same curiosity... >> >> Had the question been something more like "Hey Nathaniel, what other >> projects do you think could inform our current view, maybe from stuff >> you've done in the past or lists you've lurked on?", I would have a >> very different reaction. But this sentence: >> >> """ >> I admit to a certain curiosity about your own involvement in FOSS >> projects, and I know I'm not alone in this. >> """ >> >> definitely reads to me with a rather dark and unpleasant angle. Upon >> rereading it again now, I still don't like the tone. I trust you when >> you indicate that your intent was different; perhaps it's a matter of >> phrasing, or the fact that English is not my native language and I may >> miss subtleties of native speakers. >> >> >> Perhaps it was a bit colored, but even so, I'd like to know some specifics >> of his experience. Monotone was one of the projects that sprang up after >> Linus started using Bitkeeper as an open alternative, but that is actually >> fairly recent (2003 or so) and much of the discussion seems to have been >> carried on over IRC, rather than a mailing list. I'm guessing that some >> other projects could have taken place in the 90's, but things have changed >> so much since then that it is hard to know what was going on in that decade. 
>> There was certainly work on the C++ Template library, Linux, Python, and >> various utilities. But it is hard to know. In any case, I'd guess that >> Monotone was a fairly tight knit community, and about 2007 most of the >> developers left. I'd guess it was mostly a case of git and mercurial >> becoming dominant, and possibly they also lost interest in DVCS and moved on >> to other things. >> >> Numpy itself has gone through several of those transitions, and looking >> back, I think one of the problems was that when Travis left for Enthought he >> didn't officially hand off maintenance. The whole transition was a bit >> lucky, with David, Pauli, and myself unofficially continuing the work for >> the 1.3 and 1.4 releases. At that point I was hoping David could more or >> less take over, but he graduated, and Pauli would have been an excellent >> choice, but he took up his graduate studies. Turnover is a problem with open >> source, and no matter how much discussion there is, if people aren't doing >> the work the whole thing sort of peters out. > > Thanks for explaining yourself.The tone you used could earlier have been > mis-interpreted (though I would hope that people would look at your record of > contribution and give you the benefit of the doubt). Your last sentence is > very true. In this particular case, however, there is enough interest that > the whole thing will not peter out, but there is a strong chance that there > will be competing groups with divergent needs and interests vying for how the > project should develop. > > There are many people who rely on NumPy and are concerned about its progress. > NumFocus was created to fight for resources to further the whole ecosystem > and not just rely on volunteers that are available. I fundamentally do not > believe that model can scale.There are, however, ways to keep things open > source and allow people to work on NumPy as their day-job. 
Several companies > now exist that benefit from the NumPy code base and will be interested in > seeing it grow. > > It is a mis-characterization to imply that I "left the project" without a > "hand-off". I never handed off the project because I never left it. I was > very busy at Enthought. I will still be busy now. But, NumPy is very > important to me and has remained so. I have spent a great deal of mental > effort trying to figure out how to contribute to its growth. Yes, I allowed > other people to contribute significantly to the project and was very > receptive to their pull requests (even when I didn't think it was the most > urgent thing or something I actually disagreed with). > > Well then, let's say you should have handed off, because you no longer had > the time to devote to it. You made the 1.2.1 release, and after that you > weren't really involved until recently. Now I'm sure that you didn't lose > interest, but you did lose the time, and I think i
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 10:25 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > > > > On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: > >> On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris >> wrote: >> > Fernando, I'm not checking credentials, I'm curious. >> >> Well, at least I think that an inquisitive query about someone's >> background, phrased like that, can be very easily misread. I can only >> speak for myself, but I immediately had the impression that you were >> indeed trying to validate his background as a proxy for the >> discussion, and suggesting that others had the same curiosity... >> >> Had the question been something more like "Hey Nathaniel, what other >> projects do you think could inform our current view, maybe from stuff >> you've done in the past or lists you've lurked on?", I would have a >> very different reaction. But this sentence: >> >> """ >> I admit to a certain curiosity about your own involvement in FOSS >> projects, and I know I'm not alone in this. >> """ >> >> definitely reads to me with a rather dark and unpleasant angle. Upon >> rereading it again now, I still don't like the tone. I trust you when >> you indicate that your intent was different; perhaps it's a matter of >> phrasing, or the fact that English is not my native language and I may >> miss subtleties of native speakers. >> >> > Perhaps it was a bit colored, but even so, I'd like to know some specifics > of his experience. Monotone was one of the projects that sprang up after > Linus started using Bitkeeper as an open alternative, but that is actually > fairly recent (2003 or so) and much of the discussion seems to have been > carried on over IRC, rather than a mailing list. I'm guessing that some > other projects could have taken place in the 90's, but things have changed > so much since then that it is hard to know what was going on in that > decade. 
There was certainly work on the C++ Template library, Linux, > Python, and various utilities. But it is hard to know. In any case, I'd > guess that Monotone was a fairly tight knit community, and about 2007 most > of the developers left. I'd guess it was mostly a case of git and mercurial > becoming dominant, and possibly they also lost interest in DVCS and moved > on to other things. > > Numpy itself has gone through several of those transitions, and looking > back, I think one of the problems was that when Travis left for Enthought > he didn't officially hand off maintenance. The whole transition was a bit > lucky, with David, Pauli, and myself unofficially continuing the work for > the 1.3 and 1.4 releases. At that point I was hoping David could more or > less take over, but he graduated, and Pauli would have been an excellent > choice, but he took up his graduate studies. Turnover is a problem with > open source, and no matter how much discussion there is, if people aren't > doing the work the whole thing sort of peters out. > > > Thanks for explaining yourself.The tone you used could earlier have > been mis-interpreted (though I would hope that people would look at your > record of contribution and give you the benefit of the doubt). Your last > sentence is very true. In this particular case, however, there is enough > interest that the whole thing will not peter out, but there is a strong > chance that there will be competing groups with divergent needs and > interests vying for how the project should develop. > > There are many people who rely on NumPy and are concerned about its > progress. NumFocus was created to fight for resources to further the > whole ecosystem and not just rely on volunteers that are available. I > fundamentally do not believe that model can scale.There are, however, > ways to keep things open source and allow people to work on NumPy as their > day-job. 
Several companies now exist that benefit from the NumPy code base > and will be interested in seeing it grow. > > It is a mis-characterization to imply that I "left the project" without a > "hand-off". I never handed off the project because I never left it. I > was very busy at Enthought. I will still be busy now. But, NumPy is very > important to me and has remained so. I have spent a great deal of mental > effort trying to figure out how to contribute to its growth. Yes, I > allowed other people to contribute significantly to the project and was > very receptive to their pull requests (even when I didn't think it was the > most urgent thing or something I actually disagreed with). > Well then, let's say you should have handed off, because you no longer had the time to devote to it. You made the 1.2.1 release, and after that you weren't really involved until recently. Now I'm sure that you didn't lose interest, but you did lose the time, and I think it would have been better if you had realized that fact up front. As it was, I suggested to David that it was time for a 1.3 release, and we proceeded without pe
Re: [Numpy-discussion] What is consensus anyway
On Wed, Apr 25, 2012 at 12:25 AM, Travis Oliphant wrote: > > On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > > > > On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez > wrote: >> >> On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris >> wrote: >> > Fernando, I'm not checking credentials, I'm curious. >> >> Well, at least I think that an inquisitive query about someone's >> background, phrased like that, can be very easily misread. I can only >> speak for myself, but I immediately had the impression that you were >> indeed trying to validate his background as a proxy for the >> discussion, and suggesting that others had the same curiosity... >> >> Had the question been something more like "Hey Nathaniel, what other >> projects do you think could inform our current view, maybe from stuff >> you've done in the past or lists you've lurked on?", I would have a >> very different reaction. But this sentence: >> >> """ >> I admit to a certain curiosity about your own involvement in FOSS >> projects, and I know I'm not alone in this. >> """ >> >> definitely reads to me with a rather dark and unpleasant angle. Upon >> rereading it again now, I still don't like the tone. I trust you when >> you indicate that your intent was different; perhaps it's a matter of >> phrasing, or the fact that English is not my native language and I may >> miss subtleties of native speakers. >> > > Perhaps it was a bit colored, but even so, I'd like to know some specifics > of his experience. Monotone was one of the projects that sprang up after > Linus started using Bitkeeper as an open alternative, but that is actually > fairly recent (2003 or so) and much of the discussion seems to have been > carried on over IRC, rather than a mailing list. I'm guessing that some > other projects could have taken place in the 90's, but things have changed > so much since then that it is hard to know what was going on in that decade. 
> There was certainly work on the C++ Template library, Linux, Python, and > various utilities. But it is hard to know. In any case, I'd guess that > Monotone was a fairly tight knit community, and about 2007 most of the > developers left. I'd guess it was mostly a case of git and mercurial > becoming dominant, and possibly they also lost interest in DVCS and moved on > to other things. > > Numpy itself has gone through several of those transitions, and looking > back, I think one of the problems was that when Travis left for Enthought he > didn't officially hand off maintenance. The whole transition was a bit > lucky, with David, Pauli, and myself unofficially continuing the work for > the 1.3 and 1.4 releases. At that point I was hoping David could more or > less take over, but he graduated, and Pauli would have been an excellent > choice, but he took up his graduate studies. Turnover is a problem with open > source, and no matter how much discussion there is, if people aren't doing > the work the whole thing sort of peters out. > > > Thanks for explaining yourself. The tone you used could earlier have been > mis-interpreted (though I would hope that people would look at your record > of contribution and give you the benefit of the doubt). Your last sentence > is very true. In this particular case, however, there is enough interest > that the whole thing will not peter out, but there is a strong chance that > there will be competing groups with divergent needs and interests vying for > how the project should develop. > > There are many people who rely on NumPy and are concerned about its > progress. NumFocus was created to fight for resources to further the whole > ecosystem and not just rely on volunteers that are available. I > fundamentally do not believe that model can scale. There are, however, > ways to keep things open source and allow people to work on NumPy as their > day-job. 
Several companies now exist that benefit from the NumPy code base > and will be interested in seeing it grow. > > It is a mis-characterization to imply that I "left the project" without a > "hand-off". I never handed off the project because I never left it. I > was very busy at Enthought. I will still be busy now. But, NumPy is very > important to me and has remained so. I have spent a great deal of mental > effort trying to figure out how to contribute to its growth. Yes, I > allowed other people to contribute significantly to the project and was very > receptive to their pull requests (even when I didn't think it was the most > urgent thing or something I actually disagreed with). Sorry that I missed this part of numpy history, I always had the impression that numpy is run by a community led by Chuck and the young guys, David, Pauli, Stefan, Pierre; and Robert on the mailing list . (But I came late, and am just a balcony muppet.) Josef > > That should not be interpreted as having "left". NumPy grew because it > solved a useful problem and people were willing to tolerate its problems to > make a diff
Re: [Numpy-discussion] What is consensus anyway
On Apr 24, 2012, at 10:50 PM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: > On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris > wrote: > > Fernando, I'm not checking credentials, I'm curious. > > Well, at least I think that an inquisitive query about someone's > background, phrased like that, can be very easily misread. I can only > speak for myself, but I immediately had the impression that you were > indeed trying to validate his background as a proxy for the > discussion, and suggesting that others had the same curiosity... > > Had the question been something more like "Hey Nathaniel, what other > projects do you think could inform our current view, maybe from stuff > you've done in the past or lists you've lurked on?", I would have a > very different reaction. But this sentence: > > """ > I admit to a certain curiosity about your own involvement in FOSS > projects, and I know I'm not alone in this. > """ > > definitely reads to me with a rather dark and unpleasant angle. Upon > rereading it again now, I still don't like the tone. I trust you when > you indicate that your intent was different; perhaps it's a matter of > phrasing, or the fact that English is not my native language and I may > miss subtleties of native speakers. > > > Perhaps it was a bit colored, but even so, I'd like to know some specifics of > his experience. Monotone was one of the projects that sprang up after Linus > started using Bitkeeper as an open alternative, but that is actually fairly > recent (2003 or so) and much of the discussion seems to have been carried on > over IRC, rather than a mailing list. I'm guessing that some other projects > could have taken place in the 90's, but things have changed so much since > then that it is hard to know what was going on in that decade. There was > certainly work on the C++ Template library, Linux, Python, and various > utilities. But it is hard to know. 
In any case, I'd guess that Monotone was a
> fairly tight knit community, and about 2007 most of the developers left. I'd
> guess it was mostly a case of git and mercurial becoming dominant, and
> possibly they also lost interest in DVCS and moved on to other things.
>
> Numpy itself has gone through several of those transitions, and looking back,
> I think one of the problems was that when Travis left for Enthought he didn't
> officially hand off maintenance. The whole transition was a bit lucky, with
> David, Pauli, and myself unofficially continuing the work for the 1.3 and 1.4
> releases. At that point I was hoping David could more or less take over, but
> he graduated, and Pauli would have been an excellent choice, but he took up
> his graduate studies. Turnover is a problem with open source, and no matter
> how much discussion there is, if people aren't doing the work the whole thing
> sort of peters out.

Thanks for explaining yourself. The tone you used earlier could have been misinterpreted (though I would hope that people would look at your record of contribution and give you the benefit of the doubt). Your last sentence is very true. In this particular case, however, there is enough interest that the whole thing will not peter out, but there is a strong chance that there will be competing groups with divergent needs and interests vying for how the project should develop.

There are many people who rely on NumPy and are concerned about its progress. NumFocus was created to fight for resources to further the whole ecosystem and not just rely on volunteers that are available. I fundamentally do not believe that model can scale. There are, however, ways to keep things open source and allow people to work on NumPy as their day-job. Several companies now exist that benefit from the NumPy code base and will be interested in seeing it grow.

It is a mis-characterization to imply that I "left the project" without a "hand-off".
I never handed off the project because I never left it. I was very busy at Enthought. I will still be busy now. But, NumPy is very important to me and has remained so. I have spent a great deal of mental effort trying to figure out how to contribute to its growth. Yes, I allowed other people to contribute significantly to the project and was very receptive to their pull requests (even when I didn't think it was the most urgent thing or something I actually disagreed with). That should not be interpreted as having "left". NumPy grew because it solved a useful problem and people were willing to tolerate its problems to make a difference by contributing. None of us matter as much to NumPy as the problems it helps people solve. To the degree it does that we are "lucky" to be able to contribute to the project. I hope all NumPy developers continue to be "lucky" enough to have people actually care about the problems NumPy solves now and can solve in the future. -Travis
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 8:50 PM, Charles R Harris wrote:
> Turnover is a problem with open source, and no matter how much discussion
> there is, if people aren't doing the work the whole thing sort of peters
> out.

That's very true, and I hope that by building a friendly and welcoming environment, we'll raise the chances of getting sufficient new contributors to help with this issue. For my talk at Euroscipy last year [1] I made some plots collecting git statistics that show how heavily most scientific python projects rest on the shoulders of very, very few. I really hope we can find ways of spreading the load a bit wider, and everything we can do to make projects more appealing to new contributors is an effort worth making.

Cheers,

f

[1] http://fperez.org/talks/1108_euroscipy_keynote.pdf
Re: [Numpy-discussion] What is consensus anyway
On Apr 24, 2012, at 9:41 PM, Matthew Brett wrote: > Hi, > > On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris > wrote: >> >> >> On Tue, Apr 24, 2012 at 6:56 PM, Nathaniel Smith wrote: >>> >>> On Tue, Apr 24, 2012 at 2:14 PM, Charles R Harris >>> wrote: On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez wrote: > > On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt > wrote: >> If you are referring to the traditional concept of a fork, and not to >> the type we frequently make on GitHub, then I'm surprised that no one >> has objected already. What would a fork solve? To paraphrase the >> regexp saying: after forking, we'll simply have two problems. > > I concur with you here: github 'forks', yes, as many as possible! > Hopefully every one of those will produce one or more PRs :) But a > fork in the sense of a divergent parallel project? I think that would > only be indicative of a complete failure to find a way to make > progress here, and I doubt we're anywhere near that state. > > That forks are *possible* is indeed a valuable and important option in > open source software, because it means that a truly dysfunctional > original project team/direction can't hold a community hostage > forever. But that doesn't mean that full-blown forks should be > considered lightly, as they also carry enormous costs. > > I see absolutely nothing in the current scenario to even remotely > consider that a full-blown fork would be a good idea, and I hope I'm > right. It seems to me we're making progress on problems that led to > real difficulties last year, but from multiple parties I see signs > that give me reason to be optimistic that the project is getting > better, not worse. > We certainly aren't there at the moment, but I can see us heading that way. But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since then datetime, NA, polynomial work, and various other enhancements have gone in along with some 280 bug fixes. 
The major technical problem blocking a 1.7 release is getting datetime working reliably on windows. So I think that is where the short term effort needs to be. Meanwhile, we are spending effort to get out a 1.6.2 just so people can work with a stable version with some of the bug fixes, and potentially we will spend more time and effort to pull out the NA code. In the future there may be a transition to C++ and eventually a break with the current ABI. Or not. There are at least two motivations that get folks to write code for open source projects, scratching an itch and money. Money hasn't been a big part of the Numpy picture so far, so that leaves scratching an itch. One of the attractions of Numpy is that it is a small project, BSD licensed, and not overburdened with governance and process. This makes scratching an itch not as difficult as it would be in a large project. If Numpy remains a small project but acquires the encumbrances of a big project much of that attraction will be lost. Momentum and direction also attracts people, but numpy is stalled at the moment as the whole NA thing circles around once again. >>> >>> I don't think we need a fork, or to start maintaining separate stable >>> and unstable trees, or any of the other complicated process changes >>> that have been suggested. There are tons of projects that routinely >>> make much bigger changes than we're talking about, and they do it >>> without needing that kind of overhead. I know that these suggestions >>> are all made in good faith, but they remind me of a line from that >>> Apache page I linked earlier: "People tend to avoid conflict and >>> thrash around looking for something to substitute - somebody in >>> charge, a rule, a process, stagnation. None of these tend to be very >>> good substitutes for doing the hard work of resolving the conflict." 
>>> >>> I also think if you talk to potential contributors, you'll find that >>> clear, simple processes and a history of respecting everyone's input >>> are much more attractive than a no-rules free-for-all. Good >>> engineering practices are not an "encumbrance". Resolving conflicts >>> before merging is a good engineering practice. >>> >>> What happened with the NA discussion is this: >>> - There was substantial disagreement about whether NEP-style masks, >>> or indeed, focusing on a mask-based implementation *at all*, was the >>> best way forward. >>> - There was also a perceived time constraint, that we had to either >>> implement something immediately while Mark was there, or have nothing. >>> >>> So in the end, the latter concern outweighed the former, the >>> discussion was cut off, and Mark's best guess at an API was merged >>> into master.
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 9:28 PM, Fernando Perez wrote: > On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris > wrote: > > Fernando, I'm not checking credentials, I'm curious. > > Well, at least I think that an inquisitive query about someone's > background, phrased like that, can be very easily misread. I can only > speak for myself, but I immediately had the impression that you were > indeed trying to validate his background as a proxy for the > discussion, and suggesting that others had the same curiosity... > > Had the question been something more like "Hey Nathaniel, what other > projects do you think could inform our current view, maybe from stuff > you've done in the past or lists you've lurked on?", I would have a > very different reaction. But this sentence: > > """ > I admit to a certain curiosity about your own involvement in FOSS > projects, and I know I'm not alone in this. > """ > > definitely reads to me with a rather dark and unpleasant angle. Upon > rereading it again now, I still don't like the tone. I trust you when > you indicate that your intent was different; perhaps it's a matter of > phrasing, or the fact that English is not my native language and I may > miss subtleties of native speakers. > > Perhaps it was a bit colored, but even so, I'd like to know some specifics of his experience. Monotone was one of the projects that sprang up after Linus started using Bitkeeper as an open alternative, but that is actually fairly recent (2003 or so) and much of the discussion seems to have been carried on over IRC, rather than a mailing list. I'm guessing that some other projects could have taken place in the 90's, but things have changed so much since then that it is hard to know what was going on in that decade. There was certainly work on the C++ Template library, Linux, Python, and various utilities. But it is hard to know. In any case, I'd guess that Monotone was a fairly tight knit community, and about 2007 most of the developers left. 
I'd guess it was mostly a case of git and mercurial becoming dominant, and possibly they also lost interest in DVCS and moved on to other things. Numpy itself has gone through several of those transitions, and looking back, I think one of the problems was that when Travis left for Enthought he didn't officially hand off maintenance. The whole transition was a bit lucky, with David, Pauli, and myself unofficially continuing the work for the 1.3 and 1.4 releases. At that point I was hoping David could more or less take over, but he graduated, and Pauli would have been an excellent choice, but he took up his graduate studies. Turnover is a problem with open source, and no matter how much discussion there is, if people aren't doing the work the whole thing sort of peters out. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 11:28 PM, Fernando Perez wrote: > On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris > wrote: >> Fernando, I'm not checking credentials, I'm curious. > > Well, at least I think that an inquisitive query about someone's > background, phrased like that, can be very easily misread. I can only > speak for myself, but I immediately had the impression that you were > indeed trying to validate his background as a proxy for the > discussion, and suggesting that others had the same curiosity... > > Had the question been something more like "Hey Nathaniel, what other > projects do you think could inform our current view, maybe from stuff > you've done in the past or lists you've lurked on?", I would have a > very different reaction. But this sentence: > > """ > I admit to a certain curiosity about your own involvement in FOSS > projects, and I know I'm not alone in this. > """ > > definitely reads to me with a rather dark and unpleasant angle. Upon > rereading it again now, I still don't like the tone. I trust you when > you indicate that your intent was different; perhaps it's a matter of > phrasing, or the fact that English is not my native language and I may > miss subtleties of native speakers. I agree with the interpretation, however, whenever I look at this thread with google gmail, then I see the first line "If you hang around big FOSS projects, you'll see the word "consensus"" I'm only hanging around in this neighborhood (9 mailing lists), so I have no idea about big FOSS projects. Josef > > Cheers, > > f > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 8:02 PM, Charles R Harris wrote: > Fernando, I'm not checking credentials, I'm curious. Well, at least I think that an inquisitive query about someone's background, phrased like that, can be very easily misread. I can only speak for myself, but I immediately had the impression that you were indeed trying to validate his background as a proxy for the discussion, and suggesting that others had the same curiosity... Had the question been something more like "Hey Nathaniel, what other projects do you think could inform our current view, maybe from stuff you've done in the past or lists you've lurked on?", I would have a very different reaction. But this sentence: """ I admit to a certain curiosity about your own involvement in FOSS projects, and I know I'm not alone in this. """ definitely reads to me with a rather dark and unpleasant angle. Upon rereading it again now, I still don't like the tone. I trust you when you indicate that your intent was different; perhaps it's a matter of phrasing, or the fact that English is not my native language and I may miss subtleties of native speakers. Cheers, f
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 8:56 PM, Fernando Perez wrote: > On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris > wrote: > > I admit to a certain curiosity about your own involvement in FOSS > projects, > > and I know I'm not alone in this. Google shows several years of > discussion > > on Monotone, but I have no idea what your contributions were > > Seriously??? > > Please, let's rise above this. We discuss people's opinions *on their > technical merit alone*, regardless of the background of the person > presenting them. I don't care if Linus himself shows up on the list > with a bad idea, it should be shot down; and if someone we'd never > heard of brings up a valid point, we should respect it. > > The day we start "checking credentials at the door" is the day this > project will die as an open source effort. Or at least I think so, > but perhaps I don't have enough 'commit credits' in my account for my > opinion to matter... > > Fernando, I'm not checking credentials, I'm curious. Nathaniel has experience with FOSS projects, unlike us first timers, and I'd like to know what that experience was and what he learned from it. He has also mentioned Graydon Hoare in connection with RUST, and since Graydon was the prime mover in Monotone I'd like to know the story of the project. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris wrote: > I admit to a certain curiosity about your own involvement in FOSS projects, > and I know I'm not alone in this. Google shows several years of discussion > on Monotone, but I have no idea what your contributions were Seriously??? Please, let's rise above this. We discuss people's opinions *on their technical merit alone*, regardless of the background of the person presenting them. I don't care if Linus himself shows up on the list with a bad idea, it should be shot down; and if someone we'd never heard of brings up a valid point, we should respect it. The day we start "checking credentials at the door" is the day this project will die as an open source effort. Or at least I think so, but perhaps I don't have enough 'commit credits' in my account for my opinion to matter... Cheers, f
Re: [Numpy-discussion] What is consensus anyway
Hi, On Tue, Apr 24, 2012 at 6:12 PM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 6:56 PM, Nathaniel Smith wrote: >> >> On Tue, Apr 24, 2012 at 2:14 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez >> > wrote: >> >> >> >> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt >> >> wrote: >> >> > If you are referring to the traditional concept of a fork, and not to >> >> > the type we frequently make on GitHub, then I'm surprised that no one >> >> > has objected already. What would a fork solve? To paraphrase the >> >> > regexp saying: after forking, we'll simply have two problems. >> >> >> >> I concur with you here: github 'forks', yes, as many as possible! >> >> Hopefully every one of those will produce one or more PRs :) But a >> >> fork in the sense of a divergent parallel project? I think that would >> >> only be indicative of a complete failure to find a way to make >> >> progress here, and I doubt we're anywhere near that state. >> >> >> >> That forks are *possible* is indeed a valuable and important option in >> >> open source software, because it means that a truly dysfunctional >> >> original project team/direction can't hold a community hostage >> >> forever. But that doesn't mean that full-blown forks should be >> >> considered lightly, as they also carry enormous costs. >> >> >> >> I see absolutely nothing in the current scenario to even remotely >> >> consider that a full-blown fork would be a good idea, and I hope I'm >> >> right. It seems to me we're making progress on problems that led to >> >> real difficulties last year, but from multiple parties I see signs >> >> that give me reason to be optimistic that the project is getting >> >> better, not worse. >> >> >> > >> > We certainly aren't there at the moment, but I can see us heading that >> > way. >> > But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. 
>> > Since >> > then datetime, NA, polynomial work, and various other enhancements have >> > gone >> > in along with some 280 bug fixes. The major technical problem blocking a >> > 1.7 >> > release is getting datetime working reliably on windows. So I think that >> > is >> > where the short term effort needs to be. Meanwhile, we are spending >> > effort >> > to get out a 1.6.2 just so people can work with a stable version with >> > some >> > of the bug fixes, and potentially we will spend more time and effort to >> > pull >> > out the NA code. In the future there may be a transition to C++ and >> > eventually a break with the current ABI. Or not. >> > >> > There are at least two motivations that get folks to write code for open >> > source projects, scratching an itch and money. Money hasn't been a big >> > part >> > of the Numpy picture so far, so that leaves scratching an itch. One of >> > the >> > attractions of Numpy is that it is a small project, BSD licensed, and >> > not >> > overburdened with governance and process. This makes scratching an itch >> > not >> > as difficult as it would be in a large project. If Numpy remains a small >> > project but acquires the encumbrances of a big project much of that >> > attraction will be lost. Momentum and direction also attracts people, >> > but >> > numpy is stalled at the moment as the whole NA thing circles around once >> > again. >> >> I don't think we need a fork, or to start maintaining separate stable >> and unstable trees, or any of the other complicated process changes >> that have been suggested. There are tons of projects that routinely >> make much bigger changes than we're talking about, and they do it >> without needing that kind of overhead. 
I know that these suggestions >> are all made in good faith, but they remind me of a line from that >> Apache page I linked earlier: "People tend to avoid conflict and >> thrash around looking for something to substitute - somebody in >> charge, a rule, a process, stagnation. None of these tend to be very >> good substitutes for doing the hard work of resolving the conflict." >> >> I also think if you talk to potential contributors, you'll find that >> clear, simple processes and a history of respecting everyone's input >> are much more attractive than a no-rules free-for-all. Good >> engineering practices are not an "encumbrance". Resolving conflicts >> before merging is a good engineering practice. >> >> What happened with the NA discussion is this: >> - There was substantial disagreement about whether NEP-style masks, >> or indeed, focusing on a mask-based implementation *at all*, was the >> best way forward. >> - There was also a perceived time constraint, that we had to either >> implement something immediately while Mark was there, or have nothing. >> >> So in the end, the latter concern outweighed the former, the >> discussion was cut off, and Mark's best guess at an API was merged >> into master. I totally understand how this decision made sense at the >> time, but the result is what we see now: it's
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 6:56 PM, Nathaniel Smith wrote: > On Tue, Apr 24, 2012 at 2:14 PM, Charles R Harris > wrote: > > > > > > On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez > > wrote: > >> > >> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt > >> wrote: > >> > If you are referring to the traditional concept of a fork, and not to > >> > the type we frequently make on GitHub, then I'm surprised that no one > >> > has objected already. What would a fork solve? To paraphrase the > >> > regexp saying: after forking, we'll simply have two problems. > >> > >> I concur with you here: github 'forks', yes, as many as possible! > >> Hopefully every one of those will produce one or more PRs :) But a > >> fork in the sense of a divergent parallel project? I think that would > >> only be indicative of a complete failure to find a way to make > >> progress here, and I doubt we're anywhere near that state. > >> > >> That forks are *possible* is indeed a valuable and important option in > >> open source software, because it means that a truly dysfunctional > >> original project team/direction can't hold a community hostage > >> forever. But that doesn't mean that full-blown forks should be > >> considered lightly, as they also carry enormous costs. > >> > >> I see absolutely nothing in the current scenario to even remotely > >> consider that a full-blown fork would be a good idea, and I hope I'm > >> right. It seems to me we're making progress on problems that led to > >> real difficulties last year, but from multiple parties I see signs > >> that give me reason to be optimistic that the project is getting > >> better, not worse. > >> > > > > We certainly aren't there at the moment, but I can see us heading that > way. > > But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. > Since > > then datetime, NA, polynomial work, and various other enhancements have > gone > > in along with some 280 bug fixes. 
The major technical problem blocking a > 1.7 > > release is getting datetime working reliably on windows. So I think that > is > > where the short term effort needs to be. Meanwhile, we are spending > effort > > to get out a 1.6.2 just so people can work with a stable version with > some > > of the bug fixes, and potentially we will spend more time and effort to > pull > > out the NA code. In the future there may be a transition to C++ and > > eventually a break with the current ABI. Or not. > > > > There are at least two motivations that get folks to write code for open > > source projects, scratching an itch and money. Money hasn't been a big > part > > of the Numpy picture so far, so that leaves scratching an itch. One of > the > > attractions of Numpy is that it is a small project, BSD licensed, and not > > overburdened with governance and process. This makes scratching an itch > not > > as difficult as it would be in a large project. If Numpy remains a small > > project but acquires the encumbrances of a big project much of that > > attraction will be lost. Momentum and direction also attracts people, but > > numpy is stalled at the moment as the whole NA thing circles around once > > again. > > I don't think we need a fork, or to start maintaining separate stable > and unstable trees, or any of the other complicated process changes > that have been suggested. There are tons of projects that routinely > make much bigger changes than we're talking about, and they do it > without needing that kind of overhead. I know that these suggestions > are all made in good faith, but they remind me of a line from that > Apache page I linked earlier: "People tend to avoid conflict and > thrash around looking for something to substitute - somebody in > charge, a rule, a process, stagnation. None of these tend to be very > good substitutes for doing the hard work of resolving the conflict." 
> > I also think if you talk to potential contributors, you'll find that > clear, simple processes and a history of respecting everyone's input > are much more attractive than a no-rules free-for-all. Good > engineering practices are not an "encumbrance". Resolving conflicts > before merging is a good engineering practice. > > What happened with the NA discussion is this: > - There was substantial disagreement about whether NEP-style masks, > or indeed, focusing on a mask-based implementation *at all*, was the > best way forward. > - There was also a perceived time constraint, that we had to either > implement something immediately while Mark was there, or have nothing. > > So in the end, the latter concern outweighed the former, the > discussion was cut off, and Mark's best guess at an API was merged > into master. I totally understand how this decision made sense at the > time, but the result is what we see now: it's left numpy stalled, > rifts on the mailing list, boring discussions about process, and still > no agreement about whether NEP-style masks will actually solve our > users' problems. > > Getting past
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 2:14 PM, Charles R Harris wrote: > > > On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez > wrote: >> >> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt >> wrote: >> > If you are referring to the traditional concept of a fork, and not to >> > the type we frequently make on GitHub, then I'm surprised that no one >> > has objected already. What would a fork solve? To paraphrase the >> > regexp saying: after forking, we'll simply have two problems. >> >> I concur with you here: github 'forks', yes, as many as possible! >> Hopefully every one of those will produce one or more PRs :) But a >> fork in the sense of a divergent parallel project? I think that would >> only be indicative of a complete failure to find a way to make >> progress here, and I doubt we're anywhere near that state. >> >> That forks are *possible* is indeed a valuable and important option in >> open source software, because it means that a truly dysfunctional >> original project team/direction can't hold a community hostage >> forever. But that doesn't mean that full-blown forks should be >> considered lightly, as they also carry enormous costs. >> >> I see absolutely nothing in the current scenario to even remotely >> consider that a full-blown fork would be a good idea, and I hope I'm >> right. It seems to me we're making progress on problems that led to >> real difficulties last year, but from multiple parties I see signs >> that give me reason to be optimistic that the project is getting >> better, not worse. >> > > We certainly aren't there at the moment, but I can see us heading that way. > But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since > then datetime, NA, polynomial work, and various other enhancements have gone > in along with some 280 bug fixes. The major technical problem blocking a 1.7 > release is getting datetime working reliably on windows. So I think that is > where the short term effort needs to be. 
Meanwhile, we are spending effort > to get out a 1.6.2 just so people can work with a stable version with some > of the bug fixes, and potentially we will spend more time and effort to pull > out the NA code. In the future there may be a transition to C++ and > eventually a break with the current ABI. Or not. > > There are at least two motivations that get folks to write code for open > source projects, scratching an itch and money. Money hasn't been a big part > of the Numpy picture so far, so that leaves scratching an itch. One of the > attractions of Numpy is that it is a small project, BSD licensed, and not > overburdened with governance and process. This makes scratching an itch not > as difficult as it would be in a large project. If Numpy remains a small > project but acquires the encumbrances of a big project much of that > attraction will be lost. Momentum and direction also attracts people, but > numpy is stalled at the moment as the whole NA thing circles around once > again. I don't think we need a fork, or to start maintaining separate stable and unstable trees, or any of the other complicated process changes that have been suggested. There are tons of projects that routinely make much bigger changes than we're talking about, and they do it without needing that kind of overhead. I know that these suggestions are all made in good faith, but they remind me of a line from that Apache page I linked earlier: "People tend to avoid conflict and thrash around looking for something to substitute - somebody in charge, a rule, a process, stagnation. None of these tend to be very good substitutes for doing the hard work of resolving the conflict." I also think if you talk to potential contributors, you'll find that clear, simple processes and a history of respecting everyone's input are much more attractive than a no-rules free-for-all. Good engineering practices are not an "encumbrance". Resolving conflicts before merging is a good engineering practice. 
What happened with the NA discussion is this: - There was substantial disagreement about whether NEP-style masks, or indeed, focusing on a mask-based implementation *at all*, was the best way forward. - There was also a perceived time constraint, that we had to either implement something immediately while Mark was there, or have nothing. So in the end, the latter concern outweighed the former, the discussion was cut off, and Mark's best guess at an API was merged into master. I totally understand how this decision made sense at the time, but the result is what we see now: it's left numpy stalled, rifts on the mailing list, boring discussions about process, and still no agreement about whether NEP-style masks will actually solve our users' problems. Getting past this isn't *complicated* -- it's just "hard work". > What would I suggest as a way forward with the NA option. Let's take the > issues. > > 1) Adding slots to PyArrayObject_fields. I don't think this is likely to be > a problem unless someone's code
Re: [Numpy-discussion] What is consensus anyway
On Apr 24, 2012, at 7:16 PM, Stéfan van der Walt wrote: > On Tue, Apr 24, 2012 at 4:49 PM, Charles R Harris > wrote: >> But a right to veto doesn't automatically extend to everyone who happens to >> have >> an interest in a topic. This is not my view, but it is Charles's view, and since he is an active developer in the NumPy community, it carries weight. I hope he can be convinced that active users are an important part of the community. Charles has made tremendous contributions to this community, starting with significant code in Numarray that now lives in NumPy, significant commitment to code quality, significant effort on responding to pull requests, diligence in triaging and applying bug-fixes in tickets, and even responding to people who disagree with him. > > The time has long gone when we simply hacked on NumPy for our own > benefit; if you will, NumPy users are our customers, and they have a > stake in its development (or, to phrase it differently, I think we > have a commitment to them). > > If we strongly encourage people to discuss, but still give them an > avenue to object, we keep ourselves honest (both w.r.t. expectations > on numpy and our own insight into problems and their solutions). +1 > > Stéfan
[Numpy-discussion] Removal of mask arrays? [was consensus]
On Apr 24, 2012, at 6:59 PM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 5:24 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 6:01 PM, Stéfan van der Walt wrote: > > > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris > > wrote: > >>> Why are we having a discussion on NAN's in a thread on consensus? > >>> This is a strong indicator of the problem we're facing. > >> > >> We seem to have a consensus regarding interest in the topic. > > > > For the benefit of those of us interested in both discussions, would > > you kindly start a new thread on the MA topic? > > > > In response to Travis's suggestion of writing up a short summary of > > community principles, as well as Matthew's initial formulation, I > > agree that this would be helpful in enshrining the values we cherish > > here, as well as in communicating those values to the next generation > > of developers. > > > >> From observing the community, I would guess that these values include: > > > > - That any party with an interest in NumPy is given the opportunity to > > speak and to be heard on the list. > > - That discussions that influence the course of the project take place > > openly, for anyone to observe. > > - That decisions are made once consensus is reached, i.e., if everyone > > agrees that they can live with the outcome. > > This is well stated. Thank you Stefan. > > Some will argue about what "consensus" means or who "everyone" is. But, if > we are really worrying about that, then we have stopped listening to each > other which is the number one community value that we should be promoting, > demonstrating, and living by. > > Consensus to me means that anyone who can produce a well-reasoned argument > and demonstrates by their persistence that they are actually using the code > and are aware of the issues has veto power on pull requests. 
At times > people with commit rights to NumPy might perform a pull request anyway, but > they should acknowledge at least in the comment (but for major changes --- > on this list) that they are doing so and provide their reasons. > > If I decide later that I think the pull request was made inappropriately in > the face of objections and the reasons were not justified, then I will > reserve the right to revert the pull request. I would like core developers > of NumPy to have the same ability to check me as well. But, if there is a > disagreement at that level, then I will reserve the right to decide. > > Basically, what we have in this situation is that the masked arrays were > added to NumPy master with serious objections to the API. What I'm trying > to decide right now is can we move forward and satisfy the objections without > removing the ndarrayobject changes entirely (I do think the concerns warrant > removal of the changes). The discussion around that is the most helpful > right now, but should take place on another thread. > > > Travis, if you are playing the BDFL role, then just make the darn decision > and remove the code so we can get on with life. As it is you go back and > forth and that does none of us any good, you're a big guy and you're rocking > the boat. I don't agree with that decision, I'd rather evolve the code we > have, but I'm willing to compromise with your decision in this matter. I'm > not willing to compromise with Nathaniel's, nor it seems vice-versa. > Nathaniel has volunteered to do the work, just ask him to submit a patch. I would like to see Nathaniel and Mark work out a document that they can both agree to and co-author that is presented to this list before doing something like that. At the very least this should summarize the feature from both perspectives. I have been encouraged by Nathaniel's willingness to contribute code, and I know Mark is looking for acceptable solutions that are still consistent with his view of things. 
These are all positive signs to me. We need to give this another week or two. I would prefer a solution that evolves the code as well. But, I also don't want yet another masked array implementation that gets little use but has real and long-lasting implications on the ndarray structure. There is both the effect of the growth of the ndarray structure (most uses should not worry about this at all), but also the growth of the *concept* of an ndarray --- this is a little more subtle but also has real downstream implications. Some of these implications have been pointed out already by consumers of the C-API who are unsure about how code that was not built with masks in mind will respond (I believe it will raise an error if they are using the standard APIs -- It probably should if it doesn't). Long term, I agree that the NumPy array should not be so tied to a particular *implementation* as it is now. I also don't think it should be tied so deeply to ABI compatibility. I think that was a mistake to be so devoted to this concept that we created
Re: [Numpy-discussion] What is consensus anyway
On Wed, Apr 25, 2012 at 12:49 AM, Charles R Harris wrote: > I think we adhere to these pretty well already, the problem is with the word > 'everyone'. I grew up in Massachusetts where town meetings were a tradition. > At those meetings the townsfolk voted on the budget, zoning, construction of > public buildings, use of public spaces and other such topics. A quorum of > voters was needed to make the votes binding, and apart from that the meeting > was limited to people who lived in the town, they, after all, paid the taxes > and had to live with the decisions. Outsiders could sit in by invitation, > but had to sit in a special area and were not expected to speak unless > called upon and certainly couldn't vote. So that is one tradition, a > democratic tradition with a history of success. We are a much smaller > community, physically separated, and don't need that sort of exclusivity, > but even so we have our version of resident and taxes, which consists of > hanging out on the list and contributing work. I think everyone is welcome > to express an opinion and make an argument, but not everyone has a veto. I > think a veto is a privilege, not a right, and to have that privilege I think > one needs to demonstrate an investment in the project, consisting in this > case of code contributions, code review, and other such mundane tasks that > demonstrate a larger interest and a willingness to work. Anyone can do this, > it doesn't require permission or special dispensation, Numpy is very open in > that regard. Folks working in related projects, such as ipython and pandas, > are also going to be listened to because they have made that investment in > time and work and the popularity of Numpy depends on keeping them happy. But > a right to veto doesn't automatically extend to everyone who happens to have > an interest in a topic. Consensus-seeking isn't about privilege or moral rights. It's about ruthless pragmatism. 
The end of your message actually gets very close to the position I'm advocating -- except that I'm saying, instead of trying to judge which people are worth keeping happy by looking up their commit record on projects you've heard of, you're safer erring on the side of assuming that anyone taking the time to show up probably has some good reason for doing so, and that their concerns are probably shared by a larger group. You wouldn't refuse to try a chef's cooking until she's proven herself by washing dishes -- why the heck would you demand that people perform "mundane tasks" before you're willing to trust they have some insight? Acting as maintainer isn't a privilege -- it's a gift you give. So is feedback. Ignoring it is just a way of shooting your own project in the foot. - N
Re: [Numpy-discussion] What is consensus anyway
On Tuesday, April 24, 2012, Matthew Brett wrote: > Hi, > > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris > > wrote: > > > > > > 2012/4/24 Stéfan van der Walt > > >> > >> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris > >> > wrote: > >> > The advantage of nans, I suppose, is that they are in the hardware and > >> > so > >> > >> Why are we having a discussion on NAN's in a thread on consensus? > >> This is a strong indicator of the problem we're facing. > >> > > > > We seem to have a consensus regarding interest in the topic. > > This email is mainly to Travis. > > This thread seems to be dying, condemning us to keep repeating the > same conversation with no result. > > Chuck has made it clear he is not interested in this conversation. > Until it is clear you are interested in this conversation, it will > keep dying. As you know, I think that will be very bad for numpy, > and, as you know, I care a great deal about that. > > So, please, if you care about this, and agree that something should be > done, please, say so, and if you don't agree something should be done, > say so. It can't better without your help, > > See you, > > Matthew > Matthew, I agree with the general idea of consensus, and I think many of us here agree with the ideal in principle. Quite frankly, I am not sure what more you want from us. You are only going to get so much leeway on a philosophical discussion on governance on a numerical computation mailing list. The thread keeps "dying" (I say it is getting distracted) because coders are champing at the bit to get stuff done. In a sense, I think there is a consensus, if you will, to move on. All in favor, say "Aye!" Cheers! Ben Root
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 4:49 PM, Charles R Harris wrote: > But a right to veto doesn't automatically extend to everyone who happens to > have > an interest in a topic. The time has long gone when we simply hacked on NumPy for our own benefit; if you will, NumPy users are our customers, and they have a stake in its development (or, to phrase it differently, I think we have a commitment to them). If we strongly encourage people to discuss, but still give them an avenue to object, we keep ourselves honest (both w.r.t. expectations on numpy and our own insight into problems and their solutions). Stéfan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 5:24 PM, Travis Oliphant wrote: > > On Apr 24, 2012, at 6:01 PM, Stéfan van der Walt wrote: > > > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris > > wrote: > >>> Why are we having a discussion on NAN's in a thread on consensus? > >>> This is a strong indicator of the problem we're facing. > >> > >> We seem to have a consensus regarding interest in the topic. > > > > For the benefit of those of us interested in both discussions, would > > you kindly start a new thread on the MA topic? > > > > In response to Travis's suggestion of writing up a short summary of > > community principles, as well as Matthew's initial formulation, I > > agree that this would be helpful in enshrining the values we cherish > > here, as well as in communicating those values to the next generation > > of developers. > > > >> From observing the community, I would guess that these values include: > > > > - That any party with an interest in NumPy is given the opportunity to > > speak and to be heard on the list. > > - That discussions that influence the course of the project take place > > openly, for anyone to observe. > > - That decisions are made once consensus is reached, i.e., if everyone > > agrees that they can live with the outcome. > > This is well stated. Thank you Stefan. > > Some will argue about what "consensus" means or who "everyone" is.But, > if we are really worrying about that, then we have stopped listening to > each other which is the number one community value that we should be > promoting, demonstrating, and living by. > > Consensus to me means that anyone who can produce a well-reasoned argument > and demonstrates by their persistence that they are actually using the code > and are aware of the issues has veto power on pull requests. 
At times > people with commit rights to NumPy might perform a pull request anyway, but > they should acknowledge at least in the comment (but for major changes --- > on this list) that they are doing so and provide their reasons. > > If I decide later that I think the pull request was made inappropriately > in the face of objections and the reasons were not justified, then I will > reserve the right to revert the pull request.I would like core > developers of NumPy to have the same ability to check me as well.But, > if there is a disagreement at that level, then I will reserve the right to > decide. > > Basically, what we have in this situation is that the masked arrays were > added to NumPy master with serious objections to the API. What I'm trying > to decide right now is can we move forward and satisfy the objections > without removing the ndarrayobject changes entirely (I do think the > concerns warrant removal of the changes). The discussion around that is > the most helpful right now, but should take place on another thread. > > Travis, if you are playing the BDFL role, then just make the darn decision and remove the code so we can get on with life. As it is you go back and forth and that does none of us any good, you're a big guy and you're rocking the boat. I don't agree with that decision, I'd rather evolve the code we have, but I'm willing to compromise with your decision in this matter. I'm not willing to compromise with Nathaniel's, nor it seems vice-versa. Nathaniel has volunteered to do the work, just ask him to submit a patch. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
2012/4/24 Stéfan van der Walt > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris > wrote: > >> Why are we having a discussion on NAN's in a thread on consensus? > >> This is a strong indicator of the problem we're facing. > > > > We seem to have a consensus regarding interest in the topic. > > For the benefit of those of us interested in both discussions, would > you kindly start a new thread on the MA topic? > > In response to Travis's suggestion of writing up a short summary of > community principles, as well as Matthew's initial formulation, I > agree that this would be helpful in enshrining the values we cherish > here, as well as in communicating those values to the next generation > of developers. > > I think we adhere to these pretty well already, the problem is with the word 'everyone'. I grew up in Massachusetts where town meetings were a tradition. At those meetings the townsfolk voted on the budget, zoning, construction of public buildings, use of public spaces and other such topics. A quorum of voters was needed to make the votes binding, and apart from that the meeting was limited to people who lived in the town, they, after all, paid the taxes and had to live with the decisions. Outsiders could sit in by invitation, but had to sit in a special area and were not expected to speak unless called upon and certainly couldn't vote. So that is one tradition, a democratic tradition with a history of success. We are a much smaller community, physically separated, and don't need that sort of exclusivity, but even so we have our version of resident and taxes, which consists of hanging out on the list and contributing work. I think everyone is welcome to express an opinion and make an argument, but not everyone has a veto. 
I think a veto is a privilege, not a right, and to have that privilege I think one needs to demonstrate an investment in the project, consisting in this case of code contributions, code review, and other such mundane tasks that demonstrate a larger interest and a willingness to work. Anyone can do this, it doesn't require permission or special dispensation, Numpy is very open in that regard. Folks working in related projects, such as ipython and pandas, are also going to be listened to because they have made that investment in time and work and the popularity of Numpy depends on keeping them happy. But a right to veto doesn't automatically extend to everyone who happens to have an interest in a topic. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Apr 24, 2012, at 5:52 PM, Matthew Brett wrote: > Hi, > > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris > wrote: >> >> >> 2012/4/24 Stéfan van der Walt >>> >>> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris >>> wrote: The advantage of nans, I suppose, is that they are in the hardware and so >>> >>> Why are we having a discussion on NAN's in a thread on consensus? >>> This is a strong indicator of the problem we're facing. >>> >> >> We seem to have a consensus regarding interest in the topic. > > This email is mainly to Travis. > > This thread seems to be dying, condemning us to keep repeating the > same conversation with no result. > > Chuck has made it clear he is not interested in this conversation. > Until it is clear you are interested in this conversation, it will > keep dying. As you know, I think that will be very bad for numpy, > and, as you know, I care a great deal about that. I am interested in the conversation, but I think I've already stated my views as well as I know how. I'm not sure what else I should do at this point. We do need consensus (defined as the absence of serious objectors) for me to agree to a NumPy 1.X release. I don't think it helps us get to a consensus to further discuss non-technical issues at this point. There is much interest in ideas for finding common ground in the masked array situation, but that should happen on another thread. -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Apr 24, 2012, at 6:01 PM, Stéfan van der Walt wrote: > On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris > wrote: >>> Why are we having a discussion on NAN's in a thread on consensus? >>> This is a strong indicator of the problem we're facing. >> >> We seem to have a consensus regarding interest in the topic. > > For the benefit of those of us interested in both discussions, would > you kindly start a new thread on the MA topic? > > In response to Travis's suggestion of writing up a short summary of > community principles, as well as Matthew's initial formulation, I > agree that this would be helpful in enshrining the values we cherish > here, as well as in communicating those values to the next generation > of developers. > > From observing the community, I would guess that these values include: > > - That any party with an interest in NumPy is given the opportunity to > speak and to be heard on the list. > - That discussions that influence the course of the project take place > openly, for anyone to observe. > - That decisions are made once consensus is reached, i.e., if everyone > agrees that they can live with the outcome. This is well stated. Thank you, Stéfan. Some will argue about what "consensus" means or who "everyone" is. But if we are really worrying about that, then we have stopped listening to each other, which is the number one community value that we should be promoting, demonstrating, and living by. Consensus to me means that anyone who can produce a well-reasoned argument and demonstrates by their persistence that they are actually using the code and are aware of the issues has veto power on pull requests. At times people with commit rights to NumPy might perform a pull request anyway, but they should acknowledge at least in the comment (but for major changes --- on this list) that they are doing so and provide their reasons.
If I decide later that I think the pull request was made inappropriately in the face of objections and the reasons were not justified, then I will reserve the right to revert the pull request. I would like core developers of NumPy to have the same ability to check me as well. But if there is a disagreement at that level, then I will reserve the right to decide. Basically, what we have in this situation is that the masked arrays were added to NumPy master with serious objections to the API. What I'm trying to decide right now is whether we can move forward and satisfy the objections without removing the ndarrayobject changes entirely (I do think the concerns warrant removal of the changes). The discussion around that is the most helpful right now, but should take place on another thread. Thanks, -Travis
Re: [Numpy-discussion] What is consensus anyway
Thanks for the reminder, Stefan and keeping us on track. It is very helpful to those trying to sort through the messages to keep the discussions to one subject per thread. -Travis On Apr 24, 2012, at 2:23 PM, Stéfan van der Walt wrote: > On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris > wrote: >> The advantage of nans, I suppose, is that they are in the hardware and so > > Why are we having a discussion on NAN's in a thread on consensus? > This is a strong indicator of the problem we're facing. > > Stéfan > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote: >> Why are we having a discussion on NAN's in a thread on consensus? >> This is a strong indicator of the problem we're facing. > > We seem to have a consensus regarding interest in the topic. For the benefit of those of us interested in both discussions, would you kindly start a new thread on the MA topic? In response to Travis's suggestion of writing up a short summary of community principles, as well as Matthew's initial formulation, I agree that this would be helpful in enshrining the values we cherish here, as well as in communicating those values to the next generation of developers. From observing the community, I would guess that these values include: - That any party with an interest in NumPy is given the opportunity to speak and to be heard on the list. - That discussions that influence the course of the project take place openly, for anyone to observe. - That decisions are made once consensus is reached, i.e., if everyone agrees that they can live with the outcome. To summarize: NumPy development that is free & fair, open and unified. We'll sometimes mess up and not follow our own guidelines, but with them in place at least we'll have something to refer back to as a reminder. Regards Stéfan
Re: [Numpy-discussion] What is consensus anyway
Hi, On Tue, Apr 24, 2012 at 2:25 PM, Charles R Harris wrote: > > > 2012/4/24 Stéfan van der Walt >> >> On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris >> wrote: >> > The advantage of nans, I suppose, is that they are in the hardware and >> > so >> >> Why are we having a discussion on NAN's in a thread on consensus? >> This is a strong indicator of the problem we're facing. >> > > We seem to have a consensus regarding interest in the topic. This email is mainly to Travis. This thread seems to be dying, condemning us to keep repeating the same conversation with no result. Chuck has made it clear he is not interested in this conversation. Until it is clear you are interested in this conversation, it will keep dying. As you know, I think that will be very bad for numpy, and, as you know, I care a great deal about that. So, please, if you care about this and agree that something should be done, say so, and if you don't agree something should be done, say so. It can't get better without your help. See you, Matthew
Re: [Numpy-discussion] What is consensus anyway
2012/4/24 Stéfan van der Walt > On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris > wrote: > > The advantage of nans, I suppose, is that they are in the hardware and so > > Why are we having a discussion on NAN's in a thread on consensus? > This is a strong indicator of the problem we're facing. > > We seem to have a consensus regarding interest in the topic. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
On Tue, Apr 24, 2012 at 4:03 PM, Charles R Harris wrote:

> Should be 6 in 1.6
>
> # Binary compatibility version number. This number is increased whenever the
> # C-API is changed such that binary compatibility is broken, i.e. whenever a
> # recompile of extension modules is needed.
> C_ABI_VERSION = 0x0109
>
> # Minor API version. This number is increased whenever a change is made to the
> # C-API -- whether it breaks binary compatibility or not. Some changes, such
> # as adding a function pointer to the end of the function table, can be made
> # without breaking binary compatibility. In this case, only the C_API_VERSION
> # (*not* C_ABI_VERSION) would be increased. Whenever binary compatibility is
> # broken, both C_API_VERSION and C_ABI_VERSION should be increased.
> C_API_VERSION = 0x0006
>
> It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing
> 7 for 1.6?

My bad. When I grepped, I found this line:

./build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h:#define NPY_API_VERSION 0x0007

That gives version 0x0007, but the file is in the build directory. Since my last build was of a later version, that isn't the right number for 1.6!

Fred
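The mix-up above comes down to reading a generated header instead of the value declared in the source tree. As a minimal sketch, the helper below pulls the number out of header text so it can be compared with the declared one. The name `api_version_from_header` is hypothetical (it is not part of NumPy), and the sample strings are simply the values quoted in this thread.

```python
import re


def api_version_from_header(header_text):
    """Return the integer after '#define NPY_API_VERSION', or None if absent."""
    match = re.search(r"#define\s+NPY_API_VERSION\s+(0x[0-9A-Fa-f]+)", header_text)
    return int(match.group(1), 16) if match else None


# Header text as found by grep in the build directory (left over from a later build).
stale_build_header = "#define NPY_API_VERSION 0x0007\n"
# C_API_VERSION as quoted from numpy/core/setup_common.py for 1.6.
declared_for_1_6 = 0x0006

found = api_version_from_header(stale_build_header)
if found != declared_for_1_6:
    print("build directory header does not match the declared version")
```

The point is only that a build directory reflects the last build, so the source tree of the release in question is the place to check.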
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
2012/4/24 Frédéric Bastien

> Hi,
>
> I finished reading the doc I listed in the other thread. As the NA
> stuff will be marked as Experimental in numpy 1.7, why not define a
> new macro like NPY_NA_VERSION that will give the version of the NA
> C-api? That way, people will be able to detect if there is a change in
> the C-API of NA when they write it. So this will allow us to break this
> interface more easily. We would just need to add a big warning telling
> people to do this check.

This sounds like a good thing to do.

> The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow
> removing features. Probably a function like PyArray_GetNACVersion would
> be useful too.[1]
>
> Continuing on my previous post, old code needs to be changed so that it
> doesn't accept NA inputs. With the current trunk, this can be done like this:
>
> PyObject* an_input = ;
>
> if (!PyArray_Check(an_input)) {
>     PyErr_SetString(PyExc_ValueError, "expected an ndarray");
>     %(fail)s
> }
>
> if (NPY_FEATURE_VERSION >= 0x0008) {
>     if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
>         PyErr_SetString(PyExc_ValueError,
>                         "masked array are not supported by this function");
>         %(fail)s
>     }
> }
>
> In the 1.6.1 release, NPY_FEATURE_VERSION had value 0x0007. This
> value wasn't changed in the trunk. I suppose it will be raised to
> 0x0008 for numpy 1.7.
>
> Can we suppose that old code checks its input with PyArray_Check()? I think
> so, but it would be really helpful if people who have been here longer
> than me could confirm or deny this.

Should be 6 in 1.6

# Binary compatibility version number. This number is increased whenever the
# C-API is changed such that binary compatibility is broken, i.e. whenever a
# recompile of extension modules is needed.
C_ABI_VERSION = 0x0109

# Minor API version. This number is increased whenever a change is made to the
# C-API -- whether it breaks binary compatibility or not. Some changes, such
# as adding a function pointer to the end of the function table, can be made
# without breaking binary compatibility. In this case, only the C_API_VERSION
# (*not* C_ABI_VERSION) would be increased. Whenever binary compatibility is
# broken, both C_API_VERSION and C_ABI_VERSION should be increased.
C_API_VERSION = 0x0006

It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing 7 for 1.6?

Chuck
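The bump rule described in the quoted comments can be sketched as a small function. This is only an illustration: `bump_versions` is a hypothetical helper, not something in numpy/core/setup_common.py, and it assumes each bump is an increment of one, which the comments do not actually specify.

```python
def bump_versions(abi, api, breaks_binary_compat):
    """Return the new (C_ABI_VERSION, C_API_VERSION) pair after a C-API change."""
    if breaks_binary_compat:
        # Extension modules must be recompiled: both numbers go up.
        return abi + 1, api + 1
    # Compatible change, e.g. a function pointer appended to the function table:
    # only the minor API version goes up.
    return abi, api + 1


abi, api = 0x0109, 0x0006  # values quoted above for 1.6
abi, api = bump_versions(abi, api, breaks_binary_compat=False)
print(hex(abi), hex(api))
```

Under that assumption, a compatible addition moves the API version from 6 to 7 while the ABI version stays put, which matches the "It's now 7" remark.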
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 3:23 PM, Stéfan van der Walt wrote: > On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris > wrote: > > The advantage of nans, I suppose, is that they are in the hardware and so > > Why are we having a discussion on NAN's in a thread on consensus? > This is a strong indicator of the problem we're facing. > > Stéfan > Good catch! Looks like we got off-track when the discussion talked about forks. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 11:12 AM, Charles R Harris wrote: > The advantage of nans, I suppose, is that they are in the hardware and so Why are we having a discussion on NAN's in a thread on consensus? This is a strong indicator of the problem we're facing. Stéfan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 2:35 PM, Benjamin Root wrote: > On Tue, Apr 24, 2012 at 2:12 PM, Charles R Harris > wrote: >> >> >> >> On Tue, Apr 24, 2012 at 9:25 AM, wrote: >>> >>> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig >>> wrote: >>> > Hi, >>> > >>> > Le 24/04/2012 15:14, Charles R Harris a écrit : >>> >> >>> >> a) All arrays should be implicitly masked, even if the mask isn't >>> >> initially allocated. The maskna keyword can then be removed, taking >>> >> with it the sense that there are two kinds of arrays. >>> >> >>> > >>> > From my lazy user perspective, having masked and non-masked arrays >>> > share >>> > the same "look and feel" would be a number one advantage over the >>> > existing numpy.ma arrays. I would like masked array to be as >>> > transparent >>> > as possible. >>> >>> I don't have any opinion about internal implementation. >>> >>> But users needs to be aware of whether they have masked arrays or not. >>> Since many functions (most of scipy) wouldn't know how to handle NA >>> and don't do any checks, (and shouldn't in my opinion if the NA check >>> is costly). The result might be silently wrong numbers depending on >>> the implementation. >> >> >> There should be a flag saying whether or not NA has been allocated and >> allocation happens when NA is assigned to an array item, so that should be >> fast. I don't think scipy currently deals with masked arrays in all areas,, >> so I believe that the same problem exists there and would also exist for >> missing data types. I think this sort of compatibility problem is worth a >> whole discussion by itself. >> >>> >>> >>> > >>> >> b) There needs to be a distinction between missing and ignore. The >>> >> mechanism for this is already in place in the payload type, although >>> >> it isn't clear to me that that is uniformly used in all the NA code. >>> >> There is also a place for missing *and* ignored. 
Which leads to >>> > >>> > If the idea of having two payloads is to avoid a maximum of "skipna & >>> > friends" extra keywords, I would like it much. My feeling with my small >>> > experience with R is that I end up calling every function with a >>> > different magical set of keywords (na.rm, na.action, ... and I forgot). >>> >>> There is a reason for requiring the user to decide what to do about NA's. >>> Either we have utility functions/methods to help the user change the >>> arrays and treat NA's before calling a function, or the function needs >>> to ask the user what should be done about possible NAs. >>> Doing it automatically might only be useful for specialised packages. >>> >> >> That's what the different payloads would do. I think the common use case >> would always have the ignore bit set. What are the other sorts of actions >> you are interested in, and should they be part of the functions in Numpy, >> such as mean and std, or should they rather implemented in stats packages >> that may be more specialized? I see numpy.ma currently used in the following >> spots in scipy: I think most functions that operate on an axis are mostly unambiguous ignore, std, mean, var, histogram, should stay in numpy, np.cov might have pairwise or row/column wise deletion option (but I don't know what other packages are doing). (While I had to run off, Nathaniel explained this.) The main cases in stats (or statsmodels) for handling NaNs or NAs would be rowwise ignore or pretend temporarily that they are zero or some other neutral value. >> > > Like you said, this whole issue probably should be in a separate discussion, > but I would like to point out here with my thoughts on default payload. If > we don't have some sort of mechanism for flagging which functions are > NA-friendly or not, then it would be wise to have NA default to NaN > behavior. If only to prevent bugs that mess up data from being undetected. 
In scipy.stats it's currently the responsibility of the user, unless explicitly mentioned that a function knows how to handle nans or masked arrays, the default is "we don't check" and what you get returned might be anything. If there is a flag (and a cheap way to verify whether there are NaNs or NAs), then we could just add a check in every function. Josef > > That being said, the determination of NA payload is tricky. Some functions > may need to react differently to an NA. One that comes to mind is > np.gradient(). However, other functions may not need to do anything because > they depend entirely upon other functions that have already been updated to > support NA. > > Cheers! > Ben Root > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
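The "add a check in every function" idea could look roughly like the decorator below. It is a sketch under stated assumptions: NaN stands in for NA (the NA machinery itself is not used), `reject_nan` and `spread` are hypothetical names, and the check is a full scan of the data, which is exactly the cost a per-array flag would avoid.

```python
import functools

import numpy as np


def reject_nan(func):
    """Raise ValueError instead of silently computing on NaN input."""
    @functools.wraps(func)
    def wrapper(values, *args, **kwargs):
        arr = np.asarray(values, dtype=float)
        # Full O(n) scan; a "has NA" flag on the array would make this O(1).
        if np.isnan(arr).any():
            raise ValueError(f"{func.__name__} does not handle NaN input")
        return func(arr, *args, **kwargs)
    return wrapper


@reject_nan
def spread(values):
    # Stand-in for a statistics routine with no NaN handling of its own.
    return values.max() - values.min()
```

Calling `spread([1.0, 4.0, 2.0])` goes through as usual, while any input containing NaN raises instead of returning an arbitrary number.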
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 2:43 PM, Pierre Haessig wrote: > If the idea of having two payloads is to avoid a maximum of "skipna & > friends" extra keywords, I would like it much. My feeling with my small > experience with R is that I end up calling every function with a > different magical set of keywords (na.rm, na.action, ... and I forgot). While I can't in general defend R on consistency grounds, there is a logic to this particular case. Most basic R functions like 'sum' take the na.rm= argument, which can be True or False and is equivalent to the skipna argument we've talked about for ufuncs. The functions that take other arguments (like na.action= for model fitting functions, or use= for their equivalent of np.corrcoef) are the ones that have *more* than 2 ways to handle NAs. E.g. model fitting functions given NAs can raise an error, skip the NA cases, or pass the NA cases through, and the correlation matrix function has different options for what to do with cases where one column has an NA but there are two others that don't. Having a distinction between missing and ignored values doesn't really affect whether you need such options. (If anything I guess it could make such options even more complicated -- what if I want my regression function to error out on missing but skip over ignored values, etc.) - N ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
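To make the "more than 2 ways" point concrete, here is a rough Python sketch of the three model-fitting behaviours mentioned: error out, skip the NA cases, or pass them through. NaN stands in for NA, `apply_na_action` is a hypothetical helper, and the option names merely echo R's `na.fail`, `na.omit` and `na.pass`.

```python
import math


def apply_na_action(rows, na_action="fail"):
    """Preprocess rows of floats before fitting; NaN stands in for NA."""
    def has_na(row):
        return any(math.isnan(x) for x in row)

    if na_action == "fail":
        # Refuse to proceed if any case contains a missing value.
        if any(has_na(row) for row in rows):
            raise ValueError("missing values in input")
        return list(rows)
    if na_action == "omit":
        # Drop every case (row) that contains a missing value.
        return [row for row in rows if not has_na(row)]
    if na_action == "pass":
        # Hand the data on unchanged and let the caller deal with it.
        return list(rows)
    raise ValueError(f"unknown na_action: {na_action!r}")
```

A single boolean such as `skipna` cannot express the third option, which is why these functions end up with a richer keyword than plain reductions like `sum`.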
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 2:12 PM, Charles R Harris wrote: > > > On Tue, Apr 24, 2012 at 9:25 AM, wrote: > >> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig >> wrote: >> > Hi, >> > >> > Le 24/04/2012 15:14, Charles R Harris a écrit : >> >> >> >> a) All arrays should be implicitly masked, even if the mask isn't >> >> initially allocated. The maskna keyword can then be removed, taking >> >> with it the sense that there are two kinds of arrays. >> >> >> > >> > From my lazy user perspective, having masked and non-masked arrays share >> > the same "look and feel" would be a number one advantage over the >> > existing numpy.ma arrays. I would like masked array to be as >> transparent >> > as possible. >> >> I don't have any opinion about internal implementation. >> >> But users needs to be aware of whether they have masked arrays or not. >> Since many functions (most of scipy) wouldn't know how to handle NA >> and don't do any checks, (and shouldn't in my opinion if the NA check >> is costly). The result might be silently wrong numbers depending on >> the implementation. >> > > There should be a flag saying whether or not NA has been allocated and > allocation happens when NA is assigned to an array item, so that should be > fast. I don't think scipy currently deals with masked arrays in all areas,, > so I believe that the same problem exists there and would also exist for > missing data types. I think this sort of compatibility problem is worth a > whole discussion by itself. > > >> >> > >> >> b) There needs to be a distinction between missing and ignore. The >> >> mechanism for this is already in place in the payload type, although >> >> it isn't clear to me that that is uniformly used in all the NA code. >> >> There is also a place for missing *and* ignored. Which leads to >> > >> > If the idea of having two payloads is to avoid a maximum of "skipna & >> > friends" extra keywords, I would like it much. 
My feeling with my small >> > experience with R is that I end up calling every function with a >> > different magical set of keywords (na.rm, na.action, ... and I forgot). >> >> There is a reason for requiring the user to decide what to do about NA's. >> Either we have utility functions/methods to help the user change the >> arrays and treat NA's before calling a function, or the function needs >> to ask the user what should be done about possible NAs. >> Doing it automatically might only be useful for specialised packages. >> >> > That's what the different payloads would do. I think the common use case > would always have the ignore bit set. What are the other sorts of actions > you are interested in, and should they be part of the functions in Numpy, > such as mean and std, or should they rather implemented in stats packages > that may be more specialized? I see numpy.ma currently used in the > following spots in scipy: > > Like you said, this whole issue probably should be in a separate discussion, but I would like to point out here with my thoughts on default payload. If we don't have some sort of mechanism for flagging which functions are NA-friendly or not, then it would be wise to have NA default to NaN behavior. If only to prevent bugs that mess up data from being undetected. That being said, the determination of NA payload is tricky. Some functions may need to react differently to an NA. One that comes to mind is np.gradient(). However, other functions may not need to do anything because they depend entirely upon other functions that have already been updated to support NA. Cheers! Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
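As an illustration of why np.gradient() is a tricky case, the snippet below shows how it treats a NaN today. NaN is used as a stand-in for NA here; nothing about the proposed NA implementation is assumed.

```python
import numpy as np

# One missing sample in the middle of otherwise regular data.
samples = np.array([1.0, np.nan, 3.0, 4.0])

# Default settings: central differences inside, one-sided differences at the ends.
grad = np.gradient(samples)

# Which gradient estimates were contaminated by the missing sample?
nan_positions = np.isnan(grad)
print(nan_positions)
```

With the default settings the NaN appears at the neighbours of the missing sample rather than at the sample itself, because each estimate is built from adjacent values. A function like this cannot simply skip an NA element; it has to decide what a difference across a gap should mean.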
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 12:12 PM, Charles R Harris < charlesr.har...@gmail.com> wrote: > > > On Tue, Apr 24, 2012 at 9:25 AM, wrote: > >> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig >> wrote: >> > Hi, >> > >> > Le 24/04/2012 15:14, Charles R Harris a écrit : >> >> >> >> a) All arrays should be implicitly masked, even if the mask isn't >> >> initially allocated. The maskna keyword can then be removed, taking >> >> with it the sense that there are two kinds of arrays. >> >> >> > >> > From my lazy user perspective, having masked and non-masked arrays share >> > the same "look and feel" would be a number one advantage over the >> > existing numpy.ma arrays. I would like masked array to be as >> transparent >> > as possible. >> >> I don't have any opinion about internal implementation. >> >> But users needs to be aware of whether they have masked arrays or not. >> Since many functions (most of scipy) wouldn't know how to handle NA >> and don't do any checks, (and shouldn't in my opinion if the NA check >> is costly). The result might be silently wrong numbers depending on >> the implementation. >> > > There should be a flag saying whether or not NA has been allocated and > allocation happens when NA is assigned to an array item, so that should be > fast. I don't think scipy currently deals with masked arrays in all areas,, > so I believe that the same problem exists there and would also exist for > missing data types. I think this sort of compatibility problem is worth a > whole discussion by itself. > To clarify a bit, a item could be marked as both missing and ignore. An item that is marked missing will propagate as missing, but if it is also ignored then things like mean and std will skip it. There would also be a clear operation that would clear the ignore bit but keep the missing bit. 
Now I can see the advantage of explicitly specifying behavior in functions, as one knows right on the spot what is intended, whereas with the other alternative one needs to know the history of the array and whether ignore was ever set. But in that sense it is just like having default keyword values, and could be implemented as such.

Chuck
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 9:25 AM, wrote:

> On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig
> wrote:
> > Hi,
> >
> > On 24/04/2012 15:14, Charles R Harris wrote:
> >>
> >> a) All arrays should be implicitly masked, even if the mask isn't
> >> initially allocated. The maskna keyword can then be removed, taking
> >> with it the sense that there are two kinds of arrays.
> >>
> >
> > From my lazy user perspective, having masked and non-masked arrays share
> > the same "look and feel" would be a number one advantage over the
> > existing numpy.ma arrays. I would like masked array to be as transparent
> > as possible.
>
> I don't have any opinion about internal implementation.
>
> But users need to be aware of whether they have masked arrays or not.
> Since many functions (most of scipy) wouldn't know how to handle NA
> and don't do any checks, (and shouldn't in my opinion if the NA check
> is costly). The result might be silently wrong numbers depending on
> the implementation.
>

There should be a flag saying whether or not NA has been allocated, and allocation happens when NA is assigned to an array item, so that should be fast. I don't think scipy currently deals with masked arrays in all areas, so I believe that the same problem exists there and would also exist for missing data types. I think this sort of compatibility problem is worth a whole discussion by itself.

>
> >
> >> b) There needs to be a distinction between missing and ignore. The
> >> mechanism for this is already in place in the payload type, although
> >> it isn't clear to me that that is uniformly used in all the NA code.
> >> There is also a place for missing *and* ignored. Which leads to
> >
> > If the idea of having two payloads is to avoid a maximum of "skipna &
> > friends" extra keywords, I would like it much. My feeling with my small
> > experience with R is that I end up calling every function with a
> > different magical set of keywords (na.rm, na.action, ... and I forgot).
>
> There is a reason for requiring the user to decide what to do about NA's.
> Either we have utility functions/methods to help the user change the
> arrays and treat NA's before calling a function, or the function needs
> to ask the user what should be done about possible NAs.
> Doing it automatically might only be useful for specialised packages.
>

That's what the different payloads would do. I think the common use case would always have the ignore bit set. What are the other sorts of actions you are interested in, and should they be part of the functions in Numpy, such as mean and std, or should they rather be implemented in stats packages that may be more specialized? I see numpy.ma currently used in the following spots in scipy:

scipy/stats/mstats_extras.py
scipy/stats/tests/test_mstats_extras.py
scipy/stats/tests/test_mstats_basic.py
scipy/stats/mstats_basic.py
scipy/signal/filter_design.py
scipy/optimize/optimize.py

The advantage of nans, I suppose, is that they are in the hardware and so already universally part of Numpy. NA would be newly introduced, so it would require a bit more work. I expect it will be several (many) years before NAs are dealt with as a matter of course. At minimum, one would need to check whether the masked flag is set.

Chuck
Re: [Numpy-discussion] What is consensus anyway
Hi,

On Tue, Apr 24, 2012 at 6:14 AM, Charles R Harris wrote:
>
>
> On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez
> wrote:
>>
>> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt
>> wrote:
>> > If you are referring to the traditional concept of a fork, and not to
>> > the type we frequently make on GitHub, then I'm surprised that no one
>> > has objected already. What would a fork solve? To paraphrase the
>> > regexp saying: after forking, we'll simply have two problems.
>>
>> I concur with you here: github 'forks', yes, as many as possible!
>> Hopefully every one of those will produce one or more PRs :) But a
>> fork in the sense of a divergent parallel project? I think that would
>> only be indicative of a complete failure to find a way to make
>> progress here, and I doubt we're anywhere near that state.
>>
>> That forks are *possible* is indeed a valuable and important option in
>> open source software, because it means that a truly dysfunctional
>> original project team/direction can't hold a community hostage
>> forever. But that doesn't mean that full-blown forks should be
>> considered lightly, as they also carry enormous costs.
>>
>> I see absolutely nothing in the current scenario to even remotely
>> consider that a full-blown fork would be a good idea, and I hope I'm
>> right. It seems to me we're making progress on problems that led to
>> real difficulties last year, but from multiple parties I see signs
>> that give me reason to be optimistic that the project is getting
>> better, not worse.
>>
>
> We certainly aren't there at the moment, but I can see us heading that way.
> But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since
> then datetime, NA, polynomial work, and various other enhancements have gone
> in along with some 280 bug fixes. The major technical problem blocking a 1.7
> release is getting datetime working reliably on windows. So I think that is
> where the short term effort needs to be. Meanwhile, we are spending effort
> to get out a 1.6.2 just so people can work with a stable version with some
> of the bug fixes, and potentially we will spend more time and effort to pull
> out the NA code. In the future there may be a transition to C++ and
> eventually a break with the current ABI. Or not.
>
> There are at least two motivations that get folks to write code for open
> source projects, scratching an itch and money. Money hasn't been a big part
> of the Numpy picture so far, so that leaves scratching an itch. One of the
> attractions of Numpy is that it is a small project, BSD licensed, and not
> overburdened with governance and process. This makes scratching an itch not
> as difficult as it would be in a large project. If Numpy remains a small
> project but acquires the encumbrances of a big project much of that
> attraction will be lost. Momentum and direction also attracts people, but
> numpy is stalled at the moment as the whole NA thing circles around once
> again.

I think your assumptions are incorrect, although I have seen them before.

Having no stated process leads to less encumbrance if and only if the implicit process works.

It clearly doesn't work, precisely because the NA thing is circling round and round again. And the governance discussion. And previously the ABI breakage discussion.

If you are on other mailing lists, as I'm sure you are, you'll see that this does not happen to - say - Cython, or Sympy. In particular, I have not seen, on those lists, the current numpy way of simply blocking or avoiding discussion. Everything is discussed out to agreement, or at least until all parties accept the way forward.

At the moment, the only hope I could imagine for the 'no governance is good governance' method is that all those who don't agree would just shut up. It would be more peaceful, but for the reasons stated by Nathaniel, I think that would be a very bad outcome.

Best,

Matthew
Re: [Numpy-discussion] What is consensus anyway
On Tue, Apr 24, 2012 at 9:43 AM, Pierre Haessig wrote:
> Hi,
>
> On 24/04/2012 15:14, Charles R Harris wrote:
>>
>> a) All arrays should be implicitly masked, even if the mask isn't
>> initially allocated. The maskna keyword can then be removed, taking
>> with it the sense that there are two kinds of arrays.
>>
>
> From my lazy user perspective, having masked and non-masked arrays share
> the same "look and feel" would be a number one advantage over the
> existing numpy.ma arrays. I would like masked array to be as transparent
> as possible.

I don't have any opinion about internal implementation.

But users need to be aware of whether they have masked arrays or not, since many functions (most of scipy) wouldn't know how to handle NA and don't do any checks (and shouldn't, in my opinion, if the NA check is costly). The result might be silently wrong numbers, depending on the implementation.

>
>> b) There needs to be a distinction between missing and ignore. The
>> mechanism for this is already in place in the payload type, although
>> it isn't clear to me that that is uniformly used in all the NA code.
>> There is also a place for missing *and* ignored. Which leads to
>
> If the idea of having two payloads is to avoid a maximum of "skipna &
> friends" extra keywords, I would like it much. My feeling with my small
> experience with R is that I end up calling every function with a
> different magical set of keywords (na.rm, na.action, ... and I forgot).

There is a reason for requiring the user to decide what to do about NAs. Either we have utility functions/methods to help the user change the arrays and treat NAs before calling a function, or the function needs to ask the user what should be done about possible NAs. Doing it automatically might only be useful for specialised packages.

My 2c

Josef

>
> My 2 lazy user cents...
>
> Best,
> Pierre
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
Hi,

I finished reading the doc I listed in the other thread. As the NA stuff will be marked as Experimental in numpy 1.7, why not define a new macro like NPY_NA_VERSION that will give the version of the NA C-API? That way, people will be able to detect whether the NA C-API has changed since they wrote their code, which will make it easier to break this interface. We would just need a prominent warning telling people to do this check. The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow removing features. Probably a function like PyArray_GetNACVersion would be useful too.[1]

Continuing on my previous post, old code needs to be changed so that it does not accept NA inputs. With the current trunk, this can be done like this:

    PyObject* an_input = ;
    if (!PyArray_Check(an_input)) {
        PyErr_SetString(PyExc_ValueError, "expected an ndarray");
        %(fail)s
    }
    if (NPY_FEATURE_VERSION >= 0x0008) {
        if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
            PyErr_SetString(PyExc_ValueError,
                            "masked arrays are not supported by this function");
            %(fail)s
        }
    }

In the 1.6.1 release, NPY_FEATURE_VERSION had the value 0x0007. This value hasn't been changed in the trunk. I suppose it will be raised to 0x0008 for numpy 1.7.

Can we assume that old code checks its inputs with PyArray_Check()? I think so, but it would be really helpful if people who have been here longer than me could confirm or deny this.

Frédéric

[1] http://docs.scipy.org/doc/numpy/reference/c-api.array.html#checking-the-api-version
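To make the NPY_NA_VERSION proposal concrete, here is a rough, self-contained sketch of how an extension might use such a macro. Everything in it is an assumption: NPY_NA_VERSION does not exist in NumPy and is given a stand-in definition here, and EXT_EXPECTED_NA_VERSION and na_api_matches are hypothetical names. The run-time argument stands in for whatever a function like the suggested PyArray_GetNACVersion would return.

```c
/* Hypothetical: under the proposal this would come from NumPy's headers.
   The stand-in definition only exists so the sketch compiles on its own. */
#ifndef NPY_NA_VERSION
#define NPY_NA_VERSION 0x0001u
#endif

/* The NA C-API version this (imaginary) extension was written against. */
#define EXT_EXPECTED_NA_VERSION 0x0001u

/* Compile-time check: refuse to build against a different NA C-API. */
#if NPY_NA_VERSION != EXT_EXPECTED_NA_VERSION
#error "NA C-API version changed; review the NA handling in this extension"
#endif

/* Run-time check: compare the version reported by the loaded library
   with the one the extension expects. Returns 1 on a match, 0 otherwise. */
int na_api_matches(unsigned int runtime_version)
{
    return runtime_version == EXT_EXPECTED_NA_VERSION;
}
```

A separate version number for the experimental part would let the NA interface change, or disappear, without touching NPY_FEATURE_VERSION.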
Re: [Numpy-discussion] What is consensus anyway
Hi,

On 24/04/2012 15:14, Charles R Harris wrote:
>
> a) All arrays should be implicitly masked, even if the mask isn't
> initially allocated. The maskna keyword can then be removed, taking
> with it the sense that there are two kinds of arrays.
>

From my lazy user perspective, having masked and non-masked arrays share the same "look and feel" would be a number one advantage over the existing numpy.ma arrays. I would like masked arrays to be as transparent as possible.

> b) There needs to be a distinction between missing and ignore. The
> mechanism for this is already in place in the payload type, although
> it isn't clear to me that that is uniformly used in all the NA code.
> There is also a place for missing *and* ignored. Which leads to

If the idea of having two payloads is to avoid as many "skipna & friends" extra keywords as possible, I would like it very much. My feeling, from my small experience with R, is that I end up calling every function with a different magical set of keywords (na.rm, na.action, ... and I forgot).

My 2 lazy user cents...

Best,
Pierre
Re: [Numpy-discussion] What is consensus anyway
On Mon, Apr 23, 2012 at 11:35 PM, Fernando Perez wrote:

> On Mon, Apr 23, 2012 at 8:49 PM, Stéfan van der Walt
> wrote:
> > If you are referring to the traditional concept of a fork, and not to
> > the type we frequently make on GitHub, then I'm surprised that no one
> > has objected already. What would a fork solve? To paraphrase the
> > regexp saying: after forking, we'll simply have two problems.
>
> I concur with you here: github 'forks', yes, as many as possible!
> Hopefully every one of those will produce one or more PRs :) But a
> fork in the sense of a divergent parallel project? I think that would
> only be indicative of a complete failure to find a way to make
> progress here, and I doubt we're anywhere near that state.
>
> That forks are *possible* is indeed a valuable and important option in
> open source software, because it means that a truly dysfunctional
> original project team/direction can't hold a community hostage
> forever. But that doesn't mean that full-blown forks should be
> considered lightly, as they also carry enormous costs.
>
> I see absolutely nothing in the current scenario to even remotely
> consider that a full-blown fork would be a good idea, and I hope I'm
> right. It seems to me we're making progress on problems that led to
> real difficulties last year, but from multiple parties I see signs
> that give me reason to be optimistic that the project is getting
> better, not worse.
>
>

We certainly aren't there at the moment, but I can see us heading that way. But let's back up a bit. Numpy 1.6.0 came out just about 1 year ago. Since then datetime, NA, polynomial work, and various other enhancements have gone in along with some 280 bug fixes. The major technical problem blocking a 1.7 release is getting datetime working reliably on Windows. So I think that is where the short term effort needs to be. Meanwhile, we are spending effort to get out a 1.6.2 just so people can work with a stable version with some of the bug fixes, and potentially we will spend more time and effort to pull out the NA code. In the future there may be a transition to C++ and eventually a break with the current ABI. Or not.

There are at least two motivations that get folks to write code for open source projects: scratching an itch and money. Money hasn't been a big part of the Numpy picture so far, so that leaves scratching an itch. One of the attractions of Numpy is that it is a small project, BSD licensed, and not overburdened with governance and process. This makes scratching an itch not as difficult as it would be in a large project. If Numpy remains a small project but acquires the encumbrances of a big project, much of that attraction will be lost. Momentum and direction also attract people, but numpy is stalled at the moment as the whole NA thing circles around once again.

What would I suggest as a way forward with the NA option? Let's take the issues.

1) Adding slots to PyArrayObject_fields. I don't think this is likely to be a problem unless someone's code passes the struct by value or uses assignment to initialize a statically allocated instance. I'm not saying no one does that; low level scientific code can contain all sorts of bizarre and astonishing constructs, and it is also possible that these sorts of things might turn up in an old FORTRAN program. The question here is whether to allow any changes at all, and I think we will have to in the future. Given that, consistent use of accessors will make later changes to the organization or implementation of the base structure transparent. Numpy itself now uses accessors for the heritage slots, but not for the new NA slots. So I suggest at a minimum adding accessors for the maskna_dtype, maskna_data, and maskna_strides. Of course, later removing these slots will still remain a problem.

2) NA.
This breaks down into API and implementation issues. Personally, I think marking the NA stuff experimental leaves room to modify both and would prefer to go with what we have and change it into whatever looks best by modification through pull requests. This kicks the can down the road, but not so far that people sufficiently interested in working on the topic can't get modifications in. My own preferences for future API modifications are as follows.

a) All arrays should be implicitly masked, even if the mask isn't initially allocated. The maskna keyword can then be removed, taking with it the sense that there are two kinds of arrays.

b) There needs to be a distinction between missing and ignore. The mechanism for this is already in place in the payload type, although it isn't clear to me that that is uniformly used in all the NA code. There is also a place for missing *and* ignored. Which leads to

c) Sums, etc. should always skip ignored data. If missing data is present, but not ignored, then a sum should return NA.

The main danger I see here is that the behavior of arrays becomes state dependent, something that can lead to subtle problems. Ex
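The accessor suggestion in point 1) of this message can be sketched in a few lines of C. The struct below is a deliberately simplified stand-in, not the real PyArrayObject_fields, and every name prefixed with toy_ is made up for illustration. The idea is only that callers go through small functions rather than touching the mask slots directly, so the layout can change later without breaking them.

```c
#include <stddef.h>

/* Simplified stand-in for an array struct with NA mask slots.
   The real PyArrayObject_fields has more members and different types. */
typedef struct {
    char      *data;
    int        nd;
    char      *maskna_data;     /* NULL until a mask is allocated */
    ptrdiff_t *maskna_strides;
} toy_array;

/* Accessors: client code calls these instead of reading the fields,
   so the fields can later move or be computed differently. */
static inline char *toy_maskna_data(const toy_array *a)
{
    return a->maskna_data;
}

static inline ptrdiff_t *toy_maskna_strides(const toy_array *a)
{
    return a->maskna_strides;
}

/* Cheap "is a mask allocated?" query, in the spirit of the flag
   discussed elsewhere in the thread. */
static inline int toy_has_mask(const toy_array *a)
{
    return a->maskna_data != NULL;
}
```

Code written against toy_has_mask() and the two getters would not need to change if the mask were later stored somewhere else, which is the transparency argued for above.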
[Numpy-discussion] [ANN] Optimization with categorical variables, disjunctive (and other logical) constraints
Hi all,

The free solver interalg, for global nonlinear optimization with specifiable accuracy, can now handle categorical variables and disjunctive (and other logical) constraints, which makes it usable for solving GDP, possibly in multiobjective form.

There are ~ 2 months until the next OpenOpt release, but I guess someone may find it useful for their purposes right now.

See here for more details.

Regards, D.