Re: [Numpy-discussion] Proposed Roadmap Overview
Hi, On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire cjord...@uw.edu wrote: On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden stu...@molden.no wrote: On 18 Feb 2012, at 05:01, Jason Grout jason-s...@creativetrax.com wrote: On 2/17/12 9:54 PM, Sturla Molden wrote: We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge. I personally would love such a thing. It's been a while since I did anything nontrivial on my own in C++. One example: How do we code multiple return values? In Python: - Return a tuple. In C: - Use pointers (evilness) In C++: - Return a std::tuple, as you would in Python. - Use references, as you would in Fortran or Pascal. - Use pointers, as you would in C. C++ textbooks always pick the last... I would show the first and the second method, and perhaps intentionally forget the last. Sturla On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. At the time, was the numpy support buggy? I personally haven't had many problems with Cython and numpy. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. Yes, it takes some practice to get used to what Cython will do, and how to optimize the output. As Sturla has said, regardless of the quality of the current product, it isn't stable. I've personally found it more or less rock solid. Could you say what you mean by it isn't stable? Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposed Roadmap Overview
Hi, again (sorry), On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire cjord...@uw.edu wrote: On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time. On the other hand, in the current development community around numpy, and among the subscribers to this mailing list, I suspect there is more Cython experience than C++ experience. Of course it might be that so-far undiscovered C++ developers are drawn to a C++ rewrite of Numpy. But is that really likely? I can see a C++ developer being drawn to a C++ performance library they would use in their C++ applications, but it's harder for me to imagine a C++ programmer being drawn to a Python library because the internals are C++. Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted franc...@continuum.io wrote: On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: But in the very end, when agreement can't be reached by other means, the developers are the ones making the calls. (This is simply a consequence of the fact that they are the only ones who can credibly threaten to fork the project.) Interesting point. I hope I'm not pitching a log onto the fire here, but in numpy's case, there are very many capable developers on other projects who depend on numpy who could credibly threaten a fork if they felt numpy was drastically going wrong. Jason, that there are capable developers out there who are able to fork NumPy (or any other project you can think of) is a given. The point Dag was making is that this threat is more likely to come from *inside* the community. And you pointed out an important aspect too by saying if they felt numpy was drastically going wrong. I get the impression that some people are very frightened that something really bad will happen, well before it happens. While I agree that this is *possible*, I'd also advocate giving Travis the benefit of the doubt. I'm convinced he (and Continuum as a whole) is making things happen that will benefit the entire NumPy community; but in case something goes really wrong and catastrophic, it is always a relief to know that things can be reverted in the pure open source tradition (by either doing a fork, creating a new foundation, or even better, proposing a new way to do things). What does not sound reasonable to me is to allow fear to block Continuum's efforts to make a better NumPy. I think it is better to relax a bit, see how things are going, and then judge by looking at the *results*. I'm finding this conversation a bit frustrating. 
The question on the table as I understand it, is just the following: Is there any governance structure / procedure / set of guidelines that would help ensure the long-term health of the numpy project? The subtext of your response is that you regard *any structure at all* as damaging to the numpy effort and in particular, as damaging to the efforts of Continuum. It seems to me that is a very extreme point of view, and I think, honestly, it is not tenable. But surely - surely - the best thing to do here is to formulate something that might be acceptable, and for everyone to say what they think the problems would be. Do you agree? Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, Just for my own sake, can I clarify what you are saying here? On Thu, Feb 16, 2012 at 1:11 PM, Travis Oliphant tra...@continuum.io wrote: I'm not a big fan of design-by-committee as I haven't seen it be very successful in creating new technologies. It is pretty good at enforcing the status-quo. If I felt like that is what NumPy needed I would be fine with it. Was it your impression that what was being proposed, was design by committee? However, I feel that NumPy is going to be surpassed with other solutions if steps are not taken to improve the code-base *and* add new features. As far as you are concerned, is there any controversy about that? For the next 6-12 months, I am comfortable taking the benevolent dictator role. During that time, I hope we can find many more core developers and then re-visit the discussion. My view is that design decisions should be a consensus based on current contributors to the code base and major users. To continue to be relevant, NumPy has to serve its customers. They are the ones who will have the final say. If others feel like they can do better, a fork is an option. I don't want that to happen, but it is the only effective and practical governance structure that exists in my mind outside of the self-governance of the people that participate. To confirm, you are saying that you can imagine no improvement in the current governance structure? No organizational structure can make up for the lack of great people putting their hearts and efforts into a great cause. But you agree that there might be an organizational structure that would make this harder or easier? Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Thu, Feb 16, 2012 at 3:58 PM, Travis Oliphant tra...@continuum.io wrote: Matthew, What you should take from my post is that I appreciate your concern for the future of the NumPy project, and am grateful that you have an eye to the sort of things that can go wrong --- it will help ensure they don't go wrong. But, I personally don't agree that it is necessary to put any more formal structure in place at this time, and we should wait for 6-12 months, and see where we are at while doing everything we can to get more people interested in contributing to the project. I'm comfortable playing the role of BDF12 with a cadre of developers/contributors who seek to come to consensus. I believe there are sufficient checks on the process that will make it quite difficult for me to *abuse* that in the short term. Charles, Rolf, Mark, David, Robert, Josef, you, and many others are already quite adept at calling me out when I do things they don't like or think are problematic. I encourage them to continue this. I can't promise I'll do everything you want, but I can promise I will listen and take your opinions seriously --- just like I take the opinions of every contributor to the NumPy and SciPy lists seriously (though weighted by the work-effort they have put into the project). We can all only continue to do our best to help out wherever we can. Just so we are clear: Continuum's current major client is the larger NumPy/SciPy community itself and this will remain the case for at least several months. You have nothing to fear from other clients we are trying to please. Thus, we are incentivized to keep as many people happy as possible. In the second place, the Foundation's major client is the same community (and even broader) and the rest of the board is committed to the overall success of the ecosystem. There is a reason the board is comprised of a wide representation of that ecosystem. 
I am very hopeful that numfocus will evolve over time to have an active community of people who participate in its processes and plans to support as many projects as it can given the bandwidth and funding available to it. So, if I don't participate in this discussion anymore, it's because I am working on some open-source things I'd like to show at PyCon, and time is clicking down. If you really feel strongly about this, then I would suggest that you come up with a proposal for governance that you would like us all to review. At the SciPy conference in Austin this summer we can talk about it --- when many of us will be face-to-face. This has not been an encouraging episode in striving for consensus. I see virtually no movement from your implied position at the beginning of this thread, other than the following 1) yes you are in charge 2) you'll consider other options in 6 to 12 months. I think you're saying here that you won't reply any more on this thread, and I suppose that reflects the importance you attach to this problem. I will not myself propose a governance model because I do not consider myself to have enough influence (on various metrics) to make it likely it would be supported. I wish that wasn't my perception of how things are done here. Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Thu, Feb 16, 2012 at 5:26 PM, Alan G Isaac alan.is...@gmail.com wrote: On 2/16/2012 7:22 PM, Matthew Brett wrote: This has not been an encouraging episode in striving for consensus. Striving for consensus does not mean that a minority automatically gets veto rights. 'Striving' for consensus does imply some attempt to get to grips with the arguments, and working on some compromise to accommodate both parties. It seems to me there was very great latitude for finding such a compromise here, but Travis has terminated the discussion and I see no sign of a compromise. Striving for consensus can't of course be regulated. The desire has to be there. It's probably true, as Nathaniel says, that there isn't much you can do to legislate on that. We can only try to persuade. I was trying to do that, I failed, I'll have to look back and see if there was something else I could have done that would have been more useful to the same end, Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi John, On Thu, Feb 16, 2012 at 8:20 PM, John Hunter jdh2...@gmail.com wrote: On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac alan.is...@gmail.com wrote: On 2/16/2012 7:22 PM, Matthew Brett wrote: This has not been an encouraging episode in striving for consensus. I disagree. Failure to reach consensus does not imply lack of striving. Hey Alan, thanks for your thoughtful and nuanced views. I agree with everything you've said, but have a few additional points. I thought I'd looked deep in my heart and failed to find paranoia about corporate involvement in numpy. I am happy that Travis formed Continuum and look forward to the progress we can expect for numpy. I don't think the conversation was much about 'democracy'. As far as I was concerned, anything in the range from 'no change, but at least being specific' to 'full veto power for mailing list members' was up for discussion, and anything in between. I wish we had not had to deal with the various red herrings here, such as whether Continuum is good or bad, whether Travis has been given adequate credit, or whether companies are bad for software. But, we did. It's fine. Argument over now. Best, Matthew
Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)
Hi, On Thu, Feb 16, 2012 at 10:11 PM, Travis Oliphant tra...@continuum.io wrote: The OS X slaves (especially PPC) are very valuable for testing. We have an intern who could help keep the build-bots going if you would give her access to those machines. Thanks for being willing to offer them. No problem. The OSX machines should be reliably available. Please do put your intern in touch, I'll give her access. Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac alan.is...@gmail.com wrote: On 2/14/2012 10:07 PM, Bruce Southey wrote: The one thing that gets overlooked here is that there is a huge diversity of users with very different skill levels. But very few people have an understanding of the core code. (In fact the other thread about type-casting suggests that it is extremely few people.) So in all of this, I do not yet see 'community'. As an active user and long-time list member who has never even looked at the core code, I perhaps presumptuously urge a moderation of rhetoric. I object to the idea that users like myself do not form part of the community. This list has 1400 subscribers, and the fact that most of us are quiet most of the time does not mean we are not interested or attentive to the discussions, including discussions of governance. It looks to me like this will be great for NumPy. People who would otherwise not be able to spend much time on NumPy will be spending a lot of time improving the code and adding features. In my view, this will help NumPy advance which will enlarge the user community, which will slowly but inevitably enlarge the contributor community. I'm pretty excited about Travis's bold efforts to find ways to allow him and others to spend more time on NumPy. I wish him the best of luck. I think it is important to stick to the thread topic here, which is 'Governance'. It's not about whether it is good or bad that Travis has re-engaged in Numpy and is funding development in Numpy through his company. I'm personally very glad to see Travis back on the list and engaged again, but that's really not what the thread is about. The thread is about whether we need explicit Numpy governance, especially in the situation where one new company will surely dominate numpy development in the short term at least. 
I would say - for the benefit of Continuum Analytics and for the Numpy community, there should be explicit governance, that takes this relationship into account. I believe that leaving the governance informal and underspecified at this stage would be a grave mistake, for everyone concerned. Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, Thanks for these interesting and specific questions. On Wed, Feb 15, 2012 at 11:33 AM, Eric Firing efir...@hawaii.edu wrote: On 02/15/2012 08:50 AM, Matthew Brett wrote: Hi, On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac alan.is...@gmail.com wrote: On 2/14/2012 10:07 PM, Bruce Southey wrote: The one thing that gets overlooked here is that there is a huge diversity of users with very different skill levels. But very few people have an understanding of the core code. (In fact the other thread about type-casting suggests that it is extremely few people.) So in all of this, I do not yet see 'community'. As an active user and long-time list member who has never even looked at the core code, I perhaps presumptuously urge a moderation of rhetoric. I object to the idea that users like myself do not form part of the community. This list has 1400 subscribers, and the fact that most of us are quiet most of the time does not mean we are not interested or attentive to the discussions, including discussions of governance. It looks to me like this will be great for NumPy. People who would otherwise not be able to spend much time on NumPy will be spending a lot of time improving the code and adding features. In my view, this will help NumPy advance which will enlarge the user community, which will slowly but inevitably enlarge the contributor community. I'm pretty excited about Travis's bold efforts to find ways to allow him and others to spend more time on NumPy. I wish him the best of luck. I think it is important to stick to the thread topic here, which is 'Governance'. Do you have in mind a model of how this might work? (I suspect you have already answered a question like that in some earlier thread; sorry.) A comparable project that is doing it right? 
The example that had come up previously was the book by Karl Fogel: http://producingoss.com/en/social-infrastructure.html http://producingoss.com/en/consensus-democracy.html In particular, the section When Consensus Cannot Be Reached, Vote in the second page. Here's an example of a voting policy: http://www.apache.org/foundation/voting.html Debian is a famous example: http://www.debian.org/devel/constitution Obviously some open-source projects do not have much of a formal governance structure, but I think in our case a) we have already run into problems with big decisions and b) we have now reached a situation where there is serious potential for actual or perceived problems with conflicts of interest. Governance implies enforcement power, doesn't it? Where, how, and by whom would the power be exercised? The governance that I had in mind is more to do with review and constraint of power. Thus, I believe we need a set of rules to govern how we deal with serious disputes, such as the masked array NA debate, or, previously the ABI breakage discussion at numpy 1.5.0. To go to a specific use-case. Let us imagine that Continuum think of an excellent feature they want in Numpy but that many others think would make the underlying array object too complicated. How would the desires of Continuum be weighed against the desires of other members of the community? It's not about whether it is good or bad that Travis has re-engaged in Numpy and is funding development in Numpy through his company. I'm personally very glad to see Travis back on the list and engaged again, but that's really not what the thread is about. The thread is about whether we need explicit Numpy governance, especially in the situation where one new company will surely dominate numpy development in the short term at least. I would say - for the benefit of Continuum Analytics and for the Numpy community, there should be explicit governance, that takes this relationship into account. 
Please elaborate; are you saying that Continuum Analytics must develop numpy as decided by some outside body? No - of course not. Here's the discussion from Karl Fogel's book: http://producingoss.com/en/contracting.html I'm proposing Governance not as some council that contracts work, but as a committee set up with formal rules that can resolve disputes and rule changes as they arise. This committee needs to be able to do this to make sure that the interests of the community (developers of numpy outside Continuum) are being represented. Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root ben.r...@ou.edu wrote: On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac alan.is...@gmail.com wrote: Can you provide an example where a more formal governance structure for NumPy would have meant more or better code development? (Please do not suggest the NA discussion!) Why not the NA discussion? Would we really want to have that happen again? Note that it still isn't fully resolved and progress still needs to be made (I think the last thread did an excellent job of fleshing out the ideas, but it became too much to digest. We may need to have someone go through the information, reduce it down and make one last push to bring it to a conclusion). The NA discussion is the perfect example where a governance structure would help resolve disputes. Yes, that was the most obvious example. I don't know about you, but I can't see any sign of that one being resolved. The other obvious example was the dispute about ABI breakage for numpy 1.5.0 where I believe Travis did invoke some sort of committee to vote, but (Travis can correct me if I'm wrong), the committee was named ad-hoc and contacted off-list. Can you provide an example of what you might envision as a more formal governance structure? (I assume that any such structure will not put people who are not core contributors to NumPy in a position to tell core contributors what to spend their time on.) Early last December, Chuck Harris estimated that three people were active NumPy developers. I liked the idea of creating a board of these 3 and a rule that says any active developer can request to join the board, that additions are determined by majority vote of the existing board, and that having the board both small and odd numbered is a priority. I also suggested inviting to this board a developer or two from important projects that are very NumPy dependent (e.g., Matplotlib). I still like this idea. Would it fully satisfy you? I actually like that idea. 
Matthew, is this along the lines of what you were thinking? Honestly it would make me very happy if the discussion moved to what form the governance should take. I would have thought that 3 was too small a number. We should look at what other projects do. I think that this committee needs to be people who know numpy code; projects using numpy could advise, but people developing numpy should vote, I think. There should be rules of engagement, a constitution, especially how to deal with disputes with Continuum or other contracting organizations. I would personally very much like to see a commitment to consensus, where possible, on these lines (as noted previously by Nathaniel): http://producingoss.com/en/consensus-democracy.html Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe mwwi...@gmail.com wrote: On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root ben.r...@ou.edu wrote: On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac alan.is...@gmail.com wrote: Can you provide an example where a more formal governance structure for NumPy would have meant more or better code development? (Please do not suggest the NA discussion!) Why not the NA discussion? Would we really want to have that happen again? Note that it still isn't fully resolved and progress still needs to be made (I think the last thread did an excellent job of fleshing out the ideas, but it became too much to digest. We may need to have someone go through the information, reduce it down and make one last push to bring it to a conclusion). The NA discussion is the perfect example where a governance structure would help resolve disputes. Yes, that was the most obvious example. I don't know about you, but I can't see any sign of that one being resolved. The other obvious example was the dispute about ABI breakage for numpy 1.5.0 where I believe Travis did invoke some sort of committee to vote, but (Travis can correct me if I'm wrong), the committee was named ad-hoc and contacted off-list. Can you provide an example of what you might envision as a more formal governance structure? (I assume that any such structure will not put people who are not core contributors to NumPy in a position to tell core contributors what to spend their time on.) Early last December, Chuck Harris estimated that three people were active NumPy developers. I liked the idea of creating a board of these 3 and a rule that says any active developer can request to join the board, that additions are determined by majority vote of the existing board, and that having the board both small and odd numbered is a priority. 
I also suggested inviting to this board a developer or two from important projects that are very NumPy dependent (e.g., Matplotlib). I still like this idea. Would it fully satisfy you? I actually like that idea. Matthew, is this along the lines of what you were thinking? Honestly it would make me very happy if the discussion moved to what form the governance should take. I would have thought that 3 was too small a number. One thing to note about this point is that during the NA discussion, the only people doing active C-level development were Charles and me. I suspect a discussion about how to recruit more people into that group might be more important than governance at this point in time. Mark - a) thanks for replying, it's good to hear your voice and b) I don't think there's any competition between the discussion about governance and the need to recruit more people into the group who understand the C code. Remember we are deciding here between governance - of a form to be decided - and no governance - which I think is the current situation. I know your desire is to see more people contributing to the C code. It would help a lot if you could say what you think the barriers are, how they could be lowered, and the risks that you see as a result of the numpy C expertise moving essentially into one company. Then we can formulate some governance that would help lower those barriers and reduce those risks. If we need a formal structure, maybe a good approach is giving Travis the final say for now, until a trigger point occurs. That could be 6 months after the number of active developers hits 5, or something like that. At that point, we would reopen the discussion with a larger group of people who would directly play in that role, and any decision made then will probably be better than a decision we make now while the development team is so small. 
Honestly - as I was saying to Alan and indirectly to Ben - any formal model - at all - is preferable to the current situation. Personally, I would say that making the founder of a company, which is working to make money from Numpy, the only decision maker on numpy - is - scary. But maybe it's the best way. But, again, we're all high-functioning sensible people, I'm sure it's possible for us to formulate what the risks are, what the potential solutions are, and come up with the best - maybe short-term - solution, See you, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac alan.is...@gmail.com wrote: My analysis is fundamentally different than Matthew and Benjamin's for a few reasons. 1. The problem has been miscast. The economic interests of the developers *always* has had an apparent conflict with the economic interests of the users: users want developers to work more on the code, and developers need to make a living, which often involves spending their time on other things. On this score, nothing has really changed. 2. It seems pretty clear that Matthew wants some governance power to be held by individuals who are not actively developing NumPy. As Chuck Harris pointed out long ago, that dog ain't going to hunt. 3. Constitutions can be broken (and are, all the time). Designing a stable institution requires making it in the interests of the members to participate. Any formal governance structure that can be desirable for the NumPy community as a whole has to be desirable for the core developers. The right way to produce a governance structure is to make concrete proposals and show how these proposals are in the interest of the *developers* (as well as of the users). For example, Benjamin obliquely suggested that with an appropriate governance board, the NA discussion could have simply been shut down by having the developers vote (as part of their governance). This might be in the interest of the developers and of the community (I'm not sure), but I doubt it is what Matthew has in mind. In any case, until proposals are put on the table along with a clear effort to illustrate why it is in the interest of the *developers* to adopt the proposals, I really do not see this discussion moving forward. That's helpful, it would be good to discuss concrete proposals. Would you care to flesh out your proposal in more detail or is it as you quoted it before? Where do you stand on the desirability of consensus? 
Do you have any suggestions on how to ensure that the non-Continuum community has sufficient weight in decision making? Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 2:30 PM, Peter Wang pw...@streamitive.com wrote: On Feb 15, 2012, at 3:36 PM, Matthew Brett wrote: Honestly - as I was saying to Alan and indirectly to Ben - any formal model - at all - is preferable to the current situation. Personally, I would say that making the founder of a company, which is working to make money from Numpy, the only decision maker on numpy - is - scary. How is this different from the situation of the last 4 years? Travis was President at Enthought, which makes money from not only Numpy but SciPy as well. In addition to employing Travis, Enthought also employs many other key contributors to Numpy and Scipy, like Robert and David. The difference is fairly obvious to me, but stop me if I'm wrong. First - although Enthought was in a position to influence numpy development, it didn't very much, partly, I suppose, because Travis did not have time to contribute to numpy. The exception is of course the masked array stuff by Mark that caused a lot of controversy. Furthermore, the Scipy and Numpy mailing lists and repos and web pages were all hosted at Enthought. If they didn't like how a particular discussion was going, they could have memory-holed the entire conversation from the archives, or worse yet, revoked commit access and reverted changes. Obviously we should be realistic about the risks. Situations like that are very unlikely. But such things never transpired, and of course most of us know that such things would never happen. Right. I don't see why the current situation is any different from the previous situation, other than the fact that Travis actually plans on actively developing Numpy again, and that hardly seems scary. It would be silly to be worried about Travis contributing to numpy, in general. Best, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 02/15/2012 02:24 PM, Mark Wiebe wrote: On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett matthew.br...@gmail.com mailto:matthew.br...@gmail.com wrote: Hi, On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe mwwi...@gmail.com mailto:mwwi...@gmail.com wrote: On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett matthew.br...@gmail.com mailto:matthew.br...@gmail.com wrote: Hi, On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root ben.r...@ou.edu mailto:ben.r...@ou.edu wrote: On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac alan.is...@gmail.com mailto:alan.is...@gmail.com wrote: Can you provide an example where a more formal governance structure for NumPy would have meant more or better code development? (Please do not suggest the NA discussion!) Why not the NA discussion? Would we really want to have that happen again? Note that it still isn't fully resolved and progress still needs to be made (I think the last thread did an excellent job of fleshing out the ideas, but it became too much to digest. We may need to have someone go through the information, reduce it down and make one last push to bring it to a conclusion). The NA discussion is the perfect example where a governance structure would help resolve disputes. Yes, that was the most obvious example. I don't know about you, but I can't see any sign of that one being resolved. The other obvious example was the dispute about ABI breakage for numpy 1.5.0 where I believe Travis did invoke some sort of committee to vote, but (Travis can correct me if I'm wrong), the committee was named ad-hoc and contacted off-list. Can you provide an example of what you might envision as a more formal governance structure? (I assume that any such structure will not put people who are not core contributors to NumPy in a position to tell core contributors what to spend their time on.) 
Early last December, Chuck Harris estimated that three people were active NumPy developers. I liked the idea of creating a board of these 3 and a rule that says any active developer can request to join the board, that additions are determined by majority vote of the existing board, and that having the board both small and odd numbered is a priority. I also suggested inviting to this board a developer or two from important projects that are very NumPy dependent (e.g., Matplotlib). I still like this idea. Would it fully satisfy you? I actually like that idea. Matthew, is this along the lines of what you were thinking? Honestly it would make me very happy if the discussion moved to what form the governance should take. I would have thought that 3 was too small a number. One thing to note about this point is that during the NA discussion, the only people doing active C-level development were Charles and me. I suspect a discussion about how to recruit more people into that group might be more important than governance at this point in time. Mark - a) thanks for replying, it's good to hear your voice and b) I don't think there's any competition between the discussion about governance and the need to recruit more people into the group who understand the C code. There hasn't really been any discussion about recruiting developers to compete with the governance topic, now we can let the topics compete. :) Some of the mechanisms which will help are already being set in motion through the discussion about better infrastructure support like bug trackers and continuous integration. The forthcoming roadmap discussion Travis alluded to, where we will propose a roadmap for review by the numpy user community, will include many more such points. Remember we are deciding here between governance - of a form to be decided - and no governance - which I think is the current situation. I know your desire is to see more people contributing to the C code. 
It would help a lot if you could say what you think the barriers are, how they could be lowered, and the risks that you see as a result of the numpy C expertise moving essentially into one company. Then we can formulate some governance that would help lower those barriers and reduce those risks. There certainly is governance now, it's just informal. It's a combination of how the design discussions
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 02/15/2012 02:24 PM, Mark Wiebe wrote: There certainly is governance now, it's just informal. It's a combination of how the design discussions are carried out, how pull requests occur, and who has commit rights. +1 If non-contributing users came along on the Cython list demanding that we set up a system to select non-developers along on a board that would have discussions in order to veto pull requests, I don't know whether we'd ignore it or ridicule it or try to show some patience, but we certainly wouldn't take it seriously. In the spirit (as I read) of Dag's post, maybe we should accept that this thread is not going anywhere much, and summarize: The current situation is the following: Travis is de-facto BDFL for Numpy Disputes get resolved by convening an ad-hoc group of interested and / or active developers to resolve or vote, maybe off-list. How this happens is for Travis to call. I think that's reasonable? As far as I can make out, in favor of the current status quo with no significant modification are: Travis (is that right)? Mark Peter Bryan vdv Perry Dag In favor of some sort of formalization of governance to be decided are: Me Ben R (did I get that right?) Bruce Southey Souheil Inati TJ Joe H I am not quite sure which side of that fence are: Josef Alan Chuck If I missed someone who gave an opinion - sorry - please do speak up. I think it's clear that if - you, Travis, don't want to go this direction, there isn't much chance of anything happening, and I think those of us who think something needs doing will have to keep quiet, as Dag suggests. I would only suggest that you (Travis) specify that you will take the BDFL role so that we can be clear about the informal governance at least. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 6:07 PM, josef.p...@gmail.com wrote: On Wed, Feb 15, 2012 at 8:49 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 02/15/2012 02:24 PM, Mark Wiebe wrote: There certainly is governance now, it's just informal. It's a combination of how the design discussions are carried out, how pull requests occur, and who has commit rights. +1 If non-contributing users came along on the Cython list demanding that we set up a system to select non-developers along on a board that would have discussions in order to veto pull requests, I don't know whether we'd ignore it or ridicule it or try to show some patience, but we certainly wouldn't take it seriously. In the spirit (as I read) of Dag's post, maybe we should accept that this thread is not going anywhere much, and summarize: The current situation is the following: Travis is de-facto BDFL for Numpy Disputes get resolved by convening an ad-hoc group of interested and / or active developers to resolve or vote, maybe off-list. How this happens is for Travis to call. I think that's reasonable? As far as I can make out, in favor of the current status quo with no significant modification are: Travis (is that right)? Mark Peter Bryan vdv Perry Dag In favor of some sort of formalization of governance to be decided are: Me Ben R (did I get that right?) Bruce Southey Souheil Inati TJ Joe H I am not quite sure which side of that fence are: Josef Actually in the sense of separation of powers, I would vote for Chuck as president, Travis as prime minister and an independent release manager as supreme court, and the noisy mailing list community as parliament. That sounds dangerously Canadian ... But actually - I was hoping for an answer to whether you felt there was a need for a more formal governance structure, or not. (I don't see a constitution yet.) 
My feeling is there is not enough appetite for any change for that to be worth thinking about, but I might be wrong. See you, Matthew
Re: [Numpy-discussion] Numpy governance update
Hi, On Wed, Feb 15, 2012 at 9:47 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 02/15/2012 05:02 PM, Matthew Brett wrote: Hi, On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 02/15/2012 02:24 PM, Mark Wiebe wrote: There certainly is governance now, it's just informal. It's a combination of how the design discussions are carried out, how pull requests occur, and who has commit rights. +1 If non-contributing users came along on the Cython list demanding that we set up a system to select non-developers along on a board that would have discussions in order to veto pull requests, I don't know whether we'd ignore it or ridicule it or try to show some patience, but we certainly wouldn't take it seriously. Ouch. Is that me, one of the non-contributing users? Was I suggesting that we set up a system to select non-developers to a board? I must say, now you mention it, I do feel a bit ridiculous. In retrospect I was unfair and my email way too harsh. Anyway, I'm really happy with your follow-up in turning this into something more constructive. Don't worry - thanks for this reply. You believe, I suppose, that there are no significant risks in nearly all the numpy core development being done by a new company, or at least, that there can be little benefit to a governance discussion in that situation. I think you are wrong, but of course it's a tenable point of view. The question is more about what can possibly be done about it. To really shift power, my hunch is that the only practical way would be to, like Mark said, make sure there are very active non-Continuum-employed developers. But perhaps I'm wrong. It's not obvious to me that there isn't a set of guidelines, procedures, structures that would help to keep things clear in this situation. Obviously it would be good to have more non-Continuum developers, but also obviously, there is a risk that that won't happen.
Sometimes it is worth taking some risks because it means one can go forward faster. Possibly *a lot* faster, if one shifts things from email to personal communication. Yes, obviously it's in no-one's interest to slow down the Continuum developers. I wonder though whether there is a way of organizing things, that does not slow down the Continuum developers, but does keep the sense of community involvement and ownership. It is not like the current versions of NumPy disappear. If things do go wrong and NumPy is developed in some crazy direction, it's easy to go for the stagnated option simply by taking the current release and maintain bugfixes on it. But we all want to avoid a fork, which is what that could easily become. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1
Hi Travis, On Mon, Feb 13, 2012 at 11:46 PM, Travis Oliphant tra...@continuum.io wrote: Here is the code I used to determine the coercion table of types. I first used *all* of the numeric_ops, narrowed it down to those with 2 inputs and 1 output, and then determined the run-time coercion table. Then, I removed ops that had the same tables until I was left with binary ops that had different coercion tables. Some operations were NotImplemented and I used 'X' in the table for those combinations. The table for each op is a dictionary with keys given by (type1, type2) and values given by a length-4 list of the types of the result between: [scalar-scalar, scalar-array, array-scalar, array-array] where the first term is type1 and the second term is type2. This resulting dictionary of tables for each op is then saved to a file. I ran this code for NumPy 1.5.1 64-bit and then again for NumPy 1.6.1 64-bit. I also ran this code for NumPy 1.4.1 64-bit and NumPy 1.3.1.dev 64-bit. The code to compare them is also attached. I'm attaching also the changes that have occurred between 1.3.1.dev and 1.4.1, 1.4.1 to 1.5.1, and finally 1.5.1 to 1.6.1 As you can see there were changes in each release. Most of these were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing the changes from 1.5.1 to 1.6.1. At first blush, it looks like there are a lot of changes to swallow that are not necessarily minor. I really would like to just say all is well, and it's no big deal. I hope that users really don't care and nobody's code is really relying on array-scalar combination conversions. Thanks for looking into this. It strikes me that changes in behavior here could be dangerous and easily missed, and it does seem to me that it is worth a pause to consider what the effect of the changes might be. Obviously, now both 1.6 and 1.6.1 are in the wild, there will be costs to reverting as well. 
Best, Matthew
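The harness Travis describes (run every binary op over all type pairs in the four scalar/array pairings, recording the result dtype, with 'X' for unsupported combinations) might look roughly like this minimal sketch. The function name and the small three-type subset here are mine, not from the attached code:

```python
import numpy as np

def coercion_table(op, types):
    """For one binary op, record the result dtype character for each
    (type1, type2) pair in the four pairings
    [scalar-scalar, scalar-array, array-scalar, array-array],
    using 'X' where the combination is unsupported."""
    table = {}
    for t1 in types:
        for t2 in types:
            results = []
            for a, b in [(t1(0), t2(0)),
                         (t1(0), np.zeros(1, t2)),
                         (np.zeros(1, t1), t2(0)),
                         (np.zeros(1, t1), np.zeros(1, t2))]:
                try:
                    results.append(np.asarray(op(a, b)).dtype.char)
                except TypeError:
                    results.append('X')
            table[(np.dtype(t1).char, np.dtype(t2).char)] = results
    return table

table = coercion_table(np.add, [np.int8, np.int16, np.float32])
```

Diffing two such tables, built under two NumPy versions, is then enough to surface the coercion changes discussed above.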
[Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1
Hi, On Tue, Feb 14, 2012 at 10:25 AM, Travis Oliphant tra...@continuum.io wrote: On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: Hi Travis, It is great that some resources can be spent to have people paid to work on NumPy. Thank you for making that happen. I am slightly confused about roadmaps for numpy 1.8 and 2.0. This needs discussion on the ML, and our release manager currently is Ralf - he is the one who ultimately decides what goes when. Thank you for reminding me of this. Ralf and I spoke several days ago, and have been working on how to give him more time to spend on SciPy full-time. As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I will be the release manager again. Ralf will continue serving as release manager for SciPy. For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. I only know that I won't be release manager past NumPy 1.X. I am also not completely comfortable by having a roadmap advertised to Pycon not coming from the community. This is my bad wording which is a function of being up very late. At PyCon we will be discussing the roadmap conversations that are taking place on this list. We won't be presenting anything there related to the NumPy project that has not first been discussed here. The community will have ample opportunity to provide input, suggestions, and criticisms for anything that goes into NumPy --- the same as I've always done before when releasing open source software. In fact, I will also be discussing at PyCon, the creation of NumFOCUS (NumPy Foundation for Open Code for Usable Science) which has been organized precisely for ensuring that NumPy, SciPy, Matplotlib, and IPython stay community-focused and community-led even while receiving input and money from multiple companies and organizations. There is a mailing list for numfocus that you can sign up for if you would like to be part of those discussions. Let me know if you would like more information about that. 
John Hunter, Fernando Perez, me, Perry Greenfield, and Jarrod Millman are the initial board of the Foundation. But, I expect the Foundation directors to evolve over time. I should say that I have no knowledge of the events above other than from the mailing list (I say that only because some of you may know that I'm a friend and colleague of Jarrod and Fernando). Travis - I hope you don't mind, but here I post some links that I have just found: http://technicaldiscovery.blogspot.com/2012/01/transition-to-continuum.html http://www.continuum.io/ I see that you've founded a new company, Continuum Analytics, and you are working with Peter Wang, Mark Wiebe, Francesc Alted (PyTables), and Bryan Van de Ven. I think you mentioned this earlier in one of the recent threads. In practice this gives your company an overwhelming voice in the direction of numpy. From the blog post you say: This may also mean different business models and licensing around some of the NumPy-related code that the company writes. Obviously your company will need to make enough money to cover your salaries and more. There is huge potential here for clashes of interest, and for perceived clashes of interest. The perceived clashes are just as damaging as the actual clashes. I still don't think we've got a Numpy steering group. The combination of the huge concentration of numpy resources in your company, and a lack of explicit community governance, seems to me to be something that needs to be fixed urgently. Do you agree? Is there any reason why the numfocus group was formed without obvious public discussion about its composition, remit or governance? I'm not objecting to its composition, but I think it is a mistake to make large decisions like this without public consultation. I imagine that what happened was that things moved too fast to make it attractive to slow the process by public discussion.
I implore you to slow down and commit yourself to have that discussion in full and in public, in the interests of the common ownership of the project. Best, Matthew
Re: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1
Hi, On Tue, Feb 14, 2012 at 1:54 PM, Travis Oliphant tra...@continuum.io wrote: There is a mailing list for numfocus that you can sign up for if you would like to be part of those discussions. Let me know if you would like more information about that. John Hunter, Fernando Perez, me, Perry Greenfield, and Jarrod Millman are the initial board of the Foundation. But, I expect the Foundation directors to evolve over time. I should say that I have no knowledge of the events above other than from the mailing list (I say that only because some of you may know that I'm a friend and colleague of Jarrod and Fernando). Thanks for speaking up, Matthew. I knew that this was my first announcement of the Foundation to this list. Things are still just starting around that organization, and so there is plenty of time for input. This sort of thing has actually been under-way for a long time --- it just has not received much impetus until now for one reason or another. To be clear, there were several email posts about a Foundation to this list last fall and we took the discussion of the Foundation that has really been in the works for a couple of years (thanks to Jarrod), to a Google Group (very poorly) called Fastechula. There were 33 people who signed up for that list and discussions continued sporadically on that list away from this one. When we selected the name NumFOCUS just a few weeks ago, we created the list for numfocus and then I signed everyone up for that list who was on the other one. I apologize if anyone felt left out. That is not my intention. My point is that there are two ways to go about this process, one is open and the other is closed. In the open version, someone proposes such a group to the mailing lists. They ask for expressions of interest. The discussion might then move to another mailing list that is publicly known and widely advertised. Members of the board are proposed in public. There might be some sort of formal or informal voting process.
The reason to prefer this to the more informal private negotiations is that a) the community feels a greater ownership and control of the process and b) it is much harder to weaken or subvert an organization that explicitly does all its business in public. The counter-argument usually goes 'members X, Y and Z are of impeccable integrity and would only do what is best for the public good'. And usually, members X, Y and Z are indeed of impeccable integrity. Nevertheless I'm sure I don't have to unpack the evidence that this approach frequently fails and can fail in a catastrophic way. Perceptions can be damaging. This is one of the big reasons for the organization of the Foundation -- to be a place separate from any commercial venture which can direct resources to a vision whose goal is more democratically determined. Are you proposing that the Foundation oversee Numpy governance and direction? From your chosen members I'm guessing that the idea is for the foundation to think about broad strategy rather than - say - whether missing values should be encoded with masked arrays? I think we do have a NumPy steering group if you want to call it that. It is currently me, Mark Wiebe, and Charles Harris. Rolf Gommers, Pauli Virtanen, David Cournapeau and Robert Kern also have opinions that carry significant weight. Are there other people that should be on this list? There are other people who also speak up on this list whose opinions will be listened to and heard. In fact, I hope that many more people will come to the list and speak out as development increases. The point I was making was that the concentration of numpy development hours and talent in your company makes it urgent that the numpy governance is set out formally, that the interests of the company are made clear, and that the steering group can be assured of explicit and public independence from the interests of the company, if and when that becomes necessary. 
In the past, the numpy steering group has seemed a virtual organization, formed ad-hoc when needed, and with no formal governance. I'm saying that I firmly believe that has to change, to avoid the actual or perceived loss of community ownership. Best, Matthew
Re: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1
Hi, On Tue, Feb 14, 2012 at 3:58 PM, Travis Oliphant tra...@continuum.io wrote: When we selected the name NumFOCUS just a few weeks ago, we created the list for numfocus and then I signed everyone up for that list who was on the other one. I apologize if anyone felt left out. That is not my intention. My point is that there are two ways go to about this process, one is open and the other is closed. In the open version, someone proposes such a group to the mailing lists. They ask for expressions of interest. The discussion might then move to another mailing list that is publicly known and widely advertised. Members of the board are proposed in public. There might be some sort of formal or informal voting process. The reason to prefer this to the more informal private negotiations is that a) the community feels a greater ownership and control of the process and b) it is much harder to weaken or subvert an organization that explicitly does all its business in public. Your points are well taken. However, my point is that this has been discussed on an open mailing list. Things weren't *as* open as they could have been, perhaps, in terms of board selection. But, there was opportunity for people to provide input. I am on the numpy, scipy, matplotlib, ipython and cython mailing lists. Jarrod and Fernando are friends of mine. I've been obviously concerned about numpy governance for some time. I didn't know about this mailing list, had only a vague idea that some sort of foundation was being proposed and I had no idea at all that you'd selected a board. Would you say that was closer to 'open' or closer to 'closed'? Perceptions can be damaging. This is one of the big reasons for the organization of the Foundation -- to be a place separate from any commercial venture which can direct resources to a vision whose goal is more democratically determined. Are you proposing that the Foundation oversee Numpy governance and direction? 
From your chosen members I'm guessing that the idea is for the foundation to think about broad strategy rather than - say - whether missing values should be encoded with masked arrays? No, I am not proposing that. The Foundation will be focused on higher-level broad strategy sorts of things: mostly around how to raise money and how to direct that money to projects that have their own development cycles. I would think the Foundation would be interested in paying for things like issue trackers and continuous integration servers as well. It will leave NumPy management to this list and the people who have gathered around this watering hole. Obviously, there will be points of connection, but exactly how this will play-out depends on who shows up to both organizations. I think we do have a NumPy steering group if you want to call it that. It is currently me, Mark Wiebe, and Charles Harris. Rolf Gommers, Pauli Virtanen, David Cournapeau and Robert Kern also have opinions that carry significant weight. Are there other people that should be on this list? There are other people who also speak up on this list whose opinions will be listened to and heard. In fact, I hope that many more people will come to the list and speak out as development increases. The point I was making was that the concentration of numpy development hours and talent in your company makes it urgent that the numpy governance is set out formally, that the interests of the company are made clear, and that the steering group can be assured of explicit and public independence from the interests of the company, if and when that becomes necessary. In the past, the numpy steering group has seemed a virtual organization, formed ad-hoc when needed, and with no formal governance. I'm saying that I firmly believe that has to change, to avoid the actual or perceived loss of community ownership. I hear your point. Thank you for sharing it. 
Fortunately, we are having this discussion, and plan to continue to have it as any concerns arise. I think the situation is actually less concentrated than it used to be when the SciPy steering committee was discussed. On that note, I think the SciPy steering committee needs serious revision as well. But, we've all just been getting along pretty well without too much formalism, so far, so perhaps that is enough for now. But a) there have already been serious unresolved disagreements on this list (I note no resolution of the masks / NA debate) and b) the whole point is to set up structures that can deal with the problems before or as they arise. After the problem arises, it is too late. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1
Hi, On Tue, Feb 14, 2012 at 4:43 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Feb 14, 2012 at 3:58 PM, Travis Oliphant tra...@continuum.io wrote: When we selected the name NumFOCUS just a few weeks ago, we created the list for numfocus and then I signed everyone up for that list who was on the other one. I apologize if anyone felt left out. That is not my intention. My point is that there are two ways go to about this process, one is open and the other is closed. In the open version, someone proposes such a group to the mailing lists. They ask for expressions of interest. The discussion might then move to another mailing list that is publicly known and widely advertised. Members of the board are proposed in public. There might be some sort of formal or informal voting process. The reason to prefer this to the more informal private negotiations is that a) the community feels a greater ownership and control of the process and b) it is much harder to weaken or subvert an organization that explicitly does all its business in public. Your points are well taken. However, my point is that this has been discussed on an open mailing list. Things weren't *as* open as they could have been, perhaps, in terms of board selection. But, there was opportunity for people to provide input. I am on the numpy, scipy, matplotlib, ipython and cython mailing lists. Jarrod and Fernando are friends of mine. I've been obviously concerned about numpy governance for some time. I didn't know about this mailing list, had only a vague idea that some sort of foundation was being proposed and I had no idea at all that you'd selected a board. Would you say that was closer to 'open' or closer to 'closed'? By the way - I want to be clear - I am not suggesting that I should have been one of the people involved in these discussions. If you were choosing a small number of people to discuss this with, one of them should not be me. 
I am saying that, if I didn't know, it's reasonable to assume that very few people knew, who weren't being explicitly told, and that this means that the process was, effectively, closed. See you, Matthew
Re: [Numpy-discussion] can_cast with structured array output - bug?
Hi, On Mon, Feb 13, 2012 at 7:02 PM, Mark Wiebe mwwi...@gmail.com wrote: I took a look into the code to see what is causing this, and the reason is that nothing has ever been implemented to deal with the fields. This means it falls back to treating all struct dtypes as if they were a plain void dtype, which allows anything to be cast to it. While I was redoing the casting subsystem for 1.6, I did think on this issue, and decided that it wasn't worth tackling it at the time because the 'safe'/'same_kind'/'unsafe' don't seem sufficient to handle what might be desired. I tried to leave this alone as much as possible. Some random thoughts about this are: * Casting a scalar to a struct dtype: should it be safe if the scalar can be safely cast to each member of the struct dtype? This is the NumPy broadcasting rule applied to dtypes as if the struct dtype is another dimension. * Casting one struct dtype to another: If the fields of the source are a subset of the target, and the types can safely convert, should that be a safe cast? If the fields of the source are not a subset of the target, should that still be a same_kind cast? Should a second enum which complements the safe/same_kind/unsafe one, but is specific for how adding/removing struct fields be added? This is closely related to adding ufunc support for struct dtypes, and the choices here should probably be decided at the same time as designing how the ufuncs should work. Thanks for the discussion - that's very helpful. How about, at a first pass, returning True for conversion of void types only if input dtype == output dtype, then adding more sophisticated rules later? See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
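That first-pass rule could be sketched as a small wrapper (the helper name is mine and this is not NumPy API, just an illustration of the proposed behavior): treat any void/structured dtype as castable only to an identical dtype, and defer to np.can_cast for everything else.

```python
import numpy as np

def can_cast_strict_void(from_dt, to_dt, casting='safe'):
    """Hypothetical first-pass rule from the discussion above:
    conversions involving void (structured) dtypes are allowed only when
    input dtype == output dtype; all other dtypes defer to np.can_cast."""
    from_dt, to_dt = np.dtype(from_dt), np.dtype(to_dt)
    if from_dt.kind == 'V' or to_dt.kind == 'V':
        return from_dt == to_dt
    return np.can_cast(from_dt, to_dt, casting)

can_cast_strict_void('c', [('f1', 'u1')])  # -> False
```

More sophisticated rules (field-subset casting, a second enum for field addition/removal) could then relax this conservative default later.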
Re: [Numpy-discussion] Issue Tracking
Hi, On Mon, Feb 13, 2012 at 12:44 PM, Travis Oliphant tra...@continuum.io wrote: On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant tra...@continuum.io wrote: I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments. Both of these plans allow Open Source projects to have unlimited plans for free. Free usage of a tool that's itself not open source is not all that different from using Github, so no objections from me. YouTrack from JetBrains: http://www.jetbrains.com/youtrack/features/issue_tracking.html This looks promising. It seems to have good Github integration, and I checked that you can easily export all your issues (so no lock-in). It's a company that isn't going anywhere (I hope), and they do a very nice job with PyCharm. I do like the team behind JetBrains. And I've seen and heard good things about TeamCity. Thanks for reminding me about the build-bot situation. That is one thing I would like to address sooner rather than later as well. We've (nipy) got a buildbot collection working OK. If you want to go that way you are welcome to use our machines. It's a somewhat flaky setup though. http://nipy.bic.berkeley.edu/builders I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. Ondrej did some nice stuff on integrating a build with the github pull requests: https://github.com/sympy/sympy-bot Some discussion of buildbot and Jenkins: http://vperic.blogspot.com/2011/05/continuous-integration-and-sympy.html See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
Hi, On Mon, Feb 13, 2012 at 2:33 PM, jason-s...@creativetrax.com wrote: On 2/13/12 2:56 PM, Matthew Brett wrote: I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. I'm not aware of a Jenkins buildbot system for Sage, though I think Cython uses such a system: https://sage.math.washington.edu:8091/hudson/ We do have a number of systems we build and test Sage on, though I don't think we have continuous integration yet. I've CCd Jeroen Demeyer, who is the current release manager for Sage. Jeroen, do we have an automatic buildbot system for Sage? Ah - sorry - I was thinking of the Cython system on the SAGE server. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy I've tested, we have:

>>> import numpy as np
>>> Adata = np.array([127], dtype=np.int8)
>>> Bdata = np.int16(127)
>>> (Adata + Bdata).dtype
dtype('int8')

That is - adding an integer scalar of a larger dtype does not result in upcasting of the output dtype, if the data in the scalar type fits in the smaller. For numpy < 1.6.0 we have this:

>>> Bdata = np.int16(128)
>>> (Adata + Bdata).dtype
dtype('int8')

That is - even if the data in the scalar does not fit in the dtype of the array to which it is being added, there is no upcasting. For numpy >= 1.6.0 we have this:

>>> Bdata = np.int16(128)
>>> (Adata + Bdata).dtype
dtype('int16')

There is upcasting... I can see why the numpy >= 1.6.0 way might be preferable but it is an API change I suppose. Best, Matthew
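To make the comparison concrete, here is a sketch of the two cases (the exact result dtypes are version-dependent - later NumPy releases changed scalar promotion again, so only the does-not-fit case is stable across versions):

```python
import numpy as np

Adata = np.array([127], dtype=np.int8)

# Scalar value fits in int8: numpy 1.x keeps int8 here
# (modern NumPy ignores the scalar's value and gives int16).
small = (Adata + np.int16(127)).dtype

# Scalar value does not fit in int8: numpy >= 1.6.0 upcasts to int16.
big = (Adata + np.int16(128)).dtype

print(small, big)
```

Running this under numpy < 1.6.0, 1.6.x and current releases shows three different promotion regimes for the same two expressions.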
[Numpy-discussion] can_cast with structured array output - bug?
Hi, I've also just noticed this oddity: In [17]: np.can_cast('c', 'u1') Out[17]: False OK so far, but... In [18]: np.can_cast('c', [('f1', 'u1')]) Out[18]: True In [19]: np.can_cast('c', [('f1', 'u1')], 'safe') Out[19]: True In [20]: np.can_cast(np.ones(10, dtype='c'), [('f1', 'u1')]) Out[20]: True I think this must be a bug. In the other direction, it makes more sense to me: In [24]: np.can_cast([('f1', 'u1')], 'c') Out[24]: False In [25]: np.can_cast([('f1', 'u1')], [('f1', 'u1')]) Out[25]: True Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Unexpected reorganization of internal data
Hi, On Tue, Jan 31, 2012 at 8:29 AM, Mads Ipsen madsip...@gmail.com wrote: Hi, I am confused. Here's the reason: The following structure is a representation of N points in 3D space:

U = numpy.array([[x1,y1,z1], [x2,y2,z2], ..., [xn,yn,zn]])

So the array U has shape (N,3). This order makes sense to me since U[i] will give you the i'th point in the set. Now, I want to pass this array to a C++ function that does some stuff with the points. Here's how I do that:

void Foo::doStuff(int n, PyObject * numpy_data)
{
  // Get pointer to data
  double * const positions = (double *) PyArray_DATA(numpy_data);

  // Print positions
  for (int i = 0; i < n; ++i) {
    float x = static_cast<float>(positions[3*i+0]);
    float y = static_cast<float>(positions[3*i+1]);
    float z = static_cast<float>(positions[3*i+2]);
    printf("Pos[%d] = %f %f %f\n", i, x, y, z);
  }
}

When I call this routine, using a swig wrapped Python interface to the C++ class, everything prints out nicely. Now, I want to apply a rotation to all the positions. So I set up some rotation matrix R like this:

R = numpy.array([[r11,r12,r13], [r21,r22,r23], [r31,r32,r33]])

To apply the matrix to the data in one crunch, I do

V = numpy.dot(R, U.transpose()).transpose()

Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed. So apparently the internal data structure handled by numpy has been reorganized, even though I called transpose() twice, which I would expect to cancel each other out. However, if I do:

V = numpy.array(U.transpose()).transpose()

and call the C++ routine, everything is perfectly fine, ie. the data structure is as expected. What went wrong? The numpy array reserves the right to organize its data internally. For example, a numpy array can be in Fortran order in memory, or C order in memory, and many more complicated schemes. 
You might want to have a look at: http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray If you depend on a particular order for your array memory, you might want to look at: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ascontiguousarray.html Best, Matthew
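For instance, a minimal sketch of the workaround described above - the dot-plus-transpose produces a Fortran-ordered view, and ascontiguousarray() copies it back into C order before it is handed to C++:

```python
import numpy as np

U = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])            # (N, 3) points, C order
R = np.eye(3)                              # stand-in for a rotation matrix

V = np.dot(R, U.transpose()).transpose()   # same values, Fortran-ordered memory
print(V.flags['C_CONTIGUOUS'])             # False

V = np.ascontiguousarray(V)                # copy into C order for the C++ side
print(V.flags['C_CONTIGUOUS'])             # True
```

A C extension that assumes `positions[3*i + j]` indexing is implicitly assuming C-contiguous memory, which is exactly what the flag above checks.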
Re: [Numpy-discussion] adding unsigned int and int
Hi, On Tue, Dec 6, 2011 at 4:45 AM, Skipper Seabold jsseab...@gmail.com wrote: Hi, Is this intended? [~/] [1]: np.result_type(np.uint, np.int) [1]: dtype('float64') I would guess so - if your system ints are 64 bit. int64 can't contain the range for uint64, nor can uint64 contain all int64. If there had been a larger int type, it would promote to that int, I believe. At least on my system: In [4]: np.result_type(np.int32, np.uint32) Out[4]: dtype('int64') Best, Matthew
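A couple of cases that illustrate the rule (written with the fixed-width names, since the plain np.int alias has since been removed from NumPy; these dtype-to-dtype promotions are stable across versions):

```python
import numpy as np

# uint64 holds values int64 can't, and vice versa; with no larger
# integer type available, NumPy falls back to float64:
print(np.result_type(np.uint64, np.int64))   # float64

# With room to grow, the pair promotes to the next larger signed int:
print(np.result_type(np.uint32, np.int32))   # int64
```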
Re: [Numpy-discussion] NumPy Governance
Hi, 2011/12/5 Stéfan van der Walt ste...@sun.ac.za: As for barriers to entry, improving the nature of discourse on the mailing list (when it comes to thorny issues) would be good. Technical barriers are not that hard to breach for our community; setting the right social atmosphere is crucial. I'm just about to get on a plane and am going to be out of internet range for a while, so, in the spirit of constructive discussion: In the spirit of use-cases: Would it be fair to say that the two contentious recent discussions have been:

* The numpy ABI breakage, 2.0 vs 1.5.1 discussion
* The masked array discussion(s)

? What did we do wrong or right in each of these two discussions? What could we have done better? What process would help us to do better? Travis - for your board-only-post mailing list - my feeling is that this is going in the wrong direction. The effect of the board-only mailing list is to explicitly remove non-qualified people from the discussion. This will make it more explicit that the substantial decisions will be made by a few important people. Do you (Travis - or Mark?) think that, if this had happened earlier in the masked array discussion, it would have been less contentious, or had more substantial content? My instinct would be the reverse, and the best solution would have been to pause and commit to beating out the issues and getting agreement. See you, Matthew
Re: [Numpy-discussion] NumPy Governance
Hi Travis, On Sat, Dec 3, 2011 at 6:18 PM, Travis Oliphant teoliph...@gmail.com wrote: Hi everyone, There have been some wonderfully vigorous discussions over the past few months that have made it clear that we need some clarity about how decisions will be made in the NumPy community. When we were a smaller bunch of people it seemed easier to come to an agreement and things pretty much evolved based on (mostly) consensus and who was available to actually do the work. There is a need for a more clear structure so that we know how decisions will get made and so that code can move forward while paying attention to the current user-base. There has been a steering committee structure for SciPy in the past, and I have certainly been prone to lump both NumPy and SciPy together given that I have a strong interest in and have spent a great amount of time working on both projects. Others have also spent time on both projects. However, I think it is critical at this stage to clearly separate the projects and define a governing structure that is fair and agreeable for NumPy. SciPy has multiple modules and will probably need structure around each module independently. For now, I wanted to open up a discussion to see what people thought about NumPy's governance. My initial thoughts:

* discussions happen as they do now on the mailing list
* a small group of developers (5-11) constitute the board and major decisions are made by vote of that group (not just simple majority --- needs at least 2/3 +1 votes).
* votes are +1/+0/-0/-1
* if a topic is difficult to resolve it is moved off the main list and discussed on a separate board mailing list --- these should be rare, but parts of the NA discussion would probably qualify
* This board mailing list is publicly viewable but only board members may post.
* The board is renewed and adjusted each year --- based on nomination and 2/3 vote of the current board until board is at 11. 
* The chairman of the board is voted by a majority of the board and has veto power unless over-ridden by 3/4 of the board.
* Petitions to remove people off the board can be made by 50+ independent reverse nominations (hopefully people will just withdraw if they are no longer active).

Thanks very much for starting this discussion. You have probably seen that my preference would be for all discussions to be public - in the sense that all can contribute. So, it seems reasonable to me to have a 'board' as you describe, but that the board should vote on the same mailing list as the rest of the discussion. Having a separate mailing list for discussion makes the separation overt between those with a granted voice and those without, and I would hope for a structure which emphasized discussion in an open forum. Put another way, what advantage would having a separate public mailing list have? How does this governance compare to that of - say - Linux or Python or Debian? My worry will be that it will be too tempting to terminate discussions and proceed to resolve by vote, when voting (as Karl Vogel describes) may still do harm. What will be the position - maybe I mean your position - on consensus as Nathaniel has described it? I feel the masked array discussion would have been more productive (and maybe shorter and more to the point) if there had been some rule-of-thumb that every effort is made to reach consensus before proceeding to implementation - or a vote. For example, in the masked array discussion, I would have liked to be able to say 'hold on, we have a rule that we try our best to reach consensus; I do not feel we have done that yet'. See you, Matthew
Re: [Numpy-discussion] scipy.org still says source in some subversion repo -- should be git !?
Yo, On Thu, Dec 1, 2011 at 8:01 PM, Jarrod Millman mill...@berkeley.edu wrote: On Mon, Nov 28, 2011 at 1:19 PM, Matthew Brett matthew.br...@gmail.com wrote: Maybe the content could be put in http://github.com/scipy/scipy.github.com so we can make pull requests there? The source is here: https://github.com/scipy/scipy.org-new Are you then the person to ask about merging pull requests and uploading the docs? See you (literally), Matthew
Re: [Numpy-discussion] scipy.org still says source in some subversion repo -- should be git !?
Hi, On Mon, Nov 28, 2011 at 1:01 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Fri, Nov 25, 2011 at 7:20 PM, Sebastian Haase seb.ha...@gmail.com wrote: google search for: numpy browse source points here: http://new.scipy.org/download.html which talks about: svn co http://svn.scipy.org/svn/numpy/trunk numpy The problem is that new.scipy.org duplicates content from scipy.org, and is not so new anymore. I suspect that there's more out of date info (like installation instructions). Is anyone still working on this, or planning to do so in the near future? If not, it may be better to disable this site until someone volunteers to spend time on it again. Who controls the new.scipy.org site? Maybe the content could be put in http://github.com/scipy/scipy.github.com so we can make pull requests there? And redirect new.scipy.org to http://scipy.github.com ? Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, On Tue, Nov 15, 2011 at 12:51 AM, David Cournapeau courn...@gmail.com wrote: On Tue, Nov 15, 2011 at 6:22 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Mon, Nov 14, 2011 at 10:08 PM, David Cournapeau courn...@gmail.com wrote: On Mon, Nov 14, 2011 at 9:01 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 5:03 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name: In [11]: np.dtype(np.float96).itemsize Out[11]: 12 Hmm, good point. 
There should not be a float96 on Windows using the MSVC compiler, and the longdouble types 'gG' should return float64 and complex128 respectively. OTOH, I believe the mingw compiler has real float96 types but I wonder about library support. This is really a build issue and it would be good to have some feedback on what different platforms are doing so that we know if we are doing things right. Is it possible that numpy is getting confused by being compiled with mingw on top of a visual studio python? Some further forensics seem to suggest that, despite the fact the math suggests float96 is float64, the storage format it in fact 80-bit extended precision: Yes, extended precision is the type on Intel hardware with gcc, the 96/128 bits comes from alignment on 4 or 8 byte boundaries. With MSVC, double and long double are both ieee double, and on SPARC, long double is ieee quad precision. Right - but I think my researches are showing that the longdouble numbers are being _stored_ as 80 bit, but the math on those numbers is 64 bit. Is there a reason than numpy can't do 80-bit math on these guys? If there is, is there any point in having a float96 on windows? It's a compiler/architecture thing and depends on how the compiler interprets the long double c type. The gcc compiler does do 80 bit math on Intel/AMD hardware. MSVC doesn't, and probably never will. MSVC shouldn't produce float96 numbers, if it does, it is a bug. Mingw uses the gcc compiler, so the numbers are there, but if it uses the MS library it will have to convert them to double to do computations like sin(x) since there are no microsoft routines for extended precision. I suspect that gcc/ms combo is what is producing the odd results you are seeing. I think we might be talking past each other a bit. It seems to me that, if float96 must use float64 math, then it should be removed from the numpy namespace, because If we were to do so, it would break too much code. 
David - please - obviously I'm not suggesting removing it without deprecating it. Let's say I find it debatable that removing it (with all the deprecations) would be a good use of effort, especially given that there is no obviously better choice to be made. a) It implies higher precision than float64 but does not provide it b) It uses more memory to no obvious advantage There is an obvious advantage: to handle memory blocks which use long double, created outside numpy (or even python). Right - but that's a bit arcane, and I would have thought np.longdouble would be a good enough name for that. Of course, the users may be surprised, as I was, that memory allocated for higher precision is using float64, and that may take them some time to work out. I'll say
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, On Sun, Nov 13, 2011 at 5:03 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name: In [11]: np.dtype(np.float96).itemsize Out[11]: 12 Hmm, good point. There should not be a float96 on Windows using the MSVC compiler, and the longdouble types 'gG' should return float64 and complex128 respectively. OTOH, I believe the mingw compiler has real float96 types but I wonder about library support. This is really a build issue and it would be good to have some feedback on what different platforms are doing so that we know if we are doing things right. 
Is it possible that numpy is getting confused by being compiled with mingw on top of a visual studio python? Some further forensics seem to suggest that, despite the fact the math suggests float96 is float64, the storage format it in fact 80-bit extended precision: Yes, extended precision is the type on Intel hardware with gcc, the 96/128 bits comes from alignment on 4 or 8 byte boundaries. With MSVC, double and long double are both ieee double, and on SPARC, long double is ieee quad precision. Right - but I think my researches are showing that the longdouble numbers are being _stored_ as 80 bit, but the math on those numbers is 64 bit. Is there a reason than numpy can't do 80-bit math on these guys? If there is, is there any point in having a float96 on windows? It's a compiler/architecture thing and depends on how the compiler interprets the long double c type. The gcc compiler does do 80 bit math on Intel/AMD hardware. MSVC doesn't, and probably never will. MSVC shouldn't produce float96 numbers, if it does, it is a bug. Mingw uses the gcc compiler, so the numbers are there, but if it uses the MS library it will have to convert them to double to do computations like sin(x) since there are no microsoft routines for extended precision. I suspect that gcc/ms combo is what is producing the odd results you are seeing. I think we might be talking past each other a bit. It seems to me that, if float96 must use float64 math, then it should be removed from the numpy namespace, because a) It implies higher precision than float64 but does not provide it b) It uses more memory to no obvious advantage On the other hand, it seems to me that raw gcc does use higher precision for basic math on long double, as expected. 
For example, this guy passes:

#include <math.h>
#include <assert.h>

int main(void)
{
  double d;
  long double ld;
  d = pow(2, 53);
  ld = d;
  assert(d == ld);
  d += 1;
  ld += 1;
  /* double rounds down because it doesn't have enough precision */
  assert(d != ld);
  assert(d == ld - 1);
  return 0;
}

whereas numpy does not use the higher precision:

In [10]: a = np.float96(2**53)
In [11]: a
Out[11]: 9007199254740992.0
In [12]: b = np.float64(2**53)
In [13]: b
Out[13]: 9007199254740992.0
In [14]: a == b
Out[14]: True
In [15]: (a + 1) == (b + 1)
Out[15]: True

So maybe there is a way of picking up the gcc math in numpy? Best, Matthew
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, On Mon, Nov 14, 2011 at 10:08 PM, David Cournapeau courn...@gmail.com wrote: On Mon, Nov 14, 2011 at 9:01 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 5:03 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name: In [11]: np.dtype(np.float96).itemsize Out[11]: 12 Hmm, good point. There should not be a float96 on Windows using the MSVC compiler, and the longdouble types 'gG' should return float64 and complex128 respectively. OTOH, I believe the mingw compiler has real float96 types but I wonder about library support. 
This is really a build issue and it would be good to have some feedback on what different platforms are doing so that we know if we are doing things right. Is it possible that numpy is getting confused by being compiled with mingw on top of a visual studio python? Some further forensics seem to suggest that, despite the fact the math suggests float96 is float64, the storage format it in fact 80-bit extended precision: Yes, extended precision is the type on Intel hardware with gcc, the 96/128 bits comes from alignment on 4 or 8 byte boundaries. With MSVC, double and long double are both ieee double, and on SPARC, long double is ieee quad precision. Right - but I think my researches are showing that the longdouble numbers are being _stored_ as 80 bit, but the math on those numbers is 64 bit. Is there a reason than numpy can't do 80-bit math on these guys? If there is, is there any point in having a float96 on windows? It's a compiler/architecture thing and depends on how the compiler interprets the long double c type. The gcc compiler does do 80 bit math on Intel/AMD hardware. MSVC doesn't, and probably never will. MSVC shouldn't produce float96 numbers, if it does, it is a bug. Mingw uses the gcc compiler, so the numbers are there, but if it uses the MS library it will have to convert them to double to do computations like sin(x) since there are no microsoft routines for extended precision. I suspect that gcc/ms combo is what is producing the odd results you are seeing. I think we might be talking past each other a bit. It seems to me that, if float96 must use float64 math, then it should be removed from the numpy namespace, because If we were to do so, it would break too much code. David - please - obviously I'm not suggesting removing it without deprecating it. 
a) It implies higher precision than float64 but does not provide it b) It uses more memory to no obvious advantage There is an obvious advantage: to handle memory blocks which use long double, created outside numpy (or even python). Right - but that's a bit arcane, and I would have thought np.longdouble would be a good enough name for that. Of course, the users may be surprised, as I was, that memory allocated for higher precision is using float64, and that may take them some time to work out. I'll say again that 'longdouble' says to me 'something specific to the compiler' and 'float96' says 'something standard in numpy', and that I - was surprised - when I found out what it was. Otherwise, while gcc indeed supports long double, the fact that the C runtime doesn't really mean it is hopeless to reach any kind of consistency. I'm sorry for my ignorance
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name: In [11]: np.dtype(np.float96).itemsize Out[11]: 12 Hmm, good point. There should not be a float96 on Windows using the MSVC compiler, and the longdouble types 'gG' should return float64 and complex128 respectively. OTOH, I believe the mingw compiler has real float96 types but I wonder about library support. This is really a build issue and it would be good to have some feedback on what different platforms are doing so that we know if we are doing things right. Is it possible that numpy is getting confused by being compiled with mingw on top of a visual studio python? 
Some further forensics seem to suggest that, despite the fact the math suggests float96 is float64, the storage format is in fact 80-bit extended precision: On OSX 32-bit where float128 is definitely 80 bit precision we see the sign bit being flipped to show us the beginning of the number:

In [33]: bigbin(np.float128(2**53)-1)
Out[33]: '1011011100111000'
In [34]: bigbin(-np.float128(2**53)+1)
Out[34]: '111100111000'

I think that's 48 bits of padding followed by the number (bit 49 is being flipped with the sign). On windows (well, wine, but I think it's the same):

bigbin(np.float96(2**53)-1)
Out[14]: '011100111000'
bigbin(np.float96(-2**53)+1)
Out[15]: '111100111000'

Thanks, Matthew

bigbin definition:

import sys
import numpy as np

LE = sys.byteorder == 'little'

def bigbin(val):
    val = np.asarray(val)
    nbytes = val.dtype.itemsize
    dt = [('f', np.uint8, nbytes)]
    out = [np.binary_repr(el, 8) for el in val.view(dt)['f']]
    if LE:
        out = out[::-1]
    return ''.join(out)
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name: In [11]: np.dtype(np.float96).itemsize Out[11]: 12 Hmm, good point. There should not be a float96 on Windows using the MSVC compiler, and the longdouble types 'gG' should return float64 and complex128 respectively. OTOH, I believe the mingw compiler has real float96 types but I wonder about library support. This is really a build issue and it would be good to have some feedback on what different platforms are doing so that we know if we are doing things right. Is it possible that numpy is getting confused by being compiled with mingw on top of a visual studio python? 
Some further forensics seem to suggest that, despite the fact the math suggests float96 is float64, the storage format is in fact 80-bit extended precision: Yes, extended precision is the type on Intel hardware with gcc, the 96/128 bits come from alignment on 4 or 8 byte boundaries. With MSVC, double and long double are both ieee double, and on SPARC, long double is ieee quad precision. Right - but I think my researches are showing that the longdouble numbers are being _stored_ as 80 bit, but the math on those numbers is 64 bit. Is there a reason that numpy can't do 80-bit math on these guys? If there is, is there any point in having a float96 on windows? See you, Matthew
[Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. So - now I am not sure what this float96 is. I was expecting 80 bit extended precision, but it doesn't look right for that... Does anyone know what representation this is? Thanks a lot, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit. In [2]: np.finfo(np.float96).nmant Out[2]: 52 In [3]: np.finfo(np.float96).nexp Out[3]: 15 In [4]: np.finfo(np.float64).nmant Out[4]: 52 In [5]: np.finfo(np.float64).nexp Out[5]: 11 If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough: In [6]: np.float96(2**53)+1 Out[6]: 9007199254740992.0 In [7]: np.float64(2**53)+1 Out[7]: 9007199254740992.0 If the nexp is right, the max should be higher for the float96 type: In [9]: np.finfo(np.float64).max Out[9]: 1.7976931348623157e+308 In [10]: np.finfo(np.float96).max Out[10]: 1.#INF I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name: In [11]: np.dtype(np.float96).itemsize Out[11]: 12 Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Int casting different across platforms
Hi, On Sat, Nov 5, 2011 at 6:24 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Nov 4, 2011 at 5:21 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I noticed this: (Intel Mac): In [2]: np.int32(np.float32(2**31)) Out[2]: -2147483648 (PPC): In [3]: np.int32(np.float32(2**31)) Out[3]: 2147483647 I assume what is happening is that the casting is handing off to the c library, and that behavior of the c library differs on these platforms? Should we expect or hope that this behavior would be the same across platforms? Heh. I think the conversion is basically undefined because 2**31 won't fit in int32. The Intel example just takes the bottom 32 bits of 2**31 expressed as a binary integer, the PPC throws up its hands and returns the maximum value supported by int32. Numpy supports casts from unsigned to signed 32 bit numbers by using the same bits, as does C, and that would comport with the Intel example. It would probably be useful to have a Numpy convention for this so that the behavior was consistent across platforms. Maybe for float types we should raise an error if the value is out of bounds. Just to see what happens: #include <stdio.h> #include <math.h> int main(int argc, char* argv[]) { double x; int y; x = pow(2, 31); y = (int)x; printf("%d, %d\n", sizeof(int), y); } Intel, gcc: 4, -2147483648 PPC, gcc: 4, 2147483647 I think that's what you predicted. Is it strange that the same compiler gives different results? It would be good if the behavior was the same across platforms - the unexpected negative overflow caught me out at least. An error sounds sensible to me. Would it cost lots of cycles? Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
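Since the C cast is undefined for out-of-range values, code that needs the same answer on every platform has to do the range check itself. A minimal sketch of the error-raising convention suggested above (the name safe_int32 is my own):

```python
import numpy as np

def safe_int32(x):
    # Raise instead of inheriting the platform's undefined cast
    # behavior.  int32 min/max are exactly representable in float64,
    # so the comparisons below are exact.
    info = np.iinfo(np.int32)
    x = np.asarray(x, dtype=np.float64)
    if np.any(x < info.min) or np.any(x > info.max):
        raise OverflowError("value out of range for int32")
    return x.astype(np.int32)

print(safe_int32(2.0**31 - 1))  # in range, converts cleanly
```

Calling safe_int32(2.0**31) would raise OverflowError on every platform, rather than returning -2147483648 on Intel and 2147483647 on PPC.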
Re: [Numpy-discussion] Int casting different across platforms
Hi, On Sun, Nov 6, 2011 at 2:39 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Nov 5, 2011 at 7:35 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Nov 5, 2011 at 4:07 PM, Matthew Brett matthew.br...@gmail.com wrote: Intel, gcc: 4, -2147483648 PPC, gcc: 4, 2147483647 I think that's what you predicted. Is it strange that the same compiler gives different results? It would be good if the behavior was the same across platforms - the unexpected negative overflow caught me out at least. An error sounds sensible to me. Would it cost lots of cycles? C99 says (section F.4): If the floating value is infinite or NaN or if the integral part of the floating value exceeds the range of the integer type, then the ‘‘invalid’’ floating-point exception is raised and the resulting value is unspecified. Whether conversion of non-integer floating values whose integral part is within the range of the integer type raises the ‘‘inexact’’ floating-point exception is unspecified. So it sounds like the compiler is allowed to return whatever nonsense it likes in this case. But, you should be able to cause this to raise an exception by fiddling with np.seterr. However, that doesn't seem to work for me with numpy 1.5.1 on x86-64 linux :-( np.int32(np.float32(2**31)) -2147483648 np.seterr(all="raise") np.int32(np.float32(2**31)) -2147483648 I think this must be a numpy or compiler bug? I don't believe the floating point status is checked in the numpy conversion routines. That looks like a nice small project for someone interested in learning the numpy internals. To my shame I doubt that I will have the time to do this, but just in case I or someone does get time, is there a good place to start to look? Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
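np.seterr sets the error handling globally; np.errstate scopes it to a block. Whether an out-of-range float-to-int cast trips the "invalid" handler is exactly what is in question above, so this sketch only demonstrates the mechanism on an operation that reliably honors it (division by zero):

```python
import numpy as np

# np.errstate scopes floating-point error handling to a block.
# Division by zero reliably triggers the "divide" handler; whether an
# out-of-range float->int cast triggers "invalid" depends on the
# numpy version, as the thread above reports.
with np.errstate(divide="raise"):
    try:
        np.float64(1.0) / np.float64(0.0)
        caught = False
    except FloatingPointError:
        caught = True
print("caught divide-by-zero:", caught)
```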
[Numpy-discussion] Int casting different across platforms
Hi, I noticed this: (Intel Mac): In [2]: np.int32(np.float32(2**31)) Out[2]: -2147483648 (PPC): In [3]: np.int32(np.float32(2**31)) Out[3]: 2147483647 I assume what is happening is that the casting is handing off to the c library, and that behavior of the c library differs on these platforms? Should we expect or hope that this behavior would be the same across platforms? Thanks for any pointers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] float64 / int comparison different from float / int comparison
Hi, On Tue, Nov 1, 2011 at 8:39 AM, Chris.Barker chris.bar...@noaa.gov wrote: On 10/31/11 6:38 PM, Stéfan van der Walt wrote: On Mon, Oct 31, 2011 at 6:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Oh, dear, I'm suffering now: In [12]: res > 2**31-1 Out[12]: array([False], dtype=bool) I'm seeing: ... Your result seems very strange, because the numpy scalars should perform exactly the same inside and outside an array. I get what Stéfan gets: In [32]: res = np.array((2**31,), dtype=np.float32) In [33]: res > 2**31-1 Out[33]: array([ True], dtype=bool) In [34]: res[0] > 2**31-1 Out[34]: True In [35]: res[0].dtype Out[35]: dtype('float32') In [36]: np.__version__ Out[36]: '1.6.1' (OS-X, Intel, Python2.7) Something is very odd with your build! Well - numpy 1.4.1 on Debian squeeze. I get the same as you with current numpy trunk. Stefan and I explored the issue a bit further and concluded that, in numpy trunk, the current behavior is explicable by upcasting to float64 during the comparison: In [86]: np.array(2**63, dtype=np.float) > 2**63 - 1 Out[86]: False In [87]: np.array(2**31, dtype=np.float) > 2**31 - 1 Out[87]: True because 2**31 and 2**31-1 are both exactly representable in float64, but 2**31-1 is not exactly representable in float32. Maybe this: In [88]: np.promote_types('f4', int) Out[88]: dtype('float64') tells us this information. The command is not available for numpy 1.4.1. I suppose it's possible that the upcasting rules were different in 1.4.1 and that is the cause of the different behavior. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
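The upcasting explanation above can be sketched explicitly: np.promote_types shows which common type a float32/int comparison uses (in current numpy; as noted, the 1.4.1 rules may have differed):

```python
import numpy as np

# Comparing float32 with a 64-bit int promotes both sides to a common
# type; which type that is decides whether 2**31 - 1 survives exactly.
print(np.promote_types(np.float32, np.int64))  # float64

a = np.float64(2**31)      # 2**31 is exact in float64
b = np.float64(2**31 - 1)  # 2**31 - 1 is also exact in float64
print(a > b)               # True: the two values stay distinct

c = np.float32(2**31)
d = np.float32(2**31 - 1)  # rounds up to 2**31 in float32 (24-bit significand)
print(c > d)               # False: both collapse to the same value
```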
Re: [Numpy-discussion] Float128 integer comparison
Hi, On Sat, Oct 15, 2011 at 1:34 PM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote: On 15.10.2011, at 9:42PM, Aronne Merrelli wrote: On Sat, Oct 15, 2011 at 1:12 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Continuing the exploration of float128 - can anyone explain this behavior? np.float64(9223372036854775808.0) == 9223372036854775808L True np.float128(9223372036854775808.0) == 9223372036854775808L False int(np.float128(9223372036854775808.0)) == 9223372036854775808L True np.round(np.float128(9223372036854775808.0)) == np.float128(9223372036854775808.0) True I know little about numpy internals, but while fiddling with this, I noticed a possible clue: np.float128(9223372036854775808.0) == 9223372036854775808L False np.float128(4611686018427387904.0) == 4611686018427387904L True np.float128(9223372036854775808.0) - 9223372036854775808L Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for -: 'numpy.float128' and 'long' np.float128(4611686018427387904.0) - 4611686018427387904L 0.0 My speculation - 9223372036854775808L is the first integer that is too big to fit into a signed 64 bit integer. Python is OK with this but that means it must be containing that value in some more complicated object. Since you don't get the type error between float64() and long: np.float64(9223372036854775808.0) - 9223372036854775808L 0.0 Maybe there are some unimplemented pieces in numpy for dealing with operations between float128 and python arbitrary longs? I could see the == test just producing false in that case, because it defaults back to some object equality test which isn't actually looking at the numbers. 
That seems to make sense, since even upcasting from a np.float64 still lets the test fail: np.float128(np.float64(9223372036854775808.0)) == 9223372036854775808L False while np.float128(9223372036854775808.0) == np.uint64(9223372036854775808L) True and np.float128(9223372036854775809) == np.uint64(9223372036854775809L) False np.float128(np.uint(9223372036854775809L)) == np.uint64(9223372036854775809L) True Showing again that the normal casting to, or reading in of, a np.float128 internally inevitably calls the python float(), as already suggested in one of the parallel threads (I think this also came up with some of the tests for precision) - leading to different results than when you can convert from a np.int64 - this makes the outcome look even weirder: np.float128(9223372036854775807.0) - np.float128(np.int64(9223372036854775807)) 1.0 np.float128(9223372036854775296.0) - np.float128(np.int64(9223372036854775807)) 1.0 np.float128(9223372036854775295.0) - np.float128(np.int64(9223372036854775807)) -1023.0 np.float128(np.int64(9223372036854775296)) - np.float128(np.int64(9223372036854775807)) -511.0 simply due to the nearest np.float64 always being equal to MAX_INT64 in the two first cases above (or anything in between)... Right - just for the record, I think there are four relevant problems. 
1: values being cast to float128 appear to go through float64 -- In [119]: np.float128(2**54-1) Out[119]: 18014398509481984.0 In [120]: np.float128(2**54)-1 Out[120]: 18014398509481983.0 2: values being cast from float128 to int appear to go through float64 again --- In [121]: int(np.float128(2**54-1)) Out[121]: 18014398509481984 http://projects.scipy.org/numpy/ticket/1395 3: comparison to python long ints is always unequal --- In [139]: 2**63 # 2**63 correctly represented in float128 Out[139]: 9223372036854775808L In [140]: int(np.float64(2**63)) Out[140]: 9223372036854775808L In [141]: int(np.float128(2**63)) Out[141]: 9223372036854775808L In [142]: np.float128(2**63) == 2**63 Out[142]: False In [143]: np.float128(2**63)-1 == 2**63-1 Out[143]: True In [144]: np.float128(2**63) == np.float128(2**63) Out[144]: True Probably because, as y'all are saying, numpy tries to convert to np.int64, fails, and falls back to an object array: In [145]: np.array(2**63) Out[145]: array(9223372036854775808L, dtype=object) In [146]: np.array(2**63-1) Out[146]: array(9223372036854775807L) 4 : any other operation of float128 with python long ints fails -- In [148]: np.float128(0) + 2**63 --- TypeError Traceback (most recent call last) /home/mb312/ipython-input-148-5cc20524867d in module() 1 np.float128(0) + 2**63 TypeError: unsupported operand type(s) for +: 'numpy.float128' and 'long' In [149
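One way around problems 3 and 4 is never to compare a float with a big Python int directly, but to convert the float to an exact Python int first - int() of a float is exact, and Python ints are unbounded. A hedged sketch, with a helper name of my own choosing:

```python
import numpy as np

def float_equals_int(f, i):
    # Exact comparison of a float scalar with a Python int, avoiding
    # the lossy int -> float conversion that makes == misbehave above.
    if np.isnan(f) or np.isinf(f):
        return False
    return bool(f == np.floor(f) and int(f) == i)

print(float_equals_int(np.float64(2**63), 2**63))      # True
print(float_equals_int(np.float64(2**63), 2**63 - 1))  # False
```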
Re: [Numpy-discussion] Nice float - integer conversion?
Hi, On Sat, Oct 15, 2011 at 12:20 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Oct 11, 2011 at 7:32 PM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Oct 11, 2011 at 2:06 PM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote: On 11 Oct 2011, at 20:06, Matthew Brett wrote: Have I missed a fast way of doing nice float to integer conversion? By nice I mean, rounding to the nearest integer, converting NaN to 0, inf, -inf to the max and min of the integer range? The astype method and cast functions don't do what I need here: In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) Out[40]: array([1, 0, 0, 0], dtype=int16) In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf])) Out[41]: array([1, 0, 0, 0], dtype=int16) Have I missed something obvious? np.[a]round comes closer to what you wish (is there consensus that NaN should map to 0?), but not quite there, and it's not really consistent either! In a way, there is already consensus in the code. np.nan_to_num() by default converts nans to zero, and the infinities go to very large and very small. 
np.set_printoptions(precision=8) x = np.array([np.inf, -np.inf, np.nan, -128, 128]) np.nan_to_num(x) array([ 1.79769313e+308, -1.79769313e+308, 0.e+000, -1.2800e+002, 1.2800e+002]) Right - but - we'd still need to round, and take care of the nasty issue of thresholding: x = np.array([np.inf, -np.inf, np.nan, -128, 128]) x array([ inf, -inf, nan, -128., 128.]) nnx = np.nan_to_num(x) nnx array([ 1.79769313e+308, -1.79769313e+308, 0.e+000, -1.2800e+002, 1.2800e+002]) np.rint(nnx).astype(np.int8) array([ 0, 0, 0, -128, -128], dtype=int8) So, I think nice_round would look something like: def nice_round(arr, out_type): in_type = arr.dtype.type mx = floor_exact(np.iinfo(out_type).max, in_type) mn = floor_exact(np.iinfo(out_type).min, in_type) nans = np.isnan(arr) out = np.rint(np.clip(arr, mn, mx)).astype(out_type) out[nans] = 0 return out with floor_exact being something like: https://github.com/matthew-brett/nibabel/blob/range-dtype-conversions/nibabel/floating.py In case anyone is interested or for the sake of anyone later googling this thread - I made a working version of nice_round: https://github.com/matthew-brett/nibabel/blob/floating-stash/nibabel/casting.py Docstring: def nice_round(arr, int_type, nan2zero=True, infmax=False): Round floating point array `arr` to type `int_type` Parameters -- arr : array-like Array of floating point type int_type : object Numpy integer type nan2zero : {True, False} Whether to convert NaN value to zero. Default is True. If False, and NaNs are present, raise CastingError infmax : {False, True} If True, set np.inf values in `arr` to be `int_type` integer maximum value, -np.inf as `int_type` integer minimum. If False, merely set infs to be numbers at or near the maximum / minimum number in `arr` that can be contained in `int_type`. Therefore False gives faster conversion at the expense of infs that are further from infinity. 
Returns --- iarr : ndarray of type `int_type` Examples nice_round([np.nan, np.inf, -np.inf, 1.1, 6.6], np.int16) array([ 0, 32767, -32768, 1, 7], dtype=int16) It wasn't straightforward to find the right place to clip the array to stop overflow on casting, but I think it's working and tested now. See y'all, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
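For the record, here is a self-contained simplification of the nice_round idea that drops the floor_exact clipping; it is only safe when the target type's min/max are exactly representable in float64 (so int16/int32 targets, not int64), and the name marks it as a sketch rather than the nibabel implementation:

```python
import numpy as np

def nice_round_sketch(arr, int_type):
    # NaN -> 0, +/-inf -> integer max/min, else round to nearest.
    # Simplified from nice_round above: no floor_exact, so only safe
    # when iinfo(int_type).min/max are exact in float64.
    arr = np.asarray(arr, dtype=np.float64)
    info = np.iinfo(int_type)
    nans = np.isnan(arr)
    clipped = np.clip(arr, info.min, info.max)  # NaN passes through clip
    clipped[nans] = 0
    return np.rint(clipped).astype(int_type)

print(nice_round_sketch([np.nan, np.inf, -np.inf, 1.1, 6.6], np.int16).tolist())
# [0, 32767, -32768, 1, 7]
```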
[Numpy-discussion] float64 / int comparison different from float / int comparison
Hi, I just ran into this confusing difference between np.float and np.float64: In [8]: np.float(2**63) == 2**63 Out[8]: True In [9]: np.float(2**63) > 2**63-1 Out[9]: True In [10]: np.float64(2**63) == 2**63 Out[10]: True In [11]: np.float64(2**63) > 2**63-1 Out[11]: False In [16]: np.float64(2**63-1) == np.float(2**63-1) Out[16]: True I believe values above 2**52 are all represented as integers in float64. http://matthew-brett.github.com/pydagogue/floating_point.html Is this this int64 issue that came up earlier in float128 comparison? Why the difference between np.float and np.float64? Thanks for any insight, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] float64 / int comparison different from float / int comparison
Hi, 2011/10/31 Stéfan van der Walt ste...@sun.ac.za: On Mon, Oct 31, 2011 at 11:23 AM, Matthew Brett matthew.br...@gmail.com wrote: In [8]: np.float(2**63) == 2**63 Out[8]: True In [9]: np.float(2**63) > 2**63-1 Out[9]: True In [10]: np.float64(2**63) == 2**63 Out[10]: True In [11]: np.float64(2**63) > 2**63-1 Out[11]: False In [16]: np.float64(2**63-1) == np.float(2**63-1) Out[16]: True Interesting. Turns out that np.float(x) returns a Python float object. If you change the experiment to only use numpy array scalars, things are more consistent: In [36]: np.array(2**63, dtype=np.float) > 2**63 - 1 Out[36]: False In [37]: np.array(2**63, dtype=np.float32) > 2**63 - 1 Out[37]: False In [38]: np.array(2**63, dtype=np.float64) > 2**63 - 1 Oh, dear, I'm suffering now: In [11]: res = np.array((2**31,), dtype=np.float32) In [12]: res > 2**31-1 Out[12]: array([False], dtype=bool) OK - that's what I was expecting from the above, but now: In [13]: res[0] > 2**31-1 Out[13]: True In [14]: res[0].dtype Out[14]: dtype('float32') Sorry, maybe I'm not thinking straight, but I'm confused... See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 10:02 PM, Travis Oliphant oliph...@enthought.com wrote: Here are my needs: 1) How NAs are implemented cannot be end user visible. Having to pass maskna=True is a problem. I suppose a solution is to set the flag to true on every array inside of pandas so the user never knows (you mentioned someone else had some other solution, i could go back and dig it up?) I guess this would be the same with bitpatterns, in that the user would have to specify a custom dtype. Is it possible to add a bitpattern NA (in the NaN values) to the current floating point types, at least in principle? So that np.float etc would have bitpattern NAs without a custom dtype? That is an interesting idea. It's essentially what people like Wes McKinney are doing now. However, the issue is going to be whether or not you do something special or not with the NA values in the low-level C function the dtype dispatches to. This is the reason for the special bit-pattern dtype. I've always thought that requiring NA checks for code that doesn't want to worry about it would slow things down un-necessarily for those use-cases. Right - now that the caffeine has run through my system adequately, I have a few glasses of wine to disrupt my logic and / or social skills but: Is there any way you could imagine something like this?: In [3]: a = np.arange(10, dtype=np.float) In [4]: a.flags Out[4]: C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False MAYBE_NA : False In [5]: a[0] = np.NA In [6]: a.flags Out[6]: C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False MAYBE_NA : True Obviously extension writers would have to keep the flag maintained... Sorry if that doesn't make sense, I do not claim to be in full possession of my faculties, See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 11:19 PM, Travis Oliphant oliph...@enthought.com wrote: Thanks again for your email, I'm sure I'm not the only one who breathes a deep sigh of relief when I see your posts. I appreciate Nathaniel's idea to pull the changes and I can respect his desire to do that. It seemed like there was a lot more heat than light in the discussion this summer. The differences seemed to be enflamed by the discussion instead of illuminated by it. Perhaps, that is why Nathaniel felt like merging Mark's pull request was too strong-armed and not a proper resolution. However, I did not interpret Matthew or Nathaniel's explanations of their position as manipulative or inappropriate. Nonetheless, I don't think removing Mark's changes are a productive direction to take at this point. I agree, it would have been much better to reach a rough consensus before the code was committed. At least, those who felt like their ideas were not accounted for should have felt like there was some plan to either accommodate them, or some explanation of why that was not a good idea. The only thing I recall being said was that there was nobody to implement their ideas. I wish that weren't the case. I think we can still continue to discuss their concerns and look for ways to reasonably incorporate their use-cases if possible. I have probably contributed in the past to the idea that he who writes the code gets the final say. In early-stage efforts, this is approximately right, but success of anything relies on satisfied users and as projects mature the voice of users becomes more relevant than the voice of contributors in my mind. I've certainly had to learn that in terms of ABI changes to NumPy. I think that's right though - that the person who wrote the code has the final say. But that's the final say. The question I wanted to ask was the one Nathaniel brought up at the beginning of the thread, which is, before the final say, how hard do we try for consensus? 
Is that - the numpy way? Here Chuck was saying 'I listen to you in proportion to your code contribution' (I hope I'm not misrepresenting him). I think that's different way of working than the consensus building that Karl Fogel describes. But maybe that is just the numpy way. I would feel happier to know what that way is. Then, when we get into this kind of dispute Chuck can say 'Matthew, change the numpy constitution or accept the situation because that's how we've agreed to work'. And I'll say - 'OK - I don't like it, but I agree those are the rules'. And we'll get on with it. But at the moment, it feels as if it isn't clear, and, as Ben pointed out, that means we are having a discussion and a discussion about the discussion at the same time. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Large numbers into float128
Hi, On Sun, Oct 30, 2011 at 2:38 AM, Berthold Höllmann berth...@xn--hllmanns-n4a.de wrote: Matthew Brett matthew.br...@gmail.com writes: Hi, Can anyone think of a good way to set a float128 value to an arbitrarily large number? As in v = int_to_float128(some_value) ? I'm trying things like v = np.float128(2**64+2) but, because (in other threads) the float128 seems to be going through float64 on assignment, this loses precision, so although 2**64+2 is representable in float128, in fact I get: In [35]: np.float128(2**64+2) Out[35]: 18446744073709551616.0 In [36]: 2**64+2 Out[36]: 18446744073709551618L So - can anyone think of another way to assign values to float128 that will keep the precision? Just use float128 all the way through, and avoid casting to float in between: >>> "%20.1f" % float(2**64+2) '18446744073709551616.0' >>> np.float128(np.float128(2)**64+2) 18446744073709551618.0 Ah yes - sorry - that would work in this example where I know the component parts of the number, but I was thinking in the general case where I have been given any int. I think my code works for that, by casting to float64 to break up the number into parts: In [35]: def int_to_float128(val): ...: f64 = np.float64(val) ...: res = val - int(f64) ...: return np.float128(f64) + np.float128(res) ...: In [36]: int_to_float128(2**64) Out[36]: 18446744073709551616.0 In [37]: int_to_float128(2**64+2) Out[37]: 18446744073709551618.0 Thanks, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
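The two-step trick above can be sketched portably with np.longdouble (float96/float128 are platform aliases for it; on MSVC long double is plain double, in which case nothing is gained):

```python
import numpy as np

def int_to_longdouble(val):
    # Route the high bits through float64, then add back the exact
    # integer remainder.  The subtraction is exact because Python
    # ints are unbounded, and the remainder is small enough to
    # survive its own float64 round trip.
    f64 = np.float64(val)
    res = val - int(f64)
    return np.longdouble(f64) + np.longdouble(res)

hi = int_to_longdouble(2**64 + 2)
lo = int_to_longdouble(2**64)
# Where long double has more than 53 mantissa bits, the low bits
# survive and the difference is exactly 2.0:
print(float(hi - lo))
```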
Re: [Numpy-discussion] consensus
Hi, On Sun, Oct 30, 2011 at 11:37 AM, Chris Barker chris.bar...@noaa.gov wrote: On 10/29/11 2:59 PM, Charles R Harris wrote: I'm much opposed to ripping the current code out. It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project. 1) contributing to the discussion IS a positive contribution to the project. Yes, but, personally I'd rather the discussion was not about who was saying something, but what they were saying. That is, if someone proposes something, or offers a discussion, we don't first ask 'who are you', but try and engage with the substance of the argument. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sun, Oct 30, 2011 at 12:24 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:55 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me. I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. 
It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst the alterNEP: https://gist.github.com/1056379 and my longer email to Travis: http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored Mark has done a nice job of documentation: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer. Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy). If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. 
You're just telling me to go back and re-read things I'd already read. The snippets are in ipython or doctest format - aren't they? Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code. Indeed. That's true, but I am hoping that the difference between - say: a[0:2] = np.NA and a.mask[0:2] = False would be easy enough to imagine. It is in this case. I agree
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 8:38 PM, Benjamin Root ben.r...@ou.edu wrote: Matt, On Friday, October 28, 2011, Matthew Brett matthew.br...@gmail.com wrote: Forget about rudeness or decision processes. No, that's a common mistake, which is to assume that any conversation about things which aren't technical, is not important. Nathaniel's point is important. Rudeness is important. The reason we've got into this mess is because we clearly don't have an agreed way of making decisions. That's why countries and open-source projects have constitutions, so this doesn't happen. Don't get me wrong. In general, you are right. And maybe we all should discuss something to that effect for numpy. But I would rather do that when there isn't such contention and tempers. That's a reasonable point. As for allegations of rudeness, I believe that we are actually very close to consensus that I immediately wanted to squelch any sort of meta-meta-disagreements about who was being rude to who. As a quick band-aide, anybody who felt slighted by me gets a drink on me at the next scipy conference. From this point on, let's institute a 10 minute rule -- write your email, wait ten minutes, read it again and edit it. Good offer. I make the same one. I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want a single way to identify the missing values. I also want only a single way to perform calculations (either skip or propagate). Thank you - that is very helpful. Are you saying that you'd be OK setting missing values like this? a.mask[0:2] = False Probably not that far, because that would be an attribute that may or may not exist. Rather, I might like the idea of a NA to always mean absent (and destroys - even through views), and MA (or some other name) which always means ignore (and has the masking behavior with views). This makes specific behaviors tied distinctly to specific objects. Ah - yes - thank you. 
I think you and I at least have somewhere to go for agreement, but I don't know how to work towards a numpy-wide agreement. Do you have any thoughts? For the read side, do you mean you're OK with this: a.isna(), to identify the missing values, as is currently the case? Or something else? Yes. A missing value is a missing value, regardless of it being absent or marked as ignored. But it is a bit more subtle than that. I should just be able to add two arrays together and the data should know what to do. When the core ufuncs get this right (like min, max, sum, cumsum, diff, etc), then I don't have to do much to prepare higher level funcs for missing data. If so, then I think we're very close, it's just a discussion about names. And what does ignore + absent equal? ;-) ignore + absent == special_value_of_some_sort :) Just joking, See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
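[For readers who want to experiment: the read-side behavior Ben describes - one single way to identify missing values, plus core reductions that skip them - can already be approximated with the existing numpy.ma masked-array module. This is only an analogy for the semantics under discussion, not the API of either NEP:]

```python
import numpy as np
import numpy.ma as ma

# One array with two values marked missing, however they came to be missing:
a = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, True])

# A single read-side test for missingness:
missing = ma.getmaskarray(a)
print(missing)           # [False  True False  True]

# Core reductions then skip the missing values:
print(a.sum())           # 1.0 + 3.0 -> 4.0
print(a.min(), a.max())  # 1.0 3.0
```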
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me. I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. 
I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst the alterNEP: https://gist.github.com/1056379 and my longer email to Travis: http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored Mark has done a nice job of documentation: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer. Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy). In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'. That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me. The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground. That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. 
Agreement without code still doesn't help us very much. I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. I'm not. All I want to point out is that design and implementation are not completely separated either. No, they often interact. I was trying to explain why, in this case, the implementation hasn't changed the issues substantially, as far as I can see. If you think otherwise, then that is helpful information, because you can feed back about where the initial discussion has been overtaken by the implementation, and so we can strip down the discussion to its essential parts. We don't need to do it that way. We're a mature sensible bunch of adults who can talk out the issues until we agree. Agreed :) Ah - if only it was that easy :)
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 1:05 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 1:41 PM, Benjamin Root ben.r...@ou.edu wrote: On Saturday, October 29, 2011, Charles R Harris charlesr.har...@gmail.com wrote: Who is counted in building a consensus? I tend to pay attention to those who have made consistent contributions over the years, reviewed code, fixed bugs, and have generally been active in numpy development. In any group, participation is important; people who just walk in the door and demand things be done their way aren't going to get a lot of respect. I'll happily listen to politely expressed feedback, especially if the feedback comes from someone who shows up to work, but that hasn't been my impression of the disagreements in this case. Heck, Nathaniel wasn't even tracking the Numpy pull requests or Mark's repository. That doesn't spell participant in my dictionary. Chuck This is a very good point, but I would highly caution against alienating anybody here. Frankly, I am surprised how much my opinion has been taken here given the very little numpy code I have submitted (I think maybe two or three patches). The Numpy community is far more than just those who use the core library. There is pandas, bottleneck, mpl, the scikits, and much more. Numpy would be nearly useless without them, and certainly vice versa. I was quite impressed by your comments on Mark's work; I thought they were excellent. It doesn't really take much to make an impact in a small community overburdened by work. We are all indebted to each other for our works. We must never lose that perspective. We all seem to have a different set of assumptions of how development should work. Each project follows its own workflow. Numpy should be free to adopt its own procedures, and we are free to discuss them. I do agree with Chuck that he shouldn't have to make a written invitation to each and every person to review each pull. 
However, maybe some work can be done to bring the pull request and issues discussion down to the mailing list. I would like to do something similar with mpl. As for voting rights, let's make that a separate discussion. With such a small community, I'd rather avoid the whole voting thing if possible. But if there is one thing worse than voting, it is implicit voting. Implicit voting is where you ignore people who you don't think should have a voice. Unless I'm mistaken, that's what you are suggesting should be the norm. Best, Matthew
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett matthew.br...@gmail.com wrote: [snip] If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read. The snippets are in IPython or doctest format - aren't they? 
OK, update: I took Ben's 10 minutes to go back and read the reference doc and your email again, just in case. The current implementation still seems natural to me to explain. It fits my use-cases. Perhaps that's different for you because you and I deal with different kinds of data. I don't have to explicitly treat absent and ignored data differently; those two are actually mixed and indistinguishable already in much of my data. Therefore the current implementation works well for me; having to make a distinction would be a needless complication. OK - I'm not sure that contributes much to the discussion, because the problem is being able to explain to each other in detail why one solution is preferable to another. To follow your own advice, you'd post some code snippets showing how you'd see the two ideas playing out and why one is clearer than the other. Best, Matthew
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: [snip] If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read. The snippets are in IPython or doctest format - aren't they? 
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code. That's true, but I am hoping that the difference between - say: a[0:2] = np.NA and a.mask[0:2] = False would be easy enough to imagine. If it isn't, then let me know, preferably with something like 'I can't see exactly how the following [code snippet] would work in your conception of the problem' - and then I can either try and give fake examples, or write a mock-up. Best, Matthew
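[Since np.NA exists only in the experimental branch, here is a rough stand-in for the two assignment styles being contrasted, with NaN playing the destructive 'absent' role and numpy.ma playing the recoverable 'ignore' role - an analogy for the semantics, not the NEP API:]

```python
import numpy as np
import numpy.ma as ma

# Destructive marking (NaN standing in for np.NA): data is overwritten.
a = np.array([1.0, 2.0, 3.0])
a[0:2] = np.nan
print(np.nansum(a))      # 3.0 - and the original 1.0, 2.0 are gone for good

# Mask-based marking: data is hidden, not destroyed.
b = ma.array([1.0, 2.0, 3.0])
b[0:2] = ma.masked
print(b.sum())           # 3.0
b.mask[0:2] = False      # unmasking brings the original values back
print(b.sum())           # 6.0
```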
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett matthew.br...@gmail.com wrote: [snip] 
Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code. Indeed. That's true, but I am hoping that the difference between - say: a[0:2] = np.NA and a.mask[0:2] = False would be easy enough to imagine. It is in this case. I agree the explicit ``a.mask`` is clearer. This is a quite specific point that could be improved in the current implementation. Thanks - this is helpful. It doesn't require ripping everything out. Nathaniel wasn't proposing 'ripping everything out' - but backing off until consensus has been reached. That's different. If you think we should not do that, and you are interested, please say why
[Numpy-discussion] Large numbers into float128
Hi, Can anyone think of a good way to set a float128 value to an arbitrarily large number? As in v = int_to_float128(some_value) ? I'm trying things like v = np.float128(2**64+2) but, because (in other threads) the float128 seems to be going through float64 on assignment, this loses precision, so although 2**64+2 is representable in float128, in fact I get: In [35]: np.float128(2**64+2) Out[35]: 18446744073709551616.0 In [36]: 2**64+2 Out[36]: 18446744073709551618L So - can anyone think of another way to assign values to float128 that will keep the precision? Thanks a lot, Matthew
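[One possible workaround, sketched here with int_to_float128 reusing the hypothetical helper name from the question: feed the integer to float128 in 32-bit chunks, each of which survives the float64 round-trip exactly, and do all the scaling arithmetic in float128 itself. This assumes a platform where np.float128 really is 80-bit extended precision or better:]

```python
import numpy as np

def int_to_float128(n):
    # Accumulate 32-bit chunks of n; each chunk is exactly representable
    # in float64, and the power-of-two scaling happens in float128.
    neg = n < 0
    n = abs(n)
    result = np.float128(0)
    scale = np.float128(1)
    while n:
        result += np.float128(n & 0xFFFFFFFF) * scale
        n >>= 32
        scale *= np.float128(2) ** 32
    return -result if neg else result

v = int_to_float128(2**64 + 2)
print(v - np.float128(2)**64)   # 2.0 - the low bits survived
```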
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me. I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. 
It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst the alterNEP: https://gist.github.com/1056379 and my longer email to Travis: http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored Mark has done a nice job of documentation: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer. Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy). If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. 
You're just telling me to go back and re-read things I'd already read. The snippets are in ipython or doctest format - aren't they? Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code. Indeed. That's true, but I am hoping that the difference between - say: a[0:2] = np.NA and a.mask[0:2] = False would be easy enough to imagine. It is in this case. I agree
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 4:18 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 5:11 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me. I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. 
Again, I'm happy to be corrected. Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst the alterNEP: https://gist.github.com/1056379 and my longer email to Travis: http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored Mark has done a nice job of documentation: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer. Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy). 
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 4:28 PM, Han Genuit hangen...@gmail.com wrote: To be honest, you have been slandering a lot, also in previous discussions, to get what you wanted. This is not a healthy way of discussion, nor does it help in any way. That's a severe accusation. Please quote something I said that was false, or unfair. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 11:14 AM, Wes McKinney wesmck...@gmail.com wrote: On Fri, Oct 28, 2011 at 9:32 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney wesmck...@gmail.com wrote: On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root ben.r...@ou.edu wrote: On Friday, October 28, 2011, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant oliph...@enthought.com wrote: I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story. Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's formulas (which I supported), or another approach based on sympy (his idea). 
To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'. That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me. The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground. That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much. I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. We don't need to do it that way. 
We're a mature sensible bunch of adults who can talk out the issues until we agree they are ready for implementation, and then implement. That's all Nathaniel is saying. I think he's obviously right, and I'm sad that it isn't as clear to y'all as it is to me. Best, Matthew Everyone, can we please not do this?! I had enough of adults doing finger pointing back over the summer during the whole debt ceiling debate. I think we can all agree that we are better than the US congress? Forget about rudeness or decision processes. I will start by saying that I am willing to separate ignore and absent, but only on the write
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 4:11 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me. I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. 
Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst the alterNEP: https://gist.github.com/1056379 and my longer email to Travis: http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored Mark has done a nice job of documentation: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer. Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy). 
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress in this part of the discussion. You're just telling me to go back and re-read things I'd already read. The snippets are in ipython or doctest format - aren't they? Oops - 10 minute rule. Now I see that you mean that you can't experiment with the alternative implementation without working code. Indeed. That's true, but I am hoping that the difference between - say: a[0:2] = np.NA and a.mask[0:2
Re: [Numpy-discussion] Large numbers into float128
Hi, On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Can anyone think of a good way to set a float128 value to an arbitrarily large number? As in v = int_to_float128(some_value) ? I'm trying things like v = np.float128(2**64+2) but, because (in other threads) the float128 seems to be going through float64 on assignment, this loses precision, so although 2**64+2 is representable in float128, in fact I get:

In [35]: np.float128(2**64+2)
Out[35]: 18446744073709551616.0

In [36]: 2**64+2
Out[36]: 18446744073709551618L

So - can anyone think of another way to assign values to float128 that will keep the precision? To answer my own question - I found an unpleasant way of doing this. Basically it is this:

def int_to_float128(val):
    f64 = np.float64(val)
    res = val - int(f64)
    return np.float128(f64) + np.float128(res)

Used in various places here: https://github.com/matthew-brett/nibabel/blob/e18e94c5b0f54775c46b1c690491b8bd6f07eb49/nibabel/floating.py Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Sat, Oct 29, 2011 at 7:48 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 7:47 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 4:11 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: No, that's not what Nathaniel and I are saying at all. Nathaniel was pointing to links for projects that care that everyone agrees before they go ahead. It looked to me like there was a serious intent to come to an agreement, or at least closer together. The discussion in the summer was going around in circles though, and was too abstract and complex to follow. Therefore Mark's choice of implementing something and then asking for feedback made sense to me. I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. 
In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. Implementations can also help the discussion along, by allowing people to try out some of the proposed changes. It also allows to construct examples that show weaknesses, possibly to be solved by an alternative API. Maybe you can hold the complete history of this topic in your head and comprehend it, but for me it would be very helpful if someone said: - here's my dataset - this is what I want to do with it - this is the best I can do with the current implementation - here's how API X would allow me to solve this better or simpler This can be done much better with actual data and an actual implementation than with a design proposal. You seem to disagree with this statement. That's fine. I would hope though that you recognize that concrete examples help people like me, and construct one or two to help us out. That's what use-cases are for in designing APIs. There are examples of use in the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst the alterNEP: https://gist.github.com/1056379 and my longer email to Travis: http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored Mark has done a nice job of documentation: http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html If you want to understand what the alterNEP case is, I'd suggest the email, just because it's the most recent and I think the terminology is slightly clearer. Doing the same examples on a larger array won't make the point easier to understand. The discussion is about what the right concepts are, and you can help by looking at the snippets of code in those documents, and deciding for yourself whether you think the current masking / NA implementation seems natural and easy to explain, or rather forced and difficult to explain, and then email back trying to explain your impression (which is not always easy). 
If you seriously believe that looking at a few snippets is as helpful and instructive as being able to play around with them in IPython and modify them, then I guess we won't make progress
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root ben.r...@ou.edu wrote: On Thursday, October 27, 2011, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant oliph...@enthought.com wrote: That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). What is the counter-argument to this proposal? What exactly do you find convincing? The current masks propagate by default:

In [1]: a = ones(5, maskna=1)

In [2]: a[2] = NA

In [3]: a
Out[3]: array([ 1., 1., NA, 1., 1.])

In [4]: a + 1
Out[4]: array([ 2., 2., NA, 2., 2.])

In [5]: a[2] = 10

In [5]: a
Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True)

I don't see an essential difference between the implementation using masks and one using bit patterns, the mask when attached to the original array just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that. The main problems I see with masks are unified storage and possibly memory use. The rest is just behavior and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks. Chuck I think Chuck sums it up quite nicely. The implementation detail about using mask versus bit patterns can still be discussed and addressed. 
Personally, I just don't see how parameterized dtypes would be easier to use than the pseudo assignment. The elegance of Mark's solution was to consider the treatment of missing data in a unified manner. This puts missing data in a more prominent spot for extension builders, which should greatly improve support throughout the ecosystem. Are extension builders then required to use the numpy C API to get their data? Speaking as an extension builder, I would rather you gave me the mask and the bitpattern information and let me do that myself. By letting there be a single missing data framework (instead of two) all that users need to figure out is when they want nan-like behavior (propagate) or to be more like masks (skip). Numpy takes care of the rest. There is a reason why I like using masked arrays because I don't have to use nansum in my library functions to guard against the possibility of receiving nans. Duck-typing is a good thing. My argument against separating IGNORE and PROPAGATE is that it becomes too tempting to want to mix these in an array, but the desired behavior would likely become ambiguous. There is one other problem that I just thought of that I don't think has been outlined in either NEP. What if I perform an operation between an array set up with propagate NAs and an array with skip NAs? These are explicitly covered in the alterNEP: https://gist.github.com/1056379/ Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
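The PROPAGATE versus SKIP behaviors being debated can be demonstrated today with NaN and numpy.ma; this is a stand-in for the semantics, not the maskna API from the thread:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
m = np.ma.masked_invalid(a)   # mask out the NaN entry

propagated = a.sum()          # PROPAGATE: the NaN poisons the reduction
skipped = m.sum()             # SKIP: the masked entry is ignored
```

The thread's disagreement is over whether these two behaviors belong to one kind of missing value (toggled by a flag) or to two distinct concepts (ABSENT vs IGNORED).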
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Fri, Oct 28, 2011 at 9:21 AM, Chris.Barker chris.bar...@noaa.gov wrote: On 10/27/11 7:51 PM, Travis Oliphant wrote: As I mentioned. I find the ability to separate an ABSENT idea from an IGNORED idea convincing. In other words, I think distinguishing between masks and bit-patterns is not just an implementation detail, but provides a useful concept for multiple use-cases. Exactly -- while one can implement ABSENT with a mask, one can not implement IGNORE with a bit-pattern. So it is not an implementation detail. I also think bit-patterns are a bit of a dead end: - there is only a standard for one data type family: i.e. NaN for ieee float types - So we would be coming up with our own standard (or adopting an existing one, but I don't think there is one widely supported) for other types. This means: 1) a lot of work to do. Largest possible negative integer for ints / largest integer for uints / not allowed for bool? 2) a binary format incompatible with other code, compilers, etc. This is a BIG deal -- a major strength of numpy is that it serves as a wrapper for a data block that is compatible with C, Fortran or whatever code -- special bit patterns would make this a lot harder. Extension code is going to get harder. At the moment, as far as I understand it, our extension code can receive a masked array and (without an explicit check from us) ignore the mask and process all the values. Then you're in the unfortunate situation of caring what's under the mask. Bitpatterns would - I imagine - be safer in that respect in that they would be new dtypes and thus extension code would by default reject them as unknown. We also talked about the fact that an 8-bit mask provides the ability to carry other information in the mask -- not just missing or ignored, but a handful of other possible reasons for masking. I think that has a lot of possibilities. 
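The sentinel ("bit pattern") scheme floated above - largest possible negative integer as NA for ints - can be sketched in a few lines. NA_INT64 and skip_na_sum are illustrative names only, not anything numpy provides:

```python
import numpy as np

# Hypothetical bit-pattern NA for int64: reserve the most negative
# integer as the sentinel value.
NA_INT64 = np.iinfo(np.int64).min

def skip_na_sum(arr):
    # SKIP semantics: drop sentinel entries before reducing.
    return int(arr[arr != NA_INT64].sum())

a = np.array([1, NA_INT64, 3], dtype=np.int64)
total = skip_na_sum(a)
```

This also illustrates Chris's compatibility worry: the raw buffer handed to C or Fortran code still contains the sentinel, and any code unaware of the convention will happily sum it in.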
On 10/28/11 2:11 AM, Stéfan van der Walt wrote: Another data point: I've been spending some time on scikits-image recently, and although masked values would be highly useful in that context, the cost of doubling memory use (for uint8 images, e.g.) is too high. 2) that we make a concerted effort to implement the bitmask mode of operation as soon as possible. I wonder if that might be handled as a scikits-image extension, rather than core numpy? I think Stefan and Nathaniel and Gary Strangman and others are saying we don't want to pay the price of a large memory hike for masking. I suspect that Nathaniel is right, and that a large majority of those of us who want 'missing data' functionality, also want what we've called ABSENT missing values, and care about memory. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root ben.r...@ou.edu wrote: On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root ben.r...@ou.edu wrote: On Thursday, October 27, 2011, Charles R Harris charlesr.har...@gmail.com wrote: On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant oliph...@enthought.com wrote: That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). What is the counter-argument to this proposal? What exactly do you find convincing? The current masks propagate by default:

In [1]: a = ones(5, maskna=1)

In [2]: a[2] = NA

In [3]: a
Out[3]: array([ 1., 1., NA, 1., 1.])

In [4]: a + 1
Out[4]: array([ 2., 2., NA, 2., 2.])

In [5]: a[2] = 10

In [5]: a
Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True)

I don't see an essential difference between the implementation using masks and one using bit patterns, the mask when attached to the original array just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that. The main problems I see with masks are unified storage and possibly memory use. The rest is just behavior and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks. 
Chuck

I think Chuck sums it up quite nicely. The implementation detail about using masks versus bit patterns can still be discussed and addressed. Personally, I just don't see how parameterized dtypes would be easier to use than the pseudo assignment. The elegance of Mark's solution was to consider the treatment of missing data in a unified manner. This puts missing data in a more prominent spot for extension builders, which should greatly improve support throughout the ecosystem. Are extension builders then required to use the numpy C API to get their data? Speaking as an extension builder, I would rather you gave me the mask and the bit-pattern information and let me do that myself. Forgive me, I wasn't clear. What I am speaking of is more about a typical human failing. If a programmer for a module never encounters masked arrays, then when they code up a function to operate on numpy data, it is quite likely that they would never take it into consideration. Notice the prolific use of np.asarray() even within the numpy codebase, which destroys masked arrays. Hmm - that sounds like it could cause some surprises. So, what you were saying was just that it was good that masked arrays were now closer to the core? That's reasonable, but I don't think it's relevant to the current discussion. I think we all agree it is nice to have masked arrays in the core. However, by making missing data support more integral to the core of numpy, it is far more likely that a programmer would take it into consideration when designing their algorithm, or at least explicitly document that their module does not support missing data. Both NEPs do this by making missing data front-and-center. However, my belief is that Mark's approach is easier to comprehend and is cleaner. Cleaner features mean that they are more likely to be used. The main motivation for the alterNEP was our strong feeling that separating ABSENT and IGNORE was easier to comprehend and cleaner.
I think it would be hard to argue that the alterNEP idea is not more explicit. By letting there be a single missing data framework (instead of two), all that users need to figure out is whether they want nan-like behavior (propagate) or to be more like masks (skip). Numpy takes care of the rest. There is a reason why I like using masked arrays: I don't have to use nansum in my library functions to guard against the possibility of receiving nans. Duck-typing is a good thing. My argument against separating IGNORE and PROPAGATE is that it becomes too tempting to want to mix these in an array, but the desired behavior would likely become ambiguous. There is one other problem that I just thought of that I don't think has been outlined in either NEP. What if I perform
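[Ben's nansum point - PROPAGATE requiring special-cased functions while SKIP works through ordinary duck typing - can be seen with plain numpy and numpy.ma today:]

```python
import numpy as np

# PROPAGATE: a NaN poisons the ordinary reduction, so library code
# must special-case it with nansum
x = np.array([1.0, np.nan, 3.0])
print(np.sum(x))     # nan
print(np.nansum(x))  # 4.0

# SKIP: a masked (IGNORED) value is simply left out by the ordinary
# sum, so library functions need no special casing - duck typing works
m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(np.sum(m))     # 4.0
```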
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Fri, Oct 28, 2011 at 12:15 PM, Lluís xscr...@gmx.net wrote: Summarizing: let's forget for a moment that mask has a meaning in english: This is at the core of the problem. You and I know what's really going on - there's a mask over the data. But in what follows we're going to try and pretend that is not what is going on. The result is something that is rather hard to understand, and, when you do understand it, it's surprising and inconvenient. This is all because we tried to conceal what was really going on. - maskna corresponds to ABSENT - ownmaskna corresponds to IGNORED The problem here is that of the two implementation mechanisms (masks and bit patterns), only the first can provide both semantics. But let's be clear: the current masked array implementation is made so it looks like ABSENT, and makes IGNORED hard to get to. Let's start with an array that already supports NAs:

In [1]: a = np.array([1, 2, 3], maskna=True)

ABSENT (destructive NA assignment) -- Once you assign NA, even if you're using NA masks, the value seems to be lost forever (i.e., the assignment is destructive regardless of the value):

In [2]: b = a.view()

In [3]: c = a.view(maskna=True)

In [4]: b[0] = np.NA

In [5]: a
Out[5]: array([NA, 2, 3])

In [6]: b
Out[6]: array([NA, 2, 3])

In [7]: c
Out[7]: array([NA, 2, 3])

Right - the mask (fundamentally an IGNORED signal) is pretending to implement ABSENT. But - as you point out below - I'm pasting it here - in fact it's IGNORED:

In [21]: a = np.array([1, 2, 3])

In [22]: b = a.view(maskna=True)

In [23]: b[0] = np.NA

In [24]: a
Out[24]: array([1, 2, 3])

In [25]: b
Out[25]: array([NA, 2, 3])

But now - I've done this:

a = np.array([99, 100, 3], maskna=True)
a[0:2] = np.NA

You and I know that I've got an array with values [99, 100, 3] and a mask with values [False, False, True]. So maybe I'd like to see what happens if I take off the mask from the second value.
I know that's what I want to do, but I don't know how to do it, because you won't let me manipulate the mask, because I'm not allowed to know that the NA values come from the mask. The alterNEP is just saying - please - be straight with me. If you're doing masking, show me the mask, and don't try and hide that there are stored values underneath. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
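[The "show me the mask" behavior Matthew is asking for is roughly what numpy.ma already provides: the mask is an ordinary attribute you can inspect and flip, and the stored values survive underneath. A minimal sketch, using numpy.ma rather than the unreleased maskna API:]

```python
import numpy as np

# Same setup as Matthew's example: values [99, 100, 3] with the
# first two masked out
a = np.ma.array([99, 100, 3], mask=[True, True, False])
print(a)                 # first two elements display as masked

# The stored values are still there, and the mask is not hidden
print(a.data.tolist())   # [99, 100, 3]

# Take the mask off the second value - no special API needed
a.mask[1] = False
print(a.mask.tolist())   # [True, False, False]
print(int(a[1]))         # 100
```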
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root ben.r...@ou.edu wrote: On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett matthew.br...@gmail.com wrote: You and I know that I've got an array with values [99, 100, 3] and a mask with values [False, False, True]. So maybe I'd like to see what happens if I take off the mask from the second value. I know that's what I want to do, but I don't know how to do it, because you won't let me manipulate the mask, because I'm not allowed to know that the NA values come from the mask. The alterNEP is just saying - please - be straight with me. If you're doing masking, show me the mask, and don't try and hide that there are stored values underneath. Considering that you have admitted before to not regularly using masked arrays, I seriously doubt that you would be able to judge whether this is a significant detriment or not. My entire point that I have been making is that Mark's implementation is not the same as the current masked arrays. Instead, it is a cleaner, more mature implementation that gets rid of extraneous features. This may explain why we don't seem to be getting anywhere. I am sure that Mark's implementation of masking is great. We're not talking about that. We're talking about whether it's a good idea to make masking look as though it is implementing the ABSENT idea. That's what I think is confusing, and that's the conversation I have been trying to pursue. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Fri, Oct 28, 2011 at 1:52 PM, Benjamin Root ben.r...@ou.edu wrote: On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root ben.r...@ou.edu wrote: On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett matthew.br...@gmail.com wrote: You and I know that I've got an array with values [99, 100, 3] and a mask with values [False, False, True]. So maybe I'd like to see what happens if I take off the mask from the second value. I know that's what I want to do, but I don't know how to do it, because you won't let me manipulate the mask, because I'm not allowed to know that the NA values come from the mask. The alterNEP is just saying - please - be straight with me. If you're doing masking, show me the mask, and don't try and hide that there are stored values underneath. Considering that you have admitted before to not regularly using masked arrays, I seriously doubt that you would be able to judge whether this is a significant detriment or not. My entire point that I have been making is that Mark's implementation is not the same as the current masked arrays. Instead, it is a cleaner, more mature implementation that gets rid of extraneous features. This may explain why we don't seem to be getting anywhere. I am sure that Mark's implementation of masking is great. We're not talking about that. We're talking about whether it's a good idea to make masking look as though it is implementing the ABSENT idea. That's what I think is confusing, and that's the conversation I have been trying to pursue. Best, Matthew Sorry if I came across too strongly there. No disrespect was intended. I wasn't worried about the disrespect. It's just I feel the discussion has not been to the point. Personally, I think we are getting somewhere. We have been whittling away what it is that we do agree upon, and have begun to specify *exactly* what it is that we disagree on. 
I understand your concern, and -- like I said in my previous email -- it makes sense from the perspective that numpy.ma users have had up to now. But I'm not a numpy.ma user, I'm just someone who knows that what you are doing is masking out values. The fact that I do not use numpy.ma points out that it's possible to find this highly counter-intuitive without prior bias. But, I re-raise my point that I have been making about the need to re-think masked arrays. If we consider masks as advanced slicing or boolean indexing, then being unable to access the underlying values actually makes a lot of sense. Consider it a contract when I pass a set of data with only certain values exposed. Because I passed the data with only those values exposed, it must have been entirely my intention to let the function know of only those values. It would be a violation of that contract if the function obtained those masked values. If I want to communicate both the original values and a particular mask, then I pass the array and a view with a particular mask. This is the old discussion about what Python users expect. I think they expect to be treated as adults. That is, breaking the contract should not be easy to do by accident, but it should be allowed. Maybe it would be helpful if an array could never have its own mask, but rather only views could carry masks? In conclusion, I submit that this is largely a problem that can be solved with proper documentation. New users who never used numpy.ma before do not have to concern themselves with the old way of thinking and are simply taught what masked arrays are. Meanwhile, a special section of the documentation should be made that teaches numpy.ma users how the new masked arrays behave. I don't think documentation will solve it. In a way, the ideal user is someone who doesn't know what's going on, because, for a while, they may not realize that when they thought they were doing assignment, in fact they are doing masking.
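[Ben's "pass the array plus a masked view" contract can be sketched with numpy.ma: by default np.ma.array does not copy the underlying buffer, so the caller keeps the full data while the callee sees only the exposed values. A sketch of the idea, not of the proposed maskna API:]

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])

# A masked "view" of the same buffer: copy=False is the default,
# so the data is shared, but the callee only sees exposed values
masked_view = np.ma.array(data, mask=[False, True, False])
print(float(masked_view.sum()))  # 4.0 - the masked element is skipped

# Writes through the view hit the shared buffer - the caller keeps
# ownership of the original values, the view carries the mask
masked_view[0] = 10.0
print(data.tolist())             # [10.0, 2.0, 3.0]
```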
Unfortunately, I suspect almost everyone using these things will start to realize that, and then they will start getting confused. I find it confusing, and I believe myself to understand the issues pretty well, and be of numpy-user-range comprehension powers. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 2:16 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant oliph...@enthought.com wrote: I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story. Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's formulas (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments. What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! 
I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.) -- It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like every interested party has a veto[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs. But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish? The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the NA mask feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. 
Together these folks are a huge proportion of this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons... they're still ignoring your code.) The consensus rule forces everyone to do the hardest and riskiest part -- building buy-in -- up front. Because you *have* to do it sooner or later, and doing it sooner doesn't just generate better designs. It drastically reduces the risk of ending up in a huge trainwreck. -- In my story at the beginning, I wished I had a magic wand to skip this annoying debate and political stuff. But giving it to me would have been a bad idea. I think that's what went wrong with the NA discussion in the first place. Mark's an excellent programmer, and he tried his best to act in the good of everyone in the project -- but in the end, he did have
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
On Fri, Oct 28, 2011 at 2:32 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:16 PM, Nathaniel Smith n...@pobox.com wrote: [snip - quoted text repeated from Nathaniel Smith's message above]
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith n...@pobox.com wrote: [snip - quoted text repeated from Nathaniel Smith's message above]
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith n...@pobox.com wrote: [snip - quoted text repeated from Nathaniel Smith's message above]
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant oliph...@enthought.com wrote: I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story. Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's formulas (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... 
I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments. What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.) -- It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like every interested party has a veto[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs. But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish? The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the NA mask feature right now. It works (at least the parts of it that are implemented). It's in mainline. 
But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folk make up a huge proportion of this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons... they're still ignoring your code.) The consensus rule forces everyone to do the hardest and riskiest part -- building buy-in -- up front. Because you *have* to do it sooner or later, and doing it sooner
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
On Fri, Oct 28, 2011 at 3:49 PM, Charles R Harris charlesr.har...@gmail.com wrote: 2011/10/28 Stéfan van der Walt ste...@sun.ac.za On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root ben.r...@ou.edu wrote: The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean ignore and np.NA mean absent. How far off are we really from consensus? Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made. Yes, Mark is around, but I assume he is interested in his school work at this point. And he might not be inclined to get back into this particular discussion. I don't feel he was treated very well by some last time around. We have not always been good at separating the concept of disagreement from that of rudeness. As I've said before, one form of rudeness (and not disagreement) is ignoring people. We should all be careful to point out - respectfully, and with reasons - when we find our colleagues' replies (or non-replies) to be rude, because rudeness is very bad for the spirit of open discussion. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 4:21 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 5:09 PM, Matthew Brett matthew.br...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:49 PM, Charles R Harris charlesr.har...@gmail.com wrote: 2011/10/28 Stéfan van der Walt ste...@sun.ac.za On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root ben.r...@ou.edu wrote: The space issue was never ignored and Mark left room for that to be addressed. Parameterized dtypes can still be added (and aren't all that different from multi-na). Perhaps I could be convinced of having np.MA assignments mean ignore and np.NA mean absent. How far off are we really from consensus? Do you know whether Mark is around? I think his feedback would be useful at this point; having written the code, he'll be able to evaluate some of the technical suggestions made. Yes, Mark is around, but I assume he is interested in his school work at this point. And he might not be inclined to get back into this particular discussion. I don't feel he was treated very well by some last time around. We have not always been good at separating the concept of disagreement from that of rudeness. As I've said before, one form of rudeness (and not disagreement) is ignoring people. We should all be careful to point out - respectfully, and with reasons - when we find our colleagues' replies (or non-replies) to be rude, because rudeness is very bad for the spirit of open discussion. Trying things out in preparation for discussion is also a mark of respect. Have you worked with the current implementation? OK - this seems to me to be rude. Why? Because you have presumably already read what my concerns were, and my discussion of the current implementation in my reply to Travis. You haven't made any effort to point out to me where I may be wrong or failing to understand. I infer that you are merely saying 'go away and come back later'. And that is rude.
Best, Matthew
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant oliph...@enthought.com wrote: I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story. Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's formulas (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... 
;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments. What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.) -- It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like every interested party has a veto[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs. But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish? The greatest risk for a FOSS project is that people will ignore you. 
Projects and features live and die by community buy-in. Consider the NA mask feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folk make up a huge proportion of this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature
Re: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Hi, On Fri, Oct 28, 2011 at 4:53 PM, Benjamin Root ben.r...@ou.edu wrote: On Friday, October 28, 2011, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant oliph...@enthought.com wrote: I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story. Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's formulas (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath.
Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I should point out that the implementation hasn't - as far as I can see - changed the discussion. The discussion was about the API. Implementations are useful for agreed APIs because they can point out where the API does not make sense or cannot be implemented. In this case, the API Mark said he was going to implement - he did implement - at least as far as I can see. Again, I'm happy to be corrected. In saying that we are insisting on our way, you are saying, implicitly, 'I am not going to negotiate'. That is only your interpretation. The observation that Mark compromised quite a bit while you didn't seems largely correct to me. The problem here stems from our inability to work towards agreement, rather than standing on set positions. I set out what changes I think would make the current implementation OK. Can we please, please have a discussion about those points instead of trying to argue about who has given more ground? That commitment would of course be good. However, even if that were possible before writing code and everyone agreed that the ideas of you and Nathaniel should be implemented in full, it's still not clear that either of you would be willing to write any code. Agreement without code still doesn't help us very much. I'm going to return to Nathaniel's point - it is a highly valuable thing to set ourselves the target of resolving substantial discussions by consensus. The route you are endorsing here is 'implementor wins'. We don't need to do it that way. We're a mature sensible bunch of adults who can talk out the issues until we agree they are ready for implementation, and then implement. That's all Nathaniel is saying. I think he's obviously right, and I'm sad that it isn't as clear to y'all as it is to me. Best, Matthew Everyone, can we please not do this?!
I had enough of adults doing finger pointing back over the summer during the whole debt ceiling debate. I think we can all agree that we are better than the US congress? Yes, please. Forget about rudeness or decision processes. No, that's a common mistake, which is to assume that any conversation about things which aren't technical is not important. Nathaniel's point is important. Rudeness is important. The reason we've got into this mess is because we clearly don't have an agreed way of making decisions. That's why countries and open-source projects have constitutions, so this doesn't happen. I will start by saying that I am willing to separate ignore and absent, but only on the write side of things. On read, I want
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant oliph...@enthought.com wrote: So, I am very interested in making sure I remember the details of the counterproposal. What I recall is that you wanted to be able to differentiate between a bit-pattern mask and a boolean-array mask in the API. I believe currently even when bit-pattern masks are implemented the difference will be hidden from the user on the Python level. I am sure to be missing other parts of the discussion as I have been in and out of it. The ideas -- The question that we were addressing in the alter-NEP was: should missing values implemented as bitpatterns appear to be the same as missing values implemented with masks? We said no, and Mark said yes. To restate the argument in brief: Nathaniel and I and some others thought that there were two separable ideas in play: 1) A value that is finally and completely missing. == ABSENT 2) A value that we would like to ignore for the moment but might want back at some future time == IGNORED (I'm using the adjectives ABSENT and IGNORED here to be short for the objects 'absent value' and 'ignored value'. This is to distinguish from the verbs below). We thought bitpatterns were a good match for the former, and masking was a good match for the latter. We all agreed there were two things you might like to do with values that were missing in both senses above: A) PROPAGATE; V + 1 == V B) SKIP; K + 1 == 1 (Note verbs for the behaviors). I believe the original np.ma masked arrays always SKIP:

In [3]: a = np.ma.masked_array([99, 2], mask=[True, False])
In [4]: a
Out[4]:
masked_array(data = [-- 2],
             mask = [ True False],
       fill_value = 99)
In [5]: a.sum()
Out[5]: 2

There was some discussion as to whether there was a reason to think that ABSENT should always or by default PROPAGATE, and IGNORED should always or by default SKIP.
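The SKIP / PROPAGATE distinction can also be seen with current numpy features alone (a sketch using np.ma and NaNs, not the maskna branch):

```python
import numpy as np

# np.ma masked arrays SKIP: masked entries are left out of reductions.
a = np.ma.masked_array([99, 2], mask=[True, False])
assert a.sum() == 2

# NaN PROPAGATEs under plain np.sum, while np.nansum SKIPs it.
b = np.array([np.nan, 2.0])
assert np.isnan(b.sum())
assert np.nansum(b) == 2.0
```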
Chuck is referring to this idea when he said further up this thread: For instance, I'm thinking skipna=1 is the natural default for the masked arrays. The current implementation --- What we have now is an implementation of masked arrays, but more tightly integrated into the numpy core. In our language we have an implementation of IGNORED that is tuned to be nearly indistinguishable from the behavior we are expecting of ABSENT. Specifically, once you have done this:

In [9]: a = np.array([99, 2], maskna=True)

you can get something representing the mask:

In [11]: np.isna(a)
Out[11]: array([False, False], dtype=bool)

but I believe there is no way of setting the mask directly. In order to set the mask, you have to do what looks like an assignment:

In [12]: a[0] = np.NA
In [14]: a
Out[14]: array([NA, 2])

In fact, what has happened is the mask has changed, but the underlying value has not:

In [18]: orig = np.array([99, 2])
In [19]: a = orig.view(maskna=True)
In [20]: a[0] = np.NA
In [21]: a
Out[21]: array([NA, 2])
In [22]: orig
Out[22]: array([99, 2])

This is different from real assignment:

In [23]: a[0] = 0
In [24]: a
Out[24]: array([0, 2], maskna=True)
In [25]: orig
Out[25]: array([0, 2])

Some effort has gone into making it difficult to pull off the mask:

In [30]: a.view(np.int64)
Out[30]: array([NA, 2])
In [31]: a.view(np.int64).flags
Out[31]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  MASKNA : True
  OWNMASKNA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [32]: a.astype(np.int64)
ValueError                                Traceback (most recent call last)
<ipython-input-32-e7f3381c9692> in <module>()
----> 1 a.astype(np.int64)
ValueError: Cannot assign NA to an array which does not support NAs

The default behavior of the masked values is PROPAGATE, but they can be individually made to SKIP:

In [28]: a.sum() # PROPAGATE
Out[28]: NA(dtype='int64')
In [29]: a.sum(skipna=True) # SKIP
Out[29]: 2

Where's the beef?
- I personally still think that it is confusing to fuse the concept of: 1) Masked arrays 2) Arrays with bitpattern codes for missing and the concepts of A) ABSENT and B) IGNORED Consequences for current code Specifically, it still seems to me to make sense to prefer this:

a = np.array([99, 2], masking=True)
a.mask
[ True, True ]
a.sum()
101
a.mask[0] = False
a.sum()
2

It might make sense, as Chuck suggests, to change the default to 'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED and 'skipna' to 'skipignored' for clarity. I still think the pseudo-assignment:

In [20]: a[0] = np.NA

is confusing, and should be removed. Later, should we ever have bitpatterns, there would be something like np.ABSENT. This of course would make sense for assignment:

In [20]: a[0] = np.ABSENT

There would be
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Wed, Oct 26, 2011 at 1:07 AM, Nathaniel Smith n...@pobox.com wrote: On Tue, Oct 25, 2011 at 4:49 PM, Matthew Brett matthew.br...@gmail.com wrote: I guess from your answer that such a warning would be complicated to implement, and if that's the case, I can imagine it would be low priority. I assume the problem is more that it would be a weirdo check that becomes a maintenance burden (what is this doing here? Do we still need it? who knows?) than that it would be hard to do. You can easily do it yourself as a workaround... if not str(np.longdouble(2)**64 - 1).startswith("1844"): warn("Printing of longdoubles is fubared! Beware! Beware!") Thanks - yes - I was only thinking of someone like me getting confused and thinking badly of us if they run into this. See you, Matthew
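A slightly more defensive version of that workaround (the digit extraction is my addition, so the check works whether longdouble prints in fixed-point or exponent notation):

```python
import numpy as np
from warnings import warn

# 2**64 - 1 should print with digits beginning 18446744...; the broken
# PPC build printed 36893488... (i.e. 2**65 - 1) instead.
s = str(np.longdouble(2)**64 - 1)
digits = "".join(ch for ch in s if ch.isdigit())
if not digits.startswith("1844"):
    warn("Printing of longdoubles is fubared! Beware! Beware!")
```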
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant oliph...@enthought.com wrote: So, I am very interested in making sure I remember the details of the counterproposal. What I recall is that you wanted to be able to differentiate between a bit-pattern mask and a boolean-array mask in the API. I believe currently even when bit-pattern masks are implemented the difference will be hidden from the user on the Python level. I am sure to be missing other parts of the discussion as I have been in and out of it. Nathaniel - are you online today? Do you have time to review the current implementation and see if it affects the initial discussion? I'm running around most of today but I should have time to do some thinking later this afternoon CA time. See you, Matthew
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 7:31 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Oct 24, 2011 at 10:59 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I just ran into this on a PPC machine:

In [1]: import numpy as np
In [2]: np.__version__
Out[2]: '2.0.0.dev-4daf949'
In [3]: res = np.longdouble(2)**64
In [4]: res
Out[4]: 18446744073709551616.0
In [5]: 2**64
Out[5]: 18446744073709551616L
In [6]: res-1
Out[6]: 36893488147419103231.0

Same for numpy 1.4.1. I don't have a SPARC to test on but I believe it's the same double-double type? The PPC uses two doubles to represent long doubles, the SPARC uses software emulation of IEEE quad precision for long doubles, very different. Yes, thanks - I read more after my post. I guess from this: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/128bit_long_double_floating-point_datatype.htm that AIX does use double-double. The subtraction of 1 working like multiplication by two is strange, perhaps the one is getting subtracted from the exponent somehow? It would be interesting to see if the same problem happens in pure C. As a workaround, can I ask what you are trying to do with the long doubles? I was trying to use them as an intermediate format for high-precision floating point calculations, before converting to integers. Best, Matthew
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen p...@iki.fi wrote: 25.10.2011 06:59, Matthew Brett wrote: res = np.longdouble(2)**64 res-1 36893488147419103231.0 Can you check if long double works properly (not a given) in C on that platform:

long double x;
x = powl(2, 64);
x -= 1;
printf("%g %Lg\n", (double)x, x);

or, in case the platform doesn't have powl:

long double x;
x = pow(2, 64);
x -= 1;
printf("%g %Lg\n", (double)x, x);

Both the same as numpy:

[mb312@jerry ~]$ gcc test.c
test.c: In function 'main':
test.c:5: warning: incompatible implicit declaration of built-in function 'powl'
[mb312@jerry ~]$ ./a.out
1.84467e+19 3.68935e+19

Thanks, Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 8:04 AM, Lluís xscr...@gmx.net wrote: Matthew Brett writes: I'm afraid I find this whole thread very unpleasant. I have the odd impression of being back at high school. Some of the big kids are pushing me around and then the other kids join in. It didn't have to be this way. Someone could have replied like this to Nathaniel: Oh - yes - I'm sorry - we actually had the discussion on the pull request. Looking back, I see that we didn't flag this up on the mailing list and maybe we should have. Thanks for pointing that out. Maybe we could start another discussion of the API in view of the changes that have gone in. But that didn't happen. Well, I really thought that all the interested parties would take a look at [1]. While it's true that the pull requests are not obvious if you're not using the functionalities of the github web (or unless announced in this list), I think that Mark's announcement was precisely directed at having a new round of discussions after having some code to play around with and see how intuitive or counter-intuitive the implemented concepts could be. I just wanted to be clear what I meant. The key point is not whether or not the pull-request or request for testing was in fact the right place for the discussion that Travis suggested. I guess you can argue that either way. I'd say no, but I can see how you would disagree on that. The key point is - how much do we value constructive disagreement? If we do value constructive disagreement then we'll go out of our way to talk through the points of contention, and make sure that the people who disagree, especially the minority, feel that they have been fully heard. If we don't value constructive disagreement then we'll let the other side know that further disagreement will be taken as a sign of bad faith. Now - what do you see here? I see the second and that worries me. 
Best, Matthew
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 10:52 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Oct 25, 2011 at 11:45 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen p...@iki.fi wrote: 25.10.2011 06:59, Matthew Brett wrote: res = np.longdouble(2)**64 res-1 36893488147419103231.0 Can you check if long double works properly (not a given) in C on that platform: long double x; x = powl(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); or, in case the platform doesn't have powl: long double x; x = pow(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); Both the same as numpy: [mb312@jerry ~]$ gcc test.c test.c: In function 'main': test.c:5: warning: incompatible implicit declaration of built-in function 'powl' I think implicit here means that the arguments and the return values are treated as integers. Did you #include math.h? Ah - you've detected my severe ignorance of C. But with math.h, the result is the same:

#include <stdio.h>
#include <math.h>

int main(int argc, char* argv[])
{
    long double x;
    x = pow(2, 64);
    x -= 1;
    printf("%g %Lg\n", (double)x, x);
    return 0;
}

See you, Matthew
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
On Tue, Oct 25, 2011 at 11:05 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Oct 25, 2011 at 10:52 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Oct 25, 2011 at 11:45 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen p...@iki.fi wrote: 25.10.2011 06:59, Matthew Brett wrote: res = np.longdouble(2)**64 res-1 36893488147419103231.0 Can you check if long double works properly (not a given) in C on that platform: long double x; x = powl(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); or, in case the platform doesn't have powl: long double x; x = pow(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); Both the same as numpy: [mb312@jerry ~]$ gcc test.c test.c: In function 'main': test.c:5: warning: incompatible implicit declaration of built-in function 'powl' I think implicit here means that the arguments and the return values are treated as integers. Did you #include math.h? Ah - you've detected my severe ignorance of C. But with math.h, the result is the same: #include <stdio.h> #include <math.h> int main(int argc, char* argv[]) { long double x; x = pow(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); return 0; } By the way - if you want a login to this machine, let me know - it's always on and we're using it as a buildslave already. Matthew
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 11:14 AM, Pauli Virtanen p...@iki.fi wrote: 25.10.2011 19:45, Matthew Brett wrote: [clip] or, in case the platform doesn't have powl: long double x; x = pow(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); Both the same as numpy: [mb312@jerry ~]$ gcc test.c test.c: In function 'main': test.c:5: warning: incompatible implicit declaration of built-in function 'powl' [mb312@jerry ~]$ ./a.out 1.84467e+19 3.68935e+19 This result may indicate that it's the *printing* of long doubles that's broken. Note how the value cast as double prints the correct result, whereas the %Lg format code gives something wrong. Ah - sorry - I see now what you were trying to do. Can you try to check this by doing something like: - do some set of calculations using np.longdouble in Numpy (that requires the extra accuracy) - at the end, cast the result back to double

In [1]: import numpy as np
In [2]: res = np.longdouble(2)**64
In [6]: res / 2**32
Out[6]: 4294967296.0
In [7]: (res-1) / 2**32
Out[7]: 8589934591.98
In [8]: np.float((res-1) / 2**32)
Out[8]: 4294967296.0
In [9]: np.float((res) / 2**32)
Out[9]: 4294967296.0

Thanks, Matthew
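The cast-back-to-double check works because of rounding at the double boundary, which can be seen in pure Python (my illustration, no numpy needed):

```python
# 2**64 - 1 needs 64 bits of mantissa; a double has 53, so it rounds
# up to exactly 2**64 and the two quotients become indistinguishable
# after the cast.
big = 2.0**64
assert float(2**64 - 1) == big
assert big / 2**32 == 4294967296.0
```

So if the longdouble arithmetic itself is correct, both casts are expected to give 4294967296.0, as observed, even though the full-precision quotients differ.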
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 11:24 AM, Benjamin Root ben.r...@ou.edu wrote: On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Oct 25, 2011 at 8:04 AM, Lluís xscr...@gmx.net wrote: Matthew Brett writes: I'm afraid I find this whole thread very unpleasant. I have the odd impression of being back at high school. Some of the big kids are pushing me around and then the other kids join in. It didn't have to be this way. Someone could have replied like this to Nathaniel: Oh - yes - I'm sorry - we actually had the discussion on the pull request. Looking back, I see that we didn't flag this up on the mailing list and maybe we should have. Thanks for pointing that out. Maybe we could start another discussion of the API in view of the changes that have gone in. But that didn't happen. Well, I really thought that all the interested parties would take a look at [1]. While it's true that the pull requests are not obvious if you're not using the functionalities of the github web (or unless announced in this list), I think that Mark's announcement was precisely directed at having a new round of discussions after having some code to play around with and see how intuitive or counter-intuitive the implemented concepts could be. I just wanted to be clear what I meant. The key point is not whether or not the pull-request or request for testing was in fact the right place for the discussion that Travis suggested. I guess you can argue that either way. I'd say no, but I can see how you would disagree on that. This is getting very meta... a disagreement about the disagreement. Yes, the important point is a social one. The other points are details. The key point is - how much do we value constructive disagreement? Personally, I value it very much. Well - I think everyone believes that that they value constructive discussion, but the question is, what happens when people really disagree? 
My impression of the discussion we all had at the beginning was that the needs of the two distinct communities (R-users and masked array users) were both heard and largely addressed. Aspects of both approaches were used, and the final result is, IMHO, inspired and elegant. Is it perfect? No. Are there ways to improve it? Absolutely, and I fully expect that to happen. To be clear once more, I personally feel we don't need to discuss:
1) Whether Mark did a good job on the code (I have a high bias to imagine so).
2) Whether something along these lines would be good to have in numpy.
If we do value constructive disagreement then we'll go out of our way to talk through the points of contention, and make sure that the people who disagree, especially the minority, feel that they have been fully heard. If we don't value constructive disagreement then we'll let the other side know that further disagreement will be taken as a sign of bad faith. Now - what do you see here? I see the second and that worries me. It is disappointing that you choose not to participate in the thread linked above or in the pull request itself. If I remember correctly, you were working on finishing up your dissertation, so I fully understand the time constraints involved there. However, the pull request and the email notification are the de facto method of staging and discussing changes in any development project. No objections were raised in that pull request, so it went in after some time passed. To hold off the merge, all one would need to do is fire off a quick comment requesting a delay to have a chance to review the pull request. I think the pull-request was not the right vehicle for the discussion, you think it was, that's fine, I don't think we need to rehearse that. My question (if you are answering my question) is: if you put yourself in my or Nathaniel's shoes, would you feel that you had been warmly encouraged to express disagreement, or would you feel something else? 
Luckily, git is a VCS, so we are fully capable of reverting any necessary changes if warranted. If you have any concerns or suggestions for changes in the current implementation, feel free to raise them and open additional pull requests. There is no ganging up here or any other subterfuge. Tell us exactly what your issues are with the current setup, provide example code demonstrating the issues, and we can certainly discuss ways to improve this. Has the situation changed since the counter-NEP that Nathaniel and I wrote up? Remember, we *all* have a common agreement here. NumPy needs better support for missing data (in whatever form). Let's work from that assumption and make NumPy a better library to use for everybody! I remember walking past a church in a small town in the California desert. It had a sign outside saying 'People who are busy rowing do not have time to rock the boat'. This seemed to me a total failure
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 12:01 PM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote: On 25 Oct 2011, at 20:05, Matthew Brett wrote: Both the same as numpy:

    [mb312@jerry ~]$ gcc test.c
    test.c: In function 'main':
    test.c:5: warning: incompatible implicit declaration of built-in function 'powl'

I think implicit here means that the arguments and the return values are treated as integers. Did you #include <math.h>? Ah - you've detected my severe ignorance of C. But with math.h, the result is the same:

    #include <stdio.h>
    #include <math.h>

    int main(int argc, char* argv[]) {
        long double x;
        x = pow(2, 64);
        x -= 1;
        printf("%g %Lg\n", (double)x, x);
    }

What system/compiler is this? I am getting

    ./ldouble
    1.84467e+19 1.84467e+19

and

    res = np.longdouble(2)**64
    res
    18446744073709551616.0
    2**64
    18446744073709551616L
    res-1
    18446744073709551615.0
    np.__version__
    '1.6.1'

as well as with np.__version__ '2.0.0.dev-3d06f02' [yes, not very up to date] and for all gcc versions,

    /usr/bin/gcc -v
    Using built-in specs.
    Target: powerpc-apple-darwin9
    Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --program-prefix= --host=powerpc-apple-darwin9 --target=powerpc-apple-darwin9
    Thread model: posix
    gcc version 4.0.1 (Apple Inc. build 5493)

to

    /sw/bin/gcc-fsf-4.6 -v
    Using built-in specs.
    COLLECT_GCC=/sw/bin/gcc-fsf-4.6
    COLLECT_LTO_WRAPPER=/sw/lib/gcc4.6/libexec/gcc/powerpc-apple-darwin9.8.0/4.6.1/lto-wrapper
    Target: powerpc-apple-darwin9.8.0
    Configured with: ../gcc-4.6.1/configure --prefix=/sw --prefix=/sw/lib/gcc4.6 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.6/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.6 --enable-cloog-backend=isl --disable-libjava-multilib --disable-libquadmath
    Thread model: posix
    gcc version 4.6.1 (GCC)

    uname -a
    Darwin osiris.astro.physik.uni-goettingen.de 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh

    [mb312@jerry ~]$ gcc -v
    Using built-in specs.
    Target: powerpc-apple-darwin8
    Configured with: /var/tmp/gcc/gcc-5370~2/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --host=powerpc-apple-darwin8 --target=powerpc-apple-darwin8
    Thread model: posix
    gcc version 4.0.1 (Apple Computer, Inc. build 5370)
    [mb312@jerry ~]$ uname -a
    Darwin jerry.bic.berkeley.edu 8.11.0 Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power Macintosh powerpc

Best, Matthew
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 12:14 PM, Pauli Virtanen p...@iki.fi wrote: 25.10.2011 20:29, Matthew Brett kirjoitti: [clip]

    In [7]: (res-1) / 2**32
    Out[7]: 8589934591.98
    In [8]: np.float((res-1) / 2**32)
    Out[8]: 4294967296.0

Looks like a bug in the C library installed on the machine, then. It's either in wontfix territory for us, or we work around it by casting to doubles before formatting. In the latter case, one would have to maintain a list of broken C libraries (ugh). How about a check at import time and a warning when printing? Is that hard to do? See you, Matthew
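[Editor's note: the import-time check Matthew proposes could look something like the sketch below - hypothetical code, nothing that exists in numpy: format a longdouble value whose decimal digits are known exactly from integer arithmetic, and warn if the C library's formatting disagrees. `longdouble` is a stand-in parameter; with the plain-`float` default the check passes, while on the broken PPC setup passing `np.longdouble` would be expected to trigger the warning.]

```python
import warnings

def warn_if_longdouble_printing_broken(longdouble=float):
    """Hypothetical import-time check: compare the formatted output of
    a longdouble value against decimal digits known exactly from
    Python's arbitrary-precision integers."""
    x = longdouble(2) ** 64      # exactly representable in binary FP
    printed = "%.0f" % x         # goes through the C formatting path
    exact = str(2 ** 64)         # exact decimal digits, no FP involved
    if printed != exact:
        warnings.warn("long double printing appears broken on this "
                      "platform; printed values may be wrong",
                      RuntimeWarning)
        return False
    return True
```

The check itself costs almost nothing at import time; the harder part, as David notes further down the thread, is deciding how far to go in special-casing each broken platform.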
Re: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, On Tue, Oct 25, 2011 at 2:58 PM, David Cournapeau courn...@gmail.com wrote: On Tue, Oct 25, 2011 at 8:22 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Tue, Oct 25, 2011 at 12:14 PM, Pauli Virtanen p...@iki.fi wrote: 25.10.2011 20:29, Matthew Brett kirjoitti: [clip]

    In [7]: (res-1) / 2**32
    Out[7]: 8589934591.98
    In [8]: np.float((res-1) / 2**32)
    Out[8]: 4294967296.0

Looks like a bug in the C library installed on the machine, then. It's either in wontfix territory for us, or we work around it by casting to doubles before formatting. In the latter case, one would have to maintain a list of broken C libraries (ugh). How about a check at import time and a warning when printing? Is that hard to do? That's fragile IMO. I think that Chuck summed it up well: long doubles are not portable, don't use them unless you have to or you can rely on platform specificities. That reminds me of the old joke about the Irishman giving directions - If I were you, I wouldn't start from here. I would rather spend some time on implementing/integrating portable quad precision in software. I guess from your answer that such a warning would be complicated to implement, and if that's the case, I can imagine it would be low priority. See you, Matthew
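[Editor's note: portable quad (or double-double) precision in software, as David suggests, is typically built from error-free transformations on pairs of doubles. Below is a minimal sketch of the basic building block, Knuth's TwoSum - not a proposal for a numpy API, just the core idea in plain Python floats.]

```python
def two_sum(a, b):
    """Error-free addition of two doubles: returns (s, err) where
    s is the rounded sum fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a                       # the part of b that made it into s
    err = (a - (s - bb)) + (b - bb)  # what the rounding threw away
    return s, err

# 2**64 - 1 does not fit in one double, but the (head, tail) pair
# returned by two_sum represents it exactly:
head, tail = two_sum(2.0 ** 64, -1.0)
```

Chaining TwoSum (and its multiplication analogue) over pairs of doubles gives roughly twice double precision in software, which is essentially the same trick the PPC double-double long double plays in hardware-assisted form.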
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, Thank you for your gracious email. On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant oliph...@enthought.com wrote: It is a shame that Nathaniel and perhaps Matthew do not feel like their voice was heard. I wish I could have participated more fully in some of the discussions. I don't know if I could have really helped, but I would have liked to have tried to perhaps work alongside Mark to integrate some of the other ideas that had been expressed during the discussion. Unfortunately, I was traveling in NYC most of the time that Mark was working on this project and did not get a chance to interact with him as much as I would have liked. My view is that we didn't get quite to where I thought we would get, nor where I think we could be. I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. Merging Mark's code does not mean there is not more work to be done, but it is consistent with the reality that currently development on NumPy happens when people have the time to do it. I have not seen anything to convince me that there is not still time to make specific API changes that address some of the concerns. Perhaps Nathaniel and/or Matthew could summarize their concerns again and if desired submit a pull request to revert the changes. However, there is a definite bias against removing working code unless the arguments are very strong and receive a lot of support from others. 
Honestly - I am not sure whether there is any interest now, in the arguments we made before. If there is, who is interested? I mean, past politeness. I wasn't trying to restart that discussion, because I didn't know what good it could do. At first I was hoping that we could ask whether there was a better way of dealing with disagreements like this. Later it seemed to me that the atmosphere was getting bad, and I wanted to say that because I thought it was important. Thank you for continuing to voice your opinions even when it may feel that the tide is against you. My view is that we only learn from people who disagree with us. Thank you for saying that. I hope that y'all will tell me if I am making it harder for you to disagree, and I am sorry if I did so here. Best, Matthew
[Numpy-discussion] float128 / longdouble on PPC - is it broken?
Hi, I just ran into this on a PPC machine:

    In [1]: import numpy as np
    In [2]: np.__version__
    Out[2]: '2.0.0.dev-4daf949'
    In [3]: res = np.longdouble(2)**64
    In [4]: res
    Out[4]: 18446744073709551616.0
    In [5]: 2**64
    Out[5]: 18446744073709551616L
    In [6]: res-1
    Out[6]: 36893488147419103231.0

Same for numpy 1.4.1. I don't have a SPARC to test on but I believe it's the same double-double type? See you, Matthew
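[Editor's note: the PPC long double in the report above is IBM "double-double": the value is stored as a pair of ordinary doubles (head + tail) whose mathematical sum is the value, which is why it can hold 2**64 - 1 even though a single double cannot. A pure-Python illustration of the representation, using plain floats rather than actual numpy longdoubles:]

```python
# A single IEEE double cannot represent 2**64 - 1; it rounds to 2**64.
single = float(2 ** 64 - 1)
assert single == 2.0 ** 64

# Double-double keeps the value as two doubles: head carries the
# leading bits, tail the correction, and head + tail is exact in
# mathematical (not floating-point) terms.
head, tail = 2.0 ** 64, -1.0
assert int(head) + int(tail) == 2 ** 64 - 1   # the pair is exact

# Re-adding the components in double precision loses the tail again,
# which is what happens when a PPC long double is cast to double.
assert head + tail == head
```

This also hints at why a formatting routine that mishandles the tail component can print a value wildly different from the one the arithmetic actually computed.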