Re: [Numpy-discussion] Strange PyArray_FromObject() behavior

2012-02-16 Thread Val Kalatsky
Hi Bill,

Looks like you are running a very fresh version of numpy.
Without knowing the build version and what's going on in the extension
module I can't tell you much.
The usual suspects would be:
1) Numpy bug, not too likely.
2) Incorrect use of PyArray_FromObject, you'll need to send more info.
3) Something is seriously corrupted, probably not the case, because
a segfault would follow quickly.

Please provide more info.
Val

PS Is it something related to what we'll be working on (Trilinos)?


On Thu, Feb 16, 2012 at 11:09 AM, Spotz, William F wrote:

>  I have a user who is reporting tests that are failing on his platform.
>  I have not been able to reproduce the error on my system, but working with
> him, we have isolated the problem to unexpected results when
> PyArray_FromObject() is called.  Here is the chain of events:
>
>  In python, an integer is calculated.  Specifically, it is
>
>  len(result.errors) + len(result.failures)
>
>  where result is a unit test result object from the unittest module.  I
> had him verify that this value was in fact a python integer.  In my
> extension module, this PyObject gets passed to the PyArray_FromObject()
> function in a routine that comes from numpy.i.  What I expect, and what I
> typically get, is a numpy scalar array of type C long.  I had my user print
> the result using PyObject_Print() and what he got was
>
>  array([0:00:00], dtype=timedelta64[us])
>
>  I am stuck as to why this might be happening.  Any ideas?
>
>  Thanks
>
>  ** Bill Spotz  **
> ** Sandia National Laboratories  Voice: (505)845-0170  **
> ** P.O. Box 5800 Fax:   (505)284-0154  **
> ** Albuquerque, NM 87185-0370  Email: wfsp...@sandia.gov **
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Matthew Brett
Hi John,

On Thu, Feb 16, 2012 at 8:20 PM, John Hunter  wrote:
>
>
> On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac  wrote:
>>
>> On 2/16/2012 7:22 PM, Matthew Brett wrote:
>> > This has not been an encouraging episode in striving for consensus.
>>
>> I disagree.
>> Failure to reach consensus does not imply lack of striving.
>>
>
> Hey Alan, thanks for your thoughtful and nuanced views.  I agree  with
> everything you've said, but have a few additional points.

I thought I'd looked deep in my heart and failed to find paranoia
about corporate involvement in numpy.

I am happy that Travis formed Continuum and look forward to the
progress we can expect for numpy.

I don't think the conversation was much about 'democracy'.  As far as
I was concerned, anything in the range from "no change but at least
being specific" to "full veto power from mailing list members", and
anything in between, was up for discussion.

I wish we had not had to deal with the various red herrings here, such
as whether Continuum is good or bad, whether Travis has been given
adequate credit, or whether companies are bad for software.   But, we
did.  It's fine.  Argument over now.

Best,

Matthew


Re: [Numpy-discussion] [EXTERNAL] Re: Strange PyArray_FromObject() behavior

2012-02-16 Thread Bill Spotz
Val,

The problem occurs in function

  PyArrayObject* obj_to_array_allow_conversion(PyObject* input,
                                               int typecode,
                                               int* is_new_object)

in numpy.i (which is the numpy SWIG interface file that I authored and is in 
the numpy distribution).  The argument "input" comes in as a python int of 
value 0, "typecode" is NPY_NOTYPE to signify that the type should be detected, 
and "is_new_object" is an output flag.  This function calls

  PyArray_FromObject(input, typecode, 0, 0)

This is, in fact, a part of the PyTrilinos package, specifically the Teuchos 
module (Teuchos is our general tools package).  The context here is the Teuchos 
Comm classes' reduce() method, in this case a summation over processors.  We 
will be working with Tpetra classes that are built on top of a Teuchos Comm 
class.
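
A minimal Python-side sketch of that expectation, assuming np.asarray()
infers the type of a plain Python int the same way PyArray_FromObject()
does with NPY_NOTYPE (an assumption, not something verified against the C
code path).  The helper name converted_kind and the stand-in value are
hypothetical:

```python
import numpy as np


def converted_kind(value):
    """Return the dtype kind numpy infers for `value`.

    'i' means signed integer, 'm' means timedelta.
    """
    return np.asarray(value).dtype.kind


# Stand-in for len(result.errors) + len(result.failures); a plain Python int.
count = len([]) + len([])

# Expected on a healthy build: an integer kind.
kind = converted_kind(count)

# For comparison, the kind that the failing platform reported.
timedelta_kind = np.asarray(np.timedelta64(0, "us")).dtype.kind

print("python int converts to kind:", kind)
print("timedelta64 has kind:", timedelta_kind)
```

If the first line printed on the failing platform shows 'm' rather than
'i', the problem would be reproducible without the extension module.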

Thanks,
Bill





Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Benjamin Root
On Thursday, February 16, 2012, John Hunter wrote:

>
>
> On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac wrote:
>
>> On 2/16/2012 7:22 PM, Matthew Brett wrote:
>> > This has not been an encouraging episode in striving for consensus.
>>
>> I disagree.
>> Failure to reach consensus does not imply lack of striving.
>>
>>
> Hey Alan, thanks for your thoughtful and nuanced views.  I agree  with
> everything you've said, but have a few additional points.
>
> At the risk of wading into a thread that has grown far too long, and
> echoing Eric's comments that the idea of governance is murky at best
> when there is no provision for enforceability, I have a few comments.
> Full disclosure: Travis has asked me and I have agreed to serve on
> a board for "numfocus", the not-for-profit arm of his efforts to
> promote numpy and related tools.  Although I have no special numpy
> developer chops, as the original author of matplotlib, which is one of
> the leading "numpy clients", he asked me to join his organization as a
> "community representative".  I support his efforts, and so agreed to
> join the numfocus board.
>
> My first and most important point is that the subtext of many postings here
> about the fear of undue and inappropriate influence of Continuum under
> Travis' leadership is far overblown.  Travis created numpy -- it is
> his baby.  Undeniably, he created it by standing on the shoulders of
> giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and
> many others.  But the idea that we need to guard against the
> possibility that his corporate interests will compromise his interests
> in "what is best for numpy" is academic at best.
>
> As someone who has created a significant project in the realm of
> "scientific computing in Python", I can tell you that it is something
> I take quite a bit of pride in and it is very important to me that the
> project thrives as it was intended to: as a free, open-source,
> best-practice way of doing science.  I know Travis well enough to know
> he feels the same way -- numpy doing well is *at least* as important to
> him as his company doing well.  All of his recent actions to start a
> company and foundation which focuses resources on numpy and related
> tools reinforce that view.  If he had a different temperament, he
> wouldn't have devoted five to ten years of his life to Numeric, scipy
> and numpy.  He is a BDFL for a reason: he has earned our trust.
>
> And he has proven his ability to lead when *almost everyone* was
> against him.  At the height of the Numeric/numarray split, and I was
> deeply involved in this as the mpl author because we had a "numerix"
> compatibility layer to allow users to use one or the other, Travis
> proposed writing numpy to solve both camps' problems.  I really can't
> remember a single individual who supported him.  What I remember is
> the cacophony of voices who thought this was a bad idea, because of the
> "third fork" problem.  But Travis forged ahead, on his own, wrote
> numpy, and re-united the Numeric and numarray camps.  And
> all-the-while he maintained his friendship with the numarray
> developers (Perry Greenfield who led the numarray development effort
> has also been invited by Travis to the numfocus board, as have Fernando
> Perez and Jarrod Millman).  Although MPL at the time agreed to support
> a third version in its numerix compatibility layer for numpy, I can
> thankfully say we have since dropped support for the compatibility
> layer entirely as we all use numpy now.  This to me is the distilled
> essence of leadership, against the voices of the masses, and it bears
> remembering.
>
> I have two more points I want to make: one is on democracy, and one is
> on corporate control.  On corporate control: there have been a number
> of posts in this thread about the worries and dangers that Continuum
> poses as the corporate sponsor of numpy development, about how this
> may cause numpy to shift from a model of a loosely connected,
> decentralized cadre of volunteers to a centrally controlled steering
> committee of programmers who are controlled by corporate headquarters
> and who make all their decisions around the water cooler unobserved by
> the community of users.
>
> I want to make a connection to something that happened in the history
> of matplotlib development, something that is not strictly analogous
> but I think close enough to be informative.  Sometime around 2005,
> Perry Greenfield, who heads the development team of the Space
> Telescope Science Institute (STScI) that is charged with processing
> the Hubble image pipeline, emailed me that he was considering using
> matplotlib as their primary image visualization tool.  I can't tell
> you how excited I was at the time.  The idea of having institutional
> sponsorship from someone as prestigious and resourceful as STScI was
> hugely motivating.  I worked feverishly for months to add stuff they
> needed: better rendering, better image support, matht

Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Travis Oliphant
The OS X slaves (especially PPC) are very valuable for testing.  We have an 
intern who could help keep the build-bots going if you would give her access to 
those machines. 

Thanks for being willing to offer them. 

-Travis


On Feb 16, 2012, at 6:36 PM, Matthew Brett wrote:

> Hi,
> 
> On Thu, Feb 16, 2012 at 4:12 PM, Nathaniel Smith  wrote:
>> On Thu, Feb 16, 2012 at 11:52 PM, Chris Ball  wrote:
>>> Buildbot is used by some big projects (e.g. Python, Chromium, and
>>> Mozilla), but I'm aware that several projects in the scientific/numeric
>>> Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
>>> often using a hosted Jenkins solution such as Shining Panda. A difficult
>>> part of running a Buildbot service is finding hardware for the slaves
>>> and keeping them alive, so a hosted solution sounds wonderful (assuming
>>> hosted solutions offer an adequate range of operating systems etc).
>> 
>> A quick look at Shining Panda suggests that you get no coverage for
>> anything but Linux, which is a good start but rather limiting. IME by
>> far the most annoying part of a useful buildbot setup is keeping all
>> the build slaves up and working. It's one thing to set up a build
>> environment in one OS, it's quite another to keep like 5 of them
>> working, each on a different volunteered machine where you don't have
>> root and the person who does isn't answering email... the total effort
>> isn't large, but it's really poorly suited to the nature of volunteer
>> labor, because it needs prompt attention at random intervals. (Also,
>> this doesn't become obvious until after one's already gotten
>> everything set up, so then you're stuck limping along because who
>> wants to start over and build something more maintainable...)
>> 
>> If anyone has existing sysadmin resources then keeping build-slaves
>> running is a place where they'd be a huge contribution.
> 
> Yup - keeping the slaves running is the big problem.
> 
> We do have various slaves running here that at least are all
> accessible by me, and Jarrod, and (at a pinch) Fernando, Stefan and
> others.
> 
> These are:
> 
> XP (when I'm not using the machine, which is the large majority of the time)
> OSX 10.5
> OSX 10.4 PPC
> Linux 32 bit
> Linux 64 bit
> 
> These are all real machines not virtual machines.  I'm happy to give
> some reliable person ssh access to the buildslave user on these
> machines.  They won't necessarily be available for all time, they are
> dotted around campus doing various jobs like being gateways, project
> machines, occasional desktops.
> 
> See you,
> 
> Matthew


Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Matthew Brett
Hi,

On Thu, Feb 16, 2012 at 10:11 PM, Travis Oliphant  wrote:
> The OS X slaves (especially PPC) are very valuable for testing.  We have an 
> intern who could help keep the build-bots going if you would give her access 
> to those machines.
>
> Thanks for being willing to offer them.

No problem.  The OSX machines should be reliably available.  Please do
put your intern in touch, I'll give her access.

Best,

Matthew


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread josef . pktd
On Fri, Feb 17, 2012 at 12:54 AM, Benjamin Root  wrote:
>
>
> On Thursday, February 16, 2012, John Hunter wrote:
>>
>>
>>
>> On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac 
>> wrote:
>>>
>>> On 2/16/2012 7:22 PM, Matthew Brett wrote:
>>> > This has not been an encouraging episode in striving for consensus.
>>>
>>> I disagree.
>>> Failure to reach consensus does not imply lack of striving.
>>>
>>
>> Hey Alan, thanks for your thoughtful and nuanced views.  I agree  with
>> everything you've said, but have a few additional points.
>>
>> At the risk of wading into a thread that has grown far too long, and
>> echoing Eric's comments that the idea of governance is murky at best
>> when there is no provision for enforceability, I have a few comments.
>> Full disclosure: Travis has asked me and I have agreed to serve on
>> a board for "numfocus", the not-for-profit arm of his efforts to
>> promote numpy and related tools.  Although I have no special numpy
>> developer chops, as the original author of matplotlib, which is one of
>> the leading "numpy clients", he asked me to join his organization as a
>> "community representative".  I support his efforts, and so agreed to
>> join the numfocus board.
>>
>> My first and most important point is that the subtext of many postings here
>> about the fear of undue and inappropriate influence of Continuum under
>> Travis' leadership is far overblown.  Travis created numpy -- it is
>> his baby.  Undeniably, he created it by standing on the shoulders of
>> giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and
>> many others.  But the idea that we need to guard against the
>> possibility that his corporate interests will compromise his interests
>> in "what is best for numpy" is academic at best.
>>
>> As someone who has created a significant project in the realm of
>> "scientific computing in Python", I can tell you that it is something
>> I take quite a bit of pride in and it is very important to me that the
>> project thrives as it was intended to: as a free, open-source,
>> best-practice way of doing science.  I know Travis well enough to know
>> he feels the same way -- numpy doing well is *at least* as important to
>> him as his company doing well.  All of his recent actions to start a
>> company and foundation which focuses resources on numpy and related
>> tools reinforce that view.  If he had a different temperament, he
>> wouldn't have devoted five to ten years of his life to Numeric, scipy
>> and numpy.  He is a BDFL for a reason: he has earned our trust.
>>
>> And he has proven his ability to lead when *almost everyone* was
>> against him.  At the height of the Numeric/numarray split, and I was
>> deeply involved in this as the mpl author because we had a "numerix"
>> compatibility layer to allow users to use one or the other, Travis
>> proposed writing numpy to solve both camps' problems.  I really can't
>> remember a single individual who supported him.  What I remember is
>> the cacophony of voices who thought this was a bad idea, because of the
>> "third fork" problem.  But Travis forged ahead, on his own, wrote
>> numpy, and re-united the Numeric and numarray camps.  And
>> all-the-while he maintained his friendship with the numarray
>> developers (Perry Greenfield who led the numarray development effort
>> has also been invited by Travis to the numfocus board, as have Fernando
>> Perez and Jarrod Millman).  Although MPL at the time agreed to support
>> a third version in its numerix compatibility layer for numpy, I can
>> thankfully say we have since dropped support for the compatibility
>> layer entirely as we all use numpy now.  This to me is the distilled
>> essence of leadership, against the voices of the masses, and it bears
>> remembering.
>>
>> I have two more points I want to make: one is on democracy, and one is
>> on corporate control.  On corporate control: there have been a number
>> of posts in this thread about the worries and dangers that Continuum
>> poses as the corporate sponsor of numpy development, about how this
>> may cause numpy to shift from a model of a loosely connected,
>> decentralized cadre of volunteers to a centrally controlled steering
>> committee of programmers who are controlled by corporate headquarters
>> and who make all their decisions around the water cooler unobserved by
>> the community of users.
>>
>> I want to make a connection to something that happened in the history
>> of matplotlib development, something that is not strictly analogous
>> but I think close enough to be informative.  Sometime around 2005,
>> Perry Greenfield, who heads the development team of the Space
>> Telescope Science Institute (STScI) that is charged with processing
>> the Hubble image pipeline, emailed me that he was considering using
>> matplotlib as their primary image visualization tool.  I can't tell
>> you how excited I was at the time.  The idea of having institutional
>> sponsorship from someone as prestigious and reso

Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Scott Sinclair
On 16 February 2012 17:31, Bruce Southey  wrote:
> On 02/16/2012 08:06 AM, Scott Sinclair wrote:
>> This is not intended to downplay the concerns raised in this thread,
>> but I can't help myself.
>>
>> I propose the following (tongue-in-cheek) patch against the current
>> numpy master branch.
>>
>> https://github.com/scottza/numpy/compare/constitution
>>
>> If this gets enough interest, I'll consider submitting a "real" pull request 
>> ;-)

> Now that is totally disrespectful and just plain ignorant! Not to
> mention the inability to count people correctly.

I'm sorry that you feel that way and apologize if I've offended you. I
didn't expect to and assure you that was not my intention.

That said, I do hope that we can continue to make allowance for (very
occasional) levity in the community.

Cheers,
Scott


Re: [Numpy-discussion] Strange PyArray_FromObject() behavior

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 10:09 AM, Spotz, William F wrote:

>  I have a user who is reporting tests that are failing on his platform.
>  I have not been able to reproduce the error on my system, but working with
> him, we have isolated the problem to unexpected results when
> PyArray_FromObject() is called.  Here is the chain of events:
>
>  In python, an integer is calculated.  Specifically, it is
>
>  len(result.errors) + len(result.failures)
>
>  where result is a unit test result object from the unittest module.  I
> had him verify that this value was in fact a python integer.  In my
> extension module, this PyObject gets passed to the PyArray_FromObject()
> function in a routine that comes from numpy.i.  What I expect, and what I
> typically get, is a numpy scalar array of type C long.  I had my user print
> the result using PyObject_Print() and what he got was
>
>  array([0:00:00], dtype=timedelta64[us])
>
>
That's strange. Is the output always a zero and the type a timedelta64? In
the absence of better info I'd guess a stray pointer or, unlikely, byte
order. The numpy version would be nice to know. If you have an old version
of numpy you could also give it a shot to see what happens.
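
For gathering the version and byte-order details asked about above, a small
sketch along these lines may help.  The function name environment_report is
hypothetical; the attributes it reads (np.__version__, sys.byteorder, and the
dtype's itemsize and byteorder) are standard:

```python
import sys

import numpy as np


def environment_report():
    """Collect build details that are useful when reporting a dtype problem."""
    c_long = np.dtype("l")  # the dtype corresponding to a C long
    return {
        "numpy_version": np.__version__,
        "python_version": sys.version.split()[0],
        "system_byteorder": sys.byteorder,     # 'little' or 'big'
        "c_long_itemsize": c_long.itemsize,    # size of C long in bytes
        "c_long_byteorder": c_long.byteorder,  # '=' means native order
    }


report = environment_report()
for key, value in report.items():
    print(key, "=", value)
```

Running this on both the working and the failing platform and comparing the
two outputs would at least rule the version and byte order in or out.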

Chuck


Re: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1

2012-02-16 Thread David Cournapeau
On Tue, Feb 14, 2012 at 6:25 PM, Travis Oliphant  wrote:
>
> On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote:
>
>> Hi Travis,
>>
>> It is great that some resources can be spent to have people paid to
>> work on NumPy. Thank you for making that happen.
>>
>> I am slightly confused about roadmaps for numpy 1.8 and 2.0. This
>> needs discussion on the ML, and our release manager currently is Ralf
>> - he is the one who ultimately decides what goes when.
>
> Thank you for reminding me of this.  Ralf and I spoke several days ago, and 
> have been working on how to give him more time to spend on SciPy full-time.   
> As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I will 
> be the release manager again.   Ralf will continue serving as release manager 
> for SciPy.
>
> For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager.   I 
> only know that I won't be release manager past NumPy 1.X.
>
>> I am also not
>> completely comfortable with having a roadmap advertised at PyCon that
>> does not come from the community.
>
> This is my bad wording which is a function of being up very late.    At PyCon 
> we will be discussing the roadmap conversations that are taking place on this 
> list.   We won't be presenting anything there related to the NumPy project 
> that has not first been discussed here.

Thanks for clarifying this, Travis; that makes it much clearer. Looking
forward to hearing what will be presented at PyCon!

David


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Pauli Virtanen
Hi,

16.02.2012 06:09, josef.p...@gmail.com wrote:
[clip]
> numpy linalg.svd doesn't always produce the same results
> 
> running this gives two different answers,
> using scipy.linalg.svd I always get the same answer, which is one of
> the numpy answers
> (numpy random.multivariate_normal is collateral damage)

Are you using a Windows binary for Numpy compiled with the Intel
compilers, or maybe linked with Intel MKL?

If yes, one possibility is that the exact sequence of floating point
operations in SVD or some other step in the calculation depends on the
data alignment, which can affect rounding error.

See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf

That would explain why the pattern you see is quasi-deterministic. The
other explanation would be using uninitialized memory at some point, but
that seems quite unlikely.
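
As a rough illustration of the rounding point above: floating point addition
is not associative, so any change in the order of operations (for instance,
one triggered by data alignment) can change the last bits of a result.  The
helper svd_is_repeatable is a hypothetical name, and this is only a sketch
for checking whether repeated SVD calls agree bitwise on a given build:

```python
import numpy as np

# The same three numbers, summed in two different orders.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
order_matters = left_first != right_first


def svd_is_repeatable(matrix, repeats=10):
    """Return True if repeated np.linalg.svd calls give bitwise-equal output."""
    reference = np.linalg.svd(matrix)
    for _ in range(repeats):
        result = np.linalg.svd(matrix)
        # Compare U, the singular values, and Vh element by element.
        if not all(np.array_equal(r, q) for r, q in zip(reference, result)):
            return False
    return True


rng = np.random.RandomState(0)
sample = rng.standard_normal((5, 5))
print("summation order changes the result:", order_matters)
print("svd repeatable on this build:", svd_is_repeatable(sample))
```

A False from svd_is_repeatable would be consistent with the alignment
explanation, though it would not by itself rule out uninitialized memory.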

-- 
Pauli Virtanen



Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Paul Anton Letnes
> 
> An example I really like is LibreOffice's "get involved" page.
> 
> http://www.libreoffice.org/get-involved/
> 
> Producing something similar for NumPy will take some work, but I believe it's 
> needed.

Speaking as someone who has contributed to numpy in a microscopic fashion, I 
agree completely. I spent quite a few hours digging through the webpages, 
asking for help on the mailing list, reading the Trac, reading git tutorials 
etc. before I managed to do something remotely useful. In general, I think the 
webpage for numpy (and scipy, but let's not discuss that here) would benefit 
from some refurbishing, including the documentation pages. As an example, one 
of the links on the webpage is "Numpy for MATLAB users". I never used matlab 
much, so this is completely irrelevant for me.

I think there should be a discussion about what goes on the front page, and it 
should be as little as possible, but not less than that. Make it easy for 
people to
1) start using numpy
2) read detailed documentation
3) report bugs
4) contribute to numpy
because those are the fundamental things a user/developer wants from an open 
source project. Right now there's Trac, github, numpy.scipy.org, 
http://docs.scipy.org/doc/, the mailing list, and someone mentioned a google 
group discussing something or other. It took me years to figure out how things 
are patched together, and I'm still not sure exactly who reads the Trac 
discussion, github discussion, and mailing list discussions.

tl;dr: Numpy is awesome (TM) but needs a more coherent online presence, and one 
that makes it easy to contribute back to the project.

Thanks for making numpy awesome!

Paul




Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Jason Grout
On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
> But in the very end, when agreement can't
> be reached by other means, the developers are the ones making the calls.
> (This is simply a consequence of the fact that they are the only ones who
> can credibly threaten to fork the project.)

Interesting point.  I hope I'm not pitching a log onto the fire here, 
but in numpy's case, there are very many capable developers on other 
projects who depend on numpy who could credibly threaten a fork if they 
felt numpy was drastically going wrong.

Jason


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Perry Greenfield

On Feb 15, 2012, at 6:18 PM, Joe Harrington wrote:
>
>
> Of course, balancing all of this (and our security blanket) is the
> possibility of someone splitting the code if they don't like how
> Continuum runs things.  Perry, you've done that yourself to this code's
> predecessor, so you know the risks.  You did that in response to one
> constituency's moving the code in a direction you didn't like (or not
> moving it in one you did, I don't remember exactly), as in your example
> #2.  So, while progress might be made when that happens, last time it
> hurt astronomers enough that you rolled your own and had to put several
> FTE on the problem.  That split held back adoption of numpy both in the
> astronomy community and outside it, for like 5 years.  Perhaps some
> governance would have saved you the effort and cost and the community
> the grief of the numarray split.  Of course, lots of good eventually
> came from the split.

It wasn't quite like that (hindsight often obscures the perspective at  
the time). At that time, there was a quasi-consensus that Numeric  
needed some sort of rewrite. When we started numarray, it wasn't our  
intent to split the community. That did happen since numarray didn't  
satisfy enough of the community to get them to buy into it. (It's even  
more involved than that, but there is no need to rehash those details).

I'm not sure what to make of the claim the split held back adoption of  
numpy. It only makes sense if you say it held back adoption of Numeric  
in the astronomy community. Numpy wasn't available, and when it was,  
it didn't take nearly that long to get adopted. I'd have to check, but  
I'm pretty sure we switched to using it as quickly as possible once it  
was ready to use.

And I still maintain Numeric wasn't really suitable for our needs.  
Some overhaul was needed, and with that would have been some pain.  
Could it have all gone smoother somehow? In some ideal world, perhaps.  
But maybe numarray was a secret plot to get Travis to do numpy all  
along, and that was the only way to get where we needed to get ;-)

Perry



Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Francesc Alted
On Feb 16, 2012, at 12:15 PM, Jason Grout wrote:

> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
>> But in the very end, when agreement can't
>> be reached by other means, the developers are the ones making the calls.
>> (This is simply a consequence of the fact that they are the only ones who
>> can credibly threaten to fork the project.)
> 
> Interesting point.  I hope I'm not pitching a log onto the fire here, 
> but in numpy's case, there are very many capable developers on other 
> projects who depend on numpy who could credibly threaten a fork if they 
> felt numpy was drastically going wrong.

Jason, that there are capable developers out there who are able to fork NumPy 
(or any other project you can think of) is a given.  The point Dag was making 
is that this threat is more likely to arise *inside* the community.

And you pointed out an important aspect too by saying "if they felt numpy was 
drastically going wrong".  It gives me the impression that some people are very 
frightened that something really bad will happen, well before it happens.  
While I agree that this is *possible*, I'd also advocate giving Travis the 
benefit of the doubt.  I'm convinced he (and Continuum as a whole) is making 
things happen that will benefit the entire NumPy community; but in case 
something goes really, catastrophically wrong, it is always a relief to know 
that things can be reverted in the pure open source tradition (by either doing 
a fork, creating a new foundation, or even better, proposing a new way to do 
things).  What does not sound reasonable to me is to allow fear to block 
Continuum's efforts to make a better NumPy.  I think it is better to relax a 
bit, see how things are going, and then judge by looking at the *results*.

My two cents,

Disclaimer: As my e-mail address makes clear, I'm a Continuum guy.

-- Francesc Alted





Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Jason Grout
On 2/16/12 6:23 AM, Francesc Alted wrote:
> On Feb 16, 2012, at 12:15 PM, Jason Grout wrote:
>
>> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
>>> But in the very end, when agreement can't be reached by other
>>> means, the developers are the one making the calls. (This is
>>> simply a consequence that they are the only ones who can credibly
>>> threaten to fork the project.)
>>
>> Interesting point.  I hope I'm not pitching a log onto the fire
>> here, but in numpy's case, there are very many capable developers
>> on other projects who depend on numpy who could credibly threaten a
>> fork if they felt numpy was drastically going wrong.
>
> Jason, that there are capable developers out there who are able to fork
> NumPy (or any other project you can think of) is a given.  The point
> Dag was making is that this threat is more likely to come from
> *inside* the community.

Sure.  Given numpy's status as a fundamental building block of many 
systems, though, if there was a perceived problem by downstream, it's 
more liable to be forked than most other projects that aren't so close 
to the headwaters.

>
> And you pointed out an important aspect too by saying "if they felt
> numpy was drastically going wrong".  I get the impression that some
> people are very frightened that something really bad will happen,
> well before it happens.  While I agree that this is *possible*, I'd
> also advocate giving Travis the benefit of the doubt.  I'm convinced
> he (and Continuum as a whole) is making things happen that will
> benefit the entire NumPy community; but in case something goes
> really, catastrophically wrong, it is always a relief to know that
> things can be reverted in the pure open source tradition (by either
> doing a fork, creating a new foundation, or even better, proposing a
> new way to do things).  What does not sound reasonable to me is to
> allow fear to block Continuum's efforts to make a better NumPy.  I
> think it is better to relax a bit, see how things are going, and
> then judge by looking at the *results*.

I'm really happy about Continuum.  I agree with Mark that numpy 
certainly could use a few more core developers.  I've not decided on how 
much structure I feel numpy governance needs (nor do I think it's 
particularly important for me to decide how I feel at this point on the 
subject).

Jason


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Thomas Kluyver
If I can chime in as a newcomer on this list:

I don't think a conflict of interest is at all likely, but I can see
the point of those saying that it's worth thinking about this while
everything is going well. If any tension does arise, it will be all
but impossible to decide on a fair governance structure, because
everyone will root for the system that looks likely to produce their
favoured outcome.

It strikes me that the effort everyone's put into this thread could
have by now designed some way to resolve disputes. ;-) It could be as
simple as 'so-and-so gets to make the final call', through to
committees, voting systems, etc. So long as everything's going well,
it shouldn't restrict anyone, and it would reassure anyone who does
have concerns (justified or not) about conflicts of interest.

Thanks,
Thomas


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 4:44 AM, Pauli Virtanen  wrote:
> Hi,
>
> 16.02.2012 06:09, josef.p...@gmail.com kirjoitti:
> [clip]
>> numpy linalg.svd doesn't produce always the same results
>>
>> running this gives two different answers,
>> using scipy.linalg.svd I always get the same answer, which is one of
>> the numpy answers
>> (numpy random.multivariate_normal is collateral damage)
>
> Are you using a Windows binary for Numpy compiled with the Intel
> compilers, or maybe linked with Intel MKL?

This was with the official numpy installer, compiled with MingW

I just tried with 64 bit python 3.2 with MKL (Gohlke installer) and in
several runs I always get the same answer.

>
> If yes, one possibility is that the exact sequence of floating point
> operations in SVD or some other step in the calculation depends on the
> data alignment, which can affect rounding error.
>
> See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
>
> That would explain why the pattern you see is quasi-deterministic. The
> other explanation would be using uninitialized memory at some point, but
> that seems quite unlikely.

Running the script on the commandline I always get several patterns,
but running the script in the same process didn't converge to a unique
pattern.

We had other cases of several patterns in quasi-deterministic linalg
before, but as far as I remember only in the final digits of
precision, where it didn't matter much except for reducing test
precision in my cases.

In the random multivariate normal case in the ticket the differences
are large, which makes them pretty unreliable and useless for
reproducibility.

Josef

>
> --
> Pauli Virtanen
>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Pauli Virtanen
16.02.2012 14:14, josef.p...@gmail.com kirjoitti:
[clip]
> We had other cases of several patterns in quasi-deterministic linalg
> before, but as far as I remember only in the final digits of
> precision, where it didn't matter much except for reducing test
> precision in my cases.
> 
> In the random multivariate normal case in the ticket the differences
> are large, which makes them pretty unreliable and useless for
> reproducibility.

Now that I read your mail more carefully, the following piece of code
indeed does not give reproducible results on Linux with ATLAS either:


import numpy as np
from numpy.linalg import svd

d = 10
alpha = 1 / d**0.5
mu = np.ones(d)
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

for i in range(10):
    u, s, vH = svd(R)
    print(vH[-1, 1], abs(u.dot(np.diag(s)).dot(vH) - R).max())
print(s)
---

Of course, the returned SVD decomposition *is* correct in all cases.

The reason seems to be that the matrix has 9 coinciding singular values,
and the (alignment-dependent) rounding error is sufficient to perturb
the choice (or order?) of singular vectors.

So, the algorithm used to generate multivariate normal random numbers is
then actually numerically unstable, as it relies on the order of
singular vectors returned by SVD.
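
A minimal sketch of why coinciding singular values leave the singular vectors undetermined (this is an illustration, not code from the thread; the random orthogonal matrix `q` and the seed are assumptions made for the example). It rotates the nine singular vectors that share a singular value and checks that the rotated factors still reconstruct R:

```python
import numpy as np

d = 10
alpha = 1 / d**0.5
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

u, s, vh = np.linalg.svd(R)

# Arbitrary orthogonal matrix acting on the 9-dimensional degenerate subspace
# (illustrative choice; any orthogonal q would do).
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((d - 1, d - 1)))

# Rotate the last 9 left and right singular vectors by the same q.
u_rot = u.copy()
vh_rot = vh.copy()
u_rot[:, 1:] = u[:, 1:] @ q
vh_rot[1:, :] = q.T @ vh[1:, :]

# Both sets of factors reconstruct R to rounding error.
err_original = np.abs(u @ np.diag(s) @ vh - R).max()
err_rotated = np.abs(u_rot @ np.diag(s) @ vh_rot - R).max()

# Yet the singular vectors themselves are different.
vectors_differ = not np.allclose(vh, vh_rot)
```

Both decompositions are equally valid, so any code that reads meaning into individual rows of `vh` is relying on something the decomposition does not promise.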

I'm not sure how to fix this. Maybe the vectors returned by SVD should
be sorted if there are numerically close singular values. Just ensuring
alignment of the input probably won't guarantee reproducibility across
platforms.

Please file a bug ticket, so this doesn't get forgotten...

Pauli



Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 8:14 AM,   wrote:
> On Thu, Feb 16, 2012 at 4:44 AM, Pauli Virtanen  wrote:
>> Hi,
>>
>> 16.02.2012 06:09, josef.p...@gmail.com kirjoitti:
>> [clip]
>>> numpy linalg.svd doesn't produce always the same results
>>>
>>> running this gives two different answers,
>>> using scipy.linalg.svd I always get the same answer, which is one of
>>> the numpy answers
>>> (numpy random.multivariate_normal is collateral damage)
>>
>> Are you using a Windows binary for Numpy compiled with the Intel
>> compilers, or maybe linked with Intel MKL?
>
> This was with the official numpy installer, compiled with MingW
>
> I just tried with 64 bit python 3.2 with MKL (Gohlke installer) and in
> several runs I always get the same answer.
>
>>
>> If yes, one possibility is that the exact sequence of floating point
>> operations in SVD or some other step in the calculation depends on the
>> data alignment, which can affect rounding error.
>>
>> See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
>>
>> That would explain why the pattern you see is quasi-deterministic. The
>> other explanation would be using uninitialized memory at some point, but
>> that seems quite unlikely.
>
> Running the script on the commandline I always get several patterns,
> but running the script in the same process didn't converge to a unique
> pattern.
>
> We had other cases of several patterns in quasi-deterministic linalg
> before, but as far as I remember only in the final digits of
> precision, where it didn't matter much except for reducing test
> precision in my cases.
>
> In the random multivariate normal case in the ticket the differences
> are large, which makes them pretty unreliable and useless for
> reproducibility.

linalg question

Is there anything special, or are there specific numerical problems
with an svd when most singular values are the same ?

The example has all random variables equally correlated.

singular values
>>> s
array([ 3.84604989,  0.68377223,  0.68377223,  0.68377223,  0.68377223,
        0.68377223,  0.68377223,  0.68377223,  0.68377223,  0.68377223])

Josef

>
> Josef
>
>>
>> --
>> Pauli Virtanen
>>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 8:45 AM, Pauli Virtanen  wrote:
> 16.02.2012 14:14, josef.p...@gmail.com kirjoitti:
> [clip]
>> We had other cases of several patterns in quasi-deterministic linalg
>> before, but as far as I remember only in the final digits of
>> precision, where it didn't matter much except for reducing test
>> precision in my cases.
>>
>> In the random multivariate normal case in the ticket the differences
>> are large, which makes them pretty unreliable and useless for
>> reproducibility.
>
> Now that I read your mail more carefully, the following piece of code
> indeed does not give reproducible results on Linux with ATLAS either:
>
> 
> import numpy as np
> from numpy.linalg import svd
>
> d = 10
> alpha = 1 / d**0.5
> mu = np.ones(d)
> R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)
>
> for i in range(10):
>     u, s, vH = svd(R)
>     print(vH[-1, 1], abs(u.dot(np.diag(s)).dot(vH) - R).max())
> print(s)
> ---
>
> Of course, the returned SVD decomposition *is* correct in all cases.
>
> The reason seems to be that the matrix has 9 coinciding singular values,
> and the (alignment-dependent) rounding error is sufficient to perturb
> the choice (or order?) of singular vectors.
>
> So, the algorithm used to generate multivariate normal random numbers is
> then actually numerically unstable, as it relies on the order of
> singular vectors returned by SVD.
>
> I'm not sure how to fix this. Maybe the vectors returned by SVD should
> be sorted if there are numerically close singular values. Just ensuring
> alignment of the input probably won't guarantee reproducibility across
> platforms.
>
> Please file a bug ticket, so this doesn't get forgotten...

the multivariate normal case is already
http://projects.scipy.org/numpy/ticket/1842
I can add the diagnosis.

If I interpret you correctly, this should be a svd ticket, or an svd
ticket as "duplicate" ?

Thanks,

Josef


>
>        Pauli
>


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Scott Sinclair
On 16 February 2012 15:08, Thomas Kluyver  wrote:
> It strikes me that the effort everyone's put into this thread could
> have by now designed some way to resolve disputes. ;-)

This is not intended to downplay the concerns raised in this thread,
but I can't help myself.

I propose the following (tongue-in-cheek) patch against the current
numpy master branch.

https://github.com/scottza/numpy/compare/constitution

If this gets enough interest, I'll consider submitting a "real" pull request ;-)

Cheers,
Scott


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Pauli Virtanen
16.02.2012 14:54, josef.p...@gmail.com kirjoitti:
[clip]
> If I interpret you correctly, this should be a svd ticket, or an svd
> ticket as "duplicate" ?

I think it should be a multivariate normal ticket.

"Fixing" SVD is in my opinion not sensible: its only guarantee is that A
= U S V^H down to numerical precision and S are sorted. If the algorithm
assumes something extra, it is wrong. Reproducibility issues of this
sort potentially affect all code (depending on the compiler and
libraries used), and trying to combat them at the linalg level is IMHO not
our business --- if someone really wants it, they should tell their C
compiler and all libraries to use a reproducible FP model.

However, we should ensure the algorithms we provide are stable against
rounding error. In this case, the random number generation is not, so it
should be fixed.

Pauli



Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Jason Grout
On 2/16/12 8:06 AM, Scott Sinclair wrote:
> On 16 February 2012 15:08, Thomas Kluyver  wrote:
>> It strikes me that the effort everyone's put into this thread could
>> have by now designed some way to resolve disputes. ;-)
>
> This is not intended to downplay the concerns raised in this thread,
> but I can't help myself.
>
> I propose the following (tongue-in-cheek) patch against the current
> numpy master branch.
>
> https://github.com/scottza/numpy/compare/constitution
>
> If this gets enough interest, I'll consider submitting a "real" pull request 
> ;-)

Time to start submitting lots of 1-line commits and typo fixes to pad my 
karma :).

Jason


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Peter Wang
On Feb 16, 2012, at 12:08 AM, Matthew Brett wrote:

>> The question is more about what can possibly be done about it. To really
>> shift power, my hunch is that the only practical way would be to, like
>> Mark said, make sure there are very active non-Continuum-employed
>> developers. But perhaps I'm wrong.
> 
> It's not obvious to me that there isn't a set of guidelines,
> procedures, structures that would help to keep things clear in this
> situation.

Matthew, I think this is the crux of the issue.

There are two kinds of disagreements which could polarize Numpy development: 
disagreements over vision/values, and disagreements over implementation.  The 
latter can be (and has been) resolved in an ad-hoc fashion because we are all 
consenting adults here, and as long as there is a consensus about the shared 
values (i.e. long-term vision) of the project, we can usually work something 
out.

Disagreements over values and long-term vision are the ones that actually do 
split developer communities, and which procedural guidelines are really quite 
poor at resolving.  In the realm of open source software, value differences 
(most commonly, licensing disagreements) generally manifest as forks, 
regardless of what governance may be in place.  At the end of the day, you 
cannot compel people to continue committing to a project that they feel is 
going the *wrong direction*, not merely the right direction in the wrong way.

In the physical world, where we are forced to share geographic space with 
people who may have vastly different values, it is useful to have a framework 
for resolution of value differences, because a fork attempt usually means 
physical warfare.  Hence, constitutions, checks & balances, impeachment 
procedures, etc. are all there to avoid forking.  But with software, forks are 
not so costly, and not always a bad thing.  Numpy itself arose from merging 
Numeric and its fork, Numarray, and X.org and EGCS are examples of big forks of 
major projects which later became the mainline trunk.  In short, even if you 
*could* put governance in place to prevent a fork, that's not always a Good 
Thing.  Creative destruction is vital to the health of any organism or 
ecosystem, because that is how evolution frequently achieves its greatest 
results.

Of course, this is not to say that I have any desire to see Numpy forked.  What 
I *do* desire is a modular, extensible core of Numpy that will allow the 
experimentation and creative destruction to occur, while minimizing the merge 
effort when people realize that something cool has been developed.  Lowering the 
barrier to entry for hacking on the core array code is not merely for 
Continuum's benefit, but rather will benefit the ecosystem as a whole.

No matter how one feels about the potential conflicts of interest, I think we 
can all agree that the alternative of stagnation is far, far worse.  The only 
way to avoid stagnation is to give the hackers and rebels plenty of room to 
play, while ensuring a stable base platform for end users and downstream 
projects to avoid code churn.  Travis's and Mark's roadmap proposals for 
creating a modular core and an extensible C-level ABI are a key technical 
mechanism for achieving this.

Ultimately, procedures and guidelines are only a means to an end, not ends 
unto themselves.  Curiously enough, I have not yet seen anyone articulate the 
desire for those *ends* themselves to be written down or manifest as a 
document.  Now, if the Numpy developers want to produce a "vision document" or 
"values statement" for the project, I think that would help as a reference 
point for any potential disagreements over the direction of the project as 
commercial stakeholders become involved.  But, of course, the request for such 
a document is itself an unfunded mandate, so it's perfectly possible we may get 
a one-liner like "make Python scientific computing awesome."  :-)


-Peter

Disclaimer: I work with Travis at Continuum.



Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 9:08 AM, Pauli Virtanen  wrote:
> 16.02.2012 14:54, josef.p...@gmail.com kirjoitti:
> [clip]
>> If I interpret you correctly, this should be a svd ticket, or an svd
>> ticket as "duplicate" ?
>
> I think it should be a multivariate normal ticket.
>
> "Fixing" SVD is in my opinion not sensible: its only guarantee is that A
> = U S V^H down to numerical precision and S are sorted. If the algorithm
> assumes something extra, it is wrong. Reproducibility issues of this
> sort potentially affect all code (depending on the compiler and
> libraries used), and trying to combat them at the linalg level is IMHO not
> our business --- if someone really wants it, they should tell their C
> compiler and all libraries to use a reproducible FP model.

I agree, I added the comments to the ticket.

>
> However, we should ensure the algorithms we provide are stable against
> rounding error. In this case, the random number generation is not, so it
> should be fixed.

storing the last column of v

# R and svd are as defined in Pauli's snippet earlier in the thread
vli = []
for i in range(10):
    (u, s, v) = svd(R)
    print('v[:,-1]')
    print(v[:, -4:])
    vli.append(v[:, -1])

>>> np.unique([tuple(vv.tolist()) for vv in vli])
array([[-0.31622777, -0.11785113,  0.08706383,  0.42953906,  0.75736963,
        -0.31048693, -0.01693654,  0.10328164, -0.04417299, -0.10540926],
       [-0.31622777, -0.03661979,  0.61237244, -0.15302481,  0.0664198 ,
         0.11341968,  0.38265194,  0.51112292, -0.10540926,  0.25335061]])


The different v are not just a reordering of each other.

If my linear algebra is correct, then the algorithm provides different
basis vectors for the subspace with identical singular values.
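
One way to check this reading, sketched under the assumption that the matrix is the R from Pauli's snippet: whichever basis the SVD returns for the repeated singular value, the projector onto the spanned subspace should be the same. For this R the leading singular vector is proportional to the all-ones vector, so the projector should equal I - J/d, with J the all-ones matrix:

```python
import numpy as np

d = 10
alpha = 1 / d**0.5
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

u, s, vh = np.linalg.svd(R)

# Rows 1..9 of vh are some orthonormal basis of the degenerate subspace.
basis = vh[1:]

# The projector onto the subspace does not depend on the basis chosen.
projector = basis.T @ basis

# Orthogonal complement of the all-ones direction.
expected = np.eye(d) - np.ones((d, d)) / d

max_deviation = np.abs(projector - expected).max()
spread_of_repeated_values = s[1:].max() - s[1:].min()
```

If `max_deviation` stays at rounding level from run to run while individual rows of `vh` change, the runs differ only in the choice of basis inside that subspace.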

I don't see any way to fix multivariate_normal for this case, except
for dropping svd or for randomly perturbing a covariance matrix with
repeated singular values.

Josef

>
>        Pauli
>


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Bruce Southey
On 02/16/2012 08:06 AM, Scott Sinclair wrote:
> On 16 February 2012 15:08, Thomas Kluyver  wrote:
>> It strikes me that the effort everyone's put into this thread could
>> have by now designed some way to resolve disputes. ;-)
> This is not intended to downplay the concerns raised in this thread,
> but I can't help myself.
>
> I propose the following (tongue-in-cheek) patch against the current
> numpy master branch.
>
> https://github.com/scottza/numpy/compare/constitution
>
> If this gets enough interest, I'll consider submitting a "real" pull request 
> ;-)
>
> Cheers,
> Scott
Now that is totally disrespectful and just plain ignorant! Not to 
mention the inability to count people correctly.
Yes, 'you pushed my button' so to speak.
As I understand it, all the pre-git history just contains the 
information of the person who actually committed the change into the 
numpy trunk. It does not hold any information of Numeric and the history 
of numarray so I really question the accuracy of the counting. Also it 
misses many of the 'user' patches that led to those changes (perhaps 
these user-patches are now in git).

The second aspect is the time frame, as you get a very different list if 
you restrict it to 'current developers', e.g. by adding '--since="1 year 
ago"'.

It is disrespectful because many of the heated discussions are not about 
code per se but about the design and expected behavior. Counting commits 
or lines will never tell you any of those things.

So I do agree with David's suggestion.

Bruce


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Pierre Haessig

Le 16/02/2012 16:20, josef.p...@gmail.com a écrit :

> I don't see any way to fix multivariate_normal for this case, except
> for dropping svd or for random perturbing a covariance matrix with
> multiplicity of singular values.

Hi,
I just did a quick search on what the R folks are doing. It turns out 
there are several packages 
(http://cran.r-project.org/web/views/Multivariate.html). For instance, 
mvtnorm (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've 
attached the related function from the source code of this package.

Interestingly enough, it seems they provide 3 different methods (svd, 
eigenvalues, and Cholesky).
I don't have the time now to dive into assessing the pros and cons of 
those three. Maybe one works for our problem, but I haven't checked yet.


Pierre



# $Id: mvnorm.R 222 2011-01-31 14:02:02Z thothorn $

rmvnorm<-function (n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)),
                   method=c("eigen", "svd", "chol"))
{
    if (!isSymmetric(sigma, tol = sqrt(.Machine$double.eps),
                     check.attributes = FALSE)) {
        stop("sigma must be a symmetric matrix")
    }
    if (length(mean) != nrow(sigma)) {
        stop("mean and sigma have non-conforming size")
    }
    sigma1 <- sigma
    dimnames(sigma1) <- NULL
    if(!isTRUE(all.equal(sigma1, t(sigma1)))){
        warning("sigma is numerically not symmetric")
    }

    method <- match.arg(method)

    if(method == "eigen"){
        ev <- eigen(sigma, symmetric = TRUE)
        if (!all(ev$values >= -sqrt(.Machine$double.eps) * abs(ev$values[1]))){
            warning("sigma is numerically not positive definite")
        }
        retval <- ev$vectors %*%  diag(sqrt(ev$values),
                                       length(ev$values)) %*% t(ev$vectors)
    }
    else if(method == "svd"){
        sigsvd <- svd(sigma)
        if (!all(sigsvd$d >= -sqrt(.Machine$double.eps) * abs(sigsvd$d[1]))){
            warning("sigma is numerically not positive definite")
        }
        retval <- t(sigsvd$v %*% (t(sigsvd$u) * sqrt(sigsvd$d)))
    }
    else if(method == "chol"){
        retval <- chol(sigma, pivot = TRUE)
        o <- order(attr(retval, "pivot"))
        retval <- retval[,o]
    }

    retval <- matrix(rnorm(n * ncol(sigma)), nrow = n) %*%  retval
    retval <- sweep(retval, 2, mean, "+")
    colnames(retval) <- names(mean)
    retval
}

dmvnorm <- function (x, mean, sigma, log=FALSE)
{
    if (is.vector(x)) {
        x <- matrix(x, ncol = length(x))
    }
    if (missing(mean)) {
        mean <- rep(0, length = ncol(x))
    }
    if (missing(sigma)) {
        sigma <- diag(ncol(x))
    }
    if (NCOL(x) != NCOL(sigma)) {
        stop("x and sigma have non-conforming size")
    }
    if (!isSymmetric(sigma, tol = sqrt(.Machine$double.eps),
                     check.attributes = FALSE)) {
        stop("sigma must be a symmetric matrix")
    }
    if (length(mean) != NROW(sigma)) {
        stop("mean and sigma have non-conforming size")
    }
    distval <- mahalanobis(x, center = mean, cov = sigma)
    logdet <- sum(log(eigen(sigma, symmetric=TRUE,
                            only.values=TRUE)$values))
    logretval <- -(ncol(x)*log(2*pi) + logdet + distval)/2
    if(log) return(logretval)
    exp(logretval)
}
  
# file MASS/R/mvrnorm.R
# copyright (C) 1994-2004 W. N. Venables and B. D. Ripley
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 or 3 of the License
#  (at your option).
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  A copy of the GNU General Public License is available at
#  http://www.r-project.org/Licenses/
#
mvrnorm <- function(n = 1, mu, Sigma, tol=1e-6, empirical = FALSE)
{
    p <- length(mu)
    if(!all(dim(Sigma) == c(p,p))) stop("incompatible arguments")
    eS <- eigen(Sigma, symmetric = TRUE, EISPACK = TRUE)
    ev <- eS$values
    if(!all(ev >= -tol*abs(ev[1L]))) stop("'Sigma' is not positive definite")
    X <- matrix(rnorm(p * n), n)
    if(empirical) {
        X <- scale(X, TRUE, FALSE) # remove means
        X <- X %*% svd(X, nu = 0)$v # rotate to PCs
        X <- scale(X, FALSE, TRUE) # rescale PCs to unit variance
    }
    X <- drop(mu) + eS$vectors %*% diag(sqrt(pmax(ev, 0)), p) %*% t(X)
    nm <- names(mu)
    if(is.null(nm) && !is.null(dn <- dimnames(Sigma))) nm <- dn[[1L]]
    dimnames(X) <- list(nm, NULL)
    if(n == 1) drop(X) else t(X)
}
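
One detail of the attached rmvnorm may be relevant here: its "eigen" branch builds the symmetric square root V diag(sqrt(w)) V^T rather than the one-sided factor V diag(sqrt(w)). The sketch below, in Python with NumPy rather than R, is an illustration under the assumption that the covariance is the R matrix from Pauli's snippet; the rotation `q`, the seed, and the helper names are hypothetical. It suggests that the symmetric square root is unchanged when the eigenvectors of a repeated eigenvalue are rotated, while the one-sided factor is not:

```python
import numpy as np

d = 10
alpha = 1 / d**0.5
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

# eigh returns eigenvalues in ascending order, so the repeated
# eigenvalue occupies the first d - 1 positions.
w, v = np.linalg.eigh(R)

# Rotate the eigenvectors of the repeated eigenvalue by an arbitrary
# orthogonal matrix (illustrative choice).
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((d - 1, d - 1)))
v_rot = v.copy()
v_rot[:, :d - 1] = v[:, :d - 1] @ q


def one_sided_factor(vecs):
    """A = V diag(sqrt(w)), which satisfies A @ A.T == R."""
    return vecs * np.sqrt(w)


def symmetric_sqrt(vecs):
    """S = V diag(sqrt(w)) V^T, the symmetric square root of R."""
    return (vecs * np.sqrt(w)) @ vecs.T


one_sided_gap = np.abs(one_sided_factor(v) - one_sided_factor(v_rot)).max()
symmetric_gap = np.abs(symmetric_sqrt(v) - symmetric_sqrt(v_rot)).max()
```

Whether this is the right fix for multivariate_normal is not settled in this thread; the sketch only shows that one of the three R variants does not depend on the choice of basis.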


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Warren Weckesser
On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig
wrote:

> Le 16/02/2012 16:20, josef.p...@gmail.com a écrit :
>
>> I don't see any way to fix multivariate_normal for this case, except
>> for dropping svd or for random perturbing a covariance matrix with
>> multiplicity of singular values.
>>
> Hi,
> I just made a quick search in what R guys are doing. It happens there are
> several codes (http://cran.r-project.org/web/views/Multivariate.html).
> For instance, mvtnorm
> (http://cran.r-project.org/web/packages/mvtnorm/index.html).
> I've attached the related function from the source code of this package.
>
> Interestingly enough, it seems they provide 3 different methods (svd,
> eigen values, and Cholesky).
> I don't have the time now to dive in the assessments of pros and cons of
> those three. Maybe one works for our problem, but I didn't check yet.
>
> Pierre
>
>

For some alternatives to numpy's multivariate_normal, see
http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
(Cholesky and eigh) are just a couple lines of code.
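
For readers without the Cookbook page at hand, here is a minimal sketch of what two such samplers could look like. It is written for this note in the spirit of the Cholesky and eigh approaches, not copied from the Cookbook, and the function names `sample_cholesky` and `sample_eigh` are hypothetical:

```python
import numpy as np


def sample_cholesky(mean, cov, size, rng):
    """Draw `size` samples using the Cholesky factor cov = L @ L.T."""
    L = np.linalg.cholesky(cov)  # raises LinAlgError if cov is not positive definite
    z = rng.standard_normal((size, len(mean)))
    return mean + z @ L.T


def sample_eigh(mean, cov, size, rng):
    """Draw `size` samples using cov = V diag(w) V^T from eigh."""
    w, v = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues caused by rounding before the sqrt.
    A = v * np.sqrt(np.clip(w, 0, None))  # cov = A @ A.T
    z = rng.standard_normal((size, len(mean)))
    return mean + z @ A.T


# The covariance from the ticket: all variables equally correlated.
d = 10
alpha = 1 / d**0.5
mu = np.ones(d)
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

rng = np.random.default_rng(0)
x_chol = sample_cholesky(mu, R, 200_000, rng)
x_eigh = sample_eigh(mu, R, 200_000, rng)
```

Note that the eigh variant as written uses a one-sided factor, so with repeated eigenvalues it may pick a different basis from run to run just as the SVD does; the samples would still follow the right distribution, but would not be bit-for-bit reproducible for a fixed seed.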

Warren


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Robert Kern
On Thu, Feb 16, 2012 at 16:12, Pierre Haessig  wrote:
> Le 16/02/2012 16:20, josef.p...@gmail.com a écrit :
>
>> I don't see any way to fix multivariate_normal for this case, except
>> for dropping svd or for random perturbing a covariance matrix with
>> multiplicity of singular values.
>
> Hi,
> I just made a quick search in what R guys are doing. It happens there are
> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
> instance, mvtnorm
> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
> the related function from the source code of this package.
>
> Interestingly enough, it seems they provide 3 different methods (svd, eigen
> values, and Cholesky).
> I don't have the time now to dive in the assessments of pros and cons of
> those three. Maybe one works for our problem, but I didn't check yet.

The main reason I used the SVD variant is because the Cholesky
decomposition failed on some covariance matrices that were nearly not
positive definite (i.e. had a nearly-0 eigenvalue). In the application
that I extracted this code from, this was a valid thing to do; the
deviates just inhabit an infinitely thin subspace of the main space,
but are otherwise multivariate-normally-distributed in that subspace.
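
A small sketch of the failure mode described above, using a hypothetical 2x2 covariance that is singular up to rounding (the matrix and the seed are made up for illustration). Cholesky rejects it, while an SVD-based factor still yields samples that lie almost exactly on a line:

```python
import numpy as np

# Hypothetical covariance of two (almost) perfectly correlated variables.
# Its smallest eigenvalue is about -5e-13, i.e. zero up to rounding.
cov = np.array([[1.0, 1.0],
                [1.0, 1.0 - 1e-12]])

# Cholesky needs strict positive definiteness and gives up here.
try:
    np.linalg.cholesky(cov)
    cholesky_failed = False
except np.linalg.LinAlgError:
    cholesky_failed = True

# An SVD-based factor A with A.T @ A approximately equal to cov.
u, s, vh = np.linalg.svd(cov)
A = np.sqrt(s)[:, None] * vh
rebuild_error = np.abs(A.T @ A - cov).max()

# Samples from the degenerate distribution: both coordinates nearly equal.
rng = np.random.default_rng(0)
samples = rng.standard_normal((5, 2)) @ A
max_coordinate_gap = np.abs(samples[:, 0] - samples[:, 1]).max()
```

The samples inhabit the thin subspace x0 = x1, which is the behaviour the SVD variant was chosen to preserve.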

I'm not too attached to the semantics. We should check that the
Cholesky decomposition is stable before switching, though. The
eigenvalue algorithm probably suffers from instability just as much as
the SVD one.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
 wrote:
>
>
> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig 
> wrote:
>>
>> Le 16/02/2012 16:20, josef.p...@gmail.com a écrit :
>>
>>> I don't see any way to fix multivariate_normal for this case, except
>>> for dropping svd or for random perturbing a covariance matrix with
>>> multiplicity of singular values.
>>
>> Hi,
>> I just made a quick search in what R guys are doing. It happens there are
>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
>> instance, mvtnorm
>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
>> the related function from the source code of this package.
>>
>> Interestingly enough, it seems they provide 3 different methods (svd,
>> eigen values, and Cholesky).
>> I don't have the time now to dive in the assessments of pros and cons of
>> those three. Maybe one works for our problem, but I didn't check yet.
>>
>> Pierre
>>
>
>
> For some alternatives to numpy's multivariate_normal, see
> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
> (Cholesky and eigh) are just a couple lines of code.

Thanks both,

The main point is that it is a "Needs decision"

Robert argued several times on the mailing list why he chose svd.
(with svd covariance can be closer to singular than with cholesky)

In statsmodels we usually just use Cholesky for similar
transformation, and I use occasionally an eigh version. (I need to
look up the thread but I got puzzled about results with eig and
multiplicity of eigenvalues before.)

The R code is GPL, but the few lines of code look standard without any
special provision for non-deterministic linear algebra.

If multivariate_normal switches from svd to cholesky or eigh, we still
need to check that we don't run into similar "determinacy" problems
with numpy's linalg (I think in statsmodels we use mostly scipy, so I
don't know.)

Josef

>
> Warren
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 11:30 AM,   wrote:
> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
>  wrote:
>>
>>
>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig 
>> wrote:
>>>
>>> On 16/02/2012 16:20, josef.p...@gmail.com wrote:
>>>
>>>> I don't see any way to fix multivariate_normal for this case, except
>>>> for dropping svd or for random perturbing a covariance matrix with
>>>> multiplicity of singular values.
>>>
>>> Hi,
>>> I just made a quick search in what R guys are doing. It happens there are
>>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
>>> instance, mvtnorm
>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
>>> the related function from the source code of this package.
>>>
>>> Interestingly enough, it seems they provide 3 different methods (svd,
>>> eigen values, and Cholesky).
>>> I don't have the time now to dive in the assessments of pros and cons of
>>> those three. Maybe one works for our problem, but I didn't check yet.
>>>
>>> Pierre
>>>
>>
>>
>> For some alternatives to numpy's multivariate_normal, see
>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
>> (Cholesky and eigh) are just a couple lines of code.
>
> Thanks both,
>
> The main point is that it is a "Needs decision"
>
> Robert argued several times on the mailing list why he chose svd.
> (with svd covariance can be closer to singular then with cholesky)
>
> In statsmodels we usually just use Cholesky for similar
> transformation, and I use occasionally an eigh version. (I need to
> look up the thread but I got puzzled about results with eig and
> multiplicity of eigenvalues before.)
>
> The R code is GPL, but the few lines of code look standard without any
> special provision for non-deterministic linear algebra.
>
> If multivariate_normal switches from svd to cholesky or eigh, we still
> need to check that we don't run into similar "determinacy" problems
> with numpy's linalg (I think in statsmodels we use mostly scipy, so I
> don't know.)

np.linalg.eigh always produces the same eigenvectors, both running
repeatedly in the same session and running the script several times on
the command line.

so eigh looks good as alternative to svd for this case, I don't know
if we buy numerical problems in other corner cases, but for near
singularity it's always possible to check the smallest eigenvalue

Josef

>
> Josef
>
>>
>> Warren
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Nathaniel Smith
On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn
 wrote:
> If non-contributing users came along on the Cython list demanding that
> we set up a system to select non-developers along on a board that would
> have discussions in order to veto pull requests, I don't know whether
> we'd ignore it or ridicule it or try to show some patience, but we
> certainly wouldn't take it seriously.

I'm not really worried about the Continuum having some nefarious
"corporate" intent. But I am worried about how these plans will affect
numpy, and I think there are serious risks if we don't think about
process. Money has a dramatic effect on FOSS development, and not
always in a positive way, even when -- or *especially* when --
everyone has the best of intentions. I'm actually *more* worried about
altruistic full-time developers doing work on behalf of the community
than I am about developers who are working strictly in some company's
interests.

Finding a good design for software is like a nasty optimization
problem -- it's easy to get stuck in local maxima, and any one person
has only an imperfect, noisy estimate of the objective function. So
you need lots of eyes to catch mistakes, filter out the noise, and
explore multiple maxima in parallel.

The classic FOSS model of volunteer developers who are in charge of
project direction does a *great* job of solving this problem. (Linux
beat all the classic Unixen on technical quality, and it did it using
college students and volunteers -- it's not like Sun, IBM, HP etc.
couldn't afford better engineers! But they still lost.) Volunteers are
intimately familiar with the itch they're trying to scratch and the
trade-offs involved in doing so, and they need to work together to
produce anything major, so you get lots of different, high-quality
perspectives to help you figure out which approach is best.

Developers who are working for some corporate interest alter this
balance, because in a "do-ocracy", someone who can throw a few
full-time developers at something suddenly has effectively
complete control over project direction. There's no moral problem here
when the "dictator" is benevolent, but suddenly you have an
informational bottleneck -- even benevolent dictators make mistakes,
and they certainly aren't omniscient. Even this isn't *so* bad though,
so long as the corporation is scratching their own itch -- at least
you can be pretty sure that whatever they produce will at least make
them happy, which implies a certain level of utility.

The riskiest case is paying developers to scratch someone else's itch.
IIUC, that's a major goal of Travis's here, to find a way to pay
developers to make numpy better for everyone. But, now you need some
way for the community to figure out what "better" means, because the
developers themselves don't necessarily know. It's not their itch
anymore. Running a poll or whatever might be a nice start, but we all
know how tough it is to extract useful design information from users.
You need a lot more than that if you want to keep the quality up.

Travis's proposal is that we go from a large number of self-selecting
people putting in little bits of time to a small number of designated
people putting in lots of time. There's a major win in terms of total
effort, but you inevitably lose a lot of diversity of viewpoints. My
feeling is it will only be a net win if the new employees put serious,
bend-over-backwards effort into taking advantage of the volunteer
community's wisdom.

This is why the NA discussion seems so relevant to me here -- everyone
involved absolutely had good intentions, excellent skills, etc., and
yet the outcome is still a huge unresolved mess. It was supposed to
make numpy more attractive for a certain set of applications, like
statistical analysis, where R is currently preferred. Instead, there
have been massive changes merged into numpy mainline, but most of the
intended "target market" for these changes is indifferent to them;
they don't solve the problem they're supposed to. And along the way
we've not just spent a bunch of Enthought's money, but also wasted
dozens of hours of volunteer time while seriously alienating some of
numpy's most dedicated advocates in that "target market". We could
debate about blame, and I'm sure there's plenty to spread around, but
I also think the fundamental problem isn't one of blame at all -- it's
that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the
NA functionality is not something they actually need themselves. Which
means they're fighting uphill when trying to find the best solutions,
and haven't managed it yet. And were working on a deadline, to boot.

> It's obvious that one should try for consensus as long as possible,
> including listening to users. But in the very end, when agreement can't
> be reached by other means, the developers are the one making the calls.
> (This is simply a consequence that they are the only ones who can
> credibly threaten to fork the pr

Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Nathaniel Smith
On Thu, Feb 16, 2012 at 2:08 PM, Pauli Virtanen  wrote:
> 16.02.2012 14:54, josef.p...@gmail.com wrote:
> [clip]
>> If I interpret you correctly, this should be a svd ticket, or an svd
>> ticket as "duplicate" ?
>
> I think it should be a multivariate normal ticket.
>
> "Fixing" SVD is in my opinion not sensible: its only guarantee is that A
> = U S V^H down to numerical precision and S are sorted.

I agree, but the behavior is still surprising -- people reasonably
expect something like svd to be deterministic. So there's probably a
doc bug for alerting people that their reasonable expectation is, in
fact, wrong :-).
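
To make the non-uniqueness concrete, here is a small sketch using an arbitrary seeded random matrix (chosen only for illustration). It shows that flipping the sign of a matching pair of singular vectors gives a different but equally valid factorization, and that with repeated singular values any rotation of the vectors works as well.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3))
u, s, vh = np.linalg.svd(a)

# Flip the sign of the first left/right singular-vector pair.
u2, vh2 = u.copy(), vh.copy()
u2[:, 0] *= -1
vh2[0, :] *= -1

same_product = np.allclose(u @ np.diag(s) @ vh, a)
flipped_product = np.allclose(u2 @ np.diag(s) @ vh2, a)
same_vectors = np.allclose(u, u2)
print(same_product, flipped_product, same_vectors)

# Repeated singular values: for the identity, every rotation q gives a
# valid factorization q @ diag([1, 1]) @ q.T.
theta = 0.3
q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotation_ok = np.allclose(q @ np.eye(2) @ q.T, np.eye(2))
```

Both factorizations reproduce `a`, so code that depends on the individual vectors rather than on the product can see different results.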

-- Nathaniel


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 11:47 AM,   wrote:
> On Thu, Feb 16, 2012 at 11:30 AM,   wrote:
>> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
>>  wrote:
>>>
>>>
>>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig 
>>> wrote:

>>>> On 16/02/2012 16:20, josef.p...@gmail.com wrote:
>>>>
>>>>> I don't see any way to fix multivariate_normal for this case, except
>>>>> for dropping svd or for random perturbing a covariance matrix with
>>>>> multiplicity of singular values.
>>>>
>>>> Hi,
>>>> I just made a quick search in what R guys are doing. It happens there are
>>>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
>>>> instance, mvtnorm
>>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
>>>> the related function from the source code of this package.
>>>>
>>>> Interestingly enough, it seems they provide 3 different methods (svd,
>>>> eigen values, and Cholesky).
>>>> I don't have the time now to dive in the assessments of pros and cons of
>>>> those three. Maybe one works for our problem, but I didn't check yet.
>>>>
>>>> Pierre

>>>
>>>
>>> For some alternatives to numpy's multivariate_normal, see
>>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
>>> (Cholesky and eigh) are just a couple lines of code.
>>
>> Thanks both,
>>
>> The main point is that it is a "Needs decision"
>>
>> Robert argued several times on the mailing list why he chose svd.
>> (with svd covariance can be closer to singular then with cholesky)
>>
>> In statsmodels we usually just use Cholesky for similar
>> transformation, and I use occasionally an eigh version. (I need to
>> look up the thread but I got puzzled about results with eig and
>> multiplicity of eigenvalues before.)
>>
>> The R code is GPL, but the few lines of code look standard without any
>> special provision for non-deterministic linear algebra.
>>
>> If multivariate_normal switches from svd to cholesky or eigh, we still
>> need to check that we don't run into similar "determinacy" problems
>> with numpy's linalg (I think in statsmodels we use mostly scipy, so I
>> don't know.)
>
> np.linalg.eigh always produces the same eigenvectors, both running
> repeatedly in the same session and running the script several times on
> the command line.
>
> so eigh looks good as alternative to svd for this case, I don't know
> if we buy numerical problems in other corner cases, but for near
> singularity it's always possible to check the smallest eigenvalue

cholesky is also deterministic in my runs

What I would suggest is to use cholesky first, catch the singular
exception and then use eigh. With eigh we would get perfectly
correlated random variables.

Again if my reading of linalg comments is correct, cholesky is the
fastest way to detect singularity of a matrix, and is faster than eigh
in the non-singular case.

I have no idea if there is an almost singular case, where cholesky
fails, but the current svd would produce a not perfectly correlated
random sample (up to numerical precision).

(An alternative, which I don't think I like so much, is to use a small
Ridge correction (multiply diagonal by 1 + x*nulp?). This would bound
it away from perfect correlation, I guess.)
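
The "Cholesky first, fall back to eigh" idea could be sketched roughly as follows. The names `mvn_factor` and `sample_mvn` are hypothetical, and the clipping of tiny negative eigenvalues is an assumption added for the fallback path; this is not a proposed patch.

```python
import numpy as np

def mvn_factor(cov):
    """Return (a, method) with a.T @ a approximately equal to cov."""
    try:
        # Fast path: cov = L @ L.T with L lower triangular.
        lower = np.linalg.cholesky(cov)
        return lower.T, "cholesky"
    except np.linalg.LinAlgError:
        # Fallback for singular (or numerically non-PD) input.
        w, v = np.linalg.eigh(cov)
        w = np.clip(w, 0.0, None)  # assumption: negatives are rounding noise
        return np.sqrt(w)[:, None] * v.T, "eigh"

def sample_mvn(mean, cov, size, rng):
    a, method = mvn_factor(cov)
    z = rng.standard_normal((size, len(mean)))
    return mean + z @ a, method

rng = np.random.default_rng(0)
pd_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
singular_cov = np.array([[1.0, 1.0], [1.0, 1.0]])

a_pd, method_pd = mvn_factor(pd_cov)
a_sing, method_sing = mvn_factor(singular_cov)
x_sing, _ = sample_mvn(np.zeros(2), singular_cov, 200, rng)
print(method_pd, method_sing)
```

In the singular case the eigh path gives perfectly correlated samples, as described above. Which branch a borderline matrix takes may depend on the LAPACK build.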

Josef


>
> Josef
>
>>
>> Josef
>>
>>>
>>> Warren
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>


[Numpy-discussion] Strange PyArray_FromObject() behavior

2012-02-16 Thread Spotz, William F
I have a user who is reporting tests that are failing on his platform.  I have 
not been able to reproduce the error on my system, but working with him, we 
have isolated the problem to unexpected results when PyArray_FromObject() is 
called.  Here is the chain of events:

In python, an integer is calculated.  Specifically, it is

len(result.errors) + len(result.failures)

where result is a unit test result object from the unittest module.  I had him 
verify that this value was in fact a python integer.  In my extension module, 
this PyObject gets passed to the PyArray_FromObject() function in a routine 
that comes from numpy.i.  What I expect, and what I typically get, is a numpy 
scalar array of type C long.  I had my user print the result using 
PyObject_Print() and what he got was

array([0:00:00], dtype=timedelta64[us])

I am stuck as to why this might be happening.  Any ideas?
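
One Python-level sanity check that might help narrow this down, assuming `np.asarray` is a fair stand-in for the conversion (it is not the same call as `PyArray_FromObject()`, so this is only a rough diagnostic): convert the same value on the affected platform and look at the dtype kind.

```python
import unittest
import numpy as np

# Build the same kind of value as in the report: a plain Python integer.
result = unittest.TestResult()
value = len(result.errors) + len(result.failures)

arr = np.asarray(value)
print(type(value), arr.dtype, arr.dtype.kind, arr.ndim)

# For comparison: integer dtypes have kind 'i', timedelta64 has kind 'm'.
timedelta_kind = np.dtype("timedelta64[us]").kind
```

If this prints an integer dtype while the extension module still yields timedelta64, that would point toward the extension build (for example a type-number mismatch between headers and the installed numpy) rather than the Python-side value.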

Thanks

** Bill Spotz  **
** Sandia National Laboratories  Voice: (505)845-0170  **
** P.O. Box 5800 Fax:   (505)284-0154  **
** Albuquerque, NM 87185-0370Email: 
wfsp...@sandia.gov **



Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Robert Kern
On Thu, Feb 16, 2012 at 17:07,   wrote:

> cholesky is also deterministic in my runs

We will need to check a variety of builds with different LAPACK
libraries and also different matrix sizes to be sure. Alas!

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Travis Vaught
On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:

> Travis's proposal is that we go from a large number of self-selecting
> people putting in little bits of time to a small number of designated
> people putting in lots of time.

That's not what Travis, or anyone else, proposed.

Travis V.


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 10:07 AM,  wrote:

> On Thu, Feb 16, 2012 at 11:47 AM,   wrote:
> > On Thu, Feb 16, 2012 at 11:30 AM,   wrote:
> >> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
> >>  wrote:
> >>>
> >>>
> >>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig <
> pierre.haes...@crans.org>
> >>> wrote:
> 
> >>>> On 16/02/2012 16:20, josef.p...@gmail.com wrote:
> >>>>
> >>>>> I don't see any way to fix multivariate_normal for this case, except
> >>>>> for dropping svd or for random perturbing a covariance matrix with
> >>>>> multiplicity of singular values.
> >>>>
> >>>> Hi,
> >>>> I just made a quick search in what R guys are doing. It happens there are
> >>>> several codes (http://cran.r-project.org/web/views/Multivariate.html). For
> >>>> instance, mvtnorm
> >>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
> >>>> the related function from the source code of this package.
> >>>>
> >>>> Interestingly enough, it seems they provide 3 different methods (svd,
> >>>> eigen values, and Cholesky).
> >>>> I don't have the time now to dive in the assessments of pros and cons of
> >>>> those three. Maybe one works for our problem, but I didn't check yet.
> >>>>
> >>>> Pierre
> 
> >>>
> >>>
> >>> For some alternatives to numpy's multivariate_normal, see
> >>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
> >>> (Cholesky and eigh) are just a couple lines of code.
> >>
> >> Thanks both,
> >>
> >> The main point is that it is a "Needs decision"
> >>
> >> Robert argued several times on the mailing list why he chose svd.
> >> (with svd covariance can be closer to singular then with cholesky)
> >>
> >> In statsmodels we usually just use Cholesky for similar
> >> transformation, and I use occasionally an eigh version. (I need to
> >> look up the thread but I got puzzled about results with eig and
> >> multiplicity of eigenvalues before.)
> >>
> >> The R code is GPL, but the few lines of code look standard without any
> >> special provision for non-deterministic linear algebra.
> >>
> >> If multivariate_normal switches from svd to cholesky or eigh, we still
> >> need to check that we don't run into similar "determinacy" problems
> >> with numpy's linalg (I think in statsmodels we use mostly scipy, so I
> >> don't know.)
> >
> > np.linalg.eigh always produces the same eigenvectors, both running
> > repeatedly in the same session and running the script several times on
> > the command line.
> >
> > so eigh looks good as alternative to svd for this case, I don't know
> > if we buy numerical problems in other corner cases, but for near
> > singularity it's always possible to check the smallest eigenvalue
>
> cholesky is also deterministic in my runs
>
> What I would suggest is to use cholesky first, catch the singular
> exception and then use eigh. With eigh we would get perfectly
> correlated random variables.
>
> Again if my reading of linalg comments is correct, cholesky is the
> fastest way to detect singularity of a matrix, and is faster then eigh
> in the non-singular case.
>
> I have no idea if there is an almost singular case, where cholesky
> fails, but the current svd would produce a not perfectly correlated
> random sample (up to numerical precision).
>
> (Alternative, which I don't think I like so much, is to use a small
> Ridge correction (multiply diagonal by 1 + x*nulp ? This would bound
> it away from perfect correlation, I guess.)
>
>
Cholesky doesn't do any reordering of the matrix, but proceeds downward
factoring row by row, so to speak. It is Gauss elimination without row
pivoting and is only stable when the matrix is positive definite.
Fortunately, failure of positive definiteness shows up as an attempt to take
the real square root of a negative number, so is detected.

The problem with svd is that the singular values are always non-negative,
hence the resulting factorization isn't always of the form R^T * R, which
is easily understood because that form is necessarily non-negative definite.
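
A small illustration of both points, using a made-up symmetric indefinite matrix with eigenvalues 3 and -1. Cholesky reports the failure, while a square root built from the SVD silently corresponds to a different, non-negative definite matrix.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [2.0, 1.0]])  # eigenvalues 3 and -1: not positive definite

# Cholesky hits the square root of a negative pivot and raises.
try:
    np.linalg.cholesky(a)
    cholesky_failed = False
except np.linalg.LinAlgError:
    cholesky_failed = True

# The SVD succeeds, but the singular values are |3| and |-1|, so the
# product vh.T @ diag(s) @ vh is of the form R^T R and cannot equal a.
u, s, vh = np.linalg.svd(a)
r = np.sqrt(s)[:, None] * vh
reconstructed = r.T @ r
print(cholesky_failed)
print(reconstructed)
```

Here `reconstructed` comes out as [[2, 1], [1, 2]] up to rounding, i.e. the matrix with the negative eigenvalue replaced by its absolute value.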

Chuck


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Pauli Virtanen
Hi,

16.02.2012 18:00, Nathaniel Smith wrote:
[clip]
> I agree, but the behavior is still surprising -- people reasonably
> expect something like svd to be deterministic. So there's probably a
> doc bug for alerting people that their reasonable expectation is, in
> fact, wrong :-).

The problem here is that these warnings should in principle appear in
the documentation of every numerical algorithm that contains branches
chosen on the basis of floating point data. For example, optimization
algorithms --- they terminate after a tolerance is satisfied, and so the
results can contain similar quasi-random error much larger than the
rounding error, tol > |err| >> eps.
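
As a toy illustration of tol > |err| >> eps, here is a plain bisection (written for this example, not taken from any library) solving the same problem from two different starting brackets. Both answers satisfy the tolerance, yet they differ from each other by far more than machine epsilon.

```python
import math
import sys

def bisect(f, lo, hi, tol):
    """Plain bisection: stop once the bracket is narrower than tol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def f(x):
    return x * x - 2.0

tol = 1e-6
r1 = bisect(f, 1.0, 2.0, tol)  # one starting bracket
r2 = bisect(f, 0.0, 3.0, tol)  # a different bracket, same root
root = math.sqrt(2.0)
eps = sys.float_info.epsilon

print(abs(r1 - root), abs(r2 - root), abs(r1 - r2), eps)
```

Neither result is wrong; the termination branch simply depends on the floating point data along the way.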

Floating point sucks, it's full of gotchas for all ages :(

Something like a FAQ could be good place to answer this, alongside more
basic floating point questions.

Pauli



Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Chris Barker
On Wed, Feb 15, 2012 at 11:23 AM, Inati, Souheil (NIH/NIMH) [E] wrote:
> As great and trustworthy as Travis is, there is a very real
> potential for conflict of interest here. He is going to be leading an
> organization to raise and distribute funding and at the same time
> leading a commercial for profit enterprise that would apply to this
> foundation for funds, as well as being a major player in the
> direction of the open source project that his company is building
> on.
>
> This is not in and of itself a problem, but the boundaries have to
> be very clear and laid out in advance.

I disagree here -- a business that contributes to an Open-Source
project is really no different than an individual that contributes --
it (or he or she) contributes because it sees a benefit -- that could
be financial, that could be just for fun, whatever. Sometimes
individuals get paid to contribute, sometimes companies do  -- why is
there a difference?

To be personal about it -- Continuum writing a bunch of numpy code
will be no different than when Travis personally on his own time wrote
a bunch of numpy code -- and I think we all agree that the project and
the community is very grateful he did that when he did.

If anyone (company or individual) goes off on their own and writes a
bunch of code that the community doesn't embrace, then we have a fork --
sometimes that's for the better, but for the most part, I think everyone
involved does not want to see that happen -- and I think there is a
general consensus that a more formal governing structure is a good
idea, in part to prevent that.

However -- it's still an open source project -- no one (or
institution) can tell anyone else what to do or how to do it, and the
project will be moved forward by those that actually do stuff:
  - write core code
  - write supporting code
  - document stuff
  - test stuff
  - package stuff
  - contribute tech support on the list
  - contribute to the conversation about development issues
  - ...

and yes -- actually getting around to forming a foundation, or
securing funding, or other institutional activities.

So while it may seem like a small group of people kind of went off on
their own to form a foundation -- that's the only way things ever get
done on an open-source project! There may or may not be a lot of
discussion about something first, but it only gets done when someone
sits down and does it.

So Bravo for moving the project forward!

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 10:20 AM, Pauli Virtanen  wrote:

> Hi,
>
> 16.02.2012 18:00, Nathaniel Smith wrote:
> [clip]
> > I agree, but the behavior is still surprising -- people reasonably
> > expect something like svd to be deterministic. So there's probably a
> > doc bug for alerting people that their reasonable expectation is, in
> > fact, wrong :-).
>
> The problem here is that these warnings should in principle appear in
> the documentation of every numerical algorithm that contains branches
> chosen on the basis of floating point data. For example, optimization
> algorithms --- they terminate after a tolerance is satisfied, and so the
> results can contain similar quasi-random error much larger than the
> rounding error, tol > |err| >> eps.
>
> Floating point sucks, it's full of gotchas for all ages :(
>
>
:). I believe that was one of the reasons that Von Neumann thought everyone
should work with integers and scaling factors. But he lost that battle to
the sheer convenience of floating point.

Chuck


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Chris Barker
On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett wrote:
> Personally, I would say that making the founder of a company, which is
> working to make money from Numpy, the only decision maker on numpy -
> is - scary.

not to me:
 -- power always goes to those that actually write the code
 -- as far as I can recall, there has never been a large group of
folks contributing to the core code

so having a company be the primary contributor and decision maker is
no different than what we've always had -- particularly when Travis
pretty much single-handedly re-factored Numeric to give us numpy.

Oh -- one difference -- having a company with more than one coder
means more code!

If Continuum does indeed develop a bloated, ugly mess to meet their
client's needs -- none of us have to use it.

-Chris





> But maybe it's the best way.   But, again, we're all high-functioning
> sensible people, I'm sure it's possible for us to formulate what the
> risks are, what the potential solutions are, and come up with the best
> - maybe short-term - solution,
>
> See you,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 9:56 AM, Nathaniel Smith  wrote:

> On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn
>  wrote:
> > If non-contributing users came along on the Cython list demanding that
> > we set up a system to select non-developers along on a board that would
> > have discussions in order to veto pull requests, I don't know whether
> > we'd ignore it or ridicule it or try to show some patience, but we
> > certainly wouldn't take it seriously.
>
> I'm not really worried about the Continuum having some nefarious
> "corporate" intent. But I am worried about how these plans will affect
> numpy, and I think there serious risks if we don't think about
> process. Money has a dramatic effect on FOSS development, and not
> always in a positive way, even when -- or *especially* when --
> everyone has the best of intentions. I'm actually *more* worried about
> altruistic full-time developers doing work on behalf of the community
> than I am about developers who are working strictly in some company's
> interests.
>
> Finding a good design for software is like a nasty optimization
> problem -- it's easy to get stuck in local maxima, and any one person
> has only an imperfect, noisy estimate of the objective function. So
> you need lots of eyes to catch mistakes, filter out the noise, and
> explore multiple maxima in parallel.
>
> The classic FOSS model of volunteer developers who are in charge of
> project direction does a *great* job of solving this problem. (Linux
> beat all the classic Unixen on technical quality, and it did it using
> college students and volunteers -- it's not like Sun, IBM, HP etc.
> couldn't afford better engineers! But they still lost.) Volunteers are
> intimately familiar with the itch they're trying to scratch and the
> trade-offs involved in doing so, and they need to work together to
> produce anything major, so you get lots of different, high-quality
> perspectives to help you figure out which approach is best.
>
>
Linux is probably a bad choice as example here. Right up to about 2002
Linus was pretty much the only entry point into mainline as he applied all
the patches by hand and reviewed all of them. This of course slowed Linux
development considerably. I also had the opportunity to fix up some of the
drivers for my own machine and can testify that the code quality of the
patches was mixed. Now, of course, with 1 or more patches going in
during the open period of each development cycle, Linus relies on
lieutenants to handle the subsystems, but he can be damn scathing when he
takes an interest in some code and doesn't like what he sees. And he *can*
be scathing, not just because he started the whole thing, but because he is
darn good and the other developers respect that. But my point here is that
Linus pretty much shapes Linux.


> Developers who are working for some corporate interest alter this
> balance, because in a "do-ocracy", someone who can throw a few
> full-time developers at something suddenly is suddenly has effectively
> complete control over project direction. There's no moral problem here
> when the "dictator" is benevolent, but suddenly you have an
> informational bottleneck -- even benevolent dictators make mistakes,
> and they certainly aren't omniscient. Even this isn't *so* bad though,
> so long as the corporation is scratching their own itch -- at least
> you can be pretty sure that whatever they produce will at least make
> them happy, which implies a certain level of utility.
>
>
Linus deals with this by saying, fork, fork, fork. Of course the gpl makes
that a more viable response.


> The riskiest case is paying developers to scratch someone else's itch.
> IIUC, that's a major goal of Travis's here, to find a way to pay
> developers to make numpy better for everyone. But, now you need some
> way for the community to figure out what "better" means, because the
> developers themselves don't necessarily know. It's not their itch
> anymore. Running a poll or whatever might be a nice start, but we all
> know how tough it is to extract useful design information from users.
> You need a lot more than that if you want to keep the quality up.
>
> Travis's proposal is that we go from a large number of self-selecting
> people putting in little bits of time to a small number of designated
> people putting in lots of time. There's a major win in terms of total
> effort, but you inevitably lose a lot of diversity of viewpoints. My
> feeling is it will only be a net win if the new employees put serious,
> bend-over-backwards effort into taking advantage of the volunteer
> community's wisdom.
>
> This is why the NA discussion seems so relevant to me here -- everyone
> involved absolutely had good intentions, excellent skills, etc., and
> yet the outcome is still a huge unresolved mess. It was supposed to
> make numpy more attractive for a certain set of applications, like
> statistical analysis, where R is currently preferred. Instead, there
> have been massive

Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Gael Varoquaux
On Thu, Feb 16, 2012 at 05:00:29PM +, Nathaniel Smith wrote:
> I agree, but the behavior is still surprising -- people reasonably
> expect something like svd to be deterministic.

People are wrong then. Trust me, I work enough with ill-conditioned
problems, including SVDs, to know that the algorithms are not
deterministic. You can improve them by controlling the random starting
point, but in many cases it is not enough. Decreasing the tolerance on the
algorithm may help (I don't know if we can control that with the lapack
interface), but at the cost of a lot of computing time.
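
A small illustration of why ill-conditioning matters here (a sketch only, not
what LAPACK's SVD does internally): the 2x2 system below is nearly singular,
so nudging one right-hand-side entry by 2**-30 moves the solution from (1, 1)
to (0, 2). The matrix, the helper name `solve_2x2`, and the perturbation size
are assumptions, chosen so that the arithmetic is exact in binary floating
point.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return x1, x2


d = 2.0 ** -30                      # tiny, exactly representable offset
a = [[1.0, 1.0], [1.0, 1.0 + d]]    # nearly singular: det(a) == d

x = solve_2x2(a, [2.0, 2.0 + d])            # reference right-hand side
x_pert = solve_2x2(a, [2.0, 2.0 + 2 * d])   # one entry nudged by d

print(x)       # (1.0, 1.0)
print(x_pert)  # (0.0, 2.0)
```

A rounding-level difference in an intermediate result, amplified this way, is
one plausible route to visibly different outputs from nominally identical
inputs.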

G


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 12:53 PM, Charles R Harris
 wrote:
>
>
> On Thu, Feb 16, 2012 at 9:56 AM, Nathaniel Smith  wrote:
>>
>> On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn
>>  wrote:
>> > If non-contributing users came along on the Cython list demanding that
>> > we set up a system to select non-developers along on a board that would
>> > have discussions in order to veto pull requests, I don't know whether
>> > we'd ignore it or ridicule it or try to show some patience, but we
>> > certainly wouldn't take it seriously.
>>
>> I'm not really worried about the Continuum having some nefarious
>> "corporate" intent. But I am worried about how these plans will affect
>> numpy, and I think there are serious risks if we don't think about
>> process. Money has a dramatic effect on FOSS development, and not
>> always in a positive way, even when -- or *especially* when --
>> everyone has the best of intentions. I'm actually *more* worried about
>> altruistic full-time developers doing work on behalf of the community
>> than I am about developers who are working strictly in some company's
>> interests.
>>
>> Finding a good design for software is like a nasty optimization
>> problem -- it's easy to get stuck in local maxima, and any one person
>> has only an imperfect, noisy estimate of the objective function. So
>> you need lots of eyes to catch mistakes, filter out the noise, and
>> explore multiple maxima in parallel.
>>
>> The classic FOSS model of volunteer developers who are in charge of
>> project direction does a *great* job of solving this problem. (Linux
>> beat all the classic Unixen on technical quality, and it did it using
>> college students and volunteers -- it's not like Sun, IBM, HP etc.
>> couldn't afford better engineers! But they still lost.) Volunteers are
>> intimately familiar with the itch they're trying to scratch and the
>> trade-offs involved in doing so, and they need to work together to
>> produce anything major, so you get lots of different, high-quality
>> perspectives to help you figure out which approach is best.
>>
>
> Linux is probably a bad choice as example here. Right up to about 2002 Linus
> was pretty much the only entry point into mainline as he applied all the
> patches by hand and reviewed all of them. This of course slowed Linux
> development considerably. I also had the opportunity to fix up some of the
> drivers for my own machine and can testify that the code quality of the
> patches was mixed. Now, of course, with 1 or more patches going in
> during the open period of each development cycle, Linus relies on
> lieutenants to handle the subsystems, but he can be damn scathing when he
> takes an interest in some code and doesn't like what he sees. And he *can*
> be scathing, not just because he started the whole thing, but because he is
> darn good and the other developers respect that. But my point here is that
> Linus pretty much shapes Linux.
>
>
>> Developers who are working for some corporate interest alter this
>> balance, because in a "do-ocracy", someone who can throw a few
>> full-time developers at something suddenly has effectively
>> complete control over project direction. There's no moral problem here
>> when the "dictator" is benevolent, but suddenly you have an
>> informational bottleneck -- even benevolent dictators make mistakes,
>> and they certainly aren't omniscient. Even this isn't *so* bad though,
>> so long as the corporation is scratching their own itch -- at least
>> you can be pretty sure that whatever they produce will at least make
>> them happy, which implies a certain level of utility.
>>
>
> Linus deals with this by saying, fork, fork, fork. Of course the gpl makes
> that a more viable response.
>
>>
>> The riskiest case is paying developers to scratch someone else's itch.
>> IIUC, that's a major goal of Travis's here, to find a way to pay
>> developers to make numpy better for everyone. But, now you need some
>> way for the community to figure out what "better" means, because the
>> developers themselves don't necessarily know. It's not their itch
>> anymore. Running a poll or whatever might be a nice start, but we all
>> know how tough it is to extract useful design information from users.
>> You need a lot more than that if you want to keep the quality up.
>>
>> Travis's proposal is that we go from a large number of self-selecting
>> people putting in little bits of time to a small number of designated
>> people putting in lots of time. There's a major win in terms of total
>> effort, but you inevitably lose a lot of diversity of viewpoints. My
>> feeling is it will only be a net win if the new employees put serious,
>> bend-over-backwards effort into taking advantage of the volunteer
>> community's wisdom.
>>
>> This is why the NA discussion seems so relevant to me here -- everyone
>> involved absolutely had good intentions, excellent skills, etc., and
>> yet the outcome is still a huge unresolved mess. It was supp

Re: [Numpy-discussion] Migrating issues to GitHub

2012-02-16 Thread Ralf Gommers
On Wed, Feb 15, 2012 at 12:11 PM, Thouis (Ray) Jones wrote:

> On Sat, Feb 11, 2012 at 21:54, Fernando Perez 
> wrote:
> > On Sat, Feb 11, 2012 at 12:36 PM, Pauli Virtanen  wrote:
> >> The lack of attachments is the main problem with this transition. It's
> >> not so seldom that numerical input data or scripts demonstrating an
> >> issue come useful. This is probably less of an issue for Numpy than for
> >> Scipy, though.
> >
> > We've taken to using gist for scripts/data and free image hosting
> > sites for screenshots, using
> >
> > http://cellprofiler.org/issues/
> example issue: https://github.com/CellProfiler/CellProfiler/issues/260
>
> It was pretty simple to put together (a few hours by one developer).
> The need for a github account keeps it from being spammed.  If that's
> too much of a bar, I expect it could have a branch-cut in behavior:
> with a github account, the report goes straight into the Github
> issues, otherwise, it gets queued for review and mail sent to some
> (hopefully low-traffic) list.
>
> Ray Jones


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 11:09 AM,  wrote:

> On Thu, Feb 16, 2012 at 12:53 PM, Charles R Harris
>  wrote:
> >
> >
> > On Thu, Feb 16, 2012 at 9:56 AM, Nathaniel Smith  wrote:
> >>
> >> On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn
> >>  wrote:
> >> > If non-contributing users came along on the Cython list demanding that
> >> > we set up a system to select non-developers along on a board that would
> >> > have discussions in order to veto pull requests, I don't know whether
> >> > we'd ignore it or ridicule it or try to show some patience, but we
> >> > certainly wouldn't take it seriously.
> >>
> >> I'm not really worried about the Continuum having some nefarious
> >> "corporate" intent. But I am worried about how these plans will affect
> >> numpy, and I think there are serious risks if we don't think about
> >> process. Money has a dramatic effect on FOSS development, and not
> >> always in a positive way, even when -- or *especially* when --
> >> everyone has the best of intentions. I'm actually *more* worried about
> >> altruistic full-time developers doing work on behalf of the community
> >> than I am about developers who are working strictly in some company's
> >> interests.
> >>
> >> Finding a good design for software is like a nasty optimization
> >> problem -- it's easy to get stuck in local maxima, and any one person
> >> has only an imperfect, noisy estimate of the objective function. So
> >> you need lots of eyes to catch mistakes, filter out the noise, and
> >> explore multiple maxima in parallel.
> >>
> >> The classic FOSS model of volunteer developers who are in charge of
> >> project direction does a *great* job of solving this problem. (Linux
> >> beat all the classic Unixen on technical quality, and it did it using
> >> college students and volunteers -- it's not like Sun, IBM, HP etc.
> >> couldn't afford better engineers! But they still lost.) Volunteers are
> >> intimately familiar with the itch they're trying to scratch and the
> >> trade-offs involved in doing so, and they need to work together to
> >> produce anything major, so you get lots of different, high-quality
> >> perspectives to help you figure out which approach is best.
> >>
> >
> > Linux is probably a bad choice as example here. Right up to about 2002 Linus
> > was pretty much the only entry point into mainline as he applied all the
> > patches by hand and reviewed all of them. This of course slowed Linux
> > development considerably. I also had the opportunity to fix up some of the
> > drivers for my own machine and can testify that the code quality of the
> > patches was mixed. Now, of course, with 1 or more patches going in
> > during the open period of each development cycle, Linus relies on
> > lieutenants to handle the subsystems, but he can be damn scathing when he
> > takes an interest in some code and doesn't like what he sees. And he *can*
> > be scathing, not just because he started the whole thing, but because he is
> > darn good and the other developers respect that. But my point here is that
> > Linus pretty much shapes Linux.
> >
> >
> >> Developers who are working for some corporate interest alter this
> >> balance, because in a "do-ocracy", someone who can throw a few
> >> full-time developers at something suddenly has effectively
> >> complete control over project direction. There's no moral problem here
> >> when the "dictator" is benevolent, but suddenly you have an
> >> informational bottleneck -- even benevolent dictators make mistakes,
> >> and they certainly aren't omniscient. Even this isn't *so* bad though,
> >> so long as the corporation is scratching their own itch -- at least
> >> you can be pretty sure that whatever they produce will at least make
> >> them happy, which implies a certain level of utility.
> >>
> >
> > Linus deals with this by saying, fork, fork, fork. Of course the gpl makes
> > that a more viable response.
> >
> >>
> >> The riskiest case is paying developers to scratch someone else's itch.
> >> IIUC, that's a major goal of Travis's here, to find a way to pay
> >> developers to make numpy better for everyone. But, now you need some
> >> way for the community to figure out what "better" means, because the
> >> developers themselves don't necessarily know. It's not their itch
> >> anymore. Running a poll or whatever might be a nice start, but we all
> >> know how tough it is to extract useful design information from users.
> >> You need a lot more than that if you want to keep the quality up.
> >>
> >> Travis's proposal is that we go from a large number of self-selecting
> >> people putting in little bits of time to a small number of designated
> >> people putting in lots of time. There's a major win in terms of total
> >> effort, but you inevitably lose a lot of diversity of viewpoints. My
> >> feeling is it will only be a net win if the new employees put serious,
> >> bend-over-backwards effort into taking advantage of the volunteer
> >> c

Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Matthew Brett
Hi,

On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted  wrote:
> On Feb 16, 2012, at 12:15 PM, Jason Grout wrote:
>
>> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
>>> But in the very end, when agreement can't
>>> be reached by other means, the developers are the one making the calls.
>>> (This is simply a consequence that they are the only ones who can
>>> credibly threaten to fork the project.)
>>
>> Interesting point.  I hope I'm not pitching a log onto the fire here,
>> but in numpy's case, there are very many capable developers on other
>> projects who depend on numpy who could credibly threaten a fork if they
>> felt numpy was drastically going wrong.
>
> Jason, that there are capable developers out there that are able to fork NumPy
> (or any other project you can think of) is a given.  The point Dag was
> signaling is that this threat is more likely to arise *inside* the
> community.
>
> And you pointed out an important aspect too by saying "if they felt numpy was
> drastically going wrong".  It gives me the impression that some people are
> very frightened that something really bad will happen, well before it
> happens.  While I agree that this is *possible*, I'd also advocate giving
> Travis the benefit of the doubt.  I'm convinced he (and Continuum as a whole) is
> making things happen that will benefit the entire NumPy community; but in
> case something goes really and catastrophically wrong, it is always a relief to
> know that things can be reverted in the pure open source tradition (by either
> doing a fork, creating a new foundation, or even better, proposing a new way
> to do things).  What does not sound reasonable to me is to allow fear to
> block Continuum's efforts to make a better NumPy.  I think it is better to
> relax a bit, see how things are going, and then judge by looking at the
> *results*.

I'm finding this conversation a bit frustrating.

The question on the table as I understand it, is just the following:

Is there any governance structure / procedure / set of guidelines that
would help ensure the long-term health of the numpy project?

The subtext of your response is that you regard *any structure at all*
as damaging to the numpy effort and in particular, as damaging to the
efforts of Continuum.  It seems to me that is a very extreme point of
view, and I think, honestly, it is not tenable.

But surely - surely - the best thing to do here is to formulate
something that might be acceptable, and for everyone to say what they
think the problems would be.  Do you agree?

Best,

Matthew


[Numpy-discussion] test errors on deprecation/runtime warnings

2012-02-16 Thread Ralf Gommers
Hi,

Last week we merged https://github.com/numpy/numpy/pull/201, which causes
DeprecationWarnings and RuntimeWarnings to be converted to errors if they
occur when running the test suite. The purpose of that is to make sure that
code that still uses other deprecated code (or code that for some reason
generates warnings) gets cleaned up. In principle this is a good idea
IMHO, but after merging we quickly found some problems with failing tests
in scipy.

Because this potentially affects any other projects or users that use the
numpy NoseTester test runner, here's a proposal to deal with this issue:
- make this behavior configurable by a keyword in `NoseTester.__init__()`
- default to raising an error in numpy master
- when making a branch for release, immediately set the default to not
raise. Do this not only for the 1.7 release, but for any future release (at
least until the oldest numpy version still in use has the keyword). The
reason for this is that otherwise a new numpy release will trigger test
failures in older releases of scipy or other packages.
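
The proposal above can be sketched with the standard `warnings` module alone.
This is only an illustration under stated assumptions, not the NoseTester
code: the helper `run_with_warning_policy` and its `raise_warnings` keyword
are hypothetical names standing in for whatever the keyword in
`NoseTester.__init__()` ends up being called.

```python
import warnings


def run_with_warning_policy(test_func, raise_warnings=True):
    """Run test_func under a configurable warning policy (hypothetical helper).

    raise_warnings=True  : DeprecationWarning/RuntimeWarning become errors
                           (the proposed default on master).
    raise_warnings=False : they are only reported
                           (the proposed default on release branches).
    """
    action = "error" if raise_warnings else "default"
    with warnings.catch_warnings():  # restore the global filters afterwards
        warnings.simplefilter(action, DeprecationWarning)
        warnings.simplefilter(action, RuntimeWarning)
        try:
            test_func()
        except (DeprecationWarning, RuntimeWarning) as exc:
            return "ERROR: %s" % exc
    return "ok"


def uses_deprecated_api():
    # Stand-in for a test that touches deprecated code.
    warnings.warn("old_function is deprecated", DeprecationWarning)


print(run_with_warning_policy(uses_deprecated_api, raise_warnings=True))
print(run_with_warning_policy(uses_deprecated_api, raise_warnings=False))
```

With the flag on, the same test that passes on a release branch is reported
as an error on master, which is the trade-off listed under the pros and con
below.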

The pros:
- We find issues that otherwise get ignored (see the issues Christoph just
found when compiling with MSVC).
- We're forced to clean up code when submitting PRs, instead of letting the
warnings accumulate and having to deal with them just before release time.

The con:
- You may see test errors if you run a released version of scipy with a
development version of numpy.

Opinions? Concerns?

Ralf


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Ralf Gommers
On Thu, Feb 16, 2012 at 8:03 PM, Matthew Brett wrote:

> Hi,
>
> On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted 
> wrote:
> > On Feb 16, 2012, at 12:15 PM, Jason Grout wrote:
> >
> >> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
> >>> But in the very end, when agreement can't
> >>> be reached by other means, the developers are the one making the calls.
> >>> (This is simply a consequence that they are the only ones who can
> >>> credibly threaten to fork the project.)
> >>
> >> Interesting point.  I hope I'm not pitching a log onto the fire here,
> >> but in numpy's case, there are very many capable developers on other
> >> projects who depend on numpy who could credibly threaten a fork if they
> >> felt numpy was drastically going wrong.
> >
> > Jason, that there are capable developers out there that are able to fork
> > NumPy (or any other project you can think of) is a given.  The point Dag
> > was signaling is that this threat is more likely to arise *inside* the
> > community.
> >
> > And you pointed out an important aspect too by saying "if they felt
> > numpy was drastically going wrong".  It gives me the impression that some
> > people are very frightened that something really bad will happen, well
> > before it happens.  While I agree that this is *possible*, I'd also
> > advocate giving Travis the benefit of the doubt.  I'm convinced he (and
> > Continuum as a whole) is making things happen that will benefit the entire
> > NumPy community; but in case something goes really and catastrophically
> > wrong, it is always a relief to know that things can be reverted in the
> > pure open source tradition (by either doing a fork, creating a new
> > foundation, or even better, proposing a new way to do things).  What does
> > not sound reasonable to me is to allow fear to block Continuum's efforts
> > to make a better NumPy.  I think it is better to relax a bit, see how
> > things are going, and then judge by looking at the *results*.
>
> I'm finding this conversation a bit frustrating.
>
> The question on the table as I understand it, is just the following:
>
> Is there any governance structure / procedure / set of guidelines that
> would help ensure the long-term health of the numpy project?
>
> The subtext of your response is that you regard *any structure at all*
> as damaging to the numpy effort and in particular, as damaging to the
> efforts of Continuum.  It seems to me that is a very extreme point of
> view, and I think, honestly, it is not tenable.
>

That's not exactly how I'd interpret Peter's answer.

>
> But surely - surely - the best thing to do here is to formulate
> something that might be acceptable, and for everyone to say what they
> think the problems would be.  Do you agree?
>
>
David has made a concrete proposal for a procedure. It looks to me like
that's an appropriate and adequate safeguard against Continuum pushing
things into Numpy. Would that be enough for you? If not, would it at least
be a good start?

Ralf


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Chris Barker
On Thu, Feb 16, 2012 at 11:03 AM, Matthew Brett  wrote:
> But surely - surely - the best thing to do here is to formulate
> something that might be acceptable, and for everyone to say what they
> think the problems would be.  Do you agree?

Absolutely -- but just like anything else in open source -- nothing
gets done because people think it should get done -- it gets done
because someone sits down and does it.

having a governance structure is not my itch -- I'm not going to
scratch it -- is someone itchy enough to scratch? if so, do so --
while we all continue to talk about what color the bicycle shed behind
the foundation offices should be ;-)

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Nathaniel Smith
On Wed, Feb 15, 2012 at 7:46 PM, Benjamin Root  wrote:
> Why not the NA discussion?  Would we really want to have that happen again?
> Note that it still isn't fully resolved and progress still needs to be made
> (I think the last thread did an excellent job of fleshing out the ideas, but
> it became too much to digest.  We may need to have someone go through the
> information, reduce it down and make one last push to bring it to a
> conclusion).

BTW, this is still on my todo list -- sorry for dropping the ball
here. Perhaps once I find a flat here in Edinburgh.

> The NA discussion is the perfect example where a governance
> structure would help resolve disputes.

I think the important question is, in an ideal world, what would have
been done to help resolve this dispute? My best idea was to try and
organize a document articulating points of consensus -- I'm not sure
what sort of governance structure would have helped with that. A
committee with an odd number of members is good at voting on things,
but would a vote have helped? I dunno, I'm not saying it wouldn't --
just that it's something we might want to think about before we start
writing bylaws.

-- Nathaniel


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Christopher Jordan-Squire
On Thu, Feb 16, 2012 at 11:03 AM, Matthew Brett  wrote:
> Hi,
>
> On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted  wrote:
>> On Feb 16, 2012, at 12:15 PM, Jason Grout wrote:
>>
>>> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
>>>> But in the very end, when agreement can't
>>>> be reached by other means, the developers are the one making the calls.
>>>> (This is simply a consequence that they are the only ones who can
>>>> credibly threaten to fork the project.)
>>>
>>> Interesting point.  I hope I'm not pitching a log onto the fire here,
>>> but in numpy's case, there are very many capable developers on other
>>> projects who depend on numpy who could credibly threaten a fork if they
>>> felt numpy was drastically going wrong.
>>
>> Jason, that there are capable developers out there that are able to fork NumPy
>> (or any other project you can think of) is a given.  The point Dag was
>> signaling is that this threat is more likely to arise *inside* the
>> community.
>>
>> And you pointed out an important aspect too by saying "if they felt numpy
>> was drastically going wrong".  It gives me the impression that some people
>> are very frightened that something really bad will happen, well before it
>> happens.  While I agree that this is *possible*, I'd also advocate giving
>> Travis the benefit of the doubt.  I'm convinced he (and Continuum as a whole) is
>> making things happen that will benefit the entire NumPy community; but in
>> case something goes really and catastrophically wrong, it is always a relief to
>> know that things can be reverted in the pure open source tradition (by
>> either doing a fork, creating a new foundation, or even better, proposing a
>> new way to do things).  What does not sound reasonable to me is to allow
>> fear to block Continuum's efforts to make a better NumPy.  I think it is
>> better to relax a bit, see how things are going, and then judge by looking
>> at the *results*.
>
> I'm finding this conversation a bit frustrating.
>
> The question on the table as I understand it, is just the following:
>
> Is there any governance structure / procedure / set of guidelines that
> would help ensure the long-term health of the numpy project?
>
> The subtext of your response is that you regard *any structure at all*
> as damaging to the numpy effort and in particular, as damaging to the
> efforts of Continuum.  It seems to me that is a very extreme point of
> view, and I think, honestly, it is not tenable.
>
> But surely - surely - the best thing to do here is to formulate
> something that might be acceptable, and for everyone to say what they
> think the problems would be.  Do you agree?
>

Perhaps I'm mistaken, but I think the subtext has more been that
worrying about potential problems--which aren't yet actual
problems--isn't terribly productive. Particularly when the people
involved are smart, invested in the success of the broader numpy
package, and very deserving of the benefit of the doubt.

Also, as Ralf said, David made a concrete proposal. What are your
comments on his proposal?

-Chris JS


> Best,
>
> Matthew


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Nathaniel Smith
On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught  wrote:
> On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
>
>> Travis's proposal is that we go from a large number of self-selecting
>> people putting in little bits of time to a small number of designated
>> people putting in lots of time.
>
>
> That's not what Travis, or anyone else, proposed.

Maybe I was unclear -- all I mean here is that if we suddenly have a
few people working full-time on numpy (as Travis proposed), then that
will cause two things:
  -- a massive increase in the total number of person-hours going into numpy
  -- a smaller group of people will be responsible for a much larger
proportion of those person-hours
(and this is leaving aside the other ways that it can be difficult for
full-time developers and volunteers to interact -- the volunteers
aren't in the office, the full-timers may not have the patience to
wait for a long email-paced conversation before making a decision,
etc.)

I think Travis' proposal is potentially a great thing, but it's not as
simple as just saying "hey we hired some people now our software will
be better". Ask Fred Brooks ;-)

-- Nathaniel


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Benjamin Root
On Thu, Feb 16, 2012 at 2:13 PM, Nathaniel Smith  wrote:

> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught  wrote:
> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
> >
> >> Travis's proposal is that we go from a large number of self-selecting
> >> people putting in little bits of time to a small number of designated
> >> people putting in lots of time.
> >
> >
> > That's not what Travis, or anyone else, proposed.
>
> Maybe I was unclear -- all I mean here is that if we suddenly have a
> few people working full-time on numpy (as Travis proposed), then that
> will cause two things:
>  -- a massive increase in the total number of person-hours going into numpy
>  -- a smaller group of people will be responsible for a much larger
> proportion of those person-hours
> (and this is leaving aside the other ways that it can be difficult for
> full-time developers and volunteers to interact -- the volunteers
> aren't in the office, the full-timers may not have the patience to
> wait for a long email-paced conversation before making a decision,
> etc.)
>
> I think Travis' proposal is potentially a great thing, but it's not as
> simple as just saying "hey we hired some people now our software will
> be better". Ask Fred Brooks ;-)
>
> -- Nathaniel
>

Just a thought I had.

From the perspective of any company, they do not want to devote developer
resources to an open-source project if the features are going to get
rejected (either by the core-devs or by community backlash).  Maybe the
governance structure could be more along the lines of an advise/consent
process for NEPs.  This way, a company puts together a plan of action for
some features and submits it to the central body (however that is
defined).  Comments and revisions are done.  Finally, if the plan is
approved, the company can feel confident that their efforts and resources
won't get rejected after committing to the changes.

Small changes and bugfixes are not affected by this.  Large changes need
planning and commentary anyway.  This allows some official representation
of the community to have some sort of light-handed control over the vision
of the project.

Ben Root


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith  wrote:

> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught  wrote:
> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
> >
> >> Travis's proposal is that we go from a large number of self-selecting
> >> people putting in little bits of time to a small number of designated
> >> people putting in lots of time.
> >
> >
> > That's not what Travis, or anyone else, proposed.
>
> Maybe I was unclear -- all I mean here is that if we suddenly have a
> few people working full-time on numpy (as Travis proposed), then that
> will cause two things:
>  -- a massive increase in the total number of person-hours going into numpy
>  -- a smaller group of people will be responsible for a much larger
> proportion of those person-hours
> (and this is leaving aside the other ways that it can be difficult for
> full-time developers and volunteers to interact -- the volunteers
> aren't in the office, the full-timers may not have the patience to
> wait for a long email-paced conversation before making a decision,
> etc.)
>
> I think Travis' proposal is potentially a great thing, but it's not as
> simple as just saying "hey we hired some people now our software will
> be better". Ask Fred Brooks ;-)
>
>
What, you are invoking Fred Brooks for a team of, maybe, four? Numpy ain't
OS/360.

Chuck


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread Nathaniel Smith
On Thu, Feb 16, 2012 at 5:20 PM, Pauli Virtanen  wrote:
> Hi,
>
> 16.02.2012 18:00, Nathaniel Smith kirjoitti:
> [clip]
>> I agree, but the behavior is still surprising -- people reasonably
>> expect something like svd to be deterministic. So there's probably a
>> doc bug for alerting people that their reasonable expectation is, in
>> fact, wrong :-).
>
> The problem here is that these warnings should in principle appear in
> the documentation of every numerical algorithm that contains branches
> chosen on the basis of floating point data. For example, optimization
> algorithms --- they terminate after a tolerance is satisfied, and so the
> results can contain similar quasi-random error much larger than the
> rounding error, tol > |err| >> eps.
>
> Floating point sucks, it's full of gotchas for all ages :(

Yes, and maybe I'm just projecting my own particular naivete... I'm
very familiar with numerical stability and rounding as issues, and of
course optimization-based algorithms have the issue you raise. I'm
still surprised to learn that on a single machine, with bit-identical
inputs, using a mature low-level routine like svd, you can get
*qualitatively* different results depending on memory alignment. (I
wouldn't expect dense SVD to use a fixed tolerance optimization
routine either!)

-- Nathaniel
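
As a small illustration of the two effects discussed in this message (a
sketch in plain Python, not taken from the thread and not how LAPACK's svd
works): floating-point addition is not associative, so anything that changes
the order of operations, such as an alignment-dependent code path, can change
the low bits of a result; and an iteration that stops at a tolerance can
return answers that differ by far more than machine epsilon. The helper name
`bisect_root`, the test function, and the tolerance are assumptions chosen
for the example.

```python
import sys

# Floating-point addition is not associative: regrouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def bisect_root(f, lo, hi, tol):
    """Bisect until the bracket [lo, hi] is no wider than tol.

    Assumes f(lo) < 0 < f(hi). Returns the midpoint of the final bracket.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def f(x):
    # Root at sqrt(2).
    return x * x - 2.0


tol = 1e-6
r1 = bisect_root(f, 1.0, 2.0, tol)  # same problem, two starting brackets
r2 = bisect_root(f, 0.0, 3.0, tol)

# Both answers are within tol of sqrt(2), yet they differ from each other
# by much more than machine epsilon: tol > |err| >> eps.
print(abs(r1 - r2), sys.float_info.epsilon)
```

Both runs are "correct" to the requested tolerance; the difference between
them is the kind of quasi-random error Pauli describes.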


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Nathaniel Smith
On Thu, Feb 16, 2012 at 8:36 PM, Charles R Harris
 wrote:
>
>
> On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith  wrote:
>>
>> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught  wrote:
>> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
>> >
>> >> Travis's proposal is that we go from a large number of self-selecting
>> >> people putting in little bits of time to a small number of designated
>> >> people putting in lots of time.
>> >
>> >
>> > That's not what Travis, or anyone else, proposed.
>>
>> Maybe I was unclear -- all I mean here is that if we suddenly have a
>> few people working full-time on numpy (as Travis proposed), then that
>> will cause two things:
>>  -- a massive increase in the total number of person-hours going into
>> numpy
>>  -- a smaller group of people will be responsible for a much larger
>> proportion of those person-hours
>> (and this is leaving aside the other ways that it can be difficult for
>> full-time developers and volunteers to interact -- the volunteers
>> aren't in the office, the full-timers may not have the patience to
>> wait for a long email-paced conversation before making a decision,
>> etc.)
>>
>> I think Travis' proposal is potentially a great thing, but it's not as
>> simple as just saying "hey we hired some people now our software will
>> be better". Ask Fred Brooks ;-)
>>
>
> What, you are invoking Fred Brooks for a team of, maybe, four? Numpy ain't
> OS/360.

For the general idea that you can't just translate person-hours of
effort into results? Yes, though do note the winky emoticon, which is
used to indicate that a statement is somewhat tongue in cheek ;-).

Do you have any thoughts on the actual content of my concerns? Do you
agree that there's a risk that in Travis's plan, you'll be losing out
on valuable input from non-core-contributors who are nonetheless
experts in particular areas?

-- Nathaniel


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Matthew Brett
Hi,

On Thu, Feb 16, 2012 at 11:54 AM, Christopher Jordan-Squire
 wrote:
> On Thu, Feb 16, 2012 at 11:03 AM, Matthew Brett  
> wrote:
>> Hi,
>>
>> On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted  
>> wrote:
>>> On Feb 16, 2012, at 12:15 PM, Jason Grout wrote:
>>>
>>>> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote:
>>>>> But in the very end, when agreement can't
>>>>> be reached by other means, the developers are the one making the calls.
>>>>> (This is simply a consequence that they are the only ones who can
>>>>> credibly threaten to fork the project.)
>>>>
>>>> Interesting point.  I hope I'm not pitching a log onto the fire here,
>>>> but in numpy's case, there are very many capable developers on other
>>>> projects who depend on numpy who could credibly threaten a fork if they
>>>> felt numpy was drastically going wrong.
>>>
>>> Jason, that there are capable developers out there who are able to fork NumPy
>>> (or any other project you can think of) is a given.  The point Dag was
>>> making is that this threat is more likely to arise *inside* the
>>> community.
>>>
>>> And you pointed out an important aspect too by saying "if they felt numpy
>>> was drastically going wrong".  It gives me the impression that some people
>>> are very frightened that something really bad will happen, well before it
>>> happens.  While I agree that this is *possible*, I'd also advocate giving
>>> Travis the benefit of the doubt.  I'm convinced he (and Continuum as a whole)
>>> is making things happen that will benefit the entire NumPy community; but
>>> in case something goes really wrong and catastrophic, it is always a relief
>>> to know that things can be reverted in the pure open source tradition (by
>>> either doing a fork, creating a new foundation, or even better, proposing a
>>> new way to do things).  What does not sound reasonable to me is to allow
>>> fear to block Continuum's efforts to make a better NumPy.  I think it is
>>> better to relax a bit, see how things are going, and then judge by looking
>>> at the *results*.
>>
>> I'm finding this conversation a bit frustrating.
>>
>> The question on the table as I understand it, is just the following:
>>
>> Is there any governance structure / procedure / set of guidelines that
>> would help ensure the long-term health of the numpy project?
>>
>> The subtext of your response is that you regard *any structure at all*
>> as damaging to the numpy effort and in particular, as damaging to the
>> efforts of Continuum.  It seems to me that is a very extreme point of
>> view, and I think, honestly, it is not tenable.
>>
>> But surely - surely - the best thing to do here is to formulate
>> something that might be acceptable, and for everyone to say what they
>> think the problems would be.  Do you agree?
>>
>
> Perhaps I'm mistaken, but I think the subtext has more been that
> worrying about potential problems--which aren't yet actual
> problems--isn't terribly productive. Particularly when the people
> involved are smart, invested in the success of the broader numpy
> package, and very deserving of the benefit of the doubt.

OK - that is one point of view.  I'll state the most extreme version thus:

"There is no possible benefit to thinking of a governance structure
before problems arise".

That seems to me to be untenable.  As others have pointed out, the
kind of problems that might arise are fairly obvious, these kinds of
things have been thought about before, and designing a solution to
these problems after they have arisen may be considerably harder than
doing it before.

Here's another version of the argument:

"It is not possible to imagine a governance structure that would be
better than the current one".

That does seem to me to be extreme, and untenable until various
schemes have been seriously considered.

A more reasonable version of the same argument might be:

"The costs of working on a governance structure are greater than
the potential benefits".

Is that defensible?  I don't think so.  But if it is, what are the
costs, exactly?

> Also, as Ralf said, David made a concrete proposal. What are your
> comments on his proposal?

Right - and so did Alan Isaac some time ago, which I responded to, and
got no reply.

I think - as Benjamin has said previously, we first have to establish
that *any* governance structure is worth discussing.  Including the
current one.  I'm hearing a lot of "no" to that.

Do you think that any governance structure is worth discussing?

If so, which do you prefer?

For David's proposal - I think it is likely to be impractical because,
at the moment, almost all agreement is informal and sometimes
off-list.  Given that situation, it would take an enormous amount of
balls to reject a Continuum pull-request.   If Continuum started to
lose interest in that mechanism, if I was them, I'd make the pull
requests unmanageably large and therefore impossible to review.  It
would require one person to be the gatekeeper for the 

Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Christopher Jordan-Squire
On Thu, Feb 16, 2012 at 12:45 PM, Nathaniel Smith  wrote:
> On Thu, Feb 16, 2012 at 8:36 PM, Charles R Harris
>  wrote:
>>
>>
>> On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith  wrote:
>>>
>>> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught  wrote:
>>> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
>>> >
>>> >> Travis's proposal is that we go from a large number of self-selecting
>>> >> people putting in little bits of time to a small number of designated
>>> >> people putting in lots of time.
>>> >
>>> >
>>> > That's not what Travis, or anyone else, proposed.
>>>
>>> Maybe I was unclear -- all I mean here is that if we suddenly have a
>>> few people working full-time on numpy (as Travis proposed), then that
>>> will cause two things:
>>>  -- a massive increase in the total number of person-hours going into
>>> numpy
>>>  -- a smaller group of people will be responsible for a much larger
>>> proportion of those person-hours
>>> (and this is leaving aside the other ways that it can be difficult for
>>> full-time developers and volunteers to interact -- the volunteers
>>> aren't in the office, the full-timers may not have the patience to
>>> wait for a long email-paced conversation before making a decision,
>>> etc.)
>>>
>>> I think Travis' proposal is potentially a great thing, but it's not as
>>> simple as just saying "hey we hired some people now our software will
>>> be better". Ask Fred Brooks ;-)
>>>
>>
>> What, you are invoking Fred Brooks for a team of, maybe, four? Numpy ain't
>> OS/360.
>
> For the general idea that you can't just translate person-hours of
> effort into results? Yes, though do note the winky emoticon, which is
> used to indicate that a statement is somewhat tongue in cheek ;-).
>
> Do you have any thoughts on the actual content of my concerns? Do you
> agree that there's a risk that in Travis's plan, you'll be losing out
> on valuable input from non-core-contributors who are nonetheless
> experts in particular areas?
>

I'm not really sure how. All the developers involved are attentive
enough to make announcements of pull requests and requests for
comments on proposed changes. So if there's expert opinion to be had
easily, i.e. through the mailing list, then I can only imagine they'd
go out and get it.

This also jibes with Benjamin Root's comment. Major changes will be
discussed anyways. So I'm not sure how this particular objection is
relevant.

-Chris

> -- Nathaniel


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 1:45 PM, Nathaniel Smith  wrote:

> On Thu, Feb 16, 2012 at 8:36 PM, Charles R Harris
>  wrote:
> >
> >
> > On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith  wrote:
> >>
> >> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught 
> wrote:
> >> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
> >> >
> >> >> Travis's proposal is that we go from a large number of self-selecting
> >> >> people putting in little bits of time to a small number of designated
> >> >> people putting in lots of time.
> >> >
> >> >
> >> > That's not what Travis, or anyone else, proposed.
> >>
> >> Maybe I was unclear -- all I mean here is that if we suddenly have a
> >> few people working full-time on numpy (as Travis proposed), then that
> >> will cause two things:
> >>  -- a massive increase in the total number of person-hours going into
> >> numpy
> >>  -- a smaller group of people will be responsible for a much larger
> >> proportion of those person-hours
> >> (and this is leaving aside the other ways that it can be difficult for
> >> full-time developers and volunteers to interact -- the volunteers
> >> aren't in the office, the full-timers may not have the patience to
> >> wait for a long email-paced conversation before making a decision,
> >> etc.)
> >>
> >> I think Travis' proposal is potentially a great thing, but it's not as
> >> simple as just saying "hey we hired some people now our software will
> >> be better". Ask Fred Brooks ;-)
> >>
> >
> > What, you are invoking Fred Brooks for a team of, maybe, four? Numpy
> ain't
> > OS/360.
>
> For the general idea that you can't just translate person-hours of
> effort into results? Yes, though do note the winky emoticon, which is
> used to indicate that a statement is somewhat tongue in cheek ;-).
>
> Do you have any thoughts on the actual content of my concerns? Do you
> agree that there's a risk that in Travis's plan, you'll be losing out
> on valuable input from non-core-contributors who are nonetheless
> experts in particular areas?
>

I'd be more concerned if I saw more input from non-core-contributors. The
sticky issues I see are more along the lines of

1) Trademarking Numpy (TM), which probably needs doing, but who holds the
trademark?

2) Distribution of money, accounting, and maybe meeting minutes. If
donations are targeted to specific uses, that probably isn't a problem.
Advertising income could be an issue, though. I don't know how much
transparency is required by 501(c), probably not much judging by the
organizations that have that status.

I think Mark's proposal to revisit the issue if/when the number of core
contributors reaches maybe 5 is a good one. But attracting that many
developers long term requires making the code more attractive and
laying out a direction. I hope that the initial work along that line is
soon published on the list, the sooner the better IMHO.

It's not difficult to become a core developer at this point, apart from the
non-trivial task of understanding the code and wanting to scratch an itch,
since we are pretty desperate for developers. That is to say, the barriers
are technical, not social.

Chuck


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Travis Oliphant
This has been a clarifying discussion for some people.  I'm glad people are
speaking up.  I believe in the value of consensus and the value of users'
opinions.  I want to make sure that people who use NumPy and haven't yet
learned how to contribute feel like they have a voice.  I have always been
very open about adding people to the lists that I have influence over and
giving people permissions to contribute even when they disagree with me.  I
recognize the value of multiple points of view.

That is why in addition to creating the company (with a goal to allow at least
some people to spend their day-job working on NumPy), I've pushed to organize a
Foundation whose essential mission is to make sure that the core tools used for
Python in Science stay open, maintained, and available.  I will work very
hard to do all I can to make these ventures successful.  I had thought I
would be able to spend more time on NumPy and SciPy over the past 4 years.
This did not work out --- which is why I made a career change.  All I can
point to is my previous work and say thank you to all who have done so much for
the communities I have been able to participate in.

I believe in the power of community development, but I also believe in the
power of directed development towards solving people's problems in an open
market where people can choose to either interact with the provider or find
another supplier.  Having two organizations that I support helps me direct my
energies towards both of those values.  I resonate with Linus's individual
leanings.  I'm not a big fan of design-by-committee as I haven't seen it be
very successful in creating new technologies.  It is pretty good at enforcing
the status-quo.  If I felt like that is what NumPy needed I would be fine with
it.

However, I feel that NumPy is going to be surpassed with other solutions if
steps are not taken to improve the code-base *and* add new features.  I'm very
interested in discussions about how this work is to be accomplished.  I'm
with Mark in believing this discussion will be more useful in 6 months when
we have made it easier for more people to get involved with core code
development.

At the end of the day it is about people and what they spend their time doing.
Whatever I do, inside or outside the community, people are free to accept or
reject.  I can only promise to do my best.  It's all I ask of everyone I
work with.

It is gratifying to see that NumPy has become a well-used project and that
there are significant numbers of stake-holders who want to see the project
continue to succeed and be useful for them.  My goal with the Foundation,
with the Company and with my professional life is to see that the growth of
Python in Science, Technology, and Data Analysis continues and even
accelerates.

My view right now is similar to Mark's in that we don't have enough core
developers.  Charles and Ralf and Pauli and David before them have done an
amazing job at "pruning, cleaning, and maintaining what is there".
Obviously, Jim, Perry, Todd, Rick, Konrad, David A., and Paul Dubois have had
significant impact before them.  NumPy has always been a community project,
but it needs some energy put into it.  As someone who is intimately familiar
with the code-base (having worked on Numeric as early as 1998 as well as been
part of many discussions about Scientific Computing on Python), I'm trying to
infuse that energy as best I can.  NumPy has a chance to be far more than
it is.  There are people using inferior solutions because of missing features
in NumPy and the lack of awareness of how to use NumPy.  There are other
important use-cases that NumPy is an "almost-there" solution for.  As it
solves these problems, even more users will come to our community, and
there needs to be a way to hear their voice as well.

Just for the record, I don't believe the NA discussion has been finalized.  In
fact, the NA discussion this summer was one of the factors that led to my
decision to put myself back into NumPy development full time --- I just had to
figure out how to do it in a way that my family could accept.  I think I
could have contributed to that discussion as someone who understands both the
code and how it is and has been used.

For the next 6-12 months, I am comfortable taking the "benevolent dictator
role".  During that time, I hope we can find many more core developers and
then re-visit the discussion.  My view is that design decisions should be a
consensus based on current contributors to the code base and major users.  To
continue to be relevant, NumPy has to serve its customers.  They are the ones
who will have the final say.  If others feel like they can do better, a fork
is an option.  I don't want that to happen, but it is the only effective and
practical "governance" structure that exists in my mind outside of the
self

Re: [Numpy-discussion] Migrating issues to GitHub

2012-02-16 Thread Thouis (Ray) Jones
On Thu, Feb 16, 2012 at 19:25, Ralf Gommers  wrote:
> In another thread Jira was proposed as an alternative to Trac. Can you point
> out some of its strengths and weaknesses, and tell us why you decided to
> move away from it?

The two primary reasons were that our Jira server was behind a
firewall and we wanted to open it up, and the integration with github,
where we were moving our source.

My own impression is that Jira is much more complicated.  It was nice
that it was integrated with Fisheye and some reporting tools, but I
found them so complicated to deal with that I usually didn't go beyond
"show me my bugs", some bulk bug editing, and adding users to
projects.  As a group, we had difficulties keeping track of how we
were indicating priority and planned work, even with wiki pages to
tell us what we intended the different labels to mean.  Jira's
integration with other tools (Fisheye, Crucible) was useful in some
ways, but in no way critical.  There were all kinds of reports (LOC,
bug count, etc.) that one could get from these, but nothing that
couldn't be created with pylab and a free hour or two.

I like github's issues for their simplicity and the http-based API.
We miss having direct attachments, but we have a workaround.  It
would be nice if the github issues page were more customizable, but
with the API, a motivated group could create whatever frontend they
wanted.

Github's issues remind me of python, Jira reminded me of Java.  I
guess Jira would be more suited to a large development effort with
multiple groups of programmers, which we were not.  Moving bugs from
Jira to github wasn't too bad (we dropped most of the metadata, except
for our current/next/future label for which release fixes would go
into).  I think it would be easier to move from github to Jira,
primarily because github has fewer possible bits of metadata on each
bug.

As I said, I avoided using Jira for anything really complicated, so
perhaps I just needed to spend more time with it.  My opinion should
probably not be given undue weight.
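
To make the migration described above a little more concrete, here is a
minimal sketch of mapping a Jira-style bug record onto the body of a GitHub
"create issue" request, dropping most metadata and keeping only a
current/next/future release label. The Jira field names (`summary`,
`description`, `release`, `priority`) and the owner/repo names are
hypothetical, chosen for illustration; the URL follows the GitHub REST
convention of `/repos/{owner}/{repo}/issues`. Sending the request (and
authenticating) is left out.

```python
import json

GITHUB_API = "https://api.github.com"


def issues_url(owner, repo):
    """Build the issues endpoint URL for a repository."""
    return "{0}/repos/{1}/{2}/issues".format(GITHUB_API, owner, repo)


def jira_to_github(bug):
    """Map a Jira-style record (hypothetical field names) to an issue payload.

    Only the title, the description, and the release label survive;
    everything else (priority, components, ...) is dropped.
    """
    payload = {"title": bug["summary"], "body": bug.get("description", "")}
    release = bug.get("release")
    if release in ("current", "next", "future"):
        payload["labels"] = [release]
    return payload


bug = {
    "summary": "Crash when loading empty file",
    "description": "Steps to reproduce: ...",
    "release": "next",
    "priority": "P2",  # dropped by the mapping
}

print(issues_url("example-owner", "example-repo"))
print(json.dumps(jira_to_github(bug), sort_keys=True))
```

Because the target has so few fields, the mapping is a lossy one-way
projection, which is consistent with the remark that going from github to
Jira would be the easier direction.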


Re: [Numpy-discussion] Migrating issues to GitHub

2012-02-16 Thread Ralf Gommers
On Thu, Feb 16, 2012 at 10:20 PM, Thouis (Ray) Jones wrote:

> On Thu, Feb 16, 2012 at 19:25, Ralf Gommers 
> wrote:
> > In another thread Jira was proposed as an alternative to Trac. Can you
> point
> > out some of its strengths and weaknesses, and tell us why you decided to
> > move away from it?
>
> 




> Jira reminded me of Java.
>

OK, you convinced me:)

Ralf


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Matthew Brett
Hi,

Just for my own sake, can I clarify what you are saying here?

On Thu, Feb 16, 2012 at 1:11 PM, Travis Oliphant  wrote:
>  I'm not a big fan of design-by-committee as I haven't seen it be very 
> successful in creating new technologies.   It is pretty good at enforcing the 
> status-quo.  If I felt like that is what NumPy needed I would be fine with it.

Was it your impression that what was being proposed was design by committee?

> However, I feel that NumPy is going to be surpassed with other solutions if 
> steps are not taken to improve the code-base *and* add new features.

As far as you are concerned, is there any controversy about that?

> For the next 6-12 months, I am comfortable taking the "benevolent dictator 
> role".   During that time, I hope we can find many more core developers and 
> then re-visit the discussion.  My view is that design decisions should be a 
> consensus based on current contributors to the code base and major users.   
> To continue to be relevant, NumPy has to serve its customers.   They are the 
> ones who will have the final say.   If others feel like they can do better, a 
> fork is an option.  I don't want that to happen, but it is the only effective 
> and practical "governance" structure that exists in my mind outside of the 
> self-governance of the people that participate.

To confirm, you are saying that you can imagine no improvement in the
current governance structure?

> No organizational structure can make up for the lack of great people putting 
> their hearts and efforts into a great cause.

But you agree that there might be an organizational structure that
would make this harder or easier?

Best,

Matthew


[Numpy-discussion] Proposed Roadmap Overview

2012-02-16 Thread Travis Oliphant
Mark Wiebe and I have been discussing off and on (as well as talking with 
Charles) a good way forward to balance two competing desires: 

* addition of new features that are needed in NumPy
* improving the code-base generally and moving towards a more 
maintainable NumPy

I know there are loud voices for just focusing on the second of these and
avoiding the first until we have finished that.  I recognize the need to
improve the code base, but I will also be pushing for improvements to the
feature-set and user experience in the process.

As a result, I am proposing a rough outline for releases over the next year: 

* NumPy 1.7 to come out as soon as the serious bugs can be eliminated.
Bryan, Francesc, Mark, and I are able to help triage some of those.

* NumPy 1.8 to come out in July which will have as many ABI-compatible
feature enhancements as we can add while improving test coverage and code
cleanup.  I will post to this list more details of what we plan to address
with it later.  Candidates for inclusion are:
    * resolving the NA/missing-data issues
    * finishing group-by
    * incorporating the start of label arrays
    * incorporating a meta-object
    * a few new dtypes (variable-length string, variable-length unicode and
      an enum type)
    * adding ufunc support for flexible dtypes and possibly structured
      arrays
    * allowing generalized ufuncs to work on more kinds of arrays besides
      just contiguous
    * improving the ability for NumPy to receive JIT-generated function
      pointers for ufuncs and other calculation opportunities
    * adding "filters" to Input and Output
    * simple computed fields for dtypes
    * accepting a Data-Type specification as a class or JSON file
    * work towards improving the dtype-addition mechanism
    * re-factoring of code so that it can compile with a C++ compiler and
      be minimally dependent on Python data-structures.

* NumPy 2.0 to come out in January of 2013.  Mark Wiebe and I will
post to this list a document that explains some of its proposed features and
enhancements.  I won't steal his thunder for some of the things he is working
on.
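
One of the items above, accepting a data-type specification from a JSON file,
can be sketched roughly as follows. The roadmap does not say what the JSON
would look like, so the layout used here (a top-level "fields" list of
name/format pairs) and the function name `dtype_fields_from_json` are purely
hypothetical. The function only produces a list of (name, format) tuples,
which is the form numpy.dtype() already accepts for structured dtypes; NumPy
itself is not imported so the sketch runs with the standard library alone.

```python
import json


def dtype_fields_from_json(text):
    """Parse a (hypothetical) JSON dtype description into (name, format) pairs."""
    spec = json.loads(text)
    fields = []
    for field in spec["fields"]:
        fields.append((str(field["name"]), str(field["format"])))
    return fields


example = """
{"fields": [{"name": "id",    "format": "i4"},
            {"name": "value", "format": "f8"}]}
"""

fields = dtype_fields_from_json(example)
print(fields)

# With NumPy installed, the result could be handed to the existing constructor:
#     import numpy as np
#     record_dtype = np.dtype(fields)
```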

If there are code issues people would like to see addressed, it would be a 
great time to speak up and/or propose something that you would like to see. 

In general NumPy 1.8 will have new features that need to be explored so
that NumPy 2.0 has enough code "experience" to be as useful as
possible.  I recognize that NumPy 1.8 has quite a few proposed features.
These have been building up and are the big reason I've committed so many
resources to NumPy.  The feature-list did not just come out of my head.  They
are the result of talking and interacting with many NumPy users and watching
the code get used (and not used) in the real world.  This will be a faster
pace of development.  But, all of this will be in the open.  If the NumPy
2.0 schedule is too aggressive, then we will have a NumPy 1.9 release in order
to allow features to come out.

Thanks,

-Travis



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-16 Thread Warren Weckesser
On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant wrote:

> Mark Wiebe and I have been discussing off and on (as well as talking with
> Charles) a good way forward to balance two competing desires:
>
>* addition of new features that are needed in NumPy
>* improving the code-base generally and moving towards a more
> maintainable NumPy
>
> I know there are loud voices for just focusing on the second of these and
> avoiding the first until we have finished that.  I recognize the need to
> improve the code base, but I will also be pushing for improvements to the
> feature-set and user experience in the process.
>
> As a result, I am proposing a rough outline for releases over the next
> year:
>
>* NumPy 1.7 to come out as soon as the serious bugs can be
> eliminated.  Bryan, Francesc, Mark, and I are able to help triage some of
> those.
>
>* NumPy 1.8 to come out in July which will have as many
> ABI-compatible feature enhancements as we can add while improving test
> coverage and code cleanup.   I will post to this list more details of what
> we plan to address with it later.  Candidates for inclusion are:
>* resolving the NA/missing-data issues
>* finishing group-by
>* incorporating the start of label arrays
>* incorporating a meta-object
>* a few new dtypes (variable-length string, variable-length unicode
> and an enum type)
>* adding ufunc support for flexible dtypes and possibly structured
> arrays
>* allowing generalized ufuncs to work on more kinds of arrays
> besides just contiguous
>* improving the ability for NumPy to receive JIT-generated function
> pointers for ufuncs and other calculation opportunities
>* adding "filters" to Input and Output
>* simple computed fields for dtypes
>* accepting a Data-Type specification as a class or JSON file
>* work towards improving the dtype-addition mechanism
>* re-factoring of code so that it can compile with a C++ compiler
> and be minimally dependent on Python data-structures.
>
>* NumPy 2.0 to come out in January of 2013.   Mark Wiebe and I will
> post to this list a document that explains some of its proposed features
> and enhancements.  I won't steal his thunder for some of the things he is
> working on.
>
> If there are code issues people would like to see addressed, it would be a
> great time to speak up and/or propose something that you would like to see.
>


The above list looks great.  Another request that comes up occasionally on
the mailing list is for the efficient computation of order statistics, the
simplest case being a combined min/max function.  Longish thread starts
here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/

Warren
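
For readers unfamiliar with the request: calling min and max separately
traverses the data twice, while a combined function reads each element once.
A minimal pure-Python sketch of the idea follows; the name `minmax` is an
assumption, not an existing NumPy function, and a real implementation would
live in C and would also need to decide how to treat NaN.

```python
def minmax(values):
    """Return (smallest, largest) of an iterable in a single pass."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("minmax() arg is an empty sequence")
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


print(minmax([3, 1, 4, 1, 5, 9, 2, 6]))  # (1, 9)
```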



> In general NumPy 1.8 will have new features that need to be explored in
> order that NumPy 2.0 has enough code "experience" in order to be as useful
> as possible.   I recognize that NumPy 1.8 has quite a few proposed
> features.   These have been building up and are the big reason I've
> committed so many resources to NumPy.   The feature-list did not just come
> out of my head.   They are the result of talking and interacting with many
> NumPy users and watching the code get used (and not used) in the real
> world.  This will be a faster pace of development.   But, all of this
> will be in the open.  If the NumPy 2.0 schedule is too aggressive, then
> we will have a NumPy 1.9 release in order to allow features to come out.
>
> Thanks,
>
> -Travis
>


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser
 wrote:
>
>
> On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant 
> wrote:
>>
>> Mark Wiebe and I have been discussing off and on (as well as talking with
>> Charles) a good way forward to balance two competing desires:
>>
>>        * addition of new features that are needed in NumPy
>>        * improving the code-base generally and moving towards a more
>> maintainable NumPy
>>
>> I know there are loud voices for just focusing on the second of these and
>> avoiding the first until we have finished that.  I recognize the need to
>> improve the code base, but I will also be pushing for improvements to the
>> feature-set and user experience in the process.
>>
>> As a result, I am proposing a rough outline for releases over the next
>> year:
>>
>>        * NumPy 1.7 to come out as soon as the serious bugs can be
>> eliminated.  Bryan, Francesc, Mark, and I are able to help triage some of
>> those.
>>
>>        * NumPy 1.8 to come out in July which will have as many
>> ABI-compatible feature enhancements as we can add while improving test
>> coverage and code cleanup.   I will post to this list more details of what
>> we plan to address with it later.    Included for possible inclusion are:
>>        * resolving the NA/missing-data issues
>>        * finishing group-by
>>        * incorporating the start of label arrays
>>        * incorporating a meta-object
>>        * a few new dtypes (variable-length string, variable-length unicode
>> and an enum type)
>>        * adding ufunc support for flexible dtypes and possibly structured
>> arrays
>>        * allowing generalized ufuncs to work on more kinds of arrays
>> besides just contiguous
>>        * improving the ability for NumPy to receive JIT-generated function
>> pointers for ufuncs and other calculation opportunities
>>        * adding "filters" to Input and Output
>>        * simple computed fields for dtypes
>>        * accepting a Data-Type specification as a class or JSON file
>>        * work towards improving the dtype-addition mechanism
>>        * re-factoring of code so that it can compile with a C++ compiler
>> and be minimally dependent on Python data-structures.
>>
>>        * NumPy 2.0 to come out in January of 2013.   Mark Wiebe and I will
>> post to this list a document that explains some of its proposed features
>> and enhancements.    I won't steal his thunder for some of the things he is
>> working on.
>>
>> If there are code issues people would like to see addressed, it would be a
>> great time to speak up and/or propose something that you would like to see.
>
>
>
> The above list looks great.  Another request that comes up occasionally on
> the mailing list is for the efficient computation of order statistics, the
> simplest case being a combined min/max function.  Longish thread starts
> here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/
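
A combined min/max of the kind Warren mentions can be sketched in pure
Python as follows. This is only an illustration of the single-pass idea
(elements are taken in pairs, so each pair costs about three comparisons
instead of four); it is not an existing or proposed NumPy function, and the
name `minmax` is hypothetical.

```python
def minmax(values):
    """Return (smallest, largest) from a single pass over an iterable.

    Hypothetical helper for illustration; not part of NumPy.
    """
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("minmax() of an empty sequence") from None

    for a in it:
        # Take the next element as a partner; if the input has run out,
        # reuse `a` so the comparisons below still work.
        b = next(it, a)
        if a > b:
            a, b = b, a
        # Now a <= b: only `a` can lower the minimum and only `b` can
        # raise the maximum.
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi


if __name__ == "__main__":
    print(minmax([3, 1, 4, 1, 5, 9, 2, 6]))
```

A compiled version inside NumPy would presumably also have to decide how to
treat NaN values, which this sketch does not attempt.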

The list looks great, but for the time table I expect there will be at
least a 1.9 and 1.10 necessary to improve what "we didn't get quite
right in the first place", or what not many users had time to try out.

Josef

>
> Warren
>
>
>>
>> In general NumPy 1.8 will have new features that need to be explored in
>> order that NumPy 2.0 has enough code "experience" in order to be as useful
>> as possible.   I recognize that NumPy 1.8 has quite a few proposed features.
>>   These have been building up and are the big reason I've committed so many
>> resources to NumPy.   The feature-list did not just come out of my head.
>> They are the result of talking and interacting with many NumPy users and
>> watching the code get used (and not used) in the real world.    This will be
>> a faster pace of development.   But, all of this will be in the open.    If
>> the NumPy 2.0 schedule is too aggressive, then we will have a NumPy 1.9
>> release in order to allow features to come out.
>>
>> Thanks,
>>
>> -Travis
>>


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-16 Thread Charles R Harris
On Thu, Feb 16, 2012 at 4:20 PM,  wrote:

> On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser wrote:
> >
> >
> > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant wrote:
> >>
> >> Mark Wiebe and I have been discussing off and on (as well as talking
> >> with Charles) a good way forward to balance two competing desires:
> >>
> >>        * addition of new features that are needed in NumPy
> >>        * improving the code-base generally and moving towards a more
> >> maintainable NumPy
> >>
> >> I know there are loud voices for just focusing on the second of these
> >> and avoiding the first until we have finished that.  I recognize the
> >> need to improve the code base, but I will also be pushing for
> >> improvements to the feature-set and user experience in the process.
> >>
> >> As a result, I am proposing a rough outline for releases over the next
> >> year:
> >>
> >>        * NumPy 1.7 to come out as soon as the serious bugs can be
> >> eliminated.  Bryan, Francesc, Mark, and I are able to help triage some
> >> of those.
> >>
> >>        * NumPy 1.8 to come out in July which will have as many
> >> ABI-compatible feature enhancements as we can add while improving test
> >> coverage and code cleanup.   I will post to this list more details of
> >> what we plan to address with it later.    Included for possible
> >> inclusion are:
> >>        * resolving the NA/missing-data issues
> >>        * finishing group-by
> >>        * incorporating the start of label arrays
> >>        * incorporating a meta-object
> >>        * a few new dtypes (variable-length string, variable-length
> >> unicode and an enum type)
> >>        * adding ufunc support for flexible dtypes and possibly
> >> structured arrays
> >>        * allowing generalized ufuncs to work on more kinds of arrays
> >> besides just contiguous
> >>        * improving the ability for NumPy to receive JIT-generated
> >> function pointers for ufuncs and other calculation opportunities
> >>        * adding "filters" to Input and Output
> >>        * simple computed fields for dtypes
> >>        * accepting a Data-Type specification as a class or JSON file
> >>        * work towards improving the dtype-addition mechanism
> >>        * re-factoring of code so that it can compile with a C++ compiler
> >> and be minimally dependent on Python data-structures.
> >>
> >>        * NumPy 2.0 to come out in January of 2013.   Mark Wiebe and I
> >> will post to this list a document that explains some of its proposed
> >> features and enhancements.    I won't steal his thunder for some of the
> >> things he is working on.
> >>
> >> If there are code issues people would like to see addressed, it would
> >> be a great time to speak up and/or propose something that you would
> >> like to see.
> >
> >
> >
> > The above list looks great.  Another request that comes up occasionally
> > on the mailing list is for the efficient computation of order
> > statistics, the simplest case being a combined min/max function.
> > Longish thread starts
> > here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/
>
> The list looks great, but for the time table I expect there will be at
> least a 1.9 and 1.10 necessary to improve what "we didn't get quite
> right in the first place", or what not many users had time to try out.
>
>

That's my sense also. I think the long list needs to be prioritized and
broken up into smaller chunks.



Chuck


[Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Chris Ball
Ralf Gommers writes:
...
> While we're at it, our buildbot situation is much worse than our issue
> tracker situation. This also looks good (and free):
> http://www.jetbrains.com/teamcity/

I'd like to help with the NumPy Buildbot situation, and below I propose
a plan for myself to do this. However, I realize there are people who
know more about continuous integration than I do. So, if someone is
already planning to do something, I'd be happy to help with a different
plan instead!


I know how to set up and run Buildbot (and how much effort that takes),
but I'm not familiar with the alternatives, so I can only propose one
concrete plan:

I'll find a machine on which to run a build master, then start to add
slaves (real machines or virtual machines). At first I'll focus on the
NumPy master branch, (a) testing it over different operating systems and
versions of Python and (b) reporting things such as test coverage. I'll
keep the Buildbot configuration in a github project, along with
documentation (in case I disappear...).

After getting to this initial stage, I'll discuss adding more
features (such as testing pull requests, performance testing, building
binaries on the different operating systems, etc). Also, if it's working
well, this Buildbot setup could replace/be merged with the one at
buildbot.scipy.org (I don't know who is currently running that).


Buildbot is used by some big projects (e.g. Python, Chromium, and
Mozilla), but I'm aware that several projects in the scientific/numeric
Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
often using a hosted Jenkins solution such as Shining Panda. A difficult
part of running a Buildbot service is finding hardware for the slaves
and keeping them alive, so a hosted solution sounds wonderful (assuming
hosted solutions offer an adequate range of operating systems etc).
Also, earlier in the "Issue Tracking" thread some commercial packages
were mentioned; I don't know anything about those. So, as I said at the
beginning, if someone is already planning to do something (or wants to)
I'd be happy to help with a different plan instead! Otherwise, I can
proceed with the plan I suggested.

Chris



Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Travis Oliphant

Matthew,

What you should take from my post is that I appreciate your concern for the 
future of the NumPy project, and am grateful that you have an eye to the sort 
of things that can go wrong --- it will help ensure they don't go wrong.  

But, I personally don't agree that it is necessary to put any more formal
structure in place at this time, and we should wait for 6-12 months, and see
where we are at while doing everything we can to get more people interested in
contributing to the project.   I'm comfortable playing the role of BDF12 with
a cadre of developers/contributors who seek to come to consensus.   I believe
there are sufficient checks on the process that will make it quite difficult
for me to *abuse* that in the short term.   Charles, Ralf, Mark, David, Robert,
Josef, you, and many others are already quite adept at calling me out when I do
things they don't like or think are problematic.   I encourage them to
continue this.   I can't promise I'll do everything you want, but I can promise
I will listen and take your opinions seriously --- just like I take the
opinions of every contributor to the NumPy and SciPy lists seriously (though
weighted by the work-effort they have put on the project).
We can all only continue to do our best to help out wherever we can.

Just so we are clear:  Continuum's current major client is the larger
NumPy/SciPy community itself and this will remain the case for at least several
months.   You have nothing to fear from "other clients" we are trying to
please.   Thus, we are incentivized to keep as many people happy as possible.
In the second place, the Foundation's major client is the same community (and
even broader) and the rest of the board is committed to the overall success of
the ecosystem.   There is a reason the board is comprised of a
wide-representation of that eco-system.   I am very hopeful that numfocus will
evolve over time to have an active community of people who participate in its
processes and plans to support as many projects as it can given the bandwidth
and funding available to it.

So, if I don't participate in this discussion anymore, it's because I am
working on some open-source things I'd like to show at PyCon, and time is
ticking down.   If you really feel strongly about this, then I would suggest
that you come up with a proposal for governance that you would like us all to
review.  At the SciPy conference in Austin this summer we can talk about it ---
when many of us will be face-to-face.

Best regards,

-Travis



On Feb 16, 2012, at 4:32 PM, Matthew Brett wrote:

> Hi,
> 
> Just for my own sake, can I clarify what you are saying here?
> 
> On Thu, Feb 16, 2012 at 1:11 PM, Travis Oliphant  wrote:
>> I'm not a big fan of design-by-committee as I haven't seen it be very 
>> successful in creating new technologies.   It is pretty good at enforcing 
>> the status-quo.  If I felt like that is what NumPy needed I would be fine 
>> with it.
> 
> Was it your impression that what was being proposed, was design by committee?


> 
>> However, I feel that NumPy is going to be surpassed with other solutions if 
>> steps are not taken to improve the code-base *and* add new features.
> 
> As far as you are concerned, is there any controversy about that?


> 
>> For the next 6-12 months, I am comfortable taking the "benevolent dictator 
>> role".   During that time, I hope we can find many more core developers and 
>> then re-visit the discussion.  My view is that design decisions should be a 
>> consensus based on current contributors to the code base and major users.   
>> To continue to be relevant, NumPy has to serve its customers.   They are 
>> the ones who will have the final say.   If others feel like they can do 
>> better, a fork is an option.  I don't want that to happen, but it is the 
>> only effective and practical "governance" structure that exists in my mind 
>> outside of the self-governance of the people that participate.
> 
> To confirm, you are saying that you can imagine no improvement in the
> current governance structure?
> 
>> No organizational structure can make up for the lack of great people putting 
>> their hearts and efforts into a great cause.
> 
> But you agree that there might be an organizational structure that
> would make this harder or easier?
> 
> Best,
> 
> Matthew


Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Travis Oliphant
We never turn down good help like this.  Thanks, Chris.   I have applied for an
unlimited license for TeamCity for the NumPy project.   I have heard good
things about TeamCity, although getting the slaves cranking and staying
cranking is the goal and not the CI architecture.   If you know build-bot,
it's a good place to start.

I have heard very positive things about Jenkins.   I also think that hosted 
solutions are going to be easier to manage over time.  But, your offer is very 
generous. 

-Travis


On Feb 16, 2012, at 5:52 PM, Chris Ball wrote:

> Ralf Gommers writes:
> ...
>> While we're at it, our buildbot situation is much worse than our issue
>> tracker situation. This also looks good (and free):
>> http://www.jetbrains.com/teamcity/
> 
> I'd like to help with the NumPy Buildbot situation, and below I propose
> a plan for myself to do this. However, I realize there are people who
> know more about continuous integration than I do. So, if someone is
> already planning to do something, I'd be happy to help with a different
> plan instead!
> 
> 
> I know how to set up and run Buildbot (and how much effort that takes),
> but I'm not familiar with the alternatives, so I can only propose one
> concrete plan:
> 
> I'll find a machine on which to run a build master, then start to add
> slaves (real machines or virtual machines). At first I'll focus on the
> NumPy master branch, (a) testing it over different operating systems and
> versions of Python and (b) reporting things such as test coverage. I'll
> keep the Buildbot configuration in a github project, along with
> documentation (in case I disappear...).
> 
> After getting to this initial stage, I'll discuss adding more
> features (such as testing pull requests, performance testing, building
> binaries on the different operating systems, etc). Also, if it's working
> well, this Buildbot setup could replace/be merged with the one at
> buildbot.scipy.org (I don't know who is currently running that).
> 
> 
> Buildbot is used by some big projects (e.g. Python, Chromium, and
> Mozilla), but I'm aware that several projects in the scientific/numeric
> Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
> often using a hosted Jenkins solution such as Shining Panda. A difficult
> part of running a Buildbot service is finding hardware for the slaves
> and keeping them alive, so a hosted solution sounds wonderful (assuming
> hosted solutions offer an adequate range of operating systems etc).
> Also, earlier in the "Issue Tracking" thread some commercial packages
> were mentioned; I don't know anything about those. So, as I said at the
> beginning, if someone is already planning to do something (or wants to)
> I'd be happy to help with a different plan instead! Otherwise, I can
> proceed with the plan I suggested.
> 
> Chris
> 


Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Thomas Kluyver
On 16 February 2012 23:52, Chris Ball  wrote:
> I'm aware that several projects in the scientific/numeric
> Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
> often using a hosted Jenkins solution such as Shining Panda. A difficult
> part of running a Buildbot service is finding hardware for the slaves
> and keeping them alive, so a hosted solution sounds wonderful (assuming
> hosted solutions offer an adequate range of operating systems etc).

We're using ShiningPanda's hosted CI for IPython:
https://jenkins.shiningpanda.com/ipython/

It has a number of things going for it - not least that the basic
service is free for FLOSS - but, to misquote, you can have any OS you
like, so long as it's Debian 6. I get the feeling that they're still
developing things, so maybe there will be more options in the future,
but that's the state now.

You may notice an ipython-mac job in the list - one of our
contributors kindly set up his Mac to run the test suite overnight,
and we have ShiningPanda download the zipped results. It's a neat
trick, but it's not really a solution if you're testing many OS
flavours.

Thomas


Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Ognen Duzlevski
On Thu, Feb 16, 2012 at 6:07 PM, Thomas Kluyver  wrote:
> On 16 February 2012 23:52, Chris Ball  wrote:
>> I'm aware that several projects in the scientific/numeric
>> Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
>> often using a hosted Jenkins solution such as Shining Panda. A difficult
>> part of running a Buildbot service is finding hardware for the slaves
>> and keeping them alive, so a hosted solution sounds wonderful (assuming
>> hosted solutions offer an adequate range of operating systems etc).
>
> We're using ShiningPanda's hosted CI for IPython:
> https://jenkins.shiningpanda.com/ipython/
>
> It has a number of things going for it - not least that the basic
> service is free for FLOSS - but, to misquote, you can have any OS you
> like, so long as it's Debian 6. I get the feeling that they're still
> developing things, so maybe there will be more options in the future,
> but that's the state now.
>
> You may notice an ipython-mac job in the list - one of our
> contributors kindly set up his Mac to run the test suite overnight,
> and we have ShiningPanda download the zipped results. It's a neat
> trick, but it's not really a solution if you're testing many OS
> flavours.

You may also set up a few (free) EC2 instances on Amazon, some with
Linux and some with Windows Server 2008 and install your own Jenkins
CI solution on them. Unfortunately, OS X is not offered on Amazon
either...

Many ways to skin a cat.. ;-)

Ognen


Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Nathaniel Smith
On Thu, Feb 16, 2012 at 11:52 PM, Chris Ball  wrote:
> Buildbot is used by some big projects (e.g. Python, Chromium, and
> Mozilla), but I'm aware that several projects in the scientific/numeric
> Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
> often using a hosted Jenkins solution such as Shining Panda. A difficult
> part of running a Buildbot service is finding hardware for the slaves
> and keeping them alive, so a hosted solution sounds wonderful (assuming
> hosted solutions offer an adequate range of operating systems etc).

A quick look at Shining Panda suggests that you get no coverage for
anything but Linux, which is a good start but rather limiting. IME by
far the most annoying part of a useful buildbot setup is keeping all
the build slaves up and working. It's one thing to set up a build
environment in one OS, it's quite another to keep like 5 of them
working, each on a different volunteered machine where you don't have
root and the person who does isn't answering email... the total effort
isn't large, but it's really poorly suited to the nature of volunteer
labor, because it needs prompt attention at random intervals. (Also,
this doesn't become obvious until after one's already gotten
everything set up, so then you're stuck limping along because who
wants to start over and build something more maintainable...)

If anyone has existing sysadmin resources then keeping build-slaves
running is a place where they'd be a huge contribution.

-- Nathaniel


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Matthew Brett
Hi,

On Thu, Feb 16, 2012 at 3:58 PM, Travis Oliphant  wrote:
>
> Matthew,
>
> What you should take from my post is that I appreciate your concern for the 
> future of the NumPy project, and am grateful that you have an eye to the sort 
> of things that can go wrong --- it will help ensure they don't go wrong.
>
> But, I personally don't agree that it is necessary to put any more formal 
> structure in place at this time, and we should wait for 6-12 months, and see 
> where we are at while doing everything we can to get more people interested 
> in contributing to the project.     I'm comfortable playing the role of BDF12 
> with a cadre of developers/contributors who seeks to come to consensus.    I 
> believe there are sufficient checks on the process that will make it quite 
> difficult for me to *abuse* that in the short term.   Charles, Ralf, Mark, 
> David, Robert, Josef, you, and many others are already quite adept at calling 
> me out when I do things they don't like or think are problematic.    I 
> encourage them to continue this.   I can't promise I'll do everything you 
> want, but I can promise I will listen and take your opinions seriously --- 
> just like I take the opinions of every contributor to the NumPy and SciPy 
> lists seriously (though weighted by the work-effort they have put on the 
> project).
>  We can all only continue to do our best to help out wherever we can.
>
> Just so we are clear:  Continuum's current major client  is the larger 
> NumPy/SciPy community itself and this will remain the case for at least 
> several months.    You have nothing to fear from "other clients" we are 
> trying to please.   Thus, we are incentivized to keep as many people happy as 
> possible.    In the second place, the Foundation's major client is the same 
> community (and even broader) and the rest of the board is committed to the 
> overall success of the ecosystem.   There is a reason the board is comprised 
> of a wide-representation of that eco-system.   I am very hopeful that 
> numfocus will evolve over time to have an active community of people who 
> participate in its processes and plans to support as many projects as it can 
> given the bandwidth and funding available to it.
>
> So, if I don't participate in this discussion, anymore, it's because I am 
> working on some open-source things I'd like to show at PyCon, and time is 
> ticking down.    If you really feel strongly about this, then I would 
> suggest that you come up with a proposal for governance that you would like 
> us all to review.  At the SciPy conference in Austin this summer we can talk 
> about it --- when many of us will be face-to-face.

This has not been an encouraging episode in striving for consensus.

I see virtually no movement from your implied position at the
beginning of this thread, other than the following 1) yes you are in
charge 2) you'll consider other options in 6 to 12 months.

I think you're saying here that you won't reply any more on this
thread, and I suppose that reflects the importance you attach to this
problem.

I will not myself propose a governance model because I do not consider
myself to have enough influence (on various metrics) to make it likely
it would be supported.  I wish that wasn't my perception of how things
are done here.

Best,

Matthew


Re: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking)

2012-02-16 Thread Matthew Brett
Hi,

On Thu, Feb 16, 2012 at 4:12 PM, Nathaniel Smith  wrote:
> On Thu, Feb 16, 2012 at 11:52 PM, Chris Ball  wrote:
>> Buildbot is used by some big projects (e.g. Python, Chromium, and
>> Mozilla), but I'm aware that several projects in the scientific/numeric
>> Python ecosystem use Jenkins (including Cython, IPython, and SymPy),
>> often using a hosted Jenkins solution such as Shining Panda. A difficult
>> part of running a Buildbot service is finding hardware for the slaves
>> and keeping them alive, so a hosted solution sounds wonderful (assuming
>> hosted solutions offer an adequate range of operating systems etc).
>
> A quick look at Shining Panda suggests that you get no coverage for
> anything but Linux, which is a good start but rather limiting. IME by
> far the most annoying part of a useful buildbot setup is keeping all
> the build slaves up and working. It's one thing to set up a build
> environment in one OS, it's quite another to keep like 5 of them
> working, each on a different volunteered machine where you don't have
> root and the person who does isn't answering email... the total effort
> isn't large, but it's really poorly suited to the nature of volunteer
> labor, because it needs prompt attention at random intervals. (Also,
> this doesn't become obvious until after one's already gotten
> everything set up, so then you're stuck limping along because who
> wants to start over and build something more maintainable...)
>
> If anyone has existing sysadmin resources then keeping build-slaves
> running is a place where they'd be a huge contribution.

Yup - keeping the slaves running is the big problem.

We do have various slaves running here that at least are all
accessible by me, and Jarrod, and (at a pinch) Fernando, Stefan and
others.

These are:

XP (when I'm not using the machine, which is the large majority of the time)
OSX 10.5
OSX 10.4 PPC
Linux 32 bit
Linux 64 bit

These are all real machines not virtual machines.  I'm happy to give
some reliable person ssh access to the buildslave user on these
machines.  They won't necessarily be available for all time, they are
dotted around campus doing various jobs like being gateways, project
machines, occasional desktops.

See you,

Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-16 Thread David Gowers (kampu)
On Fri, Feb 17, 2012 at 9:09 AM, Travis Oliphant  wrote:

>   * incorporating a meta-object
>   * a few new dtypes (variable-length string, varialbe-length unicode and 
> an enum type)
>   * simple computed fields for dtypes

From the sound of that, I'm certainly looking forward to seeing some details
(like: Do you mean Pascal (length, content) style strings, AKA struct
code 'p'?; Read-only dtype fields computed via a callback function?).

>        * accepting a Data-Type specification as a class or JSON file

On that subject, I incidentally have implemented a pair of functions
(freeze()/thaw()) that make de/serialization to JSON or YAML fairly
simple.

(currently they leave fundamental dtypes as is. Basically the only
thing that would be necessary to render the result serializable
to/from JSON, is representing fundamental dtypes as JSON-safe objects
.. a string would probably do.)

http://paste.pocoo.org/show/552311/

(Modified slightly from code in my project here:
https://gitorious.org/bits/bits/blobs/master/dtype.py)

I've tried and failed to find a bug report for dtype serialization.
Should I create a new ticket for JSON deserialization?

(serialization wouldn't hurt either, since that would let us store
both an array's data/shape/etc and its dtype in the same JSON
document.)
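
The JSON round trip described above can be sketched roughly as follows,
assuming the dtype has already been reduced to the list-of-tuples form that
`dtype.descr` reports, with fundamental dtypes written as type strings such
as '<i4'. The names `freeze` and `thaw` are borrowed from this message for
illustration only: this is not the code behind the paste link, and it
assumes plain string field names (no titled fields). The sketch deliberately
uses only the standard library; passing the thawed list to `np.dtype()` is
the intended next step but is not shown or tested here.

```python
import json


def freeze(descr):
    """Serialize a dtype description (list of field tuples) to JSON text.

    JSON has no tuple type, so every tuple is written out as a list.
    """
    return json.dumps(descr)


def thaw(text):
    """Rebuild the list-of-tuples description from JSON text.

    Each field is (name, spec) or (name, spec, shape), where spec is either
    a type string or a nested list of fields, and shape is a tuple of ints.
    """
    def fix(field):
        name, spec = field[0], field[1]
        if isinstance(spec, list):
            # Nested structured dtype: restore its fields recursively.
            spec = [fix(sub) for sub in spec]
        # Anything after the spec (the shape) came back as a list.
        rest = tuple(tuple(x) if isinstance(x, list) else x
                     for x in field[2:])
        return (name, spec) + rest

    return [fix(field) for field in json.loads(text)]


if __name__ == "__main__":
    descr = [("id", "<i4"),
             ("pos", "<f8", (3,)),
             ("tag", [("a", "|u1"), ("b", "<i2")])]
    text = freeze(descr)
    print(text)
    print(thaw(text) == descr)
```

Keeping the type strings as ordinary JSON strings is what makes the result
JSON-safe; the only real work on the way back is turning lists into tuples
again.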


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread David Warde-Farley
On 2012-02-16, at 1:28 PM, Charles R Harris wrote:

> I think this is a good point, which is why the idea of a long term release is 
> appealing. That release should be stodgy and safe, while the ongoing 
> development can be much more radical in making changes.

I sort of thought this *was* the state of affairs re: NumPy 2.0.

> And numpy really does need a fairly radical rewrite, just to clarify and 
> simplify the code base, if nothing else. New features I'm more leery 
> about, at least until the code base is improved, which would be my short term 
> priority.

As someone who has now thrice ventured into the NumPy C code (twice to add 
features, once to fix a nasty bug I encountered) I simply could not agree more. 
While it's not a completely hopeless exercise for someone comfortable with C to 
get themselves acquainted with NumPy's C internals, the code base as is could 
be simpler. 

A refactoring and *documentation* effort would be a good way to get more people 
contributing to this side of NumPy. I believe the suggestion of seeing more of 
the C moved to Cython has also been floated before.

David


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Alan G Isaac
On 2/16/2012 7:22 PM, Matthew Brett wrote:
> This has not been an encouraging episode in striving for consensus.

I disagree.
Failure to reach consensus does not imply lack of striving.

I see parallels with a recent hiring decision process
I observed.  There were fundamentally different
views of how to rank the top candidates.  Because those
involved value consensus, these views were
extensively aired and discussed. *That* is where the
commitment to consensus showed.  It proved not possible
to reach a consensus on the candidate choice, so the
decision of the search committee carried the day. (Even
there, there was not consensus.)  In the end,
there is work to be done, and getting the work done
is something *everyone* agrees trumps other disagreements.

Striving for consensus does not mean that a minority
automatically gets veto rights.

Cheers,
Alan


[Numpy-discussion] Proposed Roadmap Overview

2012-02-16 Thread Benjamin Root
On Thursday, February 16, 2012, Warren Weckesser wrote:

>
>
> On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant wrote:
>
>> Mark Wiebe and I have been discussing off and on (as well as talking with
>> Charles) a good way forward to balance two competing desires:
>>
>>* addition of new features that are needed in NumPy
>>* improving the code-base generally and moving towards a more
>> maintainable NumPy
>>
>> I know there are loud voices for just focusing on the second of these and
>> avoiding the first until we have finished that.  I recognize the need to
>> improve the code base, but I will also be pushing for improvements to the
>> feature-set and user experience in the process.
>>
>> As a result, I am proposing a rough outline for releases over the next
>> year:
>>
>>* NumPy 1.7 to come out as soon as the serious bugs can be
>> eliminated.  Bryan, Francesc, Mark, and I are able to help triage some of
>> those.
>>
>>* NumPy 1.8 to come out in July which will have as many
>> ABI-compatible feature enhancements as we can add while improving test
>> coverage and code cleanup.   I will post to this list more details of what
>> we plan to address with it later.  Candidates for inclusion are:
>>    * resolving the NA/missing-data issues
>>    * finishing group-by
>>    * incorporating the start of label arrays
>>    * incorporating a meta-object
>>    * a few new dtypes (variable-length string, variable-length
>>      unicode and an enum type)
>>    * adding ufunc support for flexible dtypes and possibly structured
>>      arrays
>>    * allowing generalized ufuncs to work on more kinds of arrays
>>      besides just contiguous
>>    * improving the ability for NumPy to receive JIT-generated
>>      function pointers for ufuncs and other calculation opportunities
>>    * adding "filters" to Input and Output
>>    * simple computed fields for dtypes
>>    * accepting a Data-Type specification as a class or JSON file
>>    * work towards improving the dtype-addition mechanism
>>    * re-factoring of code so that it can compile with a C++ compiler
>>      and be minimally dependent on Python data-structures.
>>
>>* NumPy 2.0 to come out in January of 2013.   Mark Wiebe and I
>> will post to this list a document that explains some of its proposed
>> features and enhancements.  I won't steal his thunder for some of the
>> things he is working on.
>>
>> If there are code issues people would like to see addressed, it would be
>> a great time to speak up and/or propose something that you would like to
>> see.
>>
>
>
> The above list looks great.  Another request that comes up occasionally on
> the mailing list is for the efficient computation of order statistics, the
> simplest case being a combined min/max function.  Longish thread starts
> here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/
>
> Warren
>
>
>
+1 on this.  Also, before I forget, it looks like as of matlab 2011, they
also have a "minmax" function, but for the neural network toolbox. Also,
what it does is so constrained and different that at the very least, a note
about it should go into the "numpy for matlab users" webpage.
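
For concreteness, here is a rough pure-Python sketch of the single-pass idea
behind a combined min/max.  It is only an illustration of the algorithm:
"minmax" here is a hypothetical helper name, not an existing NumPy function,
and a real implementation would presumably live in C and would need to decide
how to treat NaN, which this sketch ignores.

```python
def minmax(values):
    """Return (smallest, largest) of a non-empty iterable in a single pass."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("minmax() arg is an empty sequence") from None
    for a in it:
        # Consume elements two at a time: ordering the pair first means
        # 3 comparisons per pair instead of the 4 a naive loop would need.
        try:
            b = next(it)
        except StopIteration:
            b = a  # odd element left over at the end
        if a > b:
            a, b = b, a
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi


if __name__ == "__main__":
    print(minmax([3, 1, 4, 1, 5, 9, 2, 6]))
```

The point is that the data is read once, whereas calling min() and max()
separately reads it twice; for large arrays that do not fit in cache the
memory traffic is what matters.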

Ben Root


>
>> In general NumPy 1.8 will have new features that need to be explored in
>> order that NumPy 2.0 has enough code "experience" in order to be as useful
>> as possible.   I recognize that NumPy 1.8 has quite a few proposed
>> features.   These have been building up and are the big reason I've
>> committed so many resources to NumPy.   The feature-list did not just come
>> out of my head.   They are the result of talking and interacting with many
>> NumPy users and watching the code get used (and not used) in the real
>> world.  This will be a faster pace of development.   But, all of this
>> will be in the open.  If the NumPy 2.0 schedule is too aggressive, then
>> we will have a NumPy 1.9 release in order to allow features to come out.
>>
>> Thanks,
>>
>> -Travis
>>
>>
>
>


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread Matthew Brett
Hi,

On Thu, Feb 16, 2012 at 5:26 PM, Alan G Isaac  wrote:
> On 2/16/2012 7:22 PM, Matthew Brett wrote:
>> This has not been an encouraging episode in striving for consensus.

> Striving for consensus does not mean that a minority
> automatically gets veto rights.

'Striving' for consensus does imply some attempt to get to grips with
the arguments, and working on some compromise to accommodate both
parties.

It seems to me there was very great latitude for finding such a
compromise here, but Travis has terminated the discussion and I see no
sign of a compromise.

"Striving for consensus" can't of course be regulated.  The desire has
to be there.   It's probably true, as Nathaniel says, that there isn't
much you can do to legislate on that.  We can only try to persuade.  I
was trying to do that, I failed, I'll have to look back and see if
there was something else I could have done that would have been more
useful to the same end.

Best,

Matthew


Re: [Numpy-discussion] Numpy governance update

2012-02-16 Thread John Hunter
On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac  wrote:

> On 2/16/2012 7:22 PM, Matthew Brett wrote:
> > This has not been an encouraging episode in striving for consensus.
>
> I disagree.
> Failure to reach consensus does not imply lack of striving.
>
>
Hey Alan, thanks for your thoughtful and nuanced views.  I agree with
everything you've said, but have a few additional points.

At the risk of wading into a thread that has grown far too long, and
echoing Eric's comments that the idea of governance is murky at best
when there is no provision for enforceability, I have a few comments.
Full disclosure: Travis has asked me and I have agreed to serve on
a board for "numfocus", the not-for-profit arm of his efforts to
promote numpy and related tools.  Although I have no special numpy
developer chops, as the original author of matplotlib, which is one of
the leading "numpy clients", he asked me to join his organization as a
"community representative".  I support his efforts, and so agreed to
join the numfocus board.

My first and most important point is that the subtext of many postings here
about the fear of undue and inappropriate influence of Continuum under
Travis' leadership is far overblown.  Travis created numpy -- it is
his baby.  Undeniably, he created it by standing on the shoulders of
giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and
many others.  But the idea that we need to guard against the
possibility that his corporate interests will compromise his interests
in "what is best for numpy" is academic at best.

As someone who has created a significant project in the realm of
"scientific computing in Python", I can tell you that it is something
I take quite a bit of pride in and it is very important to me that the
project thrives as it was intended to: as a free, open-source,
best-practice way of doing science.  I know Travis well enough to know
he feels the same way -- numpy doing well is *at least* as important to
him as his company doing well.  All of his recent actions to start a
company and foundation which focuses resources on numpy and related
tools reinforce that view.  If he had a different temperament, he
wouldn't have devoted five to ten years of his life to Numeric, scipy
and numpy.  He is a BDFL for a reason: he has earned our trust.

And he has proven his ability to lead when *almost everyone* was
against him.  At the height of the Numeric/numarray split, and I was
deeply involved in this as the mpl author because we had a "numerix"
compatibility layer to allow users to use one or the other, Travis
proposed writing numpy to solve both camps' problems.  I really can't
remember a single individual who supported him.  What I remember is
the cacophony of voices who thought this was a bad idea, because of the
"third fork" problem.  But Travis forged ahead, on his own, wrote
numpy, and re-united the Numeric and numarray camps.  And
all-the-while he maintained his friendship with the numarray
developers (Perry Greenfield who led the numarray development effort
has also been invited by Travis to the numfocus board, as has Fernando
Perez and Jarrod Millman).  Although MPL at the time agreed to support
a third version in its numerix compatibility layer for numpy, I can
thankfully say we have since dropped support for the compatibility
layer entirely as we all use numpy now.  This to me is the distilled
essence of leadership, against the voices of the masses, and it bears
remembering.

I have two more points I want to make: one is on democracy, and one is
on corporate control.  On corporate control: there have been a number
of posts in this thread about the worries and dangers that Continuum
poses as the corporate sponsor of numpy development, about how this
may cause numpy to shift from a model of a loosely connected,
decentralized cadre of volunteers to a centrally controlled steering
committee of programmers who are controlled by corporate headquarters
and who make all their decisions around the water cooler unobserved by
the community of users.

I want to make a connection to something that happened in the history
of matplotlib development, something that is not strictly analogous
but I think close enough to be informative.  Sometime around 2005,
Perry Greenfield, who heads the development team of the Space
Telescope Science Institute (STScI) that is charged with processing
the Hubble image pipeline, emailed me that he was considering using
matplotlib as their primary image visualization tool.  I can't tell
you how excited I was at the time.  The idea of having institutional
sponsorship from someone as prestigious and resourceful as STScI was
hugely motivating.  I worked feverishly for months to add stuff they
needed: better rendering, better image support, mathtext and lots
more.  But more importantly, Perry was offering to bring institutional
support to my project: well qualified full-time employees who
dedicated a significant part of their time to matplotlib
development. H