[Numpy-discussion] Cross-covariance function

2012-01-18 Thread Elliot Saba
Greetings,

I recently needed to calculate the cross-covariance of two random vectors
(e.g. I have two matrices, X and Y, whose columns are observations of one
variable each, and I wish to generate a matrix pairing each value of X
and Y), so I wrote a small utility function to do so. I'd like to try to
get it integrated into numpy core, if it is deemed useful.
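A minimal sketch of such a cross-covariance function (the name `cross_cov` and the rows-as-observations layout are assumptions for illustration; this is not the code from the actual patch):

```python
import numpy as np

def cross_cov(x, y, ddof=1):
    """Cross-covariance of two sets of observations.

    Assumes each row of ``x`` and ``y`` is one observation and each
    column one variable (illustrative convention only). Returns the
    (p, q) matrix whose (i, j) entry is the covariance between
    column i of ``x`` and column j of ``y``.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y need the same number of observations")
    # Center each variable, then form the matrix of pairwise covariances.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return np.dot(xc.T, yc) / (x.shape[0] - ddof)
```

With this convention, `cross_cov(x, x)` agrees with `np.cov(x, rowvar=False)`.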

I have never submitted a patch to numpy before, so I'm not sure of the
protocol; do I ask someone on this list to review the code?  Are there
conventions I should be aware of?  Etc...

Thank you all,
-E
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Download page still points to SVN

2012-01-18 Thread Fernando Perez
On Wed, Jan 18, 2012 at 10:19 PM, Scott Sinclair wrote:
> I think (as usual), the problem is that fixing the situation lies on
> the shoulders of people who are already heavily overburdened..

I certainly understand that problem, as I'm eternally behind on a
million things regarding ipython.

But the only solution to these problems is delegation, not asking the
already overburdened few to work even harder than they already do.  I
wonder if we could distribute the process of managing the websites a
little more for numpy/scipy, so this didn't bottleneck as much.

Furthermore, managing those is the kind of task that can be
accomplished by someone who may not feel comfortable touching the
numpy C core, and yet it's a *great* way to help the project out.

In ipython, we've moved to github-pages hosting for everything, which
means that now having a web team is as easy as clicking on the github
interface a couple of times, and that's one more task we can get help
on from others.   In fairness, right now the ipython-web team is the
same people as the core, but at least things are in place to accept
new hands helping should they become available, without any conflict
with core development.

Just a thought.

Cheers,

f


Re: [Numpy-discussion] Download page still points to SVN

2012-01-18 Thread Scott Sinclair
On 19 January 2012 00:44, Fernando Perez wrote:
> On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair wrote:
>> It's rather confusing having two websites. The "official" page at
>> http://www.scipy.org/Download points to github.
>
> The problem is that this page, which looks pretty official to just about 
> anyone:
>
> http://numpy.scipy.org/
>
> takes you to the one at new.scipy...  So as far as traps for the
> unwary go, this one was pretty cleverly laid out ;)

It certainly is.

I think (as usual), the problem is that fixing the situation lies on
the shoulders of people who are already heavily overburdened..

There is a pull request updating the offending page at
https://github.com/scipy/scipy.org-new/pull/1 if any overburdened
types feel like merging, building and uploading the revised html.

Cheers,
Scott


Re: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012

2012-01-18 Thread Chao YUE
Does anybody know if there is a similar chance for training in Paris (or
elsewhere in France)? The price is nice, but it's in the US.

thanks,

Chao

2012/1/18 Olivier Grisel 

> Hi all,
>
> Just a quick email to advertise this year's PyCon tutorials as they
> are very focused on HPC & data analytics. In particular the numpy /
> scipy ecosystem is well covered, see:
>
>  https://us.pycon.org/2012/schedule/tutorials/
>
> Here is a selection of tutorials with abstracts that mention numpy
> or a related project (scipy, ipython, matplotlib...):
>
> - Bayesian statistics made (as) simple (as possible) - Allen Downey
> https://us.pycon.org/2012/schedule/presentation/10/
>
> - IPython in-depth: high-productivity interactive and parallel python
> - Fernando Pérez , Brian E. Granger , Min Ragan-Kelley
> https://us.pycon.org/2012/schedule/presentation/121/
>
> - Faster Python Programs through Optimization - Mike Müller
> https://us.pycon.org/2012/schedule/presentation/245/
>
> - Graph Analysis from the Ground Up - Van Lindberg
> https://us.pycon.org/2012/schedule/presentation/228/
>
> - Data analysis in Python with pandas - Wes McKinney
> https://us.pycon.org/2012/schedule/presentation/427/
>
> - Social Network Analysis with Python - Maksim Tsvetovat
> https://us.pycon.org/2012/schedule/presentation/15/
>
> - High Performance Python I - Ian Ozsvald
> https://us.pycon.org/2012/schedule/presentation/174/
>
> - Plotting with matplotlib - Mike Müller
> https://us.pycon.org/2012/schedule/presentation/238/
>
> - Introduction to Interactive Predictive Analytics in Python with
> scikit-learn - Olivier Grisel
> https://us.pycon.org/2012/schedule/presentation/195/
>
> - High Performance Python II - Travis Oliphant
> https://us.pycon.org/2012/schedule/presentation/343/
>
> The main conference also has very interesting talks:
>
>  https://us.pycon.org/2012/schedule/
>
> The early-bird rate for PyCon ends on Jan 25.
>
> See you at PyCon in March,
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel



-- 
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16



Re: [Numpy-discussion] Download page still points to SVN

2012-01-18 Thread Fernando Perez
On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair wrote:
> It's rather confusing having two websites. The "official" page at
> http://www.scipy.org/Download points to github.

The problem is that this page, which looks pretty official to just about anyone:

http://numpy.scipy.org/

takes you to the one at new.scipy...  So as far as traps for the
unwary go, this one was pretty cleverly laid out ;)

Best,

f


[Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012

2012-01-18 Thread Olivier Grisel
Hi all,

Just a quick email to advertise this year's PyCon tutorials as they
are very focused on HPC & data analytics. In particular the numpy /
scipy ecosystem is well covered, see:

  https://us.pycon.org/2012/schedule/tutorials/

Here is a selection of tutorials with abstracts that mention numpy
or a related project (scipy, ipython, matplotlib...):

- Bayesian statistics made (as) simple (as possible) - Allen Downey
https://us.pycon.org/2012/schedule/presentation/10/

- IPython in-depth: high-productivity interactive and parallel python
- Fernando Pérez , Brian E. Granger , Min Ragan-Kelley
https://us.pycon.org/2012/schedule/presentation/121/

- Faster Python Programs through Optimization - Mike Müller
https://us.pycon.org/2012/schedule/presentation/245/

- Graph Analysis from the Ground Up - Van Lindberg
https://us.pycon.org/2012/schedule/presentation/228/

- Data analysis in Python with pandas - Wes McKinney
https://us.pycon.org/2012/schedule/presentation/427/

- Social Network Analysis with Python - Maksim Tsvetovat
https://us.pycon.org/2012/schedule/presentation/15/

- High Performance Python I - Ian Ozsvald
https://us.pycon.org/2012/schedule/presentation/174/

- Plotting with matplotlib - Mike Müller
https://us.pycon.org/2012/schedule/presentation/238/

- Introduction to Interactive Predictive Analytics in Python with
scikit-learn - Olivier Grisel
https://us.pycon.org/2012/schedule/presentation/195/

- High Performance Python II - Travis Oliphant
https://us.pycon.org/2012/schedule/presentation/343/

The main conference also has very interesting talks:

  https://us.pycon.org/2012/schedule/

The early-bird rate for PyCon ends on Jan 25.

See you at PyCon in March,

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)

2012-01-18 Thread Robert Kern
On Wed, Jan 18, 2012 at 16:14, Malcolm Reynolds wrote:
>>
>> I suspect that you are obtaining the numpy object (1 Py_INCREF) before
>> you split into multiple threads but releasing them in each thread
>> (multiple Py_DECREFs). This is probably being hidden from you by the
>> boost.python interface and/or the boost::detail::sp_counted_impl_p<>
>> smart(ish) pointer. Check the backtrace where your code starts to
>> verify if this looks to be the case.
>
> Thank you for your quick reply. This makes a lot of sense; I'm just
> having trouble seeing where this could be happening, as everything I
> pass into each parallel computation strand is passed down as either
> pointer-to-const or reference-to-const. The only things that need to
> be modified (for example, random number generator objects) are created
> uniquely inside each iteration of the for loop, so it can't be that.

My C++-fu is fairly weak, so I'm never really sure what the smart
pointers are doing when. If there are tracing features that you can
turn on, try that. Is this deallocation of the smart pointer to the
"garf::multivariate_normal const" being done inside the loop
or outside back in the main thread? Where did it get created?

> This information about which object has the reference count problem
> helps, though; I will keep digging. I'm vaguely planning to track
> every incref and decref so I can pin down which object has an
> unbalanced count. To do this I want to know the address of the
> array, rather than the associated datatype descriptor - I assume I
> want to pay attention to the (self=0x117e0e850) in this line, and that
> it is the address of the array I am mishandling?
>
> #1  0x000102897fc4 in array_dealloc (self=0x117e0e850) at 
> arrayobject.c:271

Yes.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)

2012-01-18 Thread Malcolm Reynolds
>
> I suspect that you are obtaining the numpy object (1 Py_INCREF) before
> you split into multiple threads but releasing them in each thread
> (multiple Py_DECREFs). This is probably being hidden from you by the
> boost.python interface and/or the boost::detail::sp_counted_impl_p<>
> smart(ish) pointer. Check the backtrace where your code starts to
> verify if this looks to be the case.

Thank you for your quick reply. This makes a lot of sense; I'm just
having trouble seeing where this could be happening, as everything I
pass into each parallel computation strand is passed down as either
pointer-to-const or reference-to-const. The only things that need to
be modified (for example, random number generator objects) are created
uniquely inside each iteration of the for loop, so it can't be that.

This information about which object has the reference count problem
helps, though; I will keep digging. I'm vaguely planning to track
every incref and decref so I can pin down which object has an
unbalanced count. To do this I want to know the address of the
array, rather than the associated datatype descriptor - I assume I
want to pay attention to the (self=0x117e0e850) in this line, and that
it is the address of the array I am mishandling?

#1  0x000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271

Malcolm

>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco


Re: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)

2012-01-18 Thread Robert Kern
On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds wrote:
> Hi,
>
> I've built a system which allocates numpy arrays and processes them in
> C++ code (this is because I'm building a native code module using
> boost.python and it makes sense to use numpy data storage to then deal
> with outputs in python, without having to do any copying). Everything
> seems fine except when I parallelise the main loop, (openmp and TBB
> give the same results) in which case I see a whole bunch of messages
> saying
>
> "reference count error detected: an attempt was made to deallocate 12 (d)"
>
> sometimes during the running of the program, sometimes all at the end
> (presumably when all the destructors in my program run).
>
> To clarify, the loop I am now running in parallel takes read-only
> parameters (enforced by the C++ compiler using 'const') and as far as
> I can tell there are no race conditions with multiple threads writing
> to the same numpy arrays at once or anything obvious like that.
>
> I recompiled numpy (I'm using 1.6.1 from the official git repository)
> to print out some extra information with the reference count message,
> namely a pointer to the thing which is being erroneously deallocated.
> Surprisingly, it is always the same address for any run of the
> program, considering this is a message printed out hundreds of times.
>
> I've looked into this a little with GDB and as far as I can see the
> object which the message pertains to is an "array descriptor", or at
> least that's what I conclude from backtraces similar to the following:
>
> Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
> 1501            fprintf(stderr, "*** Reference count error detected: \n" \
> (gdb) bt
> #0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
> #1  0x000102897fc4 in array_dealloc (self=0x117e0e850) at 
> arrayobject.c:271
> #2  0x000103e592d7 in
> boost::detail::sp_counted_impl_p
> const>::dispose (this= optimizations>) at refcount.hpp:36
> #3  my code

I suspect there is some problem with the reference counting that you
are doing at the C++ level that is causing you to do too many
Py_DECREFs to the numpy objects, and this is being identified by the
arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level
implementation of dtype objects.) Reading the comments just before
descriptor.c:1501 points out that this warning is being printed
because something is trying to deallocate the builtin np.dtype('d') ==
np.dtype('float64') dtype. This should never happen. The refcount for
these objects should always be > 0 because numpy itself holds
references to them.
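Kern's invariant is easy to check from the interpreter; a small illustration using `sys.getrefcount` (whose exact value is interpreter- and version-dependent):

```python
import sys
import numpy as np

# Builtin dtypes are singletons: every 'float64' array and many numpy
# internals hold references to the same object, so its refcount should
# always be well above zero. A Py_DECREF that drives it to zero (which
# would trigger arraydescr_dealloc) therefore signals an unbalanced
# DECREF in extension code, not a genuine last-reference release.
d = np.dtype('d')
assert d is np.dtype('float64')   # 'd' is the same singleton as 'float64'
print(sys.getrefcount(d) > 1)     # True: numpy itself keeps it alive
```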

I suspect that you are obtaining the numpy object (1 Py_INCREF) before
you split into multiple threads but releasing them in each thread
(multiple Py_DECREFs). This is probably being hidden from you by the
boost.python interface and/or the boost::detail::sp_counted_impl_p<>
smart(ish) pointer. Check the backtrace where your code starts to
verify if this looks to be the case.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


[Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)

2012-01-18 Thread Malcolm Reynolds
Hi,

I've built a system which allocates numpy arrays and processes them in
C++ code (this is because I'm building a native code module using
boost.python and it makes sense to use numpy data storage to then deal
with outputs in python, without having to do any copying). Everything
seems fine except when I parallelise the main loop, (openmp and TBB
give the same results) in which case I see a whole bunch of messages
saying

"reference count error detected: an attempt was made to deallocate 12 (d)"

sometimes during the running of the program, sometimes all at the end
(presumably when all the destructors in my program run).

To clarify, the loop I am now running in parallel takes read-only
parameters (enforced by the C++ compiler using 'const') and as far as
I can tell there are no race conditions with multiple threads writing
to the same numpy arrays at once or anything obvious like that.

I recompiled numpy (I'm using 1.6.1 from the official git repository)
to print out some extra information with the reference count message,
namely a pointer to the thing which is being erroneously deallocated.
Surprisingly, it is always the same address for any run of the
program, considering this is a message printed out hundreds of times.

I've looked into this a little with GDB and as far as I can see the
object which the message pertains to is an "array descriptor", or at
least that's what I conclude from backtraces similar to the following:

Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
1501            fprintf(stderr, "*** Reference count error detected: \n" \
(gdb) bt
#0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
#1  0x000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
#2  0x000103e592d7 in
boost::detail::sp_counted_impl_p
const>::dispose (this=) at refcount.hpp:36
#3  my code

Obviously I can turn off the parallelism to make this problem go away,
but since my underlying algorithm is trivially parallelisable I was
counting on being able to achieve linear speedup across cores.
Currently I can, and as far as I know there are no actual incorrect
results being produced by the program. However, in my field (Machine
Learning) it's difficult enough to know whether the numbers calculated
are sensible even without these kinds of warnings, so I'd like to get
a handle on at least why this is happening, so I know whether I can
safely ignore it.

My guess at what might be happening is that multiple threads are
dealing with some object concurrently and the updates to the reference
count are not processed atomically, meaning that there are too many
DECREFs later on. I had presumed that allocating different numpy
matrices in different threads, and then having them all read from
central numpy matrices, would work fine, but apparently there is
something I missed pertaining to descriptors.

Can anyone offer any guidance, or at least tell me this is safe to
ignore? I can reproduce the problem reliably, so if you need me to do
some digging with GDB at the point the error takes place I can do
that.
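One way to start localizing such an imbalance from the Python side (a debugging sketch only; the lambda below is a stand-in for the real boost.python entry point) is to compare the refcount of the suspect builtin dtype before and after each call:

```python
import sys
import numpy as np

def dtype_refcount_delta(func, *args, **kwargs):
    """Run ``func`` and report how it changed the refcount of the
    builtin float64 dtype. Balanced extension code should leave a
    delta of 0; a persistently negative delta per call points at the
    extra Py_DECREFs. (Debugging sketch only.)"""
    d = np.dtype('d')
    func(*args, **kwargs)            # warm-up call: let lazy caches settle
    before = sys.getrefcount(d)
    result = func(*args, **kwargs)
    after = sys.getrefcount(d)
    return result, after - before

# Stand-in for the real entry point; it returns a plain float so no
# numpy objects (which hold dtype references) outlive the call.
result, delta = dtype_refcount_delta(lambda: float(np.zeros(10).sum()))
print(delta)
```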

Many thanks,

Malcolm


Re: [Numpy-discussion] Loading a Quicktime movie (*.mov) as a series of arrays

2012-01-18 Thread Robert Kern
On Wed, Jan 18, 2012 at 10:19, Peter wrote:
> Sending this again (sorry Robert, this will be the second time
> for you) since I sent from a non-subscribed email address the
> first time.
>
> On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote:
>> On Sun, Jan 15, 2012 at 19:10, Peter wrote:
>>> Hello all,
>>>
>>> Is there a recommended (and ideally cross platform)
>>> way to load the frames of a QuickTime movie (*.mov
>>> file) in Python as NumPy arrays? ...
>>
>> I've had luck with pyffmpeg, though I haven't tried
>> QuickTime .mov files:
>>
>>  http://code.google.com/p/pyffmpeg/
>
> Thanks for the suggestion.
>
> Sadly right now pyffmpeg won't install on Mac OS X,
> at least not with the version of Cython I have installed:
> http://code.google.com/p/pyffmpeg/issues/detail?id=44
>
> There doesn't seem to have been any activity on the
> official repository for some time either.

Oh, right, I had to fix those, too. I've attached the patches that I
used. I used MacPorts to install the ffmpeg libraries, so I modified
the paths in the setup.py appropriately.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


[Attachment: setup-fix.diff]
[Attachment: cinit-fix.diff]


Re: [Numpy-discussion] Loading a Quicktime movie (*.mov) as a series of arrays

2012-01-18 Thread Peter
Sending this again (sorry Robert, this will be the second time
for you) since I sent from a non-subscribed email address the
first time.

On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote:
> On Sun, Jan 15, 2012 at 19:10, Peter wrote:
>> Hello all,
>>
>> Is there a recommended (and ideally cross platform)
>> way to load the frames of a QuickTime movie (*.mov
>> file) in Python as NumPy arrays? ...
>
> I've had luck with pyffmpeg, though I haven't tried
> QuickTime .mov files:
>
>  http://code.google.com/p/pyffmpeg/

Thanks for the suggestion.

Sadly right now pyffmpeg won't install on Mac OS X,
at least not with the version of Cython I have installed:
http://code.google.com/p/pyffmpeg/issues/detail?id=44

There doesn't seem to have been any activity on the
official repository for some time either.

Peter


Re: [Numpy-discussion] Download page still points to SVN

2012-01-18 Thread Scott Sinclair
On 18 January 2012 11:22, Fernando Perez wrote:
> I was just pointing a colleague to the 'official download page' for
> numpy so he could find how to grab current sources:
>
> http://new.scipy.org/download.html
>
> but I was quite surprised to find that it still points to SVN for both
> numpy and scipy.  It would probably not be a bad idea to update those
> and point them to github...

It's rather confusing having two websites. The "official" page at
http://www.scipy.org/Download points to github.

There hasn't been much maintenance effort for new.scipy.org, and there
was some recent discussion about taking it offline. I'm not sure if a
firm conclusion was reached.

Cheers,
Scott


Re: [Numpy-discussion] Counting the Colors of RGB-Image

2012-01-18 Thread apo

Sorry that I am using this channel to send an answer to Tony Yu, Nadav
Horesh and Chris Barker. When I reply directly to your e-mail I get an
error 5; I think I made a mistake.

Your ideas are very helpful and the code is very fast.

Thank you,

elodw




[Numpy-discussion] Download page still points to SVN

2012-01-18 Thread Fernando Perez
Hi folks,

I was just pointing a colleague to the 'official download page' for
numpy so he could find how to grab current sources:

http://new.scipy.org/download.html

but I was quite surprised to find that it still points to SVN for both
numpy and scipy.  It would probably not be a bad idea to update those
and point them to github...

Cheers,

f