[Numpy-discussion] Cross-covariance function
Greetings,

I recently needed to calculate the cross-covariance of two random vectors (e.g. I have two matrices, X and Y, whose columns are observations of individual variables, and I wish to generate a matrix of covariances pairing each column of X with each column of Y), so I wrote a small utility function to do so, and I'd like to try to get it integrated into numpy core, if it is deemed useful. I have never submitted a patch to numpy before, so I'm not sure of the protocol; do I ask someone on this list to review the code? Are there conventions I should be aware of? Etc.

Thank you all,
-E

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
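For readers who just need the computation described above, here is a minimal sketch of such a cross-covariance helper (the function name and the unbiased normalization are illustrative choices, not the code from the proposed patch):

```python
import numpy as np

def cross_cov(X, Y):
    """Cross-covariance of two sets of joint observations.

    X is (n, p) and Y is (n, q): n observations of p and q variables,
    one variable per column. Returns the (p, q) matrix whose (i, j)
    entry is the sample covariance of X[:, i] and Y[:, j].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")
    n = X.shape[0]
    Xc = X - X.mean(axis=0)   # center each column
    Yc = Y - Y.mean(axis=0)
    # Unbiased (n - 1) normalization, matching np.cov's default.
    return np.dot(Xc.T, Yc) / (n - 1)
```

The result agrees with the off-diagonal block of the full covariance matrix of the stacked variables, i.e. `np.cov(np.hstack([X, Y]).T)[:p, p:]`.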
Re: [Numpy-discussion] Download page still points to SVN
On Wed, Jan 18, 2012 at 10:19 PM, Scott Sinclair wrote:
> I think (as usual), the problem is that fixing the situation lies on
> the shoulders of people who are already heavily overburdened.

I certainly understand that problem, as I'm eternally behind on a million things regarding ipython. But the only solution to these problems is delegation, not asking the already overburdened few to work even harder than they already do.

I wonder if we could distribute the process of managing the websites a little more for numpy/scipy, so this didn't bottleneck as much. Furthermore, managing those is the kind of task that can be accomplished by someone who may not feel comfortable touching the numpy C core, and yet it's a *great* way to help the project out.

In ipython, we've moved to github-pages hosting for everything, which means that now having a web team is as easy as clicking on the github interface a couple of times, and that's one more task we can get help on from others. In fairness, right now the ipython-web team is the same people as the core, but at least things are in place to accept new hands helping should they become available, without any conflict with core development.

Just a thought.

Cheers,
f
Re: [Numpy-discussion] Download page still points to SVN
On 19 January 2012 00:44, Fernando Perez wrote:
> On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair wrote:
>> It's rather confusing having two websites. The "official" page at
>> http://www.scipy.org/Download points to github.
>
> The problem is that this page, which looks pretty official to just about
> anyone:
>
> http://numpy.scipy.org/
>
> takes you to the one at new.scipy... So as far as traps for the
> unwary go, this one was pretty cleverly laid out ;)

It certainly is. I think (as usual), the problem is that fixing the situation lies on the shoulders of people who are already heavily overburdened. There is a pull request updating the offending page at https://github.com/scipy/scipy.org-new/pull/1 if any overburdened types feel like merging, building and uploading the revised html.

Cheers,
Scott
Re: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012
Does anybody know if there is a similar chance for training in Paris (or elsewhere in France)? The price is nice; it's just that it's in the US.

Thanks,
Chao

2012/1/18 Olivier Grisel:
> [quoted tutorial announcement snipped]

--
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
Re: [Numpy-discussion] Download page still points to SVN
On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair wrote:
> It's rather confusing having two websites. The "official" page at
> http://www.scipy.org/Download points to github.

The problem is that this page, which looks pretty official to just about anyone:

http://numpy.scipy.org/

takes you to the one at new.scipy... So as far as traps for the unwary go, this one was pretty cleverly laid out ;)

Best,
f
[Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012
Hi all,

Just a quick email to advertise this year's PyCon tutorials, as they are very focused on HPC & data analytics. In particular the numpy / scipy ecosystem is well covered, see:

https://us.pycon.org/2012/schedule/tutorials/

Here is a selection of tutorials with abstracts that mention numpy or a related project (scipy, ipython, matplotlib...):

- Bayesian statistics made (as) simple (as possible) - Allen Downey
  https://us.pycon.org/2012/schedule/presentation/10/
- IPython in-depth: high-productivity interactive and parallel python - Fernando Pérez, Brian E. Granger, Min Ragan-Kelley
  https://us.pycon.org/2012/schedule/presentation/121/
- Faster Python Programs through Optimization - Mike Müller
  https://us.pycon.org/2012/schedule/presentation/245/
- Graph Analysis from the Ground Up - Van Lindberg
  https://us.pycon.org/2012/schedule/presentation/228/
- Data analysis in Python with pandas - Wes McKinney
  https://us.pycon.org/2012/schedule/presentation/427/
- Social Network Analysis with Python - Maksim Tsvetovat
  https://us.pycon.org/2012/schedule/presentation/15/
- High Performance Python I - Ian Ozsvald
  https://us.pycon.org/2012/schedule/presentation/174/
- Plotting with matplotlib - Mike Müller
  https://us.pycon.org/2012/schedule/presentation/238/
- Introduction to Interactive Predictive Analytics in Python with scikit-learn - Olivier Grisel
  https://us.pycon.org/2012/schedule/presentation/195/
- High Performance Python II - Travis Oliphant
  https://us.pycon.org/2012/schedule/presentation/343/

The main conference also has very interesting talks:

https://us.pycon.org/2012/schedule/

The early-bird rate for PyCon ends on Jan 25.

See you at PyCon in March,

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)
On Wed, Jan 18, 2012 at 16:14, Malcolm Reynolds wrote:
>> I suspect that you are obtaining the numpy object (1 Py_INCREF) before
>> you split into multiple threads but releasing them in each thread
>> (multiple Py_DECREFs). [...]
>
> Thank you for your quick reply. This makes a lot of sense, I'm just
> having trouble seeing where this could be happening, as everything I
> pass into each parallel computation strand is passed down as either
> pointer-to-const or reference-to-const - the only things that need to
> be modified (for example random number generator objects) are created
> uniquely inside each iteration of the for loop, so it can't be that.

My C++-fu is fairly weak, so I'm never really sure what the smart pointers are doing when. If there are tracing features that you can turn on, try that. Is this deallocation of the smart pointer to the "garf::multivariate_normal const" being done inside the loop or outside back in the main thread? Where did it get created?

> This information about which object has the reference count problem
> helps though, I will keep digging. [...] I assume I
> want to pay attention to the (self=0x117e0e850) in this line, and that
> is the address of the array I am mishandling?
>
> #1 0x000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271

Yes.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
Re: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)
> I suspect that you are obtaining the numpy object (1 Py_INCREF) before
> you split into multiple threads but releasing them in each thread
> (multiple Py_DECREFs). This is probably being hidden from you by the
> boost.python interface and/or the boost::detail::sp_counted_impl_p<>
> smart(ish) pointer. Check the backtrace where your code starts to
> verify if this looks to be the case.

Thank you for your quick reply. This makes a lot of sense, I'm just having trouble seeing where this could be happening, as everything I pass into each parallel computation strand is passed down as either pointer-to-const or reference-to-const - the only things that need to be modified (for example random number generator objects) are created uniquely inside each iteration of the for loop, so it can't be that.

This information about which object has the reference count problem helps though, I will keep digging. I'm vaguely planning on trying to track every incref and decref so I can pin down which object has an unbalanced amount - to do this I want to know the address of the array, rather than the associated datatype descriptor - I assume I want to pay attention to the (self=0x117e0e850) in this line, and that is the address of the array I am mishandling?

#1 0x000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271

Malcolm
Re: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)
On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds wrote:
> Hi,
>
> I've built a system which allocates numpy arrays and processes them in
> C++ code [...] Everything seems fine except when I parallelise the
> main loop (openmp and TBB give the same results), in which case I see
> a whole bunch of messages saying
>
> "reference count error detected: an attempt was made to deallocate 12 (d)"
>
> [...]
>
> I've looked into this a little with GDB and as far as I can see the
> object which the message pertains to is an "array descriptor", or at
> least that's what I conclude from backtraces similar to the following:
>
> Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
> 1501    fprintf(stderr, "*** Reference count error detected: \n" \
> (gdb) bt
> #0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
> #1  0x000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
> #2  0x000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<...> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36
> #3  my code

I suspect there is some problem with the reference counting that you are doing at the C++ level that is causing you to do too many Py_DECREFs to the numpy objects, and this is being identified by the arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level implementation of dtype objects.)

Reading the comments just before descriptor.c:1501 points out that this warning is being printed because something is trying to deallocate the builtin np.dtype('d') == np.dtype('float64') dtype. This should never happen. The refcount for these objects should always be > 0 because numpy itself holds references to them.

I suspect that you are obtaining the numpy object (1 Py_INCREF) before you split into multiple threads but releasing them in each thread (multiple Py_DECREFs). This is probably being hidden from you by the boost.python interface and/or the boost::detail::sp_counted_impl_p<> smart(ish) pointer. Check the backtrace where your code starts to verify if this looks to be the case.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
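Robert's point that the built-in dtypes are singletons held by numpy itself can be checked from Python. A small illustrative sketch (the exact refcount printed will vary with interpreter state; what matters is that it is well above zero):

```python
import sys
import numpy as np

# Built-in dtypes such as 'd' (float64) are singleton descriptor
# objects: every float64 array shares the same one, and numpy itself
# also holds references to it.
a = np.zeros(3)
b = np.ones(4)
assert a.dtype is b.dtype            # same descriptor object, not a copy

# Because numpy keeps its own references, the singleton's refcount
# should never reach zero. An extension that issues more Py_DECREFs
# than Py_INCREFs can drive it there anyway, which is what triggers
# the "Reference count error detected" warning in arraydescr_dealloc.
print(sys.getrefcount(np.dtype('d')))  # a large, interpreter-dependent number
```

This is why the warning names `(d)`: the thing being "deallocated" is the shared float64 descriptor, not any particular array.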
[Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB)
Hi,

I've built a system which allocates numpy arrays and processes them in C++ code (this is because I'm building a native code module using boost.python, and it makes sense to use numpy data storage so I can then deal with the outputs in python without having to do any copying). Everything seems fine except when I parallelise the main loop (openmp and TBB give the same results), in which case I see a whole bunch of messages saying

"reference count error detected: an attempt was made to deallocate 12 (d)"

sometimes during the running of the program, sometimes all at the end (presumably when all the destructors in my program run).

To clarify, the loop I am now running in parallel takes read-only parameters (enforced by the C++ compiler using 'const') and as far as I can tell there are no race conditions with multiple threads writing to the same numpy arrays at once, or anything obvious like that.

I recompiled numpy (I'm using 1.6.1 from the official git repository) to print out some extra information with the reference count message, namely a pointer to the thing which is being erroneously deallocated. Surprisingly, it is always the same address for any run of the program, considering this is a message printed out hundreds of times.

I've looked into this a little with GDB, and as far as I can see the object the message pertains to is an "array descriptor", or at least that's what I conclude from backtraces similar to the following:

Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
1501        fprintf(stderr, "*** Reference count error detected: \n" \
(gdb) bt
#0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
#1  0x000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
#2  0x000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<...> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36
#3  my code

Obviously I can turn off the parallelism to make this problem go away, but since my underlying algorithm is trivially parallelisable I was counting on being able to achieve linear speedup across cores. Currently I can, and as far as I know there are no actual incorrect results being produced by the program. However, in my field (Machine Learning) it's difficult enough to know whether the numbers calculated are sensible even without the presence of these kinds of warnings, so I'd like to get a handle on at least why this is happening, so I know whether I can safely ignore it.

My guess at what might be happening is that the multiple threads are dealing with some object concurrently and the updates to the reference count are not processed atomically, meaning that there are too many DECREFs which happen later on. I had presumed that allocating different numpy matrices in different threads, and then all reading from central numpy matrices, would work fine, but apparently there is something I missed pertaining to descriptors.

Can anyone offer any guidance, or at least tell me this is safe to ignore? I can reproduce the problem reliably, so if you need me to do some digging with GDB at the point the error takes place, I can do that.

Many thanks,
Malcolm
Re: [Numpy-discussion] Loading a QuickTime movie (*.mov) as a series of arrays
On Wed, Jan 18, 2012 at 10:19, Peter wrote:
> Sending this again (sorry Robert, this will be the second time
> for you) since I sent from a non-subscribed email address the
> first time.
>
>> I've had luck with pyffmpeg, though I haven't tried
>> QuickTime .mov files:
>>
>> http://code.google.com/p/pyffmpeg/
>
> Thanks for the suggestion.
>
> Sadly right now pyffmpeg won't install on Mac OS X,
> at least not with the version of Cython I have installed:
> http://code.google.com/p/pyffmpeg/issues/detail?id=44
>
> There doesn't seem to have been any activity on the
> official repository for some time either.

Oh, right, I had to fix those, too. I've attached the patches that I used. I used MacPorts to install the ffmpeg libraries, so I modified the paths in the setup.py appropriately.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

[attachment: setup-fix.diff]
[attachment: cinit-fix.diff]
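If patching pyffmpeg is not an option, one common fallback for getting movie frames into numpy is to pipe raw video from the ffmpeg command-line tool. A hedged sketch: it assumes the `ffmpeg` binary is on the PATH and that the frame dimensions are known in advance, and `iter_frames` is a hypothetical helper, not a pyffmpeg API:

```python
import subprocess
import numpy as np

def iter_frames(path, width, height):
    """Yield each frame of a movie as a (height, width, 3) uint8 array.

    Hypothetical helper: decodes via the ffmpeg CLI, asking for raw
    packed RGB so each frame is exactly width * height * 3 bytes on
    the pipe.
    """
    cmd = ["ffmpeg", "-i", path,
           "-f", "rawvideo", "-pix_fmt", "rgb24",
           "-loglevel", "quiet", "pipe:1"]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    frame_bytes = width * height * 3
    try:
        while True:
            raw = proc.stdout.read(frame_bytes)
            if len(raw) < frame_bytes:   # end of stream (or short read)
                break
            yield np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
    finally:
        proc.stdout.close()
        proc.wait()
```

Usage would look like `for frame in iter_frames("movie.mov", 640, 480): ...`; unlike pyffmpeg this needs no Cython build, only a working ffmpeg install.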
Re: [Numpy-discussion] Loading a QuickTime movie (*.mov) as a series of arrays
Sending this again (sorry Robert, this will be the second time for you) since I sent from a non-subscribed email address the first time.

On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote:
> On Sun, Jan 15, 2012 at 19:10, Peter wrote:
>> Hello all,
>>
>> Is there a recommended (and ideally cross platform)
>> way to load the frames of a QuickTime movie (*.mov
>> file) in Python as NumPy arrays? ...
>
> I've had luck with pyffmpeg, though I haven't tried
> QuickTime .mov files:
>
> http://code.google.com/p/pyffmpeg/

Thanks for the suggestion.

Sadly right now pyffmpeg won't install on Mac OS X, at least not with the version of Cython I have installed:
http://code.google.com/p/pyffmpeg/issues/detail?id=44

There doesn't seem to have been any activity on the official repository for some time either.

Peter
Re: [Numpy-discussion] Download page still points to SVN
On 18 January 2012 11:22, Fernando Perez wrote:
> I was just pointing a colleague to the 'official download page' for
> numpy so he could find how to grab current sources:
>
> http://new.scipy.org/download.html
>
> but I was quite surprised to find that it still points to SVN for both
> numpy and scipy. It would probably not be a bad idea to update those
> and point them to github...

It's rather confusing having two websites. The "official" page at http://www.scipy.org/Download points to github.

There hasn't been much maintenance effort for new.scipy.org, and there was some recent discussion about taking it offline. I'm not sure if a firm conclusion was reached.

Cheers,
Scott
Re: [Numpy-discussion] Counting the Colors of RGB-Image
Sorry that I am using this route to send an answer to Tony Yu, Nadav Horesh, and Chris Barker: when I reply directly to your e-mails I get an error 5, so I think I made a mistake somewhere.

Your ideas are very helpful and the code is very fast.

Thank you,
elodw
[Numpy-discussion] Download page still points to SVN
Hi folks,

I was just pointing a colleague to the 'official download page' for numpy so he could find how to grab current sources:

http://new.scipy.org/download.html

but I was quite surprised to find that it still points to SVN for both numpy and scipy. It would probably not be a bad idea to update those and point them to github...

Cheers,
f