Re: [Numpy-discussion] Overlapping ranges
On Mon, Mar 16, 2009 at 5:29 PM, Robert Kern wrote:
> 2009/3/16 Peter Saffrey:
>
>> At the moment, I'm using a fairly naive approach that finds roughly where in
>> the genome (which gene) each point might be and then checks it against the
>> bins in that gene. If I split the problem into chromosomes, I feel sure
>> there must be some super-fast matrix approach I can apply using numpy, but
>> I'm struggling a bit. Can anybody suggest something?
>
> You probably need something algorithmically better, like interval
> trees. There are a couple of C/Python implementations floating around.
>

If I understand your problem correctly, then for a smaller-scale problem something like this should work:

{{{
import numpy as np
B = np.array([[1,3],[2,5],[7,10], [6,15],[14,20]]) # bins
P = np.c_[np.arange(1,16), 4+np.arange(1,16)] # points
#mask = (~(P[:,0:1]>D[:,1:2].T)) * (~(P[:,1:2]B[:,1:2].T), (P[:,1:2]
}}}

http://mail.scipy.org/mailman/listinfo/numpy-discussion
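The mask expression in the message above is cut off in the archive. What follows is a minimal sketch of the broadcasting idea it appears to describe, not a recovery of the original code. It reuses `B` and `P` from the message; the names `mask` and `bins_per_point` and the exact comparison are assumptions, and intervals are assumed to be closed (endpoints inclusive).

```python
import numpy as np

# Bins and points as (start, end) rows, taken from the message above.
B = np.array([[1, 3], [2, 5], [7, 10], [6, 15], [14, 20]])   # bins
P = np.c_[np.arange(1, 16), 4 + np.arange(1, 16)]            # points

# Two closed intervals overlap unless one ends before the other starts.
# P[:, 0:1] has shape (n_points, 1) and B[:, 1] has shape (n_bins,),
# so each comparison broadcasts to an (n_points, n_bins) boolean array.
mask = (P[:, 0:1] <= B[:, 1]) & (P[:, 1:2] >= B[:, 0])

# For every point, list the indices of all bins it overlaps.
bins_per_point = [np.flatnonzero(row) for row in mask]
```

The mask holds n_points x n_bins booleans, so for the full problem it would have to be built per chromosome and in chunks of points rather than in one piece.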
Re: [Numpy-discussion] Overlapping ranges
2009/3/16 Peter Saffrey:
> At the moment, I'm using a fairly naive approach that finds roughly where in
> the genome (which gene) each point might be and then checks it against the
> bins in that gene. If I split the problem into chromosomes, I feel sure
> there must be some super-fast matrix approach I can apply using numpy, but
> I'm struggling a bit. Can anybody suggest something?

You probably need something algorithmically better, like interval trees. There are a couple of C/Python implementations floating around.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
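As an illustration of what "algorithmically better" can look like, here is a rough sketch that is not an interval tree but a simpler sorted-array stand-in: bins are sorted by start, and a running maximum of their ends lets `np.searchsorted` narrow the candidates before an exact filter. The function names `build_index` and `query` are hypothetical, and closed intervals are assumed.

```python
import numpy as np

def build_index(bins):
    """Sort bins by start and precompute a running maximum of their ends."""
    order = np.argsort(bins[:, 0])
    starts = bins[order, 0]
    ends = bins[order, 1]
    max_end = np.maximum.accumulate(ends)   # non-decreasing, so searchable
    return order, starts, ends, max_end

def query(index, p_start, p_end):
    """Return original indices of all bins overlapping [p_start, p_end]."""
    order, starts, ends, max_end = index
    # Bins at sorted positions >= hi start after the point ends.
    hi = np.searchsorted(starts, p_end, side="right")
    # Bins at sorted positions < lo all end before the point starts.
    lo = np.searchsorted(max_end, p_start, side="left")
    # Exact filter on the remaining candidates (empty if lo >= hi).
    keep = ends[lo:hi] >= p_start
    return np.sort(order[lo:hi][keep])

bins = np.array([[1, 3], [2, 5], [7, 10], [6, 15], [14, 20]])
index = build_index(bins)
hits = query(index, 15, 19)
```

The candidate window can still be wide when one very long bin sits early in the sort order, which is the case a real interval tree handles better.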
[Numpy-discussion] Overlapping ranges
I'm trying to file a set of data points, defined by genome coordinates, into bins, also based on genome coordinates. Each data point is (chromosome, start, end, point) and each bin is (chromosome, start, end). I have about 140 million points to file into around 100,000 bins. Both are (roughly) evenly distributed over the 24 chromosomes (1-22, X and Y). Genome coordinates are integers and my data points are floats. For each data point, (end - start) is roughly 1000, but the bins are of uneven widths.

Bins might also overlap - in that case, I need to know all the bins that a point overlaps. By overlap, I mean that the start or end of the data point (or both) is inside the bin, or that the point entirely covers the bin.

At the moment, I'm using a fairly naive approach that finds roughly where in the genome (which gene) each point might be and then checks it against the bins in that gene. If I split the problem into chromosomes, I feel sure there must be some super-fast matrix approach I can apply using numpy, but I'm struggling a bit. Can anybody suggest something?

Peter
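The overlap definition in this message can be written down directly, and it reduces to two comparisons. The sketch below shows both forms; the function names are hypothetical, and it assumes closed intervals with start <= end, which the message does not state explicitly.

```python
def overlaps_literal(p_start, p_end, b_start, b_end):
    """Overlap as described: an endpoint inside the bin, or the point covers the bin."""
    start_inside = b_start <= p_start <= b_end
    end_inside = b_start <= p_end <= b_end
    covers = p_start <= b_start and b_end <= p_end
    return start_inside or end_inside or covers

def overlaps_compact(p_start, p_end, b_start, b_end):
    """Equivalent two-comparison form for closed intervals with start <= end."""
    return p_start <= b_end and p_end >= b_start
```

The compact form is the one that vectorizes well, since it needs only one comparison against bin ends and one against bin starts.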
[Numpy-discussion] 1.3.x branch created - trunk now opened for 1.4
Hi,

I have just started the 1.3.x branch - as such, any change done to the trunk will not end up in the 1.3 release. I will hopefully announce the 1.3 beta release within the day.

cheers,

David
Re: [Numpy-discussion] svn and tickets email status
2009/3/16 Ryan May
> Hi,
>
> What's the status on SVN and ticket email notifications? The only messages
> I'm seeing since the switch are the occasional spam. Should I try
> re-subscribing?
>

I get the ticket notifications, but I think the svn notifications are still broken. I needed to update my email address to receive ticket notifications; the mail was going to an old address after the change.

Chuck
Re: [Numpy-discussion] Superfluous array transpose (cf. ticket #1054)
On Mon, March 16, 2009 4:05 pm, Sturla Molden wrote:
> On 3/16/2009 9:27 AM, Pearu Peterson wrote:
>
>> If an operation produces a new array then the new array should have the
>> storage properties of the lhs operand.
>
> That would not be enough, as 1+a would behave differently from a+1. The
> former would change storage order and the latter would not.

Actually, 1+a would be handled by the __radd__ method, and hence the storage order would be defined by the rhs (the lhs of the __radd__ method).

> Broadcasting arrays adds further to the complexity of the problem.

I guess similar rules should be applied to storage order then.

Pearu
[Numpy-discussion] svn and tickets email status
Hi,

What's the status on SVN and ticket email notifications? The only messages I'm seeing since the switch are the occasional spam. Should I try re-subscribing?

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from: Norman Oklahoma United States.
Re: [Numpy-discussion] Superfluous array transpose (cf. ticket #1054)
On 3/16/2009 9:27 AM, Pearu Peterson wrote:
> If an operation produces a new array then the new array should have the
> storage properties of the lhs operand.

That would not be enough, as 1+a would behave differently from a+1. The former would change storage order and the latter would not.

Broadcasting arrays adds further to the complexity of the problem.

It seems necessary to do something like this to avoid the trap when using f2py:

    def some_fortran_function(x):
        if x.flags['C_CONTIGUOUS']:
            shape = x.shape[::-1]
            _x = x.reshape(shape, order='F')
            _y = _f2py_wrapper(_x)
            shape = _y.shape[::-1]
            return _y.reshape(shape, order='C')
        else:
            return _f2py_wrapper(x)

And then preferably never use Fortran ordered arrays directly.

Sturla Molden
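A related sketch, offered as one possible variant rather than as the wrapper from the message: for a C-contiguous array, `x.T` is already a Fortran-contiguous view of the same buffer with the reversed shape, so no data has to move. `reshape(..., order='F')`, by contrast, reorders elements. Here `_fake_fortran_routine` is a hypothetical stand-in for an f2py-wrapped routine, and `call_with_c_order` is an assumed name.

```python
import numpy as np

def _fake_fortran_routine(a):
    # Hypothetical stand-in for an f2py wrapper: returns a Fortran-ordered result.
    return np.asfortranarray(a * 2.0)

def call_with_c_order(x, routine=_fake_fortran_routine):
    """Pass C-ordered input to a Fortran-style routine without copying it."""
    if x.flags['C_CONTIGUOUS']:
        xt = x.T              # Fortran-contiguous view, reversed shape, no copy
        yt = routine(xt)
        return yt.T           # view the Fortran-ordered result as C-ordered
    return routine(x)

x = np.arange(6, dtype=np.float64).reshape(2, 3)
y = call_with_c_order(x)
```

As in the wrapper above, the routine sees the array with its axes reversed, so this only fits routines for which that is the intended interpretation.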
Re: [Numpy-discussion] Superfluous array transpose (cf. ticket #1054)
On Sun, March 15, 2009 8:57 pm, Sturla Molden wrote:
>
> Regarding ticket #1054. What is the reason for this strange behaviour?
>
> a = np.zeros((10,10),order='F')
> a.flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> (a+1).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False

I wonder whether this behavior could be considered a bug: it does not seem to have any advantages, it hides the storage order change, and that may introduce inefficiencies.

If an operation produces a new array, then the new array should have the storage properties of the lhs operand. That would allow writing the code

    a = zeros(, order='F')
    b = a + 1

instead of

    a = zeros(, order='F')
    b = a[:]
    b += 1

to keep the storage properties in operations.

Regards,
Pearu
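Whether `a + 1` keeps Fortran order may depend on the NumPy version in use, so the sketch below shows two ways to get a Fortran-ordered result without relying on that. It is an illustration under assumptions, not a fix for the ticket. Note that `a[:]` is a view of `a`, so an in-place add on it would also change `a`; the second option therefore takes an explicit copy.

```python
import numpy as np

a = np.zeros((10, 10), order='F')

# Option 1: allocate the result in Fortran order and let the ufunc fill it.
b = np.empty(a.shape, dtype=a.dtype, order='F')
np.add(a, 1, out=b)

# Option 2: take a real Fortran-ordered copy of a, then add in place.
c = a.copy(order='F')
c += 1
```

Both results report `F_CONTIGUOUS : True` in their flags, and `a` itself is left unchanged.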
Re: [Numpy-discussion] inplace dot products
On 20-Feb-09, at 6:41 AM, Olivier Grisel wrote:
> Alright, thanks for the reply.
>
> Is there a canonical way / sample code to gain low level access to
> blas / lapack atlas routines using ctypes from numpy / scipy code?
>
> I don't mind fixing the dimensions and the ndtype of my array if it
> can decrease the memory overhead.

I got some clarification from Pearu Peterson off-list. For gemm the issue is that if the matrix C is not Fortran-ordered, it will be copied, and that copy will be overwritten. Passing order='F' when creating the array that is to be overwritten fixes this.

DWF
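A small sketch of the point about gemm, assuming SciPy's `scipy.linalg.blas.dgemm` wrapper is the routine in question (the message does not name the exact entry point). The accumulator `c` is created with `order='F'` so that, with `overwrite_c=True`, the wrapper should be able to write into it instead of into a temporary copy.

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = np.arange(12, dtype=np.float64).reshape(3, 4)

# Accumulator created in Fortran order so the wrapper can write into it directly.
c = np.ones((2, 4), dtype=np.float64, order='F')
expected = c + np.dot(a, b)

# Computes alpha * (a . b) + beta * c; overwrite_c=True permits reuse of c's memory.
result = dgemm(alpha=1.0, a=a, b=b, beta=1.0, c=c, overwrite_c=True)
```

Using the returned `result` rather than assuming `c` was updated keeps the code correct even when a copy was made after all.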