Re: [Numpy-discussion] Question about numpy.random.choice with probabilities
On Mon, Jan 23, 2017 at 9:41 AM, Nadav Har'El wrote: > > On Mon, Jan 23, 2017 at 4:52 PM, aleba...@gmail.com wrote: >> >> 2017-01-23 15:33 GMT+01:00 Robert Kern : >>> >>> I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. >>> >> FWIW, I totally agree with Robert > > With my own background (MSc. in Mathematics), I agree that this algorithm is indeed the most natural one. And as I said, when I wanted to implement something myself when I wanted to choose random combinations (k out of n items), I wrote exactly the same one. But when it didn't produce the desired probabilities (even in cases where I knew that doing this was possible), I wrongly assumed numpy would do things differently - only to realize it uses exactly the same algorithm. So clearly, the documentation didn't quite explain what it does or doesn't do. In my experience, I have seen "without replacement" mean only one thing. If the docstring had said "returns unique items", I'd agree that it doesn't explain what it does or doesn't do. The only issue is that "without replacement" is jargon, and it is good to recapitulate the definitions of such terms for those who aren't familiar with them. > Also, Robert, I'm curious: beyond explaining why the existing algorithm is reasonable (which I agree), could you give me an example of where it is actually *useful* for sampling? The references I previously quoted list a few. One is called "multistage sampling proportional to size". 
The idea being that you draw (without replacement) from larger units (say, congressional districts) before sampling within them. It is similar to the situation you outline, but it is probably more useful at a different scale: lots of larger units (where your algorithm is likely to provide no solution) rather than a handful. It is probably less useful in terms of survey design, where you are trying to *design* a process to get a result, than it is in queueing theory and related fields, where you are trying to *describe* and simulate a process that is pre-defined. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
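The two-stage scheme described above can be sketched with the standard numpy.random API; the district sizes below are made up for illustration:

```python
import numpy as np

rng = np.random.RandomState(42)

# Stage 1: draw 3 of 6 districts without replacement, with selection
# probabilities proportional to (hypothetical) district sizes.
sizes = np.array([50.0, 30.0, 10.0, 5.0, 3.0, 2.0])
districts = rng.choice(6, size=3, replace=False, p=sizes / sizes.sum())

# Stage 2: sample two individuals uniformly within each chosen district.
samples = {int(d): rng.choice(int(sizes[d]), size=2, replace=False)
           for d in districts}
```

Note that the stage-1 inclusion probabilities are exactly those produced by the sequential procedure numpy implements, which is why it describes such pre-defined processes well.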
Re: [Numpy-discussion] Question about numpy.random.choice with probabilities
On Mon, Jan 23, 2017 at 9:22 AM, Anne Archibald wrote: > > > On Mon, Jan 23, 2017 at 3:34 PM Robert Kern wrote: >> >> I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. > > > It is what I would have expected too, but we have a concrete example of a user who expected otherwise; where one user speaks up, there are probably more who didn't (some of whom probably have code that's not doing what they think it does). So for the cost of adding a Note, why not help some of them? That's why I said I'm fine with adding a Note. I'm just suggesting a re-wording so that the cautious language doesn't lead anyone who is familiar with the jargon to think we're doing something ad hoc while still providing the details for those who aren't so familiar. > As for the standardness of the definition: I don't know, have you a reference where it is defined? More natural to me would be to have a list of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). I'm hesitant to claim ours is a standard definition unless it's in a textbook somewhere. But I don't insist on my phrasing. Textbook, I'm not so sure, but it is the *only* definition I've ever encountered in the literature: http://epubs.siam.org/doi/abs/10.1137/0209009 http://www.sciencedirect.com/science/article/pii/S002001900500298X -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Question about numpy.random.choice with probabilities
On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald wrote: > > On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: >> >> On Wed, Jan 18, 2017 at 4:30 PM, wrote: >>> >>>> Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties. >>> >>> BTW: sampling 3 out of 3 without replacement is even worse >>> >>> No matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample. >> >> I agree. The random-sample function of the type I envisioned will be able to reproduce the desired probabilities in some cases (like the example I gave) but not in others. Because doing this correctly involves a set of n linear equations in comb(n,k) variables, it can have no solution, or many solutions, depending on the n and k, and the desired probabilities. A function of this sort could return an error if it can't achieve the desired probabilities. > > It seems to me that the basic problem here is that the numpy.random.choice docstring fails to explain what the function actually does when called with weights and without replacement. Clearly there are different expectations; I think numpy.random.choice chose one that is easy to explain and implement but not necessarily what everyone expects. So the docstring should be clarified. Perhaps a Notes section: > > When numpy.random.choice is called with replace=False and non-uniform probabilities, the resulting distribution of samples is not obvious. numpy.random.choice effectively follows the procedure: when choosing the kth element in a set, the probability of element i occurring is p[i] divided by the total probability of all not-yet-chosen (and therefore eligible) elements. This approach is always possible as long as the sample size is no larger than the population, but it means that the probability that element i occurs in the sample is not exactly p[i]. 
I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. -- Robert Kern
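The procedure described in the proposed Note can be spelled out in a few lines. This is a sketch (the function name is made up), with a small simulation showing the crux of the thread: the per-element inclusion probabilities generally differ from p.

```python
import numpy as np

def choice_without_replacement(p, k, rng):
    # At each step, draw one not-yet-chosen element i with probability
    # p[i] divided by the total probability of the remaining elements.
    p = np.array(p, dtype=float)
    chosen = []
    for _ in range(k):
        i = rng.choice(len(p), p=p / p.sum())
        chosen.append(i)
        p[i] = 0.0  # element i is no longer eligible
    return chosen

rng = np.random.RandomState(0)
p = [0.2, 0.4, 0.4]
counts = np.zeros(3)
n_trials = 5000
for _ in range(n_trials):
    for i in choice_without_replacement(p, 2, rng):
        counts[i] += 1

# Empirical P(element in sample). If inclusion followed p exactly,
# element 0 would appear with probability 2 * 0.2 = 0.4; the sequential
# procedure actually gives 0.2 + 2 * 0.4 * (0.2 / 0.6) ~= 0.467.
inclusion = counts / n_trials
```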
Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve
On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: > > Yes, that's precisely the case but when we know the structure we can just choose the appropriate solver anyhow with a little bit of overhead. What I mean is that, to my knowledge, FORTRAN routines for checking for triangularness etc. are absent. I'm responding to that. The reason that they don't have those FORTRAN routines for testing for structure inside of a generic dense matrix is that in FORTRAN it's more natural (and efficient) to just use the explicit packed structure and associated routines instead. You would only use a generic dense matrix if you know that there isn't structure in the matrix. So there are no routines for detecting that structure in generic dense matrices. -- Robert Kern
Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve
On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat wrote: > So every test in the polyalgorithm is cheaper than the next one. I'm not exactly sure what might be the best strategy yet hence the question. It's really interesting that LAPACK doesn't have this type of fast checks. In Fortran LAPACK, if you have a special structured matrix, you usually explicitly use packed storage and call the appropriate function type on it. It's only when you go to a system that only has a generic, unstructured dense matrix data type that it makes sense to do those kinds of checks. -- Robert Kern
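A minimal sketch of the kind of polyalgorithm front-end being discussed, in plain NumPy (the function names are hypothetical): a cheap O(n^2) structure check, then dispatch to a specialized solver or fall back to the generic one.

```python
import numpy as np

def forward_substitution(L, b):
    # Specialized O(n^2) solver for lower-triangular systems.
    x = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def solve_auto(A, b):
    # Cheap O(n^2) check; worthwhile because a hit skips an O(n^3) LU.
    if np.allclose(A, np.tril(A)):
        return forward_substitution(A, b)
    return np.linalg.solve(A, b)  # generic dense fallback
```

With packed-storage LAPACK routines such a check is unnecessary, which is the point made above: the structure is declared by the storage format rather than detected at run time.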
Re: [Numpy-discussion] array comprehension
On Fri, Nov 4, 2016 at 6:36 AM, Neal Becker wrote: > > Francesc Alted wrote: > > > 2016-11-04 13:06 GMT+01:00 Neal Becker : > > > >> I find I often write: > >> np.array ([some list comprehension]) > >> > >> mainly because list comprehensions are just so sweet. > >> > >> But I imagine this isn't particularly efficient. > >> > > > > Right. Using a generator and np.fromiter() will avoid the creation of the > > intermediate list. Something like: > > > > np.fromiter((i for i in range(x))) # use xrange for Python 2 > > > > > Does this generalize to >1 dimensions? No. -- Robert Kern
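Two details worth noting about the np.fromiter() suggestion: it requires a dtype argument (omitted in the snippet quoted above), and since it only ever builds 1-D arrays, the usual workaround for >1 dimensions is to fill a flat array and reshape it:

```python
import numpy as np

# 1-D: dtype is required; count is an optional hint that lets numpy
# preallocate instead of growing the buffer.
a = np.fromiter((i * i for i in range(6)), dtype=np.int64, count=6)

# ">1 dimensions": build flat, then reshape -- np.fromiter itself is 1-D only.
b = np.fromiter((i * j for i in range(3) for j in range(4)),
                dtype=np.int64).reshape(3, 4)
```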
Re: [Numpy-discussion] missing from contributor list?
Because Github (or maybe git) doesn't track the history of the file through all of the renames. It is only reporting the contributors of changes to the file at its current location. If you go back to the time just prior to the commit that renamed the file, you do show up in the list: https://github.com/numpy/numpy/blob/f179ec92d8ddb0dc5f7445255022be5c4765a704/numpy/build_utils/src/apple_sgemv_fix.c On Wed, Nov 2, 2016 at 3:38 PM, Sturla Molden wrote: > Why am I missing from the contributor list here? > > https://github.com/numpy/numpy/blob/master/numpy/_build_utils/src/apple_sgemv_fix.c > > > Sturla -- Robert Kern
Re: [Numpy-discussion] Intel random number package
On Thu, Oct 27, 2016 at 10:45 AM, Todd wrote: > > On Thu, Oct 27, 2016 at 12:12 PM, Nathaniel Smith wrote: >> >> Ever notice how Anaconda doesn't provide pyfftw? They can't legally ship both MKL and pyfftw, and they picked MKL. > > Anaconda does ship GPL code [1]. They even ship GPL code that depends on numpy, such as cvxcanon and pystan, and there doesn't seem to be anything that prevents me from installing them alongside the MKL version of numpy. So I don't see how it would be any different for pyfftw. I think we've exhausted the relevance of this tangent to Oleksandr's contributions. -- Robert Kern
Re: [Numpy-discussion] Intel random number package
On Wed, Oct 26, 2016 at 12:41 PM, Warren Weckesser < warren.weckes...@gmail.com> wrote: > > On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith wrote: >> The patch also adds ~10,000 lines of code; here's an example of what >> some of it looks like: >> >> https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833 >> >> I don't see how we can realistically commit to maintaining this. > > FYI: numpy already maintains code exactly like that: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397 > > Perhaps the point should be that the numpy devs won't want to maintain two nearly identical versions of that code. Indeed. That's how the algorithm was published. The /* sigh ... */ is my own. ;-) -- Robert Kern
Re: [Numpy-discussion] Intel random number package
On Wed, Oct 26, 2016 at 9:36 AM, Sebastian Berg wrote: > > On Mi, 2016-10-26 at 09:29 -0700, Robert Kern wrote: > > On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor wrote: > > > On 10/26/2016 06:00 PM, Julian Taylor wrote: > > >> I prefer for the full functionality of numpy to stay available with a stack of community owned software, even if it may be less powerful that way. > > > But then if this is really just the same random numbers numpy already provides just faster, it is probably acceptable in principle. I haven't actually looked at the PR yet. > > I think the stream is different in some places, at least. And it's not a silent backend drop-in like np.linalg being built against an optimized BLAS, just a separate module that is inoperative without MKL. > I might be swayed, but my gut feeling would be that a backend change (if the default stream changes, an explicit one, though maybe one could make a "fastest") would be the only reasonable way to provide such a thing in numpy itself. That mostly argues for distributing it as a separate package, not part of numpy at all. -- Robert Kern
Re: [Numpy-discussion] Intel random number package
On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor < jtaylor.deb...@googlemail.com> wrote: > > On 10/26/2016 06:00 PM, Julian Taylor wrote: >> I prefer for the full functionality of numpy to stay available with a >> stack of community owned software, even if it may be less powerful that >> way. > > But then if this is really just the same random numbers numpy already provides just faster, it is probably acceptable in principle. I haven't actually looked at the PR yet. I think the stream is different in some places, at least. And it's not a silent backend drop-in like np.linalg being built against an optimized BLAS, just a separate module that is inoperative without MKL. -- Robert Kern
Re: [Numpy-discussion] Intel random number package
On Tue, Oct 25, 2016 at 10:22 PM, Charles R Harris < charlesr.har...@gmail.com> wrote: > > On Tue, Oct 25, 2016 at 10:41 PM, Robert Kern wrote: >> >> On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris < charlesr.har...@gmail.com> wrote: >> > >> > Hi All, >> > >> > There is a proposed random number package PR now up on github: https://github.com/numpy/numpy/pull/8209. It is from >> > oleksandr-pavlyk and implements the numpy random number package using MKL for increased speed. I think we are definitely interested in the improved speed, but I'm not sure numpy is the best place to put the package. I'd welcome any comments on the PR itself, as well as any thoughts on the best way to organize or use this work. Maybe scikit-random >> >> This is what ng-numpy-randomstate is for. >> >> https://github.com/bashtage/ng-numpy-randomstate > > Interesting, despite the old-fashioned original ziggurat implementation of the normal and GNU C style... Does that project seek to preserve all the bytestreams or is it still in flux? I would assume some flux for now, but you can ask the author by submitting a corrected ziggurat PR as a trial balloon. ;-) -- Robert Kern
Re: [Numpy-discussion] Intel random number package
On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris wrote: > > Hi All, > > There is a proposed random number package PR now up on github: https://github.com/numpy/numpy/pull/8209. It is from > oleksandr-pavlyk and implements the numpy random number package using MKL for increased speed. I think we are definitely interested in the improved speed, but I'm not sure numpy is the best place to put the package. I'd welcome any comments on the PR itself, as well as any thoughts on the best way to organize or use this work. Maybe scikit-random This is what ng-numpy-randomstate is for. https://github.com/bashtage/ng-numpy-randomstate -- Robert Kern
Re: [Numpy-discussion] Preserving NumPy views when pickling
On Tue, Oct 25, 2016 at 7:05 PM, Feng Yu wrote: > > Hi, > > Just another perspective. 'base' and 'data' in PyArrayObject are two separate variables. > > base can point to any PyObject, but it is `data` that defines where data is accessed in memory. > > 1. There is no clear way to pickle a pointer (`data`) in a meaningful way. In order for the `data` member to make sense we still need to 'read out' the values stored at the `data` pointer in the pickle. > > 2. By definition base is not necessarily a numpy array but just some other object for managing the memory. In general, yes, but most often it's another ndarray, and the child is related to the parent by a slice operation that could be computed by comparing the `data` tuples. The exercise here isn't to always represent the general case in this way, but to see what can be done opportunistically and if that actually helps solve a practical problem. > 3. One can surely pickle the `base` object as a reference, but it is useless if the data memory has been reconstructed independently during unpickling. > > 4. Unless there is a clear way to notify the referencing numpy array of the new data pointer. There probably isn't. > > BTW, is the stride information lost during pickling, too? The behavior should probably be documented if not yet. The stride information may be lost, yes. We reserve the right to retain it, though (for example, if .T is contiguous then we might well serialize the transposed data linearly and return a view on that data upon deserialization). I don't believe that we guarantee that the unpickled result is contiguous. -- Robert Kern
Re: [Numpy-discussion] Preserving NumPy views when pickling
On Tue, Oct 25, 2016 at 5:09 PM, Matthew Harrigan < harrigan.matt...@gmail.com> wrote: > > It seems pickle keeps track of references for basic python types. > > x = [1] > y = [x] > x,y = pickle.loads(pickle.dumps((x,y))) > x.append(2) > print(y) > >>> [[1,2]] > > Numpy arrays are different but references are forgotten after pickle/unpickle. Shared objects do not remain shared. Based on the quote below it could be considered bug with numpy/pickle. Not a bug, but an explicit design decision on numpy's part. -- Robert Kern
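That design decision is easy to demonstrate with a small round trip:

```python
import pickle
import numpy as np

base = np.arange(10)
view = base[2:5]            # a view sharing base's memory
assert view.base is base

# Round-trip both arrays inside one pickle.
base2, view2 = pickle.loads(pickle.dumps((base, view)))

# Each unpickled array owns an independent copy of its data:
# writing through base2 no longer shows up in view2.
base2[2] = 99
```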
Re: [Numpy-discussion] Preserving NumPy views when pickling
On Tue, Oct 25, 2016 at 3:07 PM, Stephan Hoyer wrote: > > On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith wrote: >> >> Concretely, what would you suggest should happen with: >> >> base = np.zeros(1) >> view = base[:10] >> >> # case 1 >> pickle.dump(view, file) >> >> # case 2 >> pickle.dump(base, file) >> pickle.dump(view, file) >> >> # case 3 >> pickle.dump(view, file) >> pickle.dump(base, file) >> >> ? > > I see what you're getting at here. We would need a rule for when to include the base in the pickle and when not to. Otherwise, pickle.dump(view, file) always contains data from the base pickle, even when view is much smaller than base. > > The safe answer is "only use views in the pickle when base is already being pickled", but that isn't possible to check unless all the arrays are together in a custom container. So, this isn't really feasible for NumPy. It would be possible with a custom Pickler/Unpickler since they already keep track of objects previously (un)pickled. That would handle [base, view] okay but not [view, base], so it's probably not going to be all that useful outside of special situations. It would make a neat recipe, but I probably would not provide it in numpy itself. -- Robert Kern
Re: [Numpy-discussion] Using library-specific headers
On Thu, Sep 29, 2016 at 6:27 PM, Pavlyk, Oleksandr < oleksandr.pav...@intel.com> wrote: > Related to numpy.random_intel, I noticed that the randomstate package, which extends numpy.random, was not being made a part of numpy, but rather published on PyPI as a stand-alone module. Does that mean that the community decided against including it in numpy's codebase? If so, I would appreciate if someone could elaborate on or point me to the reasoning behind that decision. No, we are just working out the API and the extensibility machinery in a separate package before committing to backwards compatibility. -- Robert Kern
Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg wrote: > > On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote: > > On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote: > > > Actually, on those names: an alternative to your proposal would be to introduce only one new method which can do all types of indexing, depending on a keyword argument, i.e., something like ``` def getitem(self, item, mode='outer'): ... ``` > > Have I been overthinking this, eh? Just making it `__getitem__(self, index, mode=...)` and then from `vindex` calling the subclass's `__getitem__(self, index, mode="vector")` or so would already solve the issue almost fully? Only thing I am not quite sure about: > > 1. Is `__getitem__` in some way special to make this difficult (also considering some new ideas like allowing object[a=4])? > OK; I think the C-side slot likely cannot get the kwarg, but probably you can find a solution for that Well, the solution is to use a different name, I think. -- Robert Kern
Re: [Numpy-discussion] Reading in a mesh file
On Thu, Sep 1, 2016 at 3:49 PM, Florian Lindner wrote:
>
> Hello,
>
> thanks for your reply which was really helpful!
>
> My problem is that I discovered that the data I got is rather unordered.
>
> The documentation for reshape says: Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest.
>
> With my data both dimensions change, so there is no specific ordering of the points, just a bunch of arbitrarily mixed "x y z value" data.
>
> My idea is:
>
> out = np.loadtxt(...)
> x = np.unique(out[:,0])
> y = np.unique(out[:,1])
> xx, yy = np.meshgrid(x, y)
>
> values = lookup(xx, yy, out)
>
> lookup is a ufunc (I hope that term is correct here) that looks up the value of every x and y in out, like
> x_filtered = out[ out[:,0] == x, :]
> y_filtered = out[ out[:,1] == y, :]
> return y_filtered[2]
>
> (untested, just a sketch)
>
> Would this work? Any better way?

If the (x, y) values are actually drawn from a rectilinear grid, then you can use np.lexsort() to sort the rows before reshaping.

[~/scratch]
|4> !cat random-mesh.txt
0.3 0.3 21
0 0 10
0 0.3 11
0.3 0.6 22
0 0.6 12
0.6 0.3 31
0.3 0 20
0.6 0.6 32
0.6 0 30

[~/scratch]
|5> scrambled_nodes = np.loadtxt('random-mesh.txt')

# Note! Put the "faster" column before the "slower" column!
[~/scratch]
|6> i = np.lexsort([scrambled_nodes[:, 1], scrambled_nodes[:, 0]])

[~/scratch]
|7> sorted_nodes = scrambled_nodes[i]

[~/scratch]
|8> sorted_nodes
array([[ 0. ,  0. , 10. ],
       [ 0. ,  0.3, 11. ],
       [ 0. ,  0.6, 12. ],
       [ 0.3,  0. , 20. ],
       [ 0.3,  0.3, 21. ],
       [ 0.3,  0.6, 22. ],
       [ 0.6,  0. , 30. ],
       [ 0.6,  0.3, 31. ],
       [ 0.6,  0.6, 32. ]])

Then carry on with the reshape()ing as before.
If the grid points that "ought to be the same" are not actually identical, then you may end up with some problems, e.g. if you had "0.3001 0.0 20.0" as a row, but all of the other "x=0.3" rows had "0.3", then that row would get sorted out of order. You would have to clean up the grid coordinates a bit first. -- Robert Kern
Re: [Numpy-discussion] Reading in a mesh file
On Wed, Aug 31, 2016 at 4:00 PM, Florian Lindner wrote:
>
> Hello,
>
> I have a mesh (more exactly: just a bunch of nodes) description with values associated to the nodes in a file, e.g. for a 3x3 mesh:
>
> 0 0 10
> 0 0.3 11
> 0 0.6 12
> 0.3 0 20
> 0.3 0.3 21
> 0.3 0.6 22
> 0.6 0 30
> 0.6 0.3 31
> 0.6 0.6 32
>
> What is the best way to read it in and get data structures like the ones I get from np.meshgrid?
>
> Of course, I know about np.loadtxt, but I'm having trouble getting the resulting arrays (x, y, values) in the right form and to retain association to the values.

For this particular case (known shape and ordering), this is what I would do. Maybe throw in a .T or three depending on exactly how you want them to be laid out.

[~/scratch]
|1> !cat mesh.txt
0 0 10
0 0.3 11
0 0.6 12
0.3 0 20
0.3 0.3 21
0.3 0.6 22
0.6 0 30
0.6 0.3 31
0.6 0.6 32

[~/scratch]
|2> nodes = np.loadtxt('mesh.txt')

[~/scratch]
|3> nodes
array([[ 0. ,  0. , 10. ],
       [ 0. ,  0.3, 11. ],
       [ 0. ,  0.6, 12. ],
       [ 0.3,  0. , 20. ],
       [ 0.3,  0.3, 21. ],
       [ 0.3,  0.6, 22. ],
       [ 0.6,  0. , 30. ],
       [ 0.6,  0.3, 31. ],
       [ 0.6,  0.6, 32. ]])

[~/scratch]
|4> reshaped = nodes.reshape((3, 3, -1))

[~/scratch]
|5> reshaped
array([[[ 0. ,  0. , 10. ],
        [ 0. ,  0.3, 11. ],
        [ 0. ,  0.6, 12. ]],

       [[ 0.3,  0. , 20. ],
        [ 0.3,  0.3, 21. ],
        [ 0.3,  0.6, 22. ]],

       [[ 0.6,  0. , 30. ],
        [ 0.6,  0.3, 31. ],
        [ 0.6,  0.6, 32. ]]])

[~/scratch]
|7> x = reshaped[..., 0]

[~/scratch]
|8> y = reshaped[..., 1]

[~/scratch]
|9> values = reshaped[..., 2]

[~/scratch]
|10> x
array([[ 0. ,  0. ,  0. ],
       [ 0.3,  0.3,  0.3],
       [ 0.6,  0.6,  0.6]])

[~/scratch]
|11> y
array([[ 0. ,  0.3,  0.6],
       [ 0. ,  0.3,  0.6],
       [ 0. ,  0.3,  0.6]])

[~/scratch]
|12> values
array([[ 10.,  11.,  12.],
       [ 20.,  21.,  22.],
       [ 30.,  31.,  32.]])

-- Robert Kern
Re: [Numpy-discussion] Include last element when subindexing numpy arrays?
On Wed, Aug 31, 2016 at 1:34 PM, Matti Viljamaa wrote: > > On 31 Aug 2016, at 15:22, Robert Kern wrote: > > On Wed, Aug 31, 2016 at 12:28 PM, Matti Viljamaa wrote: > > > > Is there a clean way to include the last element when subindexing numpy arrays? > > Since the default behaviour of numpy arrays is to omit the “stop index”. > > > > So for, > > > > >>> A > > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > >>> A[0:5] > > array([0, 1, 2, 3, 4]) > > A[5:] > > -- > Robert Kern > > No, that returns the subarray starting from index 5 to the end. > > What I want to be able to return is > > array([0, 1, 2, 3, 4, 5]) > > (i.e. last element 5 included) > > but without the funky A[0:6] syntax, which looks like it should return > > array([0, 1, 2, 3, 4, 5, 6]) > > but since numpy arrays omit the last index, returns > > array([0, 1, 2, 3, 4, 5]) > > which syntactically would be more reasonable to be A[0:5]. Ah, I see what you are asking now. The answer is "no"; this is just the way that slicing works in Python in general. numpy merely follows suit. It is something that you will get used to with practice. My sense of "funkiness" and "reasonableness" is the opposite of yours, for instance. -- Robert Kern
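For reference, the half-open convention has a couple of convenient identities that motivate it:

```python
import numpy as np

A = np.arange(10)

# A[start:stop] contains exactly stop - start elements...
first_six = A[0:6]        # the six elements 0 through 5

# ...and A[:k], A[k:] partition the array with no overlap and no gap.
head, tail = A[:5], A[5:]
```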
Re: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python
On Wed, Aug 31, 2016 at 12:28 PM, Michael Bieri wrote: > > Hi all > > There are several ways on how to use C/C++ code from Python with NumPy, as given in http://docs.scipy.org/doc/numpy/user/c-info.html . Furthermore, there's at least pybind11. > > I'm not quite sure which approach is state-of-the-art as of 2016. How would you do it if you had to make a C/C++ library available in Python right now? > > In my case, I have a C library with some scientific functions on matrices and vectors. You will typically call a few functions to configure the computation, then hand over some pointers to existing buffers containing vector data, then start the computation, and finally read back the data. The library also can use MPI to parallelize. I usually reach for Cython: http://cython.org/ http://docs.cython.org/en/latest/src/userguide/memoryviews.html -- Robert Kern
Re: [Numpy-discussion] Why np.fft.rfftfreq only returns up to Nyqvist?
On Wed, Aug 31, 2016 at 1:14 PM, Matti Viljamaa wrote: > > What’s the reasonability of np.fft.rfftfreq returning frequencies only up to Nyquist, rather than for the full sample rate? The answer to the question that you asked is that np.fft.rfft() computes values only for frequencies up to Nyquist, so np.fft.rfftfreq() must give you the frequencies to match. I'm not sure if there is another misunderstanding lurking that needs to be clarified. -- Robert Kern
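The correspondence is easy to check: np.fft.rfft() of an n-point real signal returns n//2 + 1 bins, and np.fft.rfftfreq() returns exactly one frequency per bin, ending at Nyquist:

```python
import numpy as np

n, d = 8, 0.125           # 8 samples, 0.125 s apart => 8 Hz sample rate
freqs = np.fft.rfftfreq(n, d)        # [0., 1., 2., 3., 4.]
spectrum = np.fft.rfft(np.zeros(n))  # n//2 + 1 = 5 complex bins

# Nyquist = 1 / (2 * d) = 4 Hz. Bins above it would be redundant for
# real input: they are the complex conjugates of the bins below.
```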
Re: [Numpy-discussion] Include last element when subindexing numpy arrays?
On Wed, Aug 31, 2016 at 12:28 PM, Matti Viljamaa wrote: > > Is there a clean way to include the last element when subindexing numpy arrays? > Since the default behaviour of numpy arrays is to omit the “stop index”. > > So for, > > >>> A > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > >>> A[0:5] > array([0, 1, 2, 3, 4]) A[5:] -- Robert Kern
Re: [Numpy-discussion] coordinate bounds
On Sat, Aug 20, 2016 at 9:16 PM, Alan Isaac wrote: > > Is there a numpy equivalent to Mma's CoordinateBounds command? > http://reference.wolfram.com/language/ref/CoordinateBounds.html The first signature can be computed like so: np.transpose([coords.min(axis=0), coords.max(axis=0)]) -- Robert Kern
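A worked example of that one-liner (the coordinates are made up):

```python
import numpy as np

coords = np.array([[0.0,  2.0,  5.0],
                   [1.0, -1.0,  3.0],
                   [4.0,  0.0, -2.0]])

# One {min, max} pair per axis, like Mma's CoordinateBounds[coords]:
bounds = np.transpose([coords.min(axis=0), coords.max(axis=0)])
# bounds -> [[0., 4.], [-1., 2.], [-2., 5.]]
```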
Re: [Numpy-discussion] Numpy set_printoptions, silent failure, bug?
On Tue, Jul 19, 2016 at 10:41 PM, John Ladasky wrote: > Should this be considered a Numpy bug, or is there some reason that set_printoptions would legitimately need to accept a dictionary as a single argument? There is no such reason. One could certainly add more validation to the arguments to np.set_printoptions(). -- Robert Kern
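For context, a sketch of the failure mode in question: all of np.set_printoptions()'s options are keyword arguments, so a dictionary passed positionally just lands in the first parameter (precision) and, in the NumPy of that era, was accepted without complaint.

```python
import numpy as np

# Correct usage: options are keyword arguments.
np.set_printoptions(precision=3, suppress=True)
s = repr(np.array([1 / 3, 2 / 3]))   # '0.333' and '0.667' appear

# The mistake from the thread: np.set_printoptions({'precision': 3})
# binds the dict to the positional `precision` parameter; the NumPy of
# that era accepted it silently, leaving the real options unchanged.
# Later releases added more validation of these arguments.
```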
Re: [Numpy-discussion] Design feedback solicitation
On Fri, Jul 15, 2016 at 2:53 AM, Pavlyk, Oleksandr < oleksandr.pav...@intel.com> wrote: > > Hi Robert, > > Thank you for the pointers. > > I think numpy.random should have a mechanism to choose between methods for generating the underlying randomness dynamically, at run-time, as well as an extensible framework, where developers could add more methods. The default would be MT19937 for backwards compatibility. It is important to be able to do this at run-time, as it would allow one to use different algorithms in different threads (like different members of the parallel Mersenne twister family of generators, see MT2203). > > The framework should allow one to define randomness as a bit stream, a stream of fixed size integers, or a stream of uniform reals (32 or 64 bits). This is a lot like MKL's abstract method for basic pseudo-random number generation. > > Each method should provide routines to sample from uniform distributions over reals (in floats and doubles), as well as over integers. > > All remaining non-uniform distributions build on top of these uniform streams. ng-numpy-randomstate does all of these. > I think it is pretty important to refactor numpy.random to allow the underlying generators to produce a given number of independent variates at a time. There could be convenience wrapper functions to allow one to get one variate for backwards compatibility, but this change in design would allow for better efficiency, as sampling a vector of random variates at once is often faster than repeated sampling of one at a time due to set-up cost, vectorization, etc. The underlying C implementation is an implementation detail, so the refactoring that you suggest has no backwards compatibility constraints. > Finally, methods to sample a particular distribution should uniformly support a method keyword argument. Because method names vary from distribution to distribution, it should ideally be programmatically discoverable which methods are supported for a given distribution.
For instance, the standard normal distribution could support method=’Inversion’, method=’Box-Muller’, method=’Ziggurat’, method=’Box-Muller-Marsaglia’ (the one used in numpy.random right now), as well as a bunch of unnamed methods based on the transformed rejection method (see http://statistik.wu-wien.ac.at/anuran/ ) That is one of the items under discussion. I personally prefer that one simply exposes named methods for each different scheme (e.g. ziggurat_normal(), etc.). > It would also be good if one could dynamically register a new method to sample from a non-uniform distribution. This would allow one, for instance, to automatically add methods to sample certain non-uniform distributions by directly calling into MKL (or other library), when available, instead of building them from uniforms (which may remain a fall-through method). > > The linked project is a good start, but the choice of the underlying algorithm needs to be made at run time, That's what happens. You instantiate the RandomState class that you want. > as far as I understood, and the only provided interface to query random variates is one at a time, just like it is currently the case > in numpy.random. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
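The discoverability point above can be sketched concretely. Nothing in the snippet is numpy API — the registry, `methods_for()`, and `sample()` are hypothetical names invented for illustration — but it shows how per-distribution method= support could be registered and queried, with Box-Muller written out as one concrete method next to a wrapper around the built-in sampler:

```python
import numpy as np

# Hypothetical registry of sampling methods, keyed by distribution name.
_registry = {}

def register(distribution, method):
    def deco(func):
        _registry.setdefault(distribution, {})[method] = func
        return func
    return deco

def methods_for(distribution):
    """Programmatically discover which method= values a distribution supports."""
    return sorted(_registry.get(distribution, {}))

@register("standard_normal", "box-muller")
def _normal_box_muller(rs, size):
    # Classic Box-Muller transform: two uniforms -> two independent normals.
    n = (size + 1) // 2
    u1 = rs.random_sample(n)
    u2 = rs.random_sample(n)
    r = np.sqrt(-2.0 * np.log(u1))
    z = np.concatenate([r * np.cos(2 * np.pi * u2),
                        r * np.sin(2 * np.pi * u2)])
    return z[:size]

@register("standard_normal", "builtin")
def _normal_builtin(rs, size):
    # Whatever algorithm the installed numpy uses internally.
    return rs.standard_normal(size)

def sample(distribution, size, method, rs=None):
    if rs is None:
        rs = np.random.RandomState()
    return _registry[distribution][method](rs, size)
```

Callers could then introspect before choosing: `methods_for("standard_normal")` returns `['box-muller', 'builtin']`, and `sample("standard_normal", 101, "box-muller", np.random.RandomState(0))` draws a vector of 101 variates at once, in line with the vectorized-generation point made earlier in the message.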
Re: [Numpy-discussion] Design feedback solicitation
On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr < oleksandr.pav...@intel.com> wrote: > > Hi, > > I am new to this list, so I will start with an introduction. My name is Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for Python, and previously worked at Wolfram Research for 12 years. My latest project was to write a mirror to numpy.random, named numpy.random_intel. The module uses MKL to sample from different distributions for efficiency. It provides support for different underlying algorithms for basic pseudo-random number generation, i.e. in addition to MT19937, it also provides SFMT19937, MT2203, etc. > > I recently published a blog about it: > > https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python > > I originally attempted to simply replace numpy.random in the Intel Distribution for Python with the new module, but due to fixed seed backwards incompatibility this results in numerous test failures in numpy, scipy, pandas and other modules. > > Unlike numpy.random, the new module generates a vector of random numbers at a time, which can be done faster than repeatedly generating the same number of variates one at a time. > > The source code for the new module is not upstreamed yet, and this email is meant to solicit early community feedback to allow for faster acceptance of the proposed changes. Cool! You can find pertinent discussion here: https://github.com/numpy/numpy/issues/6967 And the current effort for adding new core PRNGs here: https://github.com/bashtage/ng-numpy-randomstate -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Indexing with floats
On Fri, Jun 10, 2016 at 12:15 PM, Fabien wrote: > > Hi, > > I really tried to do my homework before asking this here, but I just couldn't find the relevant information anywhere... > > My question is about the rationale behind forbidding indexing with floats, i.e.: > > >>> x[2.] > __main__:1: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future > > I don't find this very handy from a user's perspective, and I'd be grateful for pointers on discussion threads and/or PRs where this has been discussed, so that I can understand why it's important. https://mail.scipy.org/pipermail/numpy-discussion/2012-December/064705.html https://github.com/numpy/numpy/issues/2810 https://github.com/numpy/numpy/pull/2891 https://github.com/numpy/numpy/pull/3243 https://mail.scipy.org/pipermail/numpy-discussion/2015-July/073125.html Note that the future is coming in the next numpy release: https://github.com/numpy/numpy/pull/6271 -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
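For readers hitting this transition, a small sketch of the behavior and the explicit fix. The exact outcome of a float index depends on the numpy version (a VisibleDeprecationWarning in the releases discussed here, an exception later), so the example hedges on the exception type:

```python
import numpy as np

x = np.arange(10) ** 2

# Integer indices are fine:
assert x[2] == 4

# A float index either warns (older numpy) or raises (newer numpy):
try:
    x[2.0]
except (IndexError, TypeError):
    pass  # non-integer index rejected

# If a float arrives from a computation, convert explicitly:
i = 10 / 5            # true division yields a float
assert x[int(i)] == 4
```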
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Mon, May 23, 2016 at 5:41 PM, Chris Barker wrote: > > On Sun, May 22, 2016 at 2:35 AM, Robert Kern wrote: >> >> Well, I mean, engineers want lots of things. I suspect that most engineers *really* just want to call `numpy.random.seed(8675309)` at the start and never explicitly pass around separate streams. There's an upside to that in terms of code simplicity. There are also significant limitations and constraints. Ultimately, the upside against the alternative of passing around RandomState objects is usually outweighed by the limitations, so best practice is to pass around RandomState objects. > > Could we do something like the logging module, and have numpy.random "manage" a bunch of stream objects for you -- so you could get the default single stream easily, and also get access to specific streams without needing to pass around the objects? No, I don't think so. The logging module's namespacing doesn't really have an equivalent use case for PRNGs. We would just be making a much more complicated global state to manage. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 7:56 PM, Nathaniel Smith wrote: > > On Wed, May 18, 2016 at 5:07 AM, Robert Kern wrote: > > On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote: > >> ...anyway, the real reason I'm a bit grumpy is because there are solid > >> engineering reasons why users *want* this API, > > > > I remain unconvinced on this mark. Grumpily. > > Sorry for getting grumpy :-). And my apologies for some unwarranted hyperbole. I think we're both converging on a reasonable approach, though. > The engineering reasons seem pretty > obvious to me though? Well, I mean, engineers want lots of things. I suspect that most engineers *really* just want to call `numpy.random.seed(8675309)` at the start and never explicitly pass around separate streams. There's an upside to that in terms of code simplicity. There are also significant limitations and constraints. Ultimately, the upside against the alternative of passing around RandomState objects is usually outweighed by the limitations, so best practice is to pass around RandomState objects. I acknowledge that there exists an upside to the splitting API, but I don't think it's a groundbreaking improvement over the alternative current best practice. It's also unclear to me how often situations that really demonstrate the upside come into play; in my experience a lot of these situations are already structured such that preallocating N streams is the natural thing to do. The limitations and constraints are currently underexplored, IMO; and in this conservative field, pessimism is warranted. > If you have any use case for independent streams > at all, and you're writing code that's intended to live inside a > library's abstraction barrier, then you need some way to choose your > streams to avoid colliding with arbitrary other code that the end-user > might assemble alongside yours as part of their final program. 
So > AFAICT you have two options: either you need a "tree-style" API for > allocating these streams, or else you need to add some explicit API to > your library that lets the end-user control in detail which streams > you use. Both are possible, but the latter is obviously undesirable > if you can avoid it, since it breaks the abstraction barrier, making > your library more complicated to use and harder to evolve. ACK > >> so whether or not it > >> turns out to be possible I think we should at least be allowed to have > >> a discussion about whether there's some way to give it to them. > > > > I'm not shutting down discussion of the option. I *implemented* the option. > > I think that discussing whether it should be part of the main API is > > premature. There probably ought to be a paper or three out there supporting > > its safety and utility first. Let the utility function version flourish > > first. > > OK -- I guess this particularly makes sense given how > extra-tightly-constrained we currently are in fixing mistakes in > np.random. But I feel like in the end the right place for this really > is inside the RandomState interface, because the person implementing > RandomState is the one best placed to understand (a) the gnarly > technical details here, and (b) how those change depending on the > particular PRNG in use. I don't want to end up with a bunch of > subtly-buggy utility functions in non-specialist libraries like dask > -- so we should be trying to help downstream users figure out how to > actually get this into np.random? I think this is an open research area. An enterprising grad student could milk this for a couple of papers analyzing how to do this safely for a variety of PRNGs. I don't think we can hash this out in an email thread or PR. So yeah, eventually there might be an API on RandomState for this, but it's way too premature to do so right now, IMO. Maybe start with a specialized subclass of RandomState that adds this experimental API. 
In ng-numpy-randomstate. ;-) But if someone has spare time to work on numpy.random, for God's sake, use it to review @gfyoung's PRs instead. > >> It's > >> not even 100% out of the question that we conclude that existing PRNGs > >> are buggy because they don't take this use case into account -- it > >> would be far from the first time that numpy found itself going beyond > >> the limits of older numerical tools that weren't designed to build the > >> kind of large composable systems that numpy gets used for. > >> > >> MT19937's state space is large enough that you could explicitly encode > >> a "tree seed" into it, even if you don't trust the laws of probability > >> -- e.g., you start with a Rando
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 6:20 PM, wrote: > > On Wed, May 18, 2016 at 12:01 PM, Robert Kern wrote: >> >> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote: >> >> >> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid >> >> > engineering reasons why users *want* this API, >> > >> > Honestly, I am lost in the math -- but like any good engineer, I want to accomplish something anyway :-) I trust you guys to get this right -- or at least document what's "wrong" with it. >> > >> > But, if I'm reading the use case that started all this correctly, it closely matches my use-case. That is, I have a complex model with multiple independent "random" processes. And we want to be able to reproduce simulations EXACTLY -- our users get confused when the results are "different" even if in a statistically insignificant way. >> > >> > At the moment we are using one RNG, with one seed for everything. So we get reproducible results, but if one thing is changed, then the entire simulation is different -- which is OK, but it would be nicer to have each process using its own RNG stream with its own seed. However, it matters not one whit if those seeds are independent -- the processes are different, you'd never notice if they were using the same PRN stream -- because they are used differently. So a "fairly low probability of a clash" would be totally fine. >> >> Well, the main question is: do you need to be able to spawn dependent streams at arbitrary points to an arbitrary depth without coordination between processes? The necessity for multiple independent streams per se is not contentious. > > I'm similar to Chris, and didn't try to figure out the details of what you are talking about. > > However, if there are functions getting into numpy that help in using a best practice even if it's not bullet proof, then it's still better than home made approaches. > If it gets in soon, then we can use it in a few years (given dependency lag). 
At that point there should be more distributed, nested simulation based algorithms where we don't know in advance how far we have to go to get reliable numbers or convergence. > > (But I don't see anything like that right now.) Current best practice is to use PRNGs with settable streams (or fixed jumpahead for those PRNGs cursed to not have settable streams but blessed to have super-long periods). The way to get those into numpy is to help Kevin Sheppard finish: https://github.com/bashtage/ng-numpy-randomstate He's done nearly all of the hard work already. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote: >> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid >> > engineering reasons why users *want* this API, > > Honestly, I am lost in the math -- but like any good engineer, I want to accomplish something anyway :-) I trust you guys to get this right -- or at least document what's "wrong" with it. > > But, if I'm reading the use case that started all this correctly, it closely matches my use-case. That is, I have a complex model with multiple independent "random" processes. And we want to be able to reproduce simulations EXACTLY -- our users get confused when the results are "different" even if in a statistically insignificant way. > > At the moment we are using one RNG, with one seed for everything. So we get reproducible results, but if one thing is changed, then the entire simulation is different -- which is OK, but it would be nicer to have each process using its own RNG stream with its own seed. However, it matters not one whit if those seeds are independent -- the processes are different, you'd never notice if they were using the same PRN stream -- because they are used differently. So a "fairly low probability of a clash" would be totally fine. Well, the main question is: do you need to be able to spawn dependent streams at arbitrary points to an arbitrary depth without coordination between processes? The necessity for multiple independent streams per se is not contentious. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote: > > On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote: > > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote: > >> > >> On May 17, 2016 1:50 AM, "Robert Kern" wrote: > >> > > >> [...] > >> > What you want is a function that returns many RandomState objects that > >> > are hopefully spread around the MT19937 space enough that they are > >> > essentially independent (in the absence of true jumpahead). The better > >> > implementation of such a function would look something like this: > >> > > >> > def spread_out_prngs(n, root_prng=None): > >> > if root_prng is None: > >> > root_prng = np.random > >> > elif not isinstance(root_prng, np.random.RandomState): > >> > root_prng = np.random.RandomState(root_prng) > >> > sprouted_prngs = [] > >> > for i in range(n): > >> > seed_array = root_prng.randint(1<<32, size=624) # > >> > dtype=np.uint32 under 1.11 > >> > sprouted_prngs.append(np.random.RandomState(seed_array)) > >> > return sprouted_prngs > >> > >> Maybe a nice way to encapsulate this in the RandomState interface would be > >> a method RandomState.random_state() that generates and returns a new child > >> RandomState. > > > > I disagree. This is a workaround in the absence of proper jumpahead or > > guaranteed-independent streams. I would not encourage it. > > > >> > Internally, this generates seed arrays of about the size of the MT19937 > >> > state to make sure that you can access more of the state space. That will at > >> > least make the chance of collision tiny. And it can be easily rewritten to > >> > take advantage of one of the newer PRNGs that have true independent streams: > >> > > >> > https://github.com/bashtage/ng-numpy-randomstate > >> > >> ... 
But unfortunately I'm not sure how to make my interface suggestion > >> above work on top of one of these RNGs, because for RandomState.random_state > >> you really want a tree of independent RNGs and the fancy new PRNGs only > >> provide a single flat namespace :-/. And even more annoyingly, the tree API > >> is actually a nicer API, because with a flat namespace you have to know up > >> front about all possible RNGs your code will use, which is an unfortunate > >> global coupling that makes it difficult to compose programs out of > >> independent pieces, while the RandomState.random_state approach composes > >> beautifully. Maybe there's some clever way to allocate a 64-bit namespace to > >> make it look tree-like? I'm not sure 64 bits is really enough... > > > > MT19937 doesn't have a "tree" any more than the others. It's the same flat > > state space. You are just getting the illusion of a tree by hoping that you > > never collide. You ought to think about precisely the same global coupling > > issues with MT19937 as you do with guaranteed-independent streams. > > Hope-and-prayer isn't really a substitute for properly engineering your > > problem. It's just a moral hazard to promote this method to the main API. > > Nonsense. > > If your definition of "hope and prayer" includes assuming that we > won't encounter a random collision in a 2**19937 state space, then > literally all engineering is hope-and-prayer. A collision could > happen, but if it does it's overwhelmingly more likely to happen > because of a flaw in the mathematical analysis, or a bug in the > implementation, or because random quantum fluctuations caused you and > your program to suddenly be transported to a parallel world where 1 + > 1 = 1, than that you just got unlucky with your random state. And all > of these hazards apply equally to both MT19937 and more modern PRNGs. Granted. 
> ...anyway, the real reason I'm a bit grumpy is because there are solid > engineering reasons why users *want* this API, I remain unconvinced on this mark. Grumpily. > so whether or not it > turns out to be possible I think we should at least be allowed to have > a discussion about whether there's some way to give it to them. I'm not shutting down discussion of the option. I *implemented* the option. I think that discussing whether it should be part of the main API is premature. There probably ought to be a paper or three out there supporting its safety and utility first. Let the utility function version flourish first. > It
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote: > > On May 17, 2016 1:50 AM, "Robert Kern" wrote: > > > [...] > > What you want is a function that returns many RandomState objects that are hopefully spread around the MT19937 space enough that they are essentially independent (in the absence of true jumpahead). The better implementation of such a function would look something like this: > > > > def spread_out_prngs(n, root_prng=None): > > if root_prng is None: > > root_prng = np.random > > elif not isinstance(root_prng, np.random.RandomState): > > root_prng = np.random.RandomState(root_prng) > > sprouted_prngs = [] > > for i in range(n): > > seed_array = root_prng.randint(1<<32, size=624) # dtype=np.uint32 under 1.11 > > sprouted_prngs.append(np.random.RandomState(seed_array)) > > return sprouted_prngs > > Maybe a nice way to encapsulate this in the RandomState interface would be a method RandomState.random_state() that generates and returns a new child RandomState. I disagree. This is a workaround in the absence of proper jumpahead or guaranteed-independent streams. I would not encourage it. > > Internally, this generates seed arrays of about the size of the MT19937 state to make sure that you can access more of the state space. That will at least make the chance of collision tiny. And it can be easily rewritten to take advantage of one of the newer PRNGs that have true independent streams: > > > > https://github.com/bashtage/ng-numpy-randomstate > > ... But unfortunately I'm not sure how to make my interface suggestion above work on top of one of these RNGs, because for RandomState.random_state you really want a tree of independent RNGs and the fancy new PRNGs only provide a single flat namespace :-/. 
And even more annoyingly, the tree API is actually a nicer API, because with a flat namespace you have to know up front about all possible RNGs your code will use, which is an unfortunate global coupling that makes it difficult to compose programs out of independent pieces, while the RandomState.random_state approach composes beautifully. Maybe there's some clever way to allocate a 64-bit namespace to make it look tree-like? I'm not sure 64 bits is really enough... MT19937 doesn't have a "tree" any more than the others. It's the same flat state space. You are just getting the illusion of a tree by hoping that you never collide. You ought to think about precisely the same global coupling issues with MT19937 as you do with guaranteed-independent streams. Hope-and-prayer isn't really a substitute for properly engineering your problem. It's just a moral hazard to promote this method to the main API. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Tue, May 17, 2016 at 2:40 PM, Sturla Molden wrote: > > Stephan Hoyer wrote: > > I have recently encountered several use cases for randomly generating random > > number seeds: > > > > 1. When writing a library of stochastic functions that take a seed as an > > input argument, and some of these functions call multiple other such > > stochastic functions. Dask is one such example [1]. > > > > 2. When a library needs to produce results that are reproducible after > > calling numpy.random.seed, but that do not want to use the functions in > > numpy.random directly. This came up recently in a pandas pull request [2], > > because we want to allow using RandomState objects as an alternative to > > global state in numpy.random. A major advantage of this approach is that it > > provides an obvious alternative to reusing the private numpy.random._mtrand > > [3]. > > What about making numpy.random a finite state machine, and keeping a stack > of RandomState seeds? That is, something similar to what OpenGL does for > its matrices? Then we get two functions, numpy.random.push_seed and > numpy.random.pop_seed. I don't think that addresses the issues brought up here. It's just more global state to worry about. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Tue, May 17, 2016 at 9:09 AM, Stephan Hoyer wrote: > > On Tue, May 17, 2016 at 12:18 AM, Robert Kern wrote: >> >> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer wrote: >> > 1. When writing a library of stochastic functions that take a seed as an input argument, and some of these functions call multiple other such stochastic functions. Dask is one such example [1]. >> >> Can you clarify the use case here? I don't really know what you are doing here, but I'm pretty sure this is not the right approach. > > Here's a contrived example. Suppose I've written a simulator for cars that consists of a number of loosely connected components (e.g., an engine, brakes, etc.). The behavior of each component of our simulator is stochastic, but we want everything to be fully reproducible, so we need to use seeds or RandomState objects. > > We might write our simulate_car function like the following: > > def simulate_car(engine_config, brakes_config, seed=None): > rs = np.random.RandomState(seed) > engine = simulate_engine(engine_config, seed=rs.random_seed()) > brakes = simulate_brakes(brakes_config, seed=rs.random_seed()) > ... > > The problem with passing the same RandomState object (either explicitly or dropping the seed argument entirely and using the global state) to both simulate_engine and simulate_brakes is that it breaks encapsulation -- if I change what I do inside simulate_engine, it also affects the brakes. That's a little too contrived, IMO. In most such simulations, the different components interact with each other in the normal course of the simulation; that's why they are both joined together in the same simulation instead of being two separate runs. Unless if the components are being run across a process or thread boundary (a la dask below) where true nondeterminism comes into play, then I don't think you want these semi-independent streams. This seems to be the advice du jour from the agent-based modeling community. 
> The dask use case is actually pretty different -- the intent is to create many random numbers in parallel using multiple threads or processes (possibly in a distributed fashion). I know that skipping ahead is the standard way to get independent number streams for parallel sampling, but that isn't exposed in numpy.random, and setting distinct seeds seems like a reasonable alternative for scientific computing use cases. Forget about integer seeds. Those are for human convenience. If you're not jotting them down in your lab notebook in pen, you don't want an integer seed. What you want is a function that returns many RandomState objects that are hopefully spread around the MT19937 space enough that they are essentially independent (in the absence of true jumpahead). The better implementation of such a function would look something like this:

def spread_out_prngs(n, root_prng=None):
    if root_prng is None:
        root_prng = np.random
    elif not isinstance(root_prng, np.random.RandomState):
        root_prng = np.random.RandomState(root_prng)
    sprouted_prngs = []
    for i in range(n):
        seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs

Internally, this generates seed arrays of about the size of the MT19937 state to make sure that you can access more of the state space. That will at least make the chance of collision tiny. And it can be easily rewritten to take advantage of one of the newer PRNGs that have true independent streams: https://github.com/bashtage/ng-numpy-randomstate -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
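As a quick sanity check of the sketch in the message above, here is a self-contained version (the helper is repeated, with its return-statement typo fixed, so the snippet runs on its own) together with a check that the sprouted streams actually produce different output:

```python
import numpy as np

def spread_out_prngs(n, root_prng=None):
    # Seed each child with a 624-word array, about the size of the
    # MT19937 state, so more of the state space is reachable.
    if root_prng is None:
        root_prng = np.random
    elif not isinstance(root_prng, np.random.RandomState):
        root_prng = np.random.RandomState(root_prng)
    sprouted_prngs = []
    for i in range(n):
        seed_array = root_prng.randint(1 << 32, size=624)
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs

streams = spread_out_prngs(3, root_prng=12345)
draws = [rs.random_sample(4).tolist() for rs in streams]
# With overwhelming probability the streams differ:
assert draws[0] != draws[1] != draws[2]
```

Note this only makes collisions unlikely, not impossible — which is exactly the hope-versus-guarantee distinction being argued in this thread.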
Re: [Numpy-discussion] Proposal: numpy.random.random_seed
On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer wrote: > > I have recently encountered several use cases for randomly generating random number seeds: > > 1. When writing a library of stochastic functions that take a seed as an input argument, and some of these functions call multiple other such stochastic functions. Dask is one such example [1]. Can you clarify the use case here? I don't really know what you are doing here, but I'm pretty sure this is not the right approach. > 2. When a library needs to produce results that are reproducible after calling numpy.random.seed, but that do not want to use the functions in numpy.random directly. This came up recently in a pandas pull request [2], because we want to allow using RandomState objects as an alternative to global state in numpy.random. A major advantage of this approach is that it provides an obvious alternative to reusing the private numpy.random._mtrand [3]. It's only pseudo-private. This is an authorized use of it. However, for this case, I usually just pass around the numpy.random module itself and let duck-typing take care of the rest. > [3] On a side note, if there's no longer a good reason to keep this object private, perhaps we should expose it in our public API. It would certainly be useful -- scikit-learn is already using it (see links in the pandas PR above). Adding a public get_global_random_state() function might be in order. Originally, I wanted there to be *some* barrier to entry, but just grabbing it to use as a default RandomState object is definitely an intended use of it. It's not going to disappear. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
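The duck-typing pattern mentioned in the reply above — passing either the numpy.random module or a RandomState instance, since both expose the same sampling methods — looks like this in miniature (`jittered` is an invented example function, not numpy API):

```python
import numpy as np

def jittered(values, rs=np.random):
    # Duck-typing: `rs` may be the numpy.random module itself (global
    # state) or a RandomState instance; both provide .normal().
    values = np.asarray(values, dtype=float)
    return values + rs.normal(scale=0.01, size=values.shape)

# Global stream by default; explicit, reproducible stream on request:
out = jittered([1.0, 2.0], rs=np.random.RandomState(0))
assert out.shape == (2,)
```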
Re: [Numpy-discussion] Floor divison on int returns float
On Wed, Apr 13, 2016 at 3:17 AM, Antony Lee wrote: > > This kind of issue (see also https://github.com/numpy/numpy/issues/3511) has become more annoying now that indexing requires integers (indexing with a float raises a VisibleDeprecationWarning). The argument "dividing a uint by an int may give a result that does not fit in a uint nor in an int" does not sound very convincing to me, It shouldn't because that's not the rule that numpy follows. The range of the result is never considered. Both *inputs* are cast to the same type that can represent the full range of either input type (for that matter, the actual *values* of the inputs are also never considered). In the case of uint64 and int64, there is no really good common type (the integer hierarchy has to top out somewhere), but float64 merely loses resolution rather than cutting off half of the range of uint64. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
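A quick illustration of the promotion rule described in the reply: the inputs are promoted to a common type before the operation, and uint64 mixed with int64 falls back to float64.

```python
import numpy as np

u = np.uint64(7)
i = np.int64(2)

# Neither int64 nor uint64 can represent the other's full range, so
# numpy promotes the *inputs* to float64 before dividing:
q = u // i
assert q.dtype == np.dtype('float64')
assert q == 3.0

# By contrast, two like-signed 64-bit integers stay integral:
assert (np.int64(7) // np.int64(2)).dtype == np.dtype('int64')
```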
Re: [Numpy-discussion] mtrand.c update 1.11 breaks my crappy code
On Wed, Apr 6, 2016 at 4:17 PM, Neal Becker wrote: > I prefer to use a single instance of a RandomState so that there are > guarantees about the independence of streams generated from python random > functions, and from my c++ code. True, there are simpler approaches - but > I'm a purist. Consider using PRNGs that actually expose truly independent streams instead of a single shared stream: https://github.com/bashtage/ng-numpy-randomstate -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] mtrand.c update 1.11 breaks my crappy code
On Wed, Apr 6, 2016 at 2:18 PM, Neal Becker wrote: > > I have C++ code that tries to share the mtrand state. It unfortunately > depends on the layout of RandomState which used to be: > > struct __pyx_obj_6mtrand_RandomState { > PyObject_HEAD > rk_state *internal_state; > PyObject *lock; > }; > > But with 1.11 it's: > struct __pyx_obj_6mtrand_RandomState { > PyObject_HEAD > struct __pyx_vtabstruct_6mtrand_RandomState *__pyx_vtab; > rk_state *internal_state; > PyObject *lock; > PyObject *state_address; > }; > > So > 1. Why the change? > 2. How can I write portable code? There is no C API to RandomState at this time, stable, portable or otherwise. It's all private implementation detail. If you would like a stable and portable C API for RandomState, you will need to contribute one using PyCapsules to expose the underlying rk_state* pointer. https://docs.python.org/2.7/c-api/capsule.html -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On Fri, Feb 19, 2016 at 12:10 PM, Andrew Nelson wrote: > > With respect to geomspace proposals: instead of specifying start and end values and the number of points I'd like to have an option where I can set the start and end points and the ratio. The function would then work out the correct number of points to get closest to the end value. > > E.g. geomspace(start=1, finish=2, ratio=1.03) > > The first entries would be 1.0, 1.03, 1*1.03**2, etc. > > I have a requirement for the correct ratio between the points, and it's a right bind having to calculate the exact number of points needed. At the risk of extending the twisty little maze of names, all alike, I would probably call a function with this signature geomrange() instead. It is more akin to arange(start, stop, step) than linspace(start, stop, num_steps). -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
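A minimal sketch of such a function — here called `geomrange()`, picking up the name suggested in the reply; neither the name nor the signature is real numpy API — that derives the number of points from the ratio by rounding in log space:

```python
import math

def geomrange(start, stop, ratio):
    """Geometric progression from start toward stop with a fixed ratio.

    Hypothetical helper (not numpy API): the number of points is chosen
    so that the last point lands closest to `stop`, per the proposal.
    """
    if start <= 0 or stop <= 0 or ratio <= 0 or ratio == 1:
        raise ValueError("start, stop, and ratio must be positive; ratio != 1")
    # Exponent whose endpoint start * ratio**n is nearest stop
    # (rounding in log space == nearest in ratio terms):
    n = round(math.log(stop / start, ratio))
    return [start * ratio ** k for k in range(max(n, 0) + 1)]

pts = geomrange(1.0, 2.0, 1.03)
# First entries are exactly 1.0, 1.03, 1.03**2, ... as requested above.
```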
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On Thu, Feb 18, 2016 at 10:19 PM, Alan Isaac wrote: > > On 2/18/2016 2:44 PM, Robert Kern wrote: >> >> In a new function not named `linspace()`, I think that might be fine. I do occasionally want to swap between linear and logarithmic/geometric spacing based on a parameter, so this >> doesn't violate the van Rossum Rule of Function Signatures. > > Would such a new function correct the apparent mistake (?) of > `linspace` including the endpoint by default? > Or is the current API justified by its Matlab origins? > (Or have I missed the point altogether?) The last, I'm afraid. Different use cases, different conventions. Integer ranges are half-open because that is the most useful convention in a 0-indexed ecosystem. Floating point ranges don't interface with indexing, and the closed intervals are the most useful (or at least the most common). > If this query is annoying, please ignore it. It is not meant to be. The same for my answer. -- Robert Kern
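The two conventions being contrasted can be seen side by side in a trivial sketch:

```python
import numpy as np

# Floating-point ranges: closed interval, both endpoints included by default.
pts = np.linspace(0.0, 1.0, 5)
assert pts[0] == 0.0 and pts[-1] == 1.0   # [0.0, 0.25, 0.5, 0.75, 1.0]

# Integer ranges: half-open, so they compose with 0-based indexing.
idx = np.arange(0, 5)
assert list(idx) == [0, 1, 2, 3, 4]       # stop value excluded

x = np.zeros(5)
assert len(x[idx]) == len(x)  # every emitted index is valid, none repeated
```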
Re: [Numpy-discussion] proposal: new logspace without the log in the argument
On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith wrote: > > Some questions it'd be good to get feedback on: > > - any better ideas for naming it than "geomspace"? It's really too bad > that the 'logspace' name is already taken. geomspace() is a perfectly cromulent name, IMO. > - I guess the alternative interface might be something like > > np.linspace(start, stop, steps, spacing="log") > > what do people think? In a new function not named `linspace()`, I think that might be fine. I do occasionally want to swap between linear and logarithmic/geometric spacing based on a parameter, so this doesn't violate the van Rossum Rule of Function Signatures. -- Robert Kern
Re: [Numpy-discussion] making "low" optional in numpy.randint
He was talking consistently about "random integers" not "random_integers()". :-) On Wednesday, 17 February 2016, G Young wrote: > Your statement is a little self-contradictory, but in any case, you > shouldn't worry about random_integers getting removed from the code-base. > However, it has been deprecated in favor of randint. > > On Wed, Feb 17, 2016 at 11:48 PM, Juan Nunez-Iglesias > wrote: > >> Also fwiw, I think the 0-based, half-open interval is one of the best >> features of Python indexing and yes, I do use random integers to index into >> my arrays and would not appreciate having to litter my code with "-1" >> everywhere. >> >> On Thu, Feb 18, 2016 at 10:29 AM, Alan Isaac > > wrote: >> >>> On 2/17/2016 3:42 PM, Robert Kern wrote: >>> >>>> random.randint() was the one big exception, and it was considered a >>>> mistake for that very reason, soft-deprecated in favor of >>>> random.randrange(). >>>> >>> >>> >>> randrange also has its detractors: >>> https://code.activestate.com/lists/python-dev/138358/ >>> and following. >>> >>> I think if we start citing persistant conventions, the >>> persistent convention across *many* languages that the bounds >>> provided for a random integer range are inclusive also counts for >>> something, especially when the names are essentially shared. >>> >>> But again, I am just trying to be clear about what is at issue, >>> not push for a change. I think citing non-existent standards >>> is not helpful. I think the discrepancy between the Python >>> standard library and numpy for a function going by a common >>> name is harmful. (But then, I teach.) 
>>> >>> fwiw, >>> >>> Alan -- Robert Kern
Re: [Numpy-discussion] making "low" optional in numpy.randint
On Wed, Feb 17, 2016 at 8:43 PM, G Young wrote: > Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them, then you leave them as the defaults, and everyone is happy. I believe that Josef has the code's reader in mind, not the code's writer. As a reader of other people's code (and I count 6-months-ago-me as one such "other people"), I am sure to eventually encounter all of the different variants, so I will need to know all of them. -- Robert Kern
Re: [Numpy-discussion] making "low" optional in numpy.randint
On Wed, Feb 17, 2016 at 8:30 PM, Alan Isaac wrote: > > On 2/17/2016 12:28 PM, G Young wrote: >> >> Perhaps, but we are not coding in Haskell. We are coding in Python, and >> the standard is that the endpoint is excluded, which renders your point >> moot I'm afraid. > > I am not sure what "standard" you are talking about. > I thought we were talking about the user interface. It is a persistent and consistent convention (i.e. "standard") across Python APIs that deal with integer ranges (range(), slice(), random.randrange(), ...), particularly those that end up related to indexing; e.g. `x[np.random.randint(0, len(x))]` to pull a random sample from an array. random.randint() was the one big exception, and it was considered a mistake for that very reason, soft-deprecated in favor of random.randrange(). -- Robert Kern
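The indexing idiom in question, sketched concretely (using the legacy RandomState interface under discussion):

```python
import random
import numpy as np

x = list("abcdef")
rng = np.random.RandomState(0)   # legacy interface under discussion

# Half-open [0, len(x)) yields exactly the set of legal indices:
i = rng.randint(0, len(x))       # numpy: high is excluded
j = random.randrange(len(x))     # stdlib: same half-open convention
assert 0 <= i < len(x) and 0 <= j < len(x)

# The stdlib's random.randint is the closed-interval outlier:
k = random.randint(0, len(x) - 1)  # needs the awkward "-1"
assert 0 <= k < len(x)
```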
Re: [Numpy-discussion] making "low" optional in numpy.randint
On Wed, Feb 17, 2016 at 4:40 PM, Alan Isaac wrote: > > Behavior of random integer generation: > Python randint[a,b] > MATLAB randi [a,b] > Mma RandomInteger [a,b] > haskell randomR [a,b] > GAUSS rndi[a,b] > Maple rand[a,b] > > In short, NumPy's `randint` is non-standard (and, > I would add, non-intuitive). Presumably this was > due to relying on a float draw from [0,1) along > with the use of floor. No, never was. It is implemented so because Python uses semi-open integer intervals by preference because it plays most nicely with 0-based indexing. Not sure about all of those systems, but some at least are 1-based indexing, so closed intervals do make sense. The Python stdlib's random.randint() closed interval is considered a mistake by python-dev leading to the implementation and preference for random.randrange() instead. > The divergence in behavior between the (later) Python > function of the same name is particularly unfortunate. Indeed, but unfortunately, this mistake dates way back to Numeric times, and easing the migration to numpy was a priority in the heady days of numpy 1.0. > So I suggest further work on this function is > not called for, and use of `random_integers` > should be encouraged. Probably NumPy's `randint` > should be deprecated. Not while I'm here. Instead, `random_integers()` is discouraged and perhaps might eventually be deprecated. -- Robert Kern
Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster
On Mon, Feb 15, 2016 at 4:24 PM, Jeff Reback wrote: > > just an FYI. > > pandas implemented a RangeIndex in upcoming 0.18.0, mainly for memory savings, > see here, similar to how python range/xrange work. > > though there are substantial perf benefits, mainly with set operations, see here > though didn't officially benchmark these. Since it is a numpy-aware object (unlike the builtins), you can (and have, if I'm reading the code correctly) implement __array__() such that it does the correctly performant thing and call np.arange(). RangeIndex won't be adversely impacted by retaining the status quo. -- Robert Kern
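The mechanism Robert describes can be sketched with a hypothetical range-like class (RangeLike is illustrative, not pandas's actual implementation): implementing __array__() lets np.asarray() materialize the values through the fast np.arange() path instead of iterating element by element.

```python
import numpy as np

class RangeLike:
    """Hypothetical lazily-stored range that is numpy-aware."""

    def __init__(self, start, stop, step=1):
        self.start, self.stop, self.step = start, stop, step

    def __array__(self, dtype=None, copy=None):
        # np.asarray() calls this hook, so conversion goes through
        # the vectorized arange() path rather than generic iteration.
        return np.arange(self.start, self.stop, self.step, dtype=dtype)

r = RangeLike(0, 10, 2)
assert np.asarray(r).tolist() == [0, 2, 4, 6, 8]
```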
Re: [Numpy-discussion] Hook in __init__.py to let distributors patch numpy
I would add a numpy/_distributor_init.py module and unconditionally import it in the __init__.py. It's contents in our upstream sources would just be a docstring: """Distributors! Put your initialization code here! """ One important technical benefit is that the unconditional import won't hide ImportErrors in the distributor's code. On Fri, Feb 12, 2016 at 1:19 AM, Matthew Brett wrote: > Hi, > > Over at https://github.com/numpy/numpy/issues/5479 we're discussing > Windows wheels. > > On thing that we would like to be able to ship Windows wheels, is to > be able to put some custom checks into numpy when you build the > wheels. > > Specifically, for Windows, we're building on top of ATLAS BLAS / > LAPACK, and we need to check that the system on which the wheel is > running, has SSE2 instructions, otherwise we know ATLAS will crash > (almost everybody does have SSE2 these days). > > The way I propose we do that, is this patch here: > > https://github.com/numpy/numpy/pull/7231 > > diff --git a/numpy/__init__.py b/numpy/__init__.py > index 0fcd509..ba3ba16 100644 > --- a/numpy/__init__.py > +++ b/numpy/__init__.py > @@ -190,6 +190,12 @@ def pkgload(*packages, **options): > test = testing.nosetester._numpy_tester().test > bench = testing.nosetester._numpy_tester().bench > > +# Allow platform-specific build to intervene in numpy init > +try: > +from . import _distributor_init > +except ImportError: > +pass > + > from . import core > from .core import * > from . import compat > > So, numpy __init__.py looks for a module `_distributor_init`, in which > the distributor might have put custom code to do any checks and > initialization needed for the particular platform. We don't by > default ship a `_distributor_init.py` but leave it up to packagers to > generate this when building binaries. > > Does that sound like a sensible approach to y'all? 
> > Cheers, > > Matthew -- Robert Kern
Re: [Numpy-discussion] Behavior of np.random.uniform
On Tue, Jan 19, 2016 at 5:35 PM, Sebastian Berg wrote: > > On Di, 2016-01-19 at 16:28 +, G Young wrote: > > In rand range, it raises an exception if low >= high. > > > > I should also add that AFAIK enforcing low >= high with floats is a > > lot trickier than it is for integers. I have been knee-deep in > > corner cases for some time with randint where numbers that are > > visually different are cast as the same number by numpy due to > > rounding and representation issues. That situation only gets worse > > with floats. > > > > Well, actually random.uniform docstring says: > > Get a random number in the range [a, b) or [a, b] depending on > rounding. Which docstring are you looking at? The current one says [low, high) http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.uniform.html#numpy.random.uniform -- Robert Kern
Re: [Numpy-discussion] Behavior of np.random.uniform
On Thu, Jan 21, 2016 at 7:06 AM, Jaime Fernández del Río < jaime.f...@gmail.com> wrote: > > There doesn't seem to be much of a consensus on the way to go, so leaving things as they are and have been seems the wisest choice for now, thanks for all the feedback. I will work with Greg on documenting the status quo properly. Ugh. Be careful in documenting the way things currently work. No one intended for it to work that way! No one should rely on high being excluded.
Re: [Numpy-discussion] Behavior of np.random.uniform
On Tue, Jan 19, 2016 at 5:40 PM, Charles R Harris wrote: > > On Tue, Jan 19, 2016 at 10:36 AM, Robert Kern wrote: >> >> On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris < charlesr.har...@gmail.com> wrote: >> > >> >> > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal < chris.bar...@noaa.gov> wrote: >> >> >> >> What does the standard lib do for rand range? I see that randint Is closed on both ends, so order doesn't matter, though if it raises for b < a >> > randint is not closed on the high end. The now deprecated random_integers is the function that does that. >> > >> > For floats, it's good to have various interval options. For instance, in generating numbers that will be inverted or have their log taken it is good to avoid zero. However, the names 'low' and 'high' are misleading... >> >> They are correctly leading the users to the manner in which the author intended the function to be used. The *implementation* is misleading by allowing users to do things contrary to the documented intent. ;-) >> >> With floating point and general intervals, there is not really a good way to guarantee that the generated results avoid the "open" end of the specified interval or even stay *within* that interval. This function is definitely not intended to be used as `uniform(closed_end, open_end)`. > > Well, it is possible to make that happen if one is careful or directly sets the bits in ieee types... For the unit interval, certainly. For general bounds, I am not so sure. -- Robert Kern
Re: [Numpy-discussion] Behavior of np.random.uniform
On Tue, Jan 19, 2016 at 5:36 PM, Robert Kern wrote: > > On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris < charlesr.har...@gmail.com> wrote: > > > > > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal < chris.bar...@noaa.gov> wrote: > >> > >> What does the standard lib do for rand range? I see that randint Is closed on both ends, so order doesn't matter, though if it raises for b < a > > > randint is not closed on the high end. The now deprecated random_integers is the function that does that. > > > > For floats, it's good to have various interval options. For instance, in generating numbers that will be inverted or have their log taken it is good to avoid zero. However, the names 'low' and 'high' are misleading... > > They are correctly leading the users to the manner in which the author intended the function to be used. The *implementation* is misleading by allowing users to do things contrary to the documented intent. ;-) > > With floating point and general intervals, there is not really a good way to guarantee that the generated results avoid the "open" end of the specified interval or even stay *within* that interval. This function is definitely not intended to be used as `uniform(closed_end, open_end)`. There are special cases that *can* be implemented and are worth doing so as they are building blocks for other distributions that do need to avoid 0 or 1 as you say. Full-featured RNG suites do offer these: [0, 1] [0, 1) (0, 1] (0, 1) -- Robert Kern
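The four unit-interval variants Robert lists can be sketched on top of a [0, 1) source (illustrative only, using the modern Generator API for brevity; a real implementation would manipulate the raw bits for efficiency rather than post-process draws):

```python
import numpy as np

rng = np.random.default_rng(0)   # any generator with a native [0, 1) draw

def u_closed_open(n):    # [0, 1): the native convention
    return rng.random(n)

def u_open_closed(n):    # (0, 1]: reflect [0, 1)
    return 1.0 - rng.random(n)

def u_open_open(n):      # (0, 1): rejection-sample away exact zeros
    u = rng.random(n)
    while (u == 0.0).any():
        u[u == 0.0] = rng.random(int((u == 0.0).sum()))
    return u

def u_closed_closed(n):  # [0, 1]: scale a 53-bit integer draw
    return rng.integers(0, 2**53, n) / float(2**53 - 1)
```

The (0, 1] and (0, 1) variants are the ones that matter as building blocks for distributions that take a log or a reciprocal of the draw.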
Re: [Numpy-discussion] Behavior of np.random.uniform
On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris wrote: > > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal < chris.bar...@noaa.gov> wrote: >> >> What does the standard lib do for rand range? I see that randint Is closed on both ends, so order doesn't matter, though if it raises for b < a > randint is not closed on the high end. The now deprecated random_integers is the function that does that. > > For floats, it's good to have various interval options. For instance, in generating numbers that will be inverted or have their log taken it is good to avoid zero. However, the names 'low' and 'high' are misleading... They are correctly leading the users to the manner in which the author intended the function to be used. The *implementation* is misleading by allowing users to do things contrary to the documented intent. ;-) With floating point and general intervals, there is not really a good way to guarantee that the generated results avoid the "open" end of the specified interval or even stay *within* that interval. This function is definitely not intended to be used as `uniform(closed_end, open_end)`. -- Robert Kern
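The rounding hazard Robert describes is easy to exhibit deterministically: scaling a legal [0, 1) draw to [low, high) with the usual transform can land exactly on high.

```python
# u is the largest double strictly less than 1.0, i.e. a legal [0, 1) draw.
u = 1.0 - 2.0**-53
assert u < 1.0

low, high = 1.0, 2.0
x = low + (high - low) * u   # the usual transform for uniform(low, high)
assert x == high             # 2 - 2**-53 rounds (half-to-even) up to 2.0
```

So even though u itself never equals 1, the transformed value can equal the nominally excluded endpoint, which is why the docstring's half-open guarantee cannot hold for general bounds.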
Re: [Numpy-discussion] Software Capabilities of NumPy in Our Tensor Survey Paper
On Fri, Jan 15, 2016 at 5:30 PM, Nathaniel Smith wrote: > > On Jan 15, 2016 8:36 AM, "Li Jiajia" wrote: > > > > Hi all, > > I’m a PhD student in Georgia Tech. Recently, we’re working on a survey paper about tensor algorithms: basic tensor operations, tensor decomposition and some tensor applications. We are making a table to compare the capabilities of different software and planning to include NumPy. We’d like to make sure these parameters are correct to make a fair compare. Although we have looked into the related documents, please help us to confirm these. Besides, if you think there are more features of your software and a more preferred citation, please let us know. We’ll consider to update them. We want to show NumPy supports tensors, and we also include "scikit-tensor” in our survey, which is based on NumPy. > > Please let me know any confusion or any advice! > > Thanks a lot! :-) > > > > Notice: > > 1. “YES/NO” to show whether or not the software supports the operation or has the feature. > > 2. “?” means we’re not sure of the feature, and please help us out. > > 3. “Tensor order” means the maximum number of tensor dimensions that users can do with this software. > > 4. For computational cores, > > 1) "Element-wise Tensor Operation (A * B)” includes element-wise add/minus/multiply/divide, also Kronecker, outer and Katri-Rao products. If the software contains one of them, we mark “YES”. > > 2) “TTM” means tensor-times-matrix multiplication. We distinguish TTM from tensor contraction. If the software includes tensor contraction, it can also support TTM. > > 3) For “MTTKRP”, we know most software can realize it through the above two operations. We mark it “YES”, only if an specified optimization for the whole operation. 
> > NumPy has support for working with multidimensional tensors, if you like, but it doesn't really use the tensor language and notation (preferring instead to think in terms of "arrays" as a somewhat more computationally focused and less mathematically focused conceptual framework). > > Which is to say that I actually have no idea what all those jargon terms you're asking about mean :-) I am suspicious that NumPy supports more of those operations than you have marked, just under different names/notation, but really can't tell either way for sure without knowing what exactly they are. In particular check if your operations can be expressed with einsum() http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.einsum.html -- Robert Kern
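For instance, the TTM (tensor-times-matrix) product and general contractions from the survey's table can both be written with einsum(); the index strings are the only "notation" involved (a sketch with arbitrary small shapes):

```python
import numpy as np

T = np.random.rand(3, 4, 5)   # an order-3 tensor
M = np.random.rand(4, 6)

# TTM: contract mode 1 of T against M (tensor-times-matrix)
ttm = np.einsum('ijk,jl->ilk', T, M)
assert ttm.shape == (3, 6, 5)

# Outer product of two vectors
a, b = np.random.rand(3), np.random.rand(4)
outer = np.einsum('i,j->ij', a, b)
assert outer.shape == (3, 4)

# General contraction: sum over all shared indices
c = np.einsum('ijk,kj->i', T, np.random.rand(5, 4))
assert c.shape == (3,)
```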
Re: [Numpy-discussion] Get rid of special scalar arithmetic
On Wed, Jan 13, 2016 at 5:18 AM, Charles R Harris wrote: > > Hi All, > > I've opened issue #7002, reproduced below, for discussion. >> >> Numpy umath has a file scalarmath.c.src that implements scalar arithmetic using special functions that are about 10x faster than the equivalent ufuncs. >> >> In [1]: a = np.float64(1) >> >> In [2]: timeit a*a >> 1000 loops, best of 3: 69.5 ns per loop >> >> In [3]: timeit np.multiply(a, a) >> 100 loops, best of 3: 722 ns per loop >> >> I contend that in large programs this improvement in execution time is not worth the complexity and maintenance overhead; it is unlikely that scalar-scalar arithmetic is a significant part of their execution time. Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic. This would also bring the benefits of __numpy_ufunc__ to scalars with minimal effort. > > Thoughts? Not all important-to-optimize programs are large in our field; interactive use is rampant. The scalar optimizations weren't added speculatively: people noticed that their Numeric code ran much slower under numpy and were reluctant to migrate. I was forever responding on comp.lang.python, "It's because scalar arithmetic hasn't been optimized yet. We know how to do it, we just need a volunteer to do the work. Contributions gratefully accepted!" The most critical areas tended to be optimization where you are often working with implicit scalars that pop out in the optimization loop. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate random.random_integers
On Sun, Jan 3, 2016 at 11:51 PM, G Young wrote: > > Hello all, > > In light of the discussion in #6910, I have gone ahead and deprecated random_integers in my most recent PR here. As this is an API change (sort of), what are people's thoughts on this deprecation? I'm reasonably in favor. random_integers() with its closed-interval convention only exists because it existed in Numeric's RandomArray module. The closed-interval convention has broadly been considered to be a mistake introduced early in the stdlib random module and rectified with the introduction and promotion of random.randrange() instead. -- Robert Kern
Re: [Numpy-discussion] Numpy funding update
On Wed, Dec 30, 2015 at 10:54 AM, Ralf Gommers wrote: > > Hi all, > > A quick good news message: OSDC has made a $5k contribution to NumFOCUS, which is split between support for a women in technology workshop and support for Numpy: http://www.numfocus.org/blog/osdc-donates-5k-to-support-numpy-women-in-tech > This was a very nice surprise to me, and a first sign that the FSA (fiscal sponsorship agreement) we recently signed with NumFOCUS is going to yield significant benefits for Numpy. > > NumFOCUS is also doing a special end-of-year fundraiser. Funds donated (up to $5k) will be tripled by anonymous sponsors: http://www.numfocus.org/blog/numfocus-end-of-year-fundraising-drive-5000-matching-gift-challenge > So think of Numpy (or your other favorite NumFOCUS-sponsored project of course) if you're considering a holiday season charitable gift! That sounds great! Do we have any concrete plans for spending that money, yet? -- Robert Kern
Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators
On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root wrote: > > Heh, never noticed that. Was it implemented more like a generator/iterator in older versions of Python? No, it predates generators and iterators so it has always had to be implemented like that. -- Robert Kern
Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators
On Mon, Dec 14, 2015 at 3:56 PM, Benjamin Root wrote: > By the way, any reason why this works? > >>> np.array(xrange(10)) > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) It's not a generator. It's a true sequence that just happens to have a special implementation rather than being a generic container. >>> len(xrange(10)) 10 >>> xrange(10)[5] 5 -- Robert Kern
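The same distinction holds for Python 3's range, which replaced xrange: it is a sized, indexable sequence, which is why np.array() can preallocate for it, while a generator advertises no length (a quick sketch):

```python
import collections.abc as cabc
import numpy as np

r = range(10)                 # true sequence: sized, indexable, reiterable
g = (i for i in range(10))    # generator: one-shot, no len()

assert isinstance(r, cabc.Sequence)
assert not isinstance(g, cabc.Sequence)
assert len(r) == 10 and r[5] == 5

assert np.array(r).shape == (10,)   # size known up front, built directly
```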
Re: [Numpy-discussion] reshaping array question
On Nov 17, 2015 6:53 PM, "Sebastian Berg" wrote: > > On Di, 2015-11-17 at 13:49 -0500, Neal Becker wrote: > > Robert Kern wrote: > > > > > On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker wrote: > > >> > > >> I have an array of shape > > >> (7, 24, 2, 1024) > > >> > > >> I'd like an array of > > >> (7, 24, 2048) > > >> > > >> such that the elements on the last dimension are interleaving the > > >> elements from the 3rd dimension > > >> > > >> [0,0,0,0] -> [0,0,0] > > >> [0,0,1,0] -> [0,0,1] > > >> [0,0,0,1] -> [0,0,2] > > >> [0,0,1,1] -> [0,0,3] > > >> ... > > >> > > >> What might be the simplest way to do this? > > > > > > np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,)) > > > > I get an error on that 1st transpose: > > > > Transpose needs a slightly different input. If you look at the help, it > should be clear. The help might also point to np.swapaxes, which may be > a bit more straight forward for this exact case. Sorry about that. Was in a rush and working from a faulty memory.
Re: [Numpy-discussion] reshaping array question
On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker wrote: > > I have an array of shape > (7, 24, 2, 1024) > > I'd like an array of > (7, 24, 2048) > > such that the elements on the last dimension are interleaving the elements > from the 3rd dimension > > [0,0,0,0] -> [0,0,0] > [0,0,1,0] -> [0,0,1] > [0,0,0,1] -> [0,0,2] > [0,0,1,1] -> [0,0,3] > ... > > What might be the simplest way to do this? np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,)) > > A different question, suppose I just want to stack them > > [0,0,0,0] -> [0,0,0] > [0,0,0,1] -> [0,0,1] > [0,0,0,2] -> [0,0,2] > ... > [0,0,1,0] -> [0,0,1024] > [0,0,1,1] -> [0,0,1025] > [0,0,1,2] -> [0,0,1026] > ... A.reshape(A.shape[:-2] + (-1,)) -- Robert Kern
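Per the correction in the follow-up (the two-element transpose call was an error), swapping the last two axes with np.swapaxes before the reshape gives the requested interleaving; this sketch checks both requested layouts:

```python
import numpy as np

A = np.arange(7 * 24 * 2 * 1024).reshape(7, 24, 2, 1024)

# Interleave: swap the last two axes, then flatten them together.
B = np.swapaxes(A, -2, -1).reshape(A.shape[:-2] + (-1,))
assert B.shape == (7, 24, 2048)
assert B[0, 0, 0] == A[0, 0, 0, 0]
assert B[0, 0, 1] == A[0, 0, 1, 0]
assert B[0, 0, 2] == A[0, 0, 0, 1]
assert B[0, 0, 3] == A[0, 0, 1, 1]

# Stack: reshape alone keeps each run of 1024 contiguous.
C = A.reshape(A.shape[:-2] + (-1,))
assert C[0, 0, 0] == A[0, 0, 0, 0]
assert C[0, 0, 1024] == A[0, 0, 1, 0]
```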
Re: [Numpy-discussion] Nansum function behavior
On Fri, Oct 23, 2015 at 5:45 PM, Charles Rilhac wrote: > > Hello, > > I noticed the change regarding nan function and especially nansum function. I think this choice is a big mistake. I know that Matlab and R have made this choice but it is illogical and counterintuitive. What change are you referring to? -- Robert Kern
Re: [Numpy-discussion] [Feature Suggestion]More comparison functions for floating point numbers
On Mon, Oct 19, 2015 at 9:51 PM, cy18 wrote: > > It would be useful when we need to subtract a bit before comparing by greater or less. By subtracting a bit, we only have an absolute error tolerance and with the new functions, we can have both absolute and relative error tolerance. This is how isclose(a, b) is better than abs(a-b)<=atol. You just adjust the value by whichever tolerance is greatest in magnitude. -- Robert Kern
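A sketch of what such a tolerant comparison might look like, mirroring the combined-tolerance model isclose() uses (the helper name and default tolerances here are hypothetical, not a proposed NumPy API):

```python
import numpy as np

def greater_close(a, b, rtol=1e-9, atol=1e-12):
    # Hypothetical helper: "a > b" with both a relative and an absolute
    # slack, the same shape of tolerance that isclose() combines.
    return a > b - (atol + rtol * np.abs(b))

assert greater_close(1.0, 1.0 + 1e-15)   # within tolerance: treated as >
assert greater_close(2.0, 1.0)           # clearly greater
assert not greater_close(1.0, 2.0)       # clearly smaller
```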
Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver
On Wed, Sep 30, 2015 at 10:35 AM, Matthieu Brucher < matthieu.bruc...@gmail.com> wrote: > > Yes, obviously, the code has NR parts, so it can't be licensed as BSD > as it is... It's not obvious to me, especially after Juha's further clarifications. -- Robert Kern
Re: [Numpy-discussion] interpretation of the draft governance document (was Re: Governance model request)
On Wed, Sep 23, 2015 at 11:12 PM, Travis Oliphant wrote: >> [We also have to decide on the initial membership for the Council. While the above text makes pains to distinguish between "committer" and "Council Member", in the past we've pretty much treated them as the same. So to keep things simple and deterministic, I propose that we seed the Council with everyone who has reviewed/merged a pull request since Jan 1, 2014, and move those who haven't used their commit bit in >1.5 years to the emeritus list. Based on the output of >> >>git log --grep="^Merge pull request" --since 2014-01-01 | grep Author: | sort -u >> >> I believe this would give us an initial Steering Council of: Sebastian Berg, Jaime Fernández del Río, Ralf Gommers, Alex Griffing, Charles Harris, Nathaniel Smith, Julian Taylor, and Pauli Virtanen (assuming everyone on that list is interested/willing to serve).] > > I would ask to be on this initial council by having the rule include "original contributors of the code" which would basically include Robert Kern and Pearu Peterson in addition to Chuck Harris (though Pearu's main contribution was numpy.distutils and f2py and he may not be interested any longer in the rest of it). Were I to want such a thing, I would simply review and merge one of the many languishing PRs. I have the power to add myself to this list without asking for special exemptions to the general rule, merely a token bit of effort on my part. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?
On Sat, Sep 19, 2015 at 2:26 PM, Antoine Pitrou wrote: > > On Mon, 14 Sep 2015 19:02:58 +0200 > Sturla Molden wrote: > > On 14/09/15 10:34, Antoine Pitrou wrote: > > > > > If Numpy wanted to switch to a different generator, and if Numba wanted > > > to remain compatible with Numpy, one of the PCG functions would be an > > > excellent choice (also for CPU performance, incidentally). > > > > Is Apache license ok in NumPy? > > While I don't know Numpy's policy precisely, I have no idea why a > modern non-copyleft license such as Apache wouldn't be ok. GPLv2 incompatibility: http://www.apache.org/licenses/GPL-compatibility.html > That said, PCG looks simple enough that you can reimplement it (or at > least the variants that are of interest to you) independently without > too much effort. The author has agreed in principle to relicense the code as MIT, though has not yet merged the PR that accomplishes this yet. That said, we'd probably end up doing a significant amount of rewriting so that we will have a C implementation of software-uint128 arithmetic. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
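For a sense of scale on Antoine's "simple enough to reimplement" point: the 32-bit PCG variant fits in a few lines. This pure-Python sketch follows the published XSH-RR 64/32 algorithm (pcg32), with seeding mirroring the C reference's pcg32_srandom; it is illustrative only, not the uint128-based variant a NumPy port would need.

```python
MULT = 6364136223846793005
MASK64 = (1 << 64) - 1

class PCG32:
    """Minimal pure-Python sketch of PCG XSH-RR 64/32 (O'Neill's pcg32)."""

    def __init__(self, seed, seq=0):
        self.inc = ((seq << 1) | 1) & MASK64   # stream selector, must be odd
        self.state = 0
        self.next32()
        self.state = (self.state + seed) & MASK64
        self.next32()

    def next32(self):
        old = self.state
        # LCG state transition on 64 bits
        self.state = (old * MULT + self.inc) & MASK64
        # Output permutation: xorshift-high, then a data-dependent rotation
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF
```

The two-argument seeding (seed, seq) is what gives independent streams from the same seed, one of the selling points in the PCG paper.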
Re: [Numpy-discussion] OK to upload patched 1.9.2 for Python 3.5?
On Mon, Sep 14, 2015 at 9:32 AM, Matthew Brett wrote: > > Hi, > > On Mon, Sep 14, 2015 at 1:22 AM, David Cournapeau wrote: > > > > > > On Mon, Sep 14, 2015 at 9:18 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> I'm just building numpy 1.9.2 for Python 3.5 (just released). > >> > >> In order to get the tests to pass on Python 3.5, I need to cherry pick > >> commit 7d6aa8c onto the 1.9.2 tag position. > >> > >> Does anyone object to me uploading a wheel built from this patched > >> version to pypi as 1.9.2 for Python 3.5 on OSX? It would help to get > >> the ball rolling for Python 3.5 binary wheels. > > > > > > Why not releasing this as 1.9.3 ? It does not need to be a full release > > (with binaries and all), but having multiple sources for a given tag is > > confusing. > > Generally OK with me, but it's quite a bit of extra work for very > little gain. We'd have to tag, release a source tarball and OSX > wheels, at least. I think it's highly desirable that we also have a *source* release that builds on Python 3.5, irrespective of whether or not we have binary wheels for a couple of platforms up for Python 3.5. So I would encourage a quick 1.9.3 release that incorporates this patch. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?
[Tim, ping me if you want to get dropped from the reply chain, as we are liable to get more into numpy decision-making. I've dropped python-ideas.] On Mon, Sep 14, 2015 at 4:34 AM, Tim Peters wrote: > > [Robert Kern ] > >> ... > >> I'll also recommend the PCG paper (and algorithm) as the author's > >> cross-PRNGs comparisons have been bandied about in this thread already. The > >> paper lays out a lot of the relevant issues and balances the various > >> qualities that are important: statistical quality, speed, and security (of > >> various flavors). > >> > >> http://www.pcg-random.org/paper.html > >> > >> I'm actually not that impressed with Random123. The core idea is nice and > >> clean, but the implementation is hideously complex. > > [Nathaniel Smith ] > > I'm curious what you mean by this? In terms of the code, or...? In terms of the code that would be necessary to implement this in a simultaneously performant and cross-platform way in numpy.random. > > I haven't looked at the code, I recommend it! Approach it with the practical goal of "how do I stick this in as a new core PRNG for numpy.random?" > > but the simplest generator they > > recommend in the paper is literally just > > > > def rng_stream(seed): > > counter = 0 > > while True: > > # AES128 takes a 128 bit key and 128 bits of data and returns > > 128 bits of encrypted data > > val = AES128(key=seed, data=counter) > > yield low_64_bits(val) > > yield high_64_bits(val) > > counter += 1 > > I assume it's because if you expand what's required to _implement_ > AES128() in C, it does indeed look pretty hideously complex. On HW > implementing AES primitives, of course the code can be much simpler. > > But to be fair, if integer multiplication and/or addition weren't > implemented in HW, and we had to write to C code emulating them via > bit-level fiddling, the code for any of the PCG algorithms would look > hideously complex too. 
> > But being fair isn't much fun ;-) Actually, I meant all of the crap *around* it, the platform-compatibility testing to see if you have such a hardware instruction or not, and C++ template shenanigans in the surrounding code. It's possible that the complexity is only due to flexibility, but it was too complex for me to begin understanding *why* it's so complex before I succumbed to ennui and moved on to some other productive use of my free time. At least some of the complexity is due to needing software implementations of reduced-round crypto for decent performance in the absence of the hardware instruction. Performing well in the absence of the hardware instruction is very important to me as I do not seem to have the AES-NI instruction available on my mid-2012 Macbook Pro. Exposing counter-mode AES128 as a core PRNG is a nice idea, but it's just low on my wishlist. I want fast, multiple independent streams on my current hardware first, and PCG gives that to me. On the "if I had to implement all of the bit-fiddling myself" count, I'd still prefer PCG. It has a similar problem with 128-bit integers that may or may not be natively provided (should you want a 128-bit state), and I have to say that the software implementation of that bit-fiddling is much nicer to work with than reduced-round Threefry. Its reference implementation is also overly complex for practical use as it is showing off the whole parameterized PCG family to support the paper, even the explicitly unrecommended members. But it's relatively straightforward to disentangle the desirable members; their implementations, including the supporting 128-bit arithmetic code, are small and clean. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
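[Editor's note: the counter-mode construction quoted in this thread can be made runnable. The sketch below keeps the same structure but substitutes SHA-256 for AES128 purely so it runs with the standard library; the security and performance properties of the real AES128 version do not transfer to this stand-in.]

```python
import hashlib

def rng_stream(seed: bytes):
    """Counter-mode PRNG sketch: hash (standing in for 'encrypt') an
    incrementing counter under a fixed key, emitting 64 bits at a time."""
    counter = 0
    while True:
        block = hashlib.sha256(seed + counter.to_bytes(16, "little")).digest()
        yield int.from_bytes(block[:8], "little")    # low 64 bits
        yield int.from_bytes(block[8:16], "little")  # next 64 bits
        counter += 1
```

The same seed always reproduces the same stream, and distinct counters give independent-looking blocks, which is the property that makes the construction attractive for parallel streams.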
Re: [Numpy-discussion] strange casting rules
On Wed, Jul 29, 2015 at 1:07 PM, Neal Becker wrote: > > np.uint64(-1)+0 > Out[36]: 1.8446744073709552e+19 > > I often work on signal processing requiring bit-exact integral arithmetic. > Promoting to float is not helpful - I don't understand the logic of the > above example. See this thread: http://mail.scipy.org/pipermail/numpy-discussion/2015-July/073196.html Cast your 0 to a uint64 or other unsigned int type to avoid this. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
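[Editor's note: the suggested cast can be shown concretely. The float64 promotion described in this thread is the value-based casting behavior of NumPy at the time (it changed with NEP 50 in NumPy 2.0), so the sketch only asserts the all-uint64 case, which holds in all versions.]

```python
import numpy as np

# The value that np.uint64(-1) wrapped to under the old casting rules:
a = np.uint64(2**64 - 1)

# Mixing a uint64 scalar with a plain Python int promoted the result to
# float64 under the old value-based rules, losing bit-exactness.
# Keeping both operands uint64 stays in integer arithmetic:
b = a + np.uint64(0)
print(b.dtype)  # uint64
```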
Re: [Numpy-discussion] difference between dtypes
On Fri, Jul 24, 2015 at 10:05 AM, wrote: > > On Fri, Jul 24, 2015 at 3:46 AM, Robert Kern wrote: >> >> On Wed, Jul 22, 2015 at 7:45 PM, wrote: >> > >> > Is there an explanation somewhere of what different basic dtypes mean, across platforms and python versions? >> > >> > >>> np.bool8 >> > >> > >>> np.bool_ >> > >> > >>> bool >> > >> > >> > >> > Are there any rules and recommendations or is it all folks lore? >> >> This may help a little: >> >> http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing >> >> Basically, we accept the builtin Python type objects as a dtype argument and do something sensible with them. float -> np.float64 because Python floats are C doubles. int -> np.int32 or np.int64 depending on whatever a C long is (i.e. depending on the 64bitness of your CPU and how your OS chooses to deal with that). We encode those precision choices as aliases to the corresponding specific numpy scalar types (underscored as necessary to avoid shadowing builtins of the same name): np.float_ is np.float64, for example. >> >> See here for why the aliases to Python builtin types, np.int, np.float, etc. still exist: >> >> https://github.com/numpy/numpy/pull/6103#issuecomment-123652497 >> >> If you just need to pass a dtype= argument and want the precision that matches the "native" integer and float for your platform, then I prefer to use the Python builtin types instead of the underscored aliases; they just look cleaner. If you need a true numpy scalar type (e.g. to construct a numpy scalar object), of course, you must use one of the numpy scalar types, and the underscored aliases are convenient for that. Never use the aliases to the Python builtin types. > > (I don't have time to follow up on this for at least two weeks) > > my thinking was that, if there is no actual difference between bool, np.bool and np.bool_, the np.bool could become an alias and a replacement for np.bool_, so we can get rid of a "ugly" trailing underscore. 
> If np.float is always float64 it could be mapped to that directly. Well, I'll tell you why that's a bad idea when you get back in two weeks. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] difference between dtypes
On Wed, Jul 22, 2015 at 7:45 PM, wrote: > > Is there an explanation somewhere of what different basic dtypes mean, across platforms and python versions? > > >>> np.bool8 > > >>> np.bool_ > > >>> bool > > > > Are there any rules and recommendations or is it all folks lore? This may help a little: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing Basically, we accept the builtin Python type objects as a dtype argument and do something sensible with them. float -> np.float64 because Python floats are C doubles. int -> np.int32 or np.int64 depending on whatever a C long is (i.e. depending on the 64bitness of your CPU and how your OS chooses to deal with that). We encode those precision choices as aliases to the corresponding specific numpy scalar types (underscored as necessary to avoid shadowing builtins of the same name): np.float_ is np.float64, for example. See here for why the aliases to Python builtin types, np.int, np.float, etc. still exist: https://github.com/numpy/numpy/pull/6103#issuecomment-123652497 If you just need to pass a dtype= argument and want the precision that matches the "native" integer and float for your platform, then I prefer to use the Python builtin types instead of the underscored aliases; they just look cleaner. If you need a true numpy scalar type (e.g. to construct a numpy scalar object), of course, you must use one of the numpy scalar types, and the underscored aliases are convenient for that. Never use the aliases to the Python builtin types. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
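[Editor's note: the builtin-type-to-dtype mapping described above can be checked directly. A small sketch — the exact mapping for int is platform-dependent, so it is only printed, not asserted:]

```python
import numpy as np

# Builtin Python type objects are accepted as dtype arguments:
assert np.dtype(float) == np.float64   # Python floats are C doubles
assert np.dtype(bool).itemsize == 1    # one-byte boolean

# int maps to whatever a C long is on this platform, so just inspect it:
print(np.dtype(int))  # e.g. int64 on most 64-bit Linux/macOS platforms
```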
Re: [Numpy-discussion] Rationale for integer promotion rules
On Thu, Jul 16, 2015 at 7:14 PM, Nathaniel Smith wrote: > > On Thu, Jul 16, 2015 at 9:18 AM, Antoine Pitrou wrote: > > while adding int8 and uint8 will give int16 as result > > (promoting to the smallest fitting type). > > I understand this to be a consequence of the previous rule (results > should match inputs) combined with the need to find a common input > type. Specifically, when combining signed and unsigned ints, we need to find a signed int type that can simultaneously hold both of the *inputs*. Whether the *output* would fit or overflow is not what the rules are concerned with. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
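[Editor's note: the int8/uint8 case can be verified directly — int16 is the smallest signed type whose range contains both the int8 inputs (down to -128) and the uint8 inputs (up to 255):]

```python
import numpy as np

a = np.array([-128], dtype=np.int8)
b = np.array([255], dtype=np.uint8)

# int16 is chosen because it holds both *input* ranges; whether the
# *output* of the addition fits is not what the rules are concerned with.
c = a + b
print(c.dtype)  # int16
```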
Re: [Numpy-discussion] dimension independent copy of corner of array
newarr[tuple(slice(0, i) for i in oldarray.shape)] = oldarray On Mon, Jul 13, 2015 at 12:34 PM, Neal Becker wrote: > I want to copy an array to the corner of a new array. What is a dimension > independent way to say: > > newarr[:i0,:i1,...] = oldarray > > where (i0,i1...) is oldarray.shape? > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
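[Editor's note: expanded, the one-liner above works like this:]

```python
import numpy as np

oldarray = np.arange(6).reshape(2, 3)
newarr = np.zeros((4, 5), dtype=oldarray.dtype)

# Build one slice(0, i) per dimension -- here (slice(0, 2), slice(0, 3)) --
# which is the dimension-independent spelling of newarr[:2, :3] = oldarray.
newarr[tuple(slice(0, i) for i in oldarray.shape)] = oldarray
```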
Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
On Sun, May 24, 2015 at 7:56 PM, Sturla Molden wrote: > > On 24/05/15 20:04, Nathaniel Smith wrote: > > > I'm not sure what you're envisioning as needing a deprecation cycle? The > > neat thing about random is that we already have a way for users to say > > that they want replicability -- the use of an explicit seed -- > > No, this is not sufficient for random numbers. Random sampling and > ziggurat generators are examples. If we introduce a change (e.g. a > bugfix) that will affect the number of calls to the entropy source, just > setting the seed will in general not be enough to ensure backwards > compatibility. That is e.g. the case with using ziggurat samplers > instead of the current transcendental transforms for normal, exponential > and gamma distributions. While ziggurat is faster (and to my knowledge) > more accurate, it will also make a different number of calls to the > entropy source, and hence the whole sequence will be affected, even if > you do set a random seed. Please reread the proposal at the top of the thread. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState
On Sun, May 24, 2015 at 7:46 PM, Sturla Molden wrote: > > On 24/05/15 17:13, Anne Archibald wrote: > > Do we want a deprecation-like approach, so that eventually people who > > want replicability will specify versions, and everyone else gets bug > > fixes and improvements? This would presumably take several major > > versions, but it might avoid people getting unintentionally trapped on > > this version. > > > > Incidentally, bug fixes are complicated: if a bug fix uses more or fewer > > raw random numbers, it breaks repeatability not just for the call that > > got fixed but for all successive random number generations. > > If a function has a bug, changing it will change the output of the > function. This is not special for random numbers. If not retaining the > old erroneous output means we break-backwards compatibility, then no > bugs can ever be fixed, anywhere in NumPy. I think we need to clarify > what we mean by backwards compatibility for random numbers. What > guarantees should we make from one version to another? The policy thus far has been that we will fix bugs in the distributions and make changes that allow a strictly wider domain of distribution parameters (e.g. allowing b==0 where before we only allowed b>0), but we will not make other enhancements that would change existing good output. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] binary wheels for numpy?
On Sun, May 17, 2015 at 7:50 PM, Matthew Brett wrote: > > On Sun, May 17, 2015 at 8:22 AM, Sturla Molden wrote: > > Matthew Brett wrote: > > > >> Yes, unfortunately we can't put MKL binaries on pypi because of the > >> MKL license - see > > > > I believe we can, because we asked Intel for permission. From what I heard > > the response was positive. > > We would need something formal from Intel saying that they do not > require us to hold our users to their standard redistribution terms > and that they waive the requirement that we be responsible for any > damage to Intel that happens as a result of people using our binaries. > > I'm guessing we don't have this, but I'm happy to be corrected, I don't think permission from Intel is the blocking issue for putting these binaries up on PyPI. Even with Intel's permission, we would be putting up proprietary binaries on a page that is explicitly claiming that the files linked therein are BSD-licensed. The binaries could not be redistributed with any GPLed module, say, pygsl. We could host them on numpy.org on their own page that clearly explained the license of those files, but I think PyPI is out. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance of numpy.array()
On Wed, Apr 29, 2015 at 4:05 PM, simona bellavista wrote: > > I work on two distinct scientific clusters. I have run the same python code on the two clusters and I have noticed that one is faster by an order of magnitude than the other (1min vs 10min, this is important because I run this function many times). > > I have investigated with a profiler and I have found that the cause of this is that (same code and same data) is the function numpy.array that is being called 10^5 times. On cluster A it takes 2 s in total, whereas on cluster B it takes ~6 min. For what regards the other functions, they are generally faster on cluster A. I understand that the clusters are quite different, both as hardware and installed libraries. It strikes me that on this particular function the performance is so different. I would have thought that this is due to a difference in the available memory, but actually by looking with `top` the memory seems to be used only at 0.1% on cluster B. In theory numpy is compiled with atlas on cluster B, and on cluster A it is not clear, because numpy.__config__.show() returns NOT AVAILABLE for anything. > > Does anybody have any insight on that, and if I can improve the performance on cluster B? Check to see if you have the "Transparent Hugepages" (THP) Linux kernel feature enabled on each cluster. You may want to try turning it off. I have recently run into a problem with a large-memory multicore machine with THP for programs that had many large numpy.array() memory allocations. Usually, THP helps memory-hungry applications (you can Google for the reasons), but it does require defragmenting the memory space to get contiguous hugepages. The system can get into a state where the memory space is so fragmented that trying to get each new hugepage requires a lot of extra work to create the contiguous memory regions. In my case, a perfectly well-performing program would suddenly slow down immensely during its memory-allocation-intensive actions. 
When I turned THP off, it started working normally again. If you have root, try using `perf top` to see what C functions in user space and kernel space are taking up the most time in your process. If you see anything like `do_page_fault()`, this, or a similar issue, is your problem. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
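[Editor's note: on Linux, the current THP mode can be read from sysfs. A small helper along these lines — the path below is the usual location, but it may differ or be absent on your kernel — lets you check the setting before reaching for `perf top`:]

```python
def thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Return the sysfs THP setting string, e.g. '[always] madvise never',
    or None when the file is unavailable (non-Linux, or THP not built in)."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

print(thp_mode())
```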
Re: [Numpy-discussion] Non-meta indexing improvements discussion
On Thu, Apr 9, 2015 at 8:07 AM, Sebastian Berg wrote: > > On Do, 2015-04-09 at 08:50 +0200, Sebastian Berg wrote: > > 3. I do not know if it possible or useful, but I could imagine a module > > wide switch (similar to __future__ imports) to change the default > > indexing behaviour. > > OK, my suggestion But actually I do not know if I like it all that > much (nor if it can be done) since 1. and 2. seem to me like enough. But > if someone feels strongly about fancy indexing being bad, I wanted to > point out that there may be ways to go down the road to "switch" numpy > without actually switching. I can't think of a way to actually make that work (I can list all the ways I thought of that *don't* work if anyone insists, but it's a tedious dead end), but it also seems to me to be a step backward. Assuming that we have both .ortho_ix and .fancy_ix to work with, it seems to me that the explicitness is a good thing. Even if in this module you only want to exploit one of those semantics, your module's readers live in a wider context where both semantics play a role. Moving your marker of "this is the non-default semantics I am using" to some module or function header away from where the semantics are actually used makes the code harder to read. A newish user trying to read some nontrivial indexing code may come to the list and ask "what exactly is this expression doing here?" and give us just the line of code with the expression (anecdotally, this is usually how this scenario goes down). If we have to answer "it depends; is there an @ortho_indexing decorator at the top of the function?", that's probably a cure worse than the disease. The properties are a good way to provide googleable signposts right where the tricky semantics are being used. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal)
On Wed, Apr 8, 2015 at 8:40 PM, Ralf Gommers wrote: > To address in detail the list of Matthew you mention above: > > * implement orthogonal indexing as a method arr.sensible_index[...] > That's basically Jaime's PR. > > * implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...] > Not that harmful, but only makes sense in combination with the next steps. Well, since we got the peanut butter in our meta, I might as well join in here. I think this step is useful even without the deprecation steps. First, allow me to rename things in a less judgy fashion: * arr.ortho_ix is a property that allows for orthogonal indexing, a la Jaime's PR. * arr.fancy_ix is a property that implements the current numpy-standard fancy indexing. Even though arr.fancy_ix "only" replicates the default semantics, having it opens up some possibilities. Other array-like objects can implement both of these with the same names. Thus, to write generic code that exploits the particular behaviors of one of these semantics, you just make sure to use the property that behaves the way you want. Then you don't care what the default syntax does on that object. You don't have to test if an object has arr.__orthogonal_indexing__ and have two different code paths; you just use the property that behaves the way you want. It also allows us to issue warnings when the default indexing syntax is used for some of the behaviors that are weird corner cases, like [1, :, array]. This is one of those corner cases where the behavior is probably not what anyone actually *wants*; it was just the only thing we could do that is consistent with the desired semantics of the rest of the cases. I think it would be reasonable to issue a warning if the default indexing syntax was used with it. It's probably a sign that the user thought that indexing worked like orthogonal indexing. 
The warning would *not* be issued if the arr.fancy_ix property was used, since that is an explicit signal that the user is specifically requesting a particular set of behaviors. I probably won't want to ever *deprecate* the behavior for the default syntax, but a warning is easy to deal with even with old code that you don't want to modify directly. Lastly, I would appreciate having some signal to tell readers "pay attention; this is nontrivial index manipulation; here is a googleable term so you can look up what this means". I almost always use the default fancy indexing, but I'd use the arr.fancy_ix property for the nontrivial cases just for this alone. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
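[Editor's note: for concreteness, a hypothetical `.ortho_ix` along the lines proposed in this thread could be built on `np.ix_`. The names `OrthoIndexer`/`ortho_ix` are this thread's proposals and illustrative only, not actual NumPy API:]

```python
import numpy as np

class OrthoIndexer:
    """Sketch of the proposed arr.ortho_ix behavior: each 1-D index array
    selects independently along its own axis (the 'cross product')."""
    def __init__(self, arr):
        self.arr = arr

    def __getitem__(self, key):
        # np.ix_ turns the 1-D index arrays into an open mesh that
        # broadcasts to the cross product of the selections.
        return self.arr[np.ix_(*key)]

a = np.arange(12).reshape(3, 4)
sub = OrthoIndexer(a)[[0, 2], [1, 3]]  # rows {0, 2} crossed with cols {1, 3}
print(sub)
```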
Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal)
On Wed, Apr 8, 2015 at 8:05 PM, Eric Firing wrote: > Now, can we please get back to consideration of reasonable options? Sure, but I recommend going back to the actually topical thread (or a new one), as this one is meta. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal)
On Wed, Apr 8, 2015 at 2:06 AM, Nathaniel Smith wrote: > > On Apr 5, 2015 7:04 AM, "Robert Kern" wrote: > > > > On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith wrote: > > > > > > On Apr 4, 2015 4:12 AM, "Todd" wrote: > > > > > > > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. > > > > > > I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. > > > > It doesn't take any cleverness. The change in question was to make the default indexing semantics to orthogonal indexing. No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in the broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. That comparison does not bode well for any proposal. > > Ok, let me try to make my point another way. > > I don't actually care at this stage in the discussion whether the change is ultimately viable. And I don't think you should either. (For values of "you" that includes everyone in the discussion, not picking on Robert in particular :-).) > > My point is that rational, effective discussion requires giving ideas room to breath. Sometimes ideas turn out to be not as bad as they looked. 
Sometimes it turns out that they are, but there's some clever tweak that gives you 95% of the benefits for 5% of the cost. Sometimes you generate a better understanding of the tradeoffs that subsequently informs later design decisions. Sometimes working through the details makes both sides realize that there's a third option that solves both their problems. Sometimes you merely get a very specific understanding of why the whole approach is unreasonable that you can then, say, take to the pandas and netcdf developers as evidence that you made a good faith effort and ask them to meet you half way. And all these things require understanding the specifics of what *exactly* works or doesn't work about an idea. IMHO, it's extremely misleading at this stage to make any assertion about whether Jaime's approach gives the "majority of benefits of such a change": not because it's wrong, but because it totally short-circuits the discussion about what benefits we care about. Jaime's patch certainly has merits, but does it help us make numpy and pandas/netcdf more compatible? Does it make it easier for Eric to teach? Those are some specific merits that we might care about a lot, and for which Jaime's patch may or may not help much. But that kind of nuance gets lost when we jump straight to debating thumbs-up versus thumbs-down. And we can get all of that discussion from discussing Jaime's proposal. I would argue that we will get better, more focused discussion from it since it is actually a concrete proposal and not just a wish that numpy's indexing semantics were something else. I think that a full airing and elaboration of Jaime's proposal (as the final PR should look quite different than the initial one to incorporate what is found in the discussion) will give us a satisficing solution. I certainly think that that is *more likely* to arrive at a satisficing solution than an attempt to change the default indexing semantics. 
I can name specific improvements that would specifically address the concerns you named if you would like. Maybe it won't be *quite* as good (for some parties) than if Numeric chose orthogonal indexing from the get-go, but it will likely be much better for everyone than if numpy broke backward compatibility on this feature now. > I cross-my-heart promise that under the current regime, no PR breaking fancy indexing would ever get anywhere *near* numpy master without *extensive* discussion and warnings on the list. The core devs just spent weeks quibbling about whether a PR that adds a new field to the end of the dtype struct would break ABI backcompat (we're now pretty sure it doesn't), and the current standard we enforce is that every PR that touches public API needs a list discuss
Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal
On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith wrote: > > On Apr 4, 2015 4:12 AM, "Todd" wrote: > > > > > > On Apr 4, 2015 10:54 AM, "Nathaniel Smith" wrote: > > > > > > Core python broke backcompat on a regular basis throughout the python > > > 2 series, and almost certainly will again -- the bar to doing so is > > > *very* high, and they use elaborate mechanisms to ease the way > > > (__future__, etc.), but they do it. A few months ago there was even > > > some serious consideration given to changing py3 bytestring indexing > > > to return bytestrings instead of integers. (Consensus was > > > unsurprisingly that this was a bad idea, but there were core devs > > > seriously exploring it, and no-one complained about the optics.) > > > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. > > I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. It doesn't take any cleverness. The change in question was to make the default indexing semantics to orthogonal indexing. No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in the broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. 
That comparison does not bode well for any proposal. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
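[Editor's note: for readers following along, the semantic difference under discussion is:]

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Default (fancy) indexing zips the index arrays into coordinate pairs:
fancy = a[[0, 2], [1, 3]]           # picks a[0, 1] and a[2, 3] -> shape (2,)

# Orthogonal indexing takes the cross product of the index arrays,
# which np.ix_ already expresses in current NumPy:
ortho = a[np.ix_([0, 2], [1, 3])]   # rows {0, 2} x cols {1, 3} -> shape (2, 2)
```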
Re: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal
On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith wrote: > > On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers wrote: > > > > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith wrote: > >> So I'd be very happy to see worked out proposals for any or > >> all of these approaches. It strikes me as really premature to be > >> issuing proclamations about what changes might be considered. There is > >> really no danger to *considering* a proposal; > > > > Sorry, I have to disagree. Numpy is already seen by some as having a poor > > track record on backwards compatibility. Having core developers say "propose > > some backcompat break to how indexing works and we'll consider it" makes our > > stance on that look even worse. Of course everyone is free to make any > > technical proposal they deem fit and we'll consider the merits of it. > > However I'd like us to be clear that we do care strongly about backwards > > compatibility and that the fundamentals of the core of Numpy (things like > > indexing, broadcasting, dtypes and ufuncs) will not be changed in > > backwards-incompatible ways. > > > > Ralf > > > > P.S. also not for a possible numpy 2.0 (or have we learned nothing from > > Python3?). > > I agree 100% that we should and do care strongly about backwards > compatibility. But you're saying in one sentence that we should tell > people that we won't consider backcompat breaks, and then in the next > sentence that of course we actually will consider them (even if we > almost always reject them). Basically, I think saying one thing and > doing another is not a good way to build people's trust. There is a difference between politely considering what proposals people send us uninvited and inviting people to work on specific proposals. That is what Ralf was getting at. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Mathematical functions in Numpy
On Tue, Mar 17, 2015 at 6:29 PM, Matthieu Brucher < matthieu.bruc...@gmail.com> wrote: > > Hi, > > These functions are defined in the C standard library! I think he's asking how to define numpy ufuncs. > 2015-03-17 18:00 GMT+00:00 Shubhankar Mohapatra : > > Hello all, > > I am a undergraduate and i am trying to do a project this time on numppy in > > gsoc. This project is about integrating vector math library classes of sleef > > and yeppp into numpy to make the mathematical functions faster. I have > > already studied the new library classes but i am unable to find the sin , > > cos function definitions in the numpy souce code.Can someone please help me > > find the functions in the source code so that i can implement the new > > library class into numpy. > > Thanking you, > > Shubhankar Mohapatra > > > > > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > _______ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
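[Editor's note: as a starting point for seeing how ufuncs behave, a Python-level ufunc can be created with `np.frompyfunc`. The project described here would instead supply fast C inner loops for the existing `np.sin`/`np.cos` ufuncs, which live in NumPy's C umath sources rather than in Python:]

```python
import math
import numpy as np

# A Python-level ufunc: you get broadcasting and elementwise machinery
# for free, but each element goes through the interpreter, so it is slow
# compared to a C inner loop.
sin_uf = np.frompyfunc(math.sin, 1, 1)

x = np.array([0.0, math.pi / 2, math.pi])
y = sin_uf(x).astype(np.float64)  # frompyfunc returns object arrays
```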
Re: [Numpy-discussion] random.RandomState and deepcopy
On Fri, Mar 13, 2015 at 5:59 PM, Neal Becker wrote: > > Robert Kern wrote: > > > On Fri, Mar 13, 2015 at 5:34 PM, Neal Becker wrote: > >> > >> It is common that to guarantee good statistical independence between > > various > >> random generators, a singleton instance of an RNG is shared between them. > >> > >> So I typically have various random generator objects, which (sometimes > >> several levels objects deep) embed an instance of RandomState. > >> > >> Now I have a requirement to copy a generator object (without knowing > > exactly > >> what that generator object is). > > > > Or rather, you want the generator object to *avoid* copies by returning > > itself when a copy is requested of it. > > > >> My solution is to use deepcopy on the top-level object. But I need to > >> overload __deepcopy__ on the singleton RandomState object. > >> > >> Unfortunately, RandomState doesn't allow customization of __deepcopy__ > >> (or > >> anything else). And it has no __dict__. > > > > You can always subclass RandomState to override its __deepcopy__. > > > > -- > > Robert Kern > > Yes, I think I prefer this: > > from numpy.random import RandomState > > class shared_random_state (RandomState): > def __init__ (self, rs): > RandomState.__init__(self, rs) > > def __deepcopy__ (self, memo): > return self > > Although, that means I have to use it like this: > > rs = shared_random_state (0) > > where I really would prefer (for aesthetic reasons): > > rs = shared_random_state (RandomState(0)) > > but I don't know how to do that if shared_random_state inherits from > RandomState. If you insist: class shared_random_state(RandomState): def __init__(self, rs): self.__setstate__(rs.__getstate__()) def __deepcopy__(self, memo): return self -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
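[Editor's note: Robert's final suggestion, assembled into a runnable sketch. The explicit `RandomState.__init__(self)` call is an addition here, to make sure the instance is fully constructed before its state is overwritten; everything else follows the thread.]

```python
import copy

from numpy.random import RandomState

class shared_random_state(RandomState):
    def __init__(self, rs):
        # initialize normally, then adopt the state of the given generator
        RandomState.__init__(self)
        self.__setstate__(rs.__getstate__())

    def __deepcopy__(self, memo):
        # the whole point: deep copies of containing objects keep
        # sharing this single RNG instance instead of duplicating it
        return self

# the spelling Neal wanted for aesthetic reasons:
rs = shared_random_state(RandomState(0))
```

A deepcopy of any object embedding `rs` now returns the very same RNG: `copy.deepcopy(rs) is rs` holds.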
Re: [Numpy-discussion] random.RandomState and deepcopy
On Fri, Mar 13, 2015 at 5:34 PM, Neal Becker wrote: > > It is common that to guarantee good statistical independence between various > random generators, a singleton instance of an RNG is shared between them. > > So I typically have various random generator objects, which (sometimes > several levels objects deep) embed an instance of RandomState. > > Now I have a requirement to copy a generator object (without knowing exactly > what that generator object is). Or rather, you want the generator object to *avoid* copies by returning itself when a copy is requested of it. > My solution is to use deepcopy on the top-level object. But I need to > overload __deepcopy__ on the singleton RandomState object. > > Unfortunately, RandomState doesn't allow customization of __deepcopy__ (or > anything else). And it has no __dict__. You can always subclass RandomState to override its __deepcopy__. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argument handling by uniform
On Fri, Mar 13, 2015 at 3:57 PM, Alan G Isaac wrote: > > Today I accidentally wrote `uni = np.random.uniform((-0.5,0.5),201)`, > supplying a tuple instead of separate low and high values. This gave > me two draws (from [0..201] I think). My question: how were the > arguments interpreted? Broadcast against each other. Roughly equivalent to: uni = np.array([ np.random.uniform(-0.5, 201), np.random.uniform(0.5, 201), ]) -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
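[Editor's note: a sketch confirming the broadcast interpretation — shapes and bounds only, since the draws themselves are random.]

```python
import numpy as np

# low=(-0.5, 0.5) broadcasts against the scalar high=201, so this is
# two independent draws, not one draw from [-0.5, 0.5)
uni = np.random.uniform((-0.5, 0.5), 201)

assert uni.shape == (2,)
assert -0.5 <= uni[0] < 201   # draw with low=-0.5
assert 0.5 <= uni[1] < 201    # draw with low=0.5
```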
Re: [Numpy-discussion] tie breaking for max, min, argmax, argmin
On Thu, Mar 12, 2015 at 1:31 PM, Johannes Kulick < johannes.kul...@ipvs.uni-stuttgart.de> wrote: > > Hello, > > I wonder if it would be worth to enhance max, min, argmax and argmin (more?) > with a tie breaking parameter: If multiple entries have the same value the first > value is returned by now. It would be useful to have a parameter to alter this > behavior to an arbitrary tie-breaking. I would propose, that the tie-breaking > function gets a list with all indices of the max/mins. > > Example: > >>> a = np.array([ 1, 2, 5, 5, 2, 1]) > >>> np.argmax(a, tie_breaking=random.choice) > 3 > > >>> np.argmax(a, tie_breaking=random.choice) > 2 > > >>> np.argmax(a, tie_breaking=random.choice) > 2 > > >>> np.argmax(a, tie_breaking=random.choice) > 2 > > >>> np.argmax(a, tie_breaking=random.choice) > 3 > > Especially for some randomized experiments it is necessary that not always the > first maximum is returned, but a random optimum. Thus I end up writing these > things over and over again. > > I understand, that max and min are crucial functions, which shouldn't be slowed > down by the proposed changes. Adding new functions instead of altering the > existing ones would be a good option. > > Are there any concerns against me implementing these things and sending a pull > request? Should such a function better be included in scipy for example? On the whole, I think I would prefer new functions for this. I assume you only need variants for argmin() and argmax() and not min() and max(), since all of the tied values for the latter two would be identical, so returning the first one is just as good as any other. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
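[Editor's note: a pure-Python sketch of the proposed API, as a separate function rather than a new argmax() parameter, in line with Robert's preference. The function name is an assumption; only argmin/argmax variants are sketched since, as noted, min/max ties are all equal anyway.]

```python
import numpy as np

def argmax_tiebreak(a, tie_breaking=None):
    """Like argmax, but pass all indices of tied maxima to a callable."""
    a = np.asarray(a)
    winners = np.flatnonzero(a == a.max())  # indices of all tied maxima
    if tie_breaking is None:
        return winners[0]                   # current numpy behavior
    return tie_breaking(winners)

a = np.array([1, 2, 5, 5, 2, 1])
assert argmax_tiebreak(a) == 2                               # first max
assert argmax_tiebreak(a, np.random.choice) in (2, 3)        # random max
```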
Re: [Numpy-discussion] numpy pickling problem - python 2 vs. python 3
On Sat, Mar 7, 2015 at 9:54 AM, Pauli Virtanen wrote: > How about an extra use_pickle=True kwarg that can be used to disable > using pickle altogether in these routines? If we do, I'd vastly prefer `forbid_pickle=False`. The use_pickle spelling suggests that you are asking it to use pickle when it otherwise wouldn't, which is not the intention. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] So I found a bug...
On Fri, Feb 27, 2015 at 10:33 PM, Sturla Molden wrote: > > Somewhere... But where is it? > > NumPy, SciPy, Matplotlib, Cython or ipython? > > I am suspecting ipython, but proving it is hard... > > http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/12464039/lenna-bug.ipynb When plt.imshow() is given floating point RGB images, it assumes that each channel is normalized to 1. You are mixing a 0..255 image with a 0..1 image. Divide `lenna` by 255.0 before you stack it with `_dct`. Or multiply `_dct` by 255 and cast it to uint8. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
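[Editor's note: the fix in sketch form. `lenna` and `_dct` here are random stand-ins for the notebook's arrays — the real ones come from the linked notebook — but the normalization step is exactly Robert's suggestion: bring the uint8 image into 0..1 before combining it with the float image.]

```python
import numpy as np

# stand-ins for the notebook's data (shapes and names are assumptions)
lenna = np.random.randint(0, 256, size=(8, 8, 3)).astype(np.uint8)
_dct = np.random.rand(8, 8, 3)            # float RGB, already in 0..1

# divide the uint8 image by 255.0 so that the combined float image is
# entirely in 0..1, which is what plt.imshow() assumes for float RGB
combined = np.hstack([lenna / 255.0, _dct])

assert combined.dtype == np.float64
assert 0.0 <= combined.min() and combined.max() <= 1.0
```

The alternative fix goes the other way: `np.hstack([lenna, (_dct * 255).astype(np.uint8)])`.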
Re: [Numpy-discussion] One-byte string dtype: third time's the charm?
On Sun, Feb 22, 2015 at 7:29 PM, Sturla Molden wrote: > > On 22/02/15 19:21, Aldcroft, Thomas wrote: > > > Problems like this are now showing up in the wild [3]. Workarounds are > > also showing up, like a way to easily convert from 'S' to 'U' within > > astropy Tables [4], but this is really not a desirable way to go. > > Gigabyte-sized string data arrays are not uncommon, so converting to > > UCS-4 is a real memory and performance hit. > > Why UCS-4? The Python's internal "flexible string respresentation" will > use ascii for ascii text. numpy's 'U' dtype is UCS-4, and this is what Thomas is referring to, not Python's string type. It cannot have a flexible representation as it *is* the representation. Python 3's `str` type is opaque, so it can freely choose how to represent the data in memory. numpy dtypes transparently describe how the data is represented in memory. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
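[Editor's note: the memory cost Thomas describes is easy to see from the itemsizes — a 'U' array spends four bytes per character regardless of content, while 'S' spends one.]

```python
import numpy as np

s = np.array([b'abc'], dtype='S3')   # one byte per character
u = np.array(['abc'], dtype='U3')    # four bytes (UCS-4) per character

assert s.itemsize == 3
assert u.itemsize == 12   # 4x the memory for the same ASCII text
```

This is why converting gigabyte-sized 'S' arrays to 'U' is a real hit: unlike Python 3's opaque str, the dtype *is* the in-memory representation and cannot adapt per element.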
Re: [Numpy-discussion] unpacking data values into array of bits
On Thu, Feb 12, 2015 at 3:22 PM, Neal Becker wrote: > > Robert Kern wrote: > > > On Thu, Feb 12, 2015 at 3:00 PM, Neal Becker wrote: > >> > >> Robert Kern wrote: > >> > >> > On Thu, Feb 12, 2015 at 2:21 PM, Neal Becker > > wrote: > >> >> > >> >> I need to transmit some data values. These values will be float and > > long > >> >> values. I need them encoded into a string of bits. > >> >> > >> >> The only way I found so far to do this seems rather roundabout: > >> >> > >> >> > >> >> np.unpackbits (np.array (memoryview(struct.pack ('d', pi > >> >> Out[45]: > >> >> array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, > > 1, > >> > 0, > >> >>0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, > > 0, > >> > 0, > >> >>0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0], > > dtype=uint8) > >> >> > >> >> (which I'm not certain is correct) > >> >> > >> >> Also, I don't know how to reverse this process > >> > > >> > You already had your string ready for transmission with > > `struct.pack('d', > >> > pi)`. > >> > > >> > -- > >> > Robert Kern > >> > >> my transmitter wants an np array of bits, not a string > > > > Can you provide any details on what your "transmitter" is? > > > > -- > > My transmitter is c++ code that accepts as input a numpy array of np.int32. > Each element of that array has value 0 or 1. Ah, great. That makes sense, then. def tobeckerbits(x): return np.unpackbits(np.frombuffer(np.asarray(x), dtype=np.uint8)).astype(np.int32) def frombeckerbits(bits, dtype): return np.frombuffer(np.packbits(bits), dtype=dtype)[0] -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
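[Editor's note: Robert's two helpers restated as a runnable round-trip sketch, with generic names and an explicit uint8 cast before packbits added for safety.]

```python
import math

import numpy as np

def to_bits(x):
    # view the value's raw bytes, then expand each byte into 8 bits,
    # giving the 0/1 int32 array the C++ transmitter wants
    raw = np.frombuffer(np.asarray(x).tobytes(), dtype=np.uint8)
    return np.unpackbits(raw).astype(np.int32)

def from_bits(bits, dtype):
    # collapse the 0/1 array back into bytes and reinterpret them
    packed = np.packbits(np.asarray(bits, dtype=np.uint8))
    return np.frombuffer(packed.tobytes(), dtype=dtype)[0]

bits = to_bits(np.float64(math.pi))  # 64 bits for one double
```

The round trip recovers the value exactly: `from_bits(to_bits(np.float64(math.pi)), np.float64) == math.pi`.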
Re: [Numpy-discussion] unpacking data values into array of bits
On Thu, Feb 12, 2015 at 3:00 PM, Neal Becker wrote: > > Robert Kern wrote: > > > On Thu, Feb 12, 2015 at 2:21 PM, Neal Becker wrote: > >> > >> I need to transmit some data values. These values will be float and long > >> values. I need them encoded into a string of bits. > >> > >> The only way I found so far to do this seems rather roundabout: > >> > >> > >> np.unpackbits (np.array (memoryview(struct.pack ('d', pi > >> Out[45]: > >> array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, > > 0, > >>0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, > > 0, > >>0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0], dtype=uint8) > >> > >> (which I'm not certain is correct) > >> > >> Also, I don't know how to reverse this process > > > > You already had your string ready for transmission with `struct.pack('d', > > pi)`. > > > > -- > > Robert Kern > > my transmitter wants an np array of bits, not a string Can you provide any details on what your "transmitter" is? -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] unpacking data values into array of bits
On Thu, Feb 12, 2015 at 2:21 PM, Neal Becker wrote: > > I need to transmit some data values. These values will be float and long > values. I need them encoded into a string of bits. > > The only way I found so far to do this seems rather roundabout: > > > np.unpackbits (np.array (memoryview(struct.pack ('d', pi)))) > Out[45]: > array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, > 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, > 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0], dtype=uint8) > > (which I'm not certain is correct) > > Also, I don't know how to reverse this process You already had your string ready for transmission with `struct.pack('d', pi)`. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Why am I getting a FutureWarning with this code?
On Mon, Dec 22, 2014 at 10:35 AM, Steve Spicklemire wrote: > > I’m working on visual python (http://vpython.org) which lists numpy among its dependencies. > > I recently updated my numpy installation to 1.9.1 and I’m now encountering this error: > > /usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/VPython-6.10-py2.7-macosx-10.10-x86_64.egg/visual_common/materials.py:70: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. > self.__setattr__(key, value) > > Oddly, the code in question looks like this: > > > 62 from . import cvisual > 63 from numpy import array, reshape, fromstring, ubyte, ndarray, zeros, asarray > 64 import os.path, math > 65 > 66 class raw_texture(cvisual.texture): > 67 def __init__(self, **kwargs): > 68 cvisual.texture.__init__(self) > 69 for key, value in kwargs.items(): > 70 self.__setattr__(key, value) The answer is in the implementation of cvisual.texture somewhere. This is Boost.Python C++ code, so there's a fair bit of magic going on such that I can't pinpoint the precise line that's causing this, but I suspect that it might be this one: https://github.com/BruceSherwood/vpython-wx/blob/master/src/python/numeric_texture.cpp#L258 When `data` is assigned, this line checks if the value is None so that it can raise an error to tell you not to do that (I believe, from the context, that `py::object()` is the Boost.Python idiom for getting the None singleton). I *think*, but am not positive, that Boost.Python converts the C++ == operator into a Python == operation, so that would trigger this FutureWarning. So VPython is the code that needs to be fixed to do an identity comparison with None rather than an equality comparison. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
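[Editor's note: the distinction the FutureWarning is about, in sketch form — an identity comparison with None always yields a single bool, while equality against an array became elementwise.]

```python
import numpy as np

a = np.arange(3)

# identity test: one Python bool, never warns -- the fix VPython needs
assert (a is None) is False

# equality test: elementwise in modern numpy, which is why C++/Boost
# code doing the equivalent of `value == None` drew the FutureWarning
result = (a == None)  # noqa: E711 -- deliberate for illustration
assert isinstance(result, np.ndarray)
assert not result.any()
```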
Re: [Numpy-discussion] Add a function to broadcast arrays to a given shape to numpy's stride_tricks?
On Thu, Dec 11, 2014 at 4:17 PM, Sebastian Berg wrote: > > On Do, 2014-12-11 at 16:56 +0100, Pierre Haessig wrote: > > Le 11/12/2014 16:52, Robert Kern a écrit : > > > > > > And we already have a numpy.broadcast() function. > > > > > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast.html > > True, I once read the docstring of this function, but never used it though. > > I am not sure it is really the right thing for most things since it > returns an old style iterator. Indeed. That's why I wrote broadcast_arrays(). > On the other hand arrays with 0-strides > need a bit more care (if we add this top level, one might have copy=True > as a default or so?). > Also because of that it is currently limited to NPY_MAXARGS (32). > Personally, I would like to see this type of functionality implemented > in C, and may be willing to help with it. This kind of code exists in > enough places in numpy so it can be stolen pretty readily. One option > would also be to have something like: > > np.common_shape(*arrays) > np.broadcast_to(array, shape) > # (though I would like many arrays too) > > and then broadcast_arrays could be implemented in terms of these two. Why? What benefit does broadcast_arrays() get from being reimplemented in C? -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
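[Editor's note: a pure-Python sketch of the proposed broadcast_to(), built on the stride tricks already in numpy at the time; the function name and error messages are assumptions. A broadcast_to with this behavior later landed in numpy 1.10.]

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def broadcast_to_sketch(array, shape):
    """Broadcast `array` to exactly `shape`, or error out."""
    array = np.asarray(array)
    extra = len(shape) - array.ndim
    if extra < 0:
        raise ValueError("target shape has fewer dimensions than array")
    # new leading axes, and length-1 axes, repeat via a zero stride
    strides = [0] * extra
    for want, have, stride in zip(shape[extra:], array.shape, array.strides):
        if have == want:
            strides.append(stride)
        elif have == 1:
            strides.append(0)
        else:
            raise ValueError("shapes are not broadcast-compatible")
    return as_strided(array, shape=shape, strides=strides)

b = broadcast_to_sketch(np.arange(3), (4, 3))
```

The zero strides are why Sebastian says such arrays "need a bit more care": all four rows of `b` alias the same three elements, so writing to the view is unsafe without a copy.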
Re: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions)
On Thu, Dec 11, 2014 at 3:53 PM, Eric Moore wrote: > > On Thu, Dec 11, 2014 at 10:41 AM, Todd wrote: >> I recently became aware of another C-library for doing FFTs (and other things): >> >> https://github.com/arrayfire/arrayfire >> >> They claim to have comparable FFT performance to MKL when run on a CPU (they also support running on the GPU but that is probably outside the scope of numpy or scipy). It used to be proprietary but now it is under a BSD-3-Clause license. It seems it supports non-power-of-2 FFT operations as well (although those are slower). I don't know much beyond that, but it is probably worth looking in > > AFAICT the cpu backend is a FFTW wrapper. Indeed. https://github.com/arrayfire/arrayfire/blob/devel/src/backend/cpu/fft.cpp#L16 -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Add a function to broadcast arrays to a given shape to numpy's stride_tricks?
On Thu, Dec 11, 2014 at 2:47 PM, Nathaniel Smith wrote: > > On 11 Dec 2014 14:31, "Pierre Haessig" wrote: > > > > > > Le 11/12/2014 01:00, Nathaniel Smith a écrit : > > > Seems like a useful addition to me -- I've definitely wanted this in > > > the past. I agree with Stephan that reshape() might not be the best > > > place, though; I wouldn't think to look for it there. > > > > > > Two API ideas, which are not mutually exclusive: > > > > > > [...] > > > > > > 2) Add a broadcast_to(arr, shape) function, which broadcasts the array > > > to exactly the shape given, or else errors out if this is not > > > possible. > > That's also possible. Then there could be a point in `reshape` docstring. > > > > Could this function be named `broadcast` instead of `broadcast_to` ? > > (in coherence with `reshape`) > > It could, but then there wouldn't be much to distinguish it from broadcast_arrays. Broadcasting is generally a symmetric operation - see broadcast_arrays or arr1 + arr2. So the 'to' is there to give a clue that this function is not symmetric, and rather has a specific goal in mind. And we already have a numpy.broadcast() function. http://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast.html -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
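[Editor's note: a quick sketch of the symmetric operations being contrasted with the goal-directed broadcast_to — np.broadcast() computes the common shape without materializing anything, and broadcast_arrays() returns views at that shape.]

```python
import numpy as np

# np.broadcast: the (old-style iterator) object Robert mentions;
# its .shape is the common broadcast shape of its arguments
b = np.broadcast(np.empty((3, 1)), np.empty((1, 4)))
assert b.shape == (3, 4)

# broadcast_arrays: the symmetric counterpart returning actual views
x, y = np.broadcast_arrays(np.arange(3).reshape(3, 1), np.arange(4))
assert x.shape == y.shape == (3, 4)
```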