[Numpy-discussion] Bug in np.cross for 2D vectors
Hi,

We came across this bug while using np.cross on 3D arrays of 2D vectors. The example below shows the problem. We looked at the source for np.cross and believe we found the cause: an unnecessary swapaxes when returning the output (comment inserted in the code).

Thanks,
Neil

    # Example
    import numpy as np

    shape = (3,5,7,2)

    # These are effectively 3D arrays (3*5*7) of 2D vectors
    data1 = np.random.randn(*shape)
    data2 = np.random.randn(*shape)

    # The cross product of data1 and data2 should produce a (3*5*7) array of scalars
    cross_product_longhand = data1[:,:,:,0]*data2[:,:,:,1]-data1[:,:,:,1]*data2[:,:,:,0]
    print('longhand output shape: %s' % (cross_product_longhand.shape,))
    # and it does

    cross_product_numpy = np.cross(data1,data2)
    print('numpy output shape: %s' % (cross_product_numpy.shape,))
    # It seems to have transposed the last 2 dimensions

    if (cross_product_longhand == np.transpose(cross_product_numpy, (0,2,1))).all():
        print('Unexpected transposition in numpy.cross (numpy version %s)' % np.__version__)

    # np.cross L1464
    if axis is not None:
        axisa, axisb, axisc=(axis,)*3
    a = asarray(a).swapaxes(axisa, 0)
    b = asarray(b).swapaxes(axisb, 0)
    msg = "incompatible dimensions for cross product\n"\
          "(dimension must be 2 or 3)"
    if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]):
        raise ValueError(msg)
    if a.shape[0] == 2:
        if (b.shape[0] == 2):
            cp = a[0]*b[1] - a[1]*b[0]
            if cp.ndim == 0:
                return cp
            else:
                ## WE SHOULD NOT SWAPAXES HERE!
                ## For 2D vectors the first axis has been
                ## collapsed during the cross product
                return cp.swapaxes(0, axisc)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
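The shape behaviour reported above can be reproduced without calling np.cross at all. The following is a minimal sketch, assuming NumPy 1.11 or later (for np.moveaxis). The function names cross2d_longhand, cross2d_swapaxes and cross2d_moveaxis are hypothetical and not part of NumPy, and the moveaxis variant is only one possible way to avoid the transposition, not necessarily the fix adopted upstream.

```python
import numpy as np

def cross2d_longhand(a, b):
    # Scalar cross product of 2D vectors stored on the last axis.
    return a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]

def cross2d_swapaxes(a, b, axis=-1):
    # Mirrors the quoted code path, including the final swapaxes.
    a = np.asarray(a).swapaxes(axis, 0)
    b = np.asarray(b).swapaxes(axis, 0)
    cp = a[0] * b[1] - a[1] * b[0]
    return cp if cp.ndim == 0 else cp.swapaxes(0, axis)

def cross2d_moveaxis(a, b, axis=-1):
    # Moves the vector axis to the front while keeping the order of the
    # remaining axes, so nothing has to be swapped back afterwards.
    a = np.moveaxis(np.asarray(a), axis, 0)
    b = np.moveaxis(np.asarray(b), axis, 0)
    return a[0] * b[1] - a[1] * b[0]

rng = np.random.RandomState(0)
data1 = rng.randn(3, 5, 7, 2)
data2 = rng.randn(3, 5, 7, 2)

expected = cross2d_longhand(data1, data2)
swapped = cross2d_swapaxes(data1, data2)
moved = cross2d_moveaxis(data1, data2)
print(expected.shape, swapped.shape, moved.shape)
```

With these inputs the swapaxes path returns the data with its first and last axes exchanged, which matches the transposition described in the report, while the moveaxis path keeps the (3, 5, 7) layout.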
Re: [Numpy-discussion] String type again.
Hi Chuck,

> This note proposes to adapt the currently existing 'a' type letter, currently aliased to 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte internal representations for unicode strings, ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, whereas latin1 has a few more symbols. Another possibility is to just make it an UTF-8 encoding, but I think this would involve more overhead as Python would need to determine the maximum character size.

For storing data in HDF5 (PyTables or h5py), it would be somewhat cleaner if either ASCII or UTF-8 is used, as these are the only two charsets officially supported by the library. Latin-1 would require a custom read/write converter, which isn't the end of the world but would be tricky to do correctly, and likely somewhat slow. We'd also run into truncation issues, since certain latin-1 chars become multibyte sequences in UTF-8.

I assume 'a' strings would still be null-padded?

Andrew
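The truncation concern can be illustrated with plain Python 3 strings. This is a small sketch only: the 4-byte field width is an assumed example value, and no HDF5 library is involved.

```python
text = "caf\u00e9"  # 4 characters; the last one is U+00E9

latin1 = text.encode("latin-1")  # one byte per character
utf8 = text.encode("utf-8")      # U+00E9 becomes a two-byte sequence
print(len(latin1), len(utf8))

# Simulate copying the UTF-8 bytes into a fixed 4-byte field.
field_width = 4  # assumed width, for illustration only
truncated = utf8[:field_width]

try:
    truncated.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    # The field cut the two-byte sequence in half.
    decodable = False
print(decodable)
```

A field sized for the latin-1 representation is therefore too small for the same text in UTF-8, and a naive byte-level truncation can leave an invalid sequence behind.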
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
Well, I do not see the confusion here (only due to the use of the array function, maybe). It is a string, after all, so it should be colour-coded as such.

I would love to keep this feature of np.mat somehow, named np.txt2arr or something. We linear algebraists will already lose the .I method for matrix inversion and the * for matrix multiplication; let’s keep at least one of the many handy features of the matrix type. It is simply a very useful short-hand way, probably as a separate function, to make a 2D array. If you think it’s ugly, don’t use it. But it certainly is faster to type, and former Matlab users will love it as well.

Just my 2 cts.

From: numpy-discussion-boun...@scipy.org [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Alexander Belopolsky
Sent: Sunday, 13 July 2014 19:31
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

> Also, the use of strings will confuse most syntax highlighters. Compare the two options in this screenshot:
>
> [Inline image 2]
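To make the idea concrete, here is a rough sketch of what such a string short-hand could do. The name txt2arr is taken from the suggestion above, but the function itself is hypothetical: it is not part of NumPy, and the parsing rules (';' between rows, whitespace or commas between entries, floats only) are assumptions loosely modelled on the np.mat string style.

```python
import numpy as np

def txt2arr(text):
    # Hypothetical helper: rows are separated by ';', entries by
    # whitespace or commas.
    rows = [row.replace(",", " ").split() for row in text.split(";")]
    rows = [row for row in rows if row]  # ignore empty rows, e.g. a trailing ';'
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of entries")
    # Convert every token explicitly so bad input fails with a clear error.
    return np.array([[float(tok) for tok in row] for row in rows])

a = txt2arr("1 2 3; 4 5 6")
b = txt2arr("1, 2; 3, 4")
print(a)
print(b)
```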
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
Also, the use of strings will confuse most syntax highlighters. Compare the two options in this screenshot:

[image: Inline image 2]
Re: [Numpy-discussion] String type again.
On Sat, Jul 12, 2014 at 10:17 AM, Charles R Harris charlesr.har...@gmail.com wrote:

> As previous posts have pointed out, Numpy's `S` type is currently treated as a byte string, which leads to more complicated code in python3.

Also, a byte string in py3 is not, in fact, the same as the py2 string type. So we have a problem -- if we want 'S' to mean what it essentially does in py2, what do we map it to in pure-python land?

I propose we embrace the py3 model as fully as possible: There is text data, and there is binary data. In py3, that is 'str' and 'bytes'. So numpy should have dtypes to match these.

We're a bit stuck, however, because 'S' mapped to the py2 string type, which no longer exists in py3. Sorry, I'm not running py3 to see what 'S' does now, but I know it's a bit broken, and it may be too late to change it.

But: it is certainly a common case in the scientific world to have 1-byte-per-character string data, and to care about storage size. So a 1-byte-per-character text data type may be a good idea.

As for a bytes type -- do we need it, or are we fine with simply using uint8 arrays? (or, even the most common case, converting directly to the type that is actually stored in those bytes... especially for ascii strings.

> This note proposes to adapt the currently existing 'a' type letter, currently aliased to 'S', as a new fixed encoding dtype.

+1

> Python 3.3 introduced two one byte internal representations for unicode strings, ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, whereas latin1 has a few more symbols.

+1 for latin-1 -- those extra symbols are handy. Also, at least with Python's stdlib encoding, you can round-trip any binary data through latin-1 -- kind of making it act like a bytes object.

> Another possibility is to just make it an UTF-8 encoding, but I think this would involve more overhead as Python would need to determine the maximum character size.

yeah -- that a) is overhead, and b) breaks the numpy fixed-size dtype model. And it's trickier for numpy arrays, 'cause they are mutable -- python strings can do OK, as they don't need to accommodate potentially changing sizes of strings.

On Sat, Jul 12, 2014 at 5:02 PM, Nathaniel Smith n...@pobox.com wrote:

> I feel like for most purposes, what we *really* want is a variable length string dtype (I.e., where each element can be a different length.).

well, that is fundamentally different from the usual numpy data model -- it would require that the array store pointers and dereference them on use -- is there anywhere else in numpy (other than the object dtype) that does that? And if we did -- would it end up having any advantage over putting strings in an object array? Or for that matter, using a list of strings instead?

> Pandas pays quite some price in overhead to fake this right now. Adding such a thing will cause some problems regarding compatibility (what to do with array([foo])) and education, but I think it's worth it in the long run.

i.e. do you use the fixed-length type or the variable-length type? I'm not sure it's too killer to have a default and let the user set a dtype if they want something else.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
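The latin-1 round-trip property and the storage difference between one-byte and UCS4 text can be checked directly. This is a small sketch using only existing NumPy dtypes ('S' and 'U'); it does not implement the proposed dtype, and the sample word is an arbitrary choice.

```python
import numpy as np

# Every possible byte value survives a decode/encode cycle through latin-1,
# because latin-1 maps bytes 0..255 one-to-one onto code points 0..255.
raw = bytes(range(256))
as_text = raw.decode("latin-1")
round_tripped = as_text.encode("latin-1")
print(round_tripped == raw)

# Storage cost of the same 8-character word in the two existing dtypes.
word = "\u00c5ngstr\u00f6m"
s_arr = np.array([word.encode("latin-1")])  # bytes, one byte per character
u_arr = np.array([word])                    # unicode, four bytes per character
print(s_arr.dtype.itemsize, u_arr.dtype.itemsize)
```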
[Numpy-discussion] __numpy_ufunc__
Hi All,

Julian has raised the question of including __numpy_ufunc__ in numpy 1.9. I don't feel strongly one way or the other, but it doesn't seem to be finished yet and 1.10 might be a better place to work out the remaining problems along with the astropy folks testing possible uses.

Thoughts?

Chuck
Re: [Numpy-discussion] String type again.
2014-07-13 19:05 GMT+02:00 Alexander Belopolsky ndar...@mac.com:

> On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith n...@pobox.com wrote:
>
>> I feel like for most purposes, what we *really* want is a variable length string dtype (I.e., where each element can be a different length.).
>
> I've been toying with the idea of creating an array type for interned strings. In many applications dealing with large arrays of variable size strings, the strings come from a relatively short set of names. Arrays of interned strings can be manipulated very efficiently because in many respects they are just like arrays of integers.

+1

I think this is why pandas is using dtype=object to load string data: in many cases short string values are used to represent categorical variables with a comparatively small cardinality of possible values, for a dataset with comparatively numerous records.

In that case dtype=object is not that bad, as it just stores pointers to string objects managed by Python. It's possible to intern the strings manually at load time (I don't know if pandas or python already do it automatically in that case). The integer semantics are good for that case. Having an explicit dtype might be even better.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
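The "strings as integers" idea can be approximated today with np.unique. The sketch below is not the interned-string array type discussed above, only an illustration of the integer semantics it would offer; the colour names are made-up sample data.

```python
import numpy as np

# Many records, few distinct values (made-up sample data).
names = np.array(["red", "green", "red", "blue", "green", "red"])

# categories holds each distinct string once (sorted); codes holds, for
# every record, the integer index of its string in categories.
categories, codes = np.unique(names, return_inverse=True)
print(categories, codes)

# Typical categorical operations now run on small integers.
counts = np.bincount(codes)
is_red = codes == np.searchsorted(categories, "red")

# The original strings can be recovered by indexing.
restored = categories[codes]
```

Comparisons and counting on codes never touch the string data, which is the efficiency argument made above.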
Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
On Sun, Jul 13, 2014 at 6:31 PM, Alexander Belopolsky ndar...@mac.com wrote:

> Also, the use of strings will confuse most syntax highlighters. Compare the two options in this screenshot:
>
> [image: Inline image 2]

I guess this is a minor issue for real code, but even IPython doesn't (yet?) provide syntax highlighting for lines as they're typed, and this is a tool intended mainly for interactive use.

I think that screenshot also illustrates why people have such a preference for the first syntax. The second line looks nice, but try typing it quickly and getting all the commas located correctly inside versus outside of each of the triply-nested brackets...

No-one's come up with any names for this that are nearly as good as arr. Is it really that bad to have to type one extra character, np.array instead of np.arrtab?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Re: [Numpy-discussion] String type again.
On Sat, 2014-07-12 at 12:17 -0500, Charles R Harris wrote:

> As previous posts have pointed out, Numpy's `S` type is currently treated as a byte string, which leads to more complicated code in python3. OTOH, the unicode type is stored as UCS4, which consumes a lot of space, especially for ascii strings. This note proposes to adapt the currently existing 'a' type letter, currently aliased to 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte internal representations for unicode strings, ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, whereas latin1 has a few more symbols. Another possibility is to just make it an UTF-8 encoding, but I think this would involve more overhead as Python would need to determine the maximum character size.
>
> These are just preliminary thoughts, comments are welcome.

Just wondering, couldn't we have a type which actually has an (arbitrary, python supported) encoding (and bytes might even just be a special case of no encoding)? Basically, store bytes, do element[i].decode(specified_encoding) on access, and do element[i] = value.encode(specified_encoding) on storing.

There is always the never-ending small issue of trailing null bytes. If we want to be fully compatible, such a type would have to store the string length explicitly to support trailing null bytes.

- Sebastian

> Chuck
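The decode-on-access idea can be mocked up in pure Python on top of the existing 'S' dtype. This is a sketch under stated assumptions: the class name EncodedStrings and its constructor arguments are hypothetical, and a real dtype would live inside NumPy rather than in a wrapper class.

```python
import numpy as np

class EncodedStrings:
    """Hypothetical wrapper: stores fixed-width bytes, decodes on access."""

    def __init__(self, length, itemsize, encoding):
        self.encoding = encoding
        self._data = np.zeros(length, dtype="S%d" % itemsize)

    def __getitem__(self, i):
        # On access: element[i].decode(specified_encoding)
        return self._data[i].decode(self.encoding)

    def __setitem__(self, i, value):
        # On storing: element[i] = value.encode(specified_encoding)
        encoded = value.encode(self.encoding)
        if len(encoded) > self._data.dtype.itemsize:
            raise ValueError("encoded value does not fit in the fixed-width field")
        self._data[i] = encoded

arr = EncodedStrings(2, 4, "latin-1")
arr[0] = "caf\u00e9"
arr[1] = "ab\x00"  # ends in a null character

first = arr[0]
second = arr[1]
print(repr(first), repr(second))
```

Reading back the second element shows the trailing-null issue: the 'S' dtype cannot tell a stored null from padding, so the final character is lost unless the length is stored separately.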