[Numpy-discussion] Bug in np.cross for 2D vectors

2014-07-15 Thread Neil Hodgson
Hi,

We came across this bug while using np.cross on 3D arrays of 2D vectors.
The first example shows the problem and we looked at the source for np.cross 
and believe we found the bug - an unnecessary swapaxes when returning the 
output (comment inserted in the code).

Thanks
Neil 

# Example


shape = (3,5,7,2)

# These are effectively 3D arrays (3*5*7) of 2D vectors
data1 = np.random.randn(*shape)
data2 = np.random.randn(*shape)

# The cross product of data1 and data2 should produce a (3*5*7) array of scalars
cross_product_longhand = data1[:,:,:,0]*data2[:,:,:,1] - data1[:,:,:,1]*data2[:,:,:,0]
print 'longhand output shape:', cross_product_longhand.shape # and it does

cross_product_numpy = np.cross(data1,data2)
print 'numpy output shape:', cross_product_numpy.shape # it seems to have transposed the last 2 dimensions

if (cross_product_longhand == np.transpose(cross_product_numpy, (0,2,1))).all():
    print 'Unexpected transposition in numpy.cross (numpy version %s)' % np.__version__

# np.cross (numeric.py, around line 1464)
if axis is not None:
    axisa, axisb, axisc = (axis,)*3
a = asarray(a).swapaxes(axisa, 0)
b = asarray(b).swapaxes(axisb, 0)
msg = "incompatible dimensions for cross product\n"\
      "(dimension must be 2 or 3)"
if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]):
    raise ValueError(msg)
if a.shape[0] == 2:
    if (b.shape[0] == 2):
        cp = a[0]*b[1] - a[1]*b[0]
        if cp.ndim == 0:
            return cp
        else:
            ## WE SHOULD NOT SWAPAXES HERE!
            ## For 2D vectors the first axis has been
            ## collapsed during the cross product
            return cp.swapaxes(0, axisc)
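Until the swapaxes call is fixed, a workaround (our sketch; cross2d is a hypothetical helper name, not part of numpy) is to compute the scalar cross product directly along the last axis:

```python
import numpy as np

def cross2d(a, b):
    # Scalar cross product of 2D vectors stored along the last axis;
    # sidesteps the spurious swapaxes in np.cross for 2-vectors.
    return a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]

shape = (3, 5, 7, 2)
data1 = np.random.randn(*shape)
data2 = np.random.randn(*shape)
assert cross2d(data1, data2).shape == (3, 5, 7)
```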
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String type again.

2014-07-15 Thread Andrew Collette
Hi Chuck,

 This note proposes to adapt the currently existing 'a'
 type letter, currently aliased to 'S', as a new fixed encoding dtype. Python
 3.3 introduced two one byte internal representations for unicode strings,
 ascii and latin1. Ascii has the advantage that it is a subset of UTF-8,
 whereas latin1 has a few more symbols. Another possibility is to just make
 it an UTF-8 encoding, but I think this would involve more overhead as Python
 would need to determine the maximum character size.

For storing data in HDF5 (PyTables or h5py), it would be somewhat
cleaner if either ASCII or UTF-8 are used, as these are the only two
charsets officially supported by the library.  Latin-1 would require a
custom read/write converter, which isn't the end of the world but
would be tricky to do in a correct way, and likely somewhat slow.
We'd also run into truncation issues since certain latin-1 chars
become multibyte sequences in UTF8.

I assume 'a' strings would still be null-padded?
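To illustrate the truncation concern (a sketch of ours, not from the thread): a character that fits in one latin-1 byte can take two bytes in UTF-8, so a buffer sized for the latin-1 form can be too small after conversion:

```python
s = u"caf\xe9"  # 'e' with acute accent, U+00E9
assert len(s.encode("latin-1")) == 4  # one byte per character
assert len(s.encode("utf-8")) == 5    # U+00E9 becomes the two bytes 0xC3 0xA9
```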

Andrew


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-15 Thread Jacco Hoekstra - LR
Well, I do not see the confusion here (only due to the use of the array
function, maybe). It is a string, after all, so it should be colour-coded as
such.

I would love to keep this feature of np.mat in somehow, named np.txt2arr or
something. We linear algebraists will already lose the .I method for matrix
inversion and the * for matrix multiplication; let's keep at least one of the many
handy features of the matrix type in. It is simply a very useful, short-hand
way, probably as a separate function, to make a 2D array. If you think it's ugly,
don't use it. But it certainly is faster to type, and former Matlab users
will love it as well. Just my 2 cts.

From: numpy-discussion-boun...@scipy.org 
[mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Alexander Belopolsky
Sent: zondag 13 juli 2014 19:31
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

Also, the use of strings will confuse most syntax highlighters.  Compare the 
two options in this screenshot:

[Inline image 2]


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-15 Thread Alexander Belopolsky
Also, the use of strings will confuse most syntax highlighters.  Compare
the two options in this screenshot:

[image: Inline image 2]


Re: [Numpy-discussion] String type again.

2014-07-15 Thread Chris Barker
On Sat, Jul 12, 2014 at 10:17 AM, Charles R Harris 
charlesr.har...@gmail.com wrote:

 As previous posts have pointed out, Numpy's `S` type is currently treated
 as a byte string, which leads to more complicated code in python3.


Also, a byte string in py3 is not, in fact, the same as the py2 string type.
So we have a problem: if we want 'S' to mean what it essentially does in
py2, what do we map it to in pure-python land?

I propose we embrace the py3 model as fully as possible:

There is text data, and there is binary data. In py3, that is 'str' and
'bytes'.

So numpy should have dtypes to match these. We're a bit stuck, however,
because 'S' mapped to the py2 string type, which no longer exists in py3.
Sorry, I'm not running py3 to check what 'S' does now, but I know it's a bit
broken, and it may be too late to change it.

But: it is certainly a common case in the scientific world to have
1-byte-per-character string data, and to care about storage size. So a
1-byte-per-character text dtype may be a good idea.

As for a bytes type -- do we need it, or are we fine with simply using
uint8 arrays? (Or, in even the most common case, converting directly to the
type that is actually stored in those bytes?)


 especially for ascii strings. This note proposes to adapt the currently
 existing 'a' type letter, currently aliased to 'S', as a new fixed encoding
 dtype.


+1


 Python 3.3 introduced two one byte internal representations for unicode
 strings, ascii and latin1. Ascii has the advantage that it is a subset of
 UTF-8, whereas latin1 has a few more symbols.


+1 for latin-1 -- those extra symbols are handy. Also, at least with
Python's stdlib encoding, you can round-trip any binary data through
latin-1 -- kind of making it act like a bytes object.
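For instance (our sketch of the round-trip property): every byte value 0-255 maps to a code point under latin-1, so arbitrary binary data survives a decode/encode cycle:

```python
raw = bytes(bytearray(range(256)))  # every possible byte value
text = raw.decode("latin-1")        # always succeeds: latin-1 maps bytes 0-255 1:1
assert text.encode("latin-1") == raw
```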


 Another possibility is to just make it an UTF-8 encoding, but I think this
 would involve more overhead as Python would need to determine the maximum
 character size.


yeah -- that is (a) overhead, and (b) it breaks the numpy fixed-size dtype
model. And it's trickier for numpy arrays, 'cause they are mutable --
python strings do OK, as they don't need to accommodate potentially
changing string sizes.

On Sat, Jul 12, 2014 at 5:02 PM, Nathaniel Smith n...@pobox.com wrote:

 I feel like for most purposes, what we *really* want is a variable length
 string dtype (I.e., where each element can be a different length.).


well, that is fundamentally different from the usual numpy data model -- it
would require that the array store pointers and dereference them on use --
is there anywhere else in numpy (other than the object dtype) that does
that?

And if we did -- would it end up having any advantage over putting strings
in an object array? Or for that matter, using a list of strings instead?



 Pandas pays quite some price in overhead to fake this right now. Adding
 such a thing will cause some problems regarding compatibility (what to do
 with array(["foo"])) and education, but I think it's worth it in the long
 run.


i.e. do you use the fixed-length type or the variable-length type? I'm not
sure it's a killer to have a default and let the user set a dtype if they
want something else.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


[Numpy-discussion] __numpy_ufunc__

2014-07-15 Thread Charles R Harris
Hi All,

Julian has raised the question of including numpy_ufunc in numpy 1.9. I
don't feel strongly one way or the other, but it doesn't seem to be
finished yet and 1.10 might be a better place to work out the remaining
problems along with the astropy folks testing possible uses.

Thoughts?

Chuck


Re: [Numpy-discussion] String type again.

2014-07-15 Thread Olivier Grisel
2014-07-13 19:05 GMT+02:00 Alexander Belopolsky ndar...@mac.com:

 On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith n...@pobox.com wrote:

 I feel like for most purposes, what we *really* want is a variable length
 string dtype (I.e., where each element can be a different length.).



 I've been toying with the idea of creating an array type for interned
 strings.  In many applications dealing with large arrays of variable size
 strings, the strings come from a relatively short set of names.  Arrays of
 interned strings can be manipulated very efficiently because in many respects
 they are just like arrays of integers.

+1 I think this is why pandas is using dtype=object to load string
data: in many cases short string values are used to represent
categorical variables with a comparatively small cardinality of
possible values for a dataset with comparatively numerous records.

In that case dtype=object is not that bad, as it just stores
pointers to string objects managed by Python. It's possible to intern
the strings manually at load time (I don't know if pandas or python
already does it automatically in that case). The integer semantics are
good for that case. Having an explicit dtype might be even better.
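The interning idea can be sketched with the interpreter's own mechanism (sys.intern in Python 3; the builtin intern in Python 2): after interning, equal strings share one object, so comparisons can work like integer comparisons:

```python
import sys

# Build equal strings at runtime so CPython does not auto-intern them
a = "".join(["r", "e", "d"])
b = "".join(["r", "e", "d"])
assert a == b
ia, ib = sys.intern(a), sys.intern(b)
assert ia is ib  # after interning, equal strings share a single object
```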

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

2014-07-15 Thread Nathaniel Smith
On Sun, Jul 13, 2014 at 6:31 PM, Alexander Belopolsky ndar...@mac.com
wrote:

 Also, the use of strings will confuse most syntax highlighters.  Compare
 the two options in this screenshot:

 [image: Inline image 2]


I guess this is a minor issue for real code, but even IPython doesn't
(yet?) provide syntax highlighting for lines as they're typed, and this is
a tool intended mainly for interactive use.

That screenshot also I think illustrates why people have such a preference
for the first syntax. The second line looks nice, but try typing it quickly
and getting all the commas located correctly inside versus outside of each
of the triply-nested brackets...

No-one's come up with any names for this that are nearly as good as arr.
Is it really that bad to have to type one extra character, np.array instead
of np.arr?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] String type again.

2014-07-15 Thread Sebastian Berg
On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote:
 As previous posts have pointed out, Numpy's `S` type is currently
 treated as a byte string, which leads to more complicated code in
 python3. OTOH, the unicode type is stored as UCS4, which consumes a
 lot of space, especially for ascii strings. This note proposes to
 adapt the currently existing 'a' type letter, currently aliased to
 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte
 internal representations for unicode strings, ascii and latin1. Ascii
 has the advantage that it is a subset of UTF-8, whereas latin1 has a
 few more symbols. Another possibility is to just make it an UTF-8
 encoding, but I think this would involve more overhead as Python would
 need to determine the maximum character size. These are just
 preliminary thoughts, comments are welcome.
 

Just wondering, couldn't we have a type which actually has an
(arbitrary, python-supported) encoding (and bytes might even just be a
special case of no encoding)? Basically, store bytes, and on access do
element[i].decode(specified_encoding), and on storing do
element[i] = value.encode(specified_encoding).

There is always the never-ending small issue of trailing null bytes. If
we want to be fully compatible, such a type would have to store the
string length explicitly to support trailing null bytes.
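This could be prototyped in pure Python on top of the existing 'S' dtype. Only a sketch (EncodedArray is a hypothetical name, not a proposed API), and it immediately runs into the trailing-null problem:

```python
import numpy as np

class EncodedArray(object):
    """Sketch: store encoded bytes, decode on element access."""
    def __init__(self, strings, encoding="latin-1", itemsize=16):
        self.encoding = encoding
        self._buf = np.array([s.encode(encoding) for s in strings],
                             dtype="S%d" % itemsize)
    def __getitem__(self, i):
        return self._buf[i].decode(self.encoding)
    def __setitem__(self, i, value):
        self._buf[i] = value.encode(self.encoding)

a = EncodedArray([u"abc", u"caf\xe9"])
assert a[1] == u"caf\xe9"

# The trailing-null problem: 'S' silently strips trailing NULs on read
a[0] = u"ab\x00"
assert a[0] == u"ab"  # length information is lost
```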

- Sebastian

 
 Chuck  
 

