[Numpy-discussion] cannot decode 'S'

2014-01-23 Thread josef . pktd
Truncating trailing null bytes in 'S' scalars breaks decoding for encodings that need them.

>>> a = np.array([si.encode('utf-16LE') for si in ['Õsc', 'zxc']], dtype='S')
>>> a
array([b'\xd5\x00s\x00c', b'z\x00x\x00c'],
      dtype='|S6')

>>> [ai.decode('utf-16LE') for ai in a]
Traceback (most recent call last):
  File "<pyshell#118>", line 1, in <module>
    [ai.decode('utf-16LE') for ai in a]
  File "<pyshell#118>", line 1, in <listcomp>
    [ai.decode('utf-16LE') for ai in a]
  File "C:\Programs\Python33\lib\encodings\utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x63 in position 4: truncated data

A messy workaround (arrays, in contrast to scalars, are not truncated by `tostring`):

>>> [a[i:i+1].tostring().decode('utf-16LE') for i in range(len(a))]
['Õsc', 'zxc']
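
For illustration, a minimal sketch of the same idea that slices the raw array buffer directly, so no scalar is ever created. It uses `tobytes`, the newer spelling of `tostring`. The names `words`, `raw`, `n` and `decoded` are made up for this example, and the `rstrip` is an assumption that the text itself never ends in a NUL character.

```python
import numpy as np

words = ['Õsc', 'zxc']
a = np.array([w.encode('utf-16-le') for w in words], dtype='S')

# Indexing gives a bytes scalar with the trailing null byte stripped:
# 5 bytes instead of the 6 that were stored.
scalar_len = len(a[0])

# The array buffer itself still holds every byte, so slice it per item.
raw = a.tobytes()
n = a.dtype.itemsize
decoded = [
    # rstrip drops NUL padding that shorter items would pick up
    raw[i * n:(i + 1) * n].decode('utf-16-le').rstrip('\x00')
    for i in range(len(a))
]
print(scalar_len, decoded)
```

This only moves the workaround around: anything that goes through `a[i]` still loses the trailing byte.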

Found while playing with examples in the other thread.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] cannot decode 'S'

2014-01-23 Thread Chris Barker
Josef,

Nice find -- another reason why 'S' can NOT be used as-is for arbitrary
bytes.

See the other thread for my proposals about that.


> messy workaround (arrays in contrast to scalars are not truncated in
> `tostring`)
>
> >>> [a[i:i+1].tostring().decode('utf-16LE') for i in range(len(a))]
> ['Õsc', 'zxc']


I think the real workaround is to not try to store arbitrary bytes
(i.e. encoded text) in the 'S' dtype.
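
One possible reading of that advice, sketched under the assumption that the data really is text and not arbitrary binary: keep the decoded strings in the 'U' dtype and encode only when bytes are needed. The variable names here are illustrative.

```python
import numpy as np

# Store decoded text; 'U' holds code points, so there are no
# encoding-specific null bytes for NumPy to strip.
u = np.array(['Õsc', 'zxc'], dtype='U')

# Encode at the boundary, e.g. just before writing to a file or socket.
encoded = [s.encode('utf-16-le') for s in u]
print(u.dtype, encoded)
```

This does not help when the bytes are not text at all, which is the case the question below is about.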

But is there a convenient way to do it with other existing numpy types?

I tried to do it with uint8, and it's really awkward.
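
To make the awkwardness concrete, here is a rough sketch of what a uint8 version might look like: one zero-padded row per item. This is only one possible layout, not necessarily what was tried; `buf`, `width` and `roundtrip` are hypothetical names, and stripping `'\x00'` after decoding assumes the text never legitimately ends in a NUL character.

```python
import numpy as np

words = ['Õsc', 'zxc']
encoded = [w.encode('utf-16-le') for w in words]

# One row per item, padded with zeros to the longest encoded length.
width = max(len(b) for b in encoded)
buf = np.zeros((len(encoded), width), dtype=np.uint8)
for row, b in zip(buf, encoded):
    # row is a view into buf, so this writes in place
    row[:len(b)] = np.frombuffer(b, dtype=np.uint8)

# Rows keep every byte, including trailing nulls, so decoding works.
roundtrip = [row.tobytes().decode('utf-16-le').rstrip('\x00') for row in buf]
print(buf.shape, roundtrip)
```

The bytes survive, but the item length has to be tracked by hand and every access goes through a copy and a decode, which is a lot of ceremony compared with a real fixed-width bytes dtype.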

-CHB





-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE  (206) 526-6329   fax
Seattle, WA  98115      (206) 526-6317   main reception

chris.bar...@noaa.gov