On Thu, Jan 23, 2014 at 11:43 AM, Oscar Benjamin
<[email protected]> wrote:
> On Thu, Jan 23, 2014 at 11:23:09AM -0500, [email protected] wrote:
>>
>> another curious example, encode utf-8 to latin-1 bytes
>>
>> >>> b
>> array(['Õsc', 'zxc'],
>>       dtype='<U3')
>> >>> b[0].encode('utf8')
>> b'\xc3\x95sc'
>> >>> b[0].encode('latin1')
>> b'\xd5sc'
>> >>> b.astype('S')
>> Traceback (most recent call last):
>>   File "<pyshell#40>", line 1, in <module>
>>     b.astype('S')
>> UnicodeEncodeError: 'ascii' codec can't encode character '\xd5' in position 0: ordinal not in range(128)
>> >>> c = b.view('S4').astype('S1').view('S3')
>> >>> c
>> array([b'\xd5sc', b'zxc'],
>>       dtype='|S3')
>> >>> c[0].decode('latin1')
>> 'Õsc'
>
> Okay, so it seems that .view() implicitly uses latin-1 whereas .astype() uses
> ascii:
>
>>>> np.array(['Õsc']).astype('S4')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character '\xd5' in position 0: ordinal not in range(128)
>>>> np.array(['Õsc']).view('S4')
> array([b'\xd5', b's', b'c'],
>       dtype='|S4')
No, a view doesn't change the memory; it only changes how the existing
bytes are interpreted, so there shouldn't be any conversion involved.
astype does a real type conversion, but for unicode to bytes it goes
through ascii encoding, which fails on 'Õ'.
>>> b = np.array(['Õsc', 'zxc'], dtype='<U3')
>>> b.tostring()
b'\xd5\x00\x00\x00s\x00\x00\x00c\x00\x00\x00z\x00\x00\x00x\x00\x00\x00c\x00\x00\x00'
>>> b.view('S12')
array([b'\xd5\x00\x00\x00s\x00\x00\x00c', b'z\x00\x00\x00x\x00\x00\x00c'],
      dtype='|S12')
The conversion happens somewhere during array creation, but I have no
idea about the details of the memory encoding (UCS-2/UCS-4) and the
low-level layouts.
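A minimal sketch of what the session above suggests, assuming a NumPy build where 'U' arrays store one 32-bit code point per character (UCS-4; the explicit '<U3' pins little-endian byte order). Under that assumption, the low byte of each code point below 256 equals its latin-1 byte, which is why the view trick looks like latin-1 decoding. For larger code points it silently truncates. The helper name `low_byte_view` is hypothetical; `np.char.encode` is the explicit alternative that names the codec instead of relying on astype's ascii.

```python
import numpy as np

# Same data as in the thread; '<U3' pins little-endian 4-byte code units.
b = np.array(['Õsc', 'zxc'], dtype='<U3')

# Assumption checked here: the buffer is plain UTF-32-LE, 4 bytes per character.
raw = b.tobytes()
assert raw == 'Õsczxc'.encode('utf-32-le')


def low_byte_view(arr):
    """Hypothetical helper reproducing the view('S4').astype('S1') trick.

    Keeps only the lowest byte of each little-endian code point, so the
    result only matches latin-1 when every code point is below 256.
    """
    nchars = arr.dtype.itemsize // 4
    return arr.view('S4').astype('S1').view('S%d' % nchars)


trick = low_byte_view(b)
print(trick)  # for this data, the same bytes latin-1 encoding would give

# 'Ł' is U+0141: its low byte 0x41 is 'A', so the trick silently corrupts it.
corrupted = low_byte_view(np.array(['Łsc'], dtype='<U3'))
print(corrupted)

# Explicit conversion: name the codec rather than relying on astype.
encoded = np.char.encode(b, 'latin1')
print(encoded)

try:
    b.astype('S')
except UnicodeEncodeError as exc:
    print('astype failed:', exc)
```

The explicit `np.char.encode` call fails loudly on characters the chosen codec can't represent, whereas the view trick never fails and just drops the high bytes.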
Josef
>
>> --------
>> The original numpy py3 conversion used latin-1 as the default
>> (it's still used in statsmodels, and I haven't looked at the structure
>> under the common py2/py3 codebase)
>>
>> if sys.version_info[0] >= 3:
>>     import io
>>     bytes = bytes
>>     unicode = str
>>     asunicode = str
>
> These two functions are an abomination:
>
>> def asbytes(s):
>>     if isinstance(s, bytes):
>>         return s
>>     return s.encode('latin1')
>> def asstr(s):
>>     if isinstance(s, str):
>>         return s
>>     return s.decode('latin1')
>
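To make the complaint concrete, here is a small sketch that reuses the two quoted helpers unchanged, redefined so the snippet runs on its own under Python 3. latin-1 maps every byte 0-255 to a code point, so asstr never raises, even on UTF-8 data; it just produces mojibake. In the other direction, asbytes fails for any character above U+00FF.

```python
# The two helpers from the quoted snippet, as they behave on Python 3.
def asbytes(s):
    if isinstance(s, bytes):
        return s
    return s.encode('latin1')


def asstr(s):
    if isinstance(s, str):
        return s
    return s.decode('latin1')


# UTF-8 bytes for 'Õsc' (as in the session above) decode without error,
# but each UTF-8 byte becomes its own character.
utf8_bytes = 'Õsc'.encode('utf8')
mojibake = asstr(utf8_bytes)
print(repr(mojibake))

# Every byte value survives a round trip, which is why decoding never fails.
all_bytes = bytes(range(256))
roundtrip = asbytes(asstr(all_bytes))

# Text outside U+0000..U+00FF cannot be encoded at all.
try:
    asbytes('€')
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True
print('asbytes("€") raised:', euro_failed)
```

The silent success on decode is arguably the worse half: wrongly decoded text propagates instead of failing at the boundary.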
>
> Oscar
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion