Array and string interoperability

Just sharing my thoughts and a few ideas about simplifying the casting of strings to arrays. In the examples, assume NumPy is in the namespace (from numpy import *).
Initializing an array from a string currently looks like this:

    s = "012 abc"
    A = fromstring(s, "u1")
    print A
    -> [48 49 50 32 97 98 99]

Perfect. But writing values back does not work the way it IMO should. Consider this example:

    B = zeros(7, "u1")
    B[0] = s[1]
    print B
    -> [1 0 0 0 0 0 0]

Ugh? It parses the character s[1], which is "1", as an integer and writes 1 to B[0]. The first thing I would expect here is a ValueError; I would never expect such high-level parsing to happen on assignment. IMO it would ideally do the following instead:

    B[0] = s[1]
    print B
    -> [49 0 0 0 0 0 0]

That is, it should simply write ord(s[1]) into B. Sounds logical? To me, very much so. Going further, one could then write:

    B[:] = s
    print B
    -> [48 49 50 32 97 98 99]

namely, cast the whole string into a byte array. IMO this would be the logical, expected behaviour. Currently it just raises a ValueError as soon as it meets a non-digit in the string, so the current casting is hardly of any practical use.

Furthermore, I think this code:

    A = array(s, "u1")

could act exactly the same as:

    A = fromstring(s, "u1")

But this is just a side idea for spelling simplicity/generality, not really necessary.

Further thoughts: when trying to create a "u1" array from a Python 3 string, the question is whether it should raise an error. I think yes, and in that case a "u4" type should be specified explicitly at initialisation, I suppose. Translation from Unicode to extended ASCII (Latin-1) or whatever else should be done on the Python side, or via an explicit translation step. Python 3 strings are Unicode and may take up to 4 bytes per character internally, but in reality most of the time we deal with 1-byte strings, so there is a huge waste of resources when everything is handled as 4-byte data. For many serious projects it is simply not needed.

Furthermore, I think some of the methods from the "chararray" submodule should be usable directly on normal integer arrays, without conversions to other array types. I personally don't really see the need for a separate chararray type: it is all numbers anyway, and it is up to the programmer to decide what size of translation tables / value ranges to use. There could be some convenience methods for ASCII operations, e.g. char.upper(), but currently they don't seem to work with integer arrays, so why not make those potentially useful methods usable on normal integer arrays? Or even migrate them to the root namespace with prefixed names, e.g.:

    A = ascii_toupper(A)
    A = ascii_tolower(A)

Many things can already be achieved with general numeric methods, e.g. translating/reducing the array.

Here, obviously, I mean fixed-size arrays, not dynamically sized ones. How to deal with dynamically changing array sizes is a separate problem, and it depends on how the software is designed in the first place and on what it does with the data. For my own text-editing software project I consider only fixed, pre-allocated 1D and 2D "uint8" arrays. I also experiment with my own encodings, so just as a side note: I don't think any particular encoding should be baked into new array types; it is up to the programmer to decide what 'meaning' the bytes have.

Kind regards,
Mikhail
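P.S. To make the ideas above a bit more concrete, below is a rough sketch of how the proposed behaviour can be emulated with current NumPy on Python 3. Note the explicit Latin-1 encode step (the string is assumed to be Latin-1 encodable), and note that ascii_toupper/ascii_tolower are only the hypothetical names suggested above, implemented here with plain numeric operations; they are not existing NumPy functions.

    import numpy as np

    s = "012 abc"

    # Explicit byte view of the (Latin-1 encodable) string; today this
    # needs an encode step instead of a plain B[:] = s assignment.
    A = np.frombuffer(s.encode("latin-1"), dtype="u1")
    print(A)            # [48 49 50 32 97 98 99]

    B = np.zeros(7, "u1")
    B[0] = ord(s[1])    # writes the code point 49, not a parsed digit
    B[:] = A            # whole-string assignment via the byte view
    print(B)            # [48 49 50 32 97 98 99]

    # Hypothetical ascii_toupper/ascii_tolower working directly on integer
    # arrays, using plain numeric operations (no chararray round-trip).
    def ascii_toupper(a):
        a = np.asarray(a, dtype="u1")
        return np.where((a >= ord("a")) & (a <= ord("z")), a - 32, a)

    def ascii_tolower(a):
        a = np.asarray(a, dtype="u1")
        return np.where((a >= ord("A")) & (a <= ord("Z")), a + 32, a)

    print(ascii_toupper(B))    # [48 49 50 32 65 66 67]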