On 5 June 2017 at 19:40, Chris Barker <chris.bar...@noaa.gov> wrote:
>
>
>> > Python3 assumes 4-byte strings but in reality most of the time
>> > we deal with 1-byte strings, so there is huge waste of resources
>> > when dealing with 4-bytes. For many serious projects it is just not
>> > needed.
>>
>> That's quite enough anglo-centrism, thank you. For when you need byte
>> strings, Python 3 has a type for that. For when your strings contain
>> text, bytes with no information on encoding are not enough.
>
> There was a big thread about this recently -- it seems to have not quite
> come to a conclusion.
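To put numbers on the size difference being debated, here is a minimal sketch. Note that it concerns NumPy's fixed-width string dtypes; CPython's own str has used a flexible 1, 2 or 4 bytes per character representation since PEP 393. NumPy's 'U' dtype stores every character in 4 bytes (UCS-4), while the 'S' (bytes) dtype stores 1 byte per character. The word list is just illustrative data.

```python
import numpy as np

words = ["spam", "eggs", "ham"]

# 'U' (unicode) dtype: fixed width, 4 bytes (UCS-4) per character.
as_unicode = np.array(words)
# 'S' (bytes) dtype: fixed width, 1 byte per character; encode explicitly as ASCII.
as_bytes = np.array([w.encode("ascii") for w in words])

# Both arrays are padded to the longest word (4 characters per element).
print(as_unicode.dtype.kind, as_unicode.itemsize, as_unicode.nbytes)  # U 16 48
print(as_bytes.dtype.kind, as_bytes.itemsize, as_bytes.nbytes)        # S 4 12
```

So for mostly-ASCII data the 'U' array takes four times the memory of the equivalent 'S' array, at the cost that 'S' carries no encoding information.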
I have started to read that thread, though I got lost in the transitions between ideas. It was likely about some new string array type...

> But anglo-centrism aside, there is substantial demand
> for a "smaller" way to store mostly-ascii text.
>

Obviously there is demand. The terror of Unicode touches many aspects of a programmer's life. It is not NumPy's problem, though. Realistically satisfying this demand is a hard and wide-ranging problem. First of all, it comes down to defining this "optimal 8-bit character table". And Latin-1, exactly as it is, is not that optimal table, if only because of its large number of accented letters. But, granted, if most accented letters were defined as "optional", i.e. deleted, it would be quite a reasonable basic character table to start with. Then comes the question of popularizing a new table (which doesn't even exist yet).

>> > There can be some convenience methods for ascii operations,
>> > like eg char.toupper(), but currently they don't seem to work with
>> > integer
>> > arrays so why not make those potentially useful methots usable
>> > and make them work on normal integer arrays?
>> I don't know what you're doing, but I don't think numpy is normally the
>> right tool for text manipulation...
>
>
> I agree here. But if one were to add such a thing (vectorized string
> operations) -- I'd think the thing to do would be to wrap (or port) the
> python string methods. But it shoudl only work for actual string dtypes, of
> course.
>
> note that another part of the discussion previously suggested that we have a
> dtype that wraps a native python string object -- then you'd get all for
> free. This is essentially an object array with strings in it, which you can
> do now.
>

Well, here I must admit I don't quite understand the whole idea of a "NumPy array of string type". How is it used? What is its main benefit or feature...?
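For comparison, here is roughly what treating text as an integer array looks like, including the toupper() operation quoted above. This is a minimal sketch assuming plain ASCII text stored as np.uint8 codes; the helpers to_codes, to_str and ascii_upper are hypothetical names of mine, not NumPy API.

```python
import numpy as np


def to_codes(text: str) -> np.ndarray:
    """Encode ASCII text as a writable 1-D array of uint8 character codes."""
    # frombuffer returns a read-only view of the bytes; copy() makes it mutable.
    return np.frombuffer(text.encode("ascii"), dtype=np.uint8).copy()


def to_str(codes: np.ndarray) -> str:
    """Decode a uint8 code array back into a Python string."""
    return codes.tobytes().decode("ascii")


def ascii_upper(codes: np.ndarray) -> np.ndarray:
    """Vectorized ASCII toupper: subtract 32 only from codes in 'a'..'z'."""
    out = codes.copy()
    is_lower = (out >= ord("a")) & (out <= ord("z"))
    out[is_lower] -= 32
    return out


codes = to_codes("012 abc")

upper = ascii_upper(codes)          # "012 ABC"
rolled = np.roll(codes, 2)          # rotate right by two characters: "bc012 a"

# A 256-entry lookup table plays the role of str.translate; here it swaps '0' and '1'.
table = np.arange(256, dtype=np.uint8)
table[ord("0")], table[ord("1")] = ord("1"), ord("0")
translated = table[codes]           # "102 abc"

edited = codes.copy()
edited[4] = ord("A")                # replace a character in place: "012 Abc"
deleted = np.delete(codes, 3)       # delete the space: "012abc"

for name, arr in [("upper", upper), ("rolled", rolled), ("translated", translated),
                  ("edited", edited), ("deleted", deleted)]:
    print(name, repr(to_str(arr)))
```

np.char.upper does the equivalent job for actual string dtypes; the point of the sketch is that roll, item assignment, np.delete and lookup-table indexing all work directly on the code array.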
Examples of integer array usage with textual data, in my case:
- holding data in a text editor (mutability + indexing/slicing)
- filtering and transformations (e.g. table-based translations, cryptography, etc.)

A string-type array? Is this the kind of string array you describe?

    import numpy as np

    s = "012 abc"
    arr = np.array(s)
    print("type ", arr.dtype)
    print("shape ", arr.shape)
    print("my array: ", arr)
    # try to rotate the string by two characters
    arr = np.roll(arr, 2)
    print("my array: ", arr)

->

    type  <U7
    shape  ()
    my array:  012 abc
    my array:  012 abc

So what does it do? What's up with the shape? E.g. here I wanted to 'roll' the string. How would I replace characters, or delete them? What is the general idea behind it?

Mikhail

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion