On Sun, Feb 22, 2015 at 2:42 PM, Charles R Harris
<charlesr.har...@gmail.com> wrote:
> On Sun, Feb 22, 2015 at 12:52 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> On Sun, Feb 22, 2015 at 10:21 AM, Aldcroft, Thomas
>> <aldcr...@head.cfa.harvard.edu> wrote:
>> > The idea of a one-byte string dtype has been extensively discussed twice
>> > before, with a lot of good input and ideas, but no action [1, 2].
>> >
>> > tl;dr: Perfect is the enemy of good.  Can numpy just add a one-byte
>> > string dtype named 's' that uses latin-1 encoding as a bridge to
>> > enable Python 3 usage in the near term?
>> I think this is a good idea. I think overall it would be good for
>> numpy to switch to using variable-length strings in most cases (cf.
>> pandas), which is a different kind of change, but fixed-length 8-bit
>> encoded text is obviously a common on-disk format in scientific
>> applications, so numpy will still need some way to deal with it
>> conveniently. In the long run we'd like to have more flexibility (e.g.
>> allowing choice of character encoding), but since this proposal is a
>> subset of that functionality, it won't interfere with later
>> improvements. I can see an argument for utf8 over latin1, but it
>> really doesn't matter that much so whatever, blue and purple bikesheds
>> are both fine.
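A minimal sketch of why latin-1 works as a bridge. It uses NumPy's existing fixed-width bytes ('S') and unicode ('U') dtypes, not the proposed 's' dtype, which this code does not assume exists. latin-1 maps each byte value to the code point with the same number, so decoding never fails and re-encoding restores the original bytes; utf-8 rejects many lone bytes. The helpers `latin1_roundtrips` and `utf8_decodes` are hypothetical names made up for this illustration.

```python
import numpy as np


def latin1_roundtrips(raw: np.ndarray) -> bool:
    """Decode an 'S' array as latin-1 and check that encoding restores it."""
    text = np.char.decode(raw, "latin-1")   # 'S' -> 'U'
    back = np.char.encode(text, "latin-1")  # 'U' -> 'S'
    return bool((back == raw).all())


def utf8_decodes(raw: np.ndarray) -> bool:
    """Return True only if every element of an 'S' array is valid utf-8."""
    try:
        np.char.decode(raw, "utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Every non-zero byte value as a 1-byte string. 0x00 is skipped because the
# 'S' dtype treats trailing NUL bytes as padding and strips them on read.
raw = np.array([bytes([b]) for b in range(1, 256)], dtype="S1")

print(latin1_roundtrips(raw))  # latin-1 accepts every byte value
print(utf8_decodes(raw))       # lone bytes 0x80-0xFF are not valid utf-8
```

The flip side, and part of the argument for utf8, is that latin-1 can only represent the first 256 code points.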
>> The tricky bit here is "just" :-). Do you want to implement this? Do
>> you know someone who does? It's possible but will be somewhat
>> annoying: doing it directly, without first refactoring how dtypes
>> work, means adding lots of copy-paste code to all the different
>> ufuncs.
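On the point about copy-paste code across ufuncs: each ufunc registers one inner loop per type signature, and those signatures are spelled with the one-character type codes. A short, hedged illustration using `np.add.types`; the exact list differs between NumPy versions, so only its general shape should be relied on.

```python
import numpy as np

# Each entry is one inner loop: input type codes, "->", output type code.
loops = np.add.types
print(len(loops), "loops registered for np.add")

# The double-precision loop, for example, is spelled with the code 'd'.
print("dd->d" in loops)

# Distinct type codes appearing as the first input across all loops.
first_inputs = sorted({sig[0] for sig in loops})
print("".join(first_inputs))
```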
> We're also running out of letters for types. We need to decide how to
> extend that representation. It would seem straightforward to just start
> using multiple letters, but there is a lot of code that uses things like
> `for dt in 'efdg':`. Can we perhaps introduce an extended dtype
> structure, maybe with some ideas from dynd and versioning?
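The pattern being described looks roughly like the sketch below: code that iterates over one-character type codes, here taken from `np.typecodes`, which groups NumPy's existing codes by kind. Anything written this way assumes one character per dtype and would need revisiting if codes grew to several letters. The variable names are illustrative only.

```python
import numpy as np

# NumPy groups its one-character type codes by kind.
float_codes = np.typecodes["Float"]  # half, single, double, long double

# Typical code that assumes each dtype is named by exactly one character.
itemsizes = {dt: np.dtype(dt).itemsize for dt in float_codes}

# The size of 'g' (long double) is platform-dependent.
print(float_codes, itemsizes)
```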

I don't mind using "s" for this particular case, but in general I
think we should de-emphasise the string representations, and even
allow new dtypes to forgo them entirely. We have all of Python to work
with. It's much nicer for users and for us to write things like


instead of


or whatever weird ad-hoc syntax we come up with.
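For existing dtypes, NumPy already accepts ordinary Python objects in many places where a type string is allowed, which points in the direction argued for here. The pairs below are equivalent spellings of current dtypes; this is only an illustration and says nothing about how a new latin-1 string dtype would actually be spelled.

```python
import numpy as np

# Type-string spellings ...
s8 = np.dtype("S8")
u4 = np.dtype("u4")

# ... and the same dtypes built from Python objects.
s8_obj = np.dtype((np.bytes_, 8))  # (flexible scalar type, itemsize in bytes)
u4_obj = np.dtype(np.uint32)

print(s8 == s8_obj, u4 == u4_obj)
```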

(Obviously there are some details to work out with things like the
.npy format, but these seem solvable.)
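As a concrete instance of the .npy detail: the file header records the dtype through its string descriptor, so a dtype with no string form would presumably need some other encoding there. A quick check using only `np.save`, `np.load`, and an in-memory buffer:

```python
import io

import numpy as np

arr = np.array([b"abc", b"de"], dtype="S8")

buf = io.BytesIO()
np.save(buf, arr)
raw = buf.getvalue()

# The header is a Python-literal dict; the dtype appears as its string form.
print(b"'descr': '|S8'" in raw)

# Loading parses that descriptor back into a dtype.
buf.seek(0)
print(np.load(buf).dtype)
```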


Nathaniel J. Smith -- http://vorpus.org
NumPy-Discussion mailing list
