On Thursday, March 15, 2012, Nathaniel Smith <n...@pobox.com> wrote:
> On Wed, Mar 14, 2012 at 1:44 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>> On Fri, Mar 9, 2012 at 8:55 AM, Bryan Van de Ven <bry...@continuum.io>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I have started working on a NEP for adding an enumerated type to NumPy.
>>> It is on my GitHub:
>>>
>>>     https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum.rst
>>>
>>> It is still very rough, and incomplete in places. But I would like to
>>> get feedback sooner rather than later in order to refine it. In
>>> particular there are a few questions inline in the document that I would
>>> like input on. Any comments, suggestions, questions, concerns, etc. are
>>> very welcome.
>>
>>
>> This looks like a great start to me.
>>
>> I think the open/closed enum distinction will need to be explored a little
>> bit more, because it interacts with dtype immutability/hashability. Do you
>> know if there are any examples of Python objects in the wild that
>> dynamically convert from not being hashable (i.e. raising an exception if
>> used as a dict key) to become hashable?
>
> I haven't run into any...
>
> Thinking about it, I'm not sure I have any use case for this type
> being mutable. Maybe someone else can think of one? The first case
> that came to mind was in reading a large text file, where you want to
> (1) auto-create an enum, (2) use a pre-allocated array, and (3) don't
> know ahead of time what the levels are:
>
>  a = np.empty(lines_in_file, dtype=np.dtype(Enum()))
>  for i, line in enumerate(f):
>    field = line.split()[0]
>    a.dtype.add_level(field)
>    a[i] = field
>  a.dtype.seal()
>
> But really this can be done just as easily and efficiently
> without a mutable dtype:
>
>  a = np.empty(lines_in_file, dtype=np.int32)
>  intern_table = {}
>  next_level = 0
>  for i, line in enumerate(f):
>    field = line.split()[0]
>    val = intern_table.setdefault(field, next_level)
>    if val == next_level:
>      next_level += 1
>    a[i] = val
>  a = a.view(dtype=np.dtype(Enum(map=intern_table)))
>
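A runnable sketch of the second approach above, stopping short of the final view step because the Enum dtype is still only a proposal. The sample lines and the names `lines`, `codes` and `levels` are made up for illustration, and `len(intern_table)` stands in for the separate `next_level` counter:

```python
import numpy as np

# Hypothetical stand-in for the lines of a text file.
lines = ["red 1.0", "green 2.5", "red 0.3", "blue 7.1"]

codes = np.empty(len(lines), dtype=np.int32)
intern_table = {}  # level name -> integer code
for i, line in enumerate(lines):
    field = line.split()[0]
    # len(intern_table) is the next unused code; setdefault only
    # stores it when the field has not been seen before.
    codes[i] = intern_table.setdefault(field, len(intern_table))

# Invert the mapping so that codes can be turned back into names.
levels = sorted(intern_table, key=intern_table.get)
```

Here `codes` plus `intern_table` carry everything a sealed enum dtype would need, and nothing had to be mutated after the array was created.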
> I notice that the HDF5 C library has a concept of open versus closed
> enums, but I can't tell from the documentation at hand why this is; it
> looks like it might just be a limitation of the implementation. (Like,
> a workaround for C's lack of a standard mapping type, which makes it
> inconvenient to pass all the mappings in to a single API call.)
>
>> It might be worth adding a section which briefly compares and contrasts the
>> proposed functionality with enums in various programming languages. Here are
>> two links I found to try and get an idea:
>>
>> MS on C# enum usage:
>> http://msdn.microsoft.com/en-us/library/cc138362.aspx
>> Wikipedia on C++ enum class:
>> http://en.wikipedia.org/wiki/C%2B%2B11#Strongly_typed_enumerations
>>
>> For example, the C# enum has a way to enable a "flags" mode, which will
>> create successive powers of 2. This may not be a feature NumPy needs, but if
>> people are finding it useful in C#, maybe it would be useful here too.
>
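For anyone unfamiliar with the "flags" idea, here is a minimal sketch in plain Python. It only illustrates the powers-of-2 convention; it is not the C# feature itself, nor anything proposed in the NEP. The flag names are made up:

```python
# Hypothetical flag names; each gets its own bit (1, 2, 4).
names = ["READ", "WRITE", "EXEC"]
flags = {name: 1 << i for i, name in enumerate(names)}

# Flags combine with bitwise-or and are tested with bitwise-and.
mode = flags["READ"] | flags["WRITE"]
can_write = bool(mode & flags["WRITE"])
can_exec = bool(mode & flags["EXEC"])
```

Because every name owns a distinct bit, one integer can hold any combination of them, which is a different thing from an element holding exactly one value out of a fixed set.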
> There's also a long, ongoing debate about how to do enums in Python -- e.g.:
>  http://www.python.org/dev/peps/pep-0354/
>  http://pypi.python.org/pypi/enum/
>  http://pypi.python.org/pypi/enum_meta/
>  http://pypi.python.org/pypi/flufl.enum/
>  http://pypi.python.org/pypi/lazr.enum/
>  http://pypi.python.org/pypi/pyutilib.enum/
>  http://pypi.python.org/pypi/coding/
>  http://stackoverflow.com/questions/36932/whats-the-best-way-to-implement-an-enum-in-python
> I guess Guido likes flufl.enum:
>  http://mail.python.org/pipermail/python-ideas/2011-July/010909.html
>
> BUT, I'm not sure any of this is relevant at all. "Enums" are a
> programming language feature that is, first and foremost, about
> injecting names into your code's namespace. What I'm hoping to see is
> a dtype for holding categorical data, similar to an R "factor"
>  http://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
>  https://svn.r-project.org/R/trunk/src/library/base/R/factor.R (NB:
> This is GPL code if anyone is paranoid about contamination, but also
> the most complete API description available)
> or an HDF5 "enum"
>  http://www.hdfgroup.org/HDF5/doc/H5.user/Datatypes.html#Datatypes_Enum
> I believe pandas has some functionality along these lines too, though
> I can't find it in the online docs -- hopefully Wes will fill us in.
>
> These are basically objects that act for most purposes like string
> arrays, but in which all strings are required to come from a finite,
> specified list. This list acts like some metadata attached to the
> array; its order may or may not be significant. And they're
> implemented internally as integer arrays.
>
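The "integer codes plus a list of allowed strings" layout described above can be imitated with existing NumPy calls. This is only a sketch of the storage idea, not the proposed dtype, and the sample values are made up:

```python
import numpy as np

# Made-up categorical data.
values = np.array(["low", "high", "medium", "low", "high"])

# np.unique returns the sorted distinct strings (the "levels") and,
# with return_inverse=True, an integer code per element.
levels, codes = np.unique(values, return_inverse=True)

# Indexing the levels with the codes recovers the strings.
decoded = levels[codes]

# A new value is only valid if it is one of the known levels.
is_valid = "tiny" in levels
```

Note that `np.unique` puts the levels in sorted order, which is not necessarily the order that matters for the data (here "low" < "medium" < "high" would be more natural), so a real categorical type would need to let the caller specify the order.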
> I'm not sure what it would even mean to treat this kind of data as
> "flags", since you can't take the bitwise-or of two strings...
>
> -- Nathaniel
>

I guess my problem is that this isn't _quite_ like the enums that I am
familiar with (but not quite unlike them either).  Should we call it
"factor" to avoid confusion, or would too many people not know what that
is, while being drawn in by the name "enum"?

Just a thought.

Ben Root
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
