On Sun, Feb 6, 2022 at 9:42 PM Chris Angelico <ros...@gmail.com> wrote:

> > As for dataclasses, this is what I mean by "code" vs "data" -- if you
> > know, when you are writing the code, exactly what keys (fields, etc.) you
> > expect, and you want to be able to work with that data model as code (e.g.
> > attribute access, maybe some methods), then you do:
> >
> > In [10]: @dataclass
> >     ...: class Stream:
> >     ...:     codec_type : str
> >     ...:     width: int
> >     ...:     height: int
> >
> > And if you have that data in a dict (say, from JSON), then you can
> > extract it like this:
> >
> > In [11]: stream_info = {'codec_type': 'video',
> >     ...:                'width': 1024,
> >     ...:                'height': 768,
> >     ...:                }
> >
> > In [12]: stream = Stream(**stream_info)
> >
> > In [13]: stream
> > Out[13]: Stream(codec_type='video', width=1024, height=768)
> >
> > That only works if your dict is in exactly the right form, but that would
> > be the case anyway.
> One very *very* important aspect of a huge number of JSON-based
> protocols is that they absolutely will not break if new elements are
> added. In other words, I look at the things I'm interested in, but
> those streams also have a ton of other information (frame rate,
> metadata, pixel format), which could get augmented at any time, and I
> should just happily ignore the parts I'm not looking for. Making that
> work with dataclasses (a) is even more boilerplate, and (b) would
> obscure the relationship between the dataclass and the JSON schema.

I believe some folks have asked for the ability for **kwargs to be tacked
on to the dataclass-generated __init__ -- I don't know if it will happen,
but that would address this use case.
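
In the meantime, one way to ignore unknown keys is a small alternate constructor that filters the dict against the declared fields. This is only a sketch: `from_dict` is a hypothetical helper name, not part of the dataclasses API, and 'pix_fmt' is just an example of an extra key.

```python
from dataclasses import dataclass, fields


@dataclass
class Stream:
    codec_type: str
    width: int
    height: int

    @classmethod
    def from_dict(cls, data):
        # from_dict is a hypothetical helper, not something dataclasses provides.
        # Keep only the keys that match declared fields; ignore everything else.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


stream_info = {'codec_type': 'video', 'width': 1024, 'height': 768,
               'pix_fmt': 'yuv420p'}  # extra key the class doesn't know about
stream = Stream.from_dict(stream_info)
print(stream)
```

Calling Stream(**stream_info) directly on that dict would raise a TypeError because of the unexpected keyword; the filter is what makes the extra key harmless.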

Not sure what you mean by "obscure the relationship between the dataclass
and the JSON schema."

I guess you mean that the dataclass will then accept non-schema-conforming
JSON, but if you don't want it to do that, then don't allow that. For my part,
in an application where I'm doing a lot of JSON <-> dataclass conversion, I
explicitly add an "extra_data" field, so I can capture anything in the
JSON that doesn't have a "proper" place.
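
A minimal sketch of that "extra_data" idea follows. The class name, the `from_dict` helper, and the 'r_frame_rate' key are all illustrative assumptions, not the code from the application mentioned above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class StreamWithExtras:
    codec_type: str
    width: int
    height: int
    # Catch-all for keys that have no declared field.
    extra_data: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        # Hypothetical helper: split the input into declared fields and leftovers.
        known = {f.name for f in fields(cls)} - {'extra_data'}
        declared = {k: v for k, v in data.items() if k in known}
        extras = {k: v for k, v in data.items() if k not in known}
        return cls(**declared, extra_data=extras)


info = {'codec_type': 'video', 'width': 1024, 'height': 768,
        'r_frame_rate': '30/1'}
s = StreamWithExtras.from_dict(info)
print(s.extra_data)
```

Nothing from the input is thrown away this way, so the object can be written back out without losing the keys it didn't understand.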

After I posted, I realized that dataclasses are probably not the simplest
solution -- but SimpleNamespace could be:

In [9]: stream_info = {'codec_type': 'video',
   ...:                'width': 1024,
   ...:                'height': 768,
   ...:                }

In [10]: stream = types.SimpleNamespace(**stream_info)

In [11]: stream.codec_type
Out[11]: 'video'

In [12]: stream.height
Out[12]: 768

In [13]: stream.width
Out[13]: 1024
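
If the data starts out as JSON text rather than a dict, the same trick can probably be applied at decode time with the `object_hook` parameter of `json.loads`. A small sketch, with the JSON string made up for illustration:

```python
import json
import types

text = '{"codec_type": "video", "width": 1024, "height": 768}'

# object_hook is called with each decoded JSON object (a dict);
# here every object becomes a SimpleNamespace instead.
stream_ns = json.loads(text, object_hook=lambda d: types.SimpleNamespace(**d))

print(stream_ns.codec_type, stream_ns.width, stream_ns.height)
```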

In any case, if you don't like how dataclasses or SimpleNamespace do it,
then write your own custom class / converter -- I don't see the need for it
to be a language feature.

> I'm not sure what you mean here about code vs data. What is the
> difference that you're drawing? Ultimately, I need to read a
> particular data structure and find the interesting parts of it. It's
> not about code. The only code is "iterate over info->streams, look at
> the codec_type, width, height, perform arithmetic on videos".

The distinction I'm trying to draw (and I did say it was a fuzzy one in
Python) is that data are things you can store in variables -- e.g. the keys
of a dict can be hard coded (known at code-writing time) or stored in a
variable.

Code is things like variable and attribute names that have to be known at
code-writing time (barring metaprogramming techniques, get/setattr, etc).
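
To make the distinction concrete, here is a small sketch (the variable names are only for illustration) contrasting a dict key held in a variable with an attribute name written into the source:

```python
import types

stream_info = {'codec_type': 'video', 'width': 1024, 'height': 768}

# Data: the key is an ordinary value, so it can live in a variable
# and be chosen at run time.
key = 'width'
from_variable = stream_info[key]

# Code: the attribute name is part of the source text and has to be
# known when the line is written.
stream = types.SimpleNamespace(**stream_info)
from_attribute = stream.width

# getattr() is the escape hatch that turns the name back into data.
from_getattr = getattr(stream, key)

print(from_variable, from_attribute, from_getattr)
```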

In this case, we are looking to auto-extract variables from a dict -- you
can't even start to write that code unless you know what the keys in the
dict are -- and if that's the case, then you know (at least part of) the
schema, and you can use dataclasses, etc, and get your code.

I've worked with systems (the netcdf4 library for example, if you want an
obscure one :-) ) that essentially auto-translate keys in a dict to object
attributes. It seems pretty nifty at first:

ds = Dataset("the_file.nc")

But it ends up just making things harder -- you need to poke into the file
to see what names will be there, it's actually harder to introspect (can't
just look at .keys() ) -- and things really go to heck if the keys in your
data don't follow Python variable naming rules:

In [14]: stream_info = {'codec-type': 'video',
    ...:                'width': 1024,
    ...:                'height': 768,
    ...:                }

In [15]: stream = types.SimpleNamespace(**stream_info)

In [16]: stream.codec-type
AttributeError                            Traceback (most recent call last)
<ipython-input-16-26291ce709a5> in <module>
----> 1 stream.codec-type

AttributeError: 'types.SimpleNamespace' object has no attribute 'codec'

In [17]: stream.codec_type
AttributeError                            Traceback (most recent call last)
<ipython-input-17-6225e2eacec1> in <module>
----> 1 stream.codec_type

AttributeError: 'types.SimpleNamespace' object has no attribute 'codec_type'

In [18]: getattr(stream, 'codec-type')
Out[18]: 'video'

So: I say, keep your data in dicts, and if you want to load a code object
with that data, do it in a clearly defined way.
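
One possible shape for "a clearly defined way" is a converter that refuses keys that can't be used with plain attribute access, rather than silently creating attributes you can only reach through getattr(). This is a sketch under assumptions: `to_namespace` is a hypothetical helper name, and raising ValueError is just one reasonable policy.

```python
import keyword
import types


def to_namespace(data):
    # to_namespace is a hypothetical helper name.
    # Collect keys that are not strings, not valid identifiers,
    # or that collide with Python keywords.
    bad = [k for k in data
           if not (isinstance(k, str) and k.isidentifier())
           or keyword.iskeyword(k)]
    if bad:
        raise ValueError(f"keys are not usable as attribute names: {bad}")
    return types.SimpleNamespace(**data)


ok = to_namespace({'codec_type': 'video', 'width': 1024})

try:
    to_namespace({'codec-type': 'video'})
except ValueError as err:
    message = str(err)
    print(message)
```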

Again, in a quick script, maybe it'd be helpful occasionally, mostly saving
some typing (all those darn square brackets and quotes)[*] -- but I don't
think that's worth a language feature.


[*] I'm not being facetious here -- I write a lot of quick scripts, and DO
find typing attribute access (dots) a lot easier than typing dict indexing
(square brackets and quotes).

But I don't think it's worth a language feature.

And now that I've thought about it -- maybe I'll start using the
SimpleNamespace trick in some of those quick scripts.


Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Python-ideas mailing list -- python-ideas@python.org