Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 6:37 AM Adam Funk wrote:
> Is there a "bulletproof" version of json.dump somewhere that will
> convert bytes to str, any other iterables to list, etc., so you can
> just get your data into a file & keep working?

Is the data only being read by python programs? If so, consider using
pickle: https://docs.python.org/3/library/pickle.html

Unlike json dumping, the goal of pickle is to represent objects as
exactly as possible and *not* to be interoperable with other
languages.

If you're using json to pass data between python and some other
language, you don't want to silently convert bytes to strings. If you
have a bytestring of utf-8 data, you want to utf-8 decode it before
passing it to json.dumps. Likewise, if you have latin-1 data, you want
to latin-1 decode it. There is no universal and correct
bytes-to-string conversion.

On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico wrote:
> Maybe what we need is to fork out the default JSON encoder into two,
> or have a "strict=True" or "strict=False" flag. In non-strict mode,
> round-tripping is not guaranteed, and various types will be folded to
> each other - mainly, many built-in and stdlib types will be
> represented in strings. In strict mode, compliance with the RFC is
> ensured (so ValueError will be raised on inf/nan), and everything
> should round-trip safely.

Wouldn't it be reasonable to represent this as an encoder which is
provided by `json`? i.e.

    from json import dumps, UnsafeJSONEncoder
    ...
    dumps(foo, cls=UnsafeJSONEncoder)

Emphasizing the "Unsafe" part of this and introducing people to the
idea of setting an encoder also seems nice.

On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico wrote:
> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list wrote:
> >
> > The 'json' module already fails to provide round-trip
> > functionality:
> >
> > >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> > ...     if json.loads(json.dumps(data)) != data:
> > ...         print('oops', data, json.loads(json.dumps(data)))
> > ...
> > oops {True: 1} {'true': 1}
> > oops {1: 2} {'1': 2}
> > oops (1, 2) [1, 2]
>
> There's a fundamental limitation of JSON in that it requires string
> keys, so this is an obvious transformation. I suppose you could call
> that one a bug too, but it's very useful and not too dangerous. (And
> then there's the tuple-to-list transformation, which I think probably
> shouldn't happen, although I don't think that's likely to cause
> issues either.)

Ideally, all of these bits of support for non-JSON types should be
opt-in, not opt-out. But it's not worth making a breaking change to
the stdlib over this.

Especially for new programmers, the notion that
deserialize(serialize(x)) != x just seems like a recipe for subtle
bugs. You're never guaranteed that the deserialized object will match
the original, but shouldn't one of the goals of a de/serialization
library be to get it as close as is reasonable?

I've seen people do things which boil down to

    json.loads(x)["some_id"] == UUID(...)

plenty of times. It's obviously wrong and the fix is easy, but isn't
making the default json encoder less strict just encouraging this
type of bug?

Comparing JSON data against non-JSON types is part of the same
category of errors: conflating JSON with dictionaries. It's very easy
for people to make this mistake, especially since JSON syntax looks
so much like python dict syntax, so I don't think `json.dumps` should
be encouraging it.
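To make that failure mode concrete, here's a minimal sketch of the
UUID bug and its easy fix (the payload shape is made up):

    import json
    from uuid import UUID, uuid4

    some_id = uuid4()
    # The correct move is explicit: decide that UUIDs serialize as str.
    payload = json.dumps({"some_id": str(some_id)})

    data = json.loads(payload)
    assert data["some_id"] != some_id        # the buggy comparison: str vs UUID
    assert UUID(data["some_id"]) == some_id  # the fix: parse before comparing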
On Tue, Jul 7, 2020 at 6:52 AM Adam Funk wrote:
> Here's another "I'd expect to have to deal with this sort of thing in
> Java" example I just ran into:
>
> >>> r = requests.head(url, allow_redirects=True)
> >>> print(json.dumps(r.headers, indent=2))
> ...
> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
> >>> print(json.dumps(dict(r.headers), indent=2))
> {
>   "Content-Type": "text/html; charset=utf-8",
>   "Server": "openresty",
>   ...
> }

Why should the JSON encoder know about an arbitrary dict-like type?
It might implement Mapping, but there's no way for json.dumps to know
that in the general case (because not everything which implements
Mapping actually inherits from the Mapping ABC). Converting it to a
type which json.dumps understands is a reasonable constraint.

Also, wouldn't it be fair, if your object is "case insensitive", to
serialize it as `{ "CONTENT-TYPE": ... }` or `{ "content-type": ... }`
or ... ?

`r.headers["content-type"]` presumably gets a hit.
`json.loads(json.dumps(dict(r.headers)))["content-type"]` will get a
KeyError.

This seems very much out of scope for the json package because it's
not clear what it's supposed to do with this type. Libraries should
ask users to specify what they mean and not make potentially harmful
assumptions.

Best,
-Stephen
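P.S. To show what "specify what they mean" looks like in practice,
here's a rough sketch that picks a canonical key case before dumping
(the URL is a placeholder, and this assumes `requests` is installed):

    import json

    import requests

    r = requests.head("https://example.com", allow_redirects=True)

    # Normalize the keys to a case *we* chose, instead of making the
    # encoder guess what "case insensitive" should mean in JSON.
    headers = {k.lower(): v for k, v in r.headers.items()}
    payload = json.dumps(headers, indent=2)

    # The round-tripped lookup is now well-defined:
    print(json.loads(payload).get("content-type"))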
How to handle async and inheritance?
Hi all,

I'm looking at a conflict between code sharing via inheritance and
async usage. I would greatly appreciate any guidance, ideas, or best
practices which might help.

I'll speak here in terms of a toy example, but, if anyone wants to
look at the real code, I'm working on webargs. [1] Specifically, we
have a `Parser` class and an `AsyncParser` subclass, and the two have
a lot of code duplication to handle async/await. [2]

I've got an inheritance structure like this:

    class MyAbstractType: ...
    class ConcreteType(MyAbstractType): ...
    class AsyncConcreteType(MyAbstractType): ...

One of my goals, of course, is to share code between ConcreteType and
AsyncConcreteType via their parent. But the trouble is that there are
functions defined like this:

    class MyAbstractType:
        def foo(self):
            x = self.bar()
            y = self.baz(x)
            ...  # some code here, let's say 20 lines

    class AsyncConcreteType(MyAbstractType):
        async def foo(self):
            x = await self.bar()
            y = self.baz(x)
            ...  # the same 20 lines as above, but with an `await`
                 # added every-other line

I'm aware that I'm looking at "function color" and that my scenario
is pitting two language features -- inheritance and async -- against
one another. But I don't see a clean way out if we want to support an
"async-aware" version of a class with synchronous methods.

What I tried already, which I couldn't get to work, was to either
fiddle with things like `inspect` to see if the current function is
async, or to use a class variable to indicate that the current class
is the async version. The idea was to write something like

    class MyAbstractType:
        _use_async_calls = False

        def foo(self):
            x = self._await_if_i_am_async(self.bar)
            y = self.baz(x)
            ...

and that way, the async subclass just needs to change signatures to
be async with little stubs and set the flag:

    class AsyncConcreteType(MyAbstractType):
        _use_async_calls = True

        async def foo(self):
            return super().foo()

        async def bar(self):
            return super().bar()

but this (some of you are ahead of me on this, I'm sure!) did not
work at all. I couldn't find any way to write `_await_if_i_am_async`,
other than possibly doing some weird things with `exec`. Once I
concluded that python wouldn't let me decide whether or not to use
await at runtime, at least with the tools of which I'm aware, I
basically gave up on that route.

However, it seems like there should be some clever technique for
defining `MyAbstractType.foo` such that it awaits on certain calls
*if* there's some indication that it should do so. It's obviously
possible with `exec`, but I don't want to convert all of the core
codepaths into giant `exec` blocks. Perhaps there's a way which is
safer and more maintainable though?

If anyone has experience in this space and can offer up a good
solution, I would love to hear about it. And if someone wants to go
above-and-beyond and look at webargs, and suggest a better way for us
to support aiohttp, I'd obviously welcome that kind of help as well!

Thanks in advance, and best regards,
-Stephen

[1] https://github.com/marshmallow-code/webargs
[2] https://github.com/marshmallow-code/webargs/blob/6668d267fa4135cf3f653e422bd168298f2213a8/src/webargs/asyncparser.py#L24
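P.S. One more variation I've sketched but not pursued: keep the
*shared* implementation unconditionally `async`, await intermediate
values only if they turn out to be awaitable, and have the sync
subclass drive the coroutine with `asyncio.run`. A rough sketch,
using the toy names from above (and glossing over the fact that
`asyncio.run` can't be called from an already-running event loop):

    import asyncio
    import inspect

    class MyAbstractType:
        async def _foo_impl(self):
            # The shared logic, written once, always as a coroutine.
            x = self.bar()
            if inspect.isawaitable(x):  # await only if the subclass is async
                x = await x
            return self.baz(x)

        def baz(self, x):
            return x * 2

    class ConcreteType(MyAbstractType):
        def bar(self):
            return 21

        def foo(self):
            # Sync entry point: run the shared coroutine to completion.
            return asyncio.run(self._foo_impl())

    class AsyncConcreteType(MyAbstractType):
        async def bar(self):
            return 21

        async def foo(self):
            return await self._foo_impl()

    print(ConcreteType().foo())                    # 42
    print(asyncio.run(AsyncConcreteType().foo()))  # 42

I haven't tried this on the real codebase, so I don't know whether it
holds up beyond toy examples -- I'd still love to hear better ideas.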
Re: How to test?
Worth noting: by assertTrue he probably meant assertEqual. But I'd
recommend using assertIn [1] if you're using unittest to check output
written to stdout/stderr. That way, your tests are slightly more
robust to changes in the exact output.

pytest may also be helpful for this (or any!) type of testing.
Disclaimer/warning: pytest can be confusing even for experienced
python programmers because it does some fancy things. But if you put
in the time to learn it, it's very popular because of the way it
structures testsuites and code reuse (i.e. fixtures). It can do a lot
to help you, and provides output capturing out of the box [2] as well
as some handy tools for building temporary testing directories [3].

[1] https://docs.python.org/3.6/library/unittest.html#unittest.TestCase.assertIn
[2] https://docs.pytest.org/en/stable/capture.html
[3] https://docs.pytest.org/en/stable/tmpdir.html

On Fri, Jun 19, 2020 at 2:18 PM Terry Reedy wrote:
> On 6/17/2020 12:34 PM, Tony Flury via Python-list wrote:
>
> > In a recent application that I wrote (where output to the console
> > was important), I tested it using the 'unittest' framework, and by
> > patching sys.stderr to be a StringIO - that way my test case could
> > inspect what was being output.
>
> Tony's code with hard returns added so that code lines remain
> separated instead of wrapping:
>
>     with patch('sys.stderr', StringIO()) as stderr:
>         application.do_stuff()
>         self.assertTrue(stderr.getvalue(), 'Woops - that didn\'t work')
>
> This doc, worth reading more than once, is
> https://docs.python.org/3/library/unittest.mock.html#the-patchers
>
> --
> Terry Jan Reedy
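For reference, here's roughly what that test looks like with assertIn
(`application.do_stuff` and the expected message are hypothetical,
carried over from Tony's example):

    from io import StringIO
    from unittest import TestCase
    from unittest.mock import patch

    import application  # hypothetical module under test

    class TestDoStuff(TestCase):
        def test_do_stuff_warns(self):
            with patch('sys.stderr', StringIO()) as stderr:
                application.do_stuff()
            # assertIn tolerates extra surrounding output; use
            # assertEqual instead if an exact match is intended.
            self.assertIn("Woops - that didn't work", stderr.getvalue())

and the pytest version, using the built-in capsys fixture to capture
stderr:

    def test_do_stuff_warns(capsys):
        application.do_stuff()
        out, err = capsys.readouterr()
        assert "Woops - that didn't work" in err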