Re: Bulletproof json.dump?

2020-07-07 Thread Stephen Rosen
On Mon, Jul 6, 2020 at 6:37 AM Adam Funk  wrote:

> Is there a "bulletproof" version of json.dump somewhere that will
> convert bytes to str, any other iterables to list, etc., so you can
> just get your data into a file & keep working?
>

Is the data only being read by python programs? If so, consider using
pickle: https://docs.python.org/3/library/pickle.html
Unlike json dumping, the goal of pickle is to represent objects as exactly
as possible and *not* to be interoperable with other languages.
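A quick illustration of the difference (a minimal sketch, not from the original message) — pickle round-trips types that JSON cannot express:

```python
import pickle

# Types json.dumps can't represent faithfully: tuples stay tuples, sets stay sets.
data = {"pair": (1, 2), "ids": {3, 4}}
blob = pickle.dumps(data)

restored = pickle.loads(blob)
print(restored == data)  # exact round-trip, but only Python can read `blob`
```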


If you're using json to pass data between python and some other language,
you don't want to silently convert bytes to strings.
If you have a bytestring of utf-8 data, you want to utf-8 decode it before
passing it to json.dumps.
Likewise, if you have latin-1 data, you want to latin-1 decode it.
There is no universal and correct bytes-to-string conversion.
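For example (a sketch with assumed sample data), decoding explicitly before dumping keeps the choice of codec in your hands:

```python
import json

raw = "café".encode("utf-8")  # bytes whose encoding *you* know

# json.dumps(raw) would raise TypeError; decode with the right codec first.
doc = json.dumps({"text": raw.decode("utf-8")})
print(doc)
```

Had the data been latin-1, the only change is the codec name — there is no default that is correct for both.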

On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico  wrote:

> Maybe what we need is to fork out the default JSON encoder into two,
> or have a "strict=True" or "strict=False" flag. In non-strict mode,
> round-tripping is not guaranteed, and various types will be folded to
> each other - mainly, many built-in and stdlib types will be
> represented in strings. In strict mode, compliance with the RFC is
> ensured (so ValueError will be raised on inf/nan), and everything
> should round-trip safely.
>

Wouldn't it be reasonable to represent this as an encoder which is provided
by `json`? e.g.

from json import dumps, UnsafeJSONEncoder
...
dumps(foo, cls=UnsafeJSONEncoder)

Emphasizing the "Unsafe" part of this and introducing people to the idea of
setting an encoder also seems nice.
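Something along these lines (a sketch only — `UnsafeJSONEncoder` is a hypothetical name, not an existing stdlib class):

```python
import json

class UnsafeJSONEncoder(json.JSONEncoder):
    """Hypothetical lossy encoder: folds a few non-JSON types into JSON ones."""

    def default(self, o):
        if isinstance(o, bytes):
            return o.decode("utf-8", errors="replace")
        if isinstance(o, (set, frozenset)):
            return sorted(o)
        return super().default(o)  # still fail loudly for everything else

print(json.dumps({"raw": b"abc", "ids": {2, 1}}, cls=UnsafeJSONEncoder))
```

Users who pass `cls=UnsafeJSONEncoder` have at least said, in their own code, that they accept a lossy conversion.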


On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico  wrote:

> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
>  wrote:
> >

> > The 'json' module already fails to provide round-trip functionality:
> >
> > >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> > ... if json.loads(json.dumps(data)) != data:
> > ... print('oops', data, json.loads(json.dumps(data)))
> > ...
> > oops {True: 1} {'true': 1}
> > oops {1: 2} {'1': 2}
> > oops (1, 2) [1, 2]
>
> There's a fundamental limitation of JSON in that it requires string
> keys, so this is an obvious transformation. I suppose you could call
> that one a bug too, but it's very useful and not too dangerous. (And
> then there's the tuple-to-list transformation, which I think probably
> shouldn't happen, although I don't think that's likely to cause issues
> either.)


Ideally, all of these bits of support for non-JSON types should be opt-in,
not opt-out.
But it's not worth making a breaking change to the stdlib over this.

Especially for new programmers, the notion that
deserialize(serialize(x)) != x
just seems like a recipe for subtle bugs.

You're never guaranteed that the deserialized object will match the
original, but shouldn't one of the goals of a de/serialization library be
to get it as close as is reasonable?


I've seen people do things which boil down to

json.loads(x)["some_id"] == UUID(...)

plenty of times. It's obviously wrong and the fix is easy, but isn't making
the default json encoder less strict just encouraging this type of bug?
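A concrete version of that bug (sample values assumed):

```python
import json
import uuid

uid = uuid.uuid4()
doc = json.dumps({"some_id": str(uid)})  # serialized explicitly as a string

# The buggy comparison: a str never equals a UUID, so this is always False.
print(json.loads(doc)["some_id"] == uid)   # False

# The easy fix: parse the string back into a UUID before comparing.
print(uuid.UUID(json.loads(doc)["some_id"]) == uid)  # True
```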

Comparing JSON data against non-JSON types is part of the same category of
errors: conflating JSON with dictionaries.
It's very easy for people to make this mistake, especially since JSON
object syntax looks so much like Python dict syntax, so I don't think
`json.dumps` should be encouraging it.

On Tue, Jul 7, 2020 at 6:52 AM Adam Funk  wrote:

> Here's another "I'd expect to have to deal with this sort of thing in
> Java" example I just ran into:
>
> >>> r = requests.head(url, allow_redirects=True)
> >>> print(json.dumps(r.headers, indent=2))
> ...
> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
> >>> print(json.dumps(dict(r.headers), indent=2))
> {
>   "Content-Type": "text/html; charset=utf-8",
>   "Server": "openresty",
> ...
> }
>

Why should the JSON encoder know about an arbitrary dict-like type?
It might implement Mapping, but there's no way for json.dumps to know that
in the general case (because not everything which implements Mapping
actually inherits from the Mapping ABC).
Converting it to a type which json.dumps understands is a reasonable
constraint.
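To see why, consider a duck-typed mapping (a made-up class for illustration) that never inherits from the Mapping ABC:

```python
import json

class Headers:
    """Dict-like for lookup and conversion, but not a collections.abc.Mapping."""

    def __init__(self, items):
        self._items = dict(items)

    def __getitem__(self, key):
        return self._items[key]

    def keys(self):
        return self._items.keys()

h = Headers({"Content-Type": "text/html"})

try:
    json.dumps(h)
except TypeError as e:
    print(e)  # Object of type Headers is not JSON serializable

# The explicit conversion works, because dict() uses the keys()/__getitem__
# protocol -- knowledge the caller has and json.dumps does not.
print(json.dumps(dict(h)))
```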

Also, if your object is "case insensitive", wouldn't it be fair to
serialize it as
  { "CONTENT-TYPE": ... } or { "content-type": ... } or ...
?

`r.headers["content-type"]` presumably gets a hit.
`json.loads(json.dumps(dict(r.headers)))["content-type"]` will get a
KeyError.

This seems very much out of scope for the json package because it's not
clear what it's supposed to do with this type.
Libraries should ask users to specify what they mean and not make
potentially harmful assumptions.

Best,
-Stephen
-- 
https://mail.python.org/mailman/listinfo/python-list


How to handle async and inheritance?

2020-06-30 Thread Stephen Rosen
Hi all,

I'm looking at a conflict between code sharing via inheritance and async
usage. I would greatly appreciate any guidance, ideas, or best practices
which might help.

I'll speak here in terms of a toy example, but, if anyone wants to look at
the real code, I'm working on webargs. [1] Specifically, we have a `Parser`
class and an `AsyncParser` subclass, and the two have a lot of code
duplication to handle async/await. [2]


I've got an inheritance structure like this:

class MyAbstractType: ...
class ConcreteType(MyAbstractType): ...
class AsyncConcreteType(MyAbstractType): ...

One of my goals, of course, is to share code between ConcreteType and
AsyncConcreteType via their parent.
But the trouble is that there are functions defined like this:

class MyAbstractType:
    def foo(self):
        x = self.bar()
        y = self.baz(x)
        ...  # some code here, let's say 20 lines

class AsyncConcreteType(MyAbstractType):
    async def foo(self):
        x = await self.bar()
        y = self.baz(x)
        ...  # the same 20 lines as above, but with an `await` added every other line


I'm aware that I'm looking at "function color" and that my scenario is
pitting two language features -- inheritance and async -- against one
another. But I don't see a clean way out if we want to support an
"async-aware" version of a class with synchronous methods.

What I tried already, which I couldn't get to work, was to either fiddle
with things like `inspect` to see if the current function is async or to
use a class variable to indicate that the current class is the async
version. The idea was to write something like

class MyAbstractType:
    _use_async_calls = False

    def foo(self):
        x = self._await_if_i_am_async(self.bar)
        y = self.baz(x)
        ...

and that way, the async subclass just needs to change signatures to be
async with little stubs and set the flag,

class AsyncConcreteType(MyAbstractType):
    _use_async_calls = True

    async def foo(self):
        return super().foo()

    async def bar(self):
        return super().bar()

but this (some of you are ahead of me on this, I'm sure!) did not work at
all. I couldn't find any way to write `_await_if_i_am_async`, other than
possibly doing some weird things with `exec`.
Once I concluded that Python wouldn't let me decide whether or not to
use await at runtime, at least with tools of which I'm aware, I basically
gave up on that route.


However, it seems like there should be some clever technique for defining
`MyAbstractType.foo` such that it awaits on certain calls *if* there's some
indication that it should do so. It's obviously possible with `exec`, but I
don't want to convert all of the core codepaths into giant `exec` blocks.
Perhaps there's a way which is safer and more maintainable though?
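For what it's worth, one pattern that avoids `exec` (a sketch using the toy names above, not a claim about what webargs should do) is to write the shared logic once in async form and let the sync subclass drive the coroutine by hand — safe because a coroutine containing no real awaits completes on its first `send(None)`:

```python
import asyncio
import inspect

class MyAbstractType:
    # Shared logic written once, in async form; `bar` may be sync or async.
    async def foo(self):
        x = self.bar()
        if inspect.isawaitable(x):
            x = await x
        return self.baz(x)

    def baz(self, x):
        return x * 2

class ConcreteType(MyAbstractType):
    def bar(self):
        return 21

    def foo(self):
        # No event loop needed: with only sync calls inside, the
        # coroutine finishes on its first send(None).
        coro = MyAbstractType.foo(self)
        try:
            coro.send(None)
        except StopIteration as stop:
            return stop.value
        raise RuntimeError("foo awaited a real async operation")

class AsyncConcreteType(MyAbstractType):
    async def bar(self):
        return 21

print(ConcreteType().foo())                    # 42
print(asyncio.run(AsyncConcreteType().foo()))  # 42
```

The 20 lines of shared code then live in one place, at the cost of the sync path promising it never hits a genuine await.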


If anyone has experience in this space and can offer up a good solution, I
would love to hear about it.
And if someone wants to go above-and-beyond and look at webargs, and
suggest a better way for us to support aiohttp, I'd obviously welcome that
kind of help as well!

Thanks in advance, and best regards,
-Stephen


[1] https://github.com/marshmallow-code/webargs
[2]
https://github.com/marshmallow-code/webargs/blob/6668d267fa4135cf3f653e422bd168298f2213a8/src/webargs/asyncparser.py#L24


Re: How to test?

2020-06-20 Thread Stephen Rosen
Worth noting: by assertTrue he probably meant assertEqual.
But I'd recommend using assertIn [1] if you're using unittest to check
output written to stdout/stderr.
That way, your tests are slightly more robust to changes in the exact
output.
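For instance (a sketch with a made-up function under test), using `contextlib.redirect_stdout` to capture output:

```python
import io
import unittest
from contextlib import redirect_stdout

def greet():  # stand-in for the code under test
    print("hello, world")

class TestGreet(unittest.TestCase):
    def test_greet_output(self):
        buf = io.StringIO()
        with redirect_stdout(buf):
            greet()
        # assertIn tolerates extra surrounding output; assertEqual would not.
        self.assertIn("hello", buf.getvalue())

# Run the suite programmatically (an email-friendly alternative to unittest.main()).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGreet)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```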


pytest may also be helpful for this (or any!) type of testing.
Disclaimer/warning: pytest can be confusing even for experienced Python
programmers because it does some fancy things.
But if you put in the time to learn it, it becomes clear why it's so
popular: the way it structures test suites and handles code reuse (i.e.
fixtures).

It can do a lot to help you, and provides output capturing out of the box
[2] as well as some handy tools for building temporary testing directories
[3].


[1]
https://docs.python.org/3.6/library/unittest.html#unittest.TestCase.assertIn
[2] https://docs.pytest.org/en/stable/capture.html
[3] https://docs.pytest.org/en/stable/tmpdir.html

On Fri, Jun 19, 2020 at 2:18 PM Terry Reedy  wrote:

> On 6/17/2020 12:34 PM, Tony Flury via Python-list wrote:
>
> > In a recent application that I wrote (where output to the console was
> > important), I tested it using the 'unittest' framework, and by patching
> > sys.stderr to be a StringIO - that way my test case could inspect what
> > was being output.
>
> Tony's code with hard returns added so that code lines remain separated
> instead of wrapping.
>
> with patch('sys.stderr', StringIO()) as stderr:
>     application.do_stuff()
>     self.assertTrue(stderr.getvalue(), 'Woops - that didn\'t work')
> This doc, worth reading more than once, is
> https://docs.python.org/3/library/unittest.mock.html#the-patchers
>
> --
> Terry Jan Reedy
>