[Python-ideas] Re: adding support for a "raw output" in JSON serializer

Andrew Barnert via Python-ideas Thu, 08 Aug 2019 09:36:03 -0700

On Aug 8, 2019, at 03:22, Richard Musil <risa20...@gmail.com> wrote:
> 
> What matters is that I did not find a way how to fix it with the standard 
> `json` module. I have the JSON file generated by another program (C++ code, 
> which uses nlohmann/json library), which serializes one of the floats to the 
> value above.

...

> If anyone would want to know, why the last digit matters (or why I cannot
> double quote the floats), it is because the file has a secure hash attached
> and this basically breaks it.

If you need to exactly match a JSON file byte for byte, you really shouldn’t
rely on parsing it and re-creating it in the first place, and especially not
with two different libraries.

One problem is that your C++ library is apparently using a different rounding
mode in representing floats than Python’s default round-to-even. But different
libraries also have different rules for when they switch to exponential
numbers, and how they represent that. And a C++ library may well represent
64-bit integers above 1<<53 imprecisely, while Python won’t. And, beyond
numbers, different libraries produce different white space, different ordering
within dicts, and different escaped representations of strings (not to mention
how they handle things like “\uDEAD”, which the spec says is legal but doesn’t
tell you how to interpret, because it doesn’t map to any Unicode character).
There’s no way to guarantee that dumps(loads(x)) == x, even if you use Decimal
instead of float.
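
To make that concrete, here is a small sketch using only the standard `json`
module. The helper name `survives_round_trip` and the sample strings are made
up for illustration; they are not taken from the file in question. Each sample
is valid JSON, yet parsing and re-serializing it with the default settings
yields different bytes.

```python
import json

def survives_round_trip(text):
    """Return True if json.dumps(json.loads(text)) reproduces text exactly."""
    return json.dumps(json.loads(text)) == text

# All of these are valid JSON; the comments show what dumps() emits instead.
samples = [
    '{"a":1}',         # no space after ':'  -> '{"a": 1}'
    '{"x": 1.0e2}',    # exponent notation   -> '{"x": 100.0}'
    '{"s": "a\\/b"}',  # optional \/ escape  -> '{"s": "a/b"}'
]
for text in samples:
    print(text, survives_round_trip(text))

# Python keeps large integers exact. A parser that stores every number as a
# 64-bit double cannot, which float() imitates here.
big = 2**56 + 1
print(survives_round_trip(str(big)))  # the integer text comes back unchanged
print(int(float(big)) == big)         # the double has lost the low bit
```

None of this is a bug in `json`; the module is simply free to pick any valid
spelling of the same value.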

And this isn’t really a limitation of either of the libraries you’re using,
it’s the way JSON is supposed to work, by design. Even if both libraries follow
all of the interoperability recommendations in the RFC, they’re still not
expected to produce the same bytes for the same input.

Usually you just shouldn’t be hashing JSON files. But sometimes you have to, to
fit into a poorly-designed ecosystem that you can’t change. In that case, if
your goal is to write a program that sometimes makes a substantive change (in
which case you want to re-sign the package, or tell the client there’s an
update to download, etc.), but usually doesn’t, and you want it to leave the
file byte-for-byte unchanged (so you don’t need to re-sign, re-download, etc.),
the best thing to do is check that the dict is unchanged and, if so, not write
the file at all, or write back the original un-parsed string.
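
A minimal sketch of that approach, assuming the whole file fits in memory and
that your change can be expressed as a function over the parsed data. The names
`update_json_file` and `modify` are hypothetical, not part of any library.

```python
import json
from pathlib import Path

def update_json_file(path, modify):
    """Rewrite the file only if modify() really changes the parsed data.

    Returns True if the file was rewritten, False if it was left alone.
    """
    original_text = Path(path).read_text(encoding="utf-8")
    before = json.loads(original_text)
    # Parse a second time so modify() can mutate its argument freely
    # without disturbing the reference copy we compare against.
    after = modify(json.loads(original_text))
    if after == before:
        # Nothing substantive changed: do not touch the file, so the
        # original bytes (and the hash over them) stay valid.
        return False
    # A real change: the hash has to be regenerated anyway.
    Path(path).write_text(json.dumps(after), encoding="utf-8")
    return True
```

One caveat: Python's `==` treats `1` and `1.0` (and `True` and `1`) as equal,
so if that distinction counts as a substantive change for you, the comparison
needs to be stricter than this.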
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2QIZZEBBIJDSWDUSCIYEEWC5TIGXYBXX/
Code of Conduct: http://python.org/psf/codeofconduct/
