On Aug 8, 2019, at 03:22, Richard Musil <risa20...@gmail.com> wrote:
> 
> What matters is that I did not find a way how to fix it with the standard 
> `json` module. I have the JSON file generated by another program (C++ code, 
> which uses nlohmann/json library), which serializes one of the floats to the 
> value above. 

...

> If anyone would want to know, why the last digit matters (or why I cannot 
> double quote the floats), it is because the file has a secure hash attached 
> and this basically breaks it.

If you need to exactly match a JSON file byte for byte, you really shouldn’t 
rely on parsing it and re-creating it in the first place, and especially not 
with two different libraries.

One problem is that your C++ library is apparently using a different rounding 
mode in representing floats than Python’s default round-to-even. But different 
libraries also have different rules for when they switch to exponential 
numbers, and how they represent that. And a C++ library may well represent 
64-bit integers above 1<<53 imprecisely, while Python won’t. And, beyond 
numbers, different libraries produce different white space, different ordering 
within dicts, and different escaped representations of strings (not to mention 
how they handle things like “\uDEAD”, which the spec says is legal but doesn’t 
tell you how to interpret, because it doesn’t map to any Unicode character). 
There’s no way to guarantee that dumps(loads(x)) == x, even if you use Decimal 
instead of float.
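
A quick way to see this with nothing but the standard `json` module is the 
sketch below. The helper name `round_trips` and the sample strings are made up 
for illustration; they are not taken from your file.

```python
import json


def round_trips(text):
    """Return True if parsing and re-serializing gives back the same string."""
    return json.dumps(json.loads(text)) == text


# Each sample is valid JSON, but written differently from how json.dumps
# writes the same value with its default settings.
samples = [
    '{"a":1}',   # no space after ':'; dumps emits '{"a": 1}'
    '[1E2]',     # exponent form; parsed as a float and emitted as '[100.0]'
    '["\\/"]',   # escaped slash; dumps emits an unescaped '/'
    '["é"]',     # raw non-ASCII; dumps escapes it by default (ensure_ascii=True)
]

for text in samples:
    print(repr(text), "->", repr(json.dumps(json.loads(text))), round_trips(text))
```

Every sample prints False in the last column, even though none of the values 
changed in any way a JSON consumer would care about.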

And this isn’t really a limitation of either of the libraries you’re using, 
it’s the way JSON is supposed to work, by design. Even if both libraries follow 
all of the interoperability recommendations in the RFC, they’re still not 
expected to produce the same bytes for the same input.

Usually you just shouldn’t be hashing JSON files. But sometimes you have to, to 
fit into a poorly-designed ecosystem that you can’t change. In that case, if 
your goal is to write a program that sometimes makes a substantive change (in 
which case you want to re-sign the package, or tell the client there’s an 
update to download, etc.), but usually doesn’t, and you want it to leave the 
file byte-for-byte unchanged (so you don’t need to re-sign, re-download, etc.), 
the best thing to do is check that the dict is unchanged and, if so, not write 
the file at all, or write back the original un-parsed string.
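
A minimal sketch of that approach, assuming the whole file fits in memory and 
that your change can be expressed as a function taking the parsed data and 
returning the new data. The names `update_json_file` and `modify` are 
hypothetical, not part of any library.

```python
import copy
import json
from pathlib import Path


def update_json_file(path, modify):
    """Apply modify() to the parsed JSON; rewrite the file only on a real change.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    file_path = Path(path)
    original = json.loads(file_path.read_text(encoding="utf-8"))

    # Work on a deep copy so the comparison below still sees the parsed original.
    updated = modify(copy.deepcopy(original))

    if updated == original:
        # Nothing substantive changed: never touch the file, so its bytes
        # (and therefore its hash) stay exactly as the other program wrote them.
        return False

    # A real change: the bytes will differ anyway, so re-serialize and re-sign.
    file_path.write_text(json.dumps(updated), encoding="utf-8")
    return True
```

Note that the comparison uses Python equality, so replacing 1 with 1.0 counts 
as "unchanged", and a file containing NaN always counts as changed, because 
NaN never compares equal to itself. If either case matters to you, you would 
need a stricter comparison.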
