I had originally planned to post the proposal on bpo, but things turned
out unexpectedly (for me), so I am returning here.

I wrote the patch (for the Python part). If anyone is interested it is here:
https://github.com/python/cpython/compare/master...risa2000:json_patch

The patch follows the original idea of serializing the custom type to a
JSON number, and I believe it is "as simple as it gets", except for the
JSON number validity check, which turned out to be problematic.
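For reference, this is what the stock stdlib encoder does with a Decimal today, without any patch; the usual `default=` workarounds are lossy, which is what motivates serializing the value as a real JSON number:

```python
import json
from decimal import Decimal

d = Decimal('1.000000000000000001')

# The stdlib encoder rejects Decimal outright.
try:
    json.dumps(d)
except TypeError:
    pass  # "Object of type Decimal is not JSON serializable"

# The common workarounds via default= are both lossy:
# default=float rounds away the precision,
# default=str emits a JSON string instead of a number.
print(json.dumps(d, default=float))  # 1.0 -- precision lost
print(json.dumps(d, default=str))    # "1.000000000000000001" -- a string, not a number
```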

I ran some timeit benchmarks on my code and compared it to simplejson. The
tests I ran were:

(simplejson)
py -m timeit -s "import simplejson as sjson; from decimal import Decimal; d=[Decimal('1.000000000000000001')]*10000" "sjson.dumps(d)"

(my code)
py -m timeit -s "import json; from decimal import Decimal; d=[Decimal('1.000000000000000001')]*10000" "json.dumps(d, dump_as_number=Decimal)"

Since my code runs in pure Python only, I disabled the C extension in
simplejson too. Here are the results:
simplejson - with C code: 50 loops, best of 5: 5.89 msec per loop
simplejson - pure Python: 20 loops, best of 5: 10.5 msec per loop
json_patch (regex check): 10 loops, best of 5: 21.3 msec per loop
json_patch (float check): 20 loops, best of 5: 15.1 msec per loop
json_patch (no check): 50 loops, best of 5: 9.75 msec per loop

The different "checks" refer to different _check_json_num implementations
(included in the code). The "float check" is just an example of something
readily available (and possibly faster), but there are inputs which
float() accepts that are not valid JSON numbers.
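To illustrate the two variants, here is a rough sketch (the function names are mine; the actual implementations are in the patch). It also shows where the float check over-accepts:

```python
import re

# Number grammar from RFC 8259, section 6: int, optional frac, optional exp.
_JSON_NUMBER_RE = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?')

def check_json_num_regex(s):
    # Strict: accept exactly the JSON number grammar.
    return _JSON_NUMBER_RE.fullmatch(s) is not None

def check_json_num_float(s):
    # Looser and cheaper: accept anything float() parses.
    # Over-accepts: 'inf', 'nan', '+1', '.5', '1.' and '1_0' all pass
    # float() but are not valid JSON numbers.
    try:
        float(s)
        return True
    except ValueError:
        return False
```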

The JSON validity check turned out to be the cause of the performance hit.
simplejson does not do any validity check on the Decimal output, so it is
on par in performance with "no check" (I guess it is a tad slower because
it implements and handles more features in its encoder loop).

I previously argued with Paul that assuming an object's output is valid
based on its type is not safe (which I still hold), but making it safe in
this particular case incurs a performance hit I cannot accept. To put it
differently: if I had to choose between stdlib json and simplejson,
knowing that the stdlib version runs 50-100% slower (but safe), I would
choose simplejson.

From the previous discussion here I also understood that letting the custom
type serialize without the validity check is unacceptable to some. Since I
am basically indifferent on this matter, I will not argue about it either.

Which leaves me with only one possible outcome (which seems to be
acceptable): porting the Decimal handling from simplejson to the stdlib.
Apart from the fact that simplejson already has it (so if I need it, I
could just use simplejson), the other consideration is that whoever pulled
the simplejson code into the stdlib either made a deliberate effort to
remove this particular functionality (if it was present at the time) or
never considered it worth adding (once it appeared in simplejson).

The second point is that, looking at the code in the stdlib and in
simplejson, it is clear that simplejson has more features (and also seems
to be more actively maintained) than the stdlib code, so importing one
particular feature into the stdlib just to make it "less inferior",
without any additional benefit, seems like a waste of time.

Why simplejson has remained separate from CPython proper is also a
question (I guess there was/is a reason), because including the code
completely and maintaining it inside CPython seems like it could be a
better use of resources.

Richard
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PRRP2EVQQUA3KA45343TVALGD7DUNOIG/
Code of Conduct: http://python.org/psf/codeofconduct/