On Tue, Sep 15, 2020 at 7:30 PM Christopher Barker <python...@gmail.com> wrote:
> On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.tur...@gmail.com> wrote:
>
>> json.load and json.dump already default to UTF8 and already have
>> parameters for json loading and dumping.
>
> yes, of course.
>
>> json.loads and json.dumps exist only because there was no way to
>> distinguish between a string containing JSON and a file path string.
>> (They probably should've been .loadstr and .dumpstr, but it's too late
>> for that now)
>
> I think they exist because that was the pickle API from years ago --
> though maybe that's why the pickle API had them. Though I think you have it
> a bit backwards -- you can't pass a path into loads/dumps for that reason.
> If they were created because that distinction couldn't be made, then
> load/dump would have accepted a string path back in the day.
>
>> TBH, I think it would be great to just have .load and .dump read the file
>> with standard params when a path-like ( hasattr(obj, '__path__') ) is
>> passed, but the suggested disadvantages of this are:
>>
>> - https://docs.python.org/3/library/functions.html#open
>>
>> > The default encoding is platform dependent (whatever
>> > locale.getpreferredencoding() returns), but any text encoding supported by
>> > Python can be used. See the codecs module for the list of supported
>> > encodings.
>
> that's not a reason at all -- the reason is that some folks think
> overloading a function like this is bad API design. And it's been the way
> it's been for a long time, so probably better to add a new function(s),
> rather than extend the API of an existing one.
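
For reference, here is a minimal sketch of the status quo being discussed, where the caller handles the file. It passes encoding="utf-8" explicitly, because open() otherwise falls back to the platform-dependent default quoted above. The file name and sample object are made up for illustration.

```python
import json
import os
import tempfile

# Sample data with a non-ASCII character, so the encoding actually matters.
obj = {"greeting": "héllo"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")  # illustrative file name

    # Text mode with an explicit encoding: json.dump writes str chunks and
    # the file object encodes them, independent of the locale default.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(obj, fp, ensure_ascii=False)

    # Same on the way back in: the file object decodes, json.load parses.
    with open(path, "r", encoding="utf-8") as fp:
        loaded = json.load(fp)
```

The context managers close the file in both directions, which is the bookkeeping a path-accepting function would take over.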
.load - reads a file object
.loadf - reads a file object that it opens for you from a str path or an
  object with an obj.__path__
.loads - reads from a string-like object

or

.load - reads a file object or creates a file object from a path or an
  obj.__path__ and closes it after reading
.loads - reads from a

For backwards-compatibility (without a check for `sys.version_info[:2]` or
`hasattr(json, 'loadf')`), handling the file (e.g. using a context manager)
will still be the way it's done.

>> - .load and .dump don't default to UTF8?
>>   AFAIU, they do default to UTF-8. Do they instead currently default to
>>   locale.getpreferredencoding() instead of the JSON spec(s)
>>   * encoding= was removed from .loads and was never accepted by json.load
>>     or json.dump
>
> I think dump defaults to UTF-8. But load is a bit odd (and not that well
> documented).
>
> it appears to accept a file_like object that returns either a string or a
> byte object from its read() method. If strings, then the decoding is done.
> if bytes, then I assume that it's using utf-8.
>
> This, by the way, should be better documented.

I agree: https://github.com/python/cpython/blob/master/Lib/json/__init__.py

>> - .load and .dump would also need to accept an encoding= parameter for
>>   non-spec data that don't want to continue handling the file themselves
>>   - pickle.load has an encoding= parameter
>
> .loads doesn't now, so I don't see why they would need to with the
> proposed change. You can always encode/decode ahead of time however you
> want, either in the file-like object or by passing decoded str to
> .loads/dumps.

pickle.loads does accept an encoding= parameter; and that's the API we were
matching. Handling the file object will continue to be the
backwards-compatible way to do it.

>> - Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
>
> no, I think that's clear.
> in fact, you can't currently dump to a binary file:
>
> In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-26-02e9bcd47a3e> in <module>
> ----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))
>
> ~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp,
> skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators,
> default, sort_keys, **kw)
>     178         # a debuggability cost
>     179         for chunk in iterable:
> --> 180             fp.write(chunk)
>     181
>     182
>
> TypeError: a bytes-like object is required, not 'str'
>
> That's the beauty of Python 3's text model :-)
>
>> JSON Specs:
>> - https://tools.ietf.org/html/rfc7159#section-8.1 :
>>
>> > JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default
>> > encoding is UTF-8,
>
> So THAT is interesting. But the current implementation does not directly
> support anything but UTF-8, and I think it's fine that that still be the
> case. If anyone is using the other two, it's an esoteric case, and they can
> encode/decode by hand.

The Python JSON implementation should support the full JSON spec (including
UTF-8, UTF-16, and UTF-32) and should default to UTF-8.

> So, could we just have .load and .dump accept a path-like and an
> encoding= parameter (because they need to be able to specify UTF-8 / UTF-16
> / UTF-32 anyway)?
>
> These are separate questions, but I'll say:
>
> Yes, it could take a path-like. But I think there was not much support for
> that in this discussion.

A path str or a path-like. Is there any reason not to also support a
path-like object with this API, too?

> No -- there is no need for encoding parameter -- the other two options are
> rare and can be done by hand.

There is a need for an encoding parameter in order to support the full JSON
spec.
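
To make "done by hand" concrete, here is a small sketch assuming Python 3.6 or later. From that version on, json.loads accepts bytes and detects UTF-8, UTF-16, or UTF-32 on input, so json.load on a binary file behaves the same way. On output, json.dumps returns a str and the caller picks the encoding. The file name and sample data are illustrative only.

```python
import json
import os
import tempfile

obj = {"name": "café", "n": 1}

# json.dumps returns a str, so the UTF-16 encoding step is done by hand.
payload = json.dumps(obj, ensure_ascii=False).encode("utf-16")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny-utf16.json")  # illustrative file name

    # Writing bytes to a binary file works; json.dump() to it would not.
    with open(path, "wb") as fp:
        fp.write(payload)

    # Binary read: json.load passes fp.read() (bytes) to json.loads,
    # which detects the Unicode encoding before parsing (Python 3.6+).
    with open(path, "rb") as fp:
        from_binary = json.load(fp)

    # Text read: the file object decodes, json.load only sees str.
    with open(path, "r", encoding="utf-16") as fp:
        from_text = json.load(fp)
```

Both reads give back the original object. The asymmetry is on the writing side, where the encoding has to be chosen either by the file object or by an explicit encode() call.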
Whether creating a new .loadf or just extending .load is the solution, the
method should accept an encoding parameter.

> BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding.
> A user can encode it any way they want when passing it along.
>
> This, in fact, is all very Python3 text model compatible -- the
> encoding/decoding should happen as close to IO as possible.

Is there precedent for handling the file for the user in any other stdlib
functions? Extending the pickle and marshal APIs should also occur with this
PR if accepted.

> If there were no backward compatibility options, and it were me, I would
> only use strings in/out of the json module, but I think that ship has
> sailed.

The obj.__json__ protocol discussions discussed various ways to implement
customizable serialization of object graphs containing complex types to
JSON/JSON5 and/or JSON-LD (which BTW supports complex types like complex
fractions)

> Anyway -- if anyone wants to push for overloading .load()/dump(), rather
> than making two new loadf() and dumpf() functions, then speak now -- that
> will take more discussion, and maybe a PEP.

I don't see why one or the other would need a PEP so long as the new
functionality is backward-compatible?

> -CHB
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
> - Teaching
> - Scientific Software Development
> - Desktop GUI and Web Development
> - wxPython, numpy, scipy, Cython
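
As a rough illustration of the two-new-functions option, here is a sketch of what loadf() and dumpf() might look like. These functions do not exist in the json module; the names come from the thread, while the signatures and the encoding="utf-8" default are assumptions made for this example. The sketch calls os.fspath(), since path-like objects implement os.PathLike through `__fspath__`, whereas `__path__` is an attribute of packages.

```python
import json
import os
import tempfile
from pathlib import Path


def loadf(path, *, encoding="utf-8", **kwargs):
    """Hypothetical helper: open `path`, parse its JSON, close the file."""
    # os.fspath() accepts str, bytes, and os.PathLike objects, and raises
    # TypeError for anything else (for example an open file object).
    with open(os.fspath(path), "r", encoding=encoding) as fp:
        return json.load(fp, **kwargs)


def dumpf(obj, path, *, encoding="utf-8", **kwargs):
    """Hypothetical helper: serialize `obj` as JSON into the file at `path`."""
    with open(os.fspath(path), "w", encoding=encoding) as fp:
        json.dump(obj, fp, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "tiny.json"  # illustrative file name

    # Path-like argument, default UTF-8.
    dumpf({"a": [1, 2, 3]}, target)
    roundtrip_utf8 = loadf(target)

    # Plain str path, explicit non-default encoding.
    dumpf({"a": [1, 2, 3]}, str(target), encoding="utf-16")
    roundtrip_utf16 = loadf(str(target), encoding="utf-16")
```

Keeping these separate from load()/dump() leaves the existing functions untouched, which is the backward-compatibility argument made above. Overloading load() would instead need a type check to tell paths from file objects.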
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4ZT2VBI242ULHXPPVUMOUMB7Z5DVC6EM/
Code of Conduct: http://python.org/psf/codeofconduct/