On Tue, Sep 15, 2020 at 7:30 PM Christopher Barker <python...@gmail.com>
wrote:

> On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.tur...@gmail.com> wrote:
>
>> json.load and json.dump already default to UTF8 and already have
>> parameters for json loading and dumping.
>>
>
> yes, of course.
>
>> json.loads and json.dumps exist only because there was no way to
>> distinguish between a string containing JSON and a file path string.
>> (They probably should've been .loadstr and .dumpstr, but it's too late
>> for that now)
>>
>
> I think they exist because that was the pickle API from years ago --
> though maybe that's why the pickle API had them. Though I think you have it
> a bit backwards -- you can't pass a path into loads/dumps for that reason.
> If they were created because that distinction couldn't be made, then
> load/dump would have accepted a string path back in the day.
>
>> TBH, I think it would be great to just have .load and .dump read the file
>> with standard params when a path-like (hasattr(obj, '__fspath__')) is
>> passed, but the suggested disadvantages of this are:
>>
>> - https://docs.python.org/3/library/functions.html#open
>>
>>   > The default encoding is platform dependent (whatever
>> locale.getpreferredencoding() returns), but any text encoding supported by
>> Python can be used. See the codecs module for the list of supported
>> encodings.
>>
>
> that's not a reason at all -- the reason is that some folks think
> overloading a function like this is bad API design. And it's been the way
> it's been for a long time, so probably better to add a new function(s),
> rather than extend the API of an existing one.
>

.load - reads a file object
.loadf - reads a file object that it opens for you from a str path or from
an object with an obj.__fspath__() method (an os.PathLike)
.loads - reads from a string-like object
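A minimal sketch of the first option, purely for discussion: loadf and dumpf are the hypothetical names from this thread (nothing like them exists in the json module today), and the encoding='utf-8' default is my assumption:

```python
import json
import os


def loadf(path, *, encoding="utf-8", **kw):
    """Hypothetical: open the file at *path* (a str or os.PathLike),
    parse it as JSON, and close it again."""
    # open() already accepts both str paths and os.PathLike objects.
    with open(path, "r", encoding=encoding) as fp:
        return json.load(fp, **kw)


def dumpf(obj, path, *, encoding="utf-8", **kw):
    """Hypothetical: serialize *obj* as JSON into the file at *path*."""
    with open(path, "w", encoding=encoding) as fp:
        json.dump(obj, fp, **kw)
```

Because both helpers just delegate to open(), str paths and pathlib.Path objects behave the same, and any other keyword arguments are passed straight through to json.load()/json.dump().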

or

.load - reads a file object, or creates a file object from a str path or an
obj.__fspath__() path-like and closes it after reading
.loads - reads from a string-like object
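And a rough sketch of the second option, again hypothetical (this is not how json.load behaves today), written as a standalone function so it can be tried without patching the stdlib:

```python
import json
import os


def load(fp_or_path, *, encoding="utf-8", **kw):
    """Hypothetical overload of json.load().

    str and os.PathLike arguments are opened (and closed) here; anything
    else is assumed to be a file-like object with a .read() method, which
    is the only thing json.load() accepts today.
    """
    # isinstance(x, os.PathLike) is true for any object defining __fspath__.
    if isinstance(fp_or_path, (str, os.PathLike)):
        with open(fp_or_path, "r", encoding=encoding) as fp:
            return json.load(fp, **kw)
    # File-like objects: encoding= is ignored; the file already decides it.
    return json.load(fp_or_path, **kw)
```

Since json.load() has never accepted a str (it calls .read() on its argument), the path branch can't change the meaning of any call that works today.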

For backwards compatibility (i.e. without a check for `sys.version_info[:2]`
or `hasattr(json, 'loadf')`), handling the file yourself (e.g. with a context
manager) will still be the way it's done.
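What the backwards-compatible version looks like, with feature detection via hasattr() rather than a version check; read_json and load_path are just illustrative names, and json.loadf is the hypothetical function proposed here:

```python
import json


def read_json(path):
    """Works on any Python 3: open the file yourself, with an explicit
    encoding, and let the context manager close it."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)


# Feature detection instead of a sys.version_info check. json.loadf is
# only a proposal, so today this always falls back to read_json.
if hasattr(json, "loadf"):
    load_path = json.loadf
else:
    load_path = read_json
```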


>
>
>> - .load and .dump don't default to UTF-8?
>>   AFAIU, they do default to UTF-8. Or do they currently default to
>> locale.getpreferredencoding() rather than the JSON spec(s)?
>>   encoding= was removed from .loads and was never accepted by json.load
>> or json.dump
>>
>
> I think dump defaults to UTF-8. But load is a bit odd (and not that well
> documented).
>
> It appears to accept a file-like object whose read() method returns either a
> str or a bytes object. If str, then the decoding has already been done.
> If bytes, then I assume that it's using UTF-8.
>
> This, by the way, should be better documented.
>

I agree: https://github.com/python/cpython/blob/master/Lib/json/__init__.py
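A quick check of the bytes case with today's json module, using io.BytesIO in place of a file opened in "rb" mode. Per the docs, json.loads() has accepted bytes since Python 3.6 and detects UTF-8, UTF-16 or UTF-32 rather than assuming UTF-8:

```python
import io
import json

doc = '{"name": "café", "tags": ["a", "b"]}'

# json.load(fp) calls fp.read() and passes the result to json.loads().
# When that result is bytes, json.loads() detects UTF-8, UTF-16 or
# UTF-32 before decoding.
results = {}
for codec in ("utf-8", "utf-16", "utf-32"):
    binary_file = io.BytesIO(doc.encode(codec))  # stands in for open(path, "rb")
    results[codec] = json.load(binary_file)
```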


>
>
>> - .load and .dump would also need to accept an encoding= parameter for
>> non-spec data, for users who don't want to keep handling the file themselves
>>   - pickle.load has an encoding= parameter
>>
>
> .loads doesn't now, so I don't see why they would need to with the
> proposed change. You can always encode/decode ahead of time however you
> want, either in the file-like object or by passing decoded str to
> .loads/dumps.
>

pickle.loads does accept an encoding= parameter, and that's the API we
were matching.

Handling the file object will continue to be the backwards-compatible way
to do it.
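For comparison, the encode/decode-it-yourself approach with the current API looks roughly like this (UTF-16 is just an example codec):

```python
import json

obj = {"temperature": 21.5, "unit": "°C"}

# Writing: json.dumps() returns str, so the caller picks the byte encoding.
payload = json.dumps(obj, ensure_ascii=False).encode("utf-16")

# Reading: decode the bytes to str first, then parse the str.
restored = json.loads(payload.decode("utf-16"))
```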


>
>
>> - Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
>>
>
> No, I think that's clear. In fact, you can't currently dump to a binary
> file:
>
> In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))
>
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-26-02e9bcd47a3e> in <module>
> ----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))
>
> ~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp,
> skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators,
> default, sort_keys, **kw)
>     178     # a debuggability cost
>     179     for chunk in iterable:
> --> 180         fp.write(chunk)
>     181
>     182
>
> TypeError: a bytes-like object is required, not 'str'
>
> That's the beauty of Python 3's text model :-)
>
> JSON Specs:
>> - https://tools.ietf.org/html/rfc7159#section-8.1  :
>>
>>   > JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>>    encoding is UTF-8,
>>
>
> So THAT is interesting. But the current implementation does not directly
> support anything but UTF-8, and I think it's fine that that still be the
> case. If anyone is using the other two, it's an esoteric case, and they can
> encode/decode by hand.
>

The Python JSON implementation should support the full JSON spec (including
UTF-8, UTF-16, and UTF-32) and should default to UTF-8.
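For what it's worth, the current API can already write and read the other RFC 7159 encodings when the caller handles the file; json.dump() just needs a text-mode file (hence the TypeError with 'wb' above). A sketch of two ways to do it, using a temporary directory so nothing is left behind:

```python
import json
import os
import tempfile

obj = {"greeting": "hello", "n": 3}

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: text mode plus an explicit codec; json.dump() writes str.
    path16 = os.path.join(tmp, "data-utf16.json")
    with open(path16, "w", encoding="utf-16") as fp:
        json.dump(obj, fp)
    with open(path16, "r", encoding="utf-16") as fp:
        roundtrip16 = json.load(fp)

    # Option 2: binary mode, encoding the str from json.dumps() yourself,
    # which avoids the "a bytes-like object is required" TypeError.
    path32 = os.path.join(tmp, "data-utf32.json")
    with open(path32, "wb") as fp:
        fp.write(json.dumps(obj).encode("utf-32"))
    with open(path32, "r", encoding="utf-32") as fp:
        roundtrip32 = json.load(fp)
```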


>
>> So, could we just have .load and .dump accept a path-like and an
>> encoding= parameter (because they need to be able to specify UTF-8 / UTF-16
>> / UTF-32 anyway)?
>
> These are separate questions, but I'll say:
>
> Yes, it could take a path-like. But I think there was not much support for
> that in this discussion.
>

Either a str path or a path-like object. Is there any reason not to support
path-like objects in this API, too?


>
> No -- there is no need for encoding parameter -- the other two options are
> rare and can be done by hand.
>

There is a need for an encoding parameter in order to support the full JSON
spec. Whether creating a new .loadf or just extending .load is the
solution, the method should accept an encoding parameter.


>
> BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding.
> A user can encode it any way they want when passing it along.
>
> This, in fact, is all very Python3 text model compatible -- the
> encoding/decoding should happen as close to IO as possible.
>

Is there precedent for handling the file for the user in any other stdlib
functions?

If accepted, this PR should also extend the pickle and marshal APIs in the
same way.


>
> If there were no backward compatibility options, and it were me, I would
> only use strings in/out of the json module, but I think that ship has
> sailed.
>

The obj.__json__ protocol threads covered various ways to implement
customizable serialization of object graphs containing complex types to
JSON/JSON5 and/or JSON-LD (which, BTW, supports complex types like complex
fractions).


>
> Anyway -- if anyone wants to push for overloading .load()/dump(), rather
> than making two new loadf() and dumpf() functions, then speak now -- that
> will take more discussion, and maybe a PEP.
>

Why would either approach need a PEP, so long as the new functionality is
backward-compatible?


>
> -CHB
>
>
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4ZT2VBI242ULHXPPVUMOUMB7Z5DVC6EM/
Code of Conduct: http://python.org/psf/codeofconduct/
