Re: Bulletproof json.dump?

2020-07-09 Thread Adam Funk
On 2020-07-07, Stephen Rosen wrote:

> On Mon, Jul 6, 2020 at 6:37 AM Adam Funk  wrote:
>
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>>
>
> Is the data only being read by python programs? If so, consider using
> pickle: https://docs.python.org/3/library/pickle.html
> Unlike json dumping, the goal of pickle is to represent objects as exactly
> as possible and *not* to be interoperable with other languages.
>
>
> If you're using json to pass data between python and some other language,
> you don't want to silently convert bytes to strings.
> If you have a bytestring of utf-8 data, you want to utf-8 decode it before
> passing it to json.dumps.
> Likewise, if you have latin-1 data, you want to latin-1 decode it.
> There is no universal and correct bytes-to-string conversion.
>
> On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico  wrote:
>
>> Maybe what we need is to fork out the default JSON encoder into two,
>> or have a "strict=True" or "strict=False" flag. In non-strict mode,
>> round-tripping is not guaranteed, and various types will be folded to
>> each other - mainly, many built-in and stdlib types will be
>> represented in strings. In strict mode, compliance with the RFC is
>> ensured (so ValueError will be raised on inf/nan), and everything
>> should round-trip safely.
>>
>
> Wouldn't it be reasonable to represent this as an encoder which is provided
> by `json`? i.e.
>
> from json import dumps, UnsafeJSONEncoder
> ...
> json.dumps(foo, cls=UnsafeJSONEncoder)
>
> Emphasizing the "Unsafe" part of this and introducing people to the idea of
> setting an encoder also seems nice.
>
>
> On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico  wrote:
>
>> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
>>  wrote:
>> >
>
>> > The 'json' module already fails to provide round-trip functionality:
>> >
>> > >>> for data in ({True: 1}, {1: 2}, (1, 2)):
>> > ...     if json.loads(json.dumps(data)) != data:
>> > ...         print('oops', data, json.loads(json.dumps(data)))
>> > ...
>> > oops {True: 1} {'true': 1}
>> > oops {1: 2} {'1': 2}
>> > oops (1, 2) [1, 2]
>>
>> There's a fundamental limitation of JSON in that it requires string
>> keys, so this is an obvious transformation. I suppose you could call
>> that one a bug too, but it's very useful and not too dangerous. (And
>> then there's the tuple-to-list transformation, which I think probably
>> shouldn't happen, although I don't think that's likely to cause issues
>> either.)
>
>
> Ideally, all of these bits of support for non-JSON types should be opt-in,
> not opt-out.
> But it's not worth making a breaking change to the stdlib over this.
>
> Especially for new programmers, the notion that
> deserialize(serialize(x)) != x
> just seems like a recipe for subtle bugs.
>
> You're never guaranteed that the deserialized object will match the
> original, but shouldn't one of the goals of a de/serialization library be
> to get it as close as is reasonable?
>
>
> I've seen people do things which boil down to
>
> json.loads(x)["some_id"] == UUID(...)
>
> plenty of times. It's obviously wrong and the fix is easy, but isn't making
> the default json encoder less strict just encouraging this type of bug?
>
> Comparing JSON data against non-JSON types is part of the same category of
> errors: conflating JSON with dictionaries.
> It's very easy for people to make this mistake, especially since JSON
> syntax is a subset of python dict syntax, so I don't think `json.dumps`
> should be encouraging it.
>
> On Tue, Jul 7, 2020 at 6:52 AM Adam Funk  wrote:
>
>> Here's another "I'd expect to have to deal with this sort of thing in
>> Java" example I just ran into:
>>
>> >>> r = requests.head(url, allow_redirects=True)
>> >>> print(json.dumps(r.headers, indent=2))
>> ...
>> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
>> >>> print(json.dumps(dict(r.headers), indent=2))
>> {
>>   "Content-Type": "text/html; charset=utf-8",
>>   "Server": "openresty",
>> ...
>> }
>>
>
> Why should the JSON encoder know about an arbitrary dict-like type?
> It might implement Mapping, but there's no way for json.dumps to know that
> in the general case (because not everything which implements Mapping
> actually inherits from the Mapping ABC).
> Converting it to a type which json.dumps understands is a reasonable
> constraint.
>
> Also, wouldn't it be fair, if your object is "case insensitive" to
> serialize it as
>   { "CONTENT-TYPE": ... } or { "content-type": ... } or ...
> ?
>
> `r.headers["content-type"]` presumably gets a hit.
> `json.loads(json.dumps(dict(r.headers)))["content-type"]` will get a
> KeyError.
>
> This seems very much out of scope for the json package because it's not
> clear what it's supposed to do with this type.
> Libraries should ask users to specify what they mean and not make
> potentially harmful assumptions.

Re: Bulletproof json.dump?

2020-07-07 Thread J. Pic
Try jsonlight.dumps; it'll just work.

On Tue, Jul 7, 2020 at 12:53, Adam Funk wrote:

> On 2020-07-06, Adam Funk wrote:
>
> > On 2020-07-06, Chris Angelico wrote:
> >> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
> >> wrote:
>
> >>> While I agree entirely with your point, there is however perhaps room
> >>> for a bit more helpfulness from the json module. There is no sensible
> >>> reason I can think of that it refuses to serialize sets, for example.
> >>
> >> Sets don't exist in JSON. I think that's a sensible reason.
> >
> > I don't agree.  Tuples & lists don't exist separately in JSON, but
> > both are serializable (to the same thing).  Non-string keys aren't
> > allowed in JSON, but it silently converts numbers to strings instead
> > of barfing.  Typically, I've been using sets to deduplicate values as
> > I go along, & having to walk through the whole object changing them to
> > lists before serialization strikes me as the kind of pointless labor
> > that I expect when I'm using Java.  ;-)
>
> Here's another "I'd expect to have to deal with this sort of thing in
> Java" example I just ran into:
>
>
> >>> r = requests.head(url, allow_redirects=True)
> >>> print(json.dumps(r.headers, indent=2))
> ...
> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
> >>> print(json.dumps(dict(r.headers), indent=2))
> {
>   "Content-Type": "text/html; charset=utf-8",
>   "Server": "openresty",
> ...
> }
>
>
> --
> I'm after rebellion --- I'll settle for lies.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bulletproof json.dump?

2020-07-07 Thread Stephen Rosen
On Mon, Jul 6, 2020 at 6:37 AM Adam Funk  wrote:

> Is there a "bulletproof" version of json.dump somewhere that will
> convert bytes to str, any other iterables to list, etc., so you can
> just get your data into a file & keep working?
>

Is the data only being read by python programs? If so, consider using
pickle: https://docs.python.org/3/library/pickle.html
Unlike json dumping, the goal of pickle is to represent objects as exactly
as possible and *not* to be interoperable with other languages.


If you're using json to pass data between python and some other language,
you don't want to silently convert bytes to strings.
If you have a bytestring of utf-8 data, you want to utf-8 decode it before
passing it to json.dumps.
Likewise, if you have latin-1 data, you want to latin-1 decode it.
There is no universal and correct bytes-to-string conversion.
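
To make that concrete, here is a minimal sketch (the sample string is made up for illustration) of the same bytes turning into two different strings depending on which codec is assumed. json.dumps accepts either result, so only the caller can know which one is right:

```python
import json

# Illustrative data only: the UTF-8 encoding of "café".
raw = "café".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # 'café'
as_latin1 = raw.decode("latin-1")  # 'cafÃ©' (mojibake, but no error is raised)

# Both are valid str objects, so json.dumps cannot tell which was intended.
print(json.dumps({"name": as_utf8}, ensure_ascii=False))
print(json.dumps({"name": as_latin1}, ensure_ascii=False))
```

The latin-1 decode never fails, which is exactly why a silent default conversion would be risky.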

On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico  wrote:

> Maybe what we need is to fork out the default JSON encoder into two,
> or have a "strict=True" or "strict=False" flag. In non-strict mode,
> round-tripping is not guaranteed, and various types will be folded to
> each other - mainly, many built-in and stdlib types will be
> represented in strings. In strict mode, compliance with the RFC is
> ensured (so ValueError will be raised on inf/nan), and everything
> should round-trip safely.
>

Wouldn't it be reasonable to represent this as an encoder which is provided
by `json`? i.e.

from json import dumps, UnsafeJSONEncoder
...
json.dumps(foo, cls=UnsafeJSONEncoder)

Emphasizing the "Unsafe" part of this and introducing people to the idea of
setting an encoder also seems nice.
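
As a rough sketch of what such an encoder could look like: UnsafeJSONEncoder does not exist in the standard library, and the particular conversions chosen here (sets to sorted lists, bytes decoded as UTF-8 with replacement, dates to ISO strings) are assumptions made only for illustration.

```python
import json
from datetime import date, datetime, time


class UnsafeJSONEncoder(json.JSONEncoder):
    """Hypothetical lossy encoder; NOT part of the json module."""

    def default(self, o):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(o, (set, frozenset)):
            return sorted(o, key=repr)  # stable order, type information lost
        if isinstance(o, bytes):
            return o.decode("utf-8", errors="replace")  # assumes UTF-8
        if isinstance(o, (datetime, date, time)):
            return o.isoformat()
        return super().default(o)  # raises TypeError for anything else


foo = {"tags": {"b", "a"}, "raw": b"hi", "day": date(2020, 7, 6)}
print(json.dumps(foo, cls=UnsafeJSONEncoder))
```

None of these conversions can be undone by json.loads, which is the reason for putting "Unsafe" in the name.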


On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico  wrote:

> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
>  wrote:
> >

> > The 'json' module already fails to provide round-trip functionality:
> >
> > >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> > ...     if json.loads(json.dumps(data)) != data:
> > ...         print('oops', data, json.loads(json.dumps(data)))
> > ...
> > oops {True: 1} {'true': 1}
> > oops {1: 2} {'1': 2}
> > oops (1, 2) [1, 2]
>
> There's a fundamental limitation of JSON in that it requires string
> keys, so this is an obvious transformation. I suppose you could call
> that one a bug too, but it's very useful and not too dangerous. (And
> then there's the tuple-to-list transformation, which I think probably
> shouldn't happen, although I don't think that's likely to cause issues
> either.)


Ideally, all of these bits of support for non-JSON types should be opt-in,
not opt-out.
But it's not worth making a breaking change to the stdlib over this.

Especially for new programmers, the notion that
deserialize(serialize(x)) != x
just seems like a recipe for subtle bugs.

You're never guaranteed that the deserialized object will match the
original, but shouldn't one of the goals of a de/serialization library be
to get it as close as is reasonable?


I've seen people do things which boil down to

json.loads(x)["some_id"] == UUID(...)

plenty of times. It's obviously wrong and the fix is easy, but isn't making
the default json encoder less strict just encouraging this type of bug?

Comparing JSON data against non-JSON types is part of the same category of
errors: conflating JSON with dictionaries.
It's very easy for people to make this mistake, especially since JSON
syntax is a subset of python dict syntax, so I don't think `json.dumps`
should be encouraging it.
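
A small sketch of that bug and its fix; the UUID value and the "some_id" payload are made up for illustration:

```python
import json
from uuid import UUID

some_id = UUID("12345678-1234-5678-1234-567812345678")

# On the wire the id can only ever be a string.
x = json.dumps({"some_id": str(some_id)})
loaded = json.loads(x)

# Comparing the decoded str against a UUID is always False.
print(loaded["some_id"] == some_id)        # False

# The fix: convert explicitly before comparing.
print(UUID(loaded["some_id"]) == some_id)  # True
```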

On Tue, Jul 7, 2020 at 6:52 AM Adam Funk  wrote:

> Here's another "I'd expect to have to deal with this sort of thing in
> Java" example I just ran into:
>
> >>> r = requests.head(url, allow_redirects=True)
> >>> print(json.dumps(r.headers, indent=2))
> ...
> TypeError: Object of type CaseInsensitiveDict is not JSON serializable
> >>> print(json.dumps(dict(r.headers), indent=2))
> {
>   "Content-Type": "text/html; charset=utf-8",
>   "Server": "openresty",
> ...
> }
>

Why should the JSON encoder know about an arbitrary dict-like type?
It might implement Mapping, but there's no way for json.dumps to know that
in the general case (because not everything which implements Mapping
actually inherits from the Mapping ABC).
Converting it to a type which json.dumps understands is a reasonable
constraint.
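
For illustration, here is a sketch with a hypothetical Headers class standing in for an arbitrary dict-like type (it is not the requests class). Even though it inherits from the Mapping ABC, json.dumps rejects it until it is converted to a plain dict:

```python
import json
from collections.abc import Mapping


class Headers(Mapping):
    """Hypothetical dict-like type, used only for this example."""

    def __init__(self, items):
        self._items = dict(items)

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


h = Headers({"Content-Type": "text/html"})

try:
    json.dumps(h)
except TypeError as exc:
    print("rejected:", exc)

# The explicit conversion states what the caller means.
print(json.dumps(dict(h)))
```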

Also, wouldn't it be fair, if your object is "case insensitive" to
serialize it as
  { "CONTENT-TYPE": ... } or { "content-type": ... } or ...
?

`r.headers["content-type"]` presumably gets a hit.
`json.loads(json.dumps(dict(r.headers)))["content-type"]` will get a
KeyError.
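
One way to spell out the intent is to pick a key normalization yourself before dumping. This is only a sketch: the plain dict below stands in for r.headers, and lowercasing is one arbitrary choice among several.

```python
import json

# Stand-in for dict(r.headers); the values are illustrative.
headers = {"Content-Type": "text/html; charset=utf-8", "Server": "openresty"}

# Decide on a canonical spelling explicitly instead of relying on the encoder.
normalized = {key.lower(): value for key, value in headers.items()}

restored = json.loads(json.dumps(normalized))
print(restored["content-type"])
```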

This seems very much out of scope for the json package because it's not
clear what it's supposed to do with this type.
Libraries should ask users to specify what they mean and not make
potentially harmful assumptions.

Best,
-Stephen


Re: Bulletproof json.dump?

2020-07-07 Thread Adam Funk
On 2020-07-06, Adam Funk wrote:

> On 2020-07-06, Chris Angelico wrote:
>> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
>> wrote:

>>> While I agree entirely with your point, there is however perhaps room
>>> for a bit more helpfulness from the json module. There is no sensible
>>> reason I can think of that it refuses to serialize sets, for example.
>>
>> Sets don't exist in JSON. I think that's a sensible reason.
>
> I don't agree.  Tuples & lists don't exist separately in JSON, but
> both are serializable (to the same thing).  Non-string keys aren't
> allowed in JSON, but it silently converts numbers to strings instead
> of barfing.  Typically, I've been using sets to deduplicate values as
> I go along, & having to walk through the whole object changing them to
> lists before serialization strikes me as the kind of pointless labor
> that I expect when I'm using Java.  ;-)

Here's another "I'd expect to have to deal with this sort of thing in
Java" example I just ran into:


>>> r = requests.head(url, allow_redirects=True)
>>> print(json.dumps(r.headers, indent=2))
...
TypeError: Object of type CaseInsensitiveDict is not JSON serializable
>>> print(json.dumps(dict(r.headers), indent=2))
{
  "Content-Type": "text/html; charset=utf-8",
  "Server": "openresty",
...
}


-- 
I'm after rebellion --- I'll settle for lies.


Re: Bulletproof json.dump?

2020-07-06 Thread J. Pic
You can achieve round-tripping by maintaining a type mapping in code, for a
single datatype it would look like:

newloads(datetime, newdumps(datetime.now()))

If those would rely on __dump__ and __load__ functions in the fashion of
pickle then nested data structures would also be easy:

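from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class YourStruct:
    dt: datetime
    children: list = field(default_factory=list)

    @classmethod
    def __load__(cls, data):
        # Rebuild the object, and recursively its children, from JSON data.
        return cls(
            dt=datetime.fromisoformat(data['dt']),
            children=[cls.__load__(c) for c in data['children']],
        )

    def __dump__(self):
        # Reduce the object to JSON-compatible types.
        return dict(
            dt=self.dt.isoformat(),
            children=[c.__dump__() for c in self.children],
        )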

If your datetime is not loaded from C code, you can even monkey-patch it
to add __load__ and __dump__, and the data will round-trip as long as you
keep the type mapping in a method.
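
A minimal sketch of how newdumps and newloads might be written. Both names come from the example above, but their bodies, and the DUMPERS/LOADERS tables used as the type mapping for types without __dump__/__load__, are assumptions made for illustration:

```python
import json
from datetime import datetime

# Hypothetical type mapping for types we cannot add methods to.
DUMPERS = {datetime: datetime.isoformat}
LOADERS = {datetime: datetime.fromisoformat}


def newdumps(obj):
    """Serialize obj via its __dump__ method or the type mapping."""
    dump = getattr(obj, "__dump__", None)
    data = dump() if dump is not None else DUMPERS[type(obj)](obj)
    return json.dumps(data)


def newloads(cls, text):
    """Deserialize text into an instance of cls, which the caller must name."""
    data = json.loads(text)
    load = getattr(cls, "__load__", None)
    return load(data) if load is not None else LOADERS[cls](data)


now = datetime(2020, 7, 6, 12, 30)
print(newloads(datetime, newdumps(now)) == now)
```

The round trip only works because the caller supplies the target type; the JSON text itself carries no type information.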


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Chris Angelico  wrote:
> On Tue, Jul 7, 2020 at 12:01 AM Jon Ribbens via Python-list
> wrote:
>> I think what you're saying is, if we do:
>>
>> json1 = json.dumps(foo)
>> json2 = json.dumps(json.loads(json1))
>> assert json1 == json2
>>
>> the assertion should never fail (given that Python dictionaries are
>> ordered these days). It seems to me that should probably be true
>> regardless of any 'strict mode' flag - I can't immediately think of
>> any reason it wouldn't be.
>
> Right. But in strict mode, the stronger assertion would hold:
>
> assert obj == json.loads(json.dumps(obj))
>
> Also, the intermediate text would be RFC-compliant. If this cannot be
> done, ValueError would be raised. (Or maybe TypeError in some cases.)

Yes, I agree (although you'd need to call it something other than
'strict' mode, since that flag already exists). But note nothing
I am suggesting would involve JSONEncoder ever producing non-standard
output (except in cases where it already would).


Re: Bulletproof json.dump?

2020-07-06 Thread Chris Angelico
On Tue, Jul 7, 2020 at 12:01 AM Jon Ribbens via Python-list
 wrote:
>
> On 2020-07-06, Chris Angelico  wrote:
> > I think that even in non-strict mode, round-tripping should be
> > achieved after one iteration. That is to say, anything you can
> > JSON-encode will JSON-decode to something that would create the same
> > encoded form. Not sure if there's anything that would violate that
> > (weak) guarantee.
>
> I think what you're saying is, if we do:
>
> json1 = json.dumps(foo)
> json2 = json.dumps(json.loads(json1))
> assert json1 == json2
>
> the assertion should never fail (given that Python dictionaries are
> ordered these days). It seems to me that should probably be true
> regardless of any 'strict mode' flag - I can't immediately think of
> any reason it wouldn't be.

Right. But in strict mode, the stronger assertion would hold:

assert obj == json.loads(json.dumps(obj))

Also, the intermediate text would be RFC-compliant. If this cannot be
done, ValueError would be raised. (Or maybe TypeError in some cases.)
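
The two guarantees can be checked side by side with today's json module. This is a sketch; the helper names weak_round_trip and strong_round_trip are made up for the example:

```python
import json


def weak_round_trip(obj):
    """Encoding, decoding and re-encoding yields the same JSON text."""
    json1 = json.dumps(obj)
    json2 = json.dumps(json.loads(json1))
    return json1 == json2


def strong_round_trip(obj):
    """Decoding the encoded form gives back an equal Python object."""
    return obj == json.loads(json.dumps(obj))


for obj in ({True: 1}, {1: 2}, (1, 2), {"a": [1, 2]}):
    print(obj, weak_round_trip(obj), strong_round_trip(obj))
```

The first three objects satisfy only the weak property; the last one, built purely from JSON-native types, satisfies both.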

ChrisA


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Chris Angelico  wrote:
> I think that even in non-strict mode, round-tripping should be
> achieved after one iteration. That is to say, anything you can
> JSON-encode will JSON-decode to something that would create the same
> encoded form. Not sure if there's anything that would violate that
> (weak) guarantee.

I think what you're saying is, if we do:

json1 = json.dumps(foo)
json2 = json.dumps(json.loads(json1))
assert json1 == json2

the assertion should never fail (given that Python dictionaries are
ordered these days). It seems to me that should probably be true
regardless of any 'strict mode' flag - I can't immediately think of
any reason it wouldn't be.


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Frank Millman  wrote:
> On 2020-07-06 3:08 PM, Jon Ribbens via Python-list wrote:
>> On 2020-07-06, Frank Millman  wrote:
>>> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>>>> While I agree entirely with your point, there is however perhaps room
>>>> for a bit more helpfulness from the json module. There is no sensible
>>>> reason I can think of that it refuses to serialize sets, for example.
>>>> Going a bit further and, for example, automatically calling isoformat()
>>>> on date/time/datetime objects would perhaps be a bit more controversial,
>>>> but would frequently be useful, and there's no obvious downside that
>>>> occurs to me.
>>>
>>> I may be missing something, but that would cause a downside for me.
>>>
>>> I store Python lists and dicts in a database by calling dumps() when
>>> saving them to the database and loads() when retrieving them.
>>>
>>> If a date was 'dumped' using isoformat(), then on retrieval I would not
>>> know whether it was originally a string, which must remain as is, or was
>>> originally a date object, which must be converted back to a date object.
>>>
>>> There is no perfect answer, but my solution works fairly well. When
>>> dumping, I use 'default=repr'. This means that dates get dumped as
>>> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to
>>> detect that it is actually a date object.
>> 
>> There is no difference whatsoever between matching on the repr output
>> you show above and matching on ISO-8601 datetimes, except that at least
>> ISO-8601 is an actual standard. So no, you haven't found a downside.
>
> I don't understand. As you say, ISO-8601 is a standard, so the original 
> object could well have been a string in that format. So how do you 
> distinguish between an object that started out as a string, and an 
> object that started out as a date/datetime object?

With your method, how do you distinguish between an object that started
out as a string, and an object that started out as a date/datetime
object? The answer with both my method and your method is that you
cannot - and therefore my method is not a "downside" compared to yours.
Not to mention, I am not suggesting that your method should be
disallowed if you want to continue using it - I am suggesting that
your code could be simplified and your job made easier by my suggested
improvement.
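
A short sketch of the ambiguity under discussion, using the default=repr approach described earlier in the thread (the dictionaries are made up for illustration). A real date and a string that merely looks like its repr produce identical JSON, just as a real date and an ISO-8601 string would under isoformat():

```python
import json
from datetime import date

real_date = {"when": date(2020, 7, 6)}
lookalike = {"when": "datetime.date(2020, 7, 6)"}

encoded_date = json.dumps(real_date, default=repr)
encoded_text = json.dumps(lookalike)

# Identical output: the reader cannot tell which one was originally a date.
print(encoded_date == encoded_text)

iso_date = json.dumps({"when": date(2020, 7, 6)}, default=date.isoformat)
iso_text = json.dumps({"when": "2020-07-06"})
print(iso_date == iso_text)
```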


Re: Bulletproof json.dump?

2020-07-06 Thread Chris Angelico
On Mon, Jul 6, 2020 at 11:39 PM Adam Funk  wrote:
>
> Aha, I think the default=repr option is probably just what I need;
> maybe (at least in the testing stages) something like this:
>
> try:
>     with open(output_file, 'w') as f:
>         json.dump(data, f)  # data: the object being serialized
> except TypeError:
>     print('unexpected item in the bagging area!')
>     with open(output_file, 'w') as f:
>         json.dump(data, f, default=repr)
>
> and then I'd know when I need to go digging through the output for
> bytes, sets, etc., but at least I'd have the output to examine.
>

Easier:

def proclaimed_repr():
    seen = False
    def show_obj(obj):
        nonlocal seen
        if not seen:
            seen = True
            print("unexpected item in the bagging area!")
        return repr(obj)
    return show_obj

json.dump(data, f, default=proclaimed_repr())  # data: the object being serialized

If you don't care about "resetting" the marker, you can just use a
global or a default-arg hack:

def show_obj(obj, seen=[]):
    # The mutable default is shared across calls on purpose.
    if not seen:
        seen.append(True)
        print("unexpected item in the bagging area!")
    return repr(obj)

json.dump(data, f, default=show_obj)

Either way, you can stick this function off in a utilities collection,
and then use it without fiddling with try/except.

ChrisA


Re: Bulletproof json.dump?

2020-07-06 Thread Adam Funk
On 2020-07-06, Chris Angelico wrote:

> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
> wrote:
>>
>> On 2020-07-06, Chris Angelico  wrote:
>> > On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>> >> Is there a "bulletproof" version of json.dump somewhere that will
>> >> convert bytes to str, any other iterables to list, etc., so you can
>> >> just get your data into a file & keep working?
>> >
>> > That's the PHP definition of "bulletproof" - whatever happens, no
>> > matter how bad, just keep right on going.
>>
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>
> Sets don't exist in JSON. I think that's a sensible reason.

I don't agree.  Tuples & lists don't exist separately in JSON, but
both are serializable (to the same thing).  Non-string keys aren't
allowed in JSON, but it silently converts numbers to strings instead
of barfing.  Typically, I've been using sets to deduplicate values as
I go along, & having to walk through the whole object changing them to
lists before serialization strikes me as the kind of pointless labor
that I expect when I'm using Java.  ;-)
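
For what it's worth, the walk can be avoided with the existing default= hook. This is a minimal sketch: the name encode_sets, the sample data, and the choice to sort by repr for a stable order are all assumptions made for illustration.

```python
import json


def encode_sets(obj):
    """Hook for json.dumps(default=...): turn sets into sorted lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj, key=repr)
    # Keep the usual failure for everything else.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


data = {"urls": {"http://b.example", "http://a.example"}, "count": 2}
print(json.dumps(data, default=encode_sets, indent=2))
```

The hook is called for sets nested at any depth, so the data structure itself stays untouched.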



>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>
> They wouldn't round-trip without some way of knowing which strings
> represent date/times. If you just want a one-way output format, it's
> not too hard to subclass the encoder - there's an example right there
> in the docs (showing how to create a representation for complex
> numbers). The vanilla JSON encoder shouldn't do any of this. In fact,
> just supporting infinities and nans is fairly controversial - see
> other threads happening right now.
>
> Maybe what people want is a pretty printer instead?
>
> https://docs.python.org/3/library/pprint.html
>
> Resilient against recursive data structures, able to emit Python-like
> code for many formats, is as readable as JSON, and is often
> round-trippable. It lacks JSON's interoperability, but if you're
> trying to serialize sets and datetimes, you're forfeiting that anyway.
>
> ChrisA


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Re: Bulletproof json.dump?

2020-07-06 Thread Chris Angelico
On Mon, Jul 6, 2020 at 11:31 PM Jon Ribbens via Python-list
 wrote:
>
> On 2020-07-06, Chris Angelico  wrote:
> > On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
> > wrote:
> >> The 'json' module already fails to provide round-trip functionality:
> >>
> >> >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> >> ...     if json.loads(json.dumps(data)) != data:
> >> ...         print('oops', data, json.loads(json.dumps(data)))
> >> ...
> >> oops {True: 1} {'true': 1}
> >> oops {1: 2} {'1': 2}
> >> oops (1, 2) [1, 2]
> >
> > There's a fundamental limitation of JSON in that it requires string
> > keys, so this is an obvious transformation. I suppose you could call
> > that one a bug too, but it's very useful and not too dangerous. (And
> > then there's the tuple-to-list transformation, which I think probably
> > shouldn't happen, although I don't think that's likely to cause issues
> > either.)
>
> That's my point though - there's almost no difference between allowing
> encoding of tuples and allowing encoding of sets. Any argument against
> the latter would also apply against the former. The only possible excuse
> for the difference is "historical reasons", and given that it would be
> useful to allow it, and there would be no negative consequences, this
> hardly seems sufficient.
>
> >> No. I want a JSON encoder to output JSON to be read by a JSON decoder.
> >
> > Does it need to round-trip, though? If you stringify your datetimes,
> > you can't decode it reliably any more. What's the purpose here?
>
> It doesn't need to round trip (which as mentioned above is fortunate
> because the existing module already doesn't round trip). The main use
> I have, and I should imagine the main use anyone has, for JSON is
> interoperability - to safely store and send data in a format in which
> it can be read by non-Python code. If you need, say, date/times to
> be understood as date/times by the receiving code they'll have to
> deal with that explicitly already. Improving Python to allow sending
> them at least gets us part way there by eliminating half the work.

That's fair.

Maybe what we need is to fork out the default JSON encoder into two,
or have a "strict=True" or "strict=False" flag. In non-strict mode,
round-tripping is not guaranteed, and various types will be folded to
each other - mainly, many built-in and stdlib types will be
represented in strings. In strict mode, compliance with the RFC is
ensured (so ValueError will be raised on inf/nan), and everything
should round-trip safely.
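
Part of this already exists: json.dumps accepts allow_nan=False, which raises ValueError instead of emitting the non-standard Infinity/NaN tokens. A quick sketch of the current behaviour:

```python
import json
import math

# Default behaviour: emits a token that is not valid JSON per the RFC.
print(json.dumps(math.inf))

# Opt-in strictness for this one case.
try:
    json.dumps(math.inf, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)
```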

I think that even in non-strict mode, round-tripping should be
achieved after one iteration. That is to say, anything you can
JSON-encode will JSON-decode to something that would create the same
encoded form. Not sure if there's anything that would violate that
(weak) guarantee.

ChrisA


Re: Bulletproof json.dump?

2020-07-06 Thread Adam Funk
On 2020-07-06, Frank Millman wrote:

> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>> On 2020-07-06, Chris Angelico  wrote:
>>> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>>>> Is there a "bulletproof" version of json.dump somewhere that will
>>>> convert bytes to str, any other iterables to list, etc., so you can
>>>> just get your data into a file & keep working?
>>>
>>> That's the PHP definition of "bulletproof" - whatever happens, no
>>> matter how bad, just keep right on going.
>> 
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>> 
>
> I may be missing something, but that would cause a downside for me.
>
> I store Python lists and dicts in a database by calling dumps() when 
> saving them to the database and loads() when retrieving them.
>
> If a date was 'dumped' using isoformat(), then on retrieval I would not 
> know whether it was originally a string, which must remain as is, or was 
> originally a date object, which must be converted back to a date object.
>
> There is no perfect answer, but my solution works fairly well. When 
> dumping, I use 'default=repr'. This means that dates get dumped as 
> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to 
> detect that it is actually a date object.
>
> I use the same trick for Decimal objects.
>
> Maybe the OP could do something similar.

Aha, I think the default=repr option is probably just what I need;
maybe (at least in the testing stages) something like this:

try:
    with open(output_file, 'w') as f:
        json.dump(data, f)  # data: the object being serialized
except TypeError:
    print('unexpected item in the bagging area!')
    with open(output_file, 'w') as f:
        json.dump(data, f, default=repr)

and then I'd know when I need to go digging through the output for
bytes, sets, etc., but at least I'd have the output to examine.


-- 
Well, we had a lot of luck on Venus
We always had a ball on Mars


Re: Bulletproof json.dump?

2020-07-06 Thread Adam Funk
On 2020-07-06, Chris Angelico wrote:

> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>>
>> Hi,
>>
>> I have a program that does a lot of work with URLs and requests,
>> collecting data over about an hour, & then writing the collated data
>> to a JSON file.  The first time I ran it, the json.dump failed because
>> there was a bytes value instead of a str, so I had to figure out where
>> that was coming from before I could get any data out.  I've previously
>> run into the problem of collecting values in sets (for deduplication)
>> & forgetting to walk through the big data object changing them to
>> lists before serializing.
>>
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>>
>
> That's the PHP definition of "bulletproof" - whatever happens, no
> matter how bad, just keep right on going. If you really want some way

Well played!

> to write "just anything" to your file, I recommend not using JSON -
> instead, write out the repr of your data structure. That'll give a
> decent result for bytes, str, all forms of numbers, and pretty much
> any collection, and it won't break if given something that can't
> safely be represented.

Interesting point.  At least the TypeError message does say what the
unacceptable type is ("Object of type set is not JSON serializable").
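
A minimal sketch of the repr route suggested above. Reading the text back with ast.literal_eval is an assumption added here (it only handles Python literals, not arbitrary objects), and the sample data is made up:

```python
import ast
from pprint import pformat

data = {"ids": {1, 2, 3}, "raw": b"\x00\xff", "pair": (1, 2)}

# Human-readable text that json.dumps would refuse to produce for this data.
text = pformat(data)
print(text)

# Sets, bytes and tuples come back as the same types.
restored = ast.literal_eval(text)
print(restored == data)
```

Objects such as datetimes still print fine but cannot be read back this way, so this is best treated as an inspection format rather than an interchange format.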


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Re: Bulletproof json.dump?

2020-07-06 Thread Frank Millman

On 2020-07-06 3:08 PM, Jon Ribbens via Python-list wrote:
> On 2020-07-06, Frank Millman wrote:
>> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>>> While I agree entirely with your point, there is however perhaps room
>>> for a bit more helpfulness from the json module. There is no sensible
>>> reason I can think of that it refuses to serialize sets, for example.
>>> Going a bit further and, for example, automatically calling isoformat()
>>> on date/time/datetime objects would perhaps be a bit more controversial,
>>> but would frequently be useful, and there's no obvious downside that
>>> occurs to me.
>>
>> I may be missing something, but that would cause a downside for me.
>>
>> I store Python lists and dicts in a database by calling dumps() when
>> saving them to the database and loads() when retrieving them.
>>
>> If a date was 'dumped' using isoformat(), then on retrieval I would not
>> know whether it was originally a string, which must remain as is, or was
>> originally a date object, which must be converted back to a date object.
>>
>> There is no perfect answer, but my solution works fairly well. When
>> dumping, I use 'default=repr'. This means that dates get dumped as
>> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to
>> detect that it is actually a date object.
>
> There is no difference whatsoever between matching on the repr output
> you show above and matching on ISO-8601 datetimes, except that at least
> ISO-8601 is an actual standard. So no, you haven't found a downside.



I don't understand. As you say, ISO-8601 is a standard, so the original 
object could well have been a string in that format. So how do you 
distinguish between an object that started out as a string, and an 
object that started out as a date/datetime object?
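The ambiguity in question can be shown in a few lines. This is only a sketch: iso_default is a hypothetical fallback function, not something the json module provides.

```python
import datetime
import json


def iso_default(obj):
    """Hypothetical fallback: turn date/time objects into ISO-8601 strings."""
    if isinstance(obj, (datetime.date, datetime.time)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# One value starts life as a date object, the other as a plain string.
from_date = json.dumps({"when": datetime.date(2020, 7, 6)}, default=iso_default)
from_str = json.dumps({"when": "2020-07-06"})

# Both produce identical JSON text, so the reader cannot tell them apart.
print(from_date == from_str)
```

Once the two inputs collapse to the same text, nothing on the decoding side can recover which one was meant without extra information.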


Frank


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Chris Angelico  wrote:
> On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
> wrote:
>> The 'json' module already fails to provide round-trip functionality:
>>
>> >>> import json
>> >>> for data in ({True: 1}, {1: 2}, (1, 2)):
>> ...     if json.loads(json.dumps(data)) != data:
>> ...         print('oops', data, json.loads(json.dumps(data)))
>> ...
>> oops {True: 1} {'true': 1}
>> oops {1: 2} {'1': 2}
>> oops (1, 2) [1, 2]
>
> There's a fundamental limitation of JSON in that it requires string
> keys, so this is an obvious transformation. I suppose you could call
> that one a bug too, but it's very useful and not too dangerous. (And
> then there's the tuple-to-list transformation, which I think probably
> shouldn't happen, although I don't think that's likely to cause issues
> either.)

That's my point though - there's almost no difference between allowing
encoding of tuples and allowing encoding of sets. Any argument against
the latter would also apply against the former. The only possible excuse
for the difference is "historical reasons", and given that it would be
useful to allow it, and there would be no negative consequences, this
hardly seems sufficient.

>> No. I want a JSON encoder to output JSON to be read by a JSON decoder.
>
> Does it need to round-trip, though? If you stringify your datetimes,
> you can't decode it reliably any more. What's the purpose here?

It doesn't need to round trip (which as mentioned above is fortunate
because the existing module already doesn't round trip). The main use
I have, and I should imagine the main use anyone has, for JSON is
interoperability - to safely store and send data in a format in which
it can be read by non-Python code. If you need, say, date/times to
be understood as date/times by the receiving code they'll have to
deal with that explicitly already. Improving Python to allow sending
them at least gets us part way there by eliminating half the work.


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, J. Pic  wrote:
> Well, I made a suggestion on python-ideas and a PyPI lib came out of it, but
> since you can't patch a lot of internal types it's not so useful.
>
> Feel free to try it out:
>
> https://yourlabs.io/oss/jsonlight/

While I applaud your experimentation, that is not suitable for any purpose.
You would probably do better by starting off subclassing json.JSONEncoder.
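A minimal sketch of what subclassing json.JSONEncoder might look like. LenientEncoder is a hypothetical name, and the choice to sort sets and to use isoformat() is an assumption made for illustration, not an agreed design.

```python
import datetime
import json


class LenientEncoder(json.JSONEncoder):
    """Hypothetical one-way encoder: sets become lists, dates become strings."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # Sort when the elements allow it, so the output is deterministic.
            try:
                return sorted(o)
            except TypeError:
                return list(o)
        if isinstance(o, (datetime.date, datetime.time)):
            return o.isoformat()
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(o)


sample = {"seen": {3, 1, 2}, "day": datetime.date(2020, 7, 6)}
text = json.dumps(sample, cls=LenientEncoder)
print(text)
```

The default() method is only consulted for objects the encoder cannot already handle, so built-in behaviour for dicts, lists, strings and numbers is unchanged.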


Re: Bulletproof json.dump?

2020-07-06 Thread Chris Angelico
On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list
 wrote:
>
> On 2020-07-06, Chris Angelico  wrote:
> > On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
> > wrote:
> >> While I agree entirely with your point, there is however perhaps room
> >> for a bit more helpfulness from the json module. There is no sensible
> >> reason I can think of that it refuses to serialize sets, for example.
> >
> > Sets don't exist in JSON. I think that's a sensible reason.
>
> It is not. Tuples don't exist either, and yet they're supported.

Hmm, I didn't know that. Possibly it's as much a bug as the inf/nan issue.

> >> Going a bit further and, for example, automatically calling isoformat()
> >> on date/time/datetime objects would perhaps be a bit more controversial,
> >> but would frequently be useful, and there's no obvious downside that
> >> occurs to me.
> >
> > They wouldn't round-trip without some way of knowing which strings
> > represent date/times.
>
> The 'json' module already fails to provide round-trip functionality:
>
> >>> import json
> >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> ...     if json.loads(json.dumps(data)) != data:
> ...         print('oops', data, json.loads(json.dumps(data)))
> ...
> oops {True: 1} {'true': 1}
> oops {1: 2} {'1': 2}
> oops (1, 2) [1, 2]

There's a fundamental limitation of JSON in that it requires string
keys, so this is an obvious transformation. I suppose you could call
that one a bug too, but it's very useful and not too dangerous. (And
then there's the tuple-to-list transformation, which I think probably
shouldn't happen, although I don't think that's likely to cause issues
either.)

> > Maybe what people want is a pretty printer instead?
>
> No. I want a JSON encoder to output JSON to be read by a JSON decoder.

Does it need to round-trip, though? If you stringify your datetimes,
you can't decode it reliably any more. What's the purpose here?

ChrisA


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Frank Millman  wrote:
> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>
> I may be missing something, but that would cause a downside for me.
>
> I store Python lists and dicts in a database by calling dumps() when 
> saving them to the database and loads() when retrieving them.
>
> If a date was 'dumped' using isoformat(), then on retrieval I would not 
> know whether it was originally a string, which must remain as is, or was 
> originally a date object, which must be converted back to a date object.
>
> There is no perfect answer, but my solution works fairly well. When 
> dumping, I use 'default=repr'. This means that dates get dumped as 
> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to 
> detect that it is actually a date object.

There is no difference whatsoever between matching on the repr output
you show above and matching on ISO-8601 datetimes, except that at least
ISO-8601 is an actual standard. So no, you haven't found a downside.


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Chris Angelico  wrote:
> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
> wrote:
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>
> Sets don't exist in JSON. I think that's a sensible reason.

It is not. Tuples don't exist either, and yet they're supported.

>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>
> They wouldn't round-trip without some way of knowing which strings
> represent date/times.

The 'json' module already fails to provide round-trip functionality:

>>> import json
>>> for data in ({True: 1}, {1: 2}, (1, 2)):
...     if json.loads(json.dumps(data)) != data:
...         print('oops', data, json.loads(json.dumps(data)))
...
oops {True: 1} {'true': 1}
oops {1: 2} {'1': 2}
oops (1, 2) [1, 2]

> Maybe what people want is a pretty printer instead?

No. I want a JSON encoder to output JSON to be read by a JSON decoder.


Re: Bulletproof json.dump?

2020-07-06 Thread J. Pic
Well, I made a suggestion on python-ideas and a PyPI lib came out of it, but
since you can't patch a lot of internal types it's not so useful.

Feel free to try it out:

https://yourlabs.io/oss/jsonlight/


Re: Bulletproof json.dump?

2020-07-06 Thread Frank Millman

On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
> On 2020-07-06, Chris Angelico  wrote:
>> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>>> Is there a "bulletproof" version of json.dump somewhere that will
>>> convert bytes to str, any other iterables to list, etc., so you can
>>> just get your data into a file & keep working?
>>
>> That's the PHP definition of "bulletproof" - whatever happens, no
>> matter how bad, just keep right on going.
>
> While I agree entirely with your point, there is however perhaps room
> for a bit more helpfulness from the json module. There is no sensible
> reason I can think of that it refuses to serialize sets, for example.
> Going a bit further and, for example, automatically calling isoformat()
> on date/time/datetime objects would perhaps be a bit more controversial,
> but would frequently be useful, and there's no obvious downside that
> occurs to me.

I may be missing something, but that would cause a downside for me.

I store Python lists and dicts in a database by calling dumps() when 
saving them to the database and loads() when retrieving them.


If a date was 'dumped' using isoformat(), then on retrieval I would not 
know whether it was originally a string, which must remain as is, or was 
originally a date object, which must be converted back to a date object.


There is no perfect answer, but my solution works fairly well. When 
dumping, I use 'default=repr'. This means that dates get dumped as 
'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to 
detect that it is actually a date object.


I use the same trick for Decimal objects.

Maybe the OP could do something similar.
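A rough sketch of the approach described above, under some assumptions: the message does not say how the pattern is looked for on retrieval, so the regular expression and the restore() helper here are hypothetical.

```python
import datetime
import json
import re

# Matches exactly the repr of a datetime.date, e.g. "datetime.date(2020, 7, 6)".
DATE_REPR = re.compile(r"datetime\.date\((\d+), (\d+), (\d+)\)")


def restore(value):
    """Recursively turn date-repr strings back into datetime.date objects."""
    if isinstance(value, str):
        match = DATE_REPR.fullmatch(value)
        if match:
            return datetime.date(*map(int, match.groups()))
        return value
    if isinstance(value, list):
        return [restore(item) for item in value]
    if isinstance(value, dict):
        return {key: restore(item) for key, item in value.items()}
    return value


original = {"due": datetime.date(2020, 7, 6), "note": "2020-07-06"}
stored = json.dumps(original, default=repr)   # dates are dumped via repr()
loaded = restore(json.loads(stored))          # and recognised again on load
print(loaded)
```

A string that happens to look exactly like the repr would also be converted, so this relies on such strings not occurring in the real data.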

Frank Millman



Re: Bulletproof json.dump?

2020-07-06 Thread Chris Angelico
On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list
 wrote:
>
> On 2020-07-06, Chris Angelico  wrote:
> > On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
> >> Is there a "bulletproof" version of json.dump somewhere that will
> >> convert bytes to str, any other iterables to list, etc., so you can
> >> just get your data into a file & keep working?
> >
> > That's the PHP definition of "bulletproof" - whatever happens, no
> > matter how bad, just keep right on going.
>
> While I agree entirely with your point, there is however perhaps room
> for a bit more helpfulness from the json module. There is no sensible
> reason I can think of that it refuses to serialize sets, for example.

Sets don't exist in JSON. I think that's a sensible reason.

> Going a bit further and, for example, automatically calling isoformat()
> on date/time/datetime objects would perhaps be a bit more controversial,
> but would frequently be useful, and there's no obvious downside that
> occurs to me.

They wouldn't round-trip without some way of knowing which strings
represent date/times. If you just want a one-way output format, it's
not too hard to subclass the encoder - there's an example right there
in the docs (showing how to create a representation for complex
numbers). The vanilla JSON encoder shouldn't do any of this. In fact,
just supporting infinities and nans is fairly controversial - see
other threads happening right now.
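To illustrate the inf/nan point, a short sketch using the existing allow_nan parameter of json.dumps. The variable names are only for the example.

```python
import json

# Default behaviour: the encoder emits Infinity and NaN, which are not valid
# JSON tokens according to the RFC.
lenient = json.dumps([float("inf"), float("nan")])
print(lenient)

# With allow_nan=False the encoder refuses such values instead.
try:
    json.dumps(float("inf"), allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

print(type(strict_error).__name__)
```

Output produced with the default setting may be rejected by stricter JSON parsers in other languages.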

Maybe what people want is a pretty printer instead?

https://docs.python.org/3/library/pprint.html

Resilient against recursive data structures, able to emit Python-like
code for many formats, is as readable as JSON, and is often
round-trippable. It lacks JSON's interoperability, but if you're
trying to serialize sets and datetimes, you're forfeiting that anyway.
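A small sketch of what pprint gives for the kinds of values mentioned in this thread. The sample data is invented for the example.

```python
import datetime
import pprint

# Hypothetical sample data mixing a set, bytes and a date.
data = {"ids": {1, 2, 3}, "raw": b"abc", "day": datetime.date(2020, 7, 6)}

# pformat returns the text that pprint.pprint would print.
text = pprint.pformat(data)
print(text)

# A self-referencing list is reported instead of recursing forever.
loop = []
loop.append(loop)
loop_text = pprint.pformat(loop)
print(loop_text)
```

The result is Python-like text rather than JSON, so it is meant for humans or for other Python code, not for other languages.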

ChrisA


Re: Bulletproof json.dump?

2020-07-06 Thread Jon Ribbens via Python-list
On 2020-07-06, Chris Angelico  wrote:
> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>
> That's the PHP definition of "bulletproof" - whatever happens, no
> matter how bad, just keep right on going.

While I agree entirely with your point, there is however perhaps room
for a bit more helpfulness from the json module. There is no sensible
reason I can think of that it refuses to serialize sets, for example.
Going a bit further and, for example, automatically calling isoformat()
on date/time/datetime objects would perhaps be a bit more controversial,
but would frequently be useful, and there's no obvious downside that
occurs to me.


Re: Bulletproof json.dump?

2020-07-06 Thread Chris Angelico
On Mon, Jul 6, 2020 at 8:36 PM Adam Funk  wrote:
>
> Hi,
>
> I have a program that does a lot of work with URLs and requests,
> collecting data over about an hour, & then writing the collated data
> to a JSON file.  The first time I ran it, the json.dump failed because
> there was a bytes value instead of a str, so I had to figure out where
> that was coming from before I could get any data out.  I've previously
> run into the problem of collecting values in sets (for deduplication)
> & forgetting to walk through the big data object changing them to
> lists before serializing.
>
> Is there a "bulletproof" version of json.dump somewhere that will
> convert bytes to str, any other iterables to list, etc., so you can
> just get your data into a file & keep working?
>

That's the PHP definition of "bulletproof" - whatever happens, no
matter how bad, just keep right on going. If you really want some way
to write "just anything" to your file, I recommend not using JSON -
instead, write out the repr of your data structure. That'll give a
decent result for bytes, str, all forms of numbers, and pretty much
any collection, and it won't break if given something that can't
safely be represented.
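A sketch of the repr idea. Reading the data back with ast.literal_eval is an addition here, not part of the suggestion above, and it only works when everything in the structure is a plain Python literal.

```python
import ast

# Hypothetical sample data that json.dumps would reject or alter.
data = {"body": b"\xff\x00", "ids": {1, 2}, "pair": (1, 2), 1: "int key"}

# This is the text that would be written to the file.
text = repr(data)
print(text)

# ast.literal_eval parses literals without executing arbitrary code.
# Assumption: the data holds only literals; a datetime would not parse.
restored = ast.literal_eval(text)
print(restored == data)
```

Unlike the JSON round trip, the tuple stays a tuple and the integer key stays an integer.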

ChrisA


Bulletproof json.dump?

2020-07-06 Thread Adam Funk
Hi,

I have a program that does a lot of work with URLs and requests,
collecting data over about an hour, & then writing the collated data
to a JSON file.  The first time I ran it, the json.dump failed because
there was a bytes value instead of a str, so I had to figure out where
that was coming from before I could get any data out.  I've previously
run into the problem of collecting values in sets (for deduplication)
& forgetting to walk through the big data object changing them to
lists before serializing.

Is there a "bulletproof" version of json.dump somewhere that will
convert bytes to str, any other iterables to list, etc., so you can
just get your data into a file & keep working?

(I'm using Python 3.7.)

Thanks!
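One way the request above could be approximated with the existing default= hook of json.dump / json.dumps. This is a sketch only: lenient_default is a hypothetical helper, and decoding bytes as UTF-8 with replacement characters is an assumption that may be wrong for your data.

```python
import json
from collections.abc import Iterable


def lenient_default(obj):
    """Hypothetical fallback for json.dump: always return something encodable."""
    if isinstance(obj, (bytes, bytearray)):
        # Assumption: bytes are UTF-8; undecodable bytes are replaced, not rejected.
        return bytes(obj).decode("utf-8", errors="replace")
    if isinstance(obj, Iterable):
        # Sets, ranges, generators and similar become lists.
        return list(obj)
    # Last resort: store a string description so the dump does not fail.
    return repr(obj)


data = {"body": b"caf\xc3\xa9", "seen": {"a"}, "span": range(3), "other": object}
text = json.dumps(data, default=lenient_default)
print(text)
```

The default hook is not applied to dictionary keys, so a bytes key would still raise TypeError, and an endless iterator would never finish. The conversion is also one-way: nothing in the output records what the original types were.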

-- 
Slade was the coolest band in England. They were the kind of guys
that would push your car out of a ditch.  ---Alice Cooper