Re: Bulletproof json.dump?
Try jsonlight.dumps; it'll just work. On Tue, Jul 7, 2020 at 12:53, Adam Funk wrote: > On 2020-07-06, Adam Funk wrote: > > > On 2020-07-06, Chris Angelico wrote: > >> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list > >> wrote: > > >>> While I agree entirely with your point, there is however perhaps room > >>> for a bit more helpfulness from the json module. There is no sensible > >>> reason I can think of that it refuses to serialize sets, for example. > >> > >> Sets don't exist in JSON. I think that's a sensible reason. > > > > I don't agree. Tuples & lists don't exist separately in JSON, but > > both are serializable (to the same thing). Non-string keys aren't > > allowed in JSON, but it silently converts numbers to strings instead > > of barfing. Typically, I've been using sets to deduplicate values as > > I go along, & having to walk through the whole object changing them to > > lists before serialization strikes me as the kind of pointless labor > > that I expect when I'm using Java. ;-) > > Here's another "I'd expect to have to deal with this sort of thing in > Java" example I just ran into: > > > >>> r = requests.head(url, allow_redirects=True) > >>> print(json.dumps(r.headers, indent=2)) > ... > TypeError: Object of type CaseInsensitiveDict is not JSON serializable > >>> print(json.dumps(dict(r.headers), indent=2)) > { > "Content-Type": "text/html; charset=utf-8", > "Server": "openresty", > ... > } > > > -- > I'm after rebellion --- I'll settle for lies. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 6:37 AM Adam Funk wrote: > Is there a "bulletproof" version of json.dump somewhere that will > convert bytes to str, any other iterables to list, etc., so you can > just get your data into a file & keep working? > Is the data only being read by python programs? If so, consider using pickle: https://docs.python.org/3/library/pickle.html Unlike json dumping, the goal of pickle is to represent objects as exactly as possible and *not* to be interoperable with other languages. If you're using json to pass data between python and some other language, you don't want to silently convert bytes to strings. If you have a bytestring of utf-8 data, you want to utf-8 decode it before passing it to json.dumps. Likewise, if you have latin-1 data, you want to latin-1 decode it. There is no universal and correct bytes-to-string conversion. On Mon, Jul 6, 2020 at 9:45 AM Chris Angelico wrote: > Maybe what we need is to fork out the default JSON encoder into two, > or have a "strict=True" or "strict=False" flag. In non-strict mode, > round-tripping is not guaranteed, and various types will be folded to > each other - mainly, many built-in and stdlib types will be > represented in strings. In strict mode, compliance with the RFC is > ensured (so ValueError will be raised on inf/nan), and everything > should round-trip safely. > Wouldn't it be reasonable to represent this as an encoder which is provided by `json`? i.e. from json import dumps, UnsafeJSONEncoder ... json.dumps(foo, cls=UnsafeJSONEncoder) Emphasizing the "Unsafe" part of this and introducing people to the idea of setting an encoder also seems nice. On Mon, Jul 6, 2020 at 9:12 AM Chris Angelico wrote: > On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list > wrote: > > > The 'json' module already fails to provide round-trip functionality: > > > > >>> for data in ({True: 1}, {1: 2}, (1, 2)): > > ... if json.loads(json.dumps(data)) != data: > > ... 
print('oops', data, json.loads(json.dumps(data))) > > ... > > oops {True: 1} {'true': 1} > > oops {1: 2} {'1': 2} > > oops (1, 2) [1, 2] > > There's a fundamental limitation of JSON in that it requires string > keys, so this is an obvious transformation. I suppose you could call > that one a bug too, but it's very useful and not too dangerous. (And > then there's the tuple-to-list transformation, which I think probably > shouldn't happen, although I don't think that's likely to cause issues > either.) Ideally, all of these bits of support for non-JSON types should be opt-in, not opt-out. But it's not worth making a breaking change to the stdlib over this. Especially for new programmers, the notion that deserialize(serialize(x)) != x just seems like a recipe for subtle bugs. You're never guaranteed that the deserialized object will match the original, but shouldn't one of the goals of a de/serialization library be to get it as close as is reasonable? I've seen people do things which boil down to json.loads(x)["some_id"] == UUID(...) plenty of times. It's obviously wrong and the fix is easy, but isn't making the default json encoder less strict just encouraging this type of bug? Comparing JSON data against non-JSON types is part of the same category of errors: conflating JSON with dictionaries. It's very easy for people to make this mistake, especially since JSON syntax is a subset of python dict syntax, so I don't think `json.dumps` should be encouraging it. On Tue, Jul 7, 2020 at 6:52 AM Adam Funk wrote: > Here's another "I'd expect to have to deal with this sort of thing in > Java" example I just ran into: > > >>> r = requests.head(url, allow_redirects=True) > >>> print(json.dumps(r.headers, indent=2)) > ... > TypeError: Object of type CaseInsensitiveDict is not JSON serializable > >>> print(json.dumps(dict(r.headers), indent=2)) > { > "Content-Type": "text/html; charset=utf-8", > "Server": "openresty", > ... 
> } > Why should the JSON encoder know about an arbitrary dict-like type? It might implement Mapping, but there's no way for json.dumps to know that in the general case (because not everything which implements Mapping actually inherits from the Mapping ABC). Converting it to a type which json.dumps understands is a reasonable constraint. Also, wouldn't it be fair, if your object is "case insensitive" to serialize it as { "CONTENT-TYPE": ... } or { "content-type": ... } or ... ? `r.headers["content-type"]` presumably gets a hit. `json.loads(json.dumps(dict(r.headers)))["content-type"]` will get a KeyError. This seems very much out of scope for the json package because it's not clear what it's supposed to do with this type. Libraries should ask users to specify what they mean and not make potentially harmful assumptions. Best, -Stephen -- https://mail.python.org/mailman/listinfo/python-list
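The `UnsafeJSONEncoder` above is only a name floated in this thread, not an existing class. A minimal sketch of what such an opt-in lenient encoder might look like (the class name comes from the thread; the particular conversions chosen here are assumptions for illustration):

```python
import datetime
import json
import uuid


class UnsafeJSONEncoder(json.JSONEncoder):
    """Hypothetical lenient encoder as proposed in this thread: folds
    common non-JSON types into JSON-friendly ones instead of raising.
    Not a real stdlib API; the conversions are illustrative choices."""

    def default(self, o):
        if isinstance(o, (bytes, bytearray)):
            # Assumes UTF-8; there is no universal bytes-to-str conversion.
            return o.decode("utf-8", errors="replace")
        if isinstance(o, (set, frozenset)):
            return sorted(o)
        if isinstance(o, (datetime.date, datetime.datetime)):
            return o.isoformat()
        if isinstance(o, uuid.UUID):
            return str(o)
        return super().default(o)


print(json.dumps({"ids": {3, 1, 2}, "raw": b"abc"}, cls=UnsafeJSONEncoder))
```

The point of spelling it `cls=UnsafeJSONEncoder` at each call site is exactly the one made above: the caller explicitly opts in to lossy encoding rather than getting it silently.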
Re: Bulletproof json.dump?
On 2020-07-06, Adam Funk wrote: > On 2020-07-06, Chris Angelico wrote: >> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list >> wrote: >>> While I agree entirely with your point, there is however perhaps room >>> for a bit more helpfulness from the json module. There is no sensible >>> reason I can think of that it refuses to serialize sets, for example. >> >> Sets don't exist in JSON. I think that's a sensible reason. > > I don't agree. Tuples & lists don't exist separately in JSON, but > both are serializable (to the same thing). Non-string keys aren't > allowed in JSON, but it silently converts numbers to strings instead > of barfing. Typically, I've been using sets to deduplicate values as > I go along, & having to walk through the whole object changing them to > lists before serialization strikes me as the kind of pointless labor > that I expect when I'm using Java. ;-) Here's another "I'd expect to have to deal with this sort of thing in Java" example I just ran into: >>> r = requests.head(url, allow_redirects=True) >>> print(json.dumps(r.headers, indent=2)) ... TypeError: Object of type CaseInsensitiveDict is not JSON serializable >>> print(json.dumps(dict(r.headers), indent=2)) { "Content-Type": "text/html; charset=utf-8", "Server": "openresty", ... } -- I'm after rebellion --- I'll settle for lies. -- https://mail.python.org/mailman/listinfo/python-list
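For the set-deduplication case described above, a `default=` hook avoids walking the whole object by hand; a short sketch (the `fallback` helper name is illustrative, not stdlib):

```python
import json


def fallback(obj):
    # Hypothetical helper: fold sets into sorted lists so json.dump
    # accepts them without a manual pre-serialization walk.
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable")


print(json.dumps({"seen": {"b", "a", "c"}}, default=fallback))
```

`sorted()` also gives the output a stable order, which sets themselves do not guarantee (it assumes the set's elements are mutually comparable).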
Re: Bulletproof json.dump?
You can achieve round-tripping by maintaining a type mapping in code; for a single datatype it would look like:

    newloads(datetime, newdumps(datetime.now()))

If those relied on __dump__ and __load__ functions in the fashion of pickle, then nested data structures would also be easy:

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class YourStruct:
        dt: datetime
        children: list = field(default_factory=list)

        @classmethod
        def __load__(cls, data):
            return cls(
                dt=datetime.fromisoformat(data['dt']),
                children=[cls.__load__(c) for c in data['children']],
            )

        def __dump__(self):
            return dict(
                dt=self.dt.isoformat(),
                children=[c.__dump__() for c in self.children],
            )

If your datetime is not being loaded from C code you can even monkey-patch it, adding __load__ and __dump__ to it, and data will round-trip as long as you keep the type mapping in a method.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote:
> On Tue, Jul 7, 2020 at 12:01 AM Jon Ribbens via Python-list
> wrote:
>> I think what you're saying is, if we do:
>>
>>     json1 = json.dumps(foo)
>>     json2 = json.dumps(json.loads(json1))
>>     assert json1 == json2
>>
>> the assertion should never fail (given that Python dictionaries are
>> ordered these days). It seems to me that should probably be true
>> regardless of any 'strict mode' flag - I can't immediately think of
>> any reason it wouldn't be.
>
> Right. But in strict mode, the stronger assertion would hold:
>
>     assert obj == json.loads(json.dumps(obj))
>
> Also, the intermediate text would be RFC-compliant. If this cannot be
> done, ValueError would be raised. (Or maybe TypeError in some cases.)

Yes, I agree (although you'd need to call it something other than
'strict' mode, since that flag already exists). But note nothing I am
suggesting would involve JSONEncoder ever producing non-standard output
(except in cases where it already would).
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On Tue, Jul 7, 2020 at 12:01 AM Jon Ribbens via Python-list wrote:
>
> On 2020-07-06, Chris Angelico wrote:
> > I think that even in non-strict mode, round-tripping should be
> > achieved after one iteration. That is to say, anything you can
> > JSON-encode will JSON-decode to something that would create the same
> > encoded form. Not sure if there's anything that would violate that
> > (weak) guarantee.
>
> I think what you're saying is, if we do:
>
>     json1 = json.dumps(foo)
>     json2 = json.dumps(json.loads(json1))
>     assert json1 == json2
>
> the assertion should never fail (given that Python dictionaries are
> ordered these days). It seems to me that should probably be true
> regardless of any 'strict mode' flag - I can't immediately think of
> any reason it wouldn't be.

Right. But in strict mode, the stronger assertion would hold:

    assert obj == json.loads(json.dumps(obj))

Also, the intermediate text would be RFC-compliant. If this cannot be
done, ValueError would be raised. (Or maybe TypeError in some cases.)

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote:
> I think that even in non-strict mode, round-tripping should be
> achieved after one iteration. That is to say, anything you can
> JSON-encode will JSON-decode to something that would create the same
> encoded form. Not sure if there's anything that would violate that
> (weak) guarantee.

I think what you're saying is, if we do:

    json1 = json.dumps(foo)
    json2 = json.dumps(json.loads(json1))
    assert json1 == json2

the assertion should never fail (given that Python dictionaries are
ordered these days). It seems to me that should probably be true
regardless of any 'strict mode' flag - I can't immediately think of
any reason it wouldn't be.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On 2020-07-06, Frank Millman wrote:
> On 2020-07-06 3:08 PM, Jon Ribbens via Python-list wrote:
>> On 2020-07-06, Frank Millman wrote:
>>> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>>>> While I agree entirely with your point, there is however perhaps room
>>>> for a bit more helpfulness from the json module. There is no sensible
>>>> reason I can think of that it refuses to serialize sets, for example.
>>>> Going a bit further and, for example, automatically calling isoformat()
>>>> on date/time/datetime objects would perhaps be a bit more controversial,
>>>> but would frequently be useful, and there's no obvious downside that
>>>> occurs to me.
>>>
>>> I may be missing something, but that would cause a downside for me.
>>>
>>> I store Python lists and dicts in a database by calling dumps() when
>>> saving them to the database and loads() when retrieving them.
>>>
>>> If a date was 'dumped' using isoformat(), then on retrieval I would not
>>> know whether it was originally a string, which must remain as is, or was
>>> originally a date object, which must be converted back to a date object.
>>>
>>> There is no perfect answer, but my solution works fairly well. When
>>> dumping, I use 'default=repr'. This means that dates get dumped as
>>> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to
>>> detect that it is actually a date object.
>>
>> There is no difference whatsoever between matching on the repr output
>> you show above and matching on ISO-8601 datetimes, except that at least
>> ISO-8601 is an actual standard. So no, you haven't found a downside.
>
> I don't understand. As you say, ISO-8601 is a standard, so the original
> object could well have been a string in that format. So how do you
> distinguish between an object that started out as a string, and an
> object that started out as a date/datetime object?

With your method, how do you distinguish between an object that started
out as a string, and an object that started out as a date/datetime
object?

The answer with both my method and your method is that you cannot - and
therefore my method is not a "downside" compared to yours.

Not to mention, I am not suggesting that your method should be
disallowed if you want to continue using it - I am suggesting that your
code could be simplified and your job made easier by my suggested
improvement.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 11:39 PM Adam Funk wrote:
>
> Aha, I think the default=repr option is probably just what I need;
> maybe (at least in the testing stages) something like this:
>
>     try:
>         with open(output_file, 'w') as f:
>             json.dump(data, f)
>     except TypeError:
>         print('unexpected item in the bagging area!')
>         with open(output_file, 'w') as f:
>             json.dump(data, f, default=repr)
>
> and then I'd know when I need to go digging through the output for
> bytes, sets, etc., but at least I'd have the output to examine.

Easier:

    def proclaimed_repr():
        seen = False
        def show_obj(obj):
            nonlocal seen
            if not seen:
                seen = True
                print("unexpected item in the bagging area!")
            return repr(obj)
        return show_obj

    json.dump(data, f, default=proclaimed_repr())

If you don't care about "resetting" the marker, you can just use a
global or a default-arg hack:

    def show_obj(obj, seen=[]):
        if not seen:
            seen.append(True)
            print("unexpected item in the bagging area!")
        return repr(obj)

    json.dump(data, f, default=show_obj)

Either way, you can stick this function off in a utilities collection,
and then use it without fiddling with try/except.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote: > On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list > wrote: >> >> On 2020-07-06, Chris Angelico wrote: >> > On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote: >> >> Is there a "bulletproof" version of json.dump somewhere that will >> >> convert bytes to str, any other iterables to list, etc., so you can >> >> just get your data into a file & keep working? >> > >> > That's the PHP definition of "bulletproof" - whatever happens, no >> > matter how bad, just keep right on going. >> >> While I agree entirely with your point, there is however perhaps room >> for a bit more helpfulness from the json module. There is no sensible >> reason I can think of that it refuses to serialize sets, for example. > > Sets don't exist in JSON. I think that's a sensible reason. I don't agree. Tuples & lists don't exist separately in JSON, but both are serializable (to the same thing). Non-string keys aren't allowed in JSON, but it silently converts numbers to strings instead of barfing. Typically, I've been using sets to deduplicate values as I go along, & having to walk through the whole object changing them to lists before serialization strikes me as the kind of pointless labor that I expect when I'm using Java. ;-) >> Going a bit further and, for example, automatically calling isoformat() >> on date/time/datetime objects would perhaps be a bit more controversial, >> but would frequently be useful, and there's no obvious downside that >> occurs to me. > > They wouldn't round-trip without some way of knowing which strings > represent date/times. If you just want a one-way output format, it's > not too hard to subclass the encoder - there's an example right there > in the docs (showing how to create a representation for complex > numbers). The vanilla JSON encoder shouldn't do any of this. In fact, > just supporting infinities and nans is fairly controversial - see > other threads happening right now. 
> > Maybe what people want is a pretty printer instead? > > https://docs.python.org/3/library/pprint.html > > Resilient against recursive data structures, able to emit Python-like > code for many formats, is as readable as JSON, and is often > round-trippable. It lacks JSON's interoperability, but if you're > trying to serialize sets and datetimes, you're forfeiting that anyway. > > ChrisA -- "It is the role of librarians to keep government running in difficult times," replied Dramoren. "Librarians are the last line of defence against chaos." (McMullen 2001) -- https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 11:31 PM Jon Ribbens via Python-list wrote: > > On 2020-07-06, Chris Angelico wrote: > > On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list > > wrote: > >> The 'json' module already fails to provide round-trip functionality: > >> > >> >>> for data in ({True: 1}, {1: 2}, (1, 2)): > >> ... if json.loads(json.dumps(data)) != data: > >> ... print('oops', data, json.loads(json.dumps(data))) > >> ... > >> oops {True: 1} {'true': 1} > >> oops {1: 2} {'1': 2} > >> oops (1, 2) [1, 2] > > > > There's a fundamental limitation of JSON in that it requires string > > keys, so this is an obvious transformation. I suppose you could call > > that one a bug too, but it's very useful and not too dangerous. (And > > then there's the tuple-to-list transformation, which I think probably > > shouldn't happen, although I don't think that's likely to cause issues > > either.) > > That's my point though - there's almost no difference between allowing > encoding of tuples and allowing encoding of sets. Any argument against > the latter would also apply against the former. The only possible excuse > for the difference is "historical reasons", and given that it would be > useful to allow it, and there would be no negative consequences, this > hardly seems sufficient. > > >> No. I want a JSON encoder to output JSON to be read by a JSON decoder. > > > > Does it need to round-trip, though? If you stringify your datetimes, > > you can't decode it reliably any more. What's the purpose here? > > It doesn't need to round trip (which as mentioned above is fortunate > because the existing module already doesn't round trip). The main use > I have, and I should imagine the main use anyone has, for JSON is > interoperability - to safely store and send data in a format in which > it can be read by non-Python code. If you need, say, date/times to > be understood as date/times by the receiving code they'll have to > deal with that explicitly already. 
> Improving Python to allow sending them at least gets us part way
> there by eliminating half the work.

That's fair. Maybe what we need is to fork out the default JSON
encoder into two, or have a "strict=True" or "strict=False" flag. In
non-strict mode, round-tripping is not guaranteed, and various types
will be folded to each other - mainly, many built-in and stdlib types
will be represented in strings. In strict mode, compliance with the
RFC is ensured (so ValueError will be raised on inf/nan), and
everything should round-trip safely.

I think that even in non-strict mode, round-tripping should be
achieved after one iteration. That is to say, anything you can
JSON-encode will JSON-decode to something that would create the same
encoded form. Not sure if there's anything that would violate that
(weak) guarantee.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
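The "weak guarantee" of round-tripping after one iteration can be checked mechanically. Using `default=repr` here as a stand-in for the proposed non-strict mode (the actual flag does not exist), one encode/decode cycle reaches a fixed point:

```python
import json

obj = {"when": b"2020-07-06", "pair": (1, 2)}  # bytes and tuple: not JSON types

json1 = json.dumps(obj, default=repr)          # lossy first encoding
json2 = json.dumps(json.loads(json1))          # re-encode the decoded form

assert json1 == json2                          # fixed point after one iteration
print(json1)
```

The first pass folds bytes to a repr string and the tuple to a list; after that, every value is a native JSON type, so further cycles change nothing.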
Re: Bulletproof json.dump?
On 2020-07-06, Frank Millman wrote: > On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote: >> On 2020-07-06, Chris Angelico wrote: >>> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote: Is there a "bulletproof" version of json.dump somewhere that will convert bytes to str, any other iterables to list, etc., so you can just get your data into a file & keep working? >>> >>> That's the PHP definition of "bulletproof" - whatever happens, no >>> matter how bad, just keep right on going. >> >> While I agree entirely with your point, there is however perhaps room >> for a bit more helpfulness from the json module. There is no sensible >> reason I can think of that it refuses to serialize sets, for example. >> Going a bit further and, for example, automatically calling isoformat() >> on date/time/datetime objects would perhaps be a bit more controversial, >> but would frequently be useful, and there's no obvious downside that >> occurs to me. >> > > I may be missing something, but that would cause a downside for me. > > I store Python lists and dicts in a database by calling dumps() when > saving them to the database and loads() when retrieving them. > > If a date was 'dumped' using isoformat(), then on retrieval I would not > know whether it was originally a string, which must remain as is, or was > originally a date object, which must be converted back to a date object. > > There is no perfect answer, but my solution works fairly well. When > dumping, I use 'default=repr'. This means that dates get dumped as > 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to > detect that it is actually a date object. > > I use the same trick for Decimal objects. > > Maybe the OP could do something similar. 
Aha, I think the default=repr option is probably just what I need;
maybe (at least in the testing stages) something like this:

    try:
        with open(output_file, 'w') as f:
            json.dump(data, f)
    except TypeError:
        print('unexpected item in the bagging area!')
        with open(output_file, 'w') as f:
            json.dump(data, f, default=repr)

and then I'd know when I need to go digging through the output for
bytes, sets, etc., but at least I'd have the output to examine.

--
Well, we had a lot of luck on Venus
We always had a ball on Mars
--
https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote: > On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote: >> >> Hi, >> >> I have a program that does a lot of work with URLs and requests, >> collecting data over about an hour, & then writing the collated data >> to a JSON file. The first time I ran it, the json.dump failed because >> there was a bytes value instead of a str, so I had to figure out where >> that was coming from before I could get any data out. I've previously >> run into the problem of collecting values in sets (for deduplication) >> & forgetting to walk through the big data object changing them to >> lists before serializing. >> >> Is there a "bulletproof" version of json.dump somewhere that will >> convert bytes to str, any other iterables to list, etc., so you can >> just get your data into a file & keep working? >> > > That's the PHP definition of "bulletproof" - whatever happens, no > matter how bad, just keep right on going. If you really want some way Well played! > to write "just anything" to your file, I recommend not using JSON - > instead, write out the repr of your data structure. That'll give a > decent result for bytes, str, all forms of numbers, and pretty much > any collection, and it won't break if given something that can't > safely be represented. Interesting point. At least the TypeError message does say what the unacceptable type is ("Object of type set is not JSON serializable"). -- "It is the role of librarians to keep government running in difficult times," replied Dramoren. "Librarians are the last line of defence against chaos." (McMullen 2001) -- https://mail.python.org/mailman/listinfo/python-list
Re: Bulletproof json.dump?
On 2020-07-06 3:08 PM, Jon Ribbens via Python-list wrote:
> On 2020-07-06, Frank Millman wrote:
>> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>>> While I agree entirely with your point, there is however perhaps room
>>> for a bit more helpfulness from the json module. There is no sensible
>>> reason I can think of that it refuses to serialize sets, for example.
>>> Going a bit further and, for example, automatically calling isoformat()
>>> on date/time/datetime objects would perhaps be a bit more controversial,
>>> but would frequently be useful, and there's no obvious downside that
>>> occurs to me.
>>
>> I may be missing something, but that would cause a downside for me.
>>
>> I store Python lists and dicts in a database by calling dumps() when
>> saving them to the database and loads() when retrieving them.
>>
>> If a date was 'dumped' using isoformat(), then on retrieval I would not
>> know whether it was originally a string, which must remain as is, or was
>> originally a date object, which must be converted back to a date object.
>>
>> There is no perfect answer, but my solution works fairly well. When
>> dumping, I use 'default=repr'. This means that dates get dumped as
>> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to
>> detect that it is actually a date object.
>
> There is no difference whatsoever between matching on the repr output
> you show above and matching on ISO-8601 datetimes, except that at least
> ISO-8601 is an actual standard. So no, you haven't found a downside.

I don't understand. As you say, ISO-8601 is a standard, so the original
object could well have been a string in that format. So how do you
distinguish between an object that started out as a string, and an
object that started out as a date/datetime object?

Frank
--
https://mail.python.org/mailman/listinfo/python-list
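Frank's repr-pattern trick, written out as a runnable sketch. The regex and the `revive` hook are a reconstruction of the approach he describes, not his actual code:

```python
import datetime
import json
import re

# Strings produced by default=repr for date objects look like
# "datetime.date(2020, 7, 6)"; detect that pattern on retrieval.
DATE_RE = re.compile(r"^datetime\.date\((\d+), (\d+), (\d+)\)$")


def revive(obj):
    # object_hook: called for every decoded JSON object (dict).
    for key, value in obj.items():
        if isinstance(value, str):
            m = DATE_RE.match(value)
            if m:
                obj[key] = datetime.date(*map(int, m.groups()))
    return obj


dumped = json.dumps({"due": datetime.date(2020, 7, 6)}, default=repr)
restored = json.loads(dumped, object_hook=revive)
print(restored)  # {'due': datetime.date(2020, 7, 6)}
```

As the exchange above notes, a string that happened to contain `datetime.date(2020, 7, 6)` verbatim would be revived as a date too; the scheme works only because such strings are unlikely in practice.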
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote: > On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list > wrote: >> The 'json' module already fails to provide round-trip functionality: >> >> >>> for data in ({True: 1}, {1: 2}, (1, 2)): >> ... if json.loads(json.dumps(data)) != data: >> ... print('oops', data, json.loads(json.dumps(data))) >> ... >> oops {True: 1} {'true': 1} >> oops {1: 2} {'1': 2} >> oops (1, 2) [1, 2] > > There's a fundamental limitation of JSON in that it requires string > keys, so this is an obvious transformation. I suppose you could call > that one a bug too, but it's very useful and not too dangerous. (And > then there's the tuple-to-list transformation, which I think probably > shouldn't happen, although I don't think that's likely to cause issues > either.) That's my point though - there's almost no difference between allowing encoding of tuples and allowing encoding of sets. Any argument against the latter would also apply against the former. The only possible excuse for the difference is "historical reasons", and given that it would be useful to allow it, and there would be no negative consequences, this hardly seems sufficient. >> No. I want a JSON encoder to output JSON to be read by a JSON decoder. > > Does it need to round-trip, though? If you stringify your datetimes, > you can't decode it reliably any more. What's the purpose here? It doesn't need to round trip (which as mentioned above is fortunate because the existing module already doesn't round trip). The main use I have, and I should imagine the main use anyone has, for JSON is interoperability - to safely store and send data in a format in which it can be read by non-Python code. If you need, say, date/times to be understood as date/times by the receiving code they'll have to deal with that explicitly already. Improving Python to allow sending them at least gets us part way there by eliminating half the work. -- https://mail.python.org/mailman/listinfo/python-list
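The tuple/set asymmetry described above is easy to verify against the stdlib encoder:

```python
import json

# Tuples are silently encoded as JSON arrays...
assert json.dumps((1, 2)) == "[1, 2]"

# ...while sets, which have no more of a JSON equivalent than tuples do,
# raise TypeError.
try:
    json.dumps({1, 2})
except TypeError as exc:
    print(exc)  # Object of type set is not JSON serializable
```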
Re: Bulletproof json.dump?
On 2020-07-06, J. Pic wrote:
> Well I made a suggestion on python-ideas and a PyPi lib came out of
> it, but since you can't patch a lot of internal types it's not so
> useful.
>
> Feel free to try it out:
>
> https://yourlabs.io/oss/jsonlight/

While I applaud your experimentation, that is not suitable for any
purpose. You would probably do better by starting off subclassing
json.JSONEncoder.
Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 11:06 PM Jon Ribbens via Python-list wrote:
>
> On 2020-07-06, Chris Angelico wrote:
> > On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list wrote:
> >> While I agree entirely with your point, there is however perhaps
> >> room for a bit more helpfulness from the json module. There is no
> >> sensible reason I can think of that it refuses to serialize sets,
> >> for example.
> >
> > Sets don't exist in JSON. I think that's a sensible reason.
>
> It is not. Tuples don't exist either, and yet they're supported.

Hmm, I didn't know that. Possibly it's as much a bug as the inf/nan
issue.

> >> Going a bit further and, for example, automatically calling
> >> isoformat() on date/time/datetime objects would perhaps be a bit
> >> more controversial, but would frequently be useful, and there's no
> >> obvious downside that occurs to me.
> >
> > They wouldn't round-trip without some way of knowing which strings
> > represent date/times.
>
> The 'json' module already fails to provide round-trip functionality:
>
> >>> for data in ({True: 1}, {1: 2}, (1, 2)):
> ...     if json.loads(json.dumps(data)) != data:
> ...         print('oops', data, json.loads(json.dumps(data)))
> ...
> oops {True: 1} {'true': 1}
> oops {1: 2} {'1': 2}
> oops (1, 2) [1, 2]

There's a fundamental limitation of JSON in that it requires string
keys, so this is an obvious transformation. I suppose you could call
that one a bug too, but it's very useful and not too dangerous. (And
then there's the tuple-to-list transformation, which I think probably
shouldn't happen, although I don't think that's likely to cause issues
either.)

> > Maybe what people want is a pretty printer instead?
>
> No. I want a JSON encoder to output JSON to be read by a JSON decoder.

Does it need to round-trip, though? If you stringify your datetimes,
you can't decode it reliably any more. What's the purpose here?

ChrisA
Re: Bulletproof json.dump?
On 2020-07-06, Frank Millman wrote:
> On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>
> I may be missing something, but that would cause a downside for me.
>
> I store Python lists and dicts in a database by calling dumps() when
> saving them to the database and loads() when retrieving them.
>
> If a date was 'dumped' using isoformat(), then on retrieval I would not
> know whether it was originally a string, which must remain as is, or was
> originally a date object, which must be converted back to a date object.
>
> There is no perfect answer, but my solution works fairly well. When
> dumping, I use 'default=repr'. This means that dates get dumped as
> 'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to
> detect that it is actually a date object.

There is no difference whatsoever between matching on the repr output
you show above and matching on ISO-8601 datetimes, except that at least
ISO-8601 is an actual standard. So no, you haven't found a downside.
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote:
> On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list wrote:
>> While I agree entirely with your point, there is however perhaps room
>> for a bit more helpfulness from the json module. There is no sensible
>> reason I can think of that it refuses to serialize sets, for example.
>
> Sets don't exist in JSON. I think that's a sensible reason.

It is not. Tuples don't exist either, and yet they're supported.

>> Going a bit further and, for example, automatically calling isoformat()
>> on date/time/datetime objects would perhaps be a bit more controversial,
>> but would frequently be useful, and there's no obvious downside that
>> occurs to me.
>
> They wouldn't round-trip without some way of knowing which strings
> represent date/times.

The 'json' module already fails to provide round-trip functionality:

>>> for data in ({True: 1}, {1: 2}, (1, 2)):
...     if json.loads(json.dumps(data)) != data:
...         print('oops', data, json.loads(json.dumps(data)))
...
oops {True: 1} {'true': 1}
oops {1: 2} {'1': 2}
oops (1, 2) [1, 2]

> Maybe what people want is a pretty printer instead?

No. I want a JSON encoder to output JSON to be read by a JSON decoder.
Re: Bulletproof json.dump?
Well I made a suggestion on python-ideas and a PyPi lib came out of it,
but since you can't patch a lot of internal types it's not so useful.

Feel free to try it out:

https://yourlabs.io/oss/jsonlight/
Re: Bulletproof json.dump?
On 2020-07-06 2:06 PM, Jon Ribbens via Python-list wrote:
> On 2020-07-06, Chris Angelico wrote:
>> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote:
>>> Is there a "bulletproof" version of json.dump somewhere that will
>>> convert bytes to str, any other iterables to list, etc., so you can
>>> just get your data into a file & keep working?
>>
>> That's the PHP definition of "bulletproof" - whatever happens, no
>> matter how bad, just keep right on going.
>
> While I agree entirely with your point, there is however perhaps room
> for a bit more helpfulness from the json module. There is no sensible
> reason I can think of that it refuses to serialize sets, for example.
> Going a bit further and, for example, automatically calling isoformat()
> on date/time/datetime objects would perhaps be a bit more controversial,
> but would frequently be useful, and there's no obvious downside that
> occurs to me.

I may be missing something, but that would cause a downside for me.

I store Python lists and dicts in a database by calling dumps() when
saving them to the database and loads() when retrieving them.

If a date was 'dumped' using isoformat(), then on retrieval I would not
know whether it was originally a string, which must remain as is, or was
originally a date object, which must be converted back to a date object.

There is no perfect answer, but my solution works fairly well. When
dumping, I use 'default=repr'. This means that dates get dumped as
'datetime.date(2020, 7, 6)'. I look for that pattern on retrieval to
detect that it is actually a date object.

I use the same trick for Decimal objects.

Maybe the OP could do something similar.

Frank Millman
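Frank's repr-based round trip can be sketched roughly like this (the regex and the `revive` helper are illustrative guesses at the approach, not his actual code):

```python
import datetime
import json
import re

# Dump with default=repr: anything json can't serialize natively is
# replaced by its repr string, e.g. 'datetime.date(2020, 7, 6)'.
data = {"start": datetime.date(2020, 7, 6), "note": "hello"}
dumped = json.dumps(data, default=repr)

# On retrieval, look for the datetime.date(...) pattern and rebuild
# the original object; ordinary strings pass through untouched.
DATE_RE = re.compile(r"^datetime\.date\((\d+), (\d+), (\d+)\)$")

def revive(value):
    if isinstance(value, str):
        m = DATE_RE.match(value)
        if m:
            return datetime.date(*map(int, m.groups()))
    return value

loaded = {k: revive(v) for k, v in json.loads(dumped).items()}
print(loaded)  # the date comes back as a date object
```

As Jon points out in his reply, this pattern-matching has exactly the same ambiguity as matching ISO-8601 strings: a string that happened to look like a repr would be "revived" too.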
Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 10:11 PM Jon Ribbens via Python-list wrote:
>
> On 2020-07-06, Chris Angelico wrote:
> > On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote:
> >> Is there a "bulletproof" version of json.dump somewhere that will
> >> convert bytes to str, any other iterables to list, etc., so you can
> >> just get your data into a file & keep working?
> >
> > That's the PHP definition of "bulletproof" - whatever happens, no
> > matter how bad, just keep right on going.
>
> While I agree entirely with your point, there is however perhaps room
> for a bit more helpfulness from the json module. There is no sensible
> reason I can think of that it refuses to serialize sets, for example.

Sets don't exist in JSON. I think that's a sensible reason.

> Going a bit further and, for example, automatically calling isoformat()
> on date/time/datetime objects would perhaps be a bit more controversial,
> but would frequently be useful, and there's no obvious downside that
> occurs to me.

They wouldn't round-trip without some way of knowing which strings
represent date/times. If you just want a one-way output format, it's
not too hard to subclass the encoder - there's an example right there
in the docs (showing how to create a representation for complex
numbers). The vanilla JSON encoder shouldn't do any of this. In fact,
just supporting infinities and nans is fairly controversial - see other
threads happening right now.

Maybe what people want is a pretty printer instead?

https://docs.python.org/3/library/pprint.html

Resilient against recursive data structures, able to emit Python-like
code for many formats, is as readable as JSON, and is often
round-trippable. It lacks JSON's interoperability, but if you're trying
to serialize sets and datetimes, you're forfeiting that anyway.

ChrisA
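The encoder subclass suggested above might look like this one-way sketch (the class name and the choice to sort sets are illustrative, not from the docs example):

```python
import datetime
import json

class ExtendedEncoder(json.JSONEncoder):
    """One-way encoder: sets become lists, dates become ISO strings.
    Note the output does NOT round-trip - loads() gives lists and
    plain strings back, which is the whole objection in this thread."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            return sorted(o)  # sorted for deterministic output
        if isinstance(o, (datetime.date, datetime.datetime)):
            return o.isoformat()
        return super().default(o)  # fall through to normal TypeError

print(json.dumps({"tags": {"b", "a"}, "when": datetime.date(2020, 7, 6)},
                 cls=ExtendedEncoder, sort_keys=True))
# {"tags": ["a", "b"], "when": "2020-07-06"}
```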
Re: Bulletproof json.dump?
On 2020-07-06, Chris Angelico wrote:
> On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote:
>> Is there a "bulletproof" version of json.dump somewhere that will
>> convert bytes to str, any other iterables to list, etc., so you can
>> just get your data into a file & keep working?
>
> That's the PHP definition of "bulletproof" - whatever happens, no
> matter how bad, just keep right on going.

While I agree entirely with your point, there is however perhaps room
for a bit more helpfulness from the json module. There is no sensible
reason I can think of that it refuses to serialize sets, for example.
Going a bit further and, for example, automatically calling isoformat()
on date/time/datetime objects would perhaps be a bit more controversial,
but would frequently be useful, and there's no obvious downside that
occurs to me.
Re: Bulletproof json.dump?
On Mon, Jul 6, 2020 at 8:36 PM Adam Funk wrote:
>
> Hi,
>
> I have a program that does a lot of work with URLs and requests,
> collecting data over about an hour, & then writing the collated data
> to a JSON file. The first time I ran it, the json.dump failed because
> there was a bytes value instead of a str, so I had to figure out where
> that was coming from before I could get any data out. I've previously
> run into the problem of collecting values in sets (for deduplication)
> & forgetting to walk through the big data object changing them to
> lists before serializing.
>
> Is there a "bulletproof" version of json.dump somewhere that will
> convert bytes to str, any other iterables to list, etc., so you can
> just get your data into a file & keep working?

That's the PHP definition of "bulletproof" - whatever happens, no
matter how bad, just keep right on going.

If you really want some way to write "just anything" to your file, I
recommend not using JSON - instead, write out the repr of your data
structure. That'll give a decent result for bytes, str, all forms of
numbers, and pretty much any collection, and it won't break if given
something that can't safely be represented.

ChrisA
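The repr suggestion pairs naturally with ast.literal_eval for reading the data back, a detail not spelled out above; a sketch, suitable only when the file's contents are trusted to be your own output:

```python
import ast

# bytes, sets, and tuples - the types json.dump chokes on or mangles -
# all survive a repr round trip unchanged.
data = {"ids": {1, 2, 3}, "blob": b"\x00\x01", "pair": (1, 2)}

text = repr(data)  # what you would write to the file

# literal_eval parses Python literals only (no function calls or
# attribute access), so unlike eval() it won't execute arbitrary code.
restored = ast.literal_eval(text)
assert restored == data
```

The caveat Chris notes still applies in reverse: objects whose repr is not a valid literal (e.g. a datetime) will serialize fine but fail at literal_eval time.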
Bulletproof json.dump?
Hi,

I have a program that does a lot of work with URLs and requests,
collecting data over about an hour, & then writing the collated data
to a JSON file. The first time I ran it, the json.dump failed because
there was a bytes value instead of a str, so I had to figure out where
that was coming from before I could get any data out. I've previously
run into the problem of collecting values in sets (for deduplication)
& forgetting to walk through the big data object changing them to
lists before serializing.

Is there a "bulletproof" version of json.dump somewhere that will
convert bytes to str, any other iterables to list, etc., so you can
just get your data into a file & keep working?

(I'm using Python 3.7.)

Thanks!

--
Slade was the coolest band in England. They were the kind of guys
that would push your car out of a ditch. ---Alice Cooper
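One answer that comes up later in the thread is a `default=` hook; a minimal sketch of what the question asks for, with the caveat that the UTF-8 decode is an assumption about the data (the thread discusses why silently guessing an encoding can bite):

```python
import json

def forgiving_default(o):
    """Best-effort, lossy conversion of common non-JSON types."""
    if isinstance(o, bytes):
        return o.decode("utf-8", errors="replace")  # assumes UTF-8-ish data
    if isinstance(o, (set, frozenset)):
        return list(o)
    try:
        return list(o)  # any other iterable (generators, etc.)
    except TypeError:
        return repr(o)  # last resort: at least get *something* into the file

data = {"url": b"https://example.com", "seen": {"a"}}
print(json.dumps(data, default=forgiving_default))
```

Note that the output no longer round-trips: loads() will hand back plain strings and lists, which is exactly the trade-off debated in the replies.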