[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2021-07-10 Thread Christian Tanzer

Christian Tanzer  added the comment:

Json keys *are strings*. 

That‘s why json.dump stringifies all keys. If you want to argue that this 
behavior is wrong I wouldn’t protest except for that it breaks extant code.

But arguing that sorting the stringified keys would violate user’s expectations 
or lead to problems down the line makes no sense. The user is asking for an 
object with string keys and they want the keys sorted. That is unambiguous and 
well defined.

Neither does adding a second argument make any sense, it would just increase 
the confusion.

My problem was that Python 3.x threw an exception about this for a complex json 
object in a context where it was not at all obvious what was going on. And the 
code in question had worked for years in Python 2.

This bug report is many, many years old and I don’t much care one way or 
another but I am very sad that 

 Practicality beats purity 

got utterly lost in and after the transition to Python 3.

Christian Tanzer

> On 10.07.2021, at 16:12, Andrei Kulakov  wrote:
> 
> Andrei Kulakov  added the comment:
> 
> Some observations:
> 
> - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too 
> much to ask users to do this type of preprocessing manually before dumping to 
> json.
> 
> - the error doesn't seem too onerous to me. 'unorderable types: str() < 
> int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, 
> and then the issue is obvious.
> 
> - to me, current behaviour seems preferable to silently guessing that users 
> wants stringified sorting, especially since it can surface as a problem way 
> down the line.
> 
> - what makes this issue interesting is that in roughly half of cases (I'm 
> guessing) the user will want object sorted and then cast to string and would 
> be surprised if the reverse happened, and in the other half cases the user 
> would want them stringified, then sorted, and would be surprised if that 
> didn't happen.
> 
> It depends on the perspective: you may think of the json as a representation 
> of a dict of objects, that just happen to be in json format; or you can think 
> of it as a json document with string keys (of course) that just happen to 
> come from a dict of objects. Both can be valid depending on the use case.
> 
> Given all of this, I propose keeping current behavior for the existing arg, 
> and adding another arg for 'stringify then sort' behavior. Then we'll have no 
> silent guessing and the "unorderable" type error message can point the user 
> directly to the new argument.
> 
> If the user reads the docs before using this method, they will see two clear 
> options with respective tradeoffs and decide which one to use.
> 
> So either by reading the docs or running into the error, the user will have a 
> clear explanation and a clear and convenient solution.
> 
> --
> nosy: +andrei.avk
> 
> ___
> Python tracker 
> 
> ___

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2021-07-10 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

Some observations:

 - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too 
much to ask users to do this type of preprocessing manually before dumping to 
json.

 - the error doesn't seem too onerous to me. 'unorderable types: str() < int()' 
If uncertain, a user can go to interactive shell and try `1 < "2"`, and then 
the issue is obvious.

 - to me, current behaviour seems preferable to silently guessing that users 
wants stringified sorting, especially since it can surface as a problem way 
down the line.

 - what makes this issue interesting is that in roughly half of cases (I'm 
guessing) the user will want object sorted and then cast to string and would be 
surprised if the reverse happened, and in the other half cases the user would 
want them stringified, then sorted, and would be surprised if that didn't 
happen.

It depends on the perspective: you may think of the json as a representation of 
a dict of objects, that just happen to be in json format; or you can think of 
it as a json document with string keys (of course) that just happen to come 
from a dict of objects. Both can be valid depending on the use case.

Given all of this, I propose keeping current behavior for the existing arg, and 
adding another arg for 'stringify then sort' behavior. Then we'll have no 
silent guessing and the "unorderable" type error message can point the user 
directly to the new argument.

If the user reads the docs before using this method, they will see two clear 
options with respective tradeoffs and decide which one to use.

So either by reading the docs or running into the error, the user will have a 
clear explanation and a clear and convenient solution.

--
nosy: +andrei.avk

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2021-01-07 Thread naught101


naught101  added the comment:

I want to do something like this:

hashlib.md5(json.dumps(d, sort_keys=True))

So I can check if a dict's contents are the same as a DB version, but I can't 
guarantee that all the keys are strings, so it breaks, annoyingly. I would very 
much like the apply-default-function-then-sort approach. Until then, my 
work-around is this:

def deep_stringize_dict_keys(item):
"""Converts all keys to strings in a nested dictionary"""
if isinstance(item, dict):
return {str(k): deep_stringize_dict_keys(v) for k, v in 
item.items()}

if isinstance(item, list):
# This will check only ONE layer deep for nested dictionaries 
inside lists.
# If you need deeper than that, you're probably doing something 
stupid.
if any(isinstance(v, dict) for v in item):
return [deep_stringize_dict_keys(v) if isinstance(v, dict) else 
v
for v in item]

# I don't care about tuples, since they don't exist in JSON

return item

Maybe it can be of some use for others.

--
nosy: +naught101

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2019-09-07 Thread Roundup Robot


Change by Roundup Robot :


--
pull_requests: +15382
pull_request: https://github.com/python/cpython/pull/15691

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-06-29 Thread James Edwards


Change by James Edwards :


--
keywords: +patch
pull_requests: +7617
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-06-29 Thread James Edwards


James Edwards  added the comment:

This came up in a StackOverflow question[1] today, so I took a stab at 
addressing the error.  The changes don't restore the 2.x behavior, but just do 
as R. David Murray suggested and coerce the keys to strings prior to sorting to 
prevent the error.

The changes in _json.c and json.decoder are handled slightly differently in the 
case of skipkeys.

Both create a list of (coerced_key, value) pairs, sorts it (when specified), 
and uses that in place of the PyDict_Items / .items().

When skipkeys=True and invalid (uncoercible) keys are found, the c code will 
just not append that item to the coerced_items list while the python code uses 
None to signal that item should be filtered out.

(That being said, I'm not a huge fan of the approach I used in the Python code 
and may rewrite using .append instead of a generator.

The c code could definitely use a review when it comes to reference counts.

Fork commit: 
https://github.com/jheiv/cpython/commit/8d3612f56a137da0d26b83d00507ff2f11bca9bb

[1] 
https://stackoverflow.com/questions/51093268/why-am-i-getting-typeerror-unorderable-types-str-int-in-this-code

--
nosy: +jedwards

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-21 Thread R. David Murray

R. David Murray  added the comment:

json keys *are* strings, so the fact that we support other object types as keys 
and coerce them to strings is an "extra feature" of python, and is actually a 
somewhat questionable feature.  The reproducible use case is solved by the fact 
that dicts are now ordered, with no extra work on the part of the programmer.  
Likewise, if you want custom sorting you can ensure your dict is ordered the 
way you want it to be, as you indicate.  The remaining use case for sort_keys, 
then (and one for which it is *commonly* used) is sorting the keys 
lexicographically, and for that, sorting the coereced strings is correct per 
the json standard (in which all keys are required to be strings).

Note that sort_keys will not be removed for backward compatibility reasons, so 
the only question is whether or not the increased functionality of coercing 
first is worth the trouble to implement.  I'm actually only +0 on it, since I 
don't consider it good practice to json-ize dicts that have non-string keys.  
The reason I'm + is because it would increase backward compatibility with 
python2 (not the ordering of the output, we can't make that work, but in the 
fact that it would no longer raise an error in python3).

We'll see if other core developers agree or disagree.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-21 Thread Aaron Hall

Aaron Hall  added the comment:

>From a design standpoint, I'm fairly certain the sort_keys argument was 
>created due to Python's dicts being arbitrarily ordered.

Coercing to strings before sorting is unsatisfactory because, e.g. numbers sort 
lexicographically instead of by numeric value when strings.

>>> import json
>>> json.dumps({i:i**2 for i in range(15)}, sort_keys=True)
'{"0": 0, "1": 1, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, 
"9": 81, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196}'
>>> json.dumps({str(i):i**2 for i in range(15)}, sort_keys=True)
'{"0": 0, "1": 1, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196, "2": 
4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81}'

Changing the order of operations is just going to create more issues, IMHO.

Now that users can sort their dicts prior to providing them to the function, 
e.g.:

>>> json.dumps({str(i):i**2 for i in range(15)})
'{"0": 0, "1": 1, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, 
"9": 81, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196}'

we could deprecate the argument, or just keep it as-is for hysterical raisins.

Regardless, I'd close this as "won't fix".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-21 Thread R. David Murray

R. David Murray  added the comment:

I'm fairly certain (though not 100%, obviously :) that coercing first and then 
sorting would be accepted if someone wants to create a PR for this.

--
nosy: +r.david.murray
versions: +Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-21 Thread Christian Tanzer

Christian Tanzer  added the comment:

Aaron Hall wrote at Sun, 20 May 2018 16:49:06 +:

> Now that dicts are sortable, does that make the sort_keys argument redundant?
>
> Should this bug be changed to "won't fix"?

https://bugs.python.org/issue25457#msg317216 is as good an answer as I
could give.

Considering that I openend the bug more than 2.5 years ago, it doesn't
really matter though.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-21 Thread zachrahan

zachrahan  added the comment:

Well, "wontfix" would be appropriate in the context of deprecating the 
sort_keys option (over the course of however many releases) and documenting 
that the new procedure for getting JSON output in a specific order is to ensure 
that the input dict was created in that order.

Certainly for regression testing, sort_keys is no longer needed, but that's not 
the only reason people are using that option. (It's certainly not why I use the 
option -- my use stems from sort_keys improving human readability of the JSON.)

But outside of deprecating sort_keys wholesale, it is still a bug that 
sort_keys=True can cause an error on input that would otherwise be valid for 
json.dump[s].

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2018-05-20 Thread Aaron Hall

Aaron Hall  added the comment:

Now that dicts are sortable, does that make the sort_keys argument redundant?

Should this bug be changed to "won't fix"?

--
nosy: +Aaron Hall

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2017-06-28 Thread zachrahan

zachrahan added the comment:

This one just bit me too. It seems that if JSON serialization accepts 
non-string dict keys, it should make sure to accept them in all circumstances. 
Currently, there is an error *only* with mixed-type dicts, *only* when 
sort_keys=True.

In addition, the error raised in such cases is especially unhelpful. Running 
the following:
json.dumps({3:1, 'foo':'bar'}, sort_keys=True)

produces a stack trace that terminates in a function defined in C, with this 
error:
TypeError: '<' not supported between instances of 'str' and 'int'

That error doesn't give non-experts very much to go on...!

The fix is reasonably simple: coerce dict keys to strings *before* trying to 
sort the keys, not after. The only fuss in making such a patch is that the 
behavior has to be fixed in both _json.c and in json/encode.py.

The only other consistent behavior would be to disallow non-string keys, but 
that behavior is at this point very well entrenched. So it only makes sense 
that encoding should be patched to not fail in corner cases.

--
nosy: +zachrahan
versions: +Python 3.7 -Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-23 Thread Christian Tanzer

Christian Tanzer added the comment:

Josh Rosenberg wrote at Fri, 23 Oct 2015 02:45:51 +:

> The Python 2 sort order is a result of the "arbitrary but consistent
> fallback comparison" (omitting details, it's comparing the names of
> the types), thus the "strange" sort order.

Thanks. I knew that.

> Python 3 (justifiably) said that incomparable types should be
> incomparable rather than silently behaving in non-intuitive ways,
> hiding errors.

"justifiably" is debatable. I consider the change ill-conveived.

Displaying a dictionary (or just its keys) in a readable, or just
reproducible, way is useful in many contexts. Python 3 broke that for
very little, INMNSHO, gain.

I consider "hiding errors" a myth, to say it politely!

> Python is being rather generous by allowing non-string keys, because
> the  JSON spec ( http://json.org/ ) only allows the keys ("names" in
> JSON parlance) to be strings. So you're already a bit in the weeds as
> far as compliant JSON goes if you have non-string keys.

There are two possibilities:

1) Accepting non-string keys is intended. Then `sort_keys` shouldn't
   break like it does.

   As far as JSON goes, the output of `json.dump[s]` contains string keys.

2) Accepting non-string keys is a bug. Then `json.dump[s]` should be
   changed to not accept them.

Mixing both approaches is the worst of all worlds.

> Since mixed type keys lack meaningful sort order, I'm not sure it's
> wrong to reject attempts to sort them.

The documentation says:

If sort_keys is True (default False), then the output of dictionaries
will be sorted by key; this is useful for regression tests to ensure
that JSON serializations can be compared on a day-to-day basis.

**Reproducible** is the keyword here.

**Readability** is another one. Even if the sort order is "strange",
it is much better than random order, if you are looking for a specific
key.

For the record, it was a test failing under Python 3.5 that triggered
this bug report.

> > As all they keys are dumped as strings, a simple solution would be to
> > sort after converting to strings.
> Converting to string is as
> arbitrary and full of potential for silently incorrect comparisons as
> the Python 2 behavior, and reintroducing it seems like a bad idea.

json.dumps already does the conversion::

>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"})
'{"foo": "bar", "1": 42, "null": "nada"}'

Another run::

>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"})
'{"1": 42, "foo": "bar", "null": "nada"}'

That difference is exactly the reason for `sort_keys`.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-23 Thread Christian Tanzer

Christian Tanzer added the comment:

Josh Rosenberg wrote at Fri, 23 Oct 2015 02:56:30 +:

> As a workaround (should you absolutely need to sort keys by some
> arbitrary criteria), you can initialize a collections.OrderedDict from
> the sorted items of your original dict (using whatever key function
> you like), then dump without using sort_keys=True.

Sigh...

I already implemented a workaround but it's not as simple as you
think — the dictionary in question is nested.

The problem is that this is just another unnecessary difficulty when
trying to move to Python 3.x.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-22 Thread Christian Tanzer

New submission from Christian Tanzer:

In Python 3, trying to json-dump a dict with keys of different types fails with 
a TypeError when sort_keys is specified:

python2.7
===

Python 2.7.10 (default, May 29 2015, 10:02:30) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
'{"null": "nada", "1": 42, "foo": "bar"}'

python3.5


Python 3.5.0 (default, Oct  5 2015, 12:03:13) 
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib64/python3.5/json/__init__.py", line 237, in dumps
**kw).encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
TypeError: unorderable types: str() < int()

Note that the documentation explicitly allows keys of different, if basic, 
types:

  If skipkeys is True (default: False), then dict keys that are not of a basic 
type (str, int, float, bool, None) will be skipped instead of raising a 
TypeError.

As all they keys are dumped as strings, a simple solution would be to sort 
after converting to strings. Looking closely at the output of Python 2, the 
sort order is a bit strange!

--
components: Library (Lib)
messages: 253324
nosy: tan...@swing.co.at
priority: normal
severity: normal
status: open
title: json dump fails for mixed-type keys when sort_keys is specified
type: behavior
versions: Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-22 Thread Josh Rosenberg

Josh Rosenberg added the comment:

As a workaround (should you absolutely need to sort keys by some arbitrary 
criteria), you can initialize a collections.OrderedDict from the sorted items 
of your original dict (using whatever key function you like), then dump without 
using sort_keys=True. For example, your suggested behavior (treat all keys as 
str) could be achieved by the user by replacing:

json.dumps(mydict, sort_keys=True)

with:

json.dumps(collections.OrderedDict(sorted(mydict.items(), key=str)))

Particularly in 3.5 (where OrderedDict is a C builtin), this shouldn't incur 
too much additional overhead (`sort_keys` has to make a sorted list 
intermediate anyway), and the output is the same, without introducing implicit 
hiding of errors.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-22 Thread Josh Rosenberg

Josh Rosenberg added the comment:

The Python 2 sort order is a result of the "arbitrary but consistent fallback 
comparison" (omitting details, it's comparing the names of the types), thus the 
"strange" sort order. Python 3 (justifiably) said that incomparable types 
should be incomparable rather than silently behaving in non-intuitive ways, 
hiding errors.

Python is being rather generous by allowing non-string keys, because the  JSON 
spec ( http://json.org/ ) only allows the keys ("names" in JSON parlance) to be 
strings. So you're already a bit in the weeds as far as compliant JSON goes if 
you have non-string keys.

Since mixed type keys lack meaningful sort order, I'm not sure it's wrong to 
reject attempts to sort them. Converting to string is as arbitrary and full of 
potential for silently incorrect comparisons as the Python 2 behavior, and 
reintroducing it seems like a bad idea.

--
nosy: +josh.r

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25457] json dump fails for mixed-type keys when sort_keys is specified

2015-10-22 Thread Josh Rosenberg

Josh Rosenberg added the comment:

Oops, minor flaw with that. It's str-ifying the tuples, not the keys, which 
could (in some cases) cause issues with keys whose reprs have different 
quoting. So you'd end up with lambdas. Boo. Anyway, corrected version (which 
would probably not be one-lined in real code):

json.dumps(collections.OrderedDict(sorted(mydict.items(), key=lambda x: 
str(x[0]

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com