New submission from David Barnett:

Many sloppy JSON APIs return data with poorly encoded strings, either with 
single quotes as delimiters:
  {'foo': 'bar'}
or with no delimiters at all on keys:
  {foo: 'bar'}

The json library is useless for making sense of this data, because all it will 
tell you is "No JSON object could be decoded".
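
A quick way to see the failure is the check below. It is only a sketch: `is_strict_json` is a hypothetical helper name, not part of the json module, and the exact error text depends on the Python version, so the code catches ValueError rather than matching the message.

```python
import json


def is_strict_json(text):
    """Return True if json.loads accepts text, False if it rejects it."""
    try:
        json.loads(text)
    except ValueError:
        # json raises ValueError (JSONDecodeError on Python 3.5+)
        # for any input that is not strict JSON.
        return False
    return True


for sample in ('{"foo": "bar"}', "{'foo': 'bar'}", "{foo: 'bar'}"):
    print("%-18s accepted: %s" % (sample, is_strict_json(sample)))
```

Only the first sample, which uses double quotes on both the key and the value, is accepted.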

It would be incredibly helpful if the json library had some special non-strict 
decoding mode that could interpret these common misspellings of JSON strings. 
Or, more generally, it could accept another decoder hook to reformat the 
remaining string, something like:
  import ast
  import json
  import re
  import pyparsing

  # malformed_hook is the proposed json.loads argument; it does not exist yet.
  def malformed_hook(remaining, parent):
      if remaining.startswith("'"):
          # We know this is a string, regex it to find the end.
          m = re.match(pyparsing.quotedString.reString, remaining)
          if m is not None:
              # Use json.dumps to add quotes around string literal.
              # ast.literal_eval only evaluates literals, unlike eval.
              return json.dumps(ast.literal_eval(m.group(0))) + remaining[m.end():]
      # If we're inside an object, this could be a naked object key.
      if isinstance(parent, dict):
          m = re.match(r'([a-zA-Z_]\w*):', remaining)
          if m is not None:
              return json.dumps(m.group(1)) + remaining[len(m.group(1)):]
  print(json.loads("['foo', null, {a: 'b'}]", malformed_hook=malformed_hook))
  ['foo', None, {'a': 'b'}]

This would at least save you having to write a parser/tokenizer from scratch.
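
For comparison, here is roughly what the do-it-yourself workaround looks like today: rewrite the text into strict JSON, then hand it to json.loads. This is a minimal sketch under stated assumptions, not a proposed implementation. `relaxed_loads` and `_TOKEN` are hypothetical names, single-quoted strings are assumed to follow Python string-literal escaping, and a bare word is treated as a key only when a colon follows it.

```python
import ast
import json
import re

# Tokens, tried in order: a double-quoted string (left untouched), a
# single-quoted string, a bare identifier followed by a colon (an
# unquoted object key), or any other single character.
_TOKEN = re.compile(r"""
    (?P<dq>"(?:[^"\\]|\\.)*")
  | (?P<sq>'(?:[^'\\]|\\.)*')
  | (?P<key>[A-Za-z_]\w*)(?=\s*:)
  | (?P<other>.)
""", re.VERBOSE | re.DOTALL)


def relaxed_loads(text):
    """Hypothetical helper: normalize sloppy JSON, then decode it."""
    out = []
    for m in _TOKEN.finditer(text):
        tok = m.group()
        if m.lastgroup == "sq":
            # Assumption: the single-quoted string is a valid Python
            # literal, so literal_eval can decode its escapes safely.
            out.append(json.dumps(ast.literal_eval(tok)))
        elif m.lastgroup == "key":
            # Wrap the bare key in double quotes.
            out.append(json.dumps(tok))
        else:
            out.append(tok)
    return json.loads("".join(out))


print(relaxed_loads("['foo', null, {a: 'b'}]"))
```

Even this small version has to tokenize strings so that quotes and colons inside string values are left alone, which is the boilerplate a hook would avoid.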

----------
components: Library (Lib)
messages: 208430
nosy: mu_mind
priority: normal
severity: normal
status: open
title: json library needs a non-strict option to decode single-quoted strings
type: enhancement

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20298>
_______________________________________