Re: Relaxed, or best-efforts JSON parser for Python?

2015-10-12 Thread victor . hooi
On Monday, October 12, 2015 at 10:02:13 PM UTC+11, Laura Creighton wrote:
> In a message of Sun, 11 Oct 2015 17:56:33 -0700, Victor Hooi writes:
> >Hi,
> >
> >I'm attempting to parse MongoDB loglines.
> >
> >The formatting of these loglines could best be described as JSON-like...
> >
> >For example - arrays 
> >
> >Anyhow, say I had the following logline snippet:
> >
> >{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { 
> > w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { 
> > w: 1 } } }
> >
> >This won't parse with json.loads() - the main issue is the missing 
> >quotation marks (") around the strings.
> >
> >My question, is there a more lenient, or relaxed JSON parser available for 
> >Python, that will try to do a best-efforts parsing of non-spec JSON?
> >
> >Cheers,
> >Victor
> >-- 
> >https://mail.python.org/mailman/listinfo/python-list
> 
> Won't this 
> http://blog.mongodb.org/post/85123256973/introducing-mtools
> https://github.com/rueckstiess/mtools
> https://pypi.python.org/pypi/mtools/1.1.3
> 
> be better? :)

Hi,

@MRAB - Thanks for the tip. I did actually think of doing that as well - it's 
what we (MongoDB) do internally for a few of our tools, but was really hoping 
to avoid going down the regex route. However, this is what I'm doing for now:

locks = re.sub(r"(\w+):", r'"\g<1>":', locks)
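
For anyone following along, here is a minimal sketch of how that substitution can be combined with json.loads(). The helper name parse_locks and the shortened sample string are illustrative assumptions, not part of any MongoDB tool, and the pattern only handles keys made up of word characters.

```python
import json
import re

# Matches a bare key: a run of word characters immediately followed by a colon.
_BARE_KEY = re.compile(r"(\w+):")


def parse_locks(locks):
    """Quote bare keys, then parse the result as ordinary JSON.

    Hypothetical helper, for illustration only.
    """
    quoted = _BARE_KEY.sub(r'"\g<1>":', locks)
    return json.loads(quoted)


sample = "{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } } }"
result = parse_locks(sample)
print(result["Global"]["acquireCount"]["r"])
```

Because only the keys are quoted, the numeric values stay as integers after parsing.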

@Random832 - No, it's not YAML. The MongoDB log format is sort of JSON, but 
not. IMHO, it's a bit of an ugly mess. So things like string fields aren't 
quoted, you have random custom types, parentheses aren't necessarily balanced 
(e.g. if you have long loglines that get truncated at 10K characters etc.). I 
could go on.
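
As a rough illustration of the truncation problem, the sketch below closes any braces left open before quoting the keys and parsing. It assumes the unbalanced delimiters are curly braces and that the cut happens right after a complete value; a line cut mid-key or mid-value would still fail. The name close_braces and the truncated sample are made up for this example.

```python
import json
import re


def close_braces(fragment):
    """Append a closing brace for every '{' left open in fragment."""
    depth = 0
    for ch in fragment:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
    return fragment + " }" * max(depth, 0)


# Hypothetical logline fragment, cut off after a complete value.
truncated = "{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2"
repaired = close_braces(truncated)
parsed = json.loads(re.sub(r"(\w+):", r'"\g<1>":', repaired))
print(parsed)
```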

@Laura Creighton - Yup, mtools is actually written by a colleague of mine =). 
Awesome guy. He does a lot of stuff to work around the idiosyncrasies of the 
MongoDB log format. However, there's quite a bit of overhead to using the full 
module for this - for this use case, I just needed to parse a specific "locks" 
document from a logline, so I was hoping for a clean way to just take it and 
parse it - in this case, the only issue that could hit us (AFAIK) is the lack 
of quotes around string fields. If they ever introduced a field with spaces in 
it, I don't know what would happen, lol.
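
To make the spaces case concrete, here is a small sketch of what the key-quoting substitution does with a key containing a space. The key "acquire Count" is invented for the example; no such field is known to exist in the log format.

```python
import json
import re

line = "{ acquire Count: { r: 2 } }"  # made-up key containing a space

# Only the last word before the colon gets quoted.
quoted = re.sub(r"(\w+):", r'"\g<1>":', line)
print(quoted)

try:
    json.loads(quoted)
    parsed_ok = True
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    parsed_ok = False
    print("parse failed:", exc)
```

The substitution leaves the first word of the key bare, so json.loads() rejects the result instead of silently returning wrong data.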



Re: Relaxed, or best-efforts JSON parser for Python?

2015-10-12 Thread Laura Creighton
In a message of Sun, 11 Oct 2015 17:56:33 -0700, Victor Hooi writes:
>Hi,
>
>I'm attempting to parse MongoDB loglines.
>
>The formatting of these loglines could best be described as JSON-like...
>
>For example - arrays 
>
>Anyhow, say I had the following logline snippet:
>
>{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 
> 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 
> } } }
>
>This won't parse with json.loads() - the main issue is the missing quotation 
>marks (") around the strings.
>
>My question, is there a more lenient, or relaxed JSON parser available for 
>Python, that will try to do a best-efforts parsing of non-spec JSON?
>
>Cheers,
>Victor
>-- 
>https://mail.python.org/mailman/listinfo/python-list

Won't this 
http://blog.mongodb.org/post/85123256973/introducing-mtools
https://github.com/rueckstiess/mtools
https://pypi.python.org/pypi/mtools/1.1.3

be better? :)


Re: Relaxed, or best-efforts JSON parser for Python?

2015-10-11 Thread Random832
Victor Hooi writes:

> My question, is there a more lenient, or relaxed JSON parser available
> for Python, that will try to do a best-efforts parsing of non-spec
> JSON?

In an answer to a similar question on StackExchange, using YAML was
suggested.

http://stackoverflow.com/questions/9104930

Is it possible that this format is in fact YAML? It does have the spaces
after each colon as mentioned in a comment, and it seems more likely to
me than that a major package like MongoDB aimed for JSON and missed. 



Re: Relaxed, or best-efforts JSON parser for Python?

2015-10-11 Thread MRAB

On 2015-10-12 01:56, Victor Hooi wrote:

> Hi,
>
> I'm attempting to parse MongoDB loglines.
>
> The formatting of these loglines could best be described as JSON-like...
>
> For example - arrays
>
> Anyhow, say I had the following logline snippet:
>
> { Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w:
> 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 }
> } }
>
> This won't parse with json.loads() - the main issue is the missing quotation
> marks (") around the strings.
>
> My question, is there a more lenient, or relaxed JSON parser available for
> Python, that will try to do a best-efforts parsing of non-spec JSON?


Have you tried first adding the quotes using the re module?

>>> import json, re
>>> line = ('{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { '
...         'acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, '
...         'oplog: { acquireCount: { w: 1 } } }')
>>> json.loads(re.sub(r'(\w+)', r'"\1"', line))
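
One thing to be aware of: a pattern of (\w+) also matches the numbers, so the counts come back as strings. The sketch below compares that with a variant that quotes only words followed by a colon. The lookahead pattern is an assumption added for illustration, not something proposed in this thread, and the sample line is shortened.

```python
import json
import re

line = "{ Global: { acquireCount: { r: 2, w: 2 } } }"

# Quote every run of word characters, keys and values alike.
everything = json.loads(re.sub(r"(\w+)", r'"\1"', line))

# Quote only runs of word characters that are followed by a colon (the keys).
keys_only = json.loads(re.sub(r"(\w+)(?=\s*:)", r'"\1"', line))

print(everything)
print(keys_only)
```

Either form parses; which one is preferable depends on whether the counts are needed as integers.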
