Re: Relaxed, or best-efforts JSON parser for Python?
On Monday, October 12, 2015 at 10:02:13 PM UTC+11, Laura Creighton wrote:
> In a message of Sun, 11 Oct 2015 17:56:33 -0700, Victor Hooi writes:
> >Hi,
> >
> >I'm attempting to parse MongoDB loglines.
> >
> >The formatting of these loglines could best be described as JSON-like...
> >
> >For example - arrays
> >
> >Anyhow, say I had the following logline snippet:
> >
> >{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } }
> >
> >This won't parse with json.loads() - the main issue is the missing
> >quotation marks (") around the strings.
> >
> >My question, is there a more lenient, or relaxed JSON parser available for
> >Python, that will try to do a best-efforts parsing of non-spec JSON?
> >
> >Cheers,
> >Victor
> >--
> >https://mail.python.org/mailman/listinfo/python-list
>
> Won't this
> http://blog.mongodb.org/post/85123256973/introducing-mtools
> https://github.com/rueckstiess/mtools
> https://pypi.python.org/pypi/mtools/1.1.3
>
> be better? :)

Hi,

@MRAB - Thanks for the tip. I did actually think of doing that as well - it's what we (MongoDB) do internally for a few of our tools, but I was really hoping to avoid going down the regex route. However, this is what I'm doing for now:

    locks = re.sub(r"(\w+):", r'"\g<1>":', locks)

@Random832 - No, it's not YAML. The MongoDB log format is sort of JSON, but not. IMHO, it's a bit of an ugly mess. String fields aren't quoted, you have random custom types, and parentheses aren't necessarily balanced (e.g. if you have long loglines that get truncated at 10K characters). I could go on.

@Laura Creighton - Yup, mtools is actually written by a colleague of mine =). Awesome guy. He does a lot of stuff to work around the idiosyncrasies of the MongoDB log format.
However, there's quite a bit of overhead to using the full module for this. For this use case, I just needed to parse a specific "locks" document from a logline, so I was hoping for a clean way to just take it and parse it. In this case, the only issue that could hit us (AFAIK) is the lack of quotes around string fields. If they ever introduced a field with spaces in it, I don't know what would happen, lol.
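For anyone following along, here is a minimal sketch of the key-quoting workaround described above, applied to the sample "locks" document from the original post. The helper name `quote_keys` is made up for illustration; it is not part of mtools or any MongoDB tool, and it assumes keys are single words with no spaces.

```python
import json
import re

# Sample "locks" document taken from the logline quoted in this thread.
locks = (
    "{ Global: { acquireCount: { r: 2, w: 2 } }, "
    "Database: { acquireCount: { w: 2 } }, "
    "Collection: { acquireCount: { w: 1 } }, "
    "oplog: { acquireCount: { w: 1 } } }"
)


def quote_keys(text):
    """Hypothetical helper: wrap each bare word directly followed by ':' in double quotes."""
    # Raw strings keep \w and \g<1> from being treated as string escapes.
    return re.sub(r"(\w+):", r'"\g<1>":', text)


parsed = json.loads(quote_keys(locks))
print(parsed["Global"]["acquireCount"])
```

Because only words followed by a colon are quoted, the numeric values stay numbers. The same rule would also rewrite any value that happens to contain `word:` (a timestamp such as `12:30:01`, say), so this is probably only safe on a document as regular as the locks one.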
Re: Relaxed, or best-efforts JSON parser for Python?
In a message of Sun, 11 Oct 2015 17:56:33 -0700, Victor Hooi writes:
>Hi,
>
>I'm attempting to parse MongoDB loglines.
>
>The formatting of these loglines could best be described as JSON-like...
>
>For example - arrays
>
>Anyhow, say I had the following logline snippet:
>
>{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } }
>
>This won't parse with json.loads() - the main issue is the missing quotation
>marks (") around the strings.
>
>My question, is there a more lenient, or relaxed JSON parser available for
>Python, that will try to do a best-efforts parsing of non-spec JSON?
>
>Cheers,
>Victor

Won't this
http://blog.mongodb.org/post/85123256973/introducing-mtools
https://github.com/rueckstiess/mtools
https://pypi.python.org/pypi/mtools/1.1.3

be better? :)
Re: Relaxed, or best-efforts JSON parser for Python?
Victor Hooi writes:
> My question, is there a more lenient, or relaxed JSON parser available
> for Python, that will try to do a best-efforts parsing of non-spec
> JSON?

In an answer to a similar question on StackExchange, using YAML was suggested.
http://stackoverflow.com/questions/9104930

Is it possible that this format is in fact YAML? It does have the spaces after each colon as mentioned in a comment, and it seems more likely to me than that a major package like MongoDB aimed for JSON and missed.
Re: Relaxed, or best-efforts JSON parser for Python?
On 2015-10-12 01:56, Victor Hooi wrote:
> Hi,
>
> I'm attempting to parse MongoDB loglines.
>
> The formatting of these loglines could best be described as JSON-like...
>
> For example - arrays
>
> Anyhow, say I had the following logline snippet:
>
> { Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } }
>
> This won't parse with json.loads() - the main issue is the missing quotation marks (") around the strings.
>
> My question, is there a more lenient, or relaxed JSON parser available for Python, that will try to do a best-efforts parsing of non-spec JSON?

Have you tried first adding the quotes using the re module?

>>> import json, re
>>> line = '{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } }'
>>> json.loads(re.sub(r'(\w+)', r'"\1"', line))
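One thing worth knowing about the `(\w+)` pattern above: it wraps every word in quotes, including the numbers, so the counts come back from `json.loads()` as strings. A rough sketch of that behaviour, plus one possible way to turn the values back into integers, follows. `restore_ints` is a hypothetical helper written for this example, not something from the thread or from any library, and it assumes every all-digit string was originally a number.

```python
import json
import re

line = (
    "{ Global: { acquireCount: { r: 2, w: 2 } }, "
    "Database: { acquireCount: { w: 2 } } }"
)

# Quote every run of word characters, keys and values alike.
quoted = re.sub(r"(\w+)", r'"\1"', line)
as_strings = json.loads(quoted)  # the counts are now strings such as "2"


def restore_ints(node):
    """Hypothetical helper: recursively convert all-digit strings back to int."""
    if isinstance(node, dict):
        return {key: restore_ints(value) for key, value in node.items()}
    if isinstance(node, str) and node.isdecimal():
        return int(node)
    return node


result = restore_ints(as_strings)
print(result["Global"]["acquireCount"])
```

If the numeric types matter, quoting only the words followed by a colon avoids the extra conversion pass.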