Re: Noob Parsing question

2015-02-18 Thread kai . peters

   Given
  
   data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
  
   How can I efficiently get dictionaries for each of the data blocks 
   framed by < >?
  
   Thanks for any help
 
  The question here is: What _can't_ happen? For instance, what happens
  if Fred's name contains a greater-than symbol, or a caret?
 
  If those absolutely cannot happen, your parser can be fairly
  straight-forward. Just put together some basic splitting (maybe a
  regex), and then split on the caret inside that. Otherwise, you may
  need a more stateful parser.
 
  ChrisA
 
  The data string is guaranteed to be clean - no such irregularities occur.
 
 Okay!
 
 (Side point: You've stripped off all citations, here, so it's not
 clear who said what. My shorthand signature isn't as useful as the
 full line identifying date, time, and person. It's polite to keep
 those lines, at least for the first level of quoting.)
 
 What you want can be done with a regular expression. (Yes, yes, I
 know; now you have two problems.)
 
 >>> import re
 >>> data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
 >>> re.findall("<.*?>", data)
 ['<a=14^b=Fred^c=45.22^>', '<a=22^b=Joe^>', '<a=17^c=3.20^>', '<a=72^b=Soup^>']
 
 From there, you can crack open the different pieces:
 
 >>> for piece in re.findall("<.*?>", data):
 ...     d = {}
 ...     for elem in piece[1:-2].split("^"):
 ...         key, value = elem.split("=", 1)
 ...         d[key] = value
 ...     print(d)
 ...
 {'c': '45.22', 'b': 'Fred', 'a': '14'}
 {'b': 'Joe', 'a': '22'}
 {'c': '3.20', 'a': '17'}
 {'b': 'Soup', 'a': '72'}
 
 If you need some of those to be integers or floats, you'll need to do
 some post-processing on it, but this guarantees that you get the data
 out reliably. It depends on not having any of the special characters
 =^ inside the elements, but other than that, it should be safe.
 
 ChrisA

Thanks for your help - much appreciated!

KP
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Noob Parsing question

2015-02-17 Thread kai . peters

  Given
 
  data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
 
  How can I efficiently get dictionaries for each of the data blocks framed 
  by < >?
 
  Thanks for any help
 
 The question here is: What _can't_ happen? For instance, what happens
 if Fred's name contains a greater-than symbol, or a caret?
 
 If those absolutely cannot happen, your parser can be fairly
 straight-forward. Just put together some basic splitting (maybe a
 regex), and then split on the caret inside that. Otherwise, you may
 need a more stateful parser.
 
 ChrisA

The data string is guaranteed to be clean - no such irregularities occur.


Re: Noob Parsing question

2015-02-17 Thread Chris Angelico
On Wed, Feb 18, 2015 at 3:07 PM,  kai.pet...@gmail.com wrote:
 Given

 data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'

 How can I efficiently get dictionaries for each of the data blocks framed by 
 < >?

 Thanks for any help

The question here is: What _can't_ happen? For instance, what happens
if Fred's name contains a greater-than symbol, or a caret?

If those absolutely cannot happen, your parser can be fairly
straight-forward. Just put together some basic splitting (maybe a
regex), and then split on the caret inside that. Otherwise, you may
need a more stateful parser.
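
As a rough sketch of the "basic splitting" route (not from the original exchange), the following uses only string methods. It assumes each block is framed by < and >, and that none of < > ^ = ever appears inside a value. The name parse_blocks is made up for illustration.

```python
data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'

def parse_blocks(text):
    """Return one dict per <...> block, using only str methods.

    Hypothetical helper: relies on the data being clean, i.e. the
    characters < > ^ = never occur inside a key or value.
    """
    blocks = []
    # The chunk before the first '<' is outer framing ('{['), so skip it.
    for chunk in text.split("<")[1:]:
        # Keep what precedes the closing '>'; this drops framing such as '][' or ']}'.
        body = chunk.split(">", 1)[0]
        record = {}
        for field in body.split("^"):
            if field:  # the trailing caret leaves one empty string behind
                key, value = field.split("=", 1)
                record[key] = value
        blocks.append(record)
    return blocks

print(parse_blocks(data))
```

If a value could ever contain one of the delimiter characters, this breaks in the same way a regex would, which is the case where a stateful parser becomes necessary.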

ChrisA


Re: Noob Parsing question

2015-02-17 Thread Chris Angelico
On Wed, Feb 18, 2015 at 3:35 PM,  kai.pet...@gmail.com wrote:
  Given
 
  data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
 
  How can I efficiently get dictionaries for each of the data blocks framed 
  by < >?
 
  Thanks for any help

 The question here is: What _can't_ happen? For instance, what happens
 if Fred's name contains a greater-than symbol, or a caret?

 If those absolutely cannot happen, your parser can be fairly
 straight-forward. Just put together some basic splitting (maybe a
 regex), and then split on the caret inside that. Otherwise, you may
 need a more stateful parser.

 ChrisA

 The data string is guaranteed to be clean - no such irregularities occur.

Okay!

(Side point: You've stripped off all citations, here, so it's not
clear who said what. My shorthand signature isn't as useful as the
full line identifying date, time, and person. It's polite to keep
those lines, at least for the first level of quoting.)

What you want can be done with a regular expression. (Yes, yes, I
know; now you have two problems.)

>>> import re
>>> data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
>>> re.findall("<.*?>", data)
['<a=14^b=Fred^c=45.22^>', '<a=22^b=Joe^>', '<a=17^c=3.20^>', '<a=72^b=Soup^>']

From there, you can crack open the different pieces:

>>> for piece in re.findall("<.*?>", data):
...     d = {}
...     for elem in piece[1:-2].split("^"):
...         key, value = elem.split("=", 1)
...         d[key] = value
...     print(d)
...
{'c': '45.22', 'b': 'Fred', 'a': '14'}
{'b': 'Joe', 'a': '22'}
{'c': '3.20', 'a': '17'}
{'b': 'Soup', 'a': '72'}

If you need some of those to be integers or floats, you'll need to do
some post-processing on it, but this guarantees that you get the data
out reliably. It depends on not having any of the special characters
=^ inside the elements, but other than that, it should be safe.
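
A minimal sketch of that post-processing, not part of the session above: try int first, then float, and otherwise keep the string. The helper name convert and the sample records are assumptions for illustration only.

```python
def convert(value):
    """Try int, then float; fall back to the original string.

    Hypothetical helper. Note that float() also accepts strings such as
    'nan' and 'inf', so a name spelled that way would become a float.
    """
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

# Sample input shaped like the dictionaries built by the loop above.
records = [
    {'a': '14', 'b': 'Fred', 'c': '45.22'},
    {'a': '72', 'b': 'Soup'},
]

typed = [{key: convert(val) for key, val in rec.items()} for rec in records]
print(typed)
```

If the type of each key is known ahead of time (say, a is always an integer and c always a float), an explicit mapping from key to type is safer than guessing from the text.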

ChrisA