On Feb 12, 6:41 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Fri, 12 Feb 2010 10:41:40 -0300, Eknath Venkataramani
> <eknath.i...@gmail.com> wrote:
>
> > I am trying to write a parser in pyparsing.
> > Help Me. http://paste.pocoo.org/show/177078/ is the code and this is
> > input
> > file: http://paste.pocoo.org/show/177076/ .
> > I get output as:
> > <generator object at 0xb723b80c>
>
> There is nothing wrong with pyparsing here. scanString() returns a
> generator, like this:
>
> py> g = (x for x in range(20) if x % 3 == 1)
> py> g
> <generator object <genexpr> at 0x00E50D78>

Unfortunately, your grammar doesn't match the input text, so your generator doesn't yield anything.
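To see matches rather than the generator object itself, the generator has to be iterated. Here is a minimal sketch of that; the grammar `pair` and both input strings are made up for illustration and are not taken from the original paste:

```python
from pyparsing import Word, alphas, nums

# Hypothetical toy grammar and input, for illustration only.
pair = Word(alphas) + "=>" + Word(nums)
text = "count => 8, total => 34"

# scanString() only builds a generator; no scanning has happened yet.
matches = pair.scanString(text)

# Iterating drives the scan. Each item is a (tokens, start, end) tuple.
found = [tokens.asList() for tokens, start, end in matches]
print(found)

# When the grammar matches nothing, the generator is simply empty.
empty = list(pair.scanString("!!! nothing to see here !!!"))
print(empty)
```

An empty list from the second call is the same situation as in the original program: the generator exists, but the grammar never matches, so there is nothing to iterate over.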
I think you are taking a sort of brute-force approach to this problem, and you need to think a little more abstractly. You can't just pick a fragment, write an expression for it, then do the next one, and then stitch them together. Well, you *can*, but it helps to think in both abstract and concrete terms at the same time.

With the exception of your one key of "\'", this is a pretty basic recursive grammar. Recursive grammars are a little complicated to start with, so I'll start with a non-recursive part, and I'll work bottom-up, or inside-out.

Let's start by looking at these items:

    count => 8,
    baajaar => 0.87628353,
    kiraae => 0.02341598,
    lii => 0.02178813,
    adr => 0.01978462,
    gyiimn => 0.01765590,

Each item has a name (which you called "eng", so I'll keep that expression), a '=>' and *something*. In the end, we won't really care about the '=>' strings: they aren't part of the keys or the associated values, they are just delimiters. They are important during parsing, but afterwards we can discard them. So we'll start with a pyparsing expression for this:

    keyval = eng + Suppress('=>') + something

Sometimes the something is an integer, sometimes it's a floating point number. I'll define a more generic integer expression than your original number, and a separate expression for a real number:

    integer = Combine(Optional('-') + Word(nums))
    realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

When we parse for these two, we need to be careful to check for a realnum before an integer, so that we don't accidentally parse the leading "3" of "3.1415" as the integer "3":
    something = realnum | integer

So now we can parse this fragment using a delimitedList expression (which takes care of the intervening commas, and also suppresses them from the results):

    filedata = """
    count => 8,
    baajaar => 0.87628353,
    kiraae => 0.02341598,
    lii => 0.02178813,
    adr => 0.01978462,
    gyiimn => 0.01765590,"""
    print delimitedList(keyval).parseString(filedata)

Gives:

    ['count', '8', 'baajaar', '0.87628353', 'kiraae', '0.02341598',
     'lii', '0.02178813', 'adr', '0.01978462', 'gyiimn', '0.01765590']

Right off the bat, we see that we want a little more structure to these results, so that the keys and values are grouped naturally by the parser. The easy way to do this is with Group, as in:

    keyval = Group(eng + Suppress('=>') + something)

With this one change, we now get:

    [['count', '8'], ['baajaar', '0.87628353'], ['kiraae', '0.02341598'],
     ['lii', '0.02178813'], ['adr', '0.01978462'], ['gyiimn', '0.01765590']]

Now we need to add the recursive part of your grammar. A nested input looks like:

    confident => {
        count => 4,
        trans => {
            ashhvsht => 0.75100505,
            phraarmnbh => 0.08341708,
        },
    },

So in addition to integers and reals, our "something" could also be a nested list of keyvals:

    something = realnum | integer | (lparen + delimitedList(keyval) + rparen)

This is *almost* right; it needs just a few tweaks:

- the list of keyvals may have a comma after the last item before the closing '}'
- we really want to suppress the opening and closing braces (lparen and rparen)
- for similar structure reasons, we'll enclose the list of keyvals in a Group to retain the data hierarchy

    lparen,rparen = map(Suppress, "{}")
    something = realnum | integer | Group(lparen + delimitedList(keyval) + Optional(',') + rparen)

The recursive problem is that we have defined keyval using something, and something using keyval. You can't do that directly in Python, because a name has to be defined before it can be used.
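To make the problem concrete, here is a small sketch of what happens if the two definitions are written out naively. The definitions of `eng` and `integer` are simplified stand-ins chosen for this illustration:

```python
from pyparsing import Group, Suppress, Word, alphas, nums, delimitedList

# Simplified stand-ins for the expressions discussed above (assumptions).
eng = Word(alphas)
integer = Word(nums)

try:
    # Python evaluates the right-hand side immediately, but no object is
    # bound to the name 'something' yet, so this line raises NameError.
    keyval = Group(eng + Suppress("=>") + something)
    something = integer | Group(delimitedList(keyval))
    error = None
except NameError as exc:
    error = exc

print(type(error).__name__, "-", error)
```

Swapping the order of the two assignments does not help either, since then `keyval` would be the undefined name.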
So we use the pyparsing class Forward to "forward" declare something:

    something = Forward()
    keyval = Group(eng + Suppress('=>') + something)

To define something as a Forward, we use the '<<' shift operator:

    something << (realnum | integer | Group(lparen + delimitedList(keyval) + Optional(',') + rparen))

Our grammar now looks like:

    lparen,rparen = map(Suppress, "{}")
    something = Forward()
    keyval = Group(eng + Suppress('=>') + something)
    something << (realnum | integer | Group(lparen + delimitedList(keyval) + Optional(',') + rparen))

To parse your entire input file, use a delimitedList(keyval):

    results = delimitedList(keyval).parseString(filedata)

(There is one problem - one of your keynames is "\'". I don't know if this is a typo or intentional. If you need to accommodate even this as a keyname, just change your definition of eng to Word(alphas + r"\'").)

Now if I parse your original string, I get (using the pprint module to format the results):

    [['markets',
      [['count', '8'],
       ['trans',
        [['baajaar', '0.87628353'],
         ['kiraae', '0.02341598'],
         ['lii', '0.02178813'],
         ['adr', '0.01978462'],
         ['gyiimn', '0.01765590'],
         ['baaaaromn', '0.01765590'],
         ['sdk', '0.01728024'],
         ['kaanuun', '0.00613574'],
         ',']],
       ',']],
     ['confident',
      [['count', '4'],
       ['trans',
        [['ashhvsht', '0.75100505'],
         ['phraarmnbh', '0.08341708'],
         ['athmvishhvaas', '0.08090452'],
         ['milte', '0.03768845'],
         ['utnii', '0.02110553'],
         ['anaa', '0.01432161'],
         ['jitne', '0.01155779'],
         ',']],
       ',']],
     ['consumers',
      [['count', '34'],
       ['trans',
        [['upbhokhtaaomn', '0.48493883'],
         ['upbhokhtaa', '0.27374792'],
         ['zrurtomn', '0.02753605'],
         ['suuchnaa', '0.02707965'],
         ['ghraahkomn', '0.02580174'],
         ['ne', '0.02574089'],
         ["\\'", '0.01947301'],
         ['jnmt', '0.01527414'],
         ',']],
       ',']]]

But there is one more card up pyparsing's sleeve. Just as your original parser used "english" to apply a results name to your keys, it would be nice if our parser would return not a list of key-value pairs, but an actual dict-like object.
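Before moving on to dict-like results, here is the list-based grammar so far as one self-contained script, in case the paste links are unavailable. It is a sketch under two assumptions: `eng` is taken to be `Word(alphas)` (the original definition is only in the paste), and `sample` is the short nested fragment quoted earlier rather than the full input file. It uses Python 3 print syntax.

```python
from pprint import pprint
from pyparsing import (Combine, Forward, Group, Optional, Suppress, Word,
                       alphas, nums, delimitedList)

eng = Word(alphas)  # assumption: the original 'eng' is not shown in the post
integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

lparen, rparen = map(Suppress, "{}")
something = Forward()                      # placeholder, filled in below
keyval = Group(eng + Suppress('=>') + something)
something << (realnum | integer |
              Group(lparen + delimitedList(keyval) + Optional(',') + rparen))

# Short sample taken from the nested fragment quoted in this post.
sample = """
confident => {
    count => 4,
    trans => {
        ashhvsht => 0.75100505,
        phraarmnbh => 0.08341708,
    },
},
"""

parsed = delimitedList(keyval).parseString(sample).asList()
pprint(parsed)
```

As in the full output above, the trailing ',' strings show up in the nested lists because `Optional(',')` is not suppressed at this stage.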
Pyparsing's Dict class enhances the results to give us exactly this kind of dict-like object. Use Dict to wrap our repetitive structures, and it will automatically define results names for us, reading the first element of each group as the key, and the remaining items in the group as the value:

    something << (realnum | integer | Dict(lparen + delimitedList(keyval) + Optional(',').suppress() + rparen))
    results = Dict(delimitedList(keyval)).parseString(filedata)
    print results.dump()

Gives this hierarchical structure:

    - confident:
      - count: 4
      - trans:
        - anaa: 0.01432161
        - ashhvsht: 0.75100505
        - athmvishhvaas: 0.08090452
        - jitne: 0.01155779
        - milte: 0.03768845
        - phraarmnbh: 0.08341708
        - utnii: 0.02110553
    - consumers:
      - count: 34
      - trans:
        - \': 0.01947301
        - ghraahkomn: 0.02580174
        - jnmt: 0.01527414
        - ne: 0.02574089
        - suuchnaa: 0.02707965
        - upbhokhtaa: 0.27374792
        - upbhokhtaaomn: 0.48493883
        - zrurtomn: 0.02753605
    - markets:
      - count: 8
      - trans:
        - adr: 0.01978462
        - baaaaromn: 0.01765590
        - baajaar: 0.87628353
        - gyiimn: 0.01765590
        - kaanuun: 0.00613574
        - kiraae: 0.02341598
        - lii: 0.02178813
        - sdk: 0.01728024

You can access these fields by name like dict elements:

    print results.keys()
    print results["confident"].keys()
    print results["confident"]["trans"]["jitne"]

If the names are valid Python identifiers (which "\'" is *not*), you can access their fields like attributes of an object:

    print results.confident.trans.jitne
    for k in results.keys():
        print k, results[k].count

Prints:

    ['confident', 'markets', 'consumers']
    ['count', 'trans']
    0.01155779
    0.01155779
    confident 4
    markets 8
    consumers 34

I've posted the full program at http://pyparsing.pastebin.com/f1d0e2182.

Welcome to pyparsing!

-- Paul
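P.S. For anyone who wants to try the Dict version without the paste links, here is a self-contained sketch. It assumes `eng` is `Word(alphas)`, and `sample` is an abridged input assembled from fragments quoted in this post, not the original file. It is written with Python 3 print syntax.

```python
from pyparsing import (Combine, Dict, Forward, Group, Optional, Suppress,
                       Word, alphas, nums, delimitedList)

eng = Word(alphas)  # assumption: the original 'eng' is not shown in the post
integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

lparen, rparen = map(Suppress, "{}")
something = Forward()
keyval = Group(eng + Suppress('=>') + something)
# Dict reads the first token of each group as the key for the rest.
something << (realnum | integer |
              Dict(lparen + delimitedList(keyval)
                   + Optional(',').suppress() + rparen))

# Abridged sample built from values quoted in this post.
sample = """
markets => {
    count => 8,
    trans => {
        baajaar => 0.87628353,
        kiraae => 0.02341598,
    },
},
confident => {
    count => 4,
    trans => {
        ashhvsht => 0.75100505,
        phraarmnbh => 0.08341708,
    },
},
"""

results = Dict(delimitedList(keyval)).parseString(sample)

print(sorted(results.keys()))
print(results["confident"]["count"])            # dict-style access
print(results["markets"]["trans"]["baajaar"])
print(results.confident.trans.ashhvsht)         # attribute-style access
```

Note that all values come back as strings; converting them to int or float would need a parse action on integer and realnum, which is outside what this post covers.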