Re: C parsing fun
Károly Kiripolszky [EMAIL PROTECTED] wrote:
> I've found a brute-force solution. In the preprocessing phase I simply strip out the comments (things inside comments won't appear in the result) and replace curly brackets with these symbols: #::OPEN::# and #::CLOSE::#.

This fails when the code already contains the strings #::OPEN::# and #::CLOSE::#.

-- Roberto Bonvallet
-- http://mail.python.org/mailman/listinfo/python-list
Re: C parsing fun
Yes, of course. But you can still fine-tune the code for the sources you want to parse. The C++ header files I needed to analyze contained no such strings, and I believe very few real-life .h files do. In fact, I chose #::OPEN::# and #::CLOSE::# because they look more foreign to C++ than, e.g., ::OPEN or #OPEN would. I hope this makes sense. :)

Roberto Bonvallet wrote:
> This fails when the code already contains the strings #::OPEN::# and #::CLOSE::#.
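One way to sidestep the collision discussed here is to check the input before choosing the markers. The following is only a minimal sketch of that idea, not something taken from bloppy: the function name `pick_markers` and the numbered-suffix scheme are assumptions made for illustration.

```python
import itertools

def pick_markers(source, base="#::{}::#"):
    """Return an (open, close) marker pair that does not occur in source.

    Hypothetical helper: tries #::OPEN::# / #::CLOSE::# first, then
    #::OPEN1::# / #::CLOSE1::#, and so on until neither marker is found.
    """
    for n in itertools.count():
        suffix = "" if n == 0 else str(n)
        open_marker = base.format("OPEN" + suffix)
        close_marker = base.format("CLOSE" + suffix)
        if open_marker not in source and close_marker not in source:
            return open_marker, close_marker

# Typical header: the default markers are free.
print(pick_markers("class A { int x; };"))
# Pathological input: the default marker is taken, so a suffix is added.
print(pick_markers('const char *s = "#::OPEN::#";'))
```

The loop always terminates because a finite source can contain only finitely many of the candidate markers.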
Re: C parsing fun
Hello again!

When I came up with this idea for parsing C files with ease, I was at home, and I only have access to the sources in question at the office. So today I tried the previously posted algorithm on the actual source and realized that the example data I had originally tested with was so simple that the algorithm still failed on some header files. I had to make some further changes, and by now I can parse 1135 header files in 5 seconds with no errors.

This may be considered spamming, but this package is so small that I don't want to create a page for it on SF or Google Code. Furthermore, I want to give people who find this topic a working solution, so here's the latest not-so-elegant-brute-force-but-fast parser:

http://kiri.csing.hu/stack/python/bloppy-0.3.zip

On Feb 5, 1:43 pm, karoly.kiripolszky [EMAIL PROTECTED] wrote:
> Hello ppl! At my job I was given the task of writing a script to analyze C++ code based on concepts my boss had. To do this I needed to represent the structure of C++ code in Python somehow. I read the docs for Yapps, pyparsing and other tools like those, then I came up with a very simple idea. I realized that bracketed code is almost like a Python list, except that I have to replace curly brackets with square ones and surround the remaining stuff with quotes. This process involves no recursion or node objects, only pure string manipulation, so I believe it's really fast. Finally, I can get the resulting list by calling eval() on the string.
>
> For example, when I need to parse a class definition, I only need to look for a list item containing the pattern *class*, and the next item will be the contents of the class as another list.
>
> You can grab the code at: http://kiri.csing.hu/stack/python/bloppy-0.1.zip (test script [test.py] included)
C parsing fun
Hello ppl!

At my job I was given the task of writing a script to analyze C++ code based on concepts my boss had. To do this I needed to represent the structure of C++ code in Python somehow. I read the docs for Yapps, pyparsing and other tools like those, then I came up with a very simple idea. I realized that bracketed code is almost like a Python list, except that I have to replace curly brackets with square ones and surround the remaining stuff with quotes. This process involves no recursion or node objects, only pure string manipulation, so I believe it's really fast. Finally, I can get the resulting list by calling eval() on the string.

For example, when I need to parse a class definition, I only need to look for a list item containing the pattern *class*, and the next item will be the contents of the class as another list.

You can grab the code at:

http://kiri.csing.hu/stack/python/bloppy-0.1.zip

(test script [test.py] included)
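The idea described in this message can be sketched roughly as follows. This is not the bloppy code, only an illustration under a few assumptions: the names `braces_to_list` and `class_bodies` are made up here, `ast.literal_eval` is swapped in for the `eval()` call mentioned above (it accepts only literals), and the input is assumed to have no braces inside comments or string literals.

```python
import ast
import re

def braces_to_list(source):
    """Turn brace-structured text into nested Python lists of strings."""
    parts = []
    # re.split with a capturing group keeps the braces as separate tokens.
    for token in re.split(r"([{}])", source):
        if token == "{":
            parts.append("[")
        elif token == "}":
            parts.append("],")
        elif token.strip():
            # repr() quotes and escapes the fragment so it is a valid literal.
            parts.append(repr(token.strip()) + ",")
    # Unbalanced braces produce an invalid literal and raise SyntaxError.
    return ast.literal_eval("[" + "".join(parts) + "]")

def class_bodies(tree):
    """Yield (header, body) pairs where an item mentioning 'class' is
    directly followed by a nested list (top level only)."""
    for head, body in zip(tree, tree[1:]):
        if isinstance(head, str) and "class" in head and isinstance(body, list):
            yield head, body

tree = braces_to_list("class A { int x; void f() { return; } };")
print(tree)
print(list(class_bodies(tree)))
```

Only string operations are used to build the literal, and the single parse call then produces the whole nested structure.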
Re: C parsing fun
and the great thing is that the algorithm can be used with any language that structures the code with brackets, like PHP and many others.
Re: C parsing fun
> based on concepts my boss had. To do this I needed to represent C++ code structure in Python somehow. I read the docs for Yapps, pyparsing and other stuff like those, then I came up with a very simple idea. I realized that bracketed code is almost like a Python list, except I have to replace curly brackets with squared ones and surround the remaining stuff with quotes. This process invokes no recursion or node

Yes, that's a nice solution. Sometimes it's not enough, though (it won't work on code obfuscated with macros). Anyway, if you need something more sophisticated, then I'd recommend gccxml or its Python binding:

http://www.language-binding.net/pygccxml/pygccxml.html
Re: C parsing fun
Thanks for responding, Szabolcs! I've already tried that, but couldn't get it to work. The source I tried to parse is a huge MSVC 7.1 solution containing about 38 projects, and I believe the code is so complex, with so many different dependencies, that GCC just can't handle it. By the way, I'm not deeply familiar with C++ compilers, so maybe it was because of a compiler misconfiguration, but I really don't know...

Szabolcs Nagy wrote:
> Yes, that's a nice solution. Sometimes it's not enough, though (it won't work on code obfuscated with macros). Anyway, if you need something more sophisticated, then I'd recommend gccxml or its Python binding: http://www.language-binding.net/pygccxml/pygccxml.html
Re: C parsing fun
In [EMAIL PROTECTED], karoly.kiripolszky wrote:
> and the great thing is that the algorithm can be used with any language that structures the code with brackets, like PHP and many others.

But it fails if brackets appear in comments or literal strings.

Ciao,
Marc 'BlackJack' Rintsch
Re: C parsing fun
You're right, thank you for the comment! I will look into how to avoid this.

Marc 'BlackJack' Rintsch wrote:
> But it fails if brackets appear in comments or literal strings.
Re: C parsing fun
Károly Kiripolszky wrote:
> You're right, thank you for the comment! I will look into how to avoid this.

And after you have resolved this 'small' ;-) detail, you will probably notice that some fully functional and widely used parsers still have trouble with this ...

Claudio
Re: C parsing fun
I've found a brute-force solution. In the preprocessing phase I simply strip out the comments (things inside comments won't appear in the result) and replace curly brackets with these symbols: #::OPEN::# and #::CLOSE::#. After parsing I convert them back. In fact, I can exclude commented lines from the analysis, as I only have to cope with production code.

Claudio Grondi wrote:
> And after you have resolved this 'small' ;-) detail, you will probably notice that some fully functional and widely used parsers still have trouble with this ...
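A preprocessing pass along these lines might look like the sketch below. It is an illustration rather than the actual bloppy implementation, and it assumes the marker replacement is meant for braces inside string and character literals, since those are the ones that must not be read as structure. The names `preprocess` and `restore` are hypothetical, and C++11 raw string literals are not handled.

```python
import re

OPEN, CLOSE = "#::OPEN::#", "#::CLOSE::#"

# A single pattern matches comments and literals in one left-to-right scan,
# so a "//" inside a string is not mistaken for a comment, and a quote
# inside a comment does not start a literal.
TOKEN = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)"
    r'|(?P<dq>"(?:\\.|[^"\\])*")'
    r"|(?P<sq>'(?:\\.|[^'\\])*')",
    re.DOTALL,
)

def preprocess(source):
    """Drop comments and mask braces that occur inside literals."""
    def fix(match):
        if match.group("comment") is not None:
            return " "  # keep a separator where the comment used to be
        literal = match.group(0)
        return literal.replace("{", OPEN).replace("}", CLOSE)
    return TOKEN.sub(fix, source)

def restore(fragment):
    """Convert the markers back to braces after parsing."""
    return fragment.replace(OPEN, "{").replace(CLOSE, "}")

src = 'void f() { puts("{x}"); /* } */ } // {\n'
masked = preprocess(src)
print(masked)
print(restore(masked))
```

After this pass, the only braces left in the text are structural ones, so the bracket-to-list conversion can run on the masked text and `restore` can be applied to the individual string fragments afterwards.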
Re: C parsing fun
http://kiri.csing.hu/stack/python/bloppy-0.2.zip

The test data now also contains brackets in literal strings.

Claudio Grondi wrote:
> And after you have resolved this 'small' ;-) detail, you will probably notice that some fully functional and widely used parsers still have trouble with this ...