Re: C parsing fun

2007-02-08 Thread Roberto Bonvallet
Károly Kiripolszky [EMAIL PROTECTED] wrote:
 I've found a brute-force solution. In the preprocessing phase I simply
 strip out the comments (things inside comments won't appear in the
 result) and replace curly brackets with these symbols: #::OPEN::# and
 #::CLOSE::#.

This fails when the code already has the strings #::OPEN::# and
#::CLOSE::# in it.
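Here is a small Python sketch that makes the objection concrete. It is not code from the thread. For simplicity it masks every brace in a made-up source line, which is cruder than the real preprocessor. It then converts the markers back and compares the result with the input. The helper `pick_markers` is hypothetical and shows one possible workaround: it picks two single characters from the Unicode private-use area that do not occur in the input.

```python
def pick_markers(source: str, start: int = 0xE000) -> tuple:
    """Hypothetical helper: return two distinct single characters that do
    not occur in `source`, searching upward from the private-use area."""
    markers = []
    codepoint = start
    while len(markers) < 2:
        ch = chr(codepoint)
        if ch not in source:
            markers.append(ch)
        codepoint += 1
    return markers[0], markers[1]


def round_trip(source: str, opener: str, closer: str) -> str:
    """Mask every brace with a marker, then turn the markers back."""
    masked = source.replace("{", opener).replace("}", closer)
    return masked.replace(opener, "{").replace(closer, "}")


src = 'puts("#::OPEN::#"); puts("{");'

# Fixed text markers: the "#::OPEN::#" already present in the source is
# turned into "{" as well, so the input is not reproduced.
print(round_trip(src, "#::OPEN::#", "#::CLOSE::#") == src)  # False

# Markers known to be absent: every marker in the masked text came from a
# substitution, so restoring them reproduces the input exactly.
print(round_trip(src, *pick_markers(src)) == src)  # True
```

Single-character markers also avoid a subtler problem. A multi-character marker that never occurs in the input can still be formed by an inserted marker plus adjacent source text, and a plain left-to-right replace may then consume the wrong characters.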

-- 
Roberto Bonvallet
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: C parsing fun

2007-02-08 Thread Károly Kiripolszky
Yes, of course. But you can still fine-tune the code for the sources
you want to parse. The C++ header files I needed to analyze contained
no such strings. I believe there are very few real-life .h files out
there containing those. In fact, I chose #::OPEN::# and #::CLOSE::#
because they're more foreign to C++ than, e.g., ::OPEN or #OPEN would be.
I hope this makes sense. :)

Roberto Bonvallet wrote:
 Károly Kiripolszky [EMAIL PROTECTED] wrote:
  I've found a brute-force solution. In the preprocessing phase I simply
  strip out the comments (things inside comments won't appear in the
  result) and replace curly brackets with these symbols: #::OPEN::# and
  #::CLOSE::#.

 This fails when the code already has the strings #::OPEN::# and
 #::CLOSE::# in it.

 --
 Roberto Bonvallet



Re: C parsing fun

2007-02-06 Thread Károly Kiripolszky
Hello again!

When I came up with this idea for parsing C files easily, I was at
home, and I only have access to the sources in question at the office.
Today I tried the previously posted algorithm on the actual sources and
realized that the example data I had originally tested with was so
simple that the algorithm still failed on some header files. I had to
make some further changes, and now I am able to parse 1135 header files
in 5 seconds with no errors.

This may be considered spamming, but the package is so small that I
don't want to create a page for it on SF or Google Code. Furthermore,
I want to give people who find this topic a working solution, so
here's the latest not-so-elegant, brute-force-but-fast parser:

http://kiri.csing.hu/stack/python/bloppy-0.3.zip

On Feb 5, 1:43 pm, karoly.kiripolszky [EMAIL PROTECTED]
wrote:
 Hello ppl!

 At work I was given the task of writing a script to analyze C++ code
 based on concepts my boss had. To do this I needed to represent C++
 code structure in Python somehow. I read the docs for Yapps, pyparsing
 and similar tools, then I came up with a very simple idea. I realized
 that bracketed code is almost like a Python list, except that I have
 to replace curly brackets with square ones and surround the remaining
 text with quotes. This process involves no recursion or node objects,
 only pure string manipulation, so I believe it's really fast. Finally,
 I get the resulting list by calling eval() on the string.

 For example, when I need to parse a class definition, I only need to
 look for a list item containing the pattern *class*, and the next
 item will be the contents of the class as another list.

 You can grab the code at:

 http://kiri.csing.hu/stack/python/bloppy-0.1.zip

 (test script [test.py] included)




C parsing fun

2007-02-05 Thread karoly.kiripolszky
Hello ppl!

At work I was given the task of writing a script to analyze C++ code
based on concepts my boss had. To do this I needed to represent C++
code structure in Python somehow. I read the docs for Yapps, pyparsing
and similar tools, then I came up with a very simple idea. I realized
that bracketed code is almost like a Python list, except that I have
to replace curly brackets with square ones and surround the remaining
text with quotes. This process involves no recursion or node objects,
only pure string manipulation, so I believe it's really fast. Finally,
I get the resulting list by calling eval() on the string.
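Below is a minimal Python sketch of the conversion described above. It is not the code from bloppy-0.1.zip, and the name `braces_to_list` is made up for illustration. Instead of `eval()` it calls `ast.literal_eval()`. Here that builds the same list, because every text item is quoted with `repr()`, but it refuses anything that is not a plain literal.

```python
import ast


def braces_to_list(code: str) -> list:
    """Turn brace-structured source into nested Python lists.

    '{' becomes '[', '}' becomes ']', and each stretch of text between
    braces becomes a quoted string item. The loop only builds a string;
    ast.literal_eval turns it into the nested list at the end.
    """
    parts = []   # pieces of the Python list literal being built
    chunk = []   # characters of the text item currently being collected
    depth = 0    # current brace nesting, used only to reject bad input

    def flush() -> None:
        text = "".join(chunk).strip()
        if text:                              # skip whitespace-only items
            parts.append(repr(text) + ",")    # repr() adds quotes and escapes
        chunk.clear()

    for ch in code:
        if ch == "{":
            flush()
            parts.append("[")
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError("unmatched '}'")
            flush()
            parts.append("],")
            depth -= 1
        else:
            chunk.append(ch)
    flush()
    if depth:
        raise ValueError("unmatched '{'")
    return ast.literal_eval("[" + "".join(parts) + "]")


print(braces_to_list("class A { int x; void f() { return; } };"))
# ['class A', ['int x; void f()', ['return;']], ';']
```

Whitespace between braces is dropped, so `{ }` becomes an empty list. Text after a closing brace, such as the `;` above, becomes an item of its own.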

For example, when I need to parse a class definition, I only need to
look for a list item containing the pattern *class*, and the next
item will be the contents of the class as another list.
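The class lookup can be sketched as a walk over the nested list. `find_classes` is a hypothetical helper. The sample `tree` is written by hand in the shape that the bracket-to-list conversion yields for the C++ line in the comment. The regular expression is deliberately rough: it only knows the `class` keyword and would, for example, misread `template <class T>`.

```python
import re

CLASS_HEADER = re.compile(r"\bclass\s+(\w+)")


def find_classes(tree: list) -> dict:
    """Map class names to their bodies.

    A string item whose last statement looks like 'class Name ...' and
    that is immediately followed by a list is taken to be a class header;
    that list is the class body. Nested lists (namespaces, other class
    bodies) are searched too.
    """
    found = {}
    for i, item in enumerate(tree):
        if isinstance(item, list):
            found.update(find_classes(item))
            continue
        followed_by_body = i + 1 < len(tree) and isinstance(tree[i + 1], list)
        # Only look after the last ';' so "class A; int f()" is not matched.
        header = item.rsplit(";", 1)[-1]
        match = CLASS_HEADER.search(header)
        if followed_by_body and match:
            found[match.group(1)] = tree[i + 1]
    return found


# Hand-written sample in the list shape produced for:
# namespace ui { class Button : public Widget { public: void click(); }; int helper() { return 0; } }
tree = ["namespace ui",
        ["class Button : public Widget", ["public: void click();"],
         "; int helper()", ["return 0;"]]]

print(find_classes(tree))  # {'Button': ['public: void click();']}
```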

You can grab the code at:

http://kiri.csing.hu/stack/python/bloppy-0.1.zip

(test script [test.py] included)



Re: C parsing fun

2007-02-05 Thread karoly.kiripolszky
and the great thing is that the algorithm can be used with any
language that structures the code with brackets, like PHP and many
others.



Re: C parsing fun

2007-02-05 Thread Szabolcs Nagy
 based on concepts my boss had. To do this I needed to represent C++
 code structure in Python somehow. I read the docs for Yapps, pyparsing
 and similar tools, then I came up with a very simple idea. I realized
 that bracketed code is almost like a Python list, except that I have
 to replace curly brackets with square ones and surround the remaining
 text with quotes. This process involves no recursion or node

Yes, that's a nice solution.
Sometimes it's not enough, though (it won't work on code obfuscated
with macros).

Anyway, if you need something more sophisticated, then I'd recommend
gccxml or its Python binding:

http://www.language-binding.net/pygccxml/pygccxml.html



Re: C parsing fun

2007-02-05 Thread Károly Kiripolszky
Thx for responding, Szabolcs! I've already tried that, but couldn't
get it to work. The source I tried to parse is a huge MSVC 7.1
solution containing about 38 projects, and I believe the code is so
complex, with so many different dependencies, that GCC just can't
handle it. Btw, I'm not deeply familiar with C++ compilers, so maybe
it was caused by a compiler misconfiguration, but I really don't
know...

Szabolcs Nagy wrote:
  based on concepts my boss had. To do this I needed to represent C++
  code structure in Python somehow. I read the docs for Yapps, pyparsing
  and similar tools, then I came up with a very simple idea. I realized
  that bracketed code is almost like a Python list, except that I have
  to replace curly brackets with square ones and surround the remaining
  text with quotes. This process involves no recursion or node

 Yes, that's a nice solution.
 Sometimes it's not enough, though (it won't work on code obfuscated
 with macros).

 Anyway, if you need something more sophisticated, then I'd recommend
 gccxml or its Python binding:

 http://www.language-binding.net/pygccxml/pygccxml.html


Re: C parsing fun

2007-02-05 Thread Marc 'BlackJack' Rintsch
In [EMAIL PROTECTED],
karoly.kiripolszky wrote:

 and the great thing is that the algorithm can be used with any
 language that structures the code with brackets, like PHP and many
 others.

But it fails if brackets appear in comments or literal strings.
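A tiny illustration of the problem follows. It assumes a scanner that, like the plain bracket replacement, treats every brace as structure. `first_body` is a hypothetical helper that returns the text of the first top-level `{ ... }` block. The three inputs are made-up one-liners.

```python
def first_body(code: str):
    """Return the text inside the first top-level { ... } block, treating
    every brace as structural: strings and comments are not recognized."""
    depth = 0
    start = None
    for i, ch in enumerate(code):
        if ch == "{":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:  # ignore a stray '}' before any '{'
            depth -= 1
            if depth == 0:
                return code[start:i]
    return None  # no complete block found


print(repr(first_body('void f() { g(); }')))          # ' g(); '
print(repr(first_body('void f() { puts("}{"); }')))   # ' puts("'
print(repr(first_body('void f() { /* } */ x(); }')))  # ' /* '
```

In the second input the numbers of `{` and `}` are equal. A simple balance check would therefore not even notice that the body was cut short.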

Ciao,
Marc 'BlackJack' Rintsch



Re: C parsing fun

2007-02-05 Thread Károly Kiripolszky

Marc 'BlackJack' Rintsch wrote:
 In [EMAIL PROTECTED],
 karoly.kiripolszky wrote:

  and the great thing is that the algorithm can be used with any
  language that structures the code with brackets, like PHP and many
  others.

 But it fails if brackets appear in comments or literal strings.

 Ciao,
   Marc 'BlackJack' Rintsch


Re: C parsing fun

2007-02-05 Thread Károly Kiripolszky
You're right, thank you for the comment! I will look into how to
avoid this.

Marc 'BlackJack' Rintsch wrote:
 In [EMAIL PROTECTED],
 karoly.kiripolszky wrote:

  and the great thing is that the algorithm can be used with any
  language that structures the code with brackets, like PHP and many
  others.

 But it fails if brackets appear in comments or literal strings.

 Ciao,
   Marc 'BlackJack' Rintsch


Re: C parsing fun

2007-02-05 Thread Claudio Grondi
Károly Kiripolszky wrote:
 You're right, thank you for the comment! I will look into how to
 avoid this.
And after you have resolved this 'small' ;-) detail, you will probably
notice that even some fully functional, widely used parsers still have
trouble with this ...

Claudio
 
 Marc 'BlackJack' Rintsch wrote:
 In [EMAIL PROTECTED],
 karoly.kiripolszky wrote:

 and the great thing is that the algorithm can be used with any
 language that structures the code with brackets, like PHP and many
 others.
 But it fails if brackets appear in comments or literal strings.

 Ciao,
  Marc 'BlackJack' Rintsch
 


Re: C parsing fun

2007-02-05 Thread Károly Kiripolszky
I've found a brute-force solution. In the preprocessing phase I simply
strip out the comments (things inside comments won't appear in the
result) and replace curly brackets with these symbols: #::OPEN::# and
#::CLOSE::#. After parsing I convert them back. In fact, I can exclude
commented-out lines from the analysis, as I only have to cope with
production code.
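Here is one possible reading of this preprocessing step, sketched in Python. The post does not say which brackets are replaced. The sketch assumes it is the ones inside string and character literals, since the structural braces must survive for the list conversion. `preprocess`, `restore` and the single regular expression are illustrative assumptions. Features such as C++ raw strings or macros are not handled.

```python
import re

OPEN, CLOSE = "#::OPEN::#", "#::CLOSE::#"

# Literals and comments in one alternation, so whichever starts first wins:
# "//" inside a string stays in the string, a quote inside a comment is
# removed together with the comment.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\\n])*"'   # string literal, with backslash escapes
    r"|'(?:\\.|[^'\\\n])*'"  # character literal
    r"|/\*.*?\*/"            # block comment (DOTALL lets it span lines)
    r"|//[^\n]*",            # line comment
    re.DOTALL,
)


def preprocess(code: str) -> str:
    """Delete comments and hide braces that live inside literals."""
    def _mask(match):
        token = match.group(0)
        if token.startswith("/"):
            return " "  # a comment: drop it, keep neighbouring tokens apart
        return token.replace("{", OPEN).replace("}", CLOSE)
    return TOKEN.sub(_mask, code)


def restore(text: str) -> str:
    """Undo the masking after parsing."""
    return text.replace(OPEN, "{").replace(CLOSE, "}")


src = 'void f() { puts("{"); /* } */ }  // }\n'
masked = preprocess(src)
print(masked.count("{"), masked.count("}"))  # 1 1
print(restore(masked), end="")
```

`restore` is a plain string replacement. A source that already contains `#::OPEN::#` or `#::CLOSE::#` would therefore come back altered, which is exactly the objection Roberto Bonvallet raises about this approach.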

Claudio Grondi wrote:
 Károly Kiripolszky wrote:
  You're right, thank you for the comment! I will look into how to
  avoid this.
 And after you have resolved this 'small' ;-) detail, you will probably
 notice that even some fully functional, widely used parsers still have
 trouble with this ...

 Claudio
 
  Marc 'BlackJack' Rintsch wrote:
  In [EMAIL PROTECTED],
  karoly.kiripolszky wrote:
 
  and the great thing is that the algorithm can be used with any
  language that structures the code with brackets, like PHP and many
  others.
  But it fails if brackets appear in comments or literal strings.
 
  Ciao,
 Marc 'BlackJack' Rintsch
 



Re: C parsing fun

2007-02-05 Thread Károly Kiripolszky
http://kiri.csing.hu/stack/python/bloppy-0.2.zip

Test data now also contains brackets in literal strings.

Claudio Grondi wrote:
 Károly Kiripolszky wrote:
  You're right, thank you for the comment! I will look into how to
  avoid this.
 And after you have resolved this 'small' ;-) detail, you will probably
 notice that even some fully functional, widely used parsers still have
 trouble with this ...

 Claudio
 
  Marc 'BlackJack' Rintsch wrote:
  In [EMAIL PROTECTED],
  karoly.kiripolszky wrote:
 
  and the great thing is that the algorithm can be used with any
  language that structures the code with brackets, like PHP and many
  others.
  But it fails if brackets appear in comments or literal strings.
 
  Ciao,
 Marc 'BlackJack' Rintsch
 
