On 2007-12-21, lex __ <[EMAIL PROTECTED]> wrote:
> I'm tryin to use regexp to replace multi-line c-style comments
> (like /* this /n */ ) with /n (newlines). I tried someting
> like re.sub('/\*(.*)/\*' , '/n' , file) but it doesn't
> work for multiple lines.
>
> besides that I want to keep all newlines as they were in the
> original file, so I can still use the original linenumbers (I
> want to use linenumbers as a reference for later use.) I know
> that that will complicate things a bit more, so this is a bit
> less important.
>
> background: I'm trying to create a 'intelligent' source-code
> security analysis tool for c/c++ , python and php files, but
> filtering the comments seems to be the biggest problem. :(
>
> So, if you have an answer to this , please let me know how to
> do this!

There are free C lexers and parsers available (e.g., gcc). I
recommend them to you. Gluing a real C parser into your Python code
might be easier than writing one.

Not that it's impossible to discover C comments with your own
special-purpose, simple parser (see Exercise 1-23 in K&R, _The C
Programming Language_, 2nd Edition), but it's not remotely doable
with a regex.

--
Neil Cerutti
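
For what it's worth, here is a minimal sketch of the kind of special-purpose scanner mentioned above, in the spirit of K&R Exercise 1-23. The function name strip_c_comments is made up for illustration. It assumes the whole file is already in memory as a str, and it ignores trigraphs, backslash-continued // comments and C++ raw string literals, so treat it as a starting point rather than a complete C lexer.

```python
def strip_c_comments(source):
    """Remove /* ... */ and // comments from C-like source text.

    Every newline is kept, so line numbers in the result match the input.
    String and character literals are copied through untouched.
    """
    out = []
    i = 0
    n = len(source)
    state = "code"  # one of: code, block, line, string, char
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "*":
                state = "block"
                out.append(" ")  # a comment acts as whitespace in C
                i += 2
                continue
            if ch == "/" and nxt == "/":
                state = "line"
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "block":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
            if ch == "\n":
                out.append(ch)  # keep newlines inside the comment
        elif state == "line":
            if ch == "\n":
                state = "code"
                out.append(ch)
        else:  # inside a string or character literal
            out.append(ch)
            if ch == "\\" and nxt:
                out.append(nxt)  # copy the escaped character as-is
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
        i += 1
    return "".join(out)


src = 'int a; /* one\ntwo */ int b;\nchar *s = "/* not a comment */"; // tail\n'
cleaned = strip_c_comments(src)
print(cleaned)
```

The result has the same number of newlines as the input, so a finding on line N of the cleaned text refers to line N of the original file. Text that looks like a comment inside a string literal is left alone, which is the case that tends to trip up a single regex.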