On Aug 8, 10:59 am, Thomas Jollans <tho...@jollans.com> wrote:
> On 08/08/2010 04:06 PM, Νίκος wrote:
> > On 8 Aug, 15:40, Thomas Jollans <tho...@jollans.com> wrote:
> >> On 08/08/2010 01:41 PM, Νίκος wrote:
> >>> I was so dizzy and confused yesterday that I forgot to mention that
> >>> I need to remove not only the PHP opening and closing tags but also
> >>> whatever data lurks inside those tags, because now, with the
> >>> 'counter.py' script I wrote, the HTML files would open from there and
> >>> substitute the template variables like %(counter)d
> >>
> >> I could just hand you a solution, but I'll be a bit of a bastard and
> >> just give you some hints.
> >>
> >> You could use regular expressions. If you know regular expressions, it's
> >> relatively trivial - but I doubt you know regexp.
> >
> > Here is the code with some try-and-fail modifications I made, still
> > non-working, based on your hints:
> > ==========================================================
> >
> > id = 0  # unique page_id
> >
> > for currdir, files, dirs in os.walk('varsa'):
> >
> >     for f in files:
> >
> >         if f.endswith('php'):
> >
> >             # get abs path to filename
> >             src_f = join(currdir, f)
> >
> >             # open php src file
> >             print 'reading from %s' % src_f
> >             f = open(src_f, 'r')
> >             src_data = f.read()  # read contents of PHP file
> >             f.close()
> >
> >             # replace tags
> >             print 'replacing php tags and contents within'
> >             src_data = src_data.replace(r'<?.?>', '')  # the dot matches any character I hope! no matter how many of them?!?
>
> Two problems here:
>
> str.replace doesn't use regular expressions. You'll have to use the re
> module to use regexps (the re.sub function, to be precise).
>
> '.' matches a single character. Any character, but only one.
> '.*' matches as many characters as possible. This is not what you want,
> since it will match everything between the *first* <? and the *last* ?>.
> You want non-greedy matching.
>
> '.*?' is the same thing, without the greed.
>
> >             # add ID
> >             print 'adding unique page_id'
> >             src_data = ( '<!-- %d -->' % id ) + src_data
> >             id += 1
> >
> >             # add template variables
> >             print 'adding counter template variable'
> >             src_data = src_data + ''' <h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4> '''
> >             # I can think of this, but the above line must go above </body></html>, NOT after it. But how to write that?!?
>
> You will have to find the </body> tag before inserting the string.
> str.find should help; alternatively, you could use str.replace and
> replace the </body> tag with your counter line, plus a new </body>.
>
> >             # rename old php file to new with .html extension
> >             src_file = src_file.replace('.php', '.html')
> >
> >             # open newly created html file for inserting data
> >             print 'writing to %s' % dest_f
> >             dest_f = open(src_f, 'w')
> >             dest_f.write(src_data)  # write contents
> >             dest_f.close()
> >
> > This is the best I can do.
>
> No it's not. You're just giving up too soon.
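
For the </body> hint quoted above, here is a minimal sketch of the str.find approach. It is written for Python 3, and the helper name insert_before_body_close and the sample page are made up for illustration; it also assumes the closing tag is spelled exactly '</body>' in lowercase.

```python
def insert_before_body_close(html, snippet):
    """Insert snippet just before the first '</body>' tag.

    Hypothetical helper for illustration. If no '</body>' tag is found,
    the snippet is simply appended to the end of the document.
    """
    idx = html.find('</body>')
    if idx == -1:
        return html + snippet
    return html[:idx] + snippet + html[idx:]


# Made-up sample page and counter line, not taken from the real site.
page = '<html><body><p>hi</p></body></html>'
counter_line = '<h4>Visitors: %(counter)d</h4>'

result = insert_before_body_close(page, counter_line)
print(result)
```

The str.replace variant from the quoted hint would be roughly page.replace('</body>', counter_line + '</body>'), which does the same thing in one call as long as the page contains only one </body>.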
When replacing text in an HTML document with re.sub, you want to pass the re.S flag (also spelled re.DOTALL, it lets '.' match newlines as well); otherwise your pattern won't match when the opening tag is on one line and the closing tag is on another.

--
http://mail.python.org/mailman/listinfo/python-list
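
To make the difference concrete, here is a small sketch that puts the non-greedy pattern and the re.S flag together. It is written for Python 3; the name strip_php and the sample string are invented for the example, and it assumes that no PHP string literal inside a block contains '?>' itself, which a plain regexp cannot tell apart from a real closing tag.

```python
import re

# '<\?' and '\?>' match the literal PHP delimiters; '.*?' is the non-greedy
# "anything in between"; re.S lets '.' match newlines too.
PHP_BLOCK = re.compile(r'<\?.*?\?>', re.S)


def strip_php(src):
    """Remove every <? ... ?> block, including its contents (hypothetical helper)."""
    return PHP_BLOCK.sub('', src)


# Made-up sample: one PHP block spanning several lines, one on a single line.
sample = '<p>a</p><?php\necho "x";\n?><p>b</p><? echo 1; ?><p>c</p>'

cleaned = strip_php(sample)

# Same pattern without re.S: the multi-line block is left untouched.
no_flag = re.sub(r'<\?.*?\?>', '', sample)

# Greedy '.*' with re.S: everything from the first '<?' to the last '?>' goes.
greedy = re.sub(r'<\?.*\?>', '', sample, flags=re.S)

print(cleaned)
print(repr(no_flag))
print(greedy)
```

With the sample above, only the non-greedy pattern with re.S keeps all three paragraphs and drops both PHP blocks.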