Re: Replace and inserting strings within .txt files with the use of regex
On 10 Aug, 01:43, MRAB wrote:
> Νίκος wrote:
> > D:\>convert.py
> >   File "D:\convert.py", line 34
> > SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
> > 34, but no encoding declared; see http://www.python.org/peps/pep-0263.html
> > for details
> >
> > D:\>
> >
> > What is it referring to? Which character cannot be identified?
> >
> > Line 34 is:
> >
> > src_data = src_data.replace( '', ' Αριθμός Επισκεπτών: %(counter)d ' )
>
> Didn't you say that you're using Python 2.7 now? The default file
> encoding will be ASCII, but your file isn't ASCII, it contains Greek
> letters. Add the encoding line:
>
> # -*- coding: utf-8 -*-
>
> and check that the file is saved as UTF-8.
>
> > Also,
> >
> > for currdir, files, dirs in os.walk('test'):
> >
> >     for f in files:

Actually it's "for currdir, dirs, files in os.walk('test'):", and that's why it couldn't run!! :-)

After changing this and making some other modifications, my conversion script finally runs! Here it is for anyone who might want similar functionality.
==
#!/usr/bin/python
# -*- coding: utf-8 -*-

import re, os, sys

count = 520

for currdir, dirs, files in os.walk('d:\\akis'):
    for f in files:
        if f.lower().endswith("php"):
            # get abs path to filename
            src_f = os.path.join(currdir, f)

            # open php src file
            f = open(src_f, 'r')
            src_data = f.read()
            f.close()

            # Grab the id number contained within the php code and insert it above all other data
            found = re.search( r'PageID = (\d+)', src_data )
            if found:
                id = found.group(1)
            else:
                count += 1      # "count =+ 1" would just assign +1 every time
                id = count
            src_data = ( '\n\n' % id ) + src_data

            # replace php tags and contents within
            src_data = re.sub( r'(?s)<\?(.*?)\?>', '', src_data )

            # add template variables
            src_data = src_data.replace( '', ' Αριθμός Επισκεπτών: %(counter)d ' )

            # open same php file for storing modified data
            f = open(src_f, 'w')
            f.write(src_data)
            f.close()

            # rename edited .php file to .html extension
            dst_f = src_f.replace('.php', '.html')
            os.rename( src_f, dst_f )
            print ( "renaming: %s => %s\n" % (src_f, dst_f) )
--
http://mail.python.org/mailman/listinfo/python-list
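For comparison, here is a minimal Python 3 sketch of the same conversion loop with the `os.walk()` unpacking order, the counter increment and the DOTALL regex all in place. The function name `convert_tree` and its `header_fmt`, `marker` and `replacement` parameters are hypothetical: the header comment text and the tag being replaced were lost from the archived post, so the caller supplies them. It also assumes the pages are UTF-8, and builds the new name with `os.path.splitext()` so that only the final extension changes, whereas `str.replace('.php', '.html')` would rewrite any `.php` elsewhere in the path.

```python
import os
import re

# (?s) makes '.' match newlines too, so multi-line <? ... ?> blocks are removed.
PHP_BLOCK = re.compile(r'(?s)<\?.*?\?>')
PAGE_ID = re.compile(r'PageID = (\d+)')


def convert_tree(root, header_fmt, marker, replacement, start_id=520):
    """Convert every .php file under root into an .html file.

    header_fmt  -- hypothetical header template with one %s for the page id
    marker      -- hypothetical text to be replaced by `replacement`
    Returns a list of (old_path, new_path) pairs.
    """
    count = start_id
    renamed = []
    # os.walk yields (dirpath, dirnames, filenames), in that order.
    for currdir, dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith('.php'):
                continue
            src_f = os.path.join(currdir, name)
            with open(src_f, 'r', encoding='utf-8') as fh:   # assumes UTF-8 pages
                src_data = fh.read()

            found = PAGE_ID.search(src_data)
            if found:
                page_id = found.group(1)
            else:
                count += 1            # "count =+ 1" would just assign +1
                page_id = str(count)

            src_data = (header_fmt % page_id) + src_data
            src_data = PHP_BLOCK.sub('', src_data)
            src_data = src_data.replace(marker, replacement)

            # Overwrite the .php file, then rename it, as the script above does.
            with open(src_f, 'w', encoding='utf-8') as fh:
                fh.write(src_data)
            dst_f = os.path.splitext(src_f)[0] + '.html'
            os.rename(src_f, dst_f)
            renamed.append((src_f, dst_f))
    return renamed
```

Renaming inside the loop is safe here because `os.walk()` has already listed a directory's files before yielding them.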
Re: Replace and inserting strings within .txt files with the use of regex
On 10 Aug, 18:12, MRAB wrote:
> Νίκος wrote:
>
> [snip]
>
> > The ID number of each php page was contained in the old php code
> > within this string
> >
> > PageID = some_number
> >
> > So instead of creating a new ID number for each page I have to pull out
> > this number and store it at the beginning of the file as a comment line,
> > because it has a direct relationship with the MySQL database, as in
> > tracking the number of each webpage and finding its counter.
> >
> > # Grab the PageID contained within the php code and store it in id variable
> > id = re.search( 'PageID = ', src_data )
> >
> > How do I tell Python to grab the number after the 'PageID = ' string and
> > store it in the variable id for later use in the program?
>
> If the part of the file you're trying to match looks like this:
>
>     PageID = 12
>
> then the regex should look like this:
>
>     PageID = (\d+)
>
> and the code should look like this:
>
>     page_id = re.search(r'PageID = (\d+)', src_data).group(1)
>
> The page_id will, of course, be a string.

Thank you very much for helping me with the syntax.

> > Also I made another change; would something like this work:
> >
> > ===
> > # open same php file for storing modified data
> > print ( 'writing to %s' % dest_f )
> > f = open(src_f, 'w')
> > f.write(src_data)
> > f.close()
> >
> > # rename edited .php file to .html extension
> > dst_f = src_f.replace('.php', '.html')
> > os.rename( src_f, dst_f )
> > ===
> >
> > Because instead of creating a new .html file and inserting the desired
> > data of the old php, thus having two files (old php and new html), I
> > decided to open the same php file for writing that data and then
> > rename it to html.
> > Would the above code work?
>
> Why wouldn't it?

I thought I had perhaps done something wrong with the code.
=
for currdir, files, dirs in os.walk('d:\\test'):    # 'd:/test' doesn't find the folder either
    for f in files:
        if f.lower().endswith("php"):
            print currdir, files, dirs, f
=

As you advised me in a previous post, I need to find out why the conversion code, although it works for a single file, for some reason doesn't enter folders and subfolders to grab files from there to convert. So, as you said, I commented out all other statements to find the culprit in the above lines.

Those lines are supposed to print the current working folder and files, but when I run the above code it gives me nothing in response, not even 'f'. So does that mean that the os.walk() method cannot enter Windows 7 folders?

* One more thing: instead of trying to run the above script from the 'cli', wouldn't it be better to run it as a CGI script and see the results in the browser, with the addition of this line?

print ( "Content-type: text/html; charset=UTF-8 \n" )

Or does this for some reason have to be run from the shell on both the local (Windows 7) and the remote hosting (Linux) servers?
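The empty output has a simple cause: `os.walk()` yields `(dirpath, dirnames, filenames)` tuples, so `for currdir, files, dirs in ...` binds the subfolder names to `files`. In a folder without subfolders that list is empty and the inner loop never runs. A small Python 3 check against a throwaway directory created with `tempfile`, purely for illustration:

```python
import os
import tempfile

# A throwaway folder holding one .php file and no subfolders.
root = tempfile.mkdtemp()
open(os.path.join(root, 'index.php'), 'w').close()

currdir, dirs, files = next(os.walk(root))
print('dirs :', dirs)     # dirs : []
print('files:', files)    # files: ['index.php']

# With the names swapped, the inner loop iterates over the (empty)
# list of subfolders, so nothing is ever collected or printed.
seen = []
for currdir, files_wrong, dirs_wrong in os.walk(root):
    for f in files_wrong:
        seen.append(f)
print('seen :', seen)     # seen : []
```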
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:

[snip]

> The ID number of each php page was contained in the old php code
> within this string
>
> PageID = some_number
>
> So instead of creating a new ID number for each page I have to pull out
> this number and store it at the beginning of the file as a comment line,
> because it has a direct relationship with the MySQL database, as in
> tracking the number of each webpage and finding its counter.
>
> # Grab the PageID contained within the php code and store it in id variable
> id = re.search( 'PageID = ', src_data )
>
> How do I tell Python to grab the number after the 'PageID = ' string and
> store it in the variable id for later use in the program?
>
If the part of the file you're trying to match looks like this:

    PageID = 12

then the regex should look like this:

    PageID = (\d+)

and the code should look like this:

    page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.

> Also I made another change; would something like this work:
>
> ===
> # open same php file for storing modified data
> print ( 'writing to %s' % dest_f )
> f = open(src_f, 'w')
> f.write(src_data)
> f.close()
>
> # rename edited .php file to .html extension
> dst_f = src_f.replace('.php', '.html')
> os.rename( src_f, dst_f )
> ===
>
> Because instead of creating a new .html file and inserting the desired
> data of the old php, thus having two files (old php and new html), I
> decided to open the same php file for writing that data and then
> rename it to html.
> Would the above code work?
>
Why wouldn't it?
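One caveat with the one-liner: `re.search()` returns `None` when the pattern is absent, and calling `.group(1)` on `None` raises `AttributeError`. A guarded Python 3 variant; the helper name `extract_page_id` and its `default` argument are illustrative, not from the thread:

```python
import re

PAGE_ID = re.compile(r'PageID = (\d+)')


def extract_page_id(src_data, default=None):
    """Return the number after 'PageID = ' as an int, or `default` if absent."""
    match = PAGE_ID.search(src_data)
    if match is None:
        return default
    # group(1) is a string such as '12'; convert it if arithmetic is needed.
    return int(match.group(1))


print(extract_page_id('<?php $PageID = 12; ?>'))   # 12
print(extract_page_id('<p>no id here</p>', 0))      # 0
```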
Re: Replace and inserting strings within .txt files with the use of regex
Please help me with these last changes before I try to perform the overall conversion. It's almost done!
Re: Replace and inserting strings within .txt files with the use of regex
On 10 Aug, 01:43, MRAB wrote:
> Νίκος wrote:
> > D:\>convert.py
> >   File "D:\convert.py", line 34
> > SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
> > 34, but no encoding declared; see http://www.python.org/peps/pep-0263.html
> > for details
> >
> > D:\>
> >
> > What is it referring to? Which character cannot be identified?
> >
> > Line 34 is:
> >
> > src_data = src_data.replace( '', ' Αριθμός Επισκεπτών: %(counter)d ' )
>
> Didn't you say that you're using Python 2.7 now? The default file
> encoding will be ASCII, but your file isn't ASCII, it contains Greek
> letters. Add the encoding line:
>
> # -*- coding: utf-8 -*-
>
> and check that the file is saved as UTF-8.
>
> > Also,
> >
> > for currdir, files, dirs in os.walk('test'):
> >
> >     for f in files:
> >
> >         if f.lower().endswith("php"):
> >
> > in the above lines, should I write os.walk('test') or os.walk('d:\test')?
>
> The path 'test' is relative to the current working directory. Is that
> D:\ for your script? If not, then it won't find the (correct) folder.
>
> It might be better to use an absolute path instead. You could use
> either:
>
>     r'd:\test'
>
> (note that I've made it a raw string because it contains a backslash
> which I want treated as a literal backslash) or:
>
>     'd:/test'
>
> (Windows should accept a slash as well as a backslash.)

I will try it as soon as I make another change that I missed:

The ID number of each php page was contained in the old php code within this string

PageID = some_number

So instead of creating a new ID number for each page I have to pull out this number and store it at the beginning of the file as a comment line, because it has a direct relationship with the MySQL database, as in tracking the number of each webpage and finding its counter.

# Grab the PageID contained within the php code and store it in id variable
id = re.search( 'PageID = ', src_data )

How do I tell Python to grab the number after the 'PageID = ' string and store it in the variable id for later use in the program?

Also I made another change; would something like this work:

===
# open same php file for storing modified data
print ( 'writing to %s' % dest_f )
f = open(src_f, 'w')
f.write(src_data)
f.close()

# rename edited .php file to .html extension
dst_f = src_f.replace('.php', '.html')
os.rename( src_f, dst_f )
===

Because instead of creating a new .html file and inserting the desired data of the old php, thus having two files (old php and new html), I decided to open the same php file for writing that data and then rename it to html.
Would the above code work?
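Two details in that snippet deserve a second look: `print ( 'writing to %s' % dest_f )` runs before any `dest_f` (or `dst_f`) exists, so it would raise `NameError`, and `src_f.replace('.php', '.html')` rewrites every `.php` in the path, including directory names. A Python 3 sketch of the same write-in-place-then-rename step; `rewrite_as_html` is a hypothetical helper name and UTF-8 output is an assumption:

```python
import os


def rewrite_as_html(src_f, src_data):
    """Overwrite src_f with src_data, then rename it to the same name ending in .html."""
    # Only the final extension is swapped, even if '.php' appears elsewhere in the path.
    dst_f = os.path.splitext(src_f)[0] + '.html'
    print('writing to %s' % src_f)
    with open(src_f, 'w', encoding='utf-8') as fh:
        fh.write(src_data)
    # On Windows os.rename raises an error if dst_f already exists;
    # os.replace would overwrite it instead.
    os.rename(src_f, dst_f)
    return dst_f
```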
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:
> D:\>convert.py
>   File "D:\convert.py", line 34
> SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
> 34, but no encoding declared; see http://www.python.org/peps/pep-0263.html
> for details
>
> D:\>
>
> What is it referring to? Which character cannot be identified?
>
> Line 34 is:
>
> src_data = src_data.replace( '', ' Αριθμός Επισκεπτών: %(counter)d ' )
>
Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

# -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.

> Also,
>
> for currdir, files, dirs in os.walk('test'):
>
>     for f in files:
>
>         if f.lower().endswith("php"):
>
> in the above lines, should I write os.walk('test') or os.walk('d:\test')?
>
The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

    r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

    'd:/test'

(Windows should accept a slash as well as a backslash.)
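The raw-string remark matters because in an ordinary string literal `\t` is a tab, so `'d:\test'` is not the path it looks like. A quick Python 3 illustration; the paths are only examples and need not exist:

```python
import os

plain = 'd:\test'     # '\t' is a TAB character here, not backslash + t
raw = r'd:\test'      # the backslash is kept literally
slashed = 'd:/test'   # Windows also accepts forward slashes

print(len(plain), len(raw))   # 6 7
print('\t' in plain)          # True

# A relative path like 'test' is resolved against the current working directory.
print(os.path.abspath('test') == os.path.join(os.getcwd(), 'test'))   # True
```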
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:
> On 9 Aug, 23:17, MRAB wrote:
>> Νίκος wrote:
>>> On 9 Aug, 21:05, Thomas Jollans wrote:
>>>> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>>>>> On 9 Aug, 19:21, Peter Otten <__pete...@web.de> wrote:
>>>>>> Νίκος wrote:
>>>>>>> Please tell me that no matter what weird characters it has inside, I can
>>>>>>> still open those files and make the necessary replacements.
>>>>>> Go back to 2.6 for the moment and defer learning about unicode until
>>>>>> you're done with the conversion job.
>>>>> You are correct again! 3.2 caused the problem; I switched to 2.7 and
>>>>> now I don't have that problem anymore. The file opens okay!
>>>>>
>>>>> It ALMOST converts correctly!
>>>>>
>>>>> # replace tags
>>>>> print ( 'replacing php tags and contents within' )
>>>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>>>>>
>>>>> It only converts the first instance of php tags and not the rest.
>>>>> But why?
>>>> http://docs.python.org/library/re.html#re.S
>>>>
>>>> You probably need to pass the re.DOTALL flag.
>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>>>
>>> like this?
>> re.sub doesn't accept a flags argument. You can put the flag inside the
>> regex itself like this:
>>
>>     src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)
>>
>> (Note that the abbreviation for re.DOTALL is re.S and the inline flag is
>> '(?s)'. This is for historical reasons! :-))
> This is for the '.' to match any character including '\n' too, right?
> So no matter if the php start tag and the end tag are on different lines,
> they will still be matched, correct?
>
> We need the 'raw' string as well? Why? The regex doesn't contain
> backslashes.
>
Yes it does; two of them!
Re: Replace and inserting strings within .txt files with the use of regex
D:\>convert.py
  File "D:\convert.py", line 34
SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line 34, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

D:\>

What is it referring to? Which character cannot be identified?

Line 34 is:

src_data = src_data.replace( '', ' Αριθμός Επισκεπτών: %(counter)d ' )

Also,

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.lower().endswith("php"):

in the above lines, should I write os.walk('test') or os.walk('d:\test')?
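The error message points to PEP 263: Python looks for the encoding declaration in the first two lines of the source file. Python 2 assumed ASCII when there was none, which is why 2.7 rejected the Greek text, while Python 3 assumes UTF-8. Python 3's standard library exposes the same lookup as `tokenize.detect_encoding()`; this sketch feeds it made-up source bytes to show what it reports:

```python
import io
import tokenize

samples = {
    'utf-8 declaration': b'#!/usr/bin/python\n# -*- coding: utf-8 -*-\nx = 1\n',
    'cp1253 declaration': b'# -*- coding: cp1253 -*-\nx = 1\n',
    'no declaration': b'x = 1\n',
}

for label, source in samples.items():
    # detect_encoding reads at most the first two lines through readline().
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    print(label, '->', encoding)
```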
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 23:28, MRAB wrote:
> Νίκος wrote:
> > On 9 Aug, 10:07, Νίκος wrote:
> >> Now the code looks as follows:
> >>
> >> =
> >> #!/usr/bin/python
> >>
> >> import re, os, sys
> >>
> >> id = 0 # unique page_id
> >>
> >> for currdir, files, dirs in os.walk('test'):
> >>
> >>     for f in files:
> >>
> >>         if f.endswith('php'):
>
> [snip]
>
> >> I just tried to test it. I created a folder named 'test' on my 'd:\'
> >> drive.
> >> Then I put two .php files from the original inside it, to test whether it
> >> works for those two files before acting on the whole copy and
> >> then on the original project.
> >>
> >> So I opened a 'cli' from my Win7 and tried
> >>
> >> D:\>convert.py
> >>
> >> D:\>
> >>
> >> It just printed an empty line and nothing else. Why didn't it even try to
> >> open the folder and the files within?
> >> Syntactically it doesn't give me an error!
> >> Something with the os.walk() method perhaps?
> >
> > Can you help with this too, please?
> >
> > Now I am able to convert just a single file, 'd:\test\index.php'.
> >
> > But this needs to be done for ALL the php files in every subfolder.
> >
> >> for currdir, files, dirs in os.walk('test'):
> >>
> >>     for f in files:
> >>
> >>         if f.endswith('php'):
> >
> > Should the above lines enter folders and find the php files in each folder
> > so they can be edited?
>
> I'd start by commenting-out the lines which change the files and then
> add some more print statements to see which files it's finding. That
> might give a clue. Only when it's fixed and finding the correct files
> would I remove the additional print statements and then restore the
> commented lines.

I did that, but it doesn't even get to the 'test' folder to search for the files!
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 23:17, MRAB wrote:
> Νίκος wrote:
> > On 9 Aug, 21:05, Thomas Jollans wrote:
> >> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
> >>
> >>> On 9 Aug, 19:21, Peter Otten <__pete...@web.de> wrote:
> >>>> Νίκος wrote:
> >>>>> Please tell me that no matter what weird characters it has inside, I can
> >>>>> still open those files and make the necessary replacements.
> >>>> Go back to 2.6 for the moment and defer learning about unicode until
> >>>> you're done with the conversion job.
> >>> You are correct again! 3.2 caused the problem; I switched to 2.7 and
> >>> now I don't have that problem anymore. The file opens okay!
> >>> It ALMOST converts correctly!
> >>> # replace tags
> >>> print ( 'replacing php tags and contents within' )
> >>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
> >>> It only converts the first instance of php tags and not the rest.
> >>> But why?
> >> http://docs.python.org/library/re.html#re.S
> >>
> >> You probably need to pass the re.DOTALL flag.
> >
> > src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
> >
> > like this?
>
> re.sub doesn't accept a flags argument. You can put the flag inside the
> regex itself like this:
>
>     src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)
>
> (Note that the abbreviation for re.DOTALL is re.S and the inline flag is
> '(?s)'. This is for historical reasons! :-))

This is for the '.' to match any character including '\n' too, right? So no matter if the php start tag and the end tag are on different lines, they will still be matched, correct?

We need the 'raw' string as well? Why? The regex doesn't contain backslashes.
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:
> On 9 Aug, 10:07, Νίκος wrote:
>> Now the code looks as follows:
>>
>> =
>> #!/usr/bin/python
>>
>> import re, os, sys
>>
>> id = 0 # unique page_id
>>
>> for currdir, files, dirs in os.walk('test'):
>>
>>     for f in files:
>>
>>         if f.endswith('php'):
>>
>> [snip]
>>
>> I just tried to test it. I created a folder named 'test' on my 'd:\'
>> drive.
>> Then I put two .php files from the original inside it, to test whether it
>> works for those two files before acting on the whole copy and
>> then on the original project.
>>
>> So I opened a 'cli' from my Win7 and tried
>>
>> D:\>convert.py
>>
>> D:\>
>>
>> It just printed an empty line and nothing else. Why didn't it even try to
>> open the folder and the files within?
>> Syntactically it doesn't give me an error!
>> Something with the os.walk() method perhaps?
>
> Can you help with this too, please?
>
> Now I am able to convert just a single file, 'd:\test\index.php'.
>
> But this needs to be done for ALL the php files in every subfolder.
>
>> for currdir, files, dirs in os.walk('test'):
>>
>>     for f in files:
>>
>>         if f.endswith('php'):
>
> Should the above lines enter folders and find the php files in each folder
> so they can be edited?
>
I'd start by commenting-out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue. Only when it's fixed and finding the correct files
would I remove the additional print statements and then restore the
commented lines.
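MRAB's suggestion amounts to a dry run: walk the tree and only report what would be converted, changing nothing. A Python 3 sketch; `find_php_files` is a hypothetical name, and the case-insensitive extension test follows Peter's later suggestion. Note that `os.walk()` stays silent when the root folder does not exist, so an empty result can also mean a wrong path.

```python
import os


def find_php_files(root):
    """Return the sorted paths of all .php files under root, without modifying anything."""
    found = []
    for currdir, dirs, files in os.walk(root):   # (dirpath, dirnames, filenames)
        for name in files:
            if name.lower().endswith('.php'):
                found.append(os.path.join(currdir, name))
    return sorted(found)
```

Running `for path in find_php_files('d:/test'): print(path)` and checking `os.path.isdir('d:/test')` first shows both whether the folder is found and which files would be touched.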
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:
> On 9 Aug, 21:05, Thomas Jollans wrote:
>> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>>> On 9 Aug, 19:21, Peter Otten <__pete...@web.de> wrote:
>>>> Νίκος wrote:
>>>>> Please tell me that no matter what weird characters it has inside, I can
>>>>> still open those files and make the necessary replacements.
>>>> Go back to 2.6 for the moment and defer learning about unicode until
>>>> you're done with the conversion job.
>>> You are correct again! 3.2 caused the problem; I switched to 2.7 and
>>> now I don't have that problem anymore. The file opens okay!
>>>
>>> It ALMOST converts correctly!
>>>
>>> # replace tags
>>> print ( 'replacing php tags and contents within' )
>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>>>
>>> It only converts the first instance of php tags and not the rest.
>>> But why?
>> http://docs.python.org/library/re.html#re.S
>>
>> You probably need to pass the re.DOTALL flag.
>
> src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>
> like this?
>
re.sub doesn't accept a flags argument. You can put the flag inside the
regex itself like this:

    src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons! :-))
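The effect of the inline `(?s)` flag can be checked on a made-up page containing one single-line and one multi-line PHP block. For what it's worth, `re.sub()` does accept a `flags` keyword argument since Python 2.7 and 3.1; what goes wrong is passing `re.DOTALL` as the fourth positional argument, which fills the `count` parameter instead and caps the number of replacements at 16 (the numeric value of `re.DOTALL`). A Python 3 sketch:

```python
import re

page = '<p>a</p><? echo 1; ?>\n<p>b</p><?php\n  $x = 2;\n?>\n<p>c</p>'

# Without DOTALL, '.' stops at '\n', so only the single-line block is removed.
single_line = re.sub(r'<\?(.*?)\?>', '', page)

# (?s) inline, or flags=re.DOTALL as a keyword argument, lets '.' cross newlines.
inline = re.sub(r'(?s)<\?(.*?)\?>', '', page)
keyword = re.sub(r'<\?(.*?)\?>', '', page, flags=re.DOTALL)

print(repr(single_line))
print(repr(inline))
print(inline == keyword)   # True
```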
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 10:07, Νίκος wrote:
> Now the code looks as follows:
>
> =
> #!/usr/bin/python
>
> import re, os, sys
>
> id = 0 # unique page_id
>
> for currdir, files, dirs in os.walk('test'):
>
>     for f in files:
>
>         if f.endswith('php'):
>
>             # get abs path to filename
>             src_f = join(currdir, f)
>
>             # open php src file
>             print ( 'reading from %s' % src_f )
>             f = open(src_f, 'r')
>             src_data = f.read()         # read contents of PHP file
>             f.close()
>
>             # replace tags
>             print ( 'replacing php tags and contents within' )
>             src_data = re.sub( '', '', src_data )
>
>             # add ID
>             print ( 'adding unique page_id' )
>             src_data = ( '' % id ) + src_data
>             id += 1
>
>             # add template variables
>             print ( 'adding counter template variable' )
>             src_data = src_data.replace('', ' Αριθμός Επισκεπτών: %(counter)d ' )
>
>             # rename old php file to new with .html extension
>             src_file = src_file.replace('.php', '.html')
>
>             # open newly created html file for inserting data
>             print ( 'writing to %s' % dest_f )
>             dest_f = open(src_f, 'w')
>             dest_f.write(src_data)      # write contents
>             dest_f.close()
>
> I just tried to test it. I created a folder named 'test' on my 'd:\'
> drive.
> Then I put two .php files from the original inside it, to test whether it
> works for those two files before acting on the whole copy and
> then on the original project.
>
> So I opened a 'cli' from my Win7 and tried
>
> D:\>convert.py
>
> D:\>
>
> It just printed an empty line and nothing else. Why didn't it even try to
> open the folder and the files within?
> Syntactically it doesn't give me an error!
> Something with the os.walk() method perhaps?

Can you help with this too, please?

Now I am able to convert just a single file, 'd:\test\index.php'.

But this needs to be done for ALL the php files in every subfolder.

> for currdir, files, dirs in os.walk('test'):
>
>     for f in files:
>
>         if f.endswith('php'):

Should the above lines enter folders and find the php files in each folder so they can be edited?
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 21:05, Thomas Jollans wrote:
> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>
> > On 9 Aug, 19:21, Peter Otten <__pete...@web.de> wrote:
> > > Νίκος wrote:
> > > > Please tell me that no matter what weird characters it has inside, I can
> > > > still open those files and make the necessary replacements.
> > >
> > > Go back to 2.6 for the moment and defer learning about unicode until
> > > you're done with the conversion job.
> >
> > You are correct again! 3.2 caused the problem; I switched to 2.7 and
> > now I don't have that problem anymore. The file opens okay!
> >
> > It ALMOST converts correctly!
> >
> > # replace tags
> > print ( 'replacing php tags and contents within' )
> > src_data = re.sub( '<\?(.*?)\?>', '', src_data )
> >
> > It only converts the first instance of php tags and not the rest.
> > But why?
>
> http://docs.python.org/library/re.html#re.S
>
> You probably need to pass the re.DOTALL flag.

src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )

like this?
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 20:29, John S wrote:
> When replacing text in an HTML document with re.sub, you want to use
> the re.S (singleline) option; otherwise your pattern won't match when
> the opening tag is on one line and the closing is on another.

That's exactly the problem I am facing now with this statement:

src_data = re.sub( '<\?(.*?)\?>', '', src_data )

You mean I have to change it like this?

src_data = re.S ( '<\?(.*?)\?>', '', src_data )
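`re.S` is not a function but a flag constant (an alias of `re.DOTALL`), so calling `re.S(...)` fails with a `TypeError`. The flag goes either into `re.compile()` or, as the `flags` keyword, into `re.sub()`. A Python 3 sketch with an invented input string, compiling the pattern once so it can be reused for every file:

```python
import re

# Compile once with the flag, then reuse the pattern object for every file.
php_block = re.compile(r'<\?(.*?)\?>', re.S)

text = 'before<?php\n echo "hi";\n?>after'
print(php_block.sub('', text))   # beforeafter
print(re.S == re.DOTALL)         # True
```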
Re: Replace and inserting strings within .txt files with the use of regex
On Monday 09 August 2010, it occurred to Νίκος to exclaim:
> On 9 Aug, 19:21, Peter Otten <__pete...@web.de> wrote:
> > Νίκος wrote:
> > > Please tell me that no matter what weird characters it has inside, I can
> > > still open those files and make the necessary replacements.
> >
> > Go back to 2.6 for the moment and defer learning about unicode until
> > you're done with the conversion job.
>
> You are correct again! 3.2 caused the problem; I switched to 2.7 and
> now I don't have that problem anymore. The file opens okay!
>
> It ALMOST converts correctly!
>
> # replace tags
> print ( 'replacing php tags and contents within' )
> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>
> It only converts the first instance of php tags and not the rest.
> But why?

http://docs.python.org/library/re.html#re.S

You probably need to pass the re.DOTALL flag.
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 19:21, Peter Otten <__pete...@web.de> wrote:
> Νίκος wrote:
> > Please tell me that no matter what weird characters it has inside, I can
> > still open those files and make the necessary replacements.
>
> Go back to 2.6 for the moment and defer learning about unicode until you're
> done with the conversion job.

You are correct again! 3.2 caused the problem; I switched to 2.7 and now I don't have that problem anymore. The file opens okay!

It ALMOST converts correctly!

# replace tags
print ( 'replacing php tags and contents within' )
src_data = re.sub( '<\?(.*?)\?>', '', src_data )

It only converts the first instance of php tags and not the rest. But why?
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:

> Please tell me that no matter what weird characters it has inside, I can still
> open those files and make the necessary replacements.

Go back to 2.6 for the moment and defer learning about unicode until you're done with the conversion job.
Re: Replace and inserting strings within .txt files with the use of regex
Please tell me that no matter what weird characters it has inside, I can still open those files and make the necessary replacements.
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:
> On 9 Aug, 16:52, MRAB wrote:
>> Νίκος wrote:
>>> On 8 Aug, 17:59, Thomas Jollans wrote:
>>>> Two problems here:
>>>>
>>>> str.replace doesn't use regular expressions. You'll have to use the re
>>>> module to use regexps. (the re.sub function to be precise)
>>>>
>>>> '.' matches a single character. Any character, but only one.
>>>> '.*' matches as many characters as possible. This is not what you want,
>>>> since it will match everything between the *first* .
>>>> You want non-greedy matching.
>>>>
>>>> '.*?' is the same thing, without the greed.
>>> Thank you,
>>>
>>> So I guess this needs to be written as:
>>>
>>> src_data = re.sub( '', '', src_data )
>> In a regex '?' is a special character, so if you want a literal '?' you
>> need to escape it. Therefore:
>>
>>     src_data = re.sub(r'<\?(.*?)\?>', '', src_data)
> I see, or perhaps even this:
>
> src_data = re.sub(r'', '', src_data)
>
> Maybe it works here as well.
>
No. That regex means that it should match:

# '>'
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 16:52, MRAB wrote:
> Νίκος wrote:
> > On 8 Aug, 17:59, Thomas Jollans wrote:
> >
> >> Two problems here:
> >>
> >> str.replace doesn't use regular expressions. You'll have to use the re
> >> module to use regexps. (the re.sub function to be precise)
> >>
> >> '.' matches a single character. Any character, but only one.
> >> '.*' matches as many characters as possible. This is not what you want,
> >> since it will match everything between the *first* .
> >> You want non-greedy matching.
> >>
> >> '.*?' is the same thing, without the greed.
> >
> > Thank you,
> >
> > So I guess this needs to be written as:
> >
> > src_data = re.sub( '', '', src_data )
>
> In a regex '?' is a special character, so if you want a literal '?' you
> need to escape it. Therefore:
>
>     src_data = re.sub(r'<\?(.*?)\?>', '', src_data)

I see, or perhaps even this:

src_data = re.sub(r'', '', src_data)

Maybe it works here as well.
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 13:47, Peter Otten <__pete...@web.de> wrote:
> Νίκος wrote:
> > On 9 Aug, 13:06, Peter Otten <__pete...@web.de> wrote:
> >
> >> > So since it's utf-8, what's the problem with opening it?
> >>
> >> Python says it's not, and I tend to believe it.
> >
> > You are right!
> >
> > I tried to do the exact same opening via the IDLE environment and I got
> > the encoding of the file from there!
> > >>> open("d:\\test\\index.php" ,'r')
> > <_io.TextIOWrapper name='d:\\test\\index.php' encoding='cp1253'>
> >
> > That's why the error in my previous post said
> > File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
> > it tried to use the cp1253 encoding.
> >
> > But now, since Python as we see can understand the nature of the
> > encoding, what is causing it not to open the file?
>
> It doesn't. You have to tell.

Why doesn't it? The IDLE response shows that it knows the file's encoding is "cp1253", which means it can identify it.

> *If* the file uses cp1253 you can open it with
>
> open(..., encoding="cp1253")
>
> Note that if the file is not in cp1253 python will still happily open it as
> long as it doesn't contain the following bytes:
>
> >>> for i in range(256):
> ...     try: chr(i).decode("cp1253") and None
> ...     except: print i
> ...
> 129
> 136
> 138
> 140
> 141
> 142
> 143
> 144
> 152
> 154
> 156
> 157
> 158
> 159
> 170
> 210
> 255
>
> Peter

I'm afraid it does, because when I tried:

f = open(src_f, 'r', encoding="cp1253" )

I got the same error again. What are those characters? Don't they belong to the same weird 'cp1253' encoding? Why can't the compiler open them?
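When a decode fails, the `UnicodeDecodeError` itself says where: its `start` attribute is the offset of the offending byte. A Python 3 sketch that reads the data as raw bytes and reports that position, which helps tell a wrongly chosen encoding from a single stray byte; `first_bad_byte` is an illustrative name and the path in the usage comment is only an example:

```python
def first_bad_byte(data, encoding):
    """Return (offset, byte_value) of the first byte `encoding` cannot decode, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# Usage with a real file:
# with open(r'd:\test\index.php', 'rb') as fh:
#     print(first_bad_byte(fh.read(), 'cp1253'))
```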
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:
> On 8 Aug, 17:59, Thomas Jollans wrote:
>
>> Two problems here:
>>
>> str.replace doesn't use regular expressions. You'll have to use the re
>> module to use regexps. (the re.sub function to be precise)
>>
>> '.' matches a single character. Any character, but only one.
>> '.*' matches as many characters as possible. This is not what you want,
>> since it will match everything between the *first* .
>> You want non-greedy matching.
>>
>> '.*?' is the same thing, without the greed.
>
> Thank you,
>
> So I guess this needs to be written as:
>
> src_data = re.sub( '', '', src_data )
>
In a regex '?' is a special character, so if you want a literal '?' you
need to escape it. Therefore:

    src_data = re.sub(r'<\?(.*?)\?>', '', src_data)

> The 'r' special char doesn't need to be inserted before the regex here,
> since the regex doesn't contain backslashes.
>
>> You will have to find the tag before inserting the string. str.find
>> should help -- or you could use str.replace and replace the tag with
>> your counter line, plus a new .
>
> Ah yes! Damn, why didn't I think of it? str.replace should do the trick.
> I was stuck trying to figure out regexes.
>
> So, I guess that should work:
>
> src_data = src_data.replace('', ' Αριθμός Επισκεπτών: %(counter)d ' )
>
>> No it's not. You're just giving up too soon.
>
> Yes, you're right; your hints keep me going, and thank you for that.
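Thomas's point about greed and MRAB's point about escaping the literal `?` can both be seen on a made-up line holding two PHP blocks. The `r` prefix matters as well: without it, `'\?'` is an invalid string escape that Python happens to leave alone, and recent Python 3 releases emit a warning for it. A Python 3 sketch:

```python
import re

line = 'A<? one ?>B<? two ?>C'

greedy = re.sub(r'<\?.*\?>', '', line)    # '.*' runs on to the LAST '?>'
lazy = re.sub(r'<\?.*?\?>', '', line)     # '.*?' stops at the FIRST '?>'

print(greedy)   # AC
print(lazy)     # ABC
```

The greedy version also swallows the `B` between the two blocks, which is exactly the "everything between the first and the last" behaviour described above.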
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote:

> On 9 Aug, 13:06, Peter Otten <__pete...@web.de> wrote:
>
>> > So since it's utf-8, what's the problem with opening it?
>>
>> Python says it's not, and I tend to believe it.
>
> You are right!
>
> I tried to do the exact same opening via the IDLE environment and I got
> the encoding of the file from there!
> >>> open("d:\\test\\index.php" ,'r')
> <_io.TextIOWrapper name='d:\\test\\index.php' encoding='cp1253'>
>
> That's why the error in my previous post said
> File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
> it tried to use the cp1253 encoding.
>
> But now, since Python as we see can understand the nature of the
> encoding, what is causing it not to open the file?

It doesn't. You have to tell.

*If* the file uses cp1253 you can open it with

open(..., encoding="cp1253")

Note that if the file is not in cp1253 python will still happily open it as
long as it doesn't contain the following bytes:

>>> for i in range(256):
...     try: chr(i).decode("cp1253") and None
...     except: print i
...
129
136
138
140
141
142
143
144
152
154
156
157
158
159
170
210
255

Peter
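Peter's loop is Python 2 code: there `chr(i)` is a one-byte string and `print i` is a statement. A Python 3 equivalent works on `bytes` directly; `undecodable_bytes` is an illustrative name:

```python
def undecodable_bytes(encoding):
    """Return the byte values 0-255 that `encoding` cannot decode on their own."""
    bad = []
    for i in range(256):
        try:
            bytes([i]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(i)
    return bad


print(undecodable_bytes('cp1253'))
```

These are simply the byte positions that cp1253 leaves unassigned, so a file containing any of them cannot be cp1253 text.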
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 13:06, Peter Otten <__pete...@web.de> wrote: > > So since it's UTF-8, what's the problem with opening it? > > Python says it's not, and I tend to believe it. You are right! I tried the exact same open() in the IDLE environment and got the encoding of the file from there! >>> open("d:\\test\\index.php" ,'r') <_io.TextIOWrapper name='d:\\test\\index.php' encoding='cp1253'> That's why the error in my previous post said File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode: it tried to use the cp1253 encoding. But now, since Python as we see can understand the nature of the encoding, what is causing it not to open the file? -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote: > On 9 Aug, 11:45, Peter Otten <__pete...@web.de> wrote: >> Νίκος wrote: >> > On 9 Aug, 10:38, Peter Otten <__pete...@web.de> wrote: >> >> Νίκος wrote: >> >> > Now the code looks as follows: >> >> > for currdir, files, dirs in os.walk('test'): >> >> > for f in files: >> >> > if f.endswith('php'): >> >> > # get abs path to filename >> >> > src_f = join(currdir, f) >> >> > I just tried to test it. I created a folder named 'test' on my 'd:\' >> >> > drive. >> >> > Then I put two .php files from the original in it, to test whether it >> >> > would work OK for those two files before acting on the whole copy >> >> > and afterwards on the original project. >> >> > So I opened a 'cli' on my Win7 and tried >> >> > D:\>convert.py >> >> > D:\> >> >> > It just printed an empty line and nothing else. Why didn't it even try >> >> > to open the folder and the files within? >> >> > Syntactically it doesn't give me an error! >> >> > Something with the os.walk() method perhaps? >> >> If there is a folder D:\test and it does contain some PHP files >> >> (double-check!) the extension could be upper-case. Try >> >> if f.lower().endswith("php"): ... >> >> or >> >> php_files = fnmatch.filter(files, "*.php") >> >> for f in php_files: ... >> >> Peter >> > The extension is in lower case. The folder is there, the php files are >> > there; I don't know why it doesn't want to go into d:\test to find >> > them. >> > That's one problem. >> > The other one is: >> > I made the code simpler by specifying the filename myself.
>> > = >> > # get abs path to filename >> > src_f = 'd:\\test\\index.php' >> > # open php src file >> > print ( 'reading from %s' % src_f ) >> > f = open(src_f, 'r') >> > src_data = f.read() # read contents of PHP file >> > f.close() >> > = >> > but although it now finds the file I get this error in the 'cli': >> > D:\>aconvert.py >> > reading from d:\test\index.php >> > Traceback (most recent call last): >> > File "D:\aconvert.py", line 16, in <module> >> > src_data = f.read() # read contents of PHP file >> > File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode >> > return codecs.charmap_decode(input,self.errors,decoding_table)[0] >> > UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position >> > 321: character maps to <undefined> >> > Something with the damn encodings again!! >> Hmm, at one point in this thread you switched from Python 2.x to Python >> 3.2. There are a lot of subtle and not so subtle differences between 2.x >> and 3.x, and I recommend that you stick to one while you are still in >> newbie mode. >> If you want to continue to use 3.x I recommend that you at least use the >> stable 3.1 version. >> Now one change from Python 2 to 3 is that open(filename, "r") gives you a >> beast that is unicode-aware and assumes that the file is encoded in utf-8 >> unless you tell it otherwise with open(..., encoding=whatever). So what >> is the charset used for your index.php? >> Peter > Yes, yesterday I switched to Python 3.2, Peter. > When I open index.php in Notepad++ it says it's UTF-8 without > BOM, and besides English characters it also contains Greek > characters for printing. > The file was made by my client in Dreamweaver. > So since it's UTF-8, what's the problem with opening it? Python says it's not, and I tend to believe it. You can open the file with open(..., errors="replace") but you will lose data (which is already garbled, anyway).
Again: in the unlikely case that Python is causing your problem -- you do understand what an alpha version is? Peter -- http://mail.python.org/mailman/listinfo/python-list
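What Peter means by losing data with `errors="replace"` can be seen without a file. A minimal sketch using `bytes.decode`, which accepts the same `errors` argument as `open()`; the Greek label is the counter text from this thread, deliberately encoded as UTF-8 and then decoded with the wrong codec:

```python
label = "Αριθμός Επισκεπτών"
raw = label.encode("utf-8")

# Wrong codec plus errors="replace": bytes cp1253 cannot map become U+FFFD,
# and the remaining bytes turn into unrelated characters.
garbled = raw.decode("cp1253", errors="replace")

print(garbled == label)               # False
print("\ufffd" in garbled)            # True
print(raw.decode("utf-8") == label)   # True: the right codec round-trips
```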
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 11:45, Peter Otten <__pete...@web.de> wrote: > Νίκος wrote: > > On 9 Aug, 10:38, Peter Otten <__pete...@web.de> wrote: > >> Νίκος wrote: > >> > Now the code looks as follows: > >> > for currdir, files, dirs in os.walk('test'): > >> > for f in files: > >> > if f.endswith('php'): > >> > # get abs path to filename > >> > src_f = join(currdir, f) > >> > I just tried to test it. I created a folder named 'test' on my 'd:\' > >> > drive. > >> > Then I put two .php files from the original in it, to test whether it > >> > would work OK for those two files before acting on the whole copy and > >> > afterwards on the original project. > >> > So I opened a 'cli' on my Win7 and tried > >> > D:\>convert.py > >> > D:\> > >> > It just printed an empty line and nothing else. Why didn't it even try to > >> > open the folder and the files within? > >> > Syntactically it doesn't give me an error! > >> > Something with the os.walk() method perhaps? > >> If there is a folder D:\test and it does contain some PHP files (double-check!) > >> the extension could be upper-case. Try > >> if f.lower().endswith("php"): ... > >> or > >> php_files = fnmatch.filter(files, "*.php") > >> for f in php_files: ... > >> Peter > > The extension is in lower case. The folder is there, the php files are > > there; I don't know why it doesn't want to go into d:\test to find > > them. > > That's one problem. > > The other one is: > > I made the code simpler by specifying the filename myself.
> > = > > # get abs path to filename > > src_f = 'd:\\test\\index.php' > > # open php src file > > print ( 'reading from %s' % src_f ) > > f = open(src_f, 'r') > > src_data = f.read() # read contents of PHP file > > f.close() > > = > > but although it now finds the file I get this error in the 'cli': > > D:\>aconvert.py > > reading from d:\test\index.php > > Traceback (most recent call last): > > File "D:\aconvert.py", line 16, in <module> > > src_data = f.read() # read contents of PHP file > > File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode > > return codecs.charmap_decode(input,self.errors,decoding_table)[0] > > UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position > > 321: character maps to <undefined> > > Something with the damn encodings again!! > Hmm, at one point in this thread you switched from Python 2.x to Python 3.2. > There are a lot of subtle and not so subtle differences between 2.x and 3.x, > and I recommend that you stick to one while you are still in newbie mode. > If you want to continue to use 3.x I recommend that you at least use the > stable 3.1 version. > Now one change from Python 2 to 3 is that open(filename, "r") gives you a > beast that is unicode-aware and assumes that the file is encoded in utf-8 > unless you tell it otherwise with open(..., encoding=whatever). So what is > the charset used for your index.php? > Peter Yes, yesterday I switched to Python 3.2, Peter. When I open index.php in Notepad++ it says it's UTF-8 without BOM, and besides English characters it also contains Greek characters for printing. The file was made by my client in Dreamweaver. So since it's UTF-8, what's the problem with opening it? -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote: > On 9 Aug, 10:38, Peter Otten <__pete...@web.de> wrote: >> Νίκος wrote: >> > Now the code looks as follows: >> > for currdir, files, dirs in os.walk('test'): >> > for f in files: >> > if f.endswith('php'): >> > # get abs path to filename >> > src_f = join(currdir, f) >> > I just tried to test it. I created a folder named 'test' on my 'd:\' >> > drive. >> > Then I put two .php files from the original in it, to test whether it >> > would work OK for those two files before acting on the whole copy and >> > afterwards on the original project. >> > So I opened a 'cli' on my Win7 and tried >> > D:\>convert.py >> > D:\> >> > It just printed an empty line and nothing else. Why didn't it even try to >> > open the folder and the files within? >> > Syntactically it doesn't give me an error! >> > Something with the os.walk() method perhaps? >> If there is a folder D:\test and it does contain some PHP files (double-check!) >> the extension could be upper-case. Try >> if f.lower().endswith("php"): ... >> or >> php_files = fnmatch.filter(files, "*.php") >> for f in php_files: ... >> Peter > The extension is in lower case. The folder is there, the php files are > there; I don't know why it doesn't want to go into d:\test to find > them. > That's one problem. > The other one is: > I made the code simpler by specifying the filename myself.
> = > # get abs path to filename > src_f = 'd:\\test\\index.php' > # open php src file > print ( 'reading from %s' % src_f ) > f = open(src_f, 'r') > src_data = f.read() # read contents of PHP file > f.close() > = > but although it now finds the file I get this error in the 'cli': > D:\>aconvert.py > reading from d:\test\index.php > Traceback (most recent call last): > File "D:\aconvert.py", line 16, in <module> > src_data = f.read() # read contents of PHP file > File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode > return codecs.charmap_decode(input,self.errors,decoding_table)[0] > UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position > 321: character maps to <undefined> > Something with the damn encodings again!! Hmm, at one point in this thread you switched from Python 2.x to Python 3.2. There are a lot of subtle and not so subtle differences between 2.x and 3.x, and I recommend that you stick to one while you are still in newbie mode. If you want to continue to use 3.x I recommend that you at least use the stable 3.1 version. Now one change from Python 2 to 3 is that open(filename, "r") gives you a beast that is unicode-aware and assumes that the file is encoded in utf-8 unless you tell it otherwise with open(..., encoding=whatever). So what is the charset used for your index.php? Peter -- http://mail.python.org/mailman/listinfo/python-list
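One detail worth checking against the traceback in this message: it comes from `cp1253.py`, not from a UTF-8 decoder. In Python 3.2 (and many later versions), `open()` without an `encoding` argument uses the locale's preferred encoding (`locale.getpreferredencoding(False)`), which on a Greek Windows installation is cp1253. If Notepad++ is right that the file is UTF-8 without BOM, passing the encoding explicitly should be enough. A sketch under that assumption; `read_text` is a hypothetical helper, and a temporary file stands in for `index.php`:

```python
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a whole text file, naming the encoding instead of relying on the locale default."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "index.php")
    # Write UTF-8 bytes, as Notepad++ reports for the real file.
    with open(path, "wb") as f:
        f.write("Αριθμός Επισκεπτών".encode("utf-8"))

    text = read_text(path)  # explicit UTF-8: decodes cleanly
    try:
        read_text(path, encoding="cp1253")  # what the locale default did
        cp1253_failed = False
    except UnicodeDecodeError:
        cp1253_failed = True

print(text == "Αριθμός Επισκεπτών")  # True
print(cp1253_failed)  # True: 'ρ' is CF 81 in UTF-8, and 0x81 is undefined in cp1253
```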
Re: Replace and inserting strings within .txt files with the use of regex
On 9 Aug, 10:38, Peter Otten <__pete...@web.de> wrote: > Νίκος wrote: > > Now the code looks as follows: > > for currdir, files, dirs in os.walk('test'): > > for f in files: > > if f.endswith('php'): > > # get abs path to filename > > src_f = join(currdir, f) > > I just tried to test it. I created a folder named 'test' on my 'd:\' > > drive. > > Then I put two .php files from the original in it, to test whether it > > would work OK for those two files before acting on the whole copy and > > afterwards on the original project. > > So I opened a 'cli' on my Win7 and tried > > D:\>convert.py > > D:\> > > It just printed an empty line and nothing else. Why didn't it even try to > > open the folder and the files within? > > Syntactically it doesn't give me an error! > > Something with the os.walk() method perhaps? > If there is a folder D:\test and it does contain some PHP files (double-check!) > the extension could be upper-case. Try > if f.lower().endswith("php"): ... > or > php_files = fnmatch.filter(files, "*.php") > for f in php_files: ... > Peter The extension is in lower case. The folder is there, the php files are there; I don't know why it doesn't want to go into d:\test to find them. That's one problem. The other one is: I made the code simpler by specifying the filename myself.
= # get abs path to filename src_f = 'd:\\test\\index.php' # open php src file print ( 'reading from %s' % src_f ) f = open(src_f, 'r') src_data = f.read() # read contents of PHP file f.close() = but although it now finds the file I get this error in the 'cli': D:\>aconvert.py reading from d:\test\index.php Traceback (most recent call last): File "D:\aconvert.py", line 16, in <module> src_data = f.read() # read contents of PHP file File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position 321: character maps to <undefined> Something with the damn encodings again!! -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote: > Now the code looks as follows: > for currdir, files, dirs in os.walk('test'): > for f in files: > if f.endswith('php'): > # get abs path to filename > src_f = join(currdir, f) > I just tried to test it. I created a folder named 'test' on my 'd:\' > drive. > Then I put two .php files from the original in it, to test whether it > would work OK for those two files before acting on the whole copy and > afterwards on the original project. > So I opened a 'cli' on my Win7 and tried > D:\>convert.py > D:\> > It just printed an empty line and nothing else. Why didn't it even try to > open the folder and the files within? > Syntactically it doesn't give me an error! > Something with the os.walk() method perhaps? If there is a folder D:\test and it does contain some PHP files (double-check!) the extension could be upper-case. Try if f.lower().endswith("php"): ... or php_files = fnmatch.filter(files, "*.php") for f in php_files: ... Peter -- http://mail.python.org/mailman/listinfo/python-list
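Peter's suggestions are worth combining with one more check, because the loop quoted here unpacks `os.walk()` as `currdir, files, dirs`. `os.walk()` yields `(dirpath, dirnames, filenames)` in that order, so with the names swapped the inner loop iterates over subdirectory names, and a folder without subdirectories produces no iterations at all, which would explain the silent run. A minimal sketch (the `find_php_files` helper is hypothetical) using the documented order and Peter's lower-case test; note that `fnmatch.filter` is case-insensitive only on platforms such as Windows, so the `lower()` test is the more portable of his two options. The demo builds a throwaway tree in a temporary directory:

```python
import os
import tempfile

def find_php_files(root):
    """Yield the full path of every .php file under root, ignoring extension case."""
    # os.walk() yields (dirpath, dirnames, filenames), in that order.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".php"):
                yield os.path.join(dirpath, name)

# Demo on a throwaway tree containing a.php, sub/b.PHP and c.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.php", os.path.join("sub", "b.PHP"), "c.txt"):
        open(os.path.join(root, rel), "w").close()
    found = sorted(os.path.basename(p) for p in find_php_files(root))

print(found)  # ['a.php', 'b.PHP']
```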
Re: Replace and inserting strings within .txt files with the use of regex
Now the code looks as follows: = #!/usr/bin/python import re, os, sys id = 0 # unique page_id for currdir, files, dirs in os.walk('test'): for f in files: if f.endswith('php'): # get abs path to filename src_f = join(currdir, f) # open php src file print ( 'reading from %s' % src_f ) f = open(src_f, 'r') src_data = f.read() # read contents of PHP file f.close() # replace tags print ( 'replacing php tags and contents within' ) src_data = re.sub( '', '', src_data ) # add ID print ( 'adding unique page_id' ) src_data = ( '' % id ) + src_data id += 1 # add template variables print ( 'adding counter template variable' ) src_data = src_data.replace('', ' Αριθμός Επισκεπτών: %(counter)d ' ) # rename old php file to new with .html extension src_file = src_file.replace('.php', '.html') # open newly created html file for inserting data print ( 'writing to %s' % dest_f ) dest_f = open(src_f, 'w') dest_f.write(src_data) # write contents dest_f.close() I just tried to test it. I created a folder named 'test' on my 'd:\' drive. Then I put two .php files from the original in it, to test whether it would work OK for those two files before acting on the whole copy and afterwards on the original project. So I opened a 'cli' on my Win7 and tried D:\>convert.py D:\> It just printed an empty line and nothing else. Why didn't it even try to open the folder and the files within? Syntactically it doesn't give me an error! Something with the os.walk() method perhaps? -- http://mail.python.org/mailman/listinfo/python-list
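For comparison, a hedged sketch of the same loop with several problems in the posted script fixed: `os.walk()` unpacked in its documented order, `os.path.join` instead of the never-imported `join`, a single variable for the destination name (the script assigns `src_file` without ever defining it, and prints `dest_f` before assigning it), and an explicit file encoding. The page-ID line and the counter line are left out because their exact markup was lost from this archive, and `convert_tree` is a hypothetical name. Unlike the script, it writes the `.html` file next to the original and leaves the `.php` file in place, which suits a trial run on a copy:

```python
import os
import re
import tempfile

# <? ... ?> blocks (this also covers <?php ... ?>), even across line breaks.
PHP_BLOCK = re.compile(r'<\?.*?\?>', re.DOTALL)

def convert_tree(root, encoding="utf-8"):
    """Strip PHP blocks from each .php file under root and save the result as .html.

    The original .php files are left untouched. Returns (source, destination) pairs.
    """
    converted = []
    for dirpath, dirnames, filenames in os.walk(root):  # documented order
        for name in filenames:
            if not name.lower().endswith(".php"):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.splitext(src)[0] + ".html"
            with open(src, encoding=encoding) as f:
                data = f.read()
            with open(dst, "w", encoding=encoding) as f:
                f.write(PHP_BLOCK.sub("", data))
            converted.append((src, dst))
    return converted

# Demo on a temporary tree with one PHP file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.php"), "w", encoding="utf-8") as f:
        f.write("<p>Hi</p><?php echo 1; ?>\n<?\n$x = 2;\n?><p>Bye</p>")
    pairs = convert_tree(root)
    with open(pairs[0][1], encoding="utf-8") as f:
        html = f.read()

print(len(pairs))   # 1
print(repr(html))   # '<p>Hi</p>\n<p>Bye</p>'
```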
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 17:59, Thomas Jollans wrote: > Two problems here: > > str.replace doesn't use regular expressions. You'll have to use the re > module to use regexps. (the re.sub function to be precise) > > '.' matches a single character. Any character, but only one. > '.*' matches as many characters as possible. This is not what you want, > since it will match everything between the *first* . > You want non-greedy matching. > > '.*?' is the same thing, without the greed. Thank you, so I guess this needs to be written as: src_data = re.sub( '', '', src_data ) The 'r' prefix doesn't need to be added before the regex here, since the regex doesn't contain any backslashes. > You will have to find the tag before inserting the string. > str.find should help -- or you could use str.replace and replace the > tag with your counter line, plus a new . Ah yes! Damn, why didn't I think of it? str.replace should do the trick. I was stuck trying to figure out regexes. So I guess this should work: src_data = src_data.replace('', ' Αριθμός Επισκεπτών: %(counter)d ' ) > No it's not. You're just giving up too soon. Yes, you are right, your hints keep me going, and thank you for that. -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
Νίκος wrote: Hello dear Pythoneers, I have over 500 .php web pages in various subfolders under the 'data' folder that I have to rename to .html, ditch the '' tags from within, and also insert a very first line of where id must be a unique identification number for every page, for counter-tracking purposes. Only pure HTML code must be left. Before I found out about Python I used PHP, and now I am switching to a templates + Python solution, so I have to change each and every page. I don't know how to handle such a big data-replacing problem and cannot play with fire, because those 500 pages are my clients' pages and the data in those files just cannot be messed up. Could you please provide me with a script that can perform such page-content replacement automatically? Thanks a million! This is quite a vague description of the file contents. But, for a completely different approach, how about using a browser and doing View Source, then saving the HTML that was generated? This will contain no PHP code, but it will contain the results of whatever the PHP was doing. If you don't have time to do this manually, look into wget or curl, which will do the job in a program environment. The discussion so far has dealt with stripping PHP and leaving the HTML. But the HTML must have embedded ?> in it. Or there could be long fragments of HTML which are constructed by PHP and then echoed. Joel Goldstick -- http://mail.python.org/mailman/listinfo/python-list
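Joel's wget/curl suggestion can also be done with Python's standard library. A rough sketch, assuming the pages are still being served by PHP somewhere: `save_rendered_page` is a hypothetical helper, and the URL you pass would be one of your own pages. It saves the response bytes exactly as the server sends them, so the page's encoding is preserved. The demo reads a local file through a `file://` URL only so that it runs without a network connection (a local file is, of course, not processed by PHP):

```python
import os
import pathlib
import tempfile
import urllib.request

def save_rendered_page(url, dest_path):
    """Fetch url and write the response body to dest_path unchanged; return its size."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    with open(dest_path, "wb") as f:
        f.write(body)
    return len(body)

# Offline demo: a file:// URL stands in for a real page URL on your server.
with tempfile.TemporaryDirectory() as tmpdir:
    source = pathlib.Path(tmpdir, "page.html")
    source.write_bytes(b"<html><body>rendered</body></html>")
    dest = os.path.join(tmpdir, "saved.html")
    size = save_rendered_page(source.as_uri(), dest)
    with open(dest, "rb") as f:
        saved = f.read()

print(size)  # 34
```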
Re: Replace and inserting strings within .txt files with the use of regex
On Aug 8, 10:59 am, Thomas Jollans wrote: > On 08/08/2010 04:06 PM, Νίκος wrote: > > On 8 Aug, 15:40, Thomas Jollans wrote: > >> On 08/08/2010 01:41 PM, Νίκος wrote: > > >>> I was so dizzy and confused yesterday that I forgot to mention that > > >>> not only do I need removal of the php opening and closing tags but of whatever > > >>> data lurks inside those tags as well, because now with the 'counter.py' > > >>> script I wrote, the html files would be opened from there and have the > > >>> template variables like %(counter)d substituted > > >> I could just hand you a solution, but I'll be a bit of a bastard and > >> just give you some hints. > > >> You could use regular expressions. If you know regular expressions, it's > >> relatively trivial - but I doubt you know regexp. > > Here is the code with some trial-and-error modifications I made based on > > your hints; it's still not working: > > == > > id = 0 # unique page_id > > for currdir, files, dirs in os.walk('varsa'): > > for f in files: > > if f.endswith('php'): > > # get abs path to filename > > src_f = join(currdir, f) > > # open php src file > > print 'reading from %s' % src_f > > f = open(src_f, 'r') > > src_data = f.read() # read contents of PHP file > > f.close() > > # replace tags > > print 'replacing php tags and contents within' > > src_data = src_data.replace(r'', '') # > > the dot matches any character i hope! no matter how many of them?!? > Two problems here: > str.replace doesn't use regular expressions. You'll have to use the re > module to use regexps. (the re.sub function to be precise) > '.' matches a single character. Any character, but only one. > '.*' matches as many characters as possible. This is not what you want, > since it will match everything between the *first* . > You want non-greedy matching. > '.*?' is the same thing, without the greed.
> > # add ID > > print 'adding unique page_id' > > src_data = ( '' % id ) + src_data > > id += 1 > > # add template variables > > print 'adding counter template variable' > > src_data = src_data + ''' Αριθμός > > Επισκεπτών: %(counter)d ''' > > # i can think of this but the above line must be above > body> NOT after but how to right that?!? > You will have to find the tag before inserting the string. > str.find should help -- or you could use str.replace and replace the > tag with your counter line, plus a new . > > # rename old php file to new with .html extension > > src_file = src_file.replace('.php', '.html') > > # open newly created html file for inserting data > > print 'writing to %s' % dest_f > > dest_f = open(src_f, 'w') > > dest_f.write(src_data) # write contents > > dest_f.close() > > This is the best I can do. > No it's not. You're just giving up too soon. When replacing text in an HTML document with re.sub, you want to use the re.S (single-line, also spelled re.DOTALL) option; otherwise your pattern won't match when the opening tag is on one line and the closing tag is on another. -- http://mail.python.org/mailman/listinfo/python-list
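The `re.S` advice is easy to verify. A small sketch with an invented sample page: without the flag, `.` stops at newlines, so a PHP block spread over several lines is left in place; with `re.S` (the same flag as `re.DOTALL`, or the inline `(?s)` at the start of the pattern) it is removed:

```python
import re

html = "<p>before</p>\n<?php\n  echo 'hi';\n?>\n<p>after</p>"

without_s = re.sub(r'<\?.*?\?>', '', html)
with_s = re.sub(r'<\?.*?\?>', '', html, flags=re.S)
inline_s = re.sub(r'(?s)<\?.*?\?>', '', html)  # same effect via the inline flag

print(without_s == html)  # True: nothing matched
print(repr(with_s))       # '<p>before</p>\n\n<p>after</p>'
```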
Re: Replace and inserting strings within .txt files with the use of regex
On 08/08/2010 04:06 PM, Νίκος wrote: > On 8 Aug, 15:40, Thomas Jollans wrote: >> On 08/08/2010 01:41 PM, Νίκος wrote: >>> I was so dizzy and confused yesterday that I forgot to mention that >>> not only do I need removal of the php opening and closing tags but of whatever >>> data lurks inside those tags as well, because now with the 'counter.py' >>> script I wrote, the html files would be opened from there and have the >>> template variables like %(counter)d substituted >> I could just hand you a solution, but I'll be a bit of a bastard and >> just give you some hints. >> You could use regular expressions. If you know regular expressions, it's >> relatively trivial - but I doubt you know regexp. > Here is the code with some trial-and-error modifications I made based on > your hints; it's still not working: > == > id = 0 # unique page_id > for currdir, files, dirs in os.walk('varsa'): > for f in files: > if f.endswith('php'): > # get abs path to filename > src_f = join(currdir, f) > # open php src file > print 'reading from %s' % src_f > f = open(src_f, 'r') > src_data = f.read() # read contents of PHP file > f.close() > # replace tags > print 'replacing php tags and contents within' > src_data = src_data.replace(r'', '') # > the dot matches any character i hope! no matter how many of them?!? Two problems here: str.replace doesn't use regular expressions. You'll have to use the re module to use regexps. (the re.sub function to be precise) '.' matches a single character. Any character, but only one. '.*' matches as many characters as possible. This is not what you want, since it will match everything between the *first* . You want non-greedy matching. '.*?' is the same thing, without the greed.
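Thomas's greedy versus non-greedy distinction, shown on a made-up page with two PHP blocks and some HTML between them (an illustrative sketch, not code from the thread):

```python
import re

page = "<?php a(); ?><p>keep me</p><?php b(); ?>"

# Greedy: '.*' runs on to the *last* '?>', swallowing the HTML in between.
greedy = re.sub(r'<\?.*\?>', '', page)
# Non-greedy: '.*?' stops at the first '?>' after each '<?'.
lazy = re.sub(r'<\?.*?\?>', '', page)

print(repr(greedy))  # ''
print(repr(lazy))    # '<p>keep me</p>'
```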
> # add ID > print 'adding unique page_id' > src_data = ( '' % id ) + src_data > id += 1 > # add template variables > print 'adding counter template variable' > src_data = src_data + ''' Αριθμός > Επισκεπτών: %(counter)d ''' > # i can think of this but the above line must be above body> NOT after but how to right that?!? You will have to find the tag before inserting the string. str.find should help -- or you could use str.replace and replace the tag with your counter line, plus a new . > # rename old php file to new with .html extension > src_file = src_file.replace('.php', '.html') > # open newly created html file for inserting data > print 'writing to %s' % dest_f > dest_f = open(src_f, 'w') > dest_f.write(src_data) # write contents > dest_f.close() > This is the best I can do. No it's not. You're just giving up too soon. -- http://mail.python.org/mailman/listinfo/python-list
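The tag name in Thomas's hint did not survive in this archive; the sketch below assumes he means the closing body tag, `</body>`, which fits the rest of the exchange. `insert_before_closing_body` is a hypothetical helper, and the counter markup is only an example. It uses `str.rfind`, so only the last occurrence is touched and the page comes back unchanged if the tag is missing; `html.replace("</body>", snippet + "</body>", 1)` is the `str.replace` variant, which acts on the first occurrence instead.

```python
def insert_before_closing_body(html, snippet, tag="</body>"):
    """Return html with snippet inserted just before the last occurrence of tag.

    The match is case-sensitive; if tag is absent, html is returned unchanged.
    """
    pos = html.rfind(tag)
    if pos == -1:
        return html
    return html[:pos] + snippet + html[pos:]

page = "<html><body><p>Hello</p></body></html>"
counter_line = "<p>Αριθμός Επισκεπτών: %(counter)d</p>"  # hypothetical markup

result = insert_before_closing_body(page, counter_line)
print(result.endswith(counter_line + "</body></html>"))  # True
```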
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 15:40, Thomas Jollans wrote: > On 08/08/2010 01:41 PM, Νίκος wrote: > > I was so dizzy and confused yesterday that I forgot to mention that > > not only do I need removal of the php opening and closing tags but of whatever > > data lurks inside those tags as well, because now with the 'counter.py' > > script I wrote, the html files would be opened from there and have the > > template variables like %(counter)d substituted > > I could just hand you a solution, but I'll be a bit of a bastard and > just give you some hints. > > You could use regular expressions. If you know regular expressions, it's > relatively trivial - but I doubt you know regexp. Here is the code with some trial-and-error modifications I made based on your hints; it's still not working: == id = 0 # unique page_id for currdir, files, dirs in os.walk('varsa'): for f in files: if f.endswith('php'): # get abs path to filename src_f = join(currdir, f) # open php src file print 'reading from %s' % src_f f = open(src_f, 'r') src_data = f.read() # read contents of PHP file f.close() # replace tags print 'replacing php tags and contents within' src_data = src_data.replace(r'', '') # the dot matches any character i hope! no matter how many of them?!? # add ID print 'adding unique page_id' src_data = ( '' % id ) + src_data id += 1 # add template variables print 'adding counter template variable' src_data = src_data + ''' Αριθμός Επισκεπτών: %(counter)d ''' # i can think of this but the above line must be above NOT after but how to right that?!? # rename old php file to new with .html extension src_file = src_file.replace('.php', '.html') # open newly created html file for inserting data print 'writing to %s' % dest_f dest_f = open(src_f, 'w') dest_f.write(src_data) # write contents dest_f.close() This is the best I can do. Sorry for any typos I might have made. Please shed some LIGHT! -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
On 08/08/2010 01:41 PM, Νίκος wrote: > I was so dizzy and confused yesterday that I forgot to mention that > not only do I need removal of the php opening and closing tags but of whatever > data lurks inside those tags as well, because now with the 'counter.py' > script I wrote, the html files would be opened from there and have the > template variables like %(counter)d substituted I could just hand you a solution, but I'll be a bit of a bastard and just give you some hints. You could use regular expressions. If you know regular expressions, it's relatively trivial - but I doubt you know regexp. You could also repeatedly find the next occurrence of first a start tag, then an end tag, using either str.find or str.split, and build up a version of the file without PHP yourself. > Also, before the of every html file, after removing the tags, this line must be > inserted (this holds the template variable that 'counter.py' uses to > produce data): > Αριθμός Επισκεπτών: %(counter)d This problem is truly trivial. I know you can do it yourself, or at least give it a good shot, and ask again when you hit a serious roadblock. If I may comment on your HTML: you forgot to close your and tags. Close them! Also, both (CENTER and FONT) have been deprecated since HTML 4.0 -- you should consider using CSS for these tasks instead. Also, this line does not look like a heading, so H4 is hardly fitting. > After making these modifications then I can trst the script on a COPY > of the original data on my PC. It would be nice if you re-read your posts before sending and tried to iron out some of the more careless spelling mistakes. Maybe you are doing your best to post in good English -- it isn't bad and I realize this is neither your native language nor alphabet, in which case I apologize. The fact of the matter is: I originally interpreted "trst" as "trust", which made no sense whatsoever. > *On my PC I run Windows 7 while the remote web hosting setup uses Linux > servers. > *That won't be a problem, right?
Nah. -- http://mail.python.org/mailman/listinfo/python-list
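Thomas's second suggestion, scanning with `str.find` instead of a regular expression, might look like the sketch below (`strip_php_blocks` is a hypothetical name). Like the regex versions, it would be fooled by a `?>` inside a PHP string literal, and it would also remove an `<?xml ...?>` declaration. An unclosed final block is dropped to the end of the text, since PHP likewise treats everything after an unmatched opening tag as code.

```python
def strip_php_blocks(text, start="<?", end="?>"):
    """Remove every start...end block using str.find; no regular expressions."""
    pieces = []
    pos = 0
    while True:
        open_at = text.find(start, pos)
        if open_at == -1:            # no more blocks: keep the rest
            pieces.append(text[pos:])
            break
        pieces.append(text[pos:open_at])
        close_at = text.find(end, open_at + len(start))
        if close_at == -1:           # unclosed block: drop it to the end of the text
            break
        pos = close_at + len(end)
    return "".join(pieces)

print(strip_php_blocks("a<?php x ?>b<? y ?>c"))  # abc
print(strip_php_blocks("a<?php echo 1;"))        # a
```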
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 13:13, Thomas Jollans wrote: > On 08/08/2010 11:21 AM, Νίκος wrote: > > Please help me adjust it if it needs extra modification for replacing > > more PHP tags. > > Have you tried it? I haven't, but I see no immediate reason why it > wouldn't work with multiple PHP blocks. > > #!/usr/bin/python > > import cgitb; cgitb.enable() > > import cgi, re, os > > print ( "Content-type: text/html; charset=UTF-8 \n" ) > > id = 0 # unique page_id > > for currdir, files, dirs in os.walk('data'): > > for f in files: > > if f.endswith('php'): > > # get abs path to filename > > src_f = join(currdir,f) > > # open php src file > > f = open(src_f, 'r') > > src_data = f.read() # read contents of PHP file > > f.close() > > print 'reading from %s' % src_f > > # replace tags > > src_data = src_data.replace('<%', '') > > src_data = src_data.replace('%>', '') > > Did you read the script before posting? ;-) > Here, you remove ASP-style tags. Which is fine, PHP supports them if you > configure it that way, but you probably didn't. Change this to the start > and end tags you actually use, and, if you use multiple forms (such as > > print 'replacing php tags' > > # add ID > > src_data = ( '' % id ) + src_data > > id += 1 > > print 'adding unique page_id' > > # create new file with .html extension > > src_file = src_file.replace('.php', '.html') > > # open newly created html file for insertid data > > dest_f = open(src_f, 'w') > > dest_f.write(src_data) # write contents > > dest_f.close() > > print 'writing to %s' % dest_f Yes, I have read the code very well, and by mistake I wrote '<%>' instead of ' of every html file, after removing the tags, this line must be inserted (this holds the template variable that 'counter.py' uses to produce data): Αριθμός Επισκεπτών: %(counter)d After making these modifications then I can trst the script on a COPY of the original data on my PC. *On my PC I run Windows 7 while the remote web hosting setup uses Linux servers.
*That won't be a problem, right? -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
On 08/08/2010 11:21 AM, Νίκος wrote: > Please help me adjust it if it needs extra modification for replacing more PHP tags. Have you tried it? I haven't, but I see no immediate reason why it wouldn't work with multiple PHP blocks. > #!/usr/bin/python > > import cgitb; cgitb.enable() > import cgi, re, os > > print ( "Content-type: text/html; charset=UTF-8 \n" ) > > > id = 0 # unique page_id > > for currdir, files, dirs in os.walk('data'): > > for f in files: > > if f.endswith('php'): > > # get abs path to filename > src_f = join(currdir,f) > > # open php src file > f = open(src_f, 'r') > src_data = f.read() # read contents of PHP file > f.close() > print 'reading from %s' % src_f > > # replace tags > src_data = src_data.replace('<%', '') > src_data = src_data.replace('%>', '') Did you read the script before posting? ;-) Here, you remove ASP-style tags. Which is fine, PHP supports them if you configure it that way, but you probably didn't. Change this to the start and end tags you actually use, and, if you use multiple forms (such as print 'replacing php tags' > > # add ID > src_data = ( '' % id ) + src_data > id += 1 > print 'adding unique page_id' > > # create new file with .html extension > src_file = src_file.replace('.php', '.html') > > # open newly created html file for insertid data > dest_f = open(src_f, 'w') > dest_f.write(src_data) # write contents > dest_f.close() > print 'writing to %s' % dest_f > -- http://mail.python.org/mailman/listinfo/python-list
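Thomas's sentence about handling several tag forms is cut off in this archive. As a hedged illustration of one way to cover both the PHP-style and the ASP-style delimiters in a single pass, here is a regex alternation; the pattern and the sample are examples, not code from the thread:

```python
import re

# Either a <? ... ?> block (including <?php ... ?>) or an ASP-style <% ... %> block.
CODE_BLOCK = re.compile(r'<\?.*?\?>|<%.*?%>', re.DOTALL)

sample = "<p>1</p><?php echo 1; ?><p>2</p><% echo 2; %><p>3</p>"
cleaned = CODE_BLOCK.sub('', sample)

print(cleaned)  # <p>1</p><p>2</p><p>3</p>
```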
Re: Replace and inserting strings within .txt files with the use of regex
On 08/08/2010 04:46 AM, rantingrick wrote: > *facepalm*! I really must stop Usenet-ing whilst consuming large > volumes of alcoholic beverages. THAT explains a lot. Cheers -- http://mail.python.org/mailman/listinfo/python-list
Re: Replace and inserting strings within .txt files with the use of regex
Script so far:

#!/usr/bin/python

import cgitb; cgitb.enable()
import cgi, re, os

print ( "Content-type: text/html; charset=UTF-8 \n" )


id = 0  # unique page_id

for currdir, files, dirs in os.walk('data'):

    for f in files:

        if f.endswith('php'):

            # get abs path to filename
            src_f = join(currdir,f)

            # open php src file
            f = open(src_f, 'r')
            src_data = f.read()         # read contents of PHP file
            f.close()
            print 'reading from %s' % src_f

            # replace tags
            src_data = src_data.replace('<%', '')
            src_data = src_data.replace('%>', '')
            print 'replacing php tags'

            # add ID
            src_data = ( '' % id ) + src_data
            id += 1
            print 'adding unique page_id'

            # create new file with .html extension
            src_file = src_file.replace('.php', '.html')

            # open newly created html file for insertid data
            dest_f = open(src_f, 'w')
            dest_f.write(src_data)      # write contents
            dest_f.close()
            print 'writing to %s' % dest_f

Please help me adjust it if it needs extra modifications to replace more PHP tags.
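The script above has several bugs: `os.walk()` yields `(dirpath, dirnames, filenames)`, so `files, dirs` are swapped; `join` is never imported; `src_file` is never defined; and the output is written back over the `.php` file instead of to a new `.html` file. A minimal Python 3 sketch with those fixed, under stated assumptions: the first-line format is unknown (it did not survive in the archive), so `HEADER` is a hypothetical placeholder; the PHP markers in `PHP_MARKERS` are assumed; and the pages are assumed to be UTF-8 (pass e.g. `encoding="iso-8859-7"` if they are not). As a design choice, the `.php` originals are left untouched and a new `.html` file is written next to each one.

```python
import os

# Hypothetical first-line format: the real one did not survive in the archive.
HEADER = "<!-- %d -->\n"
# Assumed set of PHP markers; "<?php" before "<?" so it is removed whole.
PHP_MARKERS = ("<?php", "<?", "?>")


def convert_tree(root, start_id=0, encoding="utf-8"):
    """Write a .html copy of every .php file under root; return the count."""
    page_id = start_id
    # os.walk() yields (dirpath, dirnames, filenames): directories come second.
    for currdir, dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".php"):
                continue
            src_path = os.path.join(currdir, name)
            with open(src_path, encoding=encoding) as src:
                data = src.read()
            for marker in PHP_MARKERS:
                data = data.replace(marker, "")
            data = (HEADER % page_id) + data
            page_id += 1
            # New file next to the original; the .php file is left untouched.
            dst_path = os.path.splitext(src_path)[0] + ".html"
            with open(dst_path, "w", encoding=encoding) as dst:
                dst.write(data)
            print("writing to %s" % dst_path)
    return page_id - start_id
```

Once the `.html` output has been checked, the `.php` files can be removed in a separate step.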
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 11:09, Steven D'Aprano wrote:
> On Sat, 07 Aug 2010 17:20:24 -0700, Νίκος wrote:
> > I don't know how to handle such a big data replacing problem and cannot
> > play with fire because those 500 pages are my cleints pages and data of
> > those filesjust cannot be messes up.
>
> Take a backup copy of the files, and only edit the copies. Don't replace
> the originals until you know they're correct.
>
> --
> Steven

Yes, of course, but the code that John S provided needs some modification in order to handle several instances of PHP tags, not only one set.
Re: Replace and inserting strings within .txt files with the use of regex
On Sat, 07 Aug 2010 17:20:24 -0700, Νίκος wrote:
> I don't know how to handle such a big data replacing problem and cannot
> play with fire because those 500 pages are my cleints pages and data of
> those filesjust cannot be messes up.

Take a backup copy of the files, and only edit the copies. Don't replace
the originals until you know they're correct.

--
Steven
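Steven's advice of taking a backup copy and editing only the copies can be automated. Here is a minimal Python 3 sketch, assuming a hypothetical layout where backups go into a separate `backup_root` folder with a timestamped name. `shutil.copytree()` refuses to copy into an existing directory, so an earlier backup is never overwritten.

```python
import os
import shutil
import time


def backup_tree(src, backup_root):
    """Copy the folder src to backup_root/<name>-<timestamp>; return that path.

    backup_root should live outside src, otherwise the copy may end up
    containing itself. copytree() raises FileExistsError instead of
    overwriting, e.g. if two backups are taken within the same second.
    """
    name = os.path.basename(os.path.abspath(src))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = os.path.join(backup_root, "%s-%s" % (name, stamp))
    shutil.copytree(src, dst)
    return dst


# Usage (hypothetical paths): keep the original, run the conversion on the copy.
# work_dir = backup_tree("data", "backups")
```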
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 05:56, John S wrote:
> "How can I use RE string replacement to find PHP tags and convert them
> to Django template tags?"

No, not at all John, at least not yet! I have only been learning Python for 1 week (switching from PHP & Perl), so I'm very fresh at this beautiful and straightforward language.

When I have a good understanding of Python I will proceed to Django templates. Until then my Python templates will be only 'simple HTML files'; the only thing they contain apart from the HTML data will be the special string formatting identifiers '%s' :-)
Re: Replace and inserting strings within .txt files with the use of regex
On 8 Aug, 05:42, John S wrote:
> If the 500 web pages are PHP only in the sense that there is only one
> pair of tags in each file, surrounding the entire content, then
> what you ask for is doable.

First of all, thank you very much John for your BIG effort to help me (I'm still reading your posts)!

I have to tell you that those PHP files contain several instances of PHP opening and closing tags (about 3 in each PHP file). The rest is pure HTML data. That happened because those files were originally HTML-only files that later had to be converted to PHP, due to some dynamic code that was needed to address some issues.

Please tell me that the code you provided can be adjusted to handle several instances as well!
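`str.replace()` already replaces every occurrence, so a marker-stripping approach is not limited to one pair of tags per file. The real issue with several PHP blocks is that stripping only the markers leaves the PHP code behind as visible text. If the code between the tags should disappear as well, a regular expression can remove each whole block. Here is a minimal Python 3 sketch, assuming every block starts with `<?` and ends at the next `?>`. The non-greedy `.*?` stops at the first `?>`, and `re.DOTALL` lets a block span several lines.

```python
import re

# One PHP block: "<?" up to the nearest "?>", across newlines.
PHP_BLOCK = re.compile(r"<\?.*?\?>", re.DOTALL)


def remove_php_blocks(src):
    """Delete every PHP block, tags and code, however many a file contains."""
    return PHP_BLOCK.sub("", src)


page = "<h1>Hi</h1>\n<?php\n$x = 1;\n?>\n<p>text</p><? echo $x; ?>"
print(remove_php_blocks(page))
```

Caveats: a `?>` inside a PHP string literal ends the match early, and `<?xml ... ?>` declarations are removed too. Any output the PHP code produced is also lost, so review pages whose PHP generated visible content.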
Re: Replace and inserting strings within .txt files with the use of regex
Even though I just replied above, in reading over the OP's message, I
think the OP might be asking:

"How can I use RE string replacement to find PHP tags and convert them
to Django template tags?"

Instead of saying

    source_contents = source_contents.replace(...)

say this instead:

import re

def replace_php_tags(m):
    ''' PHP tag replacer
    This function is called for each PHP tag. It gets a Match object as
    its parameter, so you can get the contents of the old tag, and should
    return the new (Django) tag.
    '''

    # m is the match object from the current match
    php_guts = m.group(1)  # the contents of the PHP tag

    # now put the replacement logic here

    # and return whatever should go in place of the PHP tag,
    # which could be '{{ python_template_var }}'
    # or '{% template logic ... %}'
    # or some combination

source_contents = re.sub('',replace_php_tags,source_contents)
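The regex pattern in the `re.sub()` call above did not survive in the archive, and the replacement logic is left open. Here is a filled-in Python 3 sketch of the same callback technique. Both the pattern and the conversion rule are assumptions for illustration. The rule turns a bare `echo $var` (or `<?= $var ?>`) into a `{{ var }}` template variable, and wraps anything else in a Django `{% comment %}` block so it can be rewritten by hand.

```python
import re

# Assumed pattern: "<?", "<?php" or "<?=", then the tag body (group 1), then "?>".
PHP_TAG = re.compile(r"<\?(?:php|=)?(.*?)\?>", re.DOTALL)
# Assumed "simple" case: optional echo, one $variable, optional semicolon.
ECHO_VAR = re.compile(r"^\s*(?:echo\s+)?\$(\w+)\s*;?\s*$")


def replace_php_tags(m):
    """Called once per PHP tag; returns the text that replaces it."""
    php_guts = m.group(1)            # the contents of the PHP tag
    simple = ECHO_VAR.match(php_guts)
    if simple:
        return "{{ %s }}" % simple.group(1)
    # Anything else is kept for manual conversion.
    return "{%% comment %%}PHP:%s{%% endcomment %%}" % php_guts


src = "<p>Hello <?php echo $name; ?></p><?= $count ?><? if ($x) { ?>"
print(PHP_TAG.sub(replace_php_tags, src))
```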
Re: Replace and inserting strings within .txt files with the use of regex
On Aug 7, 8:42 pm, MRAB wrote:
> That should be:
>
>      data = data.replace('<?', '')
>      data = data.replace('?>', '')

Yes, Thanks MRAB. I did forget that important detail.

> Strings don't have an 'insert' method!

*facepalm*! I really must stop Usenet-ing whilst consuming large
volumes of alcoholic beverages.
Re: Replace and inserting strings within .txt files with the use of regex
On Aug 7, 8:20 pm, Νίκος wrote:
> Hello dear Pythoneers,
>
> I have over 500 .php web pages in various subfolders under 'data'
> folder that i have to rename to .html and and ditch the '<?' and '?>'
> tages from within and also insert a very first line of [...]
> where id must be an identification unique number of every page for
> counter tracking purposes. ONly pure html code must be left.
>
> I before find otu Python used php and now iam switching to templates +
> python solution so i ahve to change each and every page.
>
> I don't know how to handle such a big data replacing problem and
> cannot play with fire because those 500 pages are my cleints pages and
> data of those filesjust cannot be messes up.
>
> Can you provide to me a script please that is able of performing an
> automatic way of such a page content replacing?
>
> Thanks a million!

If the 500 web pages are PHP only in the sense that there is only one
pair of tags in each file, surrounding the entire content, then
what you ask for is doable.

from os.path import join
import os

id = 1  # id number
for currdir,files,dirs in os.walk('data'):
    for f in files:
        if f.endswith('php'):
            source_file_name = join(currdir,f)  # get abs path to filename
            source_file = open(source_file_name)
            source_contents = source_file.read()  # read contents of PHP file
            source_file.close()

            # replace tags
            source_contents = source_contents.replace('<%','')
            source_contents = source_contents.replace('%>','')

            # add ID
            source_contents = ( '' % id ) + source_contents
            id += 1

            # create new file with .html extension
            source_file_name = source_file_name.replace('.php','.html')
            dest_file = open(source_file_name,'w')
            dest_file.write(source_contents)  # write contents
            dest_file.close()

Note: error checking left out for clarity.

On the other hand, if your 500 web pages contain embedded PHP variables
or logic, you have a big job ahead. Django templates and PHP are two
different languages for embedding data and logic in web pages.
Converting a project from PHP to Django involves more than renaming the
template files and deleting [...]

In Django, you would typically put this logic in a Django *view* (which
btw is not what is called a 'view' in MVC terms), which is the code that
prepares data for the template. The logic would not live with the HTML.
The template uses "template variables" that the view has associated with
a Python variable or function. You might create a template variable
(created via a Context object) named 'browser' that contains a value that
identifies the browser.

Thus, your Python template (HTML file) might look like this:

{% if browser == 'IE' %}You are using Internet Explorer{% endif %}

PHP tends to combine the presentation with the business logic, or in MVC
terms, combines the view with the controller. Django separates them out,
which many people find to be a better way. The person who writes the HTML
doesn't have to speak Python, but only needs to know the names of template
variables and a little bit of template logic. In PHP, the HTML code and
all the business logic live in the same files. Even here, it would
probably make sense to calculate the browser ID in the header of the HTML
file, then access it via a variable in the body.

If you have 500 static web pages that are part of the same application,
but that do not contain any logic, your application might need to be
redesigned.

Also, you are doing your changes on a COPY of the application on a
non-public server, aren't you? If not, then you really are playing with
fire.

HTH,
John
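The view/template split John describes can be illustrated without Django, using the plain `%`-formatted HTML templates the OP plans to use. Here is a minimal Python 3 sketch. The "view" does the logic and builds a context dict, and the template contains only a placeholder. `detect_browser()` and its rule (treating "MSIE" or "Trident" in the User-Agent as Internet Explorer) are illustrative assumptions. In Django itself the `{% if %}` could stay in the template. Plain `%` formatting has no conditionals, so here the decision moves into the view.

```python
# The template holds only placeholders; no logic lives in the HTML.
TEMPLATE = "<html><body><p>%(browser_note)s</p></body></html>"


def detect_browser(user_agent):
    """Illustrative rule only: classify a User-Agent string."""
    if "MSIE" in user_agent or "Trident" in user_agent:
        return "IE"
    return "other"


def view(user_agent):
    """Prepare the data (the 'context') and render the template with it."""
    browser = detect_browser(user_agent)
    context = {
        "browser_note": "You are using Internet Explorer" if browser == "IE" else "",
    }
    return TEMPLATE % context


print(view("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"))
```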
Re: Replace and inserting strings within .txt files with the use of regex
# rename ALL php files to html in every subfolder of the folder 'data'
os.rename('*.php', '*.html')  # how do I tell Python to rename ALL php files to html in ALL subfolders under 'data'?

# current path of the file to be processed
path = './data'  # I feel this must somehow be in a loop that reads every file of every subfolder

# open an html file for reading
f = open(path, 'rw')

# read the contents of the whole file
data = f.read()

# replace all php tags with empty string
data = data.replace('', '')

# write replaced data to file
data = f.write()

# insert an increasing unique integer number at the very first line of every html file processed
comment = ""%(idnum)  # how will the number here increase by one, file after file?

f = f.close()

Please help, I'm new to Python; apart from the syntax, it's a logic problem as well and needs experience.
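Two questions in the comments above have short answers. First, `os.rename()` does not expand wildcards such as `'*.php'`, so the files have to be found first; `pathlib.Path.rglob()` searches every subfolder. Second, `enumerate()` supplies a number that grows by one per file. Here is a minimal Python 3 sketch; `rename_php_to_html` is a hypothetical helper name. Note that on Linux `Path.rename()` silently replaces an existing target, so the sketch refuses to overwrite an existing `.html` file. Also, `rglob("*.php")` is case-sensitive on Linux.

```python
from pathlib import Path


def rename_php_to_html(root, start=1):
    """Rename root/**/*.php to .html; return a list of (number, new_path)."""
    # sorted() gives a stable numbering order and finishes the directory
    # scan before anything is renamed.
    php_files = sorted(Path(root).rglob("*.php"))
    renamed = []
    for number, path in enumerate(php_files, start=start):
        new_path = path.with_suffix(".html")
        if new_path.exists():
            raise FileExistsError(new_path)
        path.rename(new_path)
        renamed.append((number, new_path))
    return renamed
```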
Re: Replace and inserting strings within .txt files with the use of regex
rantingrick wrote:
> On Aug 7, 7:20 pm, Νίκος wrote:
>> Hello dear Pythoneers,
>
> I prefer Pythonista, but anywho..
>
>> I have over 500 .php web pages in various subfolders under 'data'
>> folder that i have to rename to .html
>
> import os
> os.rename(old, new)
>
>> and and ditch the '<?' and '?>' tages from within
>
> path = 'some/valid/path'
> f = open(path, 'r')
> data = f.read()
> f.close()
> data.replace('<?', '')
> data.replace('?>', '')
>
That should be:

     data = data.replace('<?', '')
     data = data.replace('?>', '')

>> and also insert a very first line of [...]
>> where id must be an identification unique number of every page for
>> counter tracking purposes.
>
> comment = ""%(idnum)
> data.insert(idx, comment)
>
Strings don't have an 'insert' method!

>> ONly pure html code must be left.
>
> Well then don't F up! However judging from the amount of typos in this
> post i would suggest you do some major testing!
>
>> I don't know how to handle such a big data replacing problem and
>> cannot play with fire because those 500 pages are my cleints pages and
>> data of those files just cannot be messes up.
>
> Better do some serous testing first, or (if you have enough disc space
> ) create copies instead!
>
>> Can you provide to me a script please that is able of performing an
>> automatic way of such a page content replacing?
>
> This is very basic stuff and the fine manual is free you know. But how
> much are you willing to pay?
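MRAB's two corrections both follow from Python strings being immutable. `replace()` returns a new string, so its result must be assigned back. And since there is no `insert()` method, you build a new string with slicing or concatenation instead. Here is a minimal Python 3 sketch; `insert_at` is a hypothetical helper, and the `<!-- %d -->` header format is a placeholder, since the real one did not survive in the archive.

```python
data = "<p><? echo 1; ?></p>"

data.replace("<?", "")            # result discarded: data is unchanged
assert data == "<p><? echo 1; ?></p>"

data = data.replace("<?", "")     # rebinding keeps the new string
data = data.replace("?>", "")


def insert_at(text, idx, piece):
    """Return a copy of text with piece inserted at position idx."""
    return text[:idx] + piece + text[idx:]


comment = "<!-- %d -->\n" % 7       # hypothetical header format
data = insert_at(data, 0, comment)  # same as comment + data for idx 0
print(data)
```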
Re: Replace and inserting strings within .txt files with the use of regex
On Aug 7, 7:20 pm, Νίκος wrote:
> Hello dear Pythoneers,

I prefer Pythonista, but anywho..

> I have over 500 .php web pages in various subfolders under 'data'
> folder that i have to rename to .html

import os
os.rename(old, new)

> and and ditch the '<?' and '?>' tages from within

path = 'some/valid/path'
f = open(path, 'r')
data = f.read()
f.close()
data.replace('<?', '')
data.replace('?>', '')

> and also insert a very first line of [...]
> where id must be an identification unique number of every page for
> counter tracking purposes.

comment = ""%(idnum)
data.insert(idx, comment)

> ONly pure html code must be left.

Well then don't F up! However, judging from the amount of typos in this
post I would suggest you do some major testing!

> I don't know how to handle such a big data replacing problem and
> cannot play with fire because those 500 pages are my cleints pages and
> data of those files just cannot be messes up.

Better do some serious testing first, or (if you have enough disk space)
create copies instead!

> Can you provide to me a script please that is able of performing an
> automatic way of such a page content replacing?

This is very basic stuff and the fine manual is free, you know. But how
much are you willing to pay?
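One cheap form of the "serious testing first" suggested above is a dry run: compute what the conversion would produce and show it as a diff, without writing anything. Here is a minimal Python 3 sketch using `difflib.unified_diff()`. `preview_change` is a hypothetical helper, and the `convert` argument stands for whichever transformation you end up using; the lambda in the usage line is only a placeholder.

```python
import difflib


def preview_change(path, old_text, convert):
    """Return a unified diff of what convert() would do; writes nothing."""
    new_text = convert(old_text)
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=path,
        tofile=path + " (converted)",
    )
    return "".join(diff)   # empty string when nothing would change


sample = "<p>Hello</p>\n<? echo 1; ?>\n"
print(preview_change("page.php", sample,
                     lambda s: s.replace("<?", "").replace("?>", "")))
```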
Replace and inserting strings within .txt files with the use of regex
Hello dear Pythoneers,

I have over 500 .php web pages in various subfolders under the 'data' folder that I have to rename to .html, ditch the '<?' and '?>' tags from within, and also insert a very first line of [...] where id must be a unique identification number for every page, for counter-tracking purposes. Only pure HTML code must be left.

Before I found out about Python I used PHP, and now I am switching to a templates + Python solution, so I have to change each and every page.

I don't know how to handle such a big data-replacement problem and cannot play with fire, because those 500 pages are my clients' pages and the data in those files just cannot be messed up.

Can you please provide me with a script that can perform this page content replacement automatically?

Thanks a million!