Re: [PHP] Re: regex help needed -- Solved! Thanks!
Kathleen Ballard wrote:

> Thanks! Works like a charm!
>
> I am the very lowest of newbies when it comes to regex, and working through your solutions has been very educational. I have one question about something I couldn't figure out:
>
>     #<h[1-9]>(.*)</h[1-9]>#Uie
>     `<h([1-6])>.*?</h\1>`sie
>
> What is the purpose of the back-ticks and the '#'?

PCRE patterns have to be enclosed in delimiters, and you can use any non-alphanumeric character for that (other than a backslash or whitespace). Personally, I prefer back-ticks because I rarely have to escape them inside my patterns. In my example you can also remove the 's' pattern modifier: it makes the dot ( . ) match newline characters as well, and I had not seen that you had already removed them.

> What are 'Uie' and 'sie'?

These are pattern modifiers. You can find a complete list with descriptions here:
http://www.php.net/manual/en/pcre.pattern.modifiers.php

> Thanks again!
> Kathleen
>
> -----Original Message-----
> From: Fabrice Lezoray [mailto:[EMAIL PROTECTED]
> Sent: Sunday, August 01, 2004 2:52 PM
> To: [EMAIL PROTECTED]
> Subject: [PHP] Re: regex help needed
>
> hi
>
> M. Sokolewicz wrote:
>> You could try something like:
>>
>>     $return = preg_replace('#<h[1-9]>(.*)</h[1-9]>#Uie', 'str_replace("<br />", "", "$1")');
>>
>> - Tul
>>
>> Kathleen Ballard wrote:
>>> Sorry,
>>> Here is the code I am using to match the h* tags:
>>>
>>>     <h([1-9]){1}>.*</h([1-9]){1}>
>
> I think this mask is better:
>
>     `<h([1-6])>.*?</h\1>`sie
>
>>> I have removed all the NL and CR chars from the string I am matching to make things easier. Also, I have run tidy on the code so the tags are all uniform.
>>>
>>> The above string seems to match the tag well now, but I still need to remove the br tags from the tag contents (.*).
>
> To remove the <br /> tags, you need to call preg_replace_callback():
>
>     <?php
>     $str = '<h1>hi <br /> ..</h1> bla bla <h5> <br /> ..</h5> ...<br />';
>     function cbk_br($match) {
>         return '<h' . $match[1] . '>' . str_replace('<br />', '', $match[2]) . '</h' . $match[1] . '>';
>     }
>     $return = preg_replace_callback('`<h([1-6])>(.*?)</h\1>`si', 'cbk_br', $str);
>     echo $return;
>     ?>
>
>>> The strings I will be matching are html formatted text. Sample h* tags with content are below:
>>>
>>>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary</h4>
>>>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary <br /> Wins New Jersey</h4>
>>>     <h4>Ex-Secretary Reich Loses Mass. Primary</h4>
>>>
>>> Again, any help is appreciated.
>>> Kathleen

Sorry for my bad English.

--
Fabrice Lezoray
http://classes.scriptsphp.fr

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
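For readers who want to try the delimiter and 's' modifier points from the reply above, here is a minimal sketch. The sample strings and variable names are assumptions made up for illustration; only the heading pattern comes from the thread.

```php
<?php
// Sample input is an assumption, modelled on the headings quoted in this thread.
$html = "<h4>Mickey Mouse <br />Loses</h4>";

// The same pattern written with three different delimiters.
$patterns = [
    '#<h([1-6])>(.*?)</h\1>#i',
    '`<h([1-6])>(.*?)</h\1>`i',
    '/<h([1-6])>(.*?)<\/h\1>/i',   // "/" as delimiter forces an escape inside the pattern
];

$captures = [];
foreach ($patterns as $pattern) {
    preg_match($pattern, $html, $m);
    $captures[] = $m[2];           // contents of the heading
}

// Without "s" the dot stops at a newline; with "s" it crosses it.
$multiline = "<h2>line one\nline two</h2>";
$withoutS = preg_match('`<h([1-6])>(.*?)</h\1>`i', $multiline);
$withS    = preg_match('`<h([1-6])>(.*?)</h\1>`si', $multiline);

print_r($captures);
var_dump($withoutS, $withS);
```

All three delimiters should capture the same heading contents, which is why the choice is mostly about how often the delimiter character would need escaping inside the pattern.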
[PHP] Re: regex help needed
You could try something like:

    $return = preg_replace('#<h[1-9]>(.*)</h[1-9]>#Uie', 'str_replace("<br />", "", "$1")');

- Tul

Kathleen Ballard wrote:
> Sorry,
> Here is the code I am using to match the h* tags:
>
>     <h([1-9]){1}>.*</h([1-9]){1}>
>
> I have removed all the NL and CR chars from the string I am matching to make things easier. Also, I have run tidy on the code so the tags are all uniform.
>
> The above string seems to match the tag well now, but I still need to remove the br tags from the tag contents (.*).
>
> The strings I will be matching are html formatted text. Sample h* tags with content are below:
>
>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary</h4>
>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary <br /> Wins New Jersey</h4>
>     <h4>Ex-Secretary Reich Loses Mass. Primary</h4>
>
> Again, any help is appreciated.
> Kathleen
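A note for anyone trying this on current PHP: the 'e' modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet above would need a callback instead. The following is only a sketch of that idea; `$str` and its contents are assumptions, since the original message does not show the subject string. It also returns the whole match with the breaks stripped, whereas the quoted replacement used only `$1` and would have dropped the heading tags themselves.

```php
<?php
// $str is an assumed input; the original snippet does not show the subject string.
$str = '<h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary</h4> text <br /> kept';

// U makes ".*" ungreedy, as in the quoted pattern; the callback replaces the "e" modifier.
$return = preg_replace_callback(
    '#<h[1-9]>(.*)</h[1-9]>#Ui',
    function ($m) {
        // $m[0] is the whole heading; strip line breaks only inside it.
        return str_replace('<br />', '', $m[0]);
    },
    $str
);

echo $return, "\n";
```

The `<br />` outside the heading is left alone because the callback only ever sees the text matched by the pattern.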
[PHP] Re: regex help needed
hi

M. Sokolewicz wrote:
> You could try something like:
>
>     $return = preg_replace('#<h[1-9]>(.*)</h[1-9]>#Uie', 'str_replace("<br />", "", "$1")');
>
> - Tul
>
> Kathleen Ballard wrote:
>> Sorry,
>> Here is the code I am using to match the h* tags:
>>
>>     <h([1-9]){1}>.*</h([1-9]){1}>

I think this mask is better:

    `<h([1-6])>.*?</h\1>`sie

>> I have removed all the NL and CR chars from the string I am matching to make things easier. Also, I have run tidy on the code so the tags are all uniform.
>>
>> The above string seems to match the tag well now, but I still need to remove the br tags from the tag contents (.*).

To remove the <br /> tags, you need to call preg_replace_callback():

    <?php
    $str = '<h1>hi <br /> ..</h1> bla bla <h5> <br /> ..</h5> ...<br />';
    function cbk_br($match) {
        return '<h' . $match[1] . '>' . str_replace('<br />', '', $match[2]) . '</h' . $match[1] . '>';
    }
    $return = preg_replace_callback('`<h([1-6])>(.*?)</h\1>`si', 'cbk_br', $str);
    echo $return;
    ?>

>> The strings I will be matching are html formatted text. Sample h* tags with content are below:
>>
>>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary</h4>
>>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary <br /> Wins New Jersey</h4>
>>     <h4>Ex-Secretary Reich Loses Mass. Primary</h4>
>>
>> Again, any help is appreciated.
>> Kathleen

--
Fabrice Lezoray
http://classes.scriptsphp.fr
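One likely reason the second mask is "better" is the \1 backreference, which forces the closing tag to use the same heading level as the opening one. The comparison below is a small sketch of that difference; both input strings and all variable names are made up for illustration.

```php
<?php
// Both inputs are assumptions made up for this comparison.
$matched    = '<h2>Title</h2>';
$mismatched = '<h2>Title</h3>';

$loose  = '#<h[1-6]>(.*?)</h[1-6]>#si';   // any closing level is accepted
$strict = '`<h([1-6])>(.*?)</h\1>`si';    // \1 must repeat the opening digit

$results = [
    'loose_matched'     => preg_match($loose, $matched),
    'loose_mismatched'  => preg_match($loose, $mismatched),
    'strict_matched'    => preg_match($strict, $matched),
    'strict_mismatched' => preg_match($strict, $mismatched),
];

print_r($results);
```

Only the strict pattern rejects the mismatched pair. On input that has already been through tidy the difference may never show up, but it costs nothing to keep.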
RE: [PHP] Re: regex help needed -- Solved! Thanks!
Thanks! Works like a charm!

I am the very lowest of newbies when it comes to regex, and working through your solutions has been very educational. I have one question about something I couldn't figure out:

    #<h[1-9]>(.*)</h[1-9]>#Uie
    `<h([1-6])>.*?</h\1>`sie

What is the purpose of the back-ticks and the '#'? What are 'Uie' and 'sie'?

Thanks again!
Kathleen

-----Original Message-----
From: Fabrice Lezoray [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 01, 2004 2:52 PM
To: [EMAIL PROTECTED]
Subject: [PHP] Re: regex help needed

hi

M. Sokolewicz wrote:
> You could try something like:
>
>     $return = preg_replace('#<h[1-9]>(.*)</h[1-9]>#Uie', 'str_replace("<br />", "", "$1")');
>
> - Tul
>
> Kathleen Ballard wrote:
>> Sorry,
>> Here is the code I am using to match the h* tags:
>>
>>     <h([1-9]){1}>.*</h([1-9]){1}>

I think this mask is better:

    `<h([1-6])>.*?</h\1>`sie

>> I have removed all the NL and CR chars from the string I am matching to make things easier. Also, I have run tidy on the code so the tags are all uniform.
>>
>> The above string seems to match the tag well now, but I still need to remove the br tags from the tag contents (.*).

To remove the <br /> tags, you need to call preg_replace_callback():

    <?php
    $str = '<h1>hi <br /> ..</h1> bla bla <h5> <br /> ..</h5> ...<br />';
    function cbk_br($match) {
        return '<h' . $match[1] . '>' . str_replace('<br />', '', $match[2]) . '</h' . $match[1] . '>';
    }
    $return = preg_replace_callback('`<h([1-6])>(.*?)</h\1>`si', 'cbk_br', $str);
    echo $return;
    ?>

>> The strings I will be matching are html formatted text. Sample h* tags with content are below:
>>
>>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary</h4>
>>     <h4>Ex-Secretary Mickey Mouse <br />Loses Mass. Primary <br /> Wins New Jersey</h4>
>>     <h4>Ex-Secretary Reich Loses Mass. Primary</h4>
>>
>> Again, any help is appreciated.
>> Kathleen

--
Fabrice Lezoray
http://classes.scriptsphp.fr
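On the 'Uie' versus 'sie' question: 'i' makes the match case-insensitive, 'e' evaluated the replacement as PHP code (it no longer exists in PHP 7 and later), 's' lets the dot match newlines, and 'U' inverts the greediness of quantifiers. The sketch below illustrates only the 'U' part, since that is what lets one pattern use `.*` where the other needs `.*?`. The input string and variable names are assumptions.

```php
<?php
// Assumed input: two headings on one line, as after removing NL/CR characters.
$str = '<h1>one</h1> between <h1>two</h1>';

preg_match('#<h1>(.*)</h1>#',  $str, $greedy);     // default: ".*" is greedy
preg_match('#<h1>(.*)</h1>#U', $str, $ungreedy);   // U flips the default
preg_match('#<h1>(.*?)</h1>#', $str, $lazy);       // "?" makes this one quantifier lazy

echo $greedy[1], "\n", $ungreedy[1], "\n", $lazy[1], "\n";
```

The greedy version runs on to the last closing tag, while the other two stop at the first one. Writing `.*?` is usually the more portable habit, because 'U' is specific to PCRE.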
[PHP] Re: Regex help needed
First, the problem you got: the warning comes from this part of the expression:

    (\s+face=\"Verdana, Arial, Helvetica, sans-serif\"|)

After the | (OR) sign you must define another alternative, for example:

    echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font size=\"2\"(\s+face=\"Verdana, Arial, Helvetica, sans-serif\"|\s)>\s*purchasing power parity", '%POWER%', '<td><tr>sdsdss<tr bgcolor="#f8f8f1" face="Verdana, Arial, Helvetica, sans-serif"><td><font size="2">Purchasing power parity');

Secondly, you are right that \s is not recognised in purchasing\s+power\s+parity. It looks a little strange, but the ereg functions use POSIX regular expressions, which have no \s shorthand. You can use either of these instead:

- [[:space:]]
- [ ]

The brackets let you define a set of characters (in the second case above, just the space character). This gives:

    echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font size=\"2\">\s*purchasing[[:space:]]+power[[:space:]]+parity", '%POWER%', '<td><tr>sdsdss<tr bgcolor="#f8f8f1"><td><font size="2">Purchasing power parity');

As a little extra help, here is a summary from the page http://www.php.net/manual/en/ref.regex.php that could be useful to you:

    ^         Start of line
    $         End of line
    n?        Zero or one occurrence of character 'n'
    n*        Zero or more occurrences of character 'n'
    n+        One or more occurrences of character 'n'
    n{2}      Exactly two occurrences of 'n'
    n{2,}     Two or more occurrences of 'n'
    n{2,4}    From 2 to 4 occurrences of 'n'
    .         Any single character
    ()        Parentheses to group expressions
    (.*)      Zero or more occurrences of any single character, i.e. anything!
    (n|a)     Either 'n' or 'a'
    [1-6]     Any single digit in the range between 1 and 6
    [c-h]     Any single lower case letter in the range between c and h
    [D-M]     Any single upper case letter in the range between D and M
    [^a-z]    Any single character EXCEPT any lower case letter between a and z.

Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the very first character inside a range, and it denies the entire range including the ^ symbol itself if it appears again later in the range.
Also remember that if it is the first character in the entire expression, it means start of line. In any other place, it is always treated as a regular ^ symbol. In other words, you cannot deny a word with ^undesired_word or a group with ^(undesired_phrase). Read more detailed regex documentation to find out what is necessary to achieve this.

    [_4^a-zA-Z]    Any single character which can be the underscore or the number 4 or the ^ symbol or any letter, lower or upper case

?, +, * and the {} count parameters can be appended not only to a single character, but also to a group () or a range [].

Therefore, ^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$ would mean:

    ^.{2}            a line beginning with any two characters,
    [a-z]{1,2}       followed by either 1 or 2 lower case letters,
    _?               followed by an optional underscore,
    [0-9]*           followed by zero or more digits,
    ([1-6]|[a-f])    followed by either a digit between 1 and 6 OR a lower case letter between a and f,
    [^1-9]{2}        followed by any two characters except digits between 1 and 9 (0 is possible),
    a+$              followed by one or more occurrences of 'a' at the end of a line.

Sid wrote:
> Hello,
>
> Well, I am doing my first regex operations and I am having problems which I just cannot figure out.
>
> For example I tried
>
>     echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font size=\"2\">\s*purchasing power parity", '%POWER%', '<td><tr>sdsdss<tr bgcolor="#f8f8f1"><td><font size="2">Purchasing power parity');
>
> and this worked perfectly, but when I changed that to
>
>     echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font size=\"2\">\s*purchasing\s+power\s+parity", '%POWER%', '<td><tr>sdsdss<tr bgcolor="#f8f8f1"><td><font size="2">Purchasing power parity');
>
> it does not detect the string. Strange. According to what I know, \s+ will detect a single space also. I tried changing the last 2 \s+ to \s* but this did not work either. Any ideas on this one?
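The worked expression and the [[:space:]] advice from the reply above can be checked with a short script. This is only a sketch: the ereg functions were removed in PHP 7.0, so it uses preg_match() with the same expressions wrapped in PCRE delimiters (PCRE also accepts [[:space:]]). The test strings and variable names are assumptions chosen to exercise each part.

```php
<?php
// The worked expression from the summary, wrapped in PCRE delimiters.
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// Test strings are assumptions chosen for illustration.
// XY | ab | _ | 12 | c | 00 | aa  satisfies every part in order.
$ok  = preg_match($pattern, 'XYab_12c00aa');
// Same string, but "19" cannot satisfy [^1-9]{2}.
$bad = preg_match($pattern, 'XYab_12c19aa');

// [[:space:]]+ accepts one or more spaces, tabs or newlines between the words.
$spaced = preg_match(
    '/purchasing[[:space:]]+power[[:space:]]+parity/i',
    "Purchasing  power\tparity"
);

// A ^ that is not the first character inside [] is a literal caret.
$caret = preg_match('/^[_4^a-zA-Z]$/', '^');

var_dump($ok, $bad, $spaced, $caret);
```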
> As I proceed I would like the expression to detect the optional face attribute also, so I tried
>
>     echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font size=\"2\"(\s+face=\"Verdana, Arial, Helvetica, sans-serif\"|)>\s*purchasing power parity", '%POWER%', '<td><tr>sdsdss<tr bgcolor="#f8f8f1" face="Verdana, Arial, Helvetica, sans-serif"><td><font size="2">Purchasing power parity');
>
> ... and this gave me an error like
>
>     Warning: eregi_replace(): REG_EMPTY:çempty (sub)expression in D:\sid\dg\test.php on line 2
>
> Any ideas?
>
> BTW, is there any place where I can get started on regex? I got a Perl book that explains regex, but I have got to learn Perl first (I don't know any Perl).
>
> Thanks in advance.
>
> - Sid

Sid wrote:
> Hello,
>
> Well, I am doing my first regex operations and I am having problems which I just cannot figure out.
>
> For example I tried
>
>     echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font size=\"2\">\s*purchasing power parity", '%POWER%', '<td><tr>sdsdss<tr bgcolor="#f8f8f1"><td><font size="2">Purchasing power parity');
>
> and this worked perfectly, but when I changed that to
>
>     echo eregi_replace ("<tr bgcolor=\"#F8F8F1\">(\s*)<td>\s*<font