On Mon, Mar 07, 2016 at 05:44:08PM -0800, jorrit...@gmail.com wrote:
> I'm trying to replace *[URL]www.link.com[/URL]* with HTML with this regexp:
> 
> topic.text = re.sub("(\[URL\])(.*)(\[\/URL\])", '<a href="$2">$2</a>',
>                     topic.text, flags=re.I)
> 
> But it's giving me the following problems:
> 
>    1. The $2 capture group is only able to be repeated once, so I get 
>    <a href="www.link.com">$2</a>
>    instead of 
>    <a href="www.link.com">www.link.com</a>

I have my doubts – if you use the standard Python re library, then the
way to refer to captured groups is "\1", "\2", etc., not "$1". When I
try the code you posted above, I get the following result (i.e., not
even the first occurrence of "$2" gets substituted)::

    >>> re.sub("(\[URL\])(.*)(\[\/URL\])", '<a href="$2">$2</a>',
    ...        '[URL]www.link.com[/URL]', flags=re.I)
    '<a href="$2">$2</a>'

To make the substitution work for a single occurrence of
[URL]...[/URL], use "\2" instead of "$2". Also, when writing regular
expressions, or any other string that is supposed to contain a
backslash, it is a good idea to write it as a raw string literal,
i.e. prefix it with an "r", as I've done below. That way Python won't
interpret the backslashes as escape sequences; otherwise "\2" would
become the character with an ASCII value of 2::

    >>> re.sub(r"(\[URL\])(.*)(\[\/URL\])", r'<a href="\2">\2</a>',
    ...        '[URL]www.link.com[/URL]', flags=re.I)
    '<a href="www.link.com">www.link.com</a>'
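
To see why the raw string prefix matters, here is a small sketch that
compares the three ways of writing the replacement. The variable names
are only for illustration; the pattern and input are the ones from
above. "\g<2>" is the explicit form of the same backreference in
Python's re module.

```python
import re

# In a normal string literal, "\2" is an octal escape: one character, code 2.
# In a raw string, it stays a backslash followed by the digit "2".
assert "\2" == chr(2)
assert r"\2" == "\\2"

pattern = r"(\[URL\])(.*)(\[\/URL\])"
text = "[URL]www.link.com[/URL]"

# Raw replacement string: re.sub sees \2 and inserts the second group.
with_raw = re.sub(pattern, r'<a href="\2">\2</a>', text, flags=re.I)

# \g<2> refers to the same group, just spelled out explicitly.
with_g = re.sub(pattern, r'<a href="\g<2>">\g<2></a>', text, flags=re.I)

# Non-raw replacement string: re.sub never sees a backslash, only chr(2),
# so the control character ends up in the output verbatim.
without_raw = re.sub(pattern, '<a href="\2">\2</a>', text, flags=re.I)

print(repr(with_raw))
print(repr(with_g))
print(repr(without_raw))
```

The first two calls give the expected link; the third leaves a "\x02"
control character in both places where the URL should be.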

>    2. Only the first *[URL]* is matched. Everything after the first *[/URL]* 
>    is simply deleted...

The solution above gets you halfway there. re.sub replaces all
matches by default; the problem here is that the "(.*)" part of your
regex matches everything between the first "[URL]" and the last
"[/URL]"::

    >>> re.sub(r"(\[URL\])(.*)(\[\/URL\])", r'<a href="\2">\2</a>',
    ...        '[URL]www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com[/URL]',
    ...        flags=re.I)
    '<a href="www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com">www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com</a>'

The reason is that the asterisk operator in a regex is greedy, which
means a ".*" will try to match as much as possible. When you use the
non-greedy version of the operator (which you get by putting a
question mark after the asterisk), you get the result you want::

    >>> re.sub(r"(\[URL\])(.*?)(\[\/URL\])", r'<a href="\2">\2</a>',
    ...        '[URL]www.link1.com[/URL][URL]www.link2.com[/URL][URL]www.link3.com[/URL]',
    ...        flags=re.I)
    '<a href="www.link1.com">www.link1.com</a><a href="www.link2.com">www.link2.com</a><a href="www.link3.com">www.link3.com</a>'


You can read an explanation of the difference between greedy and
non-greedy regular expressions in the Python docs:
https://docs.python.org/2/howto/regex.html#greedy-versus-non-greedy
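
If it helps, the greedy and non-greedy behaviour can be compared
directly with re.findall, and the working substitution can be wrapped
in a small function. This is only a sketch: the names URL_TAG and
bbcode_urls_to_html are made up for this example, and I've dropped the
capturing parentheses around the tags themselves, so the link text is
group 1 here rather than group 2.

```python
import re

# Hypothetical names, for illustration only. Only the link text is
# captured, so the backreference is \1 instead of \2.
URL_TAG = re.compile(r"\[URL\](.*?)\[/URL\]", flags=re.I)


def bbcode_urls_to_html(text):
    """Replace every [URL]...[/URL] pair in text with an <a> element."""
    return URL_TAG.sub(r'<a href="\1">\1</a>', text)


sample = "[URL]www.link1.com[/URL] and [url]www.link2.com[/url]"

# Greedy: ".*" runs on to the last closing tag, giving one long match.
greedy = re.findall(r"\[URL\](.*)\[/URL\]", sample, flags=re.I)

# Non-greedy: ".*?" stops at the first closing tag, giving one match per link.
lazy = re.findall(r"\[URL\](.*?)\[/URL\]", sample, flags=re.I)

converted = bbcode_urls_to_html(sample)

print(greedy)
print(lazy)
print(converted)
```

The greedy pattern finds a single match that swallows the text between
the two links, while the non-greedy one finds each link separately.
Because of re.I, the lowercase "[url]" tags are matched as well.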

Good luck,

Michal

>    
> I hope someone can help me with this. I'm using Python 2.7 if it makes a 
> difference.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-users+unsubscr...@googlegroups.com.
> To post to this group, send email to django-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-users/fce5a726-8a4c-455a-a978-6ee70d66464e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
