Re: emacs lisp text processing example (html5 figure/figcaption)
On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xah...@gmail.com> wrote:
> So, a solution by regex is out.

Actually, none of the complications you listed appear to exclude
regexes. Here's a possible (untested) solution:

<div class="img">
((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" height="[0-9]+">)+)
\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
\s*</div>

and corresponding replacement string:

<figure>
\1
<figcaption>\2</figcaption>
</figure>

I don't know what dialect Emacs uses for regexes; the above is the
Python re dialect. I assume it is translatable. If not, then the
above should at least work with other editors, such as Komodo's
Find/Replace in Files command. I kept the line breaks here for
readability, but for completeness they should be stripped out of the
final regex.

The possibility of nested HTML in the caption is allowed for by using
a negative look-ahead assertion to accept any tag except a closing
</p>. It would break if you had nested <p> tags, but then that would
be invalid html anyway.

Cheers,
Ian
--
http://mail.python.org/mailman/listinfo/python-list
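A minimal Python sketch of the approach described in this message, assuming the input is quoted HTML of the form <div class="img">...</div>. The names PATTERN, REPLACEMENT and convert are illustrative, not from the thread, and the replacement string is laid out slightly differently from the one in the post so that no blank line appears after <figure>.

```python
import re

# The pattern is kept on separate lines for readability and then joined,
# so that no line breaks end up in the final regex.
PATTERN = "".join([
    r'<div class="img">',
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+"'
    r' width="[0-9]+" height="[0-9]+">)+)',
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>',
    r'\s*</div>',
])

# Group 1 already begins with the whitespace that preceded the first <img>,
# so no extra newline is added right after <figure> (an assumption made
# here for tidy output).
REPLACEMENT = r'<figure>\1' '\n' r'<figcaption>\2</figcaption>' '\n</figure>'


def convert(html):
    """Rewrite every matching image block as <figure>/<figcaption>."""
    return re.sub(PATTERN, REPLACEMENT, html)


sample = '''<div class="img">
<img src="cat1.jpg" alt="my cat" width="200" height="200">
<img src="cat2.jpg" alt="my cat" width="200" height="200">
<p class="cpt">my 2 cats</p>
</div>'''

print(convert(sample))
```

Blocks that deviate from the expected attribute order or quoting are simply left untouched, which is arguably the safer failure mode for a bulk rewrite.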
Re: emacs lisp text processing example (html5 figure/figcaption)
On Jul 4, 12:13 pm, S.Mandl <stefanma...@web.de> wrote:
> Nice. I guess that XSLT would be another (the official) approach for
> such a task. Is there an XSLT-engine for Emacs?
>
> -- Stefan

haven't used XSLT, and don't know if there's one in emacs...

it'd be nice if someone actually gave an example...

Xah
Re: emacs lisp text processing example (html5 figure/figcaption)
On Jul 5, 12:17 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xah...@gmail.com> wrote:
> > So, a solution by regex is out.
>
> Actually, none of the complications you listed appear to exclude
> regexes. Here's a possible (untested) solution:
>
> <div class="img">
> ((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" height="[0-9]+">)+)
> \s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
> \s*</div>
>
> and corresponding replacement string:
>
> <figure>
> \1
> <figcaption>\2</figcaption>
> </figure>
>
> I don't know what dialect Emacs uses for regexes; the above is the
> Python re dialect. I assume it is translatable. If not, then the
> above should at least work with other editors, such as Komodo's
> Find/Replace in Files command. I kept the line breaks here for
> readability, but for completeness they should be stripped out of the
> final regex.
>
> The possibility of nested HTML in the caption is allowed for by using
> a negative look-ahead assertion to accept any tag except a closing
> </p>. It would break if you had nested <p> tags, but then that would
> be invalid html anyway.
>
> Cheers,
> Ian

that's fantastic. Thanks! I'll try it out.

Xah
Re: emacs lisp text processing example (html5 figure/figcaption)
On Jul 5, 12:17 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xah...@gmail.com> wrote:
> > So, a solution by regex is out.
>
> Actually, none of the complications you listed appear to exclude
> regexes. Here's a possible (untested) solution:
>
> <div class="img">
> ((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" height="[0-9]+">)+)
> \s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
> \s*</div>
>
> and corresponding replacement string:
>
> <figure>
> \1
> <figcaption>\2</figcaption>
> </figure>
>
> I don't know what dialect Emacs uses for regexes; the above is the
> Python re dialect. I assume it is translatable. If not, then the
> above should at least work with other editors, such as Komodo's
> Find/Replace in Files command. I kept the line breaks here for
> readability, but for completeness they should be stripped out of the
> final regex.
>
> The possibility of nested HTML in the caption is allowed for by using
> a negative look-ahead assertion to accept any tag except a closing
> </p>. It would break if you had nested <p> tags, but then that would
> be invalid html anyway.
>
> Cheers,
> Ian

emacs regex supports shy groups (the 「(?:…)」) but it doesn't support
the negative assertion 「?!…」 though.

but in any case, i can't see how this part would work

<p class="cpt">((?:[^<]|<(?!/p>))+)</p>

?

Xah
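For engines without negative look-ahead, one possible workaround is a non-greedy match that runs up to the first </p>. The sketch below is in Python so it can be compared directly with the pattern from the thread; the names are illustrative, and a translation to Emacs syntax (where `.` does not match a newline) is assumed to be possible but is not tested here.

```python
import re

# Caption sub-pattern from the thread, using a negative look-ahead.
WITH_LOOKAHEAD = re.compile(r'<p class="cpt">((?:[^<]|<(?!/p>))+)</p>')

# Look-ahead-free variant: a non-greedy run of any characters that stops
# at the first </p>. [\s\S] is used so that newlines are matched as well.
WITHOUT_LOOKAHEAD = re.compile(r'<p class="cpt">([\s\S]+?)</p>')

caption = ('<p class="cpt">jamie\'s cat! Her blog is '
           '<a href="http://example.com/\njamie/">http://example.com/jamie/</a></p>')

a = WITH_LOOKAHEAD.search(caption).group(1)
b = WITHOUT_LOOKAHEAD.search(caption).group(1)
print(a == b)  # both variants capture the same caption text here
```

The two are not equivalent in every context: when more pattern follows the </p>, the non-greedy version may backtrack and run past a </p> to find a later match, which the look-ahead version never does.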
Re: emacs lisp text processing example (html5 figure/figcaption)
On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee <xah...@gmail.com> wrote:
> but in any case, i can't see how this part would work
> <p class="cpt">((?:[^<]|<(?!/p>))+)</p>

It's not that different from the pattern 「alt="[^"]+"」 earlier in the
regex. The capture group accepts one or more characters that either
aren't '<', or that are '<' but are not immediately followed by
'/p>'. Thus it stops capturing when it sees exactly '</p>' without
consuming the '<'. Using my regex with the example that you posted
earlier demonstrates that it works:

>>> import re
>>> s = '''<div class="img">
... <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
... <p class="cpt">jamie's cat! Her blog is <a href="http://example.com/
... jamie/">http://example.com/jamie/</a></p>
... </div>'''
>>> print(re.sub(pattern, replace, s))
<figure>
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<figcaption>jamie's cat! Her blog is <a href="http://example.com/
jamie/">http://example.com/jamie/</a></figcaption>
</figure>

Cheers,
Ian
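The stopping behaviour described in this message can be checked in isolation. This is a small sketch only; CAPTION_CHAR, caption_run and the sample text are made up for illustration.

```python
import re

# One "caption character": either any character other than '<', or a '<'
# that is not the start of '</p>'.
CAPTION_CHAR = r'(?:[^<]|<(?!/p>))'
caption_run = re.compile(CAPTION_CHAR + '+')

text = 'Her blog is <a href="x">here</a></p> trailing'

m = caption_run.match(text)
print(m.group(0))      # Her blog is <a href="x">here</a>
print(text[m.end():])  # </p> trailing
```

The '<' of <a ...> and of </a> passes the look-ahead test and is consumed; the '<' of </p> fails it, so the run ends there and the literal </p> that follows in the full pattern can still match.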
Re: emacs lisp text processing example (html5 figure/figcaption)
> haven't used XSLT, and don't know if there's one in emacs...
>
> it'd be nice if someone actually gave an example...

Hi Xah, actually I have to correct myself. HTML is not XML. If it
were, you could use a stylesheet like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="p[@class='cpt']">
  <figcaption>
    <xsl:value-of select="."/>
  </figcaption>
</xsl:template>

<xsl:template match="div[@class='img']">
  <figure>
    <xsl:apply-templates select="@*|node()"/>
  </figure>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

which applied to this document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>
<h1>Just having fun</h1>with all the
<div class="img">
<img src="cat1.jpg" alt="my cat" width="200" height="200"/>
<img src="cat2.jpg" alt="my cat" width="200" height="200"/>
<p class="cpt">my 2 cats</p>
</div>
cats here:
<h1>Just fooling around</h1>
<div class="img">
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106"/>
<p class="cpt">jamie's cat! Her blog is <a href="http://example.com/
jamie/">http://example.com/jamie/</a></p>
</div>
</doc>

would yield:

<?xml version="1.0"?>
<doc>
<h1>Just having fun</h1>with all the
<figure class="img">
<img src="cat1.jpg" alt="my cat" width="200" height="200"/>
<img src="cat2.jpg" alt="my cat" width="200" height="200"/>
<figcaption>my 2 cats</figcaption>
</figure>
cats here:
<h1>Just fooling around</h1>
<figure class="img">
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106"/>
<figcaption>jamie's cat! Her blog is http://example.com/jamie/</figcaption>
</figure>
</doc>

But well, as you don't have XML as input ... there really was no
point to my remark.

Best,
Stefan
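Python's standard library has no XSLT processor, so the following is not the stylesheet itself but a rough sketch of the same renaming done with xml.etree.ElementTree, under the same assumption that the input is well-formed XML. The function name to_figures and the shortened sample document are illustrative.

```python
import xml.etree.ElementTree as ET


def to_figures(xml_text):
    """Rename div.img -> figure and p.cpt -> figcaption in well-formed XML."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag == 'div' and elem.get('class') == 'img':
            # Like the stylesheet, keep the class attribute on <figure>.
            elem.tag = 'figure'
        elif elem.tag == 'p' and elem.get('class') == 'cpt':
            # The stylesheet emits a bare <figcaption>, so drop the class.
            elem.tag = 'figcaption'
            del elem.attrib['class']
    return ET.tostring(root, encoding='unicode')


doc = '''<doc>
<h1>Just having fun</h1>with all the
<div class="img">
<img src="cat1.jpg" alt="my cat" width="200" height="200"/>
<p class="cpt">my 2 cats</p>
</div>
</doc>'''

print(to_figures(doc))
```

One difference from the stylesheet: xsl:value-of keeps only the text of the caption, whereas renaming the element in place keeps any markup nested inside it, such as the link in the second example.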
Re: emacs lisp text processing example (html5 figure/figcaption)
Nice. I guess that XSLT would be another (the official) approach for
such a task. Is there an XSLT-engine for Emacs?

-- Stefan