Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Ian Kelly
On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee xah...@gmail.com wrote:
> So, a solution by regex is out.

Actually, none of the complications you listed appear to exclude
regexes.  Here's a possible (untested) solution:

<div class="img">
((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+"
height="[0-9]+">)+)
\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
\s*</div>

and corresponding replacement string:

<figure>
\1
<figcaption>\2</figcaption>
</figure>

I don't know what dialect Emacs uses for regexes; the above is the
Python re dialect.  I assume it is translatable.  If not, then the
above should at least work with other editors, such as Komodo's
Find/Replace in Files command.  I kept the line breaks here for
readability, but for completeness they should be stripped out of the
final regex.

The possibility of nested HTML in the caption is allowed for by using
a negative look-ahead assertion to accept any tag except a closing
</p>.  It would break if you had nested <p> tags, but then that would
be invalid HTML anyway.
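
The pattern and replacement can be exercised with Python's re module
roughly as follows.  This is a minimal sketch, not a tested production
script: the markup delimiters are written out in full, the line breaks
in the pattern are removed as advised, and the names FIGURE_RE,
FIGURE_REPL, to_figure and the sample string are assumptions made for
illustration.

```python
import re

# The pattern from the post, joined into one regex without line breaks.
FIGURE_RE = re.compile(
    r'<div class="img">'
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" '
    r'height="[0-9]+">)+)'
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>'
    r'\s*</div>'
)

# Group 1 holds the <img> tags (with their leading whitespace),
# group 2 holds the caption, including any nested tags.
FIGURE_REPL = r'<figure>\1' '\n' r'<figcaption>\2</figcaption>' '\n' r'</figure>'


def to_figure(html):
    """Rewrite every matching div/img/p block as figure/figcaption."""
    return FIGURE_RE.sub(FIGURE_REPL, html)


sample = '''<div class="img">
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<p class="cpt">jamie's cat! Her blog is <a href="http://example.com/jamie/">http://example.com/jamie/</a></p>
</div>'''

print(to_figure(sample))
```

Text that does not match the pattern is returned unchanged, so the
function can be applied to a whole file's contents.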

Cheers,
Ian
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 4, 12:13 pm, S.Mandl stefanma...@web.de wrote:
> Nice. I guess that XSLT would be another (the official) approach for
> such a task.
> Is there an XSLT-engine for Emacs?
>
> -- Stefan

I haven't used XSLT, and don't know if there's an engine for Emacs...

It'd be nice if someone actually gave an example...

 Xah


Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 5, 12:17 pm, Ian Kelly ian.g.ke...@gmail.com wrote:
> On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee xah...@gmail.com wrote:
>> So, a solution by regex is out.
>
> Actually, none of the complications you listed appear to exclude
> regexes.  Here's a possible (untested) solution:
>
> [...]

that's fantastic. Thanks! I'll try it out.

 Xah


Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 5, 12:17 pm, Ian Kelly ian.g.ke...@gmail.com wrote:
> On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee xah...@gmail.com wrote:
>> So, a solution by regex is out.
>
> Actually, none of the complications you listed appear to exclude
> regexes.  Here's a possible (untested) solution:
>
> [...]

Emacs regex supports shy groups (the 「(?:…)」), but it doesn't support
the negative look-ahead assertion 「(?!…)」.

But in any case, I can't see how this part would work:
<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
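
For regex dialects without look-ahead, one common workaround is to
spell out every way a '<' can fail to begin '</p>'.  The sketch below
stays in Python so the two forms can be compared directly; it is an
illustration under assumptions, not a tested Emacs pattern, and the
names WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD and caption are hypothetical.
The two forms can diverge on odd input, such as a stray '<' placed
directly before '</p>'.

```python
import re

# Caption pattern from the thread, using a negative look-ahead.
WITH_LOOKAHEAD = re.compile(r'<p class="cpt">((?:[^<]|<(?!/p>))+)</p>')

# Look-ahead-free variant: a '<' is accepted only together with the
# following characters that rule out '</p>'.
WITHOUT_LOOKAHEAD = re.compile(
    r'<p class="cpt">((?:[^<]|<[^/]|</[^p]|</p[^>])+)</p>'
)

caption = ('<p class="cpt">jamie\'s cat! Her blog is '
           '<a href="http://example.com/jamie/">http://example.com/jamie/</a></p>')

# Both patterns should capture the caption text including the nested link.
for regex in (WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD):
    print(regex.search(caption).group(1))
```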

 Xah


Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Ian Kelly
On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee xah...@gmail.com wrote:
> but in any case, i can't see how this part would work
> <p class="cpt">((?:[^<]|<(?!/p>))+)</p>

It's not that different from the pattern 「alt="[^"]+"」 earlier in the
regex.  The capture group accepts one or more characters that either
aren't '<', or that are '<' but are not immediately followed by '/p>'.
Thus it stops capturing when it sees exactly '</p>' without consuming
the '<'.  Using my regex with the example that you posted earlier
demonstrates that it works:

>>> import re
>>> s = '''<div class="img">
... <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
... <p class="cpt">jamie's cat! Her blog is <a href="http://example.com/
... jamie/">http://example.com/jamie/</a></p>
... </div>'''
>>> print re.sub(pattern, replace, s)
<figure>
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<figcaption>jamie's cat! Her blog is <a href="http://example.com/
jamie/">http://example.com/jamie/</a></figcaption>
</figure>

Cheers,
Ian


Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread S.Mandl
> haven't used XSLT, and don't know if there's one in emacs...
>
> it'd be nice if someone actually give a example...


Hi Xah, actually I have to correct myself. HTML is not XML. If it
were, you could use a stylesheet like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="p[@class='cpt']">
  <figcaption>
    <xsl:value-of select="."/>
  </figcaption>
</xsl:template>

<xsl:template match="div[@class='img']">
  <figure>
    <xsl:apply-templates select="@*|node()"/>
  </figure>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

which applied to this document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>
<h1>Just having fun</h1>with all the
<div class="img">
  <img src="cat1.jpg" alt="my cat" width="200" height="200"/>
  <img src="cat2.jpg" alt="my cat" width="200" height="200"/>
  <p class="cpt">my 2 cats</p>
</div>
cats here:
<h1>Just fooling around</h1>
<div class="img">
  <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106"/>

  <p class="cpt">jamie's cat! Her blog is <a href="http://example.com/
jamie/">http://example.com/jamie/</a></p>
</div>
</doc>

would yield:

<?xml version="1.0"?>
<doc>
<h1>Just having fun</h1>with all the
<figure class="img">
  <img src="cat1.jpg" alt="my cat" width="200" height="200"/>
  <img src="cat2.jpg" alt="my cat" width="200" height="200"/>
  <figcaption>my 2 cats</figcaption>
</figure>
cats here:
<h1>Just fooling around</h1>
<figure class="img">
  <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106"/>

  <figcaption>jamie's cat! Her blog is http://example.com/jamie/</figcaption>
</figure>
</doc>

But well, as you don't have XML as input ... there really was no point
to my remark.
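
For comparison, the stylesheet's effect can be approximated in Python
with the standard library's xml.etree.ElementTree.  This is only a
sketch under the same assumption as the stylesheet (well-formed XML
input); the function name to_figures and the sample document are made
up for illustration.  Like xsl:value-of, it flattens the caption to
its text content, which is why the link markup disappears from the
second caption in the output above.

```python
import xml.etree.ElementTree as ET


def to_figures(root):
    """Turn <div class="img"> into <figure> and <p class="cpt"> into <figcaption>."""
    # Materialise the iterator first, because the tree is edited in the loop.
    for div in list(root.iter('div')):
        if div.get('class') != 'img':
            continue
        div.tag = 'figure'              # attributes (class="img") are kept
        for p in div.findall('p'):
            if p.get('class') != 'cpt':
                continue
            text = ''.join(p.itertext())  # string value, nested tags dropped
            tail = p.tail
            p.clear()                   # removes attributes, children, text, tail
            p.tag = 'figcaption'
            p.text = text
            p.tail = tail
    return root


doc = ET.fromstring(
    '<doc><div class="img">'
    '<img src="cat1.jpg" alt="my cat" width="200" height="200"/>'
    '<p class="cpt">my cat: <a href="http://example.com/jamie/">link</a></p>'
    '</div></doc>'
)
to_figures(doc)
print(ET.tostring(doc, encoding='unicode'))
```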

Best,
Stefan


Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-04 Thread S.Mandl
Nice. I guess that XSLT would be another (the official) approach for
such a task.
Is there an XSLT-engine for Emacs?

-- Stefan