Erez Schatz <moonb...@gmail.com> writes:

> Care to share your script here? I'd love to see what you came up with.

OK.  I've stripped out the application-specific data from the script;
here it is in its redacted form:

,{
  ,y/<form/ g/<head/ s/(.|\n)*>(Expected Page Title)<(.|\n)*updated.*>([0-9]+\/[0-9]+\/[0-9]+)<(.|\n)*/:Title: \2\n:Date: \4\n/
  s/<form//
  ,y/<form/ v/<head/ y/<h2/ v|/h2| d
  ,y/<form/ v/<head/ s/<h2/\n/
  ,y/<form/ v/<head/ y/<h2/ {
    g|/h2| y/<tr/ {
      v/<td/ s/([ >]|&nbsp;)*([^<]*)<\/h2>(.|\n)*/\n\n\2\n==========================\n/
      g/<td/ g/Entry#/ y/<td/ {
        v|/td| c/\n/
      }
      g/<td/ v/Entry#/ y/<td/ v|/td| d
      g/<td/ y/<td/ {
        g|/td| v/colspan="3"/ g/>(Header Foo)/ y/(Header Foo)/ {
          v|/td| c/:/
          g|/td| s/ : /:/
          g|/td| x/(\n|<[^<>]+>\??|&nbsp;)+/ c/ /
          g|/td| a/\n/
        }
        g|/td| v/colspan="3"/ g/>(Header Bar)/ y/(Header Bar)/ {
          v|/td| c/:/
          g|/td| i/:/
          g|/td| x/( |\n|<[^<>]+>\??|&nbsp;)+/ c/ /
          g|/td| a/\n/
        }
        g|/td| v/colspan="3"/ g/>(Entry#|Field Foo|Field Bar|Field Baz|Field Quux|Field Snarf|Field Barf)/ y/(Entry#|Field Foo|Field Bar|Field Baz|Field Quux|Field Snarf|Field Barf)/ {
          v|/td| c/:/
          g|/td| s/[:.?]*/:/
          g|/td| x/( |\n|<[^<>]+>|&nbsp;)+/ c/ /
          g|/td| a/\n/
        }
        g|/td| v/colspan="3"/ g/inch/ {
          s/[^>]*>( |\n|<[^<>]+>|&nbsp;)*(([a-zA-Z0-9\-]+ )+inch ?([a-zA-Z0-9\-]+ )*[a-zA-Z0-9\-]+)( |\n|<[^<>]+>|&nbsp;)*/:Inches: \2\n/
        }
        g|/td| v/colspan="3"/ g/Stock/ {
          i/:Density:/
          x/( |\n|<?[^<>]+>|&nbsp;)+/ c/ /
          a/\n/
        }
        g|/td| g/colspan="3"/ {
          s/[^>]*>/:Details:/
          y/[^>]*colspan="3"[^>]*>/ x/( |\n|<[^<>]+>\??|&nbsp;)+/ c/ /
          a/\n:Notes: \n\n/
        }
        g|/td| g|checkbox| d
      }
    }
  }
}
,x/<h2|<tr|<td/d

That last line is a bit of a hack.  I needed it because there didn't
appear to be any way to delete the "<h2" delimiters from within the
,y/<h2/ block, since y only selects the text between the delimiters.
But it works because those HTML tags do not appear in the final
reStructuredText.

A lot of the complexity of the script comes from the need to keep the
changes in sequence.  I really hope that the implementation of the sam
language in Acme doesn't impose the same requirement to keep changes in
order; it's a Real Pain(TM).
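
A minimal illustration of the ordering constraint, using a hypothetical
pattern (foo) that is not part of the script above.  Inside a braced
group every command sees the same original dot, so the edits have to be
listed in order of increasing address.  Inserting before the match and
then appending after it is accepted; listing the two in the opposite
order would, as far as sam's rules appear to go, draw the "changes not
in sequence" complaint.

```sam
,x/foo/ {
  i/</
  a/>/
}
```

This wraps every foo in the file as <foo>.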

One of the things that still perplexes me is the apparent necessity of
the s command.  The sam paper claims that the s command isn't necessary.
But I couldn't find any way to do the edits without resorting to it.  If
you could figure out how to replace the s commands with a combination of
other sam commands, I'd be quite impressed indeed!
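
For reference, a sketch of the equivalence the sam paper seems to have
in mind, with hypothetical patterns rather than anything from the
script above.  A global substitution that uses no backreferences can be
written as x followed by c, so these two commands should make the same
change:

```sam
,s/colour/color/g
,x/colour/ c/color/
```

The catch, as far as can be told from the sam documentation, is that
the text given to c is literal and cannot refer to parenthesized
subexpressions the way \2 and \4 do in an s replacement.  That would
explain why the s commands in the script resist this translation.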

-- 
+---------------------------------------------------------------+
|Smiley       <smi...@icebubble.org>    PGP key ID:    BC549F8B |
|Fingerprint: 9329 DB4A 30F5 6EDA D2BA  3489 DAB7 555A BC54 9F8B|
+---------------------------------------------------------------+
