Hi
I'm playing with the latest snapshot and I need a way to "filter" the
HTML files that are being indexed. Specifically, I need to filter out
the <script></script> blocks that are in the pages.
I tried using an ExternalParser to call a little Perl script to do the
job, using it as a converter from text/html to text/html, but htdig loops
over the same file repeatedly (as I rather expected would happen). I
guess I could have the script output text/plain instead, but that seems
like a lot of work when it's only parsing an HTML doc and htdig does that
already.
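For illustration, the filtering step itself is small. What follows is a
minimal sketch in Python (not the Perl script mentioned above). The name
strip_scripts is made up for this example, and it assumes the file to
filter arrives as the first command-line argument, which may not match
how htdig actually invokes an external parser. A regular expression is a
rough tool for this and can miss oddly formed markup.

```python
import re
import sys

# Matches an opening <script ...> tag, its contents, and the closing tag.
# DOTALL lets "." span newlines; IGNORECASE covers <SCRIPT> as well.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL
)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_BLOCK.sub("", html)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the path of the file to filter is the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        sys.stdout.write(strip_scripts(handle.read()))
```

The output is still HTML, so this only covers the stripping itself; it
does nothing about the text/html to text/html looping described above.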
Is there some technique that I've missed, or is this one for the feature
wish-list?
Thanks
Gareth