[htdig3-dev] pre-parsing pages

Gareth Watts Mon, 25 Dec 2000 12:55:50 -0800

Hi

I'm playing with the latest snapshot and I need a way to "filter" the
HTML files that are being indexed. Specifically, I need to filter out
the <script></script> blocks in the pages.

I tried using an ExternalParser to call a little Perl script to do the
job, set up as a converter from text/html to text/html, but htdig loops
over the same file repeatedly (as I rather expected would happen). I
guess I could have the script output text/plain instead, but that seems
like a lot of work when it's only parsing an HTML doc, and htdig does
that already.
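
For reference, the filtering step itself is small. Here is a rough sketch
of it in Python rather than Perl; the function name strip_scripts is made
up for illustration, and it assumes the script is handed the path of the
HTML file as its first argument and writes the filtered HTML to stdout.
It only shows the filter and does nothing about the looping described
above.

```python
import re
import sys

# Matches <script ...> ... </script>, case-insensitively and across newlines.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                          re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_BLOCK.sub("", html)


# Assumed calling convention (not confirmed by htdig): argv[1] is the path
# of the HTML file to filter; the result goes to stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="latin-1") as handle:
        sys.stdout.write(strip_scripts(handle.read()))
```

A regular expression is a blunt tool here (it will be fooled by a literal
"</script>" inside a JavaScript string, for example), but it is probably
close to what a small throwaway filter would do.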

Is there some technique that I've missed, or is this one for the feature
wish-list?

Thanks

Gareth

