One approach is to add meta tags to the web pages and then write a custom 
plugin that extends HtmlParseFilter. The HtmlParseFilter extension can copy the 
meta tag content into the parse data, and an IndexingFilter extension can then 
add that content to the index.
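
To make the two stages concrete, here is a rough standalone sketch of the data flow. It does not use the real Nutch interfaces (their signatures take Nutch-specific types and differ between releases), so the class name, the "transcript" meta tag name, and both method names are hypothetical stand-ins. Plain maps play the role of the parse metadata and the index document, and a regex stands in for whatever access to the parsed page the real filter provides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TranscriptMetaSketch {

    // Hypothetical meta tag name; the original question does not specify one.
    static final String META_NAME = "transcript";

    // Matches <meta name="transcript" content="..."> (name attribute first).
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"" + META_NAME + "\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Stage 1: stand-in for the HtmlParseFilter extension.
    // Pulls the meta tag content out of the page and stores it as parse metadata.
    static Map<String, String> parseStage(String html) {
        Map<String, String> parseMeta = new HashMap<>();
        Matcher m = META.matcher(html);
        if (m.find()) {
            parseMeta.put(META_NAME, m.group(1));
        }
        return parseMeta;
    }

    // Stage 2: stand-in for the IndexingFilter extension.
    // Copies the stored value from the parse metadata into an index field.
    static Map<String, String> indexStage(Map<String, String> parseMeta) {
        Map<String, String> indexDoc = new HashMap<>();
        String transcript = parseMeta.get(META_NAME);
        if (transcript != null) {
            indexDoc.put(META_NAME, transcript);
        }
        return indexDoc;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"transcript\" content=\"Welcome to the demo video\">"
            + "</head><body></body></html>";
        Map<String, String> doc = indexStage(parseStage(html));
        System.out.println(doc.get(META_NAME));
    }
}
```

In an actual plugin the two methods would live in separate extension classes registered in the plugin descriptor, and the second stage would write to the index document type of your Nutch version rather than a map.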
   
  - Sathyam

Chris Hane <[EMAIL PROTECTED]> wrote:
  I am looking to use Nutch to crawl/index a website. A lot of the pages 
have videos on them. We have transcripts for the videos that we would like 
to be included for indexing; but we do not want to put the transcripts on 
the web pages.

Is there a way to "add" this information to a given web page for purposes 
of indexing as part of the crawl process? Maybe at another point in the 
process before the index is generated? I am hoping there is a point in the 
crawl process where I can add augmented content to a page in the Nutch 
segment (rough thought based on very limited time spent looking at Nutch).

We are comfortable using Java and can write custom code as needed. I would 
appreciate any pointers on where to look in the Nutch code.

Thanks in advance,
Chris.....


       