Enis - thanks for the pointer.

Enis Soztutar wrote:
> You can write index plugins. Please first read the (slightly outdated)
> tutorial and then check http://wiki.apache.org/nutch/PluginCentral.
> Optionally you may want to write html parse plugins depending on the
> source of the data.
>
> Chris Hane wrote:
>> I am looking to use nutch to crawl/index a website. A lot of the
>> pages have videos on them. We have transcripts for the videos that we
>> would like to be included for indexing; but we do not want to put the
>> transcripts on the web pages.
>>
>> Is there a way to "add" this information to a given web page for
>> purposes of indexing as part of the crawl process? Maybe another
>> point in the process before the index is generated? I am hoping there
>> is a point in the crawl process where I can add augmented content to a
>> page in the nutch segment (rough thought based on very limited time
>> spent looking at nutch).
>>
>> We are comfortable using java and can write custom code as needed. I
>> would appreciate any pointers on where to look in the nutch code.
>>
>> Thanks in advance,
>> Chris.....
>>
>
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
