Hi,

Nutch seems to be a very powerful tool, but I'm not sure whether I can customize it enough to meet my requirements. I would like to create a web search engine which:
1. Crawls the entire site, but keeps only selected pages for searching. Whether a page should be indexed would be determined by its content (an XPath expression).
2. Picks additional fields from pages with XPath expressions.
3. The fields from 2. would be used for sorting and filtering search results. Some of them would be numerical. They should be displayed in the search results in separate columns.
4. Page rank does not need to be influenced by links. Content search alone would be enough.
5. The XPaths would be configurable per web site.

Is it possible to customize Nutch to do this? Or should I rather create a custom solution with Lucene?

Thanks for your help.

Marcin Okraszewski

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
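To make requirements 1, 2 and 5 concrete, here is a rough sketch of the per-site logic, written independently of Nutch and Lucene. Everything in it is an assumption for illustration: the site name, the `SITE_RULES` layout and the `extract` function are hypothetical, and it uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed (X)HTML input.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site configuration (requirement 5):
# one "keep" test plus named field paths, some marked as numeric.
SITE_RULES = {
    "shop.example.com": {
        "keep_if": ".//div[@class='product']",
        "fields": {
            "title": ".//h1",
            "price": ".//span[@class='price']",
        },
        "numeric": {"price"},
    },
}


def extract(site, xhtml):
    """Return a dict of fields for an indexable page, or None to skip it."""
    rules = SITE_RULES[site]
    root = ET.fromstring(xhtml)

    # Requirement 1: keep the page only if the "keep" expression matches.
    if root.find(rules["keep_if"]) is None:
        return None

    # Requirement 2: pull each configured field out of the page.
    doc = {}
    for name, path in rules["fields"].items():
        node = root.find(path)
        value = "".join(node.itertext()).strip() if node is not None else None
        # Requirement 3: numeric fields are converted so they can be sorted.
        if value is not None and name in rules["numeric"]:
            value = float(value)
        doc[name] = value
    return doc
```

A page that matches the "keep" expression yields a dictionary of fields that could be stored alongside the indexed content; a page that does not match yields `None` and would still be crawled for links but not indexed. Sorting by a numeric field is then just `sorted(docs, key=lambda d: d["price"])`.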
