Hi,
Nutch seems to be a very powerful tool, but I'm not sure whether I can
customize it enough to meet my requirements. I would like to
create a web search engine which:

1. Crawls an entire site, but keeps only selected pages for searching.
The determination of whether a page should be indexed would be based on
its content (an XPath expression).
2. Picks additional fields from pages with XPath expressions.
3. The fields from item 2 would be used for sorting and filtering search
results. Some of them would be numeric. They should be displayed in
separate columns of the search results.
4. Page rank does not need to be influenced by links; plain content
search would be enough.
5. XPaths would be configurable per web site.
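To make requirements 1, 2 and 5 concrete, here is a minimal sketch in plain JDK Java (javax.xml.xpath), not the Nutch plugin API: a per-site rule set with one XPath test that decides whether a page is kept, plus named XPath expressions for the extra fields. The class name `SiteXPathRules` and the sample expressions are hypothetical, and it assumes the page has already been cleaned up into well-formed, non-namespaced XHTML (real crawled HTML usually is not, so an HTML-to-DOM cleanup step would be needed first).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical per-site rules: one "should this page be indexed?" XPath test
// plus named XPath expressions for extra fields (requirements 1, 2 and 5).
public class SiteXPathRules {
    private final String indexTest;
    private final Map<String, String> fieldExprs;

    public SiteXPathRules(String indexTest, Map<String, String> fieldExprs) {
        this.indexTest = indexTest;
        this.fieldExprs = fieldExprs;
    }

    /** Returns the extracted fields, or null if the page should not be indexed. */
    public Map<String, String> apply(String xhtml) throws Exception {
        // Assumes well-formed XHTML without namespaces.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        XPath xp = XPathFactory.newInstance().newXPath();

        // Requirement 1: keep the page only if the content test is true.
        Boolean keep = (Boolean) xp.evaluate(indexTest, doc, XPathConstants.BOOLEAN);
        if (!keep) {
            return null;
        }

        // Requirement 2: evaluate each field expression as a string value.
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldExprs.entrySet()) {
            fields.put(e.getKey(), xp.evaluate(e.getValue(), doc).trim());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Example rules for one (hypothetical) shop site.
        Map<String, String> exprs = new LinkedHashMap<>();
        exprs.put("title", "/html/body/h1");
        exprs.put("price", "//span[@class='price']");
        SiteXPathRules rules = new SiteXPathRules("boolean(//span[@class='price'])", exprs);

        String product = "<html><body><h1>Widget</h1><span class=\"price\">19.99</span></body></html>";
        String about = "<html><body><h1>About us</h1></body></html>";

        Map<String, String> fields = rules.apply(product);
        System.out.println(fields);              // product page: fields extracted
        System.out.println(rules.apply(about));  // no price: page skipped (null)

        // Requirement 3: numeric fields would be parsed before being indexed for sorting.
        double price = Double.parseDouble(fields.get("price"));
        System.out.println(price);
    }
}
```

Keeping the rules as data (strings in a map) rather than code is what would let them be loaded from a per-site configuration file; where such a check would hook into a Nutch crawl, or how the numeric values would be stored in a Lucene index for sorting, is left open here.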

Is it possible to customize Nutch to do this, or should I rather
create a custom solution with Lucene?

Thanks for your help.
Marcin Okraszewski

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
