[Nutch-general] Nutch + RDF for scholarly archives

Erik Hatcher Wed, 29 Jun 2005 13:46:52 -0700

Is anyone here using Nutch for crawling digital scholarly archives? If so, are you also harvesting and indexing additional metadata?

My group (http://www.patacriticism.org) is considering using Nutch to crawl a specific set of sites, index the HTML as full text, and also retrieve any associated RDF data (specified with a hyperlink in a <meta> tag perhaps, like this page: http://www.rossettiarchive.org/docs/1-1847.s244.raw.html). Most likely the RDF could simply be indexed as additional fields, but perhaps it would also be added to an RDF engine (such as Kowari) and queried in the search interface in conjunction with full-text searching.
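To make the RDF discovery step concrete, here is a minimal sketch in plain Python (not a Nutch plugin) of pulling RDF links out of a fetched page. The <meta name="rdf" content="..."> convention is purely an assumption for illustration, as are the names RdfLinkFinder and find_rdf_links; the <link type="application/rdf+xml"> form is included as a second possibility. Nothing here reflects what any particular archive actually emits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Collects URLs of RDF documents referenced from <meta> or <link> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        target = None
        # Assumed (hypothetical) convention: <meta name="rdf" content="...">
        if tag == "meta" and (attrs.get("name") or "").lower() == "rdf":
            target = attrs.get("content")
        # Alternative convention: <link type="application/rdf+xml" href="...">
        elif tag == "link" and attrs.get("type") == "application/rdf+xml":
            target = attrs.get("href")
        if target:
            # Resolve relative references against the page's own URL.
            self.rdf_urls.append(urljoin(self.base_url, target))


def find_rdf_links(html, base_url):
    """Return absolute URLs of RDF resources advertised by the page."""
    finder = RdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.rdf_urls
```

Each URL returned could then be fetched and its statements either flattened into extra index fields or loaded into an RDF store; which of those makes sense is exactly the open question.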

The Ontology and Creative Commons plugins are great starting places, for sure. I'm wondering what others have done along these lines.


Thanks,
    Erik



