Hello,

I'm implementing something similar at the moment. I'm feeding Nutch a url-list in which each URL is annotated with an ID. This ID must go into the Lucene index, so that I can build a 1:many relation between a database and the crawled pages.

I've added the custom data to the meta-data field of the datum. See InjectMapper:

// add myID to the crawlDatum as metaData
MapWritable meta = new MapWritable();
meta.put(new Text("myID"), new Text(myID));
datum.setMetaData(meta);
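
How myID gets out of the url-list is not shown above. A minimal sketch, assuming each line of the list has the form url<TAB>id; that line format and the class name SeedLineParser are assumptions for illustration, not the actual InjectMapper code:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class SeedLineParser {

    // Assumed (hypothetical) seed format: "<url>\t<myID>", one entry per line.
    // Returns the URL as key and the annotated ID as value.
    static Map.Entry<String, String> parse(String line) {
        String[] parts = line.trim().split("\t", 2);
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("expected <url>\\t<myID>, got: " + line);
        }
        return new SimpleEntry<>(parts[0], parts[1].trim());
    }

    public static void main(String[] args) {
        Map.Entry<String, String> entry = parse("http://example.com/page\t42");
        System.out.println(entry.getKey() + " -> myID " + entry.getValue());
    }
}
```

The value returned here would be the myID that is put into the MapWritable above; lines without an ID are rejected early so they never reach the index with an empty field.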

Now the ID is saved in the CrawlDatum object. On the indexing side I've written a new plugin, index-id, but it's simply a modified index-basic ;) The essence is:

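MapWritable meta = datum.getMetaData();

// meta.get() returns null if no ID was injected for this URL
Text idText = (Text) meta.get(new Text("myID"));
String id = (idText == null) ? "" : idText.toString();

// compare the string contents; id != "" would only compare references
if (id.length() > 0) {
    Field myid = new Field("myid", id, Field.Store.YES, Field.Index.UN_TOKENIZED);
    myid.setBoost(5.0f);
    doc.add(myid);
    LOG.info("The following ID was added to the index: " + id);
}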

So, that's where I stand at the moment. Now I have to build a custom query interface, so that I can search in my MySQL database and enrich the results with my crawled sites.
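
The query interface does not exist yet, so the following is only a rough sketch of the 1:many idea: group the crawled pages by their stored myid, so that each database row can look up its pages by ID. The class ResultJoiner, the method name, and the {myid, url} hit layout are all hypothetical; no Nutch or Lucene search API is used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResultJoiner {

    // Each hit is assumed to be a pair {myid, url} read from the stored
    // "myid" field of an index document (hypothetical layout).
    static Map<String, List<String>> groupUrlsById(List<String[]> hits) {
        Map<String, List<String>> byId = new LinkedHashMap<>();
        for (String[] hit : hits) {
            // one database ID can map to many crawled pages
            byId.computeIfAbsent(hit[0], k -> new ArrayList<>()).add(hit[1]);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<String[]> hits = Arrays.asList(
            new String[] {"42", "http://example.com/a"},
            new String[] {"7", "http://example.org/"},
            new String[] {"42", "http://example.com/b"});
        System.out.println(groupUrlsById(hits));
    }
}
```

A database result with ID 42 would then be enriched with the list stored under the key "42".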

Maybe we can join forces. Feel free to contact me :)

Greetings,
        Sebastian Steinmetz
