[ 
https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641047#action_12641047
 ] 

Otis Gospodnetic commented on NUTCH-655:
----------------------------------------

I think we need a generic way of keeping metadata about hosts. I think I 
started on that somewhere in JIRA a while back: 
https://issues.apache.org/jira/browse/NUTCH-628

I'm mentioning this simply because we can probably use the same or a very 
similar mechanism for keeping metadata about both hosts and individual URLs.


> Injecting Crawl metadata
> ------------------------
>
>                 Key: NUTCH-655
>                 URL: https://issues.apache.org/jira/browse/NUTCH-655
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>            Reporter: julien nioche
>            Priority: Minor
>         Attachments: Injector.patch
>
>
> The attached patch allows metadata to be injected into the crawlDB. The input 
> file must contain tab-separated fields, with the URL in the first column. 
> Each metadata name is separated from its value by '='. An input line might 
> look like this:
> http://www.myurl.com  \t  categ=value1 \t categ2=value2
> This functionality can be useful for storing external knowledge and indexing 
> it with a custom plugin.
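
For illustration only, here is a minimal Python sketch of how one line in the 
input format described in the quoted issue could be split into a URL and its 
metadata. It is not taken from Injector.patch. The function name 
parse_seed_line is hypothetical, and two behaviours are assumptions: 
whitespace around each field is trimmed, and fields without '=' are skipped.

```python
def parse_seed_line(line):
    """Split one tab-separated seed line into (url, metadata dict)."""
    # Fields are separated by tabs; the URL is in the first column.
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    url = fields[0]
    metadata = {}
    for field in fields[1:]:
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = field.partition("=")
        if not sep:
            continue  # assumption: ignore fields that have no '='
        metadata[name.strip()] = value.strip()
    return url, metadata


url, meta = parse_seed_line("http://www.myurl.com\tcateg=value1\tcateg2=value2")
print(url, meta)
```

Splitting on the first '=' only is a design choice of this sketch; the 
patch itself may treat such values differently.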

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
