Send an email to dev-unsubscr...@nutch.apache.org and follow the instructions 
from there...


On 6/24/10 9:36 PM, "Vimal Varghese" <vimal.vargh...@tcs.com> wrote:

Vimal Varghese

-----Claus Schröter (JIRA) wrote: -----
To: dev@nutch.apache.org
From: Claus Schröter (JIRA) <j...@apache.org>
Date: 06/25/2010 01:59AM
Subject: [jira] Commented: (NUTCH-655) Injecting Crawl metadata

    [ https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882313#action_12882313 ]

Claus Schröter commented on NUTCH-655:
--------------------------------------

Hi Julien, thanks for this patch.
Is there any way for sub-URLs to inherit the metadata, or parts of it, while 
crawling?
I experimented with a scoring filter, but without success.

Cheers
Claus

> Injecting Crawl metadata
> ------------------------
>
>                 Key: NUTCH-655
>                 URL: https://issues.apache.org/jira/browse/NUTCH-655
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: Injector.patch, NUTCH-655.v2
>
>
> The attached patch allows metadata to be injected into the crawlDB. The input 
> file must contain tab-separated fields, with the URL in the first column. 
> Metadata names and values are separated by '='. An input line might look 
> like this:
> http://www.myurl.com \t categ=value1 \t categ2=value2
> This functionality can be useful for storing external knowledge and indexing 
> it with a custom plugin.
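
For illustration only, here is a minimal Python sketch of how a seed line in the format described above could be split into a URL and its metadata. It is not the code from the patch (the Nutch injector is written in Java); the function name parse_seed_line is hypothetical, and trimming whitespace around fields and skipping fields without '=' are assumptions the issue description does not spell out.

```python
def parse_seed_line(line):
    """Split one tab-separated seed line into (url, metadata dict).

    Assumptions (not confirmed by the issue text): whitespace around
    each field is trimmed, and fields without '=' are ignored.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    url = fields[0]  # the URL is in the first column
    metadata = {}
    for field in fields[1:]:
        if not field:
            continue  # tolerate empty columns, e.g. a trailing tab
        name, sep, value = field.partition("=")
        if not sep:
            continue  # assumption: skip fields that are not name=value
        metadata[name.strip()] = value.strip()
    return url, metadata


if __name__ == "__main__":
    example = "http://www.myurl.com\tcateg=value1\tcateg2=value2"
    url, metadata = parse_seed_line(example)
    print(url)
    print(metadata)
```

Only the first '=' in a field is treated as the separator, so a value may itself contain '='.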


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
