Sorry to keep pressing,
but I don't quite understand how the metadata is passed from the parse to the
index. If in my
public ParseResult filter...
I do this:
Parse parse = parseResult.get(content.getUrl());
metadata = parse.getData().getParseMeta();
metadata.add("filter_html_data", docTrans);
and then return
return parseResult;
Is the data passed by reference into parseResult? I ask because when I try to
retrieve it in
public NutchDocument filter...
by doing
String html_filter_data = parse.getData().getMeta("html_filter_data");
LOG.warn(html_filter_data);
if (html_filter_data != null){
LOG.warn("________________________Adding filter data_______________________");
doc.add("html_filter_data", html_filter_data);
}
I never reach the add because the variable html_filter_data is empty.
Any ideas?
Thanks for your help.
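
For reference, here is a minimal sketch of the flow described above, using a plain HashMap as a stand-in for Nutch's Metadata. The names ParseStandIn, parseFilter and indexLookup are hypothetical and not part of Nutch, and the sketch assumes getParseMeta() hands back the live object rather than a copy, which I have not confirmed against the Nutch 1.0 source. Under that assumption the value written in the parse filter is visible later, but only under the exact key it was stored with. Note that the snippets above store under "filter_html_data" and read "html_filter_data".

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataKeySketch {

    // Hypothetical stand-in for a parse whose metadata is a mutable map.
    static class ParseStandIn {
        private final Map<String, String> parseMeta = new HashMap<>();

        // Returns the live map, not a copy (an assumption about Nutch's behaviour).
        Map<String, String> getParseMeta() {
            return parseMeta;
        }

        String getMeta(String name) {
            return parseMeta.get(name);
        }
    }

    // Mirrors the parse filter above: stores the value under "filter_html_data".
    static void parseFilter(ParseStandIn parse, String docTrans) {
        Map<String, String> metadata = parse.getParseMeta();
        metadata.put("filter_html_data", docTrans);
    }

    // Mirrors the lookup done in the indexing filter.
    static String indexLookup(ParseStandIn parse, String key) {
        return parse.getMeta(key);
    }

    public static void main(String[] args) {
        ParseStandIn parse = new ParseStandIn();
        parseFilter(parse, "<transformed/>");

        // Key used in the indexing filter above: nothing was stored under it.
        System.out.println(indexLookup(parse, "html_filter_data"));   // prints null

        // Key used in the parse filter above: the value is found.
        System.out.println(indexLookup(parse, "filter_html_data"));   // prints <transformed/>
    }
}
```

If the same holds for the real classes, using one shared constant for the key in both filters would rule this out as the cause.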
On 24 November 2009 at 12:27 "[email protected]"
<[email protected]> wrote:
> I thought I did, but I also thought that before I ran bin/nutch index (or
> solrindex) it would be stored somewhere. It does not seem to be getting to the
> doc.add bit, which makes me think the variable is empty.
> {code}
> public void addIndexBackendOptions(Configuration conf) {
> LOG.warn("+_+_You called me _+_+");
> LuceneWriter.addFieldOptions("html_filter_data", STORE.YES,
> INDEX.UNTOKENIZED, conf);
> }
>
> public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
> CrawlDatum datum, Inlinks inlinks) throws IndexingException {
> LOG.warn("________________________FILTER_______________________");
> String html_filter_data = parse.getData().getMeta("html_filter_data");
> if (html_filter_data != null){
> LOG.warn("________________________Adding filter data_______________________");
> doc.add("html_filter_data", html_filter_data);
> }
> return doc;
> }
> {code}
> On 24 November 2009 at 12:05 Andrzej Bialecki <[email protected]> wrote:
>
> > [email protected] wrote:
> > > Hi All,
> > >
> > > I think I have just about finished my plugin (Nutch 1.0), which adds extra
> > > metadata during parsing. The problem I am having is that it doesn't seem
> > > to be adding the data to the system (checked via Luke or readseg). I looked
> > > at the wiki, but it seems to be for 0.9 and the syntax looks different.
> > >
> > > {code}
> > > public ParseResult filter(Content content, ParseResult parseResult,
> > > HTMLMetaTags metaTags, DocumentFragment doc) {
> > > Metadata metadata = new Metadata();
> > > // parse the content
> > > DocumentFragment root;
> > > String docTrans;
> > > try {
> > > byte[] contentInOctets = content.getContent();
> > > String input = new String(contentInOctets);
> > > XSLTSimpleTransform DocTransform = new XSLTSimpleTransform();
> > > docTrans = DocTransform.doTransform(input);
> > > Parse parse = parseResult.get(content.getUrl());
> > > metadata = parse.getData().getParseMeta();
> > > metadata.add("filter_html_data", docTrans);
> > >
> > > } catch (Exception e) {
> > > e.printStackTrace(LogUtil.getWarnStream(LOG));
> > > }
> > >
> > > return parseResult;
> > > }
> > > {code}
> >
> > Did you declare that you are adding this field in the
> > IndexingFilter.addIndexBackendOptions(..) ? See how other indexing
> > plugins do this.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> > ___. ___ ___ ___ _ _ __________________________________
> > [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> > ___|||__|| \| || | Embedded Unix, System Integration
> > http://www.sigram.com Contact: info at sigram dot com
> >