[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890593#comment-13890593 ]
Anton commented on NUTCH-1478:
-------------------------------

Hi [~lewismc],

A snippet of the source code with the NPE is below. I added a comment to mark line 95, where the NPE occurs.

{code:java}
// add the fields from contentmd
if (contentFieldnames != null) {
  for (String metatag : contentFieldnames) {
    // String[] value = parse.getData().getContentMeta().getValues(metatag);
    ByteBuffer bvalues = page.getFromMetadata(new Utf8(metatag));
    String value = new String(bvalues.array()); //line 95 with NPE
    if (value != null)
      doc.add("meta_" + metatag, value);
  }
}
{code}

Hi [~talat],

Do you mean that I need to define another field name in schema.xml? I currently have this field definition:

<field name="metatag.description" type="string" stored="true" indexed="true"/>

It has the same name as in the wiki (http://wiki.apache.org/nutch/IndexMetatags), but a different field type ('string') from the one in the wiki ('text').

> Parse-metatags and index-metadata plugin for Nutch 2.x series
> --------------------------------------------------------------
>
>                 Key: NUTCH-1478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1478
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch,
> NUTCH-1478v4.patch, Nutch1478.patch, Nutch1478.zip,
> metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series.
> This will take multiple values of the same tag and index them in Solr, as I patched
> before (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here
> (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is
> no need to give the 'metatag' keyword before metatag names. For example, my
> configuration looks like this
> (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml)
>
> This is only the first version and does not include the JUnit test. I will
> upload a new version soon.
> This will parse the tags and index them in Solr. Make sure that the fields listed in
> 'index.parse.md' in nutch-site.xml are also defined in schema.xml in Solr.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
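
For reference, a minimal sketch of a null-safe version of the conversion at line 95 in the snippet above. It assumes (not confirmed in this thread) that page.getFromMetadata(...) returns null when the page carries no value for the requested metatag, and that the stored bytes are UTF-8. The class name MetaValueDecoder and the method decode are hypothetical and are not part of Nutch or of the attached patches.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: decodes a metadata value defensively.
class MetaValueDecoder {

    /**
     * Returns the buffer content as a String, or null when the buffer is null
     * (assumed here to mean "no such metadata entry on this page").
     */
    static String decode(ByteBuffer buf) {
        if (buf == null) {
            return null; // nothing stored for this metatag, so nothing to index
        }
        // Work on a duplicate so the caller's position and limit stay untouched.
        ByteBuffer view = buf.duplicate();
        // Copy only the readable bytes; array() may expose the whole backing array.
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        // Assumption: the value was stored as UTF-8 text.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer present = ByteBuffer.wrap("a description".getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(present));
        System.out.println(decode(null));
    }
}
```

With a helper like this, the loop body would call decode(bvalues) and keep the existing "if (value != null)" guard before doc.add(...), so pages without a given metatag are skipped instead of failing.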