[ 
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922282#comment-13922282
 ] 

Vangelis Karvounis commented on NUTCH-1478:
-------------------------------------------

Hi! I have a few questions on how to run this patch:
1. In nutch-site.xml, is the following plugin.includes setting correct?
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-domain|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  <description></description>
</property>
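
As a quick sanity check on the value above: plugin.includes is a regular expression that is matched against plugin ids. The sketch below uses Python's `re` module in place of Nutch's own Java matching, so it only approximates what Nutch does. The helper name `included_plugins` and the list of candidate ids are hypothetical and are not read from a real Nutch install.

```python
import re

# Value of plugin.includes, copied from the property above.
PLUGIN_INCLUDES = (
    r"protocol-http|urlfilter-domain|parse-(html|tika|metatags)|"
    r"index-(basic|anchor|more|metadata)|urlnormalizer-(pass|regex|basic)|scoring-opic"
)


def included_plugins(pattern, plugin_ids):
    """Return the plugin ids that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [pid for pid in plugin_ids if compiled.fullmatch(pid)]


# Hypothetical ids used only for illustration.
candidates = ["parse-metatags", "index-metadata", "parse-js", "urlfilter-regex"]
print(included_plugins(PLUGIN_INCLUDES, candidates))
```

With this value, both plugins from the patch (parse-metatags and index-metadata) are matched, while the two ids that are not listed in the expression are filtered out.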

2. Can you tell us how to use these 4 new properties in nutch-site.xml?
<property>
  <name>index.parse.md</name>
  <value>description,keywords</value>
  <description></description>
</property>

<property>
  <name>index.content.md</name>
  <value></value>
  <description> </description>
</property>

<property>
  <name>index.db.md</name>
  <value></value>
  <description> </description>
</property>

<!-- parse-metatags plugin properties -->
<property>
  <name>description;keywords</name>
  <value>*</value>
  <description>  </description>
</property>

3. I read somewhere that we need to add
<field name="metatag.description" type="string" stored="true" indexed="true"/>
to schema.xml in both Solr and Nutch. Is that correct?
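
To make this question concrete, here is a minimal sketch of how such field declarations could be derived from the index.parse.md value, one per metatag name. The `metatag.` prefix and the `string` type are simply taken from the line above and are exactly what this question is unsure about. The helper name `schema_fields` is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def schema_fields(index_parse_md, prefix="metatag."):
    """Build one schema.xml <field> line per comma-separated metatag name."""
    names = [n.strip() for n in index_parse_md.split(",") if n.strip()]
    return [
        f'<field name={quoteattr(prefix + name)} type="string" stored="true" indexed="true"/>'
        for name in names
    ]


# Value of index.parse.md from question 2.
for line in schema_fields("description,keywords"):
    print(line)
```

If the fields turn out to be indexed under the bare tag names (the issue description says the 'metatag' keyword is not needed in 2.x), passing `prefix=""` would generate those instead.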

4. I want to see my chosen metatags in MySQL, because I find it more useful for my 
queries. Any ideas on how to implement this?

5. I want to crawl a page for <meta og:video> or <meta twitter:image>. Any 
ideas?
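
For reference, Open Graph tags are normally written with a `property` attribute (`property="og:video"`), while Twitter Card tags use `name` (`name="twitter:image"`). The sketch below shows how both can be pulled out of a page using Python's standard `html.parser`. It is not Nutch's parse-metatags code, and whether the plugin looks at `property` attributes is a separate question. The class name `MetaTagCollector` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect content values of selected <meta> tags, keeping repeated tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {w.lower() for w in wanted}
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph uses property="og:..."; Twitter Cards use name="twitter:...".
        key = (attr.get("property") or attr.get("name") or "").lower()
        if key in self.wanted and attr.get("content") is not None:
            self.found.setdefault(key, []).append(attr["content"])


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head>
<meta property="og:video" content="http://example.com/movie.mp4">
<meta name="twitter:image" content="http://example.com/thumb.png">
<meta name="description" content="ignored here">
</head><body></body></html>"""

collector = MetaTagCollector(["og:video", "twitter:image"])
collector.feed(SAMPLE)
print(collector.found)
```

Values are stored in lists so that a tag appearing more than once keeps all of its values, in line with the multi-value behaviour described in the issue.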


> Parse-metatags and index-metadata plugin for Nutch 2.x series 
> --------------------------------------------------------------
>
>                 Key: NUTCH-1478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1478
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch, 
> NUTCH-1478v4.patch, NUTCH-1478v5.patch, Nutch1478.patch, Nutch1478.zip, 
> metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x 
> series. This takes multiple values of the same tag and indexes them in Solr, 
> as I patched before (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here 
> (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is 
> no need to put the 'metatag' keyword before metatag names. For example, my 
> configuration looks like this 
> (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml)
>  
> This is only the first version and does not include the JUnit test. I will 
> upload a new version soon.
> This will parse the tags and index them in Solr. Make sure the fields listed 
> under 'index.parse.md' in nutch-site.xml are also defined in schema.xml in Solr.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
