[ https://issues.apache.org/jira/browse/NUTCH-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Xue updated NUTCH-1385:
----------------------------
Description:
When listing multiple plug-ins (e.g. scoring filters) in certain ordering properties in "nutch-site.xml" (listed below), the value must not begin with whitespace: no spaces, tabs, or newlines may precede the first plug-in class name.

E.g.:
This is fine:
<value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>

Either of these will generate an exception:
<value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
<value>
org.apache.nutch.scoring.opic.OPICScoringFilter
myFilter
</value>

Affects these properties in "nutch-site.xml":
* indexingfilter.order
* urlnormalizer.order
* urlfilter.order
* htmlparsefilter.order
* scoring.filter.order

Cause: when the value starts with whitespace, {{order.split("\\s+")}} returns an empty string as its first element.

Solution: replaced {{order.split("\\s+")}} with {{order.trim().split("\\s+")}}. Patch provided.
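A minimal Java sketch of the behaviour described above, assuming the property value reaches the code as a plain String. The class OrderSplitDemo and its methods are hypothetical and are not taken from the Nutch source or the attached patch; they only contrast the old split with the trimmed one.

```java
import java.util.Arrays;

/**
 * Hypothetical demo (not Nutch code): shows why a value with leading
 * whitespace breaks a plain split("\\s+"), and how trim() avoids it.
 */
public class OrderSplitDemo {

    /** Old behaviour: split on runs of whitespace only. */
    public static String[] splitOld(String order) {
        return order.split("\\s+");
    }

    /** Patched behaviour: strip leading/trailing whitespace, then split. */
    public static String[] splitTrimmed(String order) {
        return order.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Value as it might be read from a multi-line <value> element.
        String order = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter\n  myFilter\n";

        // The leading whitespace produces an empty first token;
        // the trailing empty token is discarded by split().
        // prints [, org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(splitOld(order)));

        // Trimming first leaves only the plug-in names.
        // prints [org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(splitTrimmed(order)));
    }
}
```

Note that trim() does not cover every case: a value that is empty or whitespace-only still yields a single empty token, because "".split("\\s+") returns [""]. A caller may want to check for that separately.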
> More robust plug-in order properties in "nutch-site.xml"
> --------------------------------------------------------
>
> Key: NUTCH-1385
> URL: https://issues.apache.org/jira/browse/NUTCH-1385
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, parser
> Affects Versions: 1.5
> Reporter: Andy Xue
> Priority: Minor
> Labels: filter
> Fix For: 1.6
>
> Attachments: nutch-1385.txt