[jira] [Comment Edited] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910355#comment-13910355 ]

lufeng edited comment on NUTCH-1726 at 2/24/14 2:41 PM:

Hi Markus,

It seems that HeadingsFilter does not find nested nodes in my test code, but I cannot reproduce your test result when I use the following process to test our patch:

{code:java}
> svn checkout https://svn.apache.org/repos/asf/nutch/trunk nutch-svn2
> cd nutch-svn2
> patch -p0 < NUTCH-1726-trunk.patch
> ant
> cd src/plugin/headings/
> ant test
{code}

Everything seems OK.

Yes, you are right, maybe someone wants to ignore long headers. But do we need to allow setting the headings.maxlength option to -1 to disable this check? Maybe someone wants to disable this feature.

Feng

was (Author: amuseme.lu):

Hi Markus,

It seems that HeadingsFilter does not find nested nodes in my test code, but I cannot reproduce your test result when I use the following process to test our patch:

{code:bash}
> svn checkout https://svn.apache.org/repos/asf/nutch/trunk nutch-svn2
> cd nutch-svn2
> patch -p0 < NUTCH-1726-trunk.patch
> ant
> cd src/plugin/headings/
> ant test
{code}

Everything seems OK.

Yes, you are right, maybe someone wants to ignore long headers. But do we need to allow setting the headings.maxlength option to -1 to disable this check? Maybe someone wants to disable this feature.

Feng

> HeadingsFilter does not find nested nodes
> -----------------------------------------
>
>                 Key: NUTCH-1726
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1726
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1726-trunk-v2.patch, NUTCH-1726-trunk.patch, NUTCH-1726-trunk.patch
>
>
> Filter won't find:
> {code}
> apache nutch
> {code}
> The getNodeValue() tries to read data from children but should traverse nodes instead.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
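
The question about headings.maxlength could be read roughly as follows. This is only a sketch of the proposed semantics, not code from the patch: the class and method names are hypothetical, and it assumes that "ignoring long headers" means dropping any heading longer than the configured limit, with a negative value such as -1 switching the check off.

```java
class HeadingLengthCheck {

  /**
   * Hypothetical helper: decides whether a heading is kept.
   * maxLength stands in for the value of headings.maxlength;
   * a negative value (for example -1) is assumed to disable the check.
   */
  static boolean accept(String heading, int maxLength) {
    if (maxLength < 0) {
      return true; // check disabled, keep every heading
    }
    return heading.length() <= maxLength;
  }

  public static void main(String[] args) {
    System.out.println(accept("apache nutch", -1)); // check disabled
    System.out.println(accept("apache nutch", 5));  // longer than the limit
  }
}
```

In Nutch itself the limit would presumably be read from the job configuration; that lookup is left out here to keep the sketch self-contained.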
[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910355#comment-13910355 ]

lufeng commented on NUTCH-1726:
-------------------------------

Hi Markus,

It seems that HeadingsFilter does not find nested nodes in my test code, but I cannot reproduce your test result when I use the following process to test our patch:

{code:bash}
> svn checkout https://svn.apache.org/repos/asf/nutch/trunk nutch-svn2
> cd nutch-svn2
> patch -p0 < NUTCH-1726-trunk.patch
> ant
> cd src/plugin/headings/
> ant test
{code}

Everything seems OK.

Yes, you are right, maybe someone wants to ignore long headers. But do we need to allow setting the headings.maxlength option to -1 to disable this check? Maybe someone wants to disable this feature.

Feng

> HeadingsFilter does not find nested nodes
> -----------------------------------------
>
>                 Key: NUTCH-1726
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1726
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1726-trunk-v2.patch, NUTCH-1726-trunk.patch, NUTCH-1726-trunk.patch
>
>
> Filter won't find:
> {code}
> apache nutch
> {code}
> The getNodeValue() tries to read data from children but should traverse nodes instead.
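
For readers following the issue description, the difference between reading a value from a heading's direct children and traversing the nodes below it can be sketched like this. It is an illustration under stated assumptions, not the attached patch: the class name, the helper methods and the sample markup are all made up (the markup in the original report did not survive in this message), and the JDK's XML parser stands in for the HTML parser Nutch actually uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedHeadingText {

  /** Recursively appends the value of every text node at or below the given node. */
  static void collectText(Node node, StringBuilder out) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      out.append(node.getNodeValue());
      return;
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      collectText(children.item(i), out);
    }
  }

  /** Returns the whitespace-normalised text of the first element with this tag, or null. */
  static String headingText(Document doc, String tag) {
    NodeList headings = doc.getElementsByTagName(tag);
    if (headings.getLength() == 0) {
      return null;
    }
    StringBuilder sb = new StringBuilder();
    collectText(headings.item(0), sb);
    return sb.toString().replaceAll("\\s+", " ").trim();
  }

  /** Parses well-formed markup with the JDK parser (a stand-in for Nutch's HTML parsing). */
  static Document parse(String markup) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(markup)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse sample markup", e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical sample: the heading text is nested inside <a> and <b>.
    Document doc = parse("<html><body><h1><a href=\"#\">apache <b>nutch</b></a></h1></body></html>");
    System.out.println(headingText(doc, "h1")); // prints "apache nutch"
  }
}
```

In this sample the h1 element has no text node as a direct child, so code that only inspects the first level finds nothing, while the recursive walk reaches the text inside the nested elements.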
[jira] [Commented] (NUTCH-710) Support for rel="canonical" attribute
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910089#comment-13910089 ]

Sertac TURKEL commented on NUTCH-710:
-------------------------------------

Hi [~jnioche] [~lewismc],

I would like to work on this issue for the 2.x branch. What was the latest decision on it?

> Support for rel="canonical" attribute
> -------------------------------------
>
>                 Key: NUTCH-710
>                 URL: https://issues.apache.org/jira/browse/NUTCH-710
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.1
>            Reporter: Frank McCown
>            Priority: Minor
>             Fix For: 2.3, 1.8
>
>         Attachments: canonical.patch
>
>
> There is a new rel="canonical" attribute which is now being supported by Google, Yahoo, and Live:
> http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
> Adding support for this attribute value will potentially reduce the number of URLs crawled and indexed and reduce duplicate page content.
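
As a rough illustration of what support for this attribute involves, the sketch below looks for a link element whose rel value includes "canonical" and resolves its href against the page URL. It is not taken from canonical.patch: the class and method names and the example URLs are hypothetical, and the JDK's XML parser stands in for the HTML parser Nutch actually uses.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CanonicalLink {

  /** Returns the absolute canonical URL declared by the page, or null if there is none. */
  static String canonicalUrl(Document doc, String pageUrl) {
    NodeList links = doc.getElementsByTagName("link");
    for (int i = 0; i < links.getLength(); i++) {
      Element link = (Element) links.item(i);
      String href = link.getAttribute("href").trim();
      // rel may hold several space-separated values, so check each one.
      for (String rel : link.getAttribute("rel").trim().toLowerCase().split("\\s+")) {
        if (rel.equals("canonical") && !href.isEmpty()) {
          return URI.create(pageUrl).resolve(href).toString();
        }
      }
    }
    return null;
  }

  /** Parses well-formed markup with the JDK parser (a stand-in for Nutch's HTML parsing). */
  static Document parse(String markup) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(markup)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse sample markup", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(
        "<html><head><link rel=\"canonical\" href=\"/product.html\"/></head><body/></html>");
    // Hypothetical page URL carrying a session parameter.
    System.out.println(canonicalUrl(doc, "http://example.com/product.html?sessionid=42"));
    // prints "http://example.com/product.html"
  }
}
```

A crawler that treats the returned URL as the page's identity would fetch and index the two variants above once rather than twice, which is the reduction in duplicates the description refers to.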