[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900352#comment-13900352 ]

Markus Jelsma commented on NUTCH-1726:
--------------------------------------

lufeng, it seems one of your unit tests fails. Is something wrong with the test, or is my fix just not correct? :)

> HeadingsFilter does not find nested nodes
> -----------------------------------------
>
>                 Key: NUTCH-1726
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1726
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1726-trunk-v2.patch, NUTCH-1726-trunk.patch, NUTCH-1726-trunk.patch
>
>
> Filter won't find:
> {code}
> apache nutch
> {code}
> The getNodeValue() tries to read data from children but should traverse nodes instead.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
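The difference between reading text from direct children and traversing nested nodes can be illustrated with a small sketch. This is not the code from any of the attached patches: the class and method names are hypothetical, and the markup `<h2><span>apache nutch</span></h2>` is an assumption, because the example in the issue description lost its tags. It uses only the standard `org.w3c.dom` API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or of the NUTCH-1726 patches.
final class NestedHeadingText {

  /** Reads text only from direct text children, so nested text is missed. */
  static String directChildText(Node node) {
    StringBuilder buffer = new StringBuilder();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.TEXT_NODE) {
        buffer.append(child.getNodeValue());
      }
    }
    return buffer.toString().trim();
  }

  /** Traverses all descendants depth first and collects every text node. */
  static String collectText(Node node) {
    StringBuilder buffer = new StringBuilder();
    appendText(node, buffer);
    return buffer.toString().trim();
  }

  private static void appendText(Node node, StringBuilder buffer) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      buffer.append(node.getNodeValue());
      return;
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      appendText(children.item(i), buffer);
    }
  }

  /** Parses a well-formed snippet; checked exceptions are wrapped for brevity. */
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse snippet", e);
    }
  }

  public static void main(String[] args) {
    // Assumed markup: a heading whose text sits inside a nested element.
    Node heading = parse("<h2><span>apache nutch</span></h2>").getDocumentElement();
    System.out.println("direct children: '" + directChildText(heading) + "'");
    System.out.println("full traversal:  '" + collectText(heading) + "'");
  }
}
```

With the assumed markup, the heading element has no direct text child, so only the recursive version returns the heading text.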
[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900432#comment-13900432 ]

lufeng commented on NUTCH-1726:
-------------------------------

Hi Markus. I didn't find any errors using your newest patch.

{code:xml}
test:
     [echo] Testing plugin: headings
    [junit] Running org.apache.nutch.parse.headings.TestHeadingsParseFilter
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.142 sec

BUILD SUCCESSFUL
Total time: 3 seconds
{code}

* Maybe you could truncate long headers when their size is larger than the value of the maxlength option, so that the headings.truncate option can be removed.
[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901363#comment-13901363 ]

Markus Jelsma commented on NUTCH-1726:
--------------------------------------

Hi lufeng!

I don't understand. I have a clean Apache Nutch headings plugin, and the same test fails for both my patch and your patch.

{code}
Testcase: testIt took 1.489 sec
Testcase: testMultiValueMetatags took 0.185 sec
        FAILED
One value of metatag with multiple values is missing: Test header h2 with span
junit.framework.AssertionFailedError: One value of metatag with multiple values is missing: Test header h2 with span
        at org.apache.nutch.parse.headings.TestHeadingsParseFilter.testMultiValueMetatags(TestHeadingsParseFilter.java:97)
{code}

I added truncate because some users may want to ignore long headers instead of truncating them. If I get a header containing 2 kB of text, I think I would like to skip it, not truncate it.

Markus
[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910355#comment-13910355 ]

lufeng commented on NUTCH-1726:
-------------------------------

Hi Markus

It seems that HeadingsFilter does not find nested nodes in my testing code, but I cannot reproduce your test result when I use the following steps to test our patch:

{code:bash}
> svn checkout https://svn.apache.org/repos/asf/nutch/trunk nutch-svn2
> cd nutch-svn2
> patch -p0 < NUTCH-1726-trunk.patch
> ant
> cd src/plugin/headings/
> ant test
{code}

Everything seems OK.

Yes, you are right, maybe someone wants to ignore long headers. But should setting the headings.maxlength option to -1 disable this check? Someone may want to disable this feature.

Feng
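The length handling discussed in this thread (truncate a long header, skip it, or disable the check) could look roughly like the sketch below. The option names headings.maxlength and headings.truncate come from the comments above, but how the patch actually reads or applies them is not shown here, and treating -1 as "no limit" is only the suggestion raised in the question above, not confirmed behaviour. The class and method names are hypothetical.

```java
import java.util.Optional;

// Hypothetical helper, not taken from the NUTCH-1726 patches.
final class HeadingLengthPolicy {

  /**
   * Applies a length limit to one heading.
   *
   * @param heading   extracted heading text
   * @param maxLength assumed value of headings.maxlength; negative means no limit (assumption)
   * @param truncate  assumed value of headings.truncate; true cuts long headings, false skips them
   * @return the heading to store, or empty if the heading should be skipped
   */
  static Optional<String> apply(String heading, int maxLength, boolean truncate) {
    if (maxLength < 0 || heading.length() <= maxLength) {
      return Optional.of(heading); // check disabled or heading short enough
    }
    if (truncate) {
      return Optional.of(heading.substring(0, maxLength)); // keep the leading part
    }
    return Optional.empty(); // too long and truncation not wanted: skip it
  }

  public static void main(String[] args) {
    System.out.println(apply("apache nutch", 6, true));   // truncated
    System.out.println(apply("apache nutch", 6, false));  // skipped
    System.out.println(apply("apache nutch", -1, false)); // limit disabled
  }
}
```

Returning an empty `Optional` keeps "skip" distinct from "store an empty string", which matches the wish to ignore a very long header entirely rather than index a shortened one.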
[jira] [Commented] (NUTCH-1726) HeadingsFilter does not find nested nodes
[ https://issues.apache.org/jira/browse/NUTCH-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969601#comment-13969601 ]

lufeng commented on NUTCH-1726:
-------------------------------

Hi all,

Could someone who is free check this patch? Thanks.

> HeadingsFilter does not find nested nodes
> -----------------------------------------
>
>                 Key: NUTCH-1726
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1726
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.9
>
>         Attachments: NUTCH-1726-trunk-v2.patch, NUTCH-1726-trunk.patch, NUTCH-1726-trunk.patch
>
>
> Filter won't find:
> {code}
> apache nutch
> {code}
> The getNodeValue() tries to read data from children but should traverse nodes instead.

--
This message was sent by Atlassian JIRA
(v6.2#6252)