[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2014-04-05 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-490:


Component/s: parser (was: fetcher)

 Extension point with filters for Neko HTML parser (with patch)
 --

 Key: NUTCH-490
 URL: https://issues.apache.org/jira/browse/NUTCH-490
 Project: Nutch
  Issue Type: Improvement
  Components: parser
Affects Versions: 0.9.0
 Environment: Any
Reporter: Marcin Okraszewski
Priority: Minor
 Fix For: 1.9

 Attachments: HtmlParser.java.diff, NekoFilters_for_1.0.patch, 
 nutch-extensionpoins_plugin.xml.diff


 In my project I need to set filters for the Neko HTML parser. Instead of 
 hard-coding them, I made an extension point to define filters for Neko. I 
 was following the code for the HtmlParser filters. In fact, I think the method 
 that gets the filters could be generalized to handle both cases, but I didn't 
 want to make too big a mess.
 The attached patch is for Nutch 0.9. This part of the code hasn't changed in 
 trunk, so it should apply easily.
 BTW, I wonder if it wouldn't be best to have HTML DOM parsing defined by 
 an extension point itself. Currently there are options for Neko and TagSoup. 
 But if someone wanted to use something else, or to give the parser different 
 settings, he would need to modify the HtmlParser class instead of replacing 
 a plugin.
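
 To illustrate the general idea described above (filters looked up from 
 configuration rather than hard-coded), here is a minimal sketch in plain Java. 
 It is not taken from the attached patch and uses neither the Nutch plugin API 
 nor NekoHTML. The class name FilterRegistrySketch, its methods, and the filter 
 ids "trim" and "lowercase" are all hypothetical, and string transformations 
 stand in for real parser filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names come from Nutch or NekoHTML.
final class FilterRegistrySketch {
    // Filters known to the registry, keyed by the id used in configuration.
    private final Map<String, UnaryOperator<String>> available = new LinkedHashMap<>();

    void register(String id, UnaryOperator<String> filter) {
        available.put(id, filter);
    }

    // Resolve the configured ids in order; ids with no registered filter are skipped.
    List<UnaryOperator<String>> resolve(List<String> configuredIds) {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        for (String id : configuredIds) {
            UnaryOperator<String> filter = available.get(id);
            if (filter != null) {
                chain.add(filter);
            }
        }
        return chain;
    }

    // Run the input through every resolved filter, first to last.
    String apply(List<String> configuredIds, String input) {
        String result = input;
        for (UnaryOperator<String> filter : resolve(configuredIds)) {
            result = filter.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        FilterRegistrySketch registry = new FilterRegistrySketch();
        registry.register("trim", String::trim);
        registry.register("lowercase", s -> s.toLowerCase(Locale.ROOT));
        // The order of ids in the configured list decides the order of application.
        System.out.println(registry.apply(Arrays.asList("trim", "lowercase"), "  <P>Hi</P> "));
    }
}
```

 The point of the sketch is only that the set and order of filters come from a 
 list of ids, so that changing them would not require editing the parser class.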



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2013-05-22 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-490:
--

Fix Version/s: 1.8

 Extension point with filters for Neko HTML parser (with patch)
 --

 Key: NUTCH-490
 URL: https://issues.apache.org/jira/browse/NUTCH-490
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: 0.9.0
 Environment: Any
Reporter: Marcin Okraszewski
Priority: Minor
 Fix For: 2.3, 1.8

 Attachments: HtmlParser.java.diff, NekoFilters_for_1.0.patch, 
 nutch-extensionpoins_plugin.xml.diff



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2013-01-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-490:
---

   Patch Info: Patch Available
Fix Version/s: 2.2, 1.7

 Extension point with filters for Neko HTML parser (with patch)
 --

 Key: NUTCH-490
 URL: https://issues.apache.org/jira/browse/NUTCH-490
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: 0.9.0
 Environment: Any
Reporter: Marcin Okraszewski
Priority: Minor
 Fix For: 1.7, 2.2

 Attachments: HtmlParser.java.diff, NekoFilters_for_1.0.patch, 
 nutch-extensionpoins_plugin.xml.diff


