[ 
https://issues.apache.org/jira/browse/NUTCH-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2389.
-----------------------------------------
    Resolution: Fixed

Thank you [~kaidul]

> Precise data parsing using Jsoup CSS selectors
> ----------------------------------------------
>
>                 Key: NUTCH-2389
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2389
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 2.3
>            Reporter: Kaidul Islam
>            Assignee: Kaidul Islam
>             Fix For: 2.4
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> As far as I know, currently Nutch 1.x and 2.x has no features to 
> extract/parse exact contents for specific websites. I've developed a plugin 
> {{parse-jsoup}} using Jsoup for my current project to extract precise content 
> for site specific crawling using detailed XML configuration(field name, 
> CSS-selector, attribute, extraction rules, data-type, default-value etc).
> Please let me know if this feature seems relevant and currently not present 
> in Nutch. I have also plan to export it into Nutch 1.x.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to