[jira] [Commented] (NUTCH-1504) Pluggable url partitioner

2015-06-24 Thread Michael Joyce (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599958#comment-14599958
 ] 

Michael Joyce commented on NUTCH-1504:
--

This is great stuff, [~lewismc]. We definitely need to get this in; it would 
help us out a great deal.

> Pluggable url partitioner
> -
>
> Key: NUTCH-1504
> URL: https://issues.apache.org/jira/browse/NUTCH-1504
> Project: Nutch
> Issue Type: Improvement
> Components: generator
> Affects Versions: 1.6
> Reporter: Sourajit Basak
> Assignee: Lewis John McGibbney
> Fix For: 1.11
>
> Attachments: custom.partitioner.patch
>
>
> At present, the URL partitioning logic is hard-wired inside Nutch core. It 
> should be pluggable, like FetchSchedule, and configurable via nutch-site.xml.
> There might be use cases where a single domain needs to be partitioned by some 
> custom logic. The existing UrlPartitioner cannot handle such cases, 
> hence the requirement.
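
For readers skimming the thread, here is a minimal sketch of what "pluggable" could look like. It is not taken from custom.partitioner.patch and does not use the Nutch or Hadoop APIs: the interface UrlPartitionStrategy, both implementations, and the property name sketch.partitioner.class are hypothetical, and java.util.Properties stands in for the configuration that nutch-site.xml would supply.

```java
import java.net.URI;
import java.util.Properties;

public class PartitionerSketch {

    /** Hypothetical extension point; not the actual Nutch interface. */
    public interface UrlPartitionStrategy {
        int getPartition(String url, int numPartitions);
    }

    /** Default: every URL of a host lands in the same partition. */
    public static class HostPartitioner implements UrlPartitionStrategy {
        @Override
        public int getPartition(String url, int numPartitions) {
            String host = URI.create(url).getHost();
            String key = (host == null) ? url : host;
            // floorMod keeps the result non-negative for negative hash codes.
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }

    /** Custom logic: split a single domain by its first path segment. */
    public static class PathPrefixPartitioner implements UrlPartitionStrategy {
        @Override
        public int getPartition(String url, int numPartitions) {
            URI uri = URI.create(url);
            String path = (uri.getPath() == null) ? "" : uri.getPath();
            String[] segments = path.split("/");
            String first = (segments.length > 1) ? segments[1] : "";
            String key = uri.getHost() + "/" + first;
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }

    /**
     * Instantiates the strategy named in the configuration, falling back to
     * the host-based default. The property name is an assumption.
     */
    public static UrlPartitionStrategy load(Properties conf)
            throws ReflectiveOperationException {
        String className = conf.getProperty(
                "sketch.partitioner.class", HostPartitioner.class.getName());
        return Class.forName(className)
                .asSubclass(UrlPartitionStrategy.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

With no property set, load returns the host-based default, so existing behavior would stay unchanged; naming PathPrefixPartitioner in the configuration swaps in the custom logic without touching the calling code.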



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-1504) Pluggable url partitioner

2015-06-23 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598726#comment-14598726
 ] 

Lewis John McGibbney commented on NUTCH-1504:
-

[~mjoyce] scope this out. This relates exactly to what we were talking about 
just yesterday.



