[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2019-10-01 Thread Sebastian Nagel (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941824#comment-16941824
 ] 

Sebastian Nagel commented on NUTCH-1186:
----------------------------------------

Normalization can be disabled by setting urlnormalizer.scope.partition = 
org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer, and I think this is 
what the 
[PassURLNormalizer|https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.html]
 was created for. This either needs to be fixed (see Markus' patch) or needs 
good documentation in nutch-default.xml. We could also define 
PassURLNormalizer as the default for urlnormalizer.scope.partition.
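
A minimal sketch of that setting as a configuration override. The property 
name and class name are the ones given above; placing the override in 
conf/nutch-site.xml and the requirement that the urlnormalizer-pass plugin is 
enabled through plugin.includes are assumptions that should be checked against 
the Nutch version in use.

```xml
<!-- Sketch only: assumed to go into conf/nutch-site.xml, which overrides
     nutch-default.xml. -->
<property>
  <name>urlnormalizer.scope.partition</name>
  <!-- Use only the pass-through normalizer in the partition scope, so URLs
       are left unchanged when they are partitioned. Assumes the
       urlnormalizer-pass plugin is listed in plugin.includes. -->
  <value>org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer</value>
</property>
```

This would only affect the partition scope; normalizers configured for other 
scopes should keep their current behaviour.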

> FreeGenerator always normalizes
> -------------------------------
>
>                 Key: NUTCH-1186
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1186
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1186-1.7-1.patch, NUTCH-1186-1.7-2.patch
>
>
> The FreeGenerator does not honor the -normalize option; it always normalizes 
> all URLs in the input directory. The -filter option is respected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2016-01-10 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091300#comment-15091300
 ] 

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

Hi [~markus17], I have scoped the patch and tested it. I have to admit, 
Markus, that I do not use the FreeGenerator tool much, so some additional 
review would help greatly.



[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2016-01-05 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083138#comment-15083138
 ] 

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

I will scope and test the patch, [~markus17].




[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2013-02-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587374#comment-13587374
 ] 

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

I do not use (and have never used) FreeGenerator, Markus. Is there a 
justification for why compliance with the normalization configuration has not 
been implemented so far?

> FreeGenerator always normalizes
> -------------------------------
>
>                 Key: NUTCH-1186
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1186
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.7
>
>         Attachments: NUTCH-1186-1.7-1.patch
>
>
> The FreeGenerator does not honor the -normalize option; it always normalizes 
> all URLs in the input directory. The -filter option is respected.



[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2013-02-26 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587401#comment-13587401
 ] 

Markus Jelsma commented on NUTCH-1186:
--------------------------------------

I think there is little justification. Normalization must be configurable so 
that it can be turned off when there is no need for it. This patch fixes the 
Generator issue, not the FreeGenerator issue; I'll look into that.



[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2013-02-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587439#comment-13587439
 ] 

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

OK, Markus. I looked at the patch (not tested) and it looks good to me.



[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes

2011-11-09 Thread Markus Jelsma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147069#comment-13147069
 ] 

Markus Jelsma commented on NUTCH-1186:
--------------------------------------

This is actually not the FreeGenerator but the URLPartitioner class, which 
normalizes in the partition scope. I'm not sure what the right behaviour would 
be. The common Generator is also affected, since it uses the partitioner when 
turning fetch lists into segments. Without a scope-specific configuration, 
this means ALL selected URLs are normalized at least once, and twice when 
normalizing is actually enabled.

Thoughts?

> FreeGenerator always normalizes
> -------------------------------
>
>                 Key: NUTCH-1186
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1186
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.5
>
>
> The FreeGenerator does not honor the -normalize option; it always normalizes 
> all URLs in the input directory. The -filter option is respected.
