[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941824#comment-16941824 ]

Sebastian Nagel commented on NUTCH-1186:
----------------------------------------

Normalization can be disabled by setting urlnormalizer.scope.partition = org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer, and I think this is what the [PassURLNormalizer|https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.html] was created for.

This either needs to be fixed (see Markus' patch) or documented well in nutch-default.xml. We could also define PassURLNormalizer as the default for urlnormalizer.scope.partition.

> FreeGenerator always normalizes
> -------------------------------
>
>                 Key: NUTCH-1186
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1186
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1186-1.7-1.patch, NUTCH-1186-1.7-2.patch
>
>
> The FreeGenerator does not honor the -normalize option; it always normalizes
> all URLs in the input directory. The -filter option is respected.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
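The setting described in the comment above could be written as a property override along the following lines. This is a minimal sketch, not taken from the issue or its patches: the property name and class name come from the comment, while the choice of nutch-site.xml as the override file and the description text are assumptions. It also assumes the urlnormalizer-pass plugin is enabled in plugin.includes, which is worth checking in your own configuration.

```xml
<!-- Assumed location: conf/nutch-site.xml, which overrides nutch-default.xml -->
<property>
  <name>urlnormalizer.scope.partition</name>
  <value>org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer</value>
  <!-- Description text is illustrative only, not from nutch-default.xml -->
  <description>Use the pass-through normalizer for the partition scope so
  that URLs are not rewritten during partitioning.</description>
</property>
```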
[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091300#comment-15091300 ]

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

Hi [~markus17], I have scoped the patch and tested it. I have to admit, Markus, that I do not use the FreeGenerator tool much, so some additional review would help greatly.
[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083138#comment-15083138 ]

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

Will scope and test, [~markus17].
[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587374#comment-13587374 ]

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

I do not use (and have never used) FreeGenerator, Markus. Is there a justification for why compliance with the normalization configuration has not been implemented so far?

FreeGenerator always normalizes
-------------------------------

                Key: NUTCH-1186
                URL: https://issues.apache.org/jira/browse/NUTCH-1186
            Project: Nutch
         Issue Type: Bug
         Components: generator
   Affects Versions: 1.3
           Reporter: Markus Jelsma
           Assignee: Markus Jelsma
           Priority: Minor
            Fix For: 1.7
        Attachments: NUTCH-1186-1.7-1.patch

The FreeGenerator does not honor the -normalize option; it always normalizes all URLs in the input directory. The -filter option is respected.
[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587401#comment-13587401 ]

Markus Jelsma commented on NUTCH-1186:
--------------------------------------

I think there is little justification. Normalization must be configurable so that it can be skipped when there is no need for it. This patch fixes the generator issue, not the FreeGenerator issue; I'll look into that.
[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587439#comment-13587439 ]

Lewis John McGibbney commented on NUTCH-1186:
---------------------------------------------

OK, Markus. I looked at the patch (not tested) and it looks good to me.
[jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
[ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147069#comment-13147069 ]

Markus Jelsma commented on NUTCH-1186:
--------------------------------------

This is actually not the FreeGenerator but the URLPartitioner class, which does the partition-scope normalizing. I'm not sure what good behaviour would be. The common generator is also affected, since it uses the partitioner when turning fetch lists into segments. Without a scope configured, this means ALL selected URLs are normalized at least once, and twice when normalizing is actually in use. Thoughts?

FreeGenerator always normalizes
-------------------------------

                Key: NUTCH-1186
                URL: https://issues.apache.org/jira/browse/NUTCH-1186
            Project: Nutch
         Issue Type: Bug
         Components: generator
   Affects Versions: 1.3
           Reporter: Markus Jelsma
           Assignee: Markus Jelsma
           Priority: Minor
            Fix For: 1.5

The FreeGenerator does not honor the -normalize option; it always normalizes all URLs in the input directory. The -filter option is respected.
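To make the point in the comment above concrete, here is a toy model of a partitioner whose normalization step can be switched off. It is only a sketch under stated assumptions: the class PartitionSketch, its methods, the host-lowercasing "normalizer" and the normalizeFirst flag are all hypothetical and are not Nutch's URLPartitioner or the code from the attached patches. The counter simply shows that the normalizer runs on every partition call unless the caller opts out.

```java
import java.net.URI;
import java.util.Locale;

// Toy model only: this is NOT Nutch's URLPartitioner.
class PartitionSketch {

    // Counts how often the (hypothetical) normalizer runs.
    static int normalizeCalls = 0;

    // Hypothetical stand-in for a partition-scope normalizer: lowercases the host.
    static String normalize(String url) {
        normalizeCalls++;
        String host = URI.create(url).getHost();
        return host == null ? url : url.replace(host, host.toLowerCase(Locale.ROOT));
    }

    // Picks a partition from the host; normalizes first only when asked to.
    static int partition(String url, int numPartitions, boolean normalizeFirst) {
        String key = normalizeFirst ? normalize(url) : url;
        String host = URI.create(key).getHost();
        String partitionKey = host == null ? key : host;
        // Mask the sign bit so the modulo result is never negative.
        return (partitionKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int withNormalizing = partition("http://Example.COM/a", 4, true);
        int alreadyNormalized = partition("http://example.com/a", 4, false);
        System.out.println("same partition: " + (withNormalizing == alreadyNormalized));
        System.out.println("normalizer calls: " + normalizeCalls);
    }
}
```

If the URLs were already normalized in an earlier step, passing false for normalizeFirst avoids the second, redundant pass that the comment describes.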