[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143038#comment-13143038 ]
Julien Nioche commented on NUTCH-1098:
--------------------------------------

@Radim Sounds like "I am not going to" is your favourite phrase. You certainly prefer Git to SVN, but the fact is that Nutch uses the latter, and contributions are expected to be generated with 'svn diff'. By going along with what the rest of the community does (rather than imposing your ways on others), you will make it easier for people to discuss, review and commit your contributions.

http://wiki.apache.org/nutch/HowToContribute >> Creating a patch
http://www.apache.org/foundation/how-it-works.html >> Philosophy

Thanks

> better url-normalizer basic
> ---------------------------
>
>                 Key: NUTCH-1098
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1098
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Any
>            Reporter: Radim Kolar
>            Assignee: Markus Jelsma
>              Labels: encoding, url
>             Fix For: 1.5
>
>         Attachments: patch-with-utf8-encoding.diff
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The basic URL normalizer lacks 2 important features:
> Encoding a space in a URL as %20, to unbreak httpclient and possibly others that do
> not expect a space inside a URL.
> The ability to decode %33-style encoding in a URL. This is important for avoiding
> duplicates.
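
For readers following the issue, the two normalizations requested in the description can be illustrated with a minimal sketch. This is not the attached patch and not Nutch's own normalizer code: the class name UrlEscapeSketch and the method normalize are hypothetical, and the sketch assumes that only escapes of RFC 3986 "unreserved" ASCII characters should be decoded, so that reserved characters such as %2F keep their meaning. Non-ASCII (UTF-8) handling is left out.

```java
// Hypothetical illustration only; not the NUTCH-1098 patch or Nutch's API.
final class UrlEscapeSketch {

    // RFC 3986 "unreserved" characters never need to be percent-encoded.
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if the character is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Encodes literal spaces as %20 and decodes %XX escapes that stand for
    // unreserved characters (for example %33 becomes '3'). Everything else,
    // including malformed or reserved escapes, is copied through unchanged.
    static String normalize(String url) {
        StringBuilder out = new StringBuilder(url.length() + 8);
        int i = 0;
        while (i < url.length()) {
            char c = url.charAt(i);
            if (c == ' ') {
                out.append("%20");
                i++;
            } else if (c == '%' && i + 2 < url.length()) {
                int hi = hexValue(url.charAt(i + 1));
                int lo = hexValue(url.charAt(i + 2));
                if (hi >= 0 && lo >= 0 && isUnreserved(hi * 16 + lo)) {
                    out.append((char) (hi * 16 + lo)); // safe to decode
                } else {
                    out.append(url, i, i + 3);         // keep escape as written
                }
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com/a b/%33.html"));
    }
}
```

With this sketch, "http://example.com/a b/%33.html" and "http://example.com/a%20b/3.html" normalize to the same string, which is the duplicate-avoidance effect the description asks for.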