Author: lewismc
Date: Mon May 21 18:25:09 2012
New Revision: 1341137

URL: http://svn.apache.org/viewvc?rev=1341137&view=rev
Log:
commit to address NUTCH-1364 and update to CHANGES.txt
Modified:
    nutch/branches/nutchgora/CHANGES.txt
    nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java

Modified: nutch/branches/nutchgora/CHANGES.txt
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/CHANGES.txt?rev=1341137&r1=1341136&r2=1341137&view=diff
==============================================================================
--- nutch/branches/nutchgora/CHANGES.txt (original)
+++ nutch/branches/nutchgora/CHANGES.txt Mon May 21 18:25:09 2012
@@ -2,6 +2,8 @@ Nutch Change Log
 
 Release nutchgora - Current Development
 
+* NUTCH-1364 Add a counter for malformed urls (Jason Trost via lewismc)
+
 * NUTCH-1361 Fix mishandling of malformed urls in generator job (Jason Trost via lewismc)
 
 * NUTCH-1360 Support the storing of IP address connected to when web crawling (lewismc)

Modified: nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java
URL: http://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java?rev=1341137&r1=1341136&r2=1341137&view=diff
==============================================================================
--- nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java (original)
+++ nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorReducer.java Mon May 21 18:25:09 2012
@@ -77,6 +77,7 @@ extends GoraReducer<SelectorEntry, WebPa
       try {
         context.write(TableUtil.reverseUrl(key.url), page);
       } catch (MalformedURLException e) {
+        context.getCounter("Generator", "MALFORMED_URL").increment(1);
         continue;
       }
       context.getCounter("Generator", "GENERATE_MARK").increment(1);