GitHub user prernasatija opened a pull request:

    https://github.com/apache/nutch/pull/57

    2.x

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/nutch 2.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/57.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #57

----
commit f7ef04dca1b763e86502a3b23064520ded39181e
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-08-31T12:49:26Z

    NUTCH-1462 Elasticsearch not indexing when type==null in NutchDocument metadata

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379431 13f79535-47bb-0310-9956-ffa450edef68

commit 1bb03c759180688f58284189abca787437935647
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-08-31T12:56:41Z

    NUTCH-1463 Elasticsearch indexer should wait and check response for last flush

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379435 13f79535-47bb-0310-9956-ffa450edef68

commit c5e2236f36a881ee7fec97aff3baf9bb32b40200
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-08-31T13:02:32Z

    NUTCH-1448 Redirected urls should be handled more cleanly (more like an outlink url)

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379438 13f79535-47bb-0310-9956-ffa450edef68

commit 33de245d3211d2be19559870c5a821381e18e9c0
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-08-31T15:57:18Z

    NUTCH-1431 Introduce link 'distance' and add configurable max distance in the generator

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379488 13f79535-47bb-0310-9956-ffa450edef68

commit c1b68c35ee02d1588786d5767f3feaa71b5393e1
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-09-07T08:17:58Z

    NUTCH-1459 Remove dead code (phase2) from InjectorJob

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1381931 13f79535-47bb-0310-9956-ffa450edef68

commit e878515c26e1bceaed2555a3cac2402322f27046
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-09-07T14:19:47Z

    NUTCH-1456 Updater not setting batchId in markers correctly. (Alexander Kingson via ferdy)

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1382037 13f79535-47bb-0310-9956-ffa450edef68

commit 32b825c58bcb1647bec548cb1ea17ee4ae522399
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-09-15T16:16:48Z

    NUTCH-1162 Write JUnit tests for parse-js

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1385103 13f79535-47bb-0310-9956-ffa450edef68

commit 4369dac176a228d0c9ef729dca89bcff0e097211
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-09-15T23:06:34Z

    NUTCH-1470 Ensure test files are included for runtime testing

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1385199 13f79535-47bb-0310-9956-ffa450edef68

commit ecb86f4de0209c73e5b00fa0df8d4c6f58c592bf
Author: Ferdy Galema <fe...@apache.org>
Date:   2012-09-17T09:24:33Z

    NUTCH-1468 Redirects that are external links not adhering to db.ignore.external.links

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1386526 13f79535-47bb-0310-9956-ffa450edef68

commit 068636631cc73786b150e1ec2cd0be38919890e7
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-09-18T14:07:57Z

    NUTCH-1162 test file

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387173 13f79535-47bb-0310-9956-ffa450edef68

commit 19e694e609776a388ce1409a3272a2a15b101222
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-09-18T14:13:26Z

    add keyspace reference to NullPointerException on inject before

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387175 13f79535-47bb-0310-9956-ffa450edef68

commit 590ad02aea95c1dcb9c6ad25de1e38a815c7fa82
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-09-18T20:30:25Z

    NUTCH-1432 property storage.schema does not work anymore, should be storage.schema.webpage and storage.schema.host

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387347 13f79535-47bb-0310-9956-ffa450edef68

commit fceecfabb9c47952f0ec2b3fcd2a6241dbedb465
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-09-18T20:52:08Z

    NUTCH-1415 release packages to contain top level folder apache-nutch-x.x

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387356 13f79535-47bb-0310-9956-ffa450edef68

commit 2da30f3d398a53da6fcc85f143e8b2d0b1c75837
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-09-21T14:37:07Z

    revert gora-cassandra to v0.2, prepare for 2.2 development

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1388529 13f79535-47bb-0310-9956-ffa450edef68

commit bc7ef2e9c62606c5f134d5e1ad8ea001d90dbd36
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-10-10T21:05:19Z

    NUTCH-706 Url regex normalizer: pattern for session id removal not to match "newsId"

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396795 13f79535-47bb-0310-9956-ffa450edef68

commit 2e31b117aa7e25193bcdeabce4088f71c91a7029
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-10-10T21:15:55Z

    NUTCH-1344 BasicURLNormalizer to normalize https same as http

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396800 13f79535-47bb-0310-9956-ffa450edef68

commit 8b35d734a5112af93f571aab218e190a225990dd
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-10-10T21:58:06Z

    NUTCH-706 (applied correct patch)

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396822 13f79535-47bb-0310-9956-ffa450edef68

commit f9d0e7685d7f43cc8f1bbbd37d73fe2d9ddc4461
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-10-10T23:02:57Z

    NUTCH-874 Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora (part 1)

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396850 13f79535-47bb-0310-9956-ffa450edef68

commit 33e7ae5a7ed524939e91f887de7c9821deb8a866
Author: Julien Nioche <jnio...@apache.org>
Date:   2012-10-20T08:49:53Z

    NUTCH-1087 crawl script

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1400390 13f79535-47bb-0310-9956-ffa450edef68

commit 39893c6e5681e6936572f5d9983ab1decd085bf5
Author: Julien Nioche <jnio...@apache.org>
Date:   2012-10-20T09:14:40Z

    NUTCH-1433 Upgrade to Tika 1.2

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1400397 13f79535-47bb-0310-9956-ffa450edef68

commit 244ebf6682c3ea5969a2f36ab72e0fa2fceead31
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-10-23T20:47:16Z

    NUTCH-1344 BasicURLNormalizer to normalize https same as http - forgot to add committer

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1401458 13f79535-47bb-0310-9956-ffa450edef68

commit 0cffa912513dcdd6526ae4189f7207f23c903b49
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-10-23T20:52:21Z

    NUTCH-1421 RegexURLNormalizer to only skip rules with invalid patterns

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1401460 13f79535-47bb-0310-9956-ffa450edef68

commit a722e43d2c5a6225d46b2178174def4918a6b4d4
Author: Markus Jelsma <mar...@apache.org>
Date:   2012-11-06T09:17:38Z

    NUTCH-1491 Strip UTF-8 non-character codepoints in title

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1406077 13f79535-47bb-0310-9956-ffa450edef68

commit c7342c74b52a0fc2ee6c070299f997f673584013
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-11-07T18:47:54Z

    NUTCH-1493 Error adding field 'contentLength'='' during solrindex using index-more

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1406749 13f79535-47bb-0310-9956-ffa450edef68

commit e9b46e9088e48c45a4086b983117ebaf3e202e30
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-11-09T16:35:50Z

    * NUTCH-1488 bin/nutch to run junit from any directory (snagel via lewismc)

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1407531 13f79535-47bb-0310-9956-ffa450edef68

commit f35d6ab520701be0fd345be5b577eba73ecee9e4
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-11-12T12:53:27Z

    NUTCH-1496 ParserJob logs skipped urls with level info

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1408271 13f79535-47bb-0310-9956-ffa450edef68

commit 37c31a62c488ef0d9b248f1be8e930db29ba38ed
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-11-12T13:56:30Z

    NUTCH-1451 Upgrade automaton jar to 1.11-8

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1408289 13f79535-47bb-0310-9956-ffa450edef68

commit 0d350bc0f6e9468b7560de443230425550099550
Author: Sebastian Nagel <sna...@apache.org>
Date:   2012-11-12T21:20:55Z

    NUTCH-1484 TableUtil unreverseURL fails on file:// URLs

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1408465 13f79535-47bb-0310-9956-ffa450edef68

commit 1873f6eb3e8c2c5d6b5a55dff1304397c66dcbe9
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-11-22T14:45:07Z

    NUTCH-1370 Expose exact number of urls injected @runtime

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1412566 13f79535-47bb-0310-9956-ffa450edef68

commit 3a1effa22216236e8989aed39a4b7bc3cb0b1f9c
Author: Lewis John McGibbney <lewi...@apache.org>
Date:   2012-11-22T14:51:28Z

    NUTCH-1370 Expose exact number of urls injected @runtime

    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1412570 13f79535-47bb-0310-9956-ffa450edef68

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---