nutch-dev
Messages by Date
2009/03/02
[jira] Updated: (NUTCH-700) Neko1.9.11 goes into a loop
Sami Siren (JIRA)
2009/03/01
How to make parse-xml plugin (NUTCH-185) compatible with the latest trunk ?
Gopikrishnan Kookkal
2009/03/01
[jira] Commented: (NUTCH-705) parse-rtf plugin
Dmitry Lihachev (JIRA)
2009/03/01
[jira] Created: (NUTCH-708) NutchBean: OOM due to searcher.max.hits and dedup.
Aaron Binns (JIRA)
2009/03/01
Re: Release 1.0?
Andrzej Bialecki
2009/02/28
[jira] Commented: (NUTCH-419) unavailable robots.txt kills fetch
Andrzej Bialecki (JIRA)
2009/02/28
Re: Release 1.0?
Techie
2009/02/28
Re: Release 1.0?
dealmaker
2009/02/28
[jira] Updated: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment
Otis Gospodnetic (JIRA)
2009/02/28
[jira] Updated: (NUTCH-419) unavailable robots.txt kills fetch
Doug Cook (JIRA)
2009/02/28
[jira] Commented: (NUTCH-419) unavailable robots.txt kills fetch
Doug Cook (JIRA)
2009/02/28
Re: planning for nutch-1.0-rc1
Andrzej Bialecki
2009/02/28
Re: Release 1.0?
Andrzej Bialecki
2009/02/28
Re: planning for nutch-1.0-rc1
Andrzej Bialecki
2009/02/28
[jira] Updated: (NUTCH-477) Extend URLFilters to support different filtering chains
Andrzej Bialecki (JIRA)
2009/02/28
Re: Release 1.0?
Dennis Kubes
2009/02/28
Re: Release 1.0?
Andrzej Bialecki
2009/02/28
[jira] Updated: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment
Michael Chan (JIRA)
2009/02/28
[jira] Created: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment
Michael Chan (JIRA)
2009/02/28
[jira] Updated: (NUTCH-705) parse-rtf plugin
Andrzej Bialecki (JIRA)
2009/02/28
Re: Release 1.0?
dealmaker
2009/02/28
Re: Release 1.0?
Doğacan Güney
2009/02/28
Re: Release 1.0?
dealmaker
2009/02/28
planning for nutch-1.0-rc1
Sami Siren
2009/02/28
Re: Release 1.0?
Sami Siren
2009/02/28
Re: Release 1.0?
Sami Siren
2009/02/27
Re: Release 1.0?
dealmaker
2009/02/27
[jira] Commented: (NUTCH-699) Add an "official" solr schema for solr integration
Hudson (JIRA)
2009/02/27
[jira] Commented: (NUTCH-703) Upgrade to Hadoop 0.19.1
Hudson (JIRA)
2009/02/27
Re: Url regex normalizer
Sami Siren
2009/02/27
[jira] Commented: (NUTCH-705) parse-rtf plugin
Sami Siren (JIRA)
2009/02/27
[jira] Closed: (NUTCH-703) Upgrade to Hadoop 0.19.1
Andrzej Bialecki (JIRA)
2009/02/27
Re: Url regex normalizer
Meghna Kukreja
2009/02/27
[jira] Commented: (NUTCH-706) Url regex normalizer
Meghna Kukreja (JIRA)
2009/02/27
[jira] Created: (NUTCH-706) Url regex normalizer
Meghna Kukreja (JIRA)
2009/02/27
Re: NutchAnalysis.java STOP_WORDS not configurable?
Otis Gospodnetic
2009/02/27
Re: Url regex normalizer
Andrzej Bialecki
2009/02/27
Url regex normalizer
Meghna Kukreja
2009/02/26
Re: [jira] Commented: (NUTCH-703) Upgrade to Hadoop 0.19.1
Andrzej Bialecki
2009/02/26
[jira] Commented: (NUTCH-703) Upgrade to Hadoop 0.19.1
Sami Siren (JIRA)
2009/02/26
[jira] Assigned: (NUTCH-669) Consolidate code for Fetcher and Fetcher2
Sami Siren (JIRA)
2009/02/26
[jira] Resolved: (NUTCH-699) Add an "official" solr schema for solr integration
Sami Siren (JIRA)
2009/02/26
[jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin.
Gopikrishnan (JIRA)
2009/02/26
[jira] Commented: (NUTCH-644) RTF parser doesn't compile anymore
Dmitry Lihachev (JIRA)
2009/02/26
[jira] Updated: (NUTCH-705) parse-rtf plugin
Dmitry Lihachev (JIRA)
2009/02/26
[jira] Commented: (NUTCH-705) parse-rtf plugin
Dmitry Lihachev (JIRA)
2009/02/26
[jira] Created: (NUTCH-705) parse-rtf plugin
Dmitry Lihachev (JIRA)
2009/02/26
[Nutch Wiki] Trivial Update of "FrontPage" by BartoszGadzimski
Apache Wiki
2009/02/26
[Nutch Wiki] Update of "SimpleMapReduceTutorial" by BartoszGadzimski
Apache Wiki
2009/02/26
[Nutch Wiki] Update of "DownloadingNutch" by BartoszGadzimski
Apache Wiki
2009/02/26
[jira] Closed: (NUTCH-704) ensure that more important pages are crawled first
Andrzej Bialecki (JIRA)
2009/02/25
[jira] Created: (NUTCH-704) ensure that more important pages are crawled first
kr (JIRA)
2009/02/25
[jira] Updated: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
julien nioche (JIRA)
2009/02/25
[jira] Updated: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
julien nioche (JIRA)
2009/02/25
[jira] Created: (NUTCH-703) Upgrade to Hadoop 0.19.1
Andrzej Bialecki (JIRA)
2009/02/25
[jira] Commented: (NUTCH-696) Timeout for Parser
julien nioche (JIRA)
2009/02/25
[jira] Updated: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
julien nioche (JIRA)
2009/02/25
[jira] Created: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
julien nioche (JIRA)
2009/02/24
Is there the functions of "More Like This" and "Spell Checking"?
buddha1021
2009/02/24
[jira] Commented: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles
Hudson (JIRA)
2009/02/24
[jira] Commented: (NUTCH-247) robot parser to restrict.
Hudson (JIRA)
2009/02/24
[jira] Commented: (NUTCH-626) fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
Hudson (JIRA)
2009/02/24
NutchAnalysis.java STOP_WORDS not configurable?
Bartosz Gadzimski
2009/02/24
[jira] Commented: (NUTCH-699) Add an "official" solr schema for solr integration
Andrzej Bialecki (JIRA)
2009/02/24
[jira] Updated: (NUTCH-669) Consolidate code for Fetcher and Fetcher2
Sami Siren (JIRA)
2009/02/24
[jira] Resolved: (NUTCH-701) Replace Fetcher with Fetcher2
Sami Siren (JIRA)
2009/02/24
[jira] Commented: (NUTCH-701) Replace Fetcher with Fetcher2
Andrzej Bialecki (JIRA)
2009/02/24
[jira] Updated: (NUTCH-644) RTF parser doesn't compile anymore
Dmitry Lihachev (JIRA)
2009/02/24
[jira] Commented: (NUTCH-699) Add an "official" solr schema for solr integration
Sami Siren (JIRA)
2009/02/24
[jira] Resolved: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles
Sami Siren (JIRA)
2009/02/24
[jira] Updated: (NUTCH-701) Replace Fetcher with Fetcher2
Sami Siren (JIRA)
2009/02/24
[jira] Created: (NUTCH-701) replace Fetcher with Fetcher2
Sami Siren (JIRA)
2009/02/24
[jira] Resolved: (NUTCH-247) robot parser to restrict.
Sami Siren (JIRA)
2009/02/24
[jira] Updated: (NUTCH-644) RTF parser doesn't compile anymore
Dmitry Lihachev (JIRA)
2009/02/24
[jira] Commented: (NUTCH-644) RTF parser doesn't compile anymore
Dmitry Lihachev (JIRA)
2009/02/24
[jira] Resolved: (NUTCH-626) fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
Sami Siren (JIRA)
2009/02/23
[jira] Commented: (NUTCH-694) Distributed Search Server fails
Hudson (JIRA)
2009/02/23
[jira] Updated: (NUTCH-650) Hbase Integration
Andrew McCall (JIRA)
2009/02/23
[jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains
Dennis Kubes (JIRA)
2009/02/22
[jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains
Sami Siren (JIRA)
2009/02/22
[jira] Resolved: (NUTCH-694) Distributed Search Server fails
Sami Siren (JIRA)
2009/02/21
[jira] Updated: (NUTCH-650) Hbase Integration
Andrew McCall (JIRA)
2009/02/20
[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
julien nioche (JIRA)
2009/02/20
[jira] Commented: (NUTCH-694) Distributed Search Server fails
Dr. Nadine Hochstotter (JIRA)
2009/02/20
[jira] Commented: (NUTCH-700) Neko1.9.11 goes into a loop
julien nioche (JIRA)
2009/02/20
[jira] Commented: (NUTCH-699) Add an "official" solr schema for solr integration
Dmitry Lihachev (JIRA)
2009/02/20
[jira] Commented: (NUTCH-684) Dedup support for Solr
Andrzej Bialecki (JIRA)
2009/02/20
[jira] Created: (NUTCH-700) Neko1.9.11 goes into a loop
julien nioche (JIRA)
2009/02/20
[jira] Commented: (NUTCH-699) Add an "official" solr schema for solr integration
JIRA
2009/02/20
[jira] Created: (NUTCH-699) Add an "official" solr schema for solr integration
JIRA
2009/02/20
[jira] Commented: (NUTCH-684) Dedup support for Solr
JIRA
2009/02/20
[jira] Commented: (NUTCH-684) Dedup support for Solr
JIRA
2009/02/20
[jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains
Andrzej Bialecki (JIRA)
2009/02/20
[jira] Issue Comment Edited: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/20
[jira] Commented: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/20
Re: [Nutch Wiki] Update of "InstallingWeb2" by SamiSiren
Sami Siren
2009/02/20
Re: [Nutch Wiki] Update of "InstallingWeb2" by SamiSiren
Andrzej Bialecki
2009/02/20
[jira] Commented: (NUTCH-684) Dedup support for Solr
Andrzej Bialecki (JIRA)
2009/02/20
[jira] Updated: (NUTCH-247) robot parser to restrict.
Sami Siren (JIRA)
2009/02/20
[jira] Updated: (NUTCH-477) Extend URLFilters to support different filtering chains
Sami Siren (JIRA)
2009/02/20
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Sami Siren (JIRA)
2009/02/20
[jira] Updated: (NUTCH-573) Multiple Domains - Query Search
Sami Siren (JIRA)
2009/02/20
[jira] Updated: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles
Sami Siren (JIRA)
2009/02/20
[jira] Updated: (NUTCH-694) Distributed Search Server fails
Sami Siren (JIRA)
2009/02/20
[Nutch Wiki] Update of "InstallingWeb2" by SamiSiren
Apache Wiki
2009/02/20
[Nutch Wiki] Update of "RunningNutchAndSolr" by SamiSiren
Apache Wiki
2009/02/20
[jira] Updated: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles
JIRA
2009/02/20
[jira] Updated: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles
JIRA
2009/02/20
[jira] Created: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles
JIRA
2009/02/20
[jira] Updated: (NUTCH-694) Distributed Search Server fails
Sami Siren (JIRA)
2009/02/20
[jira] Updated: (NUTCH-697) Generate log output for solr indexer and dedup
Dmitry Lihachev (JIRA)
2009/02/20
[jira] Created: (NUTCH-697) Generate log output for solr indexer and dedup
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Commented: (NUTCH-694) Distributed Search Server fails
Andrzej Bialecki (JIRA)
2009/02/19
[jira] Updated: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Issue Comment Edited: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Commented: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Hudson (JIRA)
2009/02/19
[jira] Updated: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Commented: (NUTCH-684) Dedup support for Solr
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Commented: (NUTCH-696) Timeout for Parser
JIRA
2009/02/19
[Nutch Wiki] Update of "RunNutchInEclipse0.9" by FrankMcCown
Apache Wiki
2009/02/19
[jira] Commented: (NUTCH-694) Distributed Search Server fails
Dr. Nadine Hochstotter (JIRA)
2009/02/19
[jira] Created: (NUTCH-696) Timeout for Parser
julien nioche (JIRA)
2009/02/19
[jira] Commented: (NUTCH-650) Hbase Integration
Andrew McCall (JIRA)
2009/02/19
[jira] Updated: (NUTCH-650) Hbase Integration
Andrew McCall (JIRA)
2009/02/19
[jira] Commented: (NUTCH-694) Distributed Search Server fails
Sami Siren (JIRA)
2009/02/19
[jira] Commented: (NUTCH-694) Distributed Search Server fails
Dr. Nadine Hochstotter (JIRA)
2009/02/19
[jira] Commented: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Resolved: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Sami Siren (JIRA)
2009/02/19
[jira] Issue Comment Edited: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Issue Comment Edited: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Created: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin
Dmitry Lihachev (JIRA)
2009/02/19
[jira] Updated: (NUTCH-694) Distributed Search Server fails
Sami Siren (JIRA)
2009/02/19
[jira] Created: (NUTCH-694) Distributed Search Server fails
Dr. Nadine Hochstotter (JIRA)
2009/02/18
[jira] Commented: (NUTCH-691) Update jakarta poi jars to the most relevant version
Hudson (JIRA)
2009/02/18
[jira] Commented: (NUTCH-687) Add RAT
Hudson (JIRA)
2009/02/18
[jira] Commented: (NUTCH-688) Fix missing/wrong headers in source files
Hudson (JIRA)
2009/02/18
[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter
Hudson (JIRA)
2009/02/18
[jira] Updated: (NUTCH-693) Add configurable option for treating nofollow behaviour.
Andrew McCall (JIRA)
2009/02/18
[jira] Created: (NUTCH-693) Add configurable option for treating nofollow behaviour.
Andrew McCall (JIRA)
2009/02/18
Re: would someone help confirm a patch (fix incorrect encoding detection in cached.jsp)
Sami Siren
2009/02/18
would someone help confirm a patch (fix incorrect encoding detection in cached.jsp)
Justin Yao
2009/02/18
[jira] Commented: (NUTCH-689) Swf parser doesn't seem to handle relative links
Peter Sparks (JIRA)
2009/02/18
dump Fetcher?
Sami Siren
2009/02/18
[jira] Updated: (NUTCH-583) FeedParser empty links for items
Sami Siren (JIRA)
2009/02/18
[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
julien nioche (JIRA)
2009/02/18
[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
Sami Siren (JIRA)
2009/02/18
[jira] Resolved: (NUTCH-563) Include custom fields in BasicQueryFilter
Sami Siren (JIRA)
2009/02/18
[jira] Resolved: (NUTCH-691) Update jakarta poi jars to the most relevant version
Sami Siren (JIRA)
2009/02/18
[jira] Created: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
julien nioche (JIRA)
2009/02/18
[jira] Resolved: (NUTCH-688) Fix missing/wrong headers in source files
Sami Siren (JIRA)
2009/02/18
[jira] Resolved: (NUTCH-591) StringIndexOutOfBoundsException when extracting text from a Word document.
Sami Siren (JIRA)
2009/02/18
[jira] Commented: (NUTCH-689) Swf parser doesn't seem to handle relative links
Sami Siren (JIRA)
2009/02/18
[jira] Resolved: (NUTCH-687) Add RAT
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Commented: (NUTCH-591) StringIndexOutOfBoundsException when extracting text from a Word document.
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Issue Comment Edited: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Commented: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
[jira] Created: (NUTCH-691) Update jakarta poi jars to the most relevant version
Dmitry Lihachev (JIRA)
2009/02/17
Hudson build is back to normal: Nutch-trunk #728
Apache Hudson Server
2009/02/17
[jira] Updated: (NUTCH-689) Swf parser doesn't seem to handle relative links
Peter Sparks (JIRA)
2009/02/17
[jira] Updated: (NUTCH-689) Swf parser doesn't seem to handle relative links
Peter Sparks (JIRA)
2009/02/17
[jira] Commented: (NUTCH-689) Swf parser doesn't seem to handle relative links
Sami Siren (JIRA)
2009/02/17
[jira] Created: (NUTCH-690) bug in DomContentUtils.shouldThrowAwayLink?
Peter Sparks (JIRA)
2009/02/17
[jira] Updated: (NUTCH-689) Swf parser doesn't seem to handle relative links
Peter Sparks (JIRA)
2009/02/17
[jira] Created: (NUTCH-689) Swf parser doesn't seem to handle relative links
Peter Sparks (JIRA)
2009/02/17
[jira] Updated: (NUTCH-310) Review Log Levels
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-249) black- white list url filtering
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-309) Uses commons logging Code Guards
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-86) LanguageIdentifier API enhancements
Sami Siren (JIRA)
2009/02/17
[jira] Resolved: (NUTCH-582) Add missing type parameters
Sami Siren (JIRA)
2009/02/17
[jira] Commented: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException
hasan (JIRA)
2009/02/17
[jira] Commented: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)
Sami Siren (JIRA)
2009/02/17
[jira] Resolved: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException
Sami Siren (JIRA)
2009/02/17
[jira] Commented: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException
Chris A. Mattmann (JIRA)
2009/02/17
[jira] Created: (NUTCH-688) Fix missing/wrong headers in source files
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-687) Add RAT
Sami Siren (JIRA)
2009/02/17
[jira] Created: (NUTCH-687) Add RAT
Sami Siren (JIRA)
2009/02/17
[jira] Updated: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException
Sami Siren (JIRA)
2009/02/16
Re: Support for Sitemap Protocol and Canonical URLs
Andrzej Bialecki
2009/02/16
Build failed in Hudson: Nutch-trunk #727
Apache Hudson Server
2009/02/16
Support for Sitemap Protocol and Canonical URLs
Frank McCown
2009/02/12
Re: NTCH-635 LinkAnalysis Tool for Nutch
Pradeep Pujari
2009/02/12
NTCH-635 LinkAnalysis Tool for Nutch
Eric J. Christeson
2009/02/12
[jira] Commented: (NUTCH-668) Domain URL Filter
julien nioche (JIRA)
2009/02/12
[Nutch Wiki] Update of "IntranetRecrawl" by SAnand
Apache Wiki
2009/02/11
[jira] Commented: (NUTCH-676) MapWritable is written inefficiently and confusingly
Hudson (JIRA)
2009/02/11
[jira] Commented: (NUTCH-683) NUTCH-676 broke CrawlDbMerger
Hudson (JIRA)