nutch-dev
Messages by Date
2009/10/21
Re: datanode.BlockAlreadyExistsException
Jesse Hires
2009/10/21
Re: datanode.BlockAlreadyExistsException
Andrzej Bialecki
2009/10/20
datanode.BlockAlreadyExistsException
Jesse Hires
2009/10/20
[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index
David Stuart (JIRA)
2009/10/20
[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index
Andrzej Bialecki (JIRA)
2009/10/20
Re: solr index question
david.stu...@progressivealliance.co.uk
2009/10/19
RE: Niocchi - java asynchronous crawl library released
Fuad Efendi
2009/10/19
Re: Niocchi - java asynchronous crawl library released
Andrzej Bialecki
2009/10/19
RE: Niocchi - java asynchronous crawl library released
Fuad Efendi
2009/10/19
Re: Niocchi - java asynchronous crawl library released
Andrzej Bialecki
2009/10/19
[jira] Updated: (NUTCH-760) Allow field mapping from nutch to solr index
David Stuart (JIRA)
2009/10/18
RE: Niocchi - java asynchronous crawl library released
Fuad Efendi
2009/10/18
RE: Niocchi - java asynchronous crawl library released
Fuad Efendi
2009/10/18
Re: Niocchi - java asynchronous crawl library released
Andrzej Bialecki
2009/10/18
Re: Renaming Nutch
Nutch Newbie
2009/10/18
Niocchi - java asynchronous crawl library released
Lukáš Vlček
2009/10/18
Renaming Nutch
fredericoagent
2009/10/16
bug in AbstractFetchSchedule.java
reinhard schwab
2009/10/15
[jira] Commented: (NUTCH-760) Allow field mapping from nutch to solr index
Andrzej Bialecki (JIRA)
2009/10/15
Re: solr index question
david.stu...@progressivealliance.co.uk
2009/10/15
[jira] Updated: (NUTCH-760) Allow field mapping from nutch to solr index
David Stuart (JIRA)
2009/10/15
Where shall I modify if I wanna change scoring rule in intranet crawl?
Chuan
2009/10/15
[jira] Updated: (NUTCH-760) Allow field mapping from nutch to solr index
David Stuart (JIRA)
2009/10/15
[jira] Created: (NUTCH-760) Allow field mapping from nutch to solr index
David Stuart (JIRA)
2009/10/15
[jira] Commented: (NUTCH-251) Administration GUI
Marko Bauhardt (JIRA)
2009/10/15
Malaga-fi - Finnish plugin for Nutch
Hannu Väisänen
2009/10/15
[jira] Commented: (NUTCH-251) Administration GUI
Andrzej Bialecki (JIRA)
2009/10/15
[jira] Commented: (NUTCH-251) Administration GUI
Marko Bauhardt (JIRA)
2009/10/14
Recrawl Strategy with Nutch!
tittutomen
2009/10/13
[jira] Created: (NUTCH-759) Removal of deprecated APIs
Stephen Norman (JIRA)
2009/10/13
Re: solr index question
david.stu...@progressivealliance.co.uk
2009/10/13
Re: solr index question
Andrzej Bialecki
2009/10/13
solr index question
david.stu...@progressivealliance.co.uk
2009/10/12
starting crawl from the previous point
jkimathi
2009/10/12
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
Andrea Spinelli (JIRA)
2009/10/12
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
cwi...@yahoo.com (JIRA)
2009/10/09
[jira] Commented: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-758) Set subversion eol-style to "native"
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-731) Redirection of robots.txt in RobotRulesParser
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-757) RequestUtils getBooleanParameter() always returns false
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-756) CrawlDatum.set() does not reset Metadata if it is null
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-754) Use GenericOptionsParser instead of FileSystem.parseArgs()
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-679) Fetcher2 implementing Tool
Hudson (JIRA)
2009/10/09
[jira] Commented: (NUTCH-758) Set subversion eol-style to "native"
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-758) Set subversion eol-style to "native"
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-679) Fetcher2 implementing Tool
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-679) Fetcher2 implementing Tool
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-335) Pdf summary corrupt issue
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-335) Pdf summary corrupt issue
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-756) CrawlDatum.set() does not reset Metadata if it is null
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-756) CrawlDatum.set() does not reset Metadata if it is null
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-251) Administration GUI
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-748) DiskChecker Could not find
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-748) DiskChecker Could not find
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-754) Use GenericOptionsParser instead of FileSystem.parseArgs()
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-754) Use GenericOptionsParser instead of FileSystem.parseArgs()
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-757) RequestUtils getBooleanParameter() always returns false
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-757) RequestUtils getBooleanParameter() always returns false
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-731) Redirection of robots.txt in RobotRulesParser
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-731) Redirection of robots.txt in RobotRulesParser
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Commented: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment
Andrzej Bialecki (JIRA)
2009/10/09
[jira] Closed: (NUTCH-707) Generation of multiple segments in multiple runs returns only 1 segment
Andrzej Bialecki (JIRA)
2009/10/08
[jira] Commented: (NUTCH-677) Segment merge filering based on segment content
Marcin Okraszewski (JIRA)
2009/10/08
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
Marcin Okraszewski (JIRA)
2009/10/08
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
Marcin Okraszewski (JIRA)
2009/10/07
Running crawls with different configurations
Fabrice Estiévenart
2009/10/06
Authenticity of URLs from DMOZ
Gaurang Patel
2009/10/06
Re: Nutch Topical / Focused Crawl
MyD
2009/10/05
Number of urls in the crawl database.
Gaurang Patel
2009/10/05
generate, fetch- nutch commands
Gaurang Patel
2009/10/05
Re: whole web crawl
Gaurang Patel
2009/10/04
Re: whole web crawl
kevin chen
2009/10/04
whole web crawl
Gaurang Patel
2009/10/03
Re: crawling local file system
Niall Pemberton
2009/10/03
crawling local file system
jkimathi
2009/10/02
Recommended plugin example - test fails
Fabrice Estiévenart
2009/09/30
[jira] Updated: (NUTCH-758) Set subversion eol-style to "native"
Niall Pemberton (JIRA)
2009/09/30
[jira] Created: (NUTCH-758) Set subversion eol-style to "native"
Niall Pemberton (JIRA)
2009/09/30
[jira] Updated: (NUTCH-757) RequestUtils getBooleanParameter() always returns false
Niall Pemberton (JIRA)
2009/09/30
[jira] Created: (NUTCH-757) RequestUtils getBooleanParameter() always returns false
Niall Pemberton (JIRA)
2009/09/29
[jira] Updated: (NUTCH-756) CrawlDatum.set() does not reset Metadata if it is null
Julien Nioche (JIRA)
2009/09/29
[jira] Updated: (NUTCH-756) CrawlDatum.set() does not resets Metadata if it is null
Julien Nioche (JIRA)
2009/09/29
[jira] Created: (NUTCH-756) CrawlDatum.set() does not resets Metadata if it is null
Julien Nioche (JIRA)
2009/09/29
how to study the nutch
feng zhou
2009/09/23
[jira] Commented: (NUTCH-755) DomainURLFilter crashes on malformed URL
Mike Baranczak (JIRA)
2009/09/22
Where should I do this?
Paul Tomblin
2009/09/22
Nutch is not crawling all outlinks
Pravin Karne
2009/09/19
[jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
zhangxihua (JIRA)
2009/09/16
[jira] Created: (NUTCH-755) DomainURLFilter crashes on malformed URL
Mike Baranczak (JIRA)
2009/09/16
[jira] Updated: (NUTCH-754) Use GenericOptionsParser instead of FileSystem.parseArgs()
Julien Nioche (JIRA)
2009/09/16
[jira] Created: (NUTCH-754) Use GenericOptionsParser instead of FileSystem.parseArgs()
Julien Nioche (JIRA)
2009/09/16
Re: Upgrade to hadoop 0.20?
Julien Nioche
2009/09/16
[jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
Julien Nioche (JIRA)
2009/09/11
[Nutch Wiki] Update of "Support" by KelvinTan
Apache Wiki
2009/09/10
[jira] Closed: (NUTCH-752) how to index data from databse(ect oracle)
JIRA
2009/09/09
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Julien Nioche (JIRA)
2009/09/09
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Ken Krugler (JIRA)
2009/09/08
[jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
Hudson (JIRA)
2009/09/08
[jira] Closed: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
JIRA
2009/09/07
[jira] Updated: (NUTCH-753) Prevent new Fetcher to retrieve the robots twice
Julien Nioche (JIRA)
2009/09/07
[jira] Created: (NUTCH-753) Prevent new Fetcher to retrieve the robots twice
Julien Nioche (JIRA)
2009/09/07
[jira] Created: (NUTCH-752) how to index data from databse(ect oracle)
zhengfang (JIRA)
2009/09/06
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Andrzej Bialecki (JIRA)
2009/09/04
[jira] Created: (NUTCH-751) Upgrade version of HttpClient
Julien Nioche (JIRA)
2009/09/04
[Nutch Wiki] Update of "Support" by Justin Gilbreath
Apache Wiki
2009/09/02
Customise scoring
Max S
2009/09/01
subclauses
Marko Bauhardt
2009/09/01
or queries
Marko Bauhardt
2009/08/31
[jira] Issue Comment Edited: (NUTCH-251) Administration GUI
Marko Bauhardt (JIRA)
2009/08/31
graphical user interface v0.1 for nutch
Marko Bauhardt
2009/08/29
Re: Title inside body
Alexey Torochkov
2009/08/29
[jira] Updated: (NUTCH-750) HtmlParser plugin - page title extraction
Alexey Torochkov (JIRA)
2009/08/29
[jira] Created: (NUTCH-750) HtmlParser plugin - page title extraction
Alexey Torochkov (JIRA)
2009/08/28
RE: Title inside body
Fuad Efendi
2009/08/28
Re: Title inside body
Alexey Torochkov
2009/08/28
RE: Title inside body
Fuad Efendi
2009/08/28
Re: Title inside body
Magnús Skúlason
2009/08/28
RE: Title inside body
Fuad Efendi
2009/08/28
RE: Title inside body
Fuad Efendi
2009/08/28
Re: Title inside body
Magnús Skúlason
2009/08/28
Re: Title inside body
Alexey Torochkov
2009/08/28
RE: Title inside body
Fuad Efendi
2009/08/28
[jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
Julien Nioche (JIRA)
2009/08/28
[jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum
Julien Nioche (JIRA)
2009/08/28
Title inside body
Alexey Torochkov
2009/08/28
[jira] Closed: (NUTCH-696) Timeout for Parser
Julien Nioche (JIRA)
2009/08/28
[jira] Commented: (NUTCH-696) Timeout for Parser
Julien Nioche (JIRA)
2009/08/25
Re: Nutch Performance Improvements
Ken Krugler
2009/08/25
RE: Nutch Performance Improvements
Fuad Efendi
2009/08/25
Nutch Performance Improvements
Fuad Efendi
2009/08/25
How to use Hbase with Nutch
ilayaraja
2009/08/24
[jira] Closed: (NUTCH-721) Fetcher2 Slow
JIRA
2009/08/21
[jira] Closed: (NUTCH-749) Fetching the url from crawldb
JIRA
2009/08/21
[jira] Created: (NUTCH-749) Fetching the url from crawldb
salima abdulsalam (JIRA)
2009/08/19
Indegree link analysis algorithm.
Artem Barger
2009/08/18
SegmentReader: Why Multiple CrawlDatum section for a record..
Ankit Dangi
2009/08/17
[jira] Created: (NUTCH-748) DiskChecker Could not find
mawanqiang (JIRA)
2009/08/17
RE-Crawling
hussam hamdan
2009/08/17
SegmentReader: How to write content to separate multiple files..
Ankit Dangi
2009/08/16
[jira] Commented: (NUTCH-650) Hbase Integration
JIRA
2009/08/13
My mistake
Paul Tomblin
2009/08/13
[jira] Updated: (NUTCH-679) Fetcher2 implementing Tool
Julien Nioche (JIRA)
2009/08/11
Re: fetch failed error 500
宫照
2009/08/11
Re: Why isn't this working?
Paul Tomblin
2009/08/11
Re: fetch failed error 500
Alex McLintock
2009/08/11
Re: Why isn't this working?
Alex McLintock
2009/08/10
fetch failed error 500
宫照
2009/08/10
Why isn't this working?
Paul Tomblin
2009/08/10
Found a second problem in the same code
Paul Tomblin
2009/08/10
Is this a bug?
Paul Tomblin
2009/08/10
Re: How to see System.out.println() values Featcher.java
ranjeet98
2009/08/10
[jira] Updated: (NUTCH-721) Fetcher2 Slow
Julien Nioche (JIRA)
2009/08/10
[jira] Commented: (NUTCH-721) Fetcher2 Slow
JIRA
2009/08/09
[jira] Commented: (NUTCH-251) Administration GUI
Marko Bauhardt (JIRA)
2009/08/09
nutch gui on github
Marko Bauhardt
2009/08/09
[jira] Commented: (NUTCH-721) Fetcher2 Slow
Andrzej Bialecki (JIRA)
2009/08/09
[jira] Commented: (NUTCH-721) Fetcher2 Slow
Julien Nioche (JIRA)
2009/08/08
[Nutch Wiki] Update of "PublicServers" by ReinierBattenberg
Apache Wiki
2009/08/08
Re: codeformatting
Marko Bauhardt
2009/08/08
Re: codeformatting
Andrzej Bialecki
2009/08/08
codeformatting
Marko Bauhardt
2009/08/08
Re: How to see System.out.println() values Featcher.java
Marko Bauhardt
2009/08/07
How to see System.out.println() values Featcher.java
ranjeet98
2009/08/07
Re: How to enter data in to the Crawldb
Marko Bauhardt
2009/08/06
How to enter data in to the Crawldb
Sailaja Dhiviti
2009/08/06
[jira] Commented: (NUTCH-747) inject&Index metadatas and inherit these metadatas to all matching suburls
Marko Bauhardt (JIRA)
2009/08/06
[jira] Updated: (NUTCH-747) inject&Index metadatas and inherit these metadatas to all matching suburls
Marko Bauhardt (JIRA)
2009/08/06
[jira] Created: (NUTCH-747) inject&Index metadatas and inherit these metadatas to all matching suburls
Marko Bauhardt (JIRA)
2009/08/06
Re: Can I add a url to be crawled without putting it in a file and feeding it to "Inject"?
Marko Bauhardt
2009/08/06
serializing and deserializing lucene query
ilayaraja
2009/08/06
Re: About NUTCH-650 (hbase integration)
Andrzej Bialecki
2009/08/06
About NUTCH-650 (hbase integration)
Doğacan Güney
2009/08/05
Can I add a url to be crawled without putting it in a file and feeding it to "Inject"?
Paul Tomblin
2009/08/04
[jira] Updated: (NUTCH-738) Close SegmentUpdater when FetchedSegments is closed
Otis Gospodnetic (JIRA)
2009/08/04
[jira] Updated: (NUTCH-746) NutchBeanConstructor does not close NutchBean upon contextDestroyed, causing resource leak in the container.
Otis Gospodnetic (JIRA)
2009/08/04
Re: OSGi progress
Kirby Bohling
2009/08/04
Re: OSGi progress
Andrzej Bialecki
2009/08/03
Re: MeetUp topic list posted
Ken Krugler
2009/08/03
Re: MeetUp topic list posted
Ken Krugler
2009/08/03
Re: MeetUp topic list posted
Andrzej Bialecki
2009/08/03
MeetUp topic list posted
Ken Krugler
2009/08/03
[Nutch Wiki] Trivial Update of "ApacheConUs2009MeetUp" by KenKrugler
Apache Wiki
2009/08/03
[Nutch Wiki] Trivial Update of "ApacheConUs2009MeetUp" by KenKrugler
Apache Wiki
2009/08/03
[Nutch Wiki] Update of "ApacheConUs2009MeetUp" by KenKrugler
Apache Wiki
2009/08/03
[Nutch Wiki] Trivial Update of "FrontPage" by KenKrugler
Apache Wiki
2009/08/03
[Nutch Wiki] Trivial Update of "FrontPage" by KenKrugler
Apache Wiki
2009/08/03
[Nutch Wiki] Trivial Update of "FrontPage" by KenKrugler
Apache Wiki
2009/08/03
Re: Web Crawler MeetUp info on wiki
Andrzej Bialecki
2009/08/02
OSGi progress
Kirby Bohling
2009/08/02
Web Crawler MeetUp info on wiki
Ken Krugler
2009/07/31
Meetup at ApacheCon US 2009
Ken Krugler
2009/07/30
[Nutch Wiki] Update of "PublicServers" by stoicleo
Apache Wiki
2009/07/30
Re: New Extension Points?
Marko Bauhardt
2009/07/29
[jira] Commented: (NUTCH-650) Hbase Integration
Andrzej Bialecki (JIRA)
2009/07/29
[jira] Commented: (NUTCH-650) Hbase Integration
JIRA
2009/07/29
Re: Nutch dev. plans
Andrzej Bialecki
2009/07/29
[Nutch Wiki] Update of "07CommandLineOptions" by AlexMc
Apache Wiki