nutch-dev
Messages by Date
2008/04/09
[jira] Commented: (NUTCH-500) Add hadoop masters configuration file into conf folder
Hudson (JIRA)
2008/04/09
[jira] Created: (NUTCH-627) Minimize host address lookup
Otis Gospodnetic (JIRA)
2008/04/09
Hudson build is back to normal: Nutch-trunk #416
Apache Hudson Server
2008/04/09
Re: what is the difference between nutch and some other opensource search engines
ogjunk-nutch
2008/04/08
found a bug in plugin/protocol-http
cybercouf
2008/04/07
Hudson build is back to normal: Nutch-trunk #414
Apache Hudson Server
2008/04/07
Build failed in Hudson: Nutch-trunk #413
Apache Hudson Server
2008/04/06
Re: Is there any LSI implementation?
ogjunk-nutch
2008/04/06
[jira] Updated: (NUTCH-626) fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
Remco Verhoef (JIRA)
2008/04/06
[jira] Created: (NUTCH-626) fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
Remco Verhoef (JIRA)
2008/04/05
Hudson build is back to normal: Nutch-trunk #412
Apache Hudson Server
2008/04/04
Build failed in Hudson: Nutch-trunk #411
Apache Hudson Server
2008/04/03
Hudson build is back to normal: Nutch-trunk #410
Apache Hudson Server
2008/04/03
Build failed in Hudson: Nutch-trunk #409
Apache Hudson Server
2008/04/02
Build failed in Hudson: Nutch-trunk #408
Apache Hudson Server
2008/04/01
Is there any LSI implementation?
Edward J. Yoon
2008/04/01
[jira] Updated: (NUTCH-625) Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)
Vinci (JIRA)
2008/04/01
Re: [jira] Created: (NUTCH-624) Better parsed text
Vinci
2008/04/01
[jira] Updated: (NUTCH-624) Better parsed text by default parser
Vinci (JIRA)
2008/03/31
[jira] Commented: (NUTCH-296) Image Search
Gordon Mohr (JIRA)
2008/03/31
[jira] Commented: (NUTCH-500) Add hadoop masters configuration file into conf folder
Dennis Kubes (JIRA)
2008/03/31
[jira] Updated: (NUTCH-500) Add hadoop masters configuration file into conf folder
Dennis Kubes (JIRA)
2008/03/31
Hudson build is back to normal: Nutch-trunk #406
Apache Hudson Server
2008/03/30
[jira] Assigned: (NUTCH-16) boost documents matching a url pattern
Dennis Kubes (JIRA)
2008/03/30
[jira] Assigned: (NUTCH-48) "Did you mean" query enhancement/refignment feature request
Dennis Kubes (JIRA)
2008/03/30
[jira] Closed: (NUTCH-75) Patch for WebDBReader to get more detailed information about WebDBs
Dennis Kubes (JIRA)
2008/03/30
[jira] Assigned: (NUTCH-213) checkstyle
Dennis Kubes (JIRA)
2008/03/30
[jira] Assigned: (NUTCH-295) More description for fetcher.threads.fetch property
Dennis Kubes (JIRA)
2008/03/30
[jira] Assigned: (NUTCH-291) OpenSearchServlet should return "date" as well as "lastModified"
Dennis Kubes (JIRA)
2008/03/30
[jira] Assigned: (NUTCH-249) black- white list url filtering
Dennis Kubes (JIRA)
2008/03/30
[jira] Resolved: (NUTCH-447) Dmoz Structure Parser Tool
Dennis Kubes (JIRA)
2008/03/30
[jira] Closed: (NUTCH-447) Dmoz Structure Parser Tool
Dennis Kubes (JIRA)
2008/03/30
[jira] Assigned: (NUTCH-500) Add hadoop masters configuration file into conf folder
Dennis Kubes (JIRA)
2008/03/30
[jira] Closed: (NUTCH-555) StackOverflowError in DomContentUtils
Dennis Kubes (JIRA)
2008/03/30
[jira] Resolved: (NUTCH-555) StackOverflowError in DomContentUtils
Dennis Kubes (JIRA)
2008/03/30
[jira] Updated: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)
Dennis Kubes (JIRA)
2008/03/30
Re: Why is Nutch not involved in Google Summer of Code - 2008?
ogjunk-nutch
2008/03/30
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Dennis Kubes
2008/03/30
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Andrzej Bialecki
2008/03/30
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Susam Pal
2008/03/30
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Dennis Kubes
2008/03/30
Re: [jira] Created: (NUTCH-624) Better parsed text
ogjunk-nutch
2008/03/30
[jira] Created: (NUTCH-625) Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)
Vinci (JIRA)
2008/03/30
[jira] Created: (NUTCH-624) Better parsed text
Vinci (JIRA)
2008/03/29
Re: Why is Nutch not involved in Google Summer of Code - 2008?
ogjunk-nutch
2008/03/29
Build failed in Hudson: Nutch-trunk #405
Apache Hudson Server
2008/03/29
siteinfo.xml
Chen, Tao
2008/03/29
[jira] Updated: (NUTCH-623) Change plugin source directory "languageidentifier" to "language-identifier"
Ignacio J. Ortega (JIRA)
2008/03/29
[jira] Created: (NUTCH-623) Change name of plugin source directory from "languageidentifier" to "language-identifier"
Ignacio J. Ortega (JIRA)
2008/03/29
Build failed in Hudson: Nutch-trunk #404
Apache Hudson Server
2008/03/27
Glitches debuggging on eclipse with languageidentifier plugin
Nacho (Derecho.com)
2008/03/26
[jira] Created: (NUTCH-622) Support for application/x-suggestions+json
Bobby Hubbard (JIRA)
2008/03/25
Hudson build is back to normal: Nutch-trunk #401
Apache Hudson Server
2008/03/25
Multiple readseg requests.
Nadav Hashimshony
2008/03/25
Build failed in Hudson: Nutch-trunk #400
Apache Hudson Server
2008/03/24
Re: Why is Nutch not involved in Google Summer of Code - 2008?
sishen
2008/03/24
Re: Why is Nutch not involved in Google Summer of Code - 2008?
All day coders
2008/03/24
Build failed in Hudson: Nutch-trunk #399
Apache Hudson Server
2008/03/24
Re: Why is Nutch not involved in Google Summer of Code - 2008?
sishen
2008/03/23
Re: Why is Nutch not involved in Google Summer of Code - 2008?
All day coders
2008/03/22
Build failed in Hudson: Nutch-trunk #398
Apache Hudson Server
2008/03/22
Why is Nutch not involved in Google Summer of Code - 2008?
Susam Pal
2008/03/21
Hudson build is back to normal: Nutch-trunk #397
Apache Hudson Server
2008/03/21
Build failed in Hudson: Nutch-trunk #396
Apache Hudson Server
2008/03/19
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Hudson (JIRA)
2008/03/19
[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Hudson (JIRA)
2008/03/19
[jira] Closed: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Andrzej Bialecki (JIRA)
2008/03/19
[jira] Closed: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
2008/03/18
Hudson build is back to normal: Nutch-trunk #394
Apache Hudson Server
2008/03/18
[jira] Updated: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
2008/03/18
Compilation errors at revision 638548
All day coders
2008/03/18
[jira] Commented: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)
Andrzej Bialecki (JIRA)
2008/03/18
[jira] Created: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
2008/03/18
[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
2008/03/18
[jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
2008/03/18
[jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
2008/03/18
Re: Current OPIC implementation
Andrzej Bialecki
2008/03/18
Current OPIC implementation
Siddhartha Reddy
2008/03/17
[jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException
Hudson (JIRA)
2008/03/17
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Hudson (JIRA)
2008/03/17
[jira] Commented: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN
Hudson (JIRA)
2008/03/17
[jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Hudson (JIRA)
2008/03/17
Build failed in Hudson: Nutch-trunk #393
Apache Hudson Server
2008/03/17
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
2008/03/17
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
2008/03/17
[jira] Closed: (NUTCH-610) Can't Update or modify an index while web gui is running
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Closed: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Commented: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Closed: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Commented: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Closed: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException
Andrzej Bialecki (JIRA)
2008/03/17
Re: Retire the original Fetcher before the release?
Andrzej Bialecki
2008/03/17
Re: Retire the original Fetcher before the release?
Dennis Kubes
2008/03/17
Re: Retire the original Fetcher before the release?
Andrzej Bialecki
2008/03/17
Re: Retire the original Fetcher before the release?
Dennis Kubes
2008/03/17
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Andrzej Bialecki (JIRA)
2008/03/17
Retire the original Fetcher before the release?
Andrzej Bialecki
2008/03/17
[jira] Closed: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Closed: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Andrzej Bialecki (JIRA)
2008/03/17
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Andrzej Bialecki (JIRA)
2008/03/16
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
2008/03/16
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb
Emmanuel Joke (JIRA)
2008/03/16
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
2008/03/16
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Andrzej Bialecki (JIRA)
2008/03/16
(nutch 1.0) Query processing problem: NutchBeans and webapps search fail, but Luke sucess
Vinci
2008/03/16
Cached page - can it be changed?
Vinci
2008/03/16
Re: Chnage the Analyzer by plugin - how to dealing with the query? Query always use the default analyzer!
Vinci
2008/03/16
Write back to the segment?
Vinci
2008/03/16
Chnage the Analyzer by plugin - how to dealing with the query?
Vinci
2008/03/16
[jira] Updated: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
2008/03/16
[jira] Created: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
2008/03/15
How can I change the analyzer of nutch query by plugin?
Vinci
2008/03/15
zh.ngp
Vinci
2008/03/15
[jira] Created: (NUTCH-619) Another Language Identifier Plugin using Unicode code point range
Vinci (JIRA)
2008/03/15
Thread behaviour in Nutch Crawl
naveen.goswami
2008/03/15
FW: Problem in running Nutch where proxy authentication is required.
naveen.goswami
2008/03/14
[jira] Commented: (NUTCH-126) Fetching via https does not work with a proxy (patch)
Hudson (JIRA)
2008/03/14
[jira] Commented: (NUTCH-613) Empty Summaries and Cached Pages
Hudson (JIRA)
2008/03/14
[jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl
Hudson (JIRA)
2008/03/14
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Hudson (JIRA)
2008/03/14
[jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Hudson (JIRA)
2008/03/14
[jira] Closed: (NUTCH-189) Injection infinite loop
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-189) Injection infinite loop
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-126) Fetching via https does not work with a proxy (patch)
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-126) Fetching via https does not work with a proxy (patch)
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-70) duplicate pages - virtual hosts in db.
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-70) duplicate pages - virtual hosts in db.
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs
Andrzej Bialecki (JIRA)
2008/03/14
Re: Problem in running Nutch where proxy authentication is required.
Susam Pal
2008/03/14
Problem in running Nutch where proxy authentication is required.
naveen.goswami
2008/03/14
Re: [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Andrzej Bialecki
2008/03/14
Re: [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Jesiel Trevisan
2008/03/14
[jira] Closed: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-601) Recrawling on existing crawl directory using force option
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-610) Can't Update or modify an index while web gui is running
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-613) Empty Summaries and Cached Pages
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Closed: (NUTCH-613) Empty Summaries and Cached Pages
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Assigned: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Updated: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
2008/03/14
[jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
2008/03/12
Problem in running Nutch where proxy authentication is required.
naveen.goswami
2008/03/12
I have some problem with nutch result
dong chen
2008/03/11
[jira] Commented: (NUTCH-296) Image Search
Otis Gospodnetic (JIRA)
2008/03/11
Re: Confine nutch to one NIC?
ogjunk-nutch
2008/03/09
Confine nutch to one NIC?
Euan Clark
2008/03/06
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
2008/03/06
[jira] Work started: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
2008/03/06
[jira] Assigned: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
2008/03/06
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Andrzej Bialecki (JIRA)
2008/03/05
[jira] Created: (NUTCH-618) Tika error "Media type alias already exists"
Andrzej Bialecki (JIRA)
2008/03/05
Re: Nightly builds unavailable
Sami Siren
2008/03/05
Nightly builds unavailable
Frederic Wenzel
2008/03/04
[jira] Closed: (NUTCH-617) Cached Text Only
Andrzej Bialecki (JIRA)
2008/03/04
[jira] Created: (NUTCH-617) Cached Text Only
Siddharth Jha (JIRA)
2008/03/03
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
Siddharth Jha (JIRA)
2008/02/29
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Susam Pal (JIRA)
2008/02/28
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Erol (JIRA)
2008/02/28
[jira] Updated: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
2008/02/27
Re: nutch latest build - inject operation failing
esmithers
2008/02/27
Hudson build is back to normal: Nutch-trunk #376
Apache Hudson Server
2008/02/27
Re: Failing Hudson Builds
Nigel Daley
2008/02/27
Re: Failing Hudson Builds
Dennis Kubes
2008/02/27
Re: Failing Hudson Builds
Andrzej Bialecki
2008/02/27
Re: Failing Hudson Builds
Nigel Daley
2008/02/26
Build failed in Hudson: Nutch-trunk #375
Apache Hudson Server
2008/02/26
Build failed in Hudson: Nutch-trunk #374
Apache Hudson Server
2008/02/26
Build failed in Hudson: Nutch-trunk #373
Apache Hudson Server
2008/02/26
Build failed in Hudson: Nutch-trunk #372
Apache Hudson Server
2008/02/26
Re: Build failed in Hudson: Nutch-trunk #371
Nigel Daley
2008/02/26
Build failed in Hudson: Nutch-trunk #371
Apache Hudson Server
2008/02/26
[jira] Updated: (NUTCH-614) Order Inlinks by OPIC score of parent page
Dennis Kubes (JIRA)
2008/02/26
[jira] Created: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Emmanuel Joke (JIRA)
2008/02/26
[jira] Updated: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Emmanuel Joke (JIRA)
2008/02/26
[jira] Created: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
2008/02/26
[jira] Updated: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
2008/02/26
Filter fetching by mime type
Nynodata Development Team
2008/02/25
[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.
Hudson (JIRA)
2008/02/25
Build failed in Hudson: Nutch-trunk #370
Apache Hudson Server
2008/02/25
[jira] Commented: (NUTCH-578) URL fetched with 403 is generated over and over again
Dennis Kubes (JIRA)
2008/02/25
[jira] Assigned: (NUTCH-578) URL fetched with 403 is generated over and over again
Dennis Kubes (JIRA)
2008/02/25
[jira] Work started: (NUTCH-578) URL fetched with 403 is generated over and over again
Dennis Kubes (JIRA)
2008/02/25
[jira] Resolved: (NUTCH-567) Proper (?) handling of URIs in TagSoup.
JIRA
2008/02/24
Build failed in Hudson: Nutch-trunk #369
Apache Hudson Server
2008/02/24
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Emmanuel Joke (JIRA)