nutch-dev
Messages by Thread
[jira] Created: (NUTCH-622) Support for application/x-suggestions+json
Bobby Hubbard (JIRA)
Multiple readseg requests.
Nadav Hashimshony
Build failed in Hudson: Nutch-trunk #398
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #399
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #400
Apache Hudson Server
Hudson build is back to normal: Nutch-trunk #401
Apache Hudson Server
Why is Nutch not involved in Google Summer of Code - 2008?
Susam Pal
Re: Why is Nutch not involved in Google Summer of Code - 2008?
All day coders
Re: Why is Nutch not involved in Google Summer of Code - 2008?
sishen
Re: Why is Nutch not involved in Google Summer of Code - 2008?
All day coders
Re: Why is Nutch not involved in Google Summer of Code - 2008?
sishen
Re: Why is Nutch not involved in Google Summer of Code - 2008?
ogjunk-nutch
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Dennis Kubes
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Susam Pal
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Dennis Kubes
Re: Why is Nutch not involved in Google Summer of Code - 2008?
Andrzej Bialecki
Re: Why is Nutch not involved in Google Summer of Code - 2008?
ogjunk-nutch
Build failed in Hudson: Nutch-trunk #396
Apache Hudson Server
Hudson build is back to normal: Nutch-trunk #397
Apache Hudson Server
[jira] Closed: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
Compilation errors at revision 638548
All day coders
[jira] Commented: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)
Sami Siren (JIRA)
[jira] Created: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Updated: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Assigned: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Sami Siren (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Work started: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Updated: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Updated: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Updated: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Jukka Zitting (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Grant Ingersoll (JIRA)
[jira] Resolved: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Updated: (NUTCH-621) Nutch needs to declare it's crypto usage
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage
Hudson (JIRA)
Preparations for release (Re: [jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage)
Andrzej Bialecki
[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Hudson (JIRA)
[jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
[jira] Updated: (NUTCH-598) Remove deprecated use of ToolBase, Migration to the new implementation
Andrzej Bialecki (JIRA)
Current OPIC implementation
Siddhartha Reddy
Re: Current OPIC implementation
Andrzej Bialecki
Build failed in Hudson: Nutch-trunk #393
Apache Hudson Server
Hudson build is back to normal: Nutch-trunk #394
Apache Hudson Server
[jira] Closed: (NUTCH-610) Can't Update or modify an index while web gui is running
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-223) Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN
Hudson (JIRA)
[jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException
Hudson (JIRA)
[jira] Closed: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException
Andrzej Bialecki (JIRA)
Retire the original Fetcher before the release?
Andrzej Bialecki
Re: Retire the original Fetcher before the release?
Dennis Kubes
Re: Retire the original Fetcher before the release?
Andrzej Bialecki
Re: Retire the original Fetcher before the release?
Dennis Kubes
Re: Retire the original Fetcher before the release?
Andrzej Bialecki
(nutch 1.0) Query processing problem: NutchBeans and webapps search fail, but Luke sucess
Vinci
Cached page - can it be changed?
Vinci
Write back to the segment?
Vinci
Chnage the Analyzer by plugin - how to dealing with the query?
Vinci
Re: Chnage the Analyzer by plugin - how to dealing with the query? Query always use the default analyzer!
Vinci
[jira] Created: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
[jira] Updated: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
[jira] Updated: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Mark DeSpain (JIRA)
[jira] Closed: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash
Hudson (JIRA)
How can I change the analyzer of nutch query by plugin?
Vinci
zh.ngp
Vinci
[jira] Created: (NUTCH-619) Another Language Identifier Plugin using Unicode code point range
Vinci (JIRA)
Thread behaviour in Nutch Crawl
naveen.goswami
[jira] Closed: (NUTCH-189) Injection infinite loop
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-189) Injection infinite loop
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-168) setting http.content.limit to -1 seems to break text parsing on some files
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-157) Problem during parsing msword document . It fetching properly but parsing is not working. Please show me the way how can i parse it
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-126) Fetching via https does not work with a proxy (patch)
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-126) Fetching via https does not work with a proxy (patch)
Hudson (JIRA)
[jira] Closed: (NUTCH-126) Fetching via https does not work with a proxy (patch)
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-70) duplicate pages - virtual hosts in db.
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-70) duplicate pages - virtual hosts in db.
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb
Emmanuel Joke (JIRA)
[jira] Commented: (NUTCH-556) automatic adjust the CrawlDatum.fetchInterval according to the number of newly outlinks
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Andrzej Bialecki (JIRA)
Re: [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Jesiel Trevisan
Re: [jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Andrzej Bialecki
[jira] Commented: (NUTCH-575) NPE in OpenSearchServlet when summary is null
Hudson (JIRA)
[jira] Commented: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-601) Recrawling on existing crawl directory using force option
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-610) Can't Update or modify an index while web gui is running
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl
Hudson (JIRA)
[jira] Commented: (NUTCH-613) Empty Summaries and Cached Pages
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-613) Empty Summaries and Cached Pages
Hudson (JIRA)
[jira] Closed: (NUTCH-613) Empty Summaries and Cached Pages
Andrzej Bialecki (JIRA)
Problem in running Nutch where proxy authentication is required.
naveen.goswami
Problem in running Nutch where proxy authentication is required.
naveen.goswami
Re: Problem in running Nutch where proxy authentication is required.
Susam Pal
FW: Problem in running Nutch where proxy authentication is required.
naveen.goswami
I have some problem with nutch result
dong chen
[jira] Commented: (NUTCH-296) Image Search
Otis Gospodnetic (JIRA)
[jira] Commented: (NUTCH-296) Image Search
Gordon Mohr (JIRA)
Confine nutch to one NIC?
Euan Clark
Re: Confine nutch to one NIC?
ogjunk-nutch
[jira] Created: (NUTCH-618) Tika error "Media type alias already exists"
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Andrzej Bialecki (JIRA)
[jira] Assigned: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Work started: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Updated: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Work logged: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Updated: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Resolved: (NUTCH-618) Tika error "Media type alias already exists"
Chris A. Mattmann (JIRA)
[jira] Commented: (NUTCH-618) Tika error "Media type alias already exists"
Hudson (JIRA)
Nightly builds unavailable
Frederic Wenzel
Re: Nightly builds unavailable
Sami Siren
[jira] Created: (NUTCH-617) Cached Text Only
Siddharth Jha (JIRA)
[jira] Closed: (NUTCH-617) Cached Text Only
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
Siddharth Jha (JIRA)
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
cwi...@yahoo.com (JIRA)
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
Andrea Spinelli (JIRA)
[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
David Stuart (JIRA)
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Erol (JIRA)
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Susam Pal (JIRA)
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option
Hudson (JIRA)
Re: nutch latest build - inject operation failing
esmithers
Re: Failing Hudson Builds
Nigel Daley
Re: Failing Hudson Builds
Andrzej Bialecki
Re: Failing Hudson Builds
Dennis Kubes
Re: Failing Hudson Builds
Nigel Daley
[jira] Updated: (NUTCH-614) Order Inlinks by OPIC score of parent page
Dennis Kubes (JIRA)
[jira] Created: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Emmanuel Joke (JIRA)
[jira] Updated: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Emmanuel Joke (JIRA)
[jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
[jira] Updated: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
[jira] Assigned: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful
Hudson (JIRA)
[jira] Created: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
[jira] Updated: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
[jira] Updated: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Emmanuel Joke (JIRA)
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Andrzej Bialecki (JIRA)
[jira] Closed: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Andrzej Bialecki (JIRA)
[jira] Commented: (NUTCH-615) Redirected URL are fetched wihtout setting any FetchInterval
Hudson (JIRA)
Filter fetching by mime type
Nynodata Development Team
[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.
Hudson (JIRA)
[jira] Commented: (NUTCH-578) URL fetched with 403 is generated over and over again
Dennis Kubes (JIRA)
[jira] Assigned: (NUTCH-578) URL fetched with 403 is generated over and over again
Dennis Kubes (JIRA)
[jira] Work started: (NUTCH-578) URL fetched with 403 is generated over and over again
Dennis Kubes (JIRA)
[jira] Resolved: (NUTCH-567) Proper (?) handling of URIs in TagSoup.
JIRA
Build failed in Hudson: Nutch-trunk #369
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #370
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #371
Apache Hudson Server
Re: Build failed in Hudson: Nutch-trunk #371
Nigel Daley
Build failed in Hudson: Nutch-trunk #372
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #373
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #374
Apache Hudson Server
Build failed in Hudson: Nutch-trunk #375
Apache Hudson Server
Hudson build is back to normal: Nutch-trunk #376
Apache Hudson Server
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Emmanuel Joke (JIRA)
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Sami Siren (JIRA)
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Dmitry Lihachev (JIRA)
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Serykh Evgeniy (JIRA)
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Serykh Evgeniy (JIRA)
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Serykh Evgeniy (JIRA)
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
Chris A. Mattmann (JIRA)