Nightly API link is broken

2007-04-02 Thread Lukas Vlcek
Hi, I found that the nightly API link on the Nutch web page is broken (http://lucene.apache.org/nutch/nutch-nightly/docs/api/index.html). Is it because the nightly build process is failing? Regards, Lukas

Re: Indexing and Re-crawling site

2006-12-12 Thread Lukas Vlcek
will be thrown while the index is being replaced? Regards, Armel -Original Message- From: Lukas Vlcek [mailto:[EMAIL PROTECTED] Sent: 04 December 2006 22:12 To: nutch-dev@lucene.apache.org Subject: Re: Indexing and Re-crawling site Hi, I will try to use my out-dated knowledge to answer

Re: Any plans to move to build Nutch using Maven?

2006-08-16 Thread Lukas Vlcek
Hi, I have almost no experience with Maven subprojects but somehow I feel this could help us with Nutch plugins. Am I correct? In Maven we can always call Ant goals as well, and Jelly is fun to use. With Maven, one of the biggest benefits would be that Eclipse (or other IDE) classpath settings

Re: Any plans to move to build Nutch using Maven?

2006-08-16 Thread Lukas Vlcek
. Lukas On 8/16/06, Nicolas Lalevée [EMAIL PROTECTED] wrote: On Wednesday 16 August 2006 17:18, Sami Siren wrote: Lukas Vlcek wrote: Hi, I have almost no experience with maven subprojects but somehow I feel this could help us with Nutch plugins. Am I correct? In maven we can always call ant

Re: [Fwd: Re: 0.8 Recrawl script updated]

2006-08-07 Thread Lukas Vlcek
in index and which are not. Regards, Lukas On 8/4/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Matthew, In fact I didn't realize you are doing merge stuff (sorry for that) but frankly I don't know how exactly merging works and if this strategy would work in the long time perspective and whether

Re: [Fwd: Re: 0.8 Recrawl script updated]

2006-08-04 Thread Lukas Vlcek
due to the large number of segments being kept. Thanks, Matt Lukas Vlcek wrote: Hi Matthew, I am curious about one thing. How do you know you can just drop the $depth oldest segments in the end? I haven't studied the Nutch code regarding this topic yet but I thought that segment

Re: 0.8 release

2006-06-11 Thread Lukas Vlcek
Hi, Is there a real chance that NUTCH-273 would be fixed soon (let's say once 0.8 is released)? Lukas On 6/10/06, Andrzej Bialecki [EMAIL PROTECTED] wrote: Sami Siren wrote: How would folks feel about releasing 0.8 now, there has been quite a lot of improvements/new features since 0.7 series

[jira] Commented: (NUTCH-273) When a page is redirected, the original url is NOT updated.

2006-05-27 Thread Lukas Vlcek (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-273?page=comments#action_12413602 ] Lukas Vlcek commented on NUTCH-273: --- Maybe I am wrong, but handling redirects can be a very complex topic and I am not sure if a general solution can be easily found. Right now

[jira] Created: (NUTCH-273) When a page is redirected, the original url is NOT updated.

2006-05-20 Thread Lukas Vlcek (JIRA)
Environment: n/a Reporter: Lukas Vlcek [Excerpt from maillist, sender: Andrzej Bialecki] When a page is redirected, the original url is NOT updated - so, CrawlDB will never know that a redirect occurred, it won't even know that a fetch occurred... This looks like a bug. In 0.7 this was recorded

PATCH - Fixes for 0.8 tutorial

2006-05-09 Thread Lukas Vlcek
Hi, I reported some typos and incomplete information in the Nutch 0.8 tutorial some time ago. It seems that all committers and volunteers are busy with more important issues so I took this opportunity and now I am proud to present my *first-small-humble-patch-ever*. Please review the patch and let

Re: New tools: CrawlDbMerger, LinkDbMerger, SegmentMerger

2006-05-09 Thread Lukas Vlcek
Andrzej, My pleasure. I would choose the following location: http://wiki.apache.org/nutch/DevelopmentCommandLineOptions Let me know if you can think of anything better otherwise I'll do it. Regards, Lukas On 5/9/06, Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Andrzej

Re: A Developer's getting started doc?

2006-05-04 Thread Lukas Vlcek
. Rgrds, Thomas On 5/3/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Thanks Thomas, I gave a quick glance at Ivy. It looks interesting. But does it really bring a heavy simplification over Maven if I need more advanced stuff? Does it allow Jelly integration? How widely is it adopted across open-source

Re: A Developer's getting started doc?

2006-05-03 Thread Lukas Vlcek
). It has the benefits of Maven, without the overhead and learning curve involved. Rgrds. Thomas On 5/2/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Thomas, I would really appreciate your .classpath and .project files for Eclipse (for Nutch-trunk). Could you send them to me? Or could you upload them

0.8 tutorial typos in Whole-web indexing?

2006-05-03 Thread Lukas Vlcek
Hi, the Nutch 0.8 tutorial (see: http://lucene.apache.org/nutch/tutorial8.html), in the whole-web indexing paragraph, says: bin/nutch index indexes crawl/linkdb crawl/segments/* Shouldn't it say: bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb segment#1_path [segment#2_path [...]] ?

Re: A Developer's getting started doc?

2006-05-02 Thread Lukas Vlcek
Thomas, I would really appreciate your .classpath and .project files for Eclipse (for Nutch-trunk). Could you send them to me? Or could you upload them somewhere? I don't think I am a novice in terms of Eclipse but frankly I am too lazy to configure all these settings manually. I do use Maven all

Nutch merge problem after fetch is aborted with hung threads.

2006-01-24 Thread Lukas Vlcek
Re-posting to dev list after no response in user list. Lukas -- Forwarded message -- From: Lukas Vlcek [EMAIL PROTECTED] Date: Jan 19, 2006 8:42 AM Subject: Nutch merge problem after fetch is aborted with hung threads. To: nutch-user@lucene.apache.org Hi, I am facing

Re: Problem with latest SVN during reduce phase

2006-01-13 Thread Lukas Vlcek
is not null even though page has no content and title. Could it be FetcherOutput Object ??? P --- Lukas Vlcek [EMAIL PROTECTED] wrote: Hi, I think this issue can be more complex. If I remember my test correctly then parse object was not null. Also parse.getText() was not null

Re: mapred crawling exception - Job failed!

2006-01-06 Thread Lukas Vlcek
Huh... anybody interested in this? Normally I wouldn't be so pushy but to me it seems that Nutch dies if it meets a Word document which can't be parsed. This seems like a serious issue to me. Or did I overlook something important/fundamental? Lukas On 1/6/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Hi

Re: mapred crawling exception - Job failed!

2006-01-05 Thread Lukas Vlcek
. Regards, Lukas On 1/5/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Hi Andrzej, This is what sets Fetcher to parse to true or false, right? <property> <name>fetcher.parse</name> <value>true</value> <description>If true, fetcher will parse content.</description> </property> I don't have my nutch-default

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
: Lukas Vlcek wrote: Hi, I am trying to use the latest nutch-trunk version but I am facing an unexpected Job failed! exception. It seems that all crawling work has already been done but some threads are hung, which results in an exception after some timeout. This was fixed

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
Thanks guys! I really didn't have the latest copy... L. On 1/4/06, Byron Miller [EMAIL PROTECTED] wrote: Fixed in the copy i run as i've been able to get my 100k pages indexed without getting that error. -byron --- Andrzej Bialecki [EMAIL PROTECTED] wrote: Lukas Vlcek wrote: Hi

Re: mapred crawling exception - Job failed!

2006-01-04 Thread Lukas Vlcek
) at org.apache.nutch.crawl.Crawl.main(Crawl.java:121) I tried to turn off most of the parsing plugins but it didn't help so there is probably some general issue. Any ideas? Regards, Lukas On 1/4/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Thanks guys! I really didn't have the latest copy... L

mapred crawling exception - Job failed!

2006-01-03 Thread Lukas Vlcek
Hi, I am trying to use the latest nutch-trunk version but I am facing an unexpected Job failed! exception. It seems that all crawling work has already been done but some threads are hung, which results in an exception after some timeout. I am not sure whether this is a real Nutch issue or just mine

Re: mapred crawling exception - Job failed!

2006-01-03 Thread Lukas Vlcek
mail-lists On 1/4/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Hi, I am trying to use the latest nutch-trunk version but I am facing an unexpected Job failed! exception. It seems that all crawling work has already been done but some threads are hung, which results in an exception after some

Re: nutch-0.8-dev *mapred.input.subdir* problem ?

2005-12-21 Thread Lukas Vlcek
manually step by step, there is a tutorial in the wiki on how to run the mapred commands step by step. Stefan On 21.12.2005 at 06:56, Lukas Vlcek wrote: Hi, I am trying to use nutch-0.8-dev and I have a problem with the crawl run. I did a checkout from SVN and prepared a fresh package (ant package