Hi,
I found that the nightly API link on the Nutch web page is broken (
http://lucene.apache.org/nutch/nutch-nightly/docs/api/index.html). Is it
because the nightly build process is failing?
Regards,
Lukas
will be thrown while the index is
being replaced?
Regards,
Armel
-Original Message-
From: Lukas Vlcek [mailto:[EMAIL PROTECTED]
Sent: 04 December 2006 22:12
To: nutch-dev@lucene.apache.org
Subject: Re: Indexing and Re-crawling site
Hi,
I will try to use my out-dated knowledge to answer
Hi,
I have almost no experience with maven subprojects but somehow I feel
this could help us with Nutch plugins. Am I correct?
In Maven we can always call Ant goals as well, and Jelly is fun to
use. With Maven, one of the biggest benefits would be that Eclipse (or
other IDE) classpath settings
.
Lukas
On 8/16/06, Nicolas Lalevée [EMAIL PROTECTED] wrote:
On Wednesday 16 August 2006 17:18, Sami Siren wrote:
Lukas Vlcek wrote:
Hi,
I have almost no experience with maven subprojects but somehow I feel
this could help us with Nutch plugins. Am I correct?
In maven we can always call ant
in index and which are not.
Regards,
Lukas
On 8/4/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Matthew,
In fact I didn't realize you are doing merge stuff (sorry for that),
but frankly I don't know exactly how merging works, whether this
strategy would work in the long term, and whether
due to the large amount of segments being kept.
Thanks,
Matt
Lukas Vlcek wrote:
Hi Matthew,
I am curious about one thing. How do you know you can just drop the $depth
oldest segments in the end? I haven't studied the Nutch
code regarding this topic yet but I thought that segment
Hi,
Is there a real chance that NUTCH-273 will be fixed soon (let's say
once 0.8 is released)?
Lukas
On 6/10/06, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Sami Siren wrote:
How would folks feel about releasing 0.8 now, there has been quite a
lot of improvements/new features
since 0.7 series
[
http://issues.apache.org/jira/browse/NUTCH-273?page=comments#action_12413602 ]
Lukas Vlcek commented on NUTCH-273:
---
Maybe I am wrong, but handling redirects can be a very complex topic and I am not
sure a general solution can be easily found.
Right now
Environment: n/a
Reporter: Lukas Vlcek
[Excerpt from maillist, sender: Andrzej Bialecki]
When a page is redirected, the original url is NOT updated - so, CrawlDB will
never know that a redirect occurred, it won't even know that a fetch occurred...
This looks like a bug.
In 0.7 this was recorded
Hi,
I reported some typos and incomplete information in the Nutch 0.8 tutorial
some time ago. It seems that all committers and volunteers are busy
with more important issues, so I took this opportunity and now I am
proud to present my *first-small-humble-patch-ever*.
Please review the patch and let
Andrzej,
My pleasure. I would choose the following location:
http://wiki.apache.org/nutch/DevelopmentCommandLineOptions
Let me know if you can think of anything better; otherwise I'll do it.
Regards,
Lukas
On 5/9/06, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
Andrzej
.
Rgrds, Thomas
On 5/3/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Thanks Thomas,
I gave a quick glance at Ivy. It looks interesting.
But does it really bring much simplification over Maven if I need
more advanced stuff? Does it allow Jelly integration? How widely is it
adopted across open-source
). It has the
benefits of Maven, without the overhead and learning curve involved.
Rgrds. Thomas
On 5/2/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Thomas,
I would really appreciate your .classpath and .project files for
Eclipse (for Nutch-trunk). Could you send them to me? Or could you
upload them
Hi,
The Nutch 0.8 tutorial (see:
http://lucene.apache.org/nutch/tutorial8.html), in the whole-web indexing
paragraph, says: bin/nutch index indexes crawl/linkdb
crawl/segments/*
Shouldn't it say: bin/nutch index crawl/indexes crawl/crawldb
crawl/linkdb segment#1_path [segment#2_path [...]] ?
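For illustration only, the proposed argument order can be sketched as a small Python helper that assembles the command line. The helper name build_index_command and the example segment names are hypothetical; the order (index output directory, crawldb, linkdb, then one or more segments) simply follows the usage suggested above and has not been checked against the Nutch source.

```python
from typing import List, Sequence


def build_index_command(crawl_dir: str, segments: Sequence[str]) -> List[str]:
    """Assemble the argument list in the order proposed above (hypothetical helper)."""
    if not segments:
        # The proposed usage requires at least one segment path.
        raise ValueError("at least one segment path is required")
    return [
        "bin/nutch", "index",
        f"{crawl_dir}/indexes",   # output directory for the index
        f"{crawl_dir}/crawldb",
        f"{crawl_dir}/linkdb",
        *segments,                # one or more segment paths
    ]


if __name__ == "__main__":
    # Segment names below are made up for the example.
    cmd = build_index_command("crawl", ["crawl/segments/seg1", "crawl/segments/seg2"])
    print(" ".join(cmd))
```

Passing the segments explicitly, rather than relying on a shell glob, makes it easier to see which segments end up in the index.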
Thomas,
I would really appreciate your .classpath and .project files for
Eclipse (for Nutch-trunk). Could you send them to me? Or could you
upload them somewhere?
I don't think I am a novice when it comes to Eclipse, but frankly I am too lazy
to configure all these settings manually. I do use Maven all
Re-posting to dev list after no response in user list.
Lukas
-- Forwarded message --
From: Lukas Vlcek [EMAIL PROTECTED]
Date: Jan 19, 2006 8:42 AM
Subject: Nutch merge problem after fetch is aborted with hung threads.
To: nutch-user@lucene.apache.org
Hi,
I am facing
is not null even though
page has no content and title.
Could it be FetcherOutput Object ???
P
--- Lukas Vlcek [EMAIL PROTECTED] wrote:
Hi,
I think this issue can be more complex. If I
remember my test
correctly then parse object was not null. Also
parse.getText() was not
null
Huh...
anybody interested in this?
Normally I wouldn't be so pushy, but to me it seems that Nutch dies if it
meets a Word document which can't be parsed. This seems like a serious
issue to me.
Or did I overlook something important/fundamental?
Lukas
On 1/6/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Hi
.
Regards,
Lukas
On 1/5/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Hi Andrzej,
This is what sets Fetcher to parse to true or false, right?
<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, fetcher will parse content.</description>
</property>
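As a rough sketch, a property block of this shape can be read with Python's standard xml.etree.ElementTree. The XML string below is a hand-written stand-in wrapped in a configuration root element (an assumption about the file layout), and read_property is a hypothetical helper, not part of Nutch.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written sample; the <configuration> wrapper is an assumption.
SAMPLE = """
<configuration>
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
    <description>If true, fetcher will parse content.</description>
  </property>
</configuration>
"""


def read_property(xml_text: str, key: str) -> Optional[str]:
    """Return the <value> of the first <property> whose <name> equals key."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None  # property not present


if __name__ == "__main__":
    print(read_property(SAMPLE, "fetcher.parse"))
```

A quick check like this can confirm which value is in a given config file before starting a long fetch.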
I don't have my nutch-default
:
Lukas Vlcek wrote:
Hi,
I am trying to use the latest nutch-trunk version but I am facing an
unexpected "Job failed!" exception. It seems that all crawling work
has already been done but some threads are hung, which results in an
exception after some timeout.
This was fixed
Thanks guys!
I really didn't have the latest copy...
L.
On 1/4/06, Byron Miller [EMAIL PROTECTED] wrote:
Fixed in the copy I run, as I've been able to get my
100k pages indexed without getting that error.
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
Hi
)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
I tried to turn off most of the parsing plugins but it didn't help, so
there is probably some general issue.
Any ideas?
Regards,
Lukas
On 1/4/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Thanks guys!
I really didn't have the latest copy...
L
Hi,
I am trying to use the latest nutch-trunk version but I am facing an
unexpected "Job failed!" exception. It seems that all crawling work
has already been done but some threads are hung, which results in an
exception after some timeout.
I am not sure whether this is a real nutch issue or just mine
mail-lists
On 1/4/06, Lukas Vlcek [EMAIL PROTECTED] wrote:
Hi,
I am trying to use the latest nutch-trunk version but I am facing an
unexpected "Job failed!" exception. It seems that all crawling work
has already been done but some threads are hung, which results in an
exception after some
manually step by
step, there is a tutorial in the wiki on how to run the map rd commands
step by step.
Stefan
On 21.12.2005 at 06:56, Lukas Vlcek wrote:
Hi,
I am trying to use nutch-0.8-dev and I have a problem with crawl run.
I did a checkout from SVN and prepared a fresh package (ant package