Doug Cutting wrote:
Andrzej Bialecki wrote:
I tested it on a 5 mln index.
Thanks, this is great data!
Can you please tell us a bit more about the experiments? In particular:
. How were scores assigned to pages? Link analysis? log(number of
incoming links) or OPIC?
log()
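A small illustration of the log-of-inlinks scoring mentioned here. It is a hypothetical sketch, not the code used in the experiment. It uses Math.log1p(n), i.e. log(1 + n), so that a page with no incoming links scores 0 instead of negative infinity. The +1 is an assumption and does not come from the thread, and the URLs and counts are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: score a page by the log of its incoming-link count.
 * Math.log1p(n) computes log(1 + n), so zero inlinks gives a score of 0
 * (the +1 offset is an assumption, not something stated in the thread).
 */
class InlinkLogScore {

    static double score(int inlinks) {
        if (inlinks < 0) {
            throw new IllegalArgumentException("inlink count must be >= 0");
        }
        return Math.log1p(inlinks);
    }

    public static void main(String[] args) {
        // Illustrative inlink counts only; not data from the experiment.
        Map<String, Integer> inlinks = new LinkedHashMap<>();
        inlinks.put("http://a.example/", 0);
        inlinks.put("http://b.example/", 9);
        inlinks.put("http://c.example/", 999);
        for (Map.Entry<String, Integer> e : inlinks.entrySet()) {
            System.out.printf("%s %.3f%n", e.getKey(), score(e.getValue()));
        }
    }
}
```

The logarithm compresses the range. A page with 999 inlinks scores log(1000) ≈ 6.91, and one with 9 inlinks scores log(10) ≈ 2.30. That makes the first page 3 times the second, not 100 times.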
. How were
Hey guys,
Been playing with Nutch quite a bit lately, here's a random grab-bag
of queries / questions / problems I've encountered.
- Classloading - I have had many problems with NutchConf due to the
way it loads its resources. In a J2EE scenario, it's simply evil :)
Would there be any great
Andrzej Bialecki wrote:
. How were the queries generated? From a log or randomly?
Queries were picked manually, to test the worst-performing cases
from a real query log.
So, for example, the 50% error rate might not be typical, but could be
worst-case.
. When results differed
Stefan Groschupf wrote:
Hi,
I counted the votes manually; I hope I didn't overlook anything. I
didn't filter out issues that are 0.8-related, since it is good to
know the community's wishes anyway. :-)
Shouldn't the voting period be a bit longer? I haven't had time to
vote yet... Anyway,
I hope it's not too late for my votes to be accepted. Here they are:
NUTCH-136 mapreduce segment generator generates 50 % less than
excepted urls
+1
NUTCH-121 SegmentReader for mapred
+1
NUTCH-108 tasktracker crashs when reconnecting to a new jobtracker.
+1
Thanks,
--Flo
Doug Cutting wrote:
Andrzej Bialecki wrote:
. How were the queries generated? From a log or randomly?
Queries were picked manually, to test the worst-performing
cases from a real query log.
So, for example, the 50% error rate might not be typical, but could be
worst-case.
Stefan Groschupf wrote:
If you set up one thread per host, you have at most as many
connections to one host as you have boxes. In my case that is not
very many.
Anything more than one is not generally considered polite.
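To make the one-connection-per-host rule concrete, here is a hypothetical per-host limiter. It is not Nutch's fetcher code, and the class and method names are made up for illustration. All fetcher threads in one JVM share one semaphore per host. With maxPerHost = 1, a host sees at most one concurrent connection from that process.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical per-host connection limiter (not Nutch's fetcher code).
 * Every fetcher thread in this JVM shares one instance, so with
 * maxPerHost = 1 a host never sees two concurrent connections from it.
 */
class PerHostLimiter {
    private final int maxPerHost;
    private final ConcurrentHashMap<String, Semaphore> slots = new ConcurrentHashMap<>();

    PerHostLimiter(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    private Semaphore slotsFor(String host) {
        // One semaphore per host, created lazily and atomically.
        return slots.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
    }

    /** Claims a connection slot for host; returns false if none is free. */
    boolean tryAcquire(String host) {
        return slotsFor(host).tryAcquire();
    }

    /** Frees the slot once the fetch from host has finished. */
    void release(String host) {
        slotsFor(host).release();
    }

    public static void main(String[] args) {
        PerHostLimiter limiter = new PerHostLimiter(1);
        System.out.println(limiter.tryAcquire("a.example")); // true
        System.out.println(limiter.tryAcquire("a.example")); // false: host busy
        System.out.println(limiter.tryAcquire("b.example")); // true: other host
        limiter.release("a.example");
        System.out.println(limiter.tryAcquire("a.example")); // true again
    }
}
```

The limiter only works within one process. With N fetcher boxes, each enforces its own limit, so a host can still see up to N connections, which is the situation Stefan describes. Keeping it to one connection overall would need something like sending all of a host's URLs to the same box.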
Also it is a reproducible bug that the segment is every time
Mike Cannon-Brookes wrote:
Hey guys,
Hi, Mike! Welcome.
- Classloading - I have had many problems with NutchConf due to the
way it loads its resources. In a J2EE scenario, it's simply evil :)
Would there be any great problem with switching its classloader to
Andrzej Bialecki wrote:
Please also don't forget that the trunk/ will soon be invaded by the
code from mapred, I guess some time around the middle of January (Doug?)
Thinking about this more, perhaps we should do it sooner. There's
already a branch for 0.7.x releases, so what point is there
Hi,
I have problems with the JUnit tests in the trunk and mapred branches.
TestFetcher fails in both branches. The same test executes correctly in
the 0.7 branch.
Is it only my problem (environment setup), or are others having it too?
I would suspect some changes in redirect handling.
Regards
Piotr
Doug Cutting wrote:
Andrzej Bialecki wrote:
Please also don't forget that the trunk/ will soon be invaded by the
code from mapred, I guess some time around the middle of January (Doug?)
Thinking about this more, perhaps we should do it sooner. There's
already a branch for 0.7.x releases,
Andrzej Bialecki wrote:
I agree. I just thought that we would prepare the release based on the
code in trunk/, and in that case we would want to hold off on the merge
until after the release.
My definition of trunk is that it should be where the majority of
development happens. It is what we
Doug Cutting wrote:
Andrzej Bialecki wrote:
I agree. I just thought that we would prepare the release based on the
code in trunk/, and in that case we would want to hold off on the
merge until after the release.
My definition of trunk is that it should be where the majority of
development
Andrzej Bialecki wrote:
Yes, we just need to make sure that all important bits from trunk are on
the 0.7 branch before we start.
I will sync mapred with the trunk prior to the merge, so we should still
be able to get anything we need after mapred is merged back to trunk.
BTW, we're pretty
Wow - great responses all.
0.7 vs 0.8 - apologies if I'm using an old version. I'm using the
latest binary release. I'll switch to latest SVN HEAD and see how that
works in my application.
Is there any concrete timeline on 0.8?
I'm very glad to see the statics generally being reduced. I also
NutchConf should use the thread context classloader
---
Key: NUTCH-142
URL: http://issues.apache.org/jira/browse/NUTCH-142
Project: Nutch
Type: Improvement
Versions: 0.7
Reporter: Mike Cannon-Brookes
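Here is a rough sketch of what NUTCH-142 asks for. Configuration resources are looked up through the thread context classloader first, and lookup falls back to the loader that loaded the class. In a J2EE container, each web application typically gets its own classloader, and the container sets it as the context classloader while handling requests. Code that only uses its own class's loader may not see the application's resources. The class below is hypothetical and is not NutchConf's actual implementation.

```java
import java.net.URL;

/**
 * Hypothetical sketch of the NUTCH-142 suggestion (not NutchConf's code):
 * resolve configuration resources through the thread context classloader,
 * falling back to the classloader that loaded this class.
 */
class ContextResourceLocator {

    static ClassLoader preferredLoader() {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        return ctx != null ? ctx : ContextResourceLocator.class.getClassLoader();
    }

    /** Returns the URL of the named resource, or null if it cannot be found. */
    static URL find(String name) {
        return preferredLoader().getResource(name);
    }

    public static void main(String[] args) {
        // "nutch-site.xml" is only an example resource name here.
        System.out.println(find("nutch-site.xml"));
    }
}
```

The fallback matters because the context classloader can be null, for example on threads where nobody set one.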
Just keep voting; I will continue with my tally sheet. :-)
Why not create a wiki page, so that you don't have to do this tedious
work?
Jérôme
Filed as http://issues.apache.org/jira/browse/NUTCH-142
I didn't think there was much point creating a patch for a one-line fix :)
m
On 12/16/05, Mike Cannon-Brookes [EMAIL PROTECTED] wrote:
Wow - great responses all.
0.7 vs 0.8 - apologies if I'm using an old version. I'm using the
latest
Mike Cannon-Brookes wrote:
0.7 vs 0.8 - apologies if I'm using an old version. I'm using the
latest binary release. I'll switch to latest SVN HEAD and see how that
works in my application.
The mapred branch will soon be moved to trunk, so you might be better
off starting there, since a lot
Doug Cutting wrote:
Once the mapred branch is folded in then there's a bunch of
stuff that's obsoleted that needs to be removed. I'd like to get
dynamic configuration in, if possible.
For reference, I found the message I posted about this a while back:
Sami Siren wrote:
+1. I think this is a good time to merge, as mapred is fully usable.
Barring objections, I will do this tomorrow morning, Pacific time.
Doug
My apologies if this is the second time I've sent this.
Andrzej Bialecki wrote:
Please also don't forget that the trunk/ will soon be invaded by the
code from mapred, I guess some time around the middle of January
(Doug?)
Doug Cutting wrote:
Thinking about this more, perhaps we should do it
David Wallace wrote:
Would it be worthwhile discussing the pros and cons of having two
completely separate Nutch products? If it is, then now is probably the
right time to do so.
My take on this:
* it's too costly (in terms of available human resources) to maintain
both versions for a
Improper error numbers returned on exit
---
Key: NUTCH-143
URL: http://issues.apache.org/jira/browse/NUTCH-143
Project: Nutch
Type: Bug
Versions: 0.8-dev
Reporter: Rod Taylor
Nutch does not obey standard command line
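The issue description is cut off here. Assume it concerns the usual convention that a command exits with status 0 on success and non-zero on failure. A minimal sketch of an entry point that follows that convention might look like this. The tool name, its argument, and the specific codes 1 and 2 are hypothetical and do not come from Nutch.

```java
/**
 * Hypothetical command-line tool (not a Nutch class): exits 0 on success
 * and non-zero on failure. The codes 1 and 2 are illustrative choices.
 */
class ExitCodeTool {
    static final int OK = 0;
    static final int USAGE_ERROR = 1;   // hypothetical: bad arguments
    static final int RUNTIME_ERROR = 2; // hypothetical: the work itself failed

    /** Runs the tool and returns an exit status instead of exiting. */
    static int run(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: ExitCodeTool <segment-dir>");
            return USAGE_ERROR;
        }
        try {
            doWork(args[0]);
            return OK;
        } catch (Exception e) {
            System.err.println("Failed: " + e.getMessage());
            return RUNTIME_ERROR;
        }
    }

    /** Placeholder for real work; fails on an empty directory name. */
    static void doWork(String dir) throws Exception {
        if (dir.isEmpty()) {
            throw new Exception("empty segment directory name");
        }
    }

    public static void main(String[] args) {
        // Propagate the status to the shell so scripts can test $?.
        System.exit(run(args));
    }
}
```

Keeping run() separate from System.exit() lets scripts check the status through the process exit code. Tests and other Java callers can check the returned value without terminating the JVM.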