I think you should take it to JIRA; the big guys seem to have forgotten about us...

 

There are no performance problems compared with parsing the document... after
that, we can walk it as many times as we like :)

 

Can NodeWalker stop after finding the first title, or will it walk to the end
of the document?

 

 

From: Alexey Torochkov [mailto:all.net...@gmail.com] 
Sent: August-28-09 5:50 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Title inside body

 

I think making it configurable is still a good solution:

 

<name>parser.html.skip.body.title</name>

<value>false</value>

(true would be the default)

 

I see only one performance issue with it: if a page doesn't have a title at
all, NodeWalker will continue to walk through all the nodes (but actually
that's not a problem)
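
To illustrate the behaviour discussed here, below is a minimal sketch of a depth-first DOM walk that returns as soon as it meets the first title and, optionally, never descends into the body. It is not the attached patch and does not use Nutch's NodeWalker; the class TitleFinder, the method findTitle and the skipBodyTitle flag are hypothetical names, and only the standard JDK DOM API is assumed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of Nutch).
class TitleFinder {

    // Returns the text of the first <title> in document order, or null.
    // When skipBodyTitle is true, the <body> subtree is never visited.
    static String findTitle(Node root, boolean skipBodyTitle) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                String name = node.getNodeName();
                if ("title".equalsIgnoreCase(name)) {
                    // Stop here: the rest of the document is not walked.
                    return node.getTextContent().trim();
                }
                if (skipBodyTitle && "body".equalsIgnoreCase(name)) {
                    continue; // do not descend into <body>
                }
            }
            // Push children in reverse so they are popped in document order.
            NodeList children = node.getChildNodes();
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        // No title found: every visited node was walked once.
        return null;
    }

    // Parses a well-formed XHTML string with the JDK's XML parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<html><head/><body><title>In body</title><p>text</p></body></html>");
        System.out.println(findTitle(doc, true));  // prints "null"
        System.out.println(findTitle(doc, false)); // prints "In body"
    }
}
```

With the flag set to true, a page without a title in its head costs almost nothing, since the body is skipped entirely; with false, a page with no title anywhere is walked to the end, which is the case described above.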

 

Patch attached

 

Should I create a JIRA issue for it? Or does this patch have no chance of
being applied?

-- 
Alexey Torochkov 
