[
https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark DeSpain updated NUTCH-620:
---
Priority: Minor (was: Major)
BasicURLNormalizer should collapse runs of slashes with a single slash
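As an illustration of what the issue title asks for, here is a minimal sketch of collapsing runs of slashes in the path of a URL. It is not the Nutch BasicURLNormalizer code; the class name SlashCollapser and the method collapseSlashes are hypothetical, and the choice to leave the query string and unparsable URLs untouched is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: collapses "//", "///", ... in the
// path component of a URL into a single "/".
final class SlashCollapser {

    static String collapseSlashes(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            // Assumption: URLs that cannot be parsed are returned unchanged.
            return url;
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            // Opaque URIs (e.g. mailto:) or URLs without a path: nothing to do.
            return url;
        }
        // Only the path is rewritten, so the "//" after the scheme survives.
        String collapsed = path.replaceAll("/{2,}", "/");

        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme()).append(':');
        }
        if (uri.getRawAuthority() != null) {
            sb.append("//").append(uri.getRawAuthority());
        }
        sb.append(collapsed);
        // Query and fragment are copied verbatim (slashes there are kept).
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

With this sketch, SlashCollapser.collapseSlashes("http://example.com//a///b/?q=x//y") returns "http://example.com/a/b/?q=x//y". Working on the parsed path rather than the whole string avoids touching the scheme separator or the query.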
Hi all,
I have changed Nutch's analyzer so that it works with the Lucene sandbox
analyzer. I used Luke to check the language and the query, and they look
fine. However, the method posted in the wiki does not work for me, and
most of the posts only mention how to make the index work
Hi all,
I want to clean up the HTML in the cached page. Where is the cache
located that I should read, and what extension point can I use? If I do that
before indexing, will it be too expensive?
Thank you for any answer.
--
View this message in context:
Hi,
I added some log tracing so I can see more detail.
Findings so far:
- Both NutchBean and the webapp fail; only Luke succeeds (when I manually
select the correct analyzer).
- The analyzer I wrap in a plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer
for the zh locale. The method follows the Nutch wiki of
Hi all,
I would like to ask: if I want to change the cached page, do I need to
modify NutchBean, or just change the JSP?
Thank you
--
View this message in context:
http://www.nabble.com/Cached-page---can-it-be-changed--tp16078204p16078204.html
Sent from the Nutch - Dev mailing list
Hi all,
(This is the most up-to-date post; sorry for posting so many times, as I think
I did not describe the problem well...)
The current situation is the same as the title: NutchBean and the webapp fail,
but Luke succeeds, with my own analyzer plugin. That is, only Luke can search
with the index generated after
[
https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579285#action_12579285
]
Emmanuel Joke commented on NUTCH-530:
-
OK
Add a combiner to improve performance on