[jira] Updated: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash

2008-03-16 Thread Mark DeSpain (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark DeSpain updated NUTCH-620: --- Priority: Minor (was: Major) BasicURLNormalizer should collapse runs of slashes with a single slash
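NUTCH-620 asks that BasicURLNormalizer collapse runs of slashes into a single slash. The issue text here does not say which parts of the URL should be affected, so the following is only a sketch under one assumption: collapse runs in the path component only, and leave the `//` after the scheme, the query string and the fragment alone. `SlashCollapseSketch` and `collapseSlashes` are hypothetical names. This is not Nutch's actual normalizer code or the patch attached to the issue.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SlashCollapseSketch {

    // Hypothetical helper (not Nutch's BasicURLNormalizer): collapses runs of
    // '/' in the path component only. The "//" after the scheme, the query
    // string and the fragment are left untouched.
    public static String collapseSlashes(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // assumption: leave unparsable input unchanged
        }
        String path = uri.getRawPath();
        // Only absolute, hierarchical URLs with an authority and a path are rewritten.
        if (uri.getScheme() == null || uri.getRawAuthority() == null
                || path == null || path.isEmpty()) {
            return url;
        }
        String collapsed = path.replaceAll("/{2,}", "/");
        // Rebuild from the raw components so existing %-escapes are preserved.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority()).append(collapsed);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints http://example.com/a/b/c.html?x=1//2
        System.out.println(collapseSlashes("http://example.com//a///b/c.html?x=1//2"));
    }
}
```

Restricting the rewrite to the path is a design choice of this sketch. Slashes inside a query value such as `x=1//2` can be meaningful to the server, so they are kept.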

Change the Analyzer by plugin - how to deal with the query?

2008-03-16 Thread Vinci
Hi all, I have changed Nutch's Analyzer and made it work with the Lucene sandbox analyzer. I used Luke to check the language and the query, and they seem to work fine. However, the method posted on the wiki does not work for me, and most of the posts only mention how to make the index work

Write back to the segment?

2008-03-16 Thread Vinci
Hi all, I want to clean up the HTML in the cached page. Where is the cache located, so that I can read it, and which extension point can I use? If I do this before indexing, will it be too expensive? Thank you for any answer. -- View this message in context:

Re: Change the Analyzer by plugin - how to deal with the query? Query always uses the default analyzer!

2008-03-16 Thread Vinci
Hi, I added some log tracing so I can see more detail... Findings so far:
- both NutchBean and the webapp fail; only Luke succeeds (after I manually select the correct analyzer)
- the Analyzer I wrap in the plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer for the zh locale. The method I followed is from the Nutch wiki of
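One common cause of "Luke finds it, but NutchBean and the webapp do not" is that the query is analyzed differently from the indexed text; Luke only succeeded here after the correct analyzer was selected manually. The thread does not confirm that this is the cause, so the plain-Java sketch below (no Lucene dependency) only illustrates the effect. For CJK text, CJKAnalyzer emits overlapping two-character tokens (C1C2C3 becomes C1C2, C2C3). The one-token-per-character method stands in for a hypothetical different query-side analyzer. Class and method names are made up for illustration, and the sketch assumes the input is a single run of BMP CJK characters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AnalyzerMismatchSketch {

    // Overlapping bigrams over a run of CJK characters, the way CJKAnalyzer
    // tokenizes CJK text (C1C2C3 -> C1C2, C2C3). Assumes BMP characters only;
    // a one-character run is emitted as-is (a choice made for this sketch).
    public static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            tokens.add(run);
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    // Hypothetical query-side analyzer that emits one token per character.
    public static List<String> unigrams(String run) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < run.length(); i++) {
            tokens.add(run.substring(i, i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "\u4e2d\u6587\u5206\u6790";   // four CJK characters: U+4E2D U+6587 U+5206 U+6790
        String query = "\u4e2d\u6587";             // the first two of them
        Set<String> indexTerms = new HashSet<>(bigrams(doc));

        // Same analyzer at index and query time: every query term is in the index.
        System.out.println(indexTerms.containsAll(bigrams(query)));  // true
        // Different analyzer at query time: single characters were never indexed.
        System.out.println(indexTerms.containsAll(unigrams(query))); // false
    }
}
```

The point of the sketch is only that index-time and query-time analysis must produce the same kind of terms. Which analyzer Nutch's query path actually uses is not shown here.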

Cached page - can it be changed?

2008-03-16 Thread Vinci
Hi all, I would like to ask: if I want to change the cached page, do I need to modify NutchBean, or just change the JSP? Thank you -- View this message in context: http://www.nabble.com/Cached-page---can-it-be-changed--tp16078204p16078204.html Sent from the Nutch - Dev mailing list

(nutch 1.0) Query processing problem: NutchBeans and webapps search fail, but Luke succeeds

2008-03-16 Thread Vinci
Hi all, (This is the most up-to-date post; sorry for posting so many times, I don't think I described the problem well...) The current situation is as the title says: NutchBean and the webapp fail, but Luke succeeds, with my own analyzer plugin. That is, only Luke can search with the index generated after

[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb

2008-03-16 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579285#action_12579285 ] Emmanuel Joke commented on NUTCH-530: - OK Add a combiner to improve performance on
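The NUTCH-530 entry only names the idea. As general background, a combiner in Hadoop MapReduce runs on each map task's output before the shuffle and merges records that share a key, so fewer records are sent to the reducers. This is only safe when the merge is associative and commutative, because the framework may apply it zero or more times. The plain-Java simulation below illustrates that effect. It is not Hadoop's API and not the NUTCH-530 patch. The (url, 1) records summed per URL are a hypothetical stand-in for the CrawlDatum merging that updatedb actually performs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class CombinerSketch {

    // Merges all values that share a key using an associative, commutative op.
    // Used both as the "combiner" (per map task) and as the "reducer" (global).
    public static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records,
                                               BinaryOperator<Integer> op) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), op);
        }
        return out;
    }

    public static Map.Entry<String, Integer> rec(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Hypothetical output of two map tasks: one (url, 1) record per inlink seen.
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
            List.of(rec("http://a/", 1), rec("http://b/", 1), rec("http://a/", 1)),
            List.of(rec("http://a/", 1), rec("http://b/", 1), rec("http://b/", 1)));

        List<Map.Entry<String, Integer>> shuffledRaw = new ArrayList<>();
        List<Map.Entry<String, Integer>> shuffledCombined = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> task : mapOutputs) {
            shuffledRaw.addAll(task);                                   // no combiner
            shuffledCombined.addAll(combine(task, Integer::sum).entrySet()); // combiner
        }

        System.out.println(shuffledRaw.size());      // 6 records shuffled
        System.out.println(shuffledCombined.size()); // 4 records shuffled
        // The final reduce result is the same either way.
        System.out.println(combine(shuffledRaw, Integer::sum)
            .equals(combine(shuffledCombined, Integer::sum))); // true
    }
}
```

The saving grows with the number of duplicate keys each map task emits; whether updatedb's merge logic meets the associativity requirement is a question for the issue itself.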