Hi Rafael,
Look for the “search.analyzer =” line in your dspace.cfg file. The
default is to use a standard English analyzer:
search.analyzer = org.dspace.search.DSAnalyzer
You’ll probably find that you have a Brazilian Portuguese analyzer configured
instead, and it’s “helpfully” doing stemming, gender analysis on words, etc.,
which perhaps you don’t want?
The only downside you’ll encounter going to the default DSAnalyzer is that
characters like ç and õ won’t be automatically filtered to ‘c’ and ‘o’, which
might make searching your site difficult for people whose keyboard layouts
lack those characters.
It’s easy to hack DSAnalyzer.java to include just the filters you want,
without doing full stemming, stop-words, etc. I’ve done this on our
repositories to filter out the macronised vowels that are common in Māori
words. If you really don’t want the full Portuguese search analyzer but do want
the extended Latin characters filtered, I suggest sticking with DSAnalyzer and
adding ISOLatin1AccentFilter.java (available from Lucene) to the list of
filters used.
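To illustrate what filtering the extended Latin characters amounts to, here is
a minimal, self-contained Java sketch. It is not the Lucene
ISOLatin1AccentFilter and not DSpace code: the class name AccentFold and its
fold method are made up for this example, and it uses java.text.Normalizer from
the JDK as a stand-in, decomposing each character and dropping the combining
marks. In an analyzer the equivalent folding would be applied to each token by
a filter in the chain.

```java
import java.text.Normalizer;

// Illustrative only: a hypothetical helper, not part of DSpace or Lucene.
class AccentFold {

    // Decompose accented characters (NFD), then remove the combining marks.
    // For example, c-cedilla becomes "c" + a combining cedilla, and the
    // cedilla is then stripped, leaving a plain "c".
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only:
        // \u00e7 = c-cedilla, \u00f5 = o-tilde, \u0101 = a-macron
        String[] terms = {
            "altera\u00e7\u00f5es",
            "modifica\u00e7\u00f5es",
            "M\u0101ori"
        };
        for (String term : terms) {
            System.out.println(term + " -> " + fold(term));
        }
    }
}
```

Run against the terms from your logs, this turns alterações into alteracoes
and Māori into Maori, and it leaves the suffixes alone.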
It may be just as easy to hack the org.apache.lucene.analysis.br package
(http://www.docjar.com/docs/api/org/apache/lucene/analysis/br/package-index.html)
so that it only stems the words you want it to.
Cheers,
Kim
From: Rafael Henkin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 2 December 2008 8:46 a.m.
To: DSpace-tech@lists.sourceforge.net
Subject: [Dspace-tech] DSpace/Lucene removing suffixes when searching?
Hi,
We’re using DSpace 1.4.2 with the Lucene that’s shipped with it.
We don’t know if it’s a matter of configuration (apparently not),
but DSpace (or Lucene) is removing (common) suffixes from words before
searching.
For example (removing “ações”):
2008-12-01 17:44:04,120 INFO org.dspace.search.DSQuery @ Final query string:
+(((title:alterações))) +location:m72
2008-12-01 17:44:04,129 INFO org.dspace.search.DSQuery @ Search[+title:alter
+location:m72], sort by (sorttitle), Result: [EMAIL PROTECTED]
Or (again “ações”):
2008-12-01 17:44:58,857 INFO org.dspace.search.DSQuery @ Final query string:
+(((title:modificações))) +location:m72
2008-12-01 17:44:58,887 INFO org.dspace.search.DSQuery @ Search[+title:modific
+location:m72], sort by (sorttitle), Result: [EMAIL PROTECTED]
With these words it wouldn’t matter (as sometimes you don’t know the exact
title that you are searching for), but when you search for author Laura, it
also returns results for Lauro, even when there ARE results for Laura (if there
weren’t any I would understand).
2008-12-01 17:49:37,522 INFO org.dspace.search.DSQuery @ Final query string:
+(((tdautor:laura))) +location:m72
2008-12-01 17:49:37,530 INFO org.dspace.search.DSQuery @ Search[+tdautor:laur
+location:m72], sort by (sorttitle), Result: [EMAIL PROTECTED]
Is there any way to disable this through configuration or
within DSpace, or is it inherent to Lucene?
Thanks,
Rafael
-
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech