Re: Accent-insensitive searches

d...@xx Fri, 11 Sep 2009 09:57:52 -0700

Good evening Sylvain,

I was just starting to look for a way to implement case-insensitive and accent-insensitive SELECTs in Derby when I received this mail.

I tried what you suggest but I can't make it work. I know it is not purely a Derby question, but I guess that those who are not familiar with the SPI may have the same problem, so I am posting on this mailing list.

I have built my jar with the META-INF file, and the ServiceLoader.load(java.text.spi.CollatorProvider.class) call returns my class correctly.

Nevertheless, Derby throws ERROR XBM04 because it can't find my provider.

I searched the Java API source and found that ServiceLoader.loadInstalled() is called by sun.util.LocaleServiceProviderPool (via Locale.getAvailableLocales()). The javadoc says: "This method is intended for use when only installed providers are desired. The resulting service will only find and load providers that have been installed into the current Java virtual machine; providers on the application's class path will be ignored."

Does that mean I have to deploy my CollatorProvider jar somewhere other than the application class path? That would be a deployment problem for me...

What does "installed into the current Java virtual machine" mean?
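
As far as I understand it, on a Java 6 JVM "installed" refers to the Java Extension Mechanism, i.e. jars placed in an extension directory (typically jre/lib/ext, or whatever the java.ext.dirs system property lists). The sketch below is only a diagnostic under that assumption: it lists the CollatorProvider implementations visible through the class path and those visible as installed extensions, so the two can be compared. The class name ProviderVisibility is made up for this illustration.

```java
import java.text.spi.CollatorProvider;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ProviderVisibility {

    // Collects the class names of all providers found by the given loader.
    static List<String> names(ServiceLoader<CollatorProvider> loader) {
        List<String> result = new ArrayList<String>();
        for (CollatorProvider provider : loader) {
            result.add(provider.getClass().getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // Providers reachable from the context class loader (includes the class path).
        System.out.println("load():          "
                + names(ServiceLoader.load(CollatorProvider.class)));
        // Providers installed as extensions only; class path entries are ignored.
        System.out.println("loadInstalled(): "
                + names(ServiceLoader.loadInstalled(CollatorProvider.class)));
        // Extension directories of this JVM (may be null on newer Java versions).
        System.out.println("java.ext.dirs =  " + System.getProperty("java.ext.dirs"));
    }
}
```

If the provider shows up in the first list but not in the second, that would be consistent with the javadoc quoted above: the jar is on the class path but not installed as an extension.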

It is a shame that Derby doesn't offer an easier way to do something so common. A case-insensitive, accent-insensitive behaviour option **should be** a standard Derby feature (SQL Server has one, Oracle does not)!


Thanks

jylaxx

----- Original Message -----
From: "Sylvain Leroux" <[email protected]>
To: "Derby Discussion" <[email protected]>
Sent: Friday, September 11, 2009 10:33 AM
Subject: Re: Accent-insensitive searches

Hi Josu,

And sorry for the late reply.
I tried exactly the same thing as you yesterday, except that I used Locale.FRENCH as the /base/ locale:
>         Collator c=Collator.getInstance(Locale.FRENCH);

It works like a charm (both with = and LIKE).
I am using Apache Derby 10.5.1.1 and Sun JDK 1.6.0_12.


Maybe you should try with a more recent version of Derby (?)
Otherwise, check that you specified your custom collator at DB creation time, not when booting the DB: even if you specify another collator when you boot an existing DB, it is still the collator specified at creation time that is used.
Finally, the most doubtful one: an accent difference might be a PRIMARY difference for the default es_ES collator (??). But that would be really surprising...
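
To make the creation-time point concrete, here is a minimal sketch of a connection URL that sets the territory and collation together with create=true. It assumes the embedded Derby driver (derby.jar) is on the class path; the database name accentDB and the class name CreateCollatedDb are hypothetical. The query at the end reads back the derby.database.collation property, which I believe is how the collation stored in an existing database can be checked.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CreateCollatedDb {

    // Builds a URL that creates a new database with territory-based collation.
    // The collation attributes are only honoured when the database is created.
    static String creationUrl(String dbName, String territory) {
        return "jdbc:derby:" + dbName
                + ";create=true;territory=" + territory
                + ";collation=TERRITORY_BASED";
    }

    public static void main(String[] args) throws SQLException {
        // "accentDB" is a hypothetical database name used for illustration.
        String url = creationUrl("accentDB", "es_ES_accentinsensitive");
        Connection conn = DriverManager.getConnection(url);
        try {
            Statement st = conn.createStatement();
            // Read back the collation recorded in the database.
            ResultSet rs = st.executeQuery(
                "VALUES SYSCS_UTIL.SYSCS_GET_DATABASE_PROPERTY('derby.database.collation')");
            if (rs.next()) {
                System.out.println("collation = " + rs.getString(1));
            }
        } finally {
            conn.close();
        }
    }
}
```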
Let us know if you find the answer!

Best regards,
Sylvain


josu wrote:
I'm working on a database application. Items in the database are all in
Spanish. It's mandatory that searches are accent-insensitive,
meaning that, for example, a search for the word 'electrico' (no accent)
must return entries containing 'eléctrico' (with accent).

Searching the web for a solution, I found that I must set these two properties
when creating the database:

  territory=es_ES
  collation=TERRITORY_BASED

But it still doesn't work this way. It looks like the default collation for
es_ES is still accent-sensitive.
So I tried to use a custom collator that behaves as I need. I found some
instructions for this in the following blog:

  http://blogs.sun.com/kah/entry/user_defined_collation_in_apache

In brief, I define a new CollatorProvider and register it with the JVM.
Here's the code for this class:


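import java.text.Collator;
import java.text.spi.CollatorProvider;
import java.util.Locale;

// Provides an accent-insensitive collator for the custom locale
// es_ES_accentinsensitive.
public class IgnoraAcentosCollatorProvider extends CollatorProvider {

    @Override
    public Collator getInstance(Locale locale) {
        // Reject every locale except the custom one.
        if (!locale.equals(new Locale("es","ES","accentinsensitive"))){
            throw new IllegalArgumentException("Solo acepta es_ES_accentinsensitive");
        }
        // Start from the default es_ES collator and ignore accent and case
        // differences by comparing at PRIMARY strength only.
        Collator c=Collator.getInstance(new Locale("es","ES"));
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    @Override
    public Locale[] getAvailableLocales() {
        // The only locale this provider supports.
        return new Locale[]{
            new Locale("es","ES","accentinsensitive")
        };
    }

}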
It simply takes the default es_ES Collator and changes its strength to PRIMARY. This makes the collator return 0 when comparing 'electrico' and 'eléctrico'.
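
The effect of the strength setting can be checked outside Derby with a small standalone sketch like the following. It uses only java.text.Collator; the class name CollatorStrengthDemo and the helper compare() are made up for this illustration and are not part of the provider above.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Compares two strings with the default es_ES collator at the given strength.
    static int compare(String a, String b, int strength) {
        Collator c = Collator.getInstance(new Locale("es", "ES"));
        c.setStrength(strength);
        return c.compare(a, b);
    }

    public static void main(String[] args) {
        String plain = "electrico";
        String accented = "el\u00e9ctrico"; // 'eléctrico', escaped to avoid encoding issues

        // At PRIMARY strength accent differences are ignored.
        System.out.println("PRIMARY:  " + compare(plain, accented, Collator.PRIMARY));
        // At TERTIARY strength (the usual default) the accent is significant.
        System.out.println("TERTIARY: " + compare(plain, accented, Collator.TERTIARY));
    }
}
```

If the PRIMARY comparison yields 0 here but the database searches still distinguish the two words, the collator itself is probably fine and the problem lies in how (or whether) Derby picks it up.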
After making sure this new Collator is available to the JVM, I restart
Derby and create a new database, now setting territory=es_ES_accentinsensitive.
The database is created without errors (meaning Derby reaches my Collator),
but searches are still accent-sensitive (no matter whether I use the = or LIKE
operator).

Any clue? I searched intensively for this issue but found no
solution. I could avoid the problem simply by using MySQL (the default Spanish
configuration already has the desired behaviour), but I would like to keep
using Derby if possible.

I'm using JavaDB-Derby 10.4.2.1

Thanks.
--
Website: http://www.chicoree.fr
