Hi Solrists,
thank you for your kind responses.
Grant, François, I'll keep your advice in mind and your links in store;
they may be useful in one of my use cases,
even though I doubt they will be in the primary one.
As RDF literals, my documents are affixed with language tags @en, @fr, @ja etc.,
as per ISO 639-1, so that language identification is straightforward.
It's the discovery of the corresponding Analyzer subclasses' constructors
I'm trying to automate.
With Solr, it looks like it's up to the server admins to specify in XML
what Analyzer subclasses they want in a given case,
then it's up to Solr to instantiate those subclasses by Java reflection.
I would like to spare myself the burden of writing and maintaining this XML.
Rather, I'd use Java code to build the mapping
by inventorying the classpath, with rules like:
on finding jar entry /whats/this/package/analysis/xx/WhatsThisAnalyzer.class,
if class WhatsThisAnalyzer is a subclass of lucene.analysis.Analyzer,
if reflection reveals a public new WhatsThisAnalyzer(lucene.util.Version),
if instantiation succeeds,
then the instance is the presumptive default analyzer for ISO 639-1 code xx.
This might make a Lucene submission, more properly than a Solr one.
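
To make the rules above concrete, here is a rough sketch of how they might look in Java. It is only an illustration under assumptions: the class name AnalyzerDirectory and its methods are hypothetical (nothing of the kind exists in Lucene or Solr as far as I know), the base class and constructor argument are passed in as parameters so the code compiles without Lucene on the classpath, and the two-letter package name is merely presumed to be an ISO 639-1 code.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Lucene or Solr. */
class AnalyzerDirectory {

    // Matches entries such as some/package/analysis/xx/WhatsThisAnalyzer.class
    // and captures the two-letter package name xx (presumed language code).
    private static final Pattern ENTRY = Pattern.compile(
        "(?:.*/)?analysis/([a-z]{2})/([A-Za-z0-9_]+Analyzer)\\.class");

    /** Returns the two-letter code embedded in a jar entry name, or null. */
    static String languageCodeOf(String entryName) {
        Matcher m = ENTRY.matcher(entryName);
        return m.matches() ? m.group(1) : null;
    }

    /** Converts a jar entry name into a binary class name. */
    static String classNameOf(String entryName) {
        String s = entryName.startsWith("/") ? entryName.substring(1) : entryName;
        return s.substring(0, s.length() - ".class".length()).replace('/', '.');
    }

    /**
     * Applies the rules: the class must be a public, concrete subclass of base
     * with a public one-argument constructor taking argType, and instantiation
     * must succeed. Returns null as soon as any rule fails.
     */
    static <T> T tryInstantiate(String className, Class<T> base,
                                Class<?> argType, Object arg, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            int mods = c.getModifiers();
            if (!base.isAssignableFrom(c) || Modifier.isAbstract(mods)
                    || !Modifier.isPublic(mods)) {
                return null;
            }
            Constructor<?> ctor = c.getConstructor(argType); // public constructors only
            return base.cast(ctor.newInstance(arg));
        } catch (ReflectiveOperationException | LinkageError | RuntimeException e) {
            return null; // not a usable candidate
        }
    }

    /** Scans one jar and keeps the first usable instance found per language code. */
    static <T> Map<String, T> scan(JarFile jar, Class<T> base,
                                   Class<?> argType, Object arg, ClassLoader loader) {
        Map<String, T> byLanguage = new TreeMap<>();
        for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
            String name = e.nextElement().getName();
            String code = languageCodeOf(name);
            if (code == null || byLanguage.containsKey(code)) {
                continue;
            }
            T instance = tryInstantiate(classNameOf(name), base, argType, arg, loader);
            if (instance != null) {
                byLanguage.put(code, instance);
            }
        }
        return byLanguage;
    }
}
```

With Lucene on the classpath, one would presumably pass the Analyzer class as base, the Version class as argType and a Version constant as arg; keeping only the first match per code is an arbitrary choice made for this sketch.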
Thanks again for your time and your help.
Best regards,
François Jurain.
Message of 25/03/11 at 23:06
From: François Schiettecatte fschietteca...@gmail.com
To: solr-user@lucene.apache.org
Cc:
Subject: Re: Wanted: a directory of quick-and-(not too)dirty analyzers for
multi-language RDF.
François
I think there is a language identification tool in the Nutch code base,
otherwise I have written one in Perl which could easily be translated to
Java. I won't have access to it for 10 days (I am traveling), but I am happy
to send you a link to it when I get back (and anyone else who wants it).
Cheers
François
On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:
You are looking for a language identification tool. You could check
https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.
Otherwise, you have to roll your own or buy a third party one.
On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
Hello Solrists,
As it says in the subject line, I'm looking for a Java component that,
given an ISO 639-1 code or some equivalent,
would return a Lucene Analyzer ready to gobble documents in the
corresponding language.
It looks like Solr has to contain one,
only I've not been able to locate it so far;
can you point me to the spot?
I've found org.apache.solr.analysis,
and things like org.apache.lucene.analysis.bg etc. in lucene/modules,
with many classes which I'm sure are related, however the factory itself
still eludes me;
I mean the Java class.method that'd decide on request, what to do with all
these packages
to bring the requisite object to existence, once the language is specified.
Where should I look? Or was I mistaken, and Solr has nothing of the kind, at
least in Java?
Thanks in advance for your help.
Best regards,
François Jurain.
--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search