Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-29 Thread fr . jurain
Hi Solrists, 

thank you for your kind responses. 
Grant, François, I'll keep your advice in mind and your links in store; 
they may be useful in one of my use cases, 
even though I doubt they will be in the primary one. 
As RDF literals, my documents carry language tags such as @en, @fr, @ja, etc., 
as per ISO 639-1, so that language identification is straightforward. 
It's the discovery of the corresponding Analyzer subclasses and constructors
that I'm trying to automate. 
 
With Solr, it looks like it's up to the server admins to specify in XML 
which Analyzer subclasses they want in a given case; 
it's then up to Solr to instantiate those subclasses by Java reflection. 
I would like to spare myself the burden of writing and maintaining this XML. 
 
Rather, I'd use Java code to build the mapping 
by inventorying the classpath, with rules like: 
on finding jar entry /whats/this/package/analysis/xx/WhatsThisAnalyzer.class, 
if class WhatsThisAnalyzer is a subclass of lucene.analysis.Analyzer, 
if reflection reveals a public constructor WhatsThisAnalyzer(lucene.util.Version), 
if instantiation succeeds,
then the instance is the presumptive default analyzer for ISO 639-1 code xx. 
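
The rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: AnalyzerDirectory and all its method names are hypothetical (nothing of the kind is claimed to exist in Lucene or Solr), and the base class and constructor argument type are passed in as parameters, so the sketch compiles without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; none of these names exist in Lucene or Solr. */
final class AnalyzerDirectory {

    // Matches e.g. "/whats/this/package/analysis/xx/WhatsThisAnalyzer.class":
    // group 1 is the class path without ".class", group 2 the two-letter package name.
    private static final Pattern ENTRY = Pattern.compile(
            "^/?((?:.*/)?analysis/([a-z]{2})/\\w+Analyzer)\\.class$");

    /** Two-letter code taken from the package name, if the entry looks like an analyzer. */
    static Optional<String> languageOf(String entryName) {
        Matcher m = ENTRY.matcher(entryName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Fully qualified class name for a matching entry. */
    static Optional<String> classNameOf(String entryName) {
        Matcher m = ENTRY.matcher(entryName);
        return m.matches() ? Optional.of(m.group(1).replace('/', '.')) : Optional.empty();
    }

    /** Applies the subclass, public-constructor and instantiation checks in turn. */
    static <B, V> Optional<B> tryInstantiate(Class<?> candidate, Class<B> base,
                                             Class<V> argType, V arg) {
        if (!base.isAssignableFrom(candidate)
                || Modifier.isAbstract(candidate.getModifiers())) {
            return Optional.empty();
        }
        try {
            // getConstructor only returns public constructors.
            Constructor<?> ctor = candidate.getConstructor(argType);
            return Optional.of(base.cast(ctor.newInstance(arg)));
        } catch (ReflectiveOperationException | RuntimeException e) {
            return Optional.empty(); // no such constructor, or instantiation failed
        }
    }

    /** Builds language code -> instance; the first analyzer found for a code wins. */
    static <B, V> Map<String, B> build(Iterable<String> entryNames, ClassLoader loader,
                                       Class<B> base, Class<V> argType, V arg) {
        Map<String, B> byLanguage = new LinkedHashMap<>();
        for (String entry : entryNames) {
            Optional<String> lang = languageOf(entry);
            if (!lang.isPresent() || byLanguage.containsKey(lang.get())) {
                continue;
            }
            try {
                Class<?> candidate = Class.forName(classNameOf(entry).get(), false, loader);
                tryInstantiate(candidate, base, argType, arg)
                        .ifPresent(instance -> byLanguage.put(lang.get(), instance));
            } catch (ClassNotFoundException | LinkageError e) {
                // not loadable from this class loader: skip the entry
            }
        }
        return byLanguage;
    }
}
```

With Lucene on the classpath, the entry names would presumably come from JarFile.entries(), with Analyzer.class as the base, Version.class as the argument type and a Version constant as the argument. One caveat: the two-letter package name is only presumed to be an ISO 639-1 code. Lucene's Czech analyzer, for instance, has lived in a cz package while the ISO code is cs, so a small table of exceptions would likely be needed.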
 
This might make a Lucene submission, more properly than a Solr one. 
 
Thanks again for your time and your help.
Best regards,
 François Jurain.

 Message of 25/03/11 at 23:06
 From: François Schiettecatte fschietteca...@gmail.com
 To: solr-user@lucene.apache.org
 Cc: 
 Subject: Re: Wanted: a directory of quick-and-(not too)dirty analyzers for 
 multi-language RDF.
 
 
 François
 
 I think there is a language identification tool in the Nutch code base, 
 otherwise I have written one in Perl which could easily be translated to 
 Java. I won't have access to it for 10 days (I am traveling), but I am happy 
 to send you a link to it when I get back (and anyone else who wants it).
 
 Cheers
 
 François
 
 On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:
 
  You are looking for a language identification tool.  You could check 
  https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.  
  Otherwise, you have to roll your own or buy a third party one.
  
  On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
  
  Hello Solrists,
  
  As it says in the subject line, I'm looking for a Java component that,
  given an ISO 639-1 code or some equivalent,
  would return a Lucene Analyzer ready to gobble documents in the 
  corresponding language.
  Solr looks like it has to contain one,
  only I've not been able to locate it so far; 
  can you point me to the spot?
  
  I've found org.apache.solr.analysis,
  and things like org.apache.lucene.analysis.bg etc. in lucene/modules,
  with many classes which I'm sure are related, however the factory itself 
  still eludes me;
  I mean the Java class.method that'd decide on request, what to do with all 
  these packages
  to bring the requisite object into existence, once the language is specified.
  Where should I look? Or was I mistaken, and Solr has nothing of the kind, at 
  least in Java?
  Thanks in advance for your help.
  
  Best regards,
François Jurain.
  
  
  
  
  
  
  
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
  
  Search the Lucene ecosystem docs using Solr/Lucene:
  http://www.lucidimagination.com/search
  
 
 









Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-24 Thread fr . jurain
Hello Solrists,
 
As it says in the subject line, I'm looking for a Java component that,
given an ISO 639-1 code or some equivalent,
would return a Lucene Analyzer ready to gobble documents in the corresponding 
language.
Solr looks like it has to contain one,
only I've not been able to locate it so far; 
can you point me to the spot?
 
I've found org.apache.solr.analysis,
and things like org.apache.lucene.analysis.bg etc. in lucene/modules,
with many classes which I'm sure are related, however the factory itself still 
eludes me;
I mean the Java class.method that'd decide on request, what to do with all 
these packages
to bring the requisite object into existence, once the language is specified.
Where should I look? Or was I mistaken, and Solr has nothing of the kind, at least 
in Java?
Thanks in advance for your help.
 
Best regards,
François Jurain.


