Thanks Gert, that is what I thought. We have some existing applications
in production that use GSearch through SOAP and I would like to disrupt
them as little as possible. If we choose to go with Solr in our
production environment, I think the way to go will be to use its API
directly, but for testing I would like to simply point the development
version of the app at the new Solr-backed GSearch.
Matt
From: Gert Schmeltz Pedersen [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 30, 2008 3:46 AM
To: Matthew Cordial; [email protected]
Subject: RE: Stemming and Queries in GSearch with Solr
You are quite right. When GSearch sends index documents to Solr, they
are analyzed as specified in the Solr configuration files. When you
search with gfindObjects, however, GSearch analyzes the query with the
analyzer named in the "fgsindex.analyzer" property of the index.properties
configuration file, which should therefore match the corresponding
analyzer configuration in Solr.
Your application may just as well search Solr directly, which ensures
that the query is analyzed the same way as the index documents.
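For reference, searching Solr directly is a plain HTTP request to its /select handler. The sketch below only builds the request URL. The base address http://localhost:8080/solr is a hypothetical placeholder (adjust host, port and path to your installation), and solr_select_url is an illustrative helper, not part of GSearch or Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the address of your own Solr instance.
SOLR_BASE = "http://localhost:8080/solr"


def solr_select_url(base: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for a Lucene-syntax query string.

    Solr runs the query through the query-time analyzer defined in
    schema.xml, so "dynamics" is stemmed the same way as indexed text.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base.rstrip('/')}/select?{params}"


if __name__ == "__main__":
    print(solr_select_url(SOLR_BASE, "dc.subject:dynamics"))
    # http://localhost:8080/solr/select?q=dc.subject%3Adynamics&rows=10&wt=json
```

Note that urlencode percent-encodes the colon in the field query; Solr decodes it before parsing.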
If you do want to search through GSearch, and your Solr specification
gives different results from the analyzers that Lucene offers directly,
then you have to write your own Analyzer, as you say, and specify it
in the "fgsindex.analyzer" property. This is the same thing you would
do if you ran GSearch with the Lucene plugin and wanted customized
analysis.
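The structure of such an Analyzer can be sketched as a tokenizer followed by an ordered list of filters. This Python is only an illustration under assumptions: a real class for the "fgsindex.analyzer" property would be a Java subclass of Lucene's Analyzer, the whitespace/lowercase/stem chain is a simplified guess at a Solr "text" field type, and toy_stem_filter is a hypothetical stand-in, not the Porter or Snowball stemmer.

```python
from typing import Callable, List

Filter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split on runs of whitespace (similar in spirit to WhitespaceTokenizer)."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def toy_stem_filter(tokens: List[str]) -> List[str]:
    """Hypothetical suffix stripper standing in for EnglishPorterFilter.

    NOT the Porter algorithm; it only reproduces "dynamics" -> "dynam".
    """
    out = []
    for t in tokens:
        for suffix in ("ics", "ic", "s"):
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


class ChainAnalyzer:
    """A tokenizer followed by filters applied in order."""

    def __init__(self, tokenizer: Callable[[str], List[str]], filters: List[Filter]):
        self.tokenizer = tokenizer
        self.filters = filters

    def analyze(self, text: str) -> List[str]:
        tokens = self.tokenizer(text)
        for f in self.filters:
            tokens = f(tokens)
        return tokens


# The same chain must be used for index documents and for queries.
ANALYZER = ChainAnalyzer(whitespace_tokenizer, [lowercase_filter, toy_stem_filter])

if __name__ == "__main__":
    print(ANALYZER.analyze("Fluid Dynamics"))  # ['fluid', 'dynam']
```

The point of the design is that one object owns the whole chain, so index-time and query-time analysis cannot drift apart.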
Best,
Gert
PS: A related link:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
From: Matthew Cordial [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 30, 2008 12:07 AM
To: [email protected]
Subject: [Fedora-commons-users] Stemming and Queries in GSearch with Solr
I have GSearch 2.1.1 configured to use Solr. I am finding that there are
significant differences in the search results returned from the GSearch
interfaces and Solr. The main problem has to do with stemmed terms: the
GSearch interfaces do not find them. For instance, a document with a
dc.subject of "dynamics" will be stemmed down to "dynam" in the index. A
query of "dc.subject:dynamics" returns no hits through GSearch, but
works as expected through Solr. GSearch will, however, find the document
with a query of "dc.subject:dynam".
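The symptom can be reproduced with a toy inverted index, assuming (as suspected below) that the GSearch query path lowercases but does not stem. The stemmer is again a hypothetical suffix stripper rather than EnglishPorterFilter, and "demo:1" is a made-up PID.

```python
from collections import defaultdict
from typing import Dict, List, Set


def stem(token: str) -> str:
    """Toy suffix stripper standing in for EnglishPorterFilter (not Porter)."""
    for suffix in ("ics", "ic", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stemming: bool) -> List[str]:
    tokens = [t.lower() for t in text.split()]
    return [stem(t) for t in tokens] if stemming else tokens


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index for one field; terms are always stemmed at index time."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, stemming=True):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str, stemming: bool) -> Set[str]:
    """Return ids of docs containing every query term (AND semantics)."""
    terms = analyze(query, stemming)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


if __name__ == "__main__":
    idx = build_index({"demo:1": "dynamics"})
    print(search(idx, "dynamics", stemming=False))  # set(): unstemmed query misses
    print(search(idx, "dynam", stemming=False))     # {'demo:1'}
    print(search(idx, "dynamics", stemming=True))   # {'demo:1'}
```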
I am assuming that this is because GSearch is not analyzing queries with
the same Analyzer (+ filters) used to create the index. Solr is
configured in schema.xml so that the "text" field type is analyzed
using the EnglishPorterFilter for *both* indexing and querying. I
believe this is why the query acts as expected through Solr but not
through GSearch.
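To see exactly which chain a custom Analyzer would have to reproduce, the index-time and query-time analyzers of a field type can be read out of schema.xml. This sketch uses Python's xml.etree.ElementTree on an abbreviated, hypothetical schema excerpt; the factory class names are ones found in Solr 1.x example schemas, but check them against your own schema.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Abbreviated, hypothetical excerpt of a Solr schema.xml.
SCHEMA_XML = """
<schema name="example" version="1.1">
  <types>
    <fieldType name="text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
"""


def analyzer_chains(schema_xml: str, field_type: str) -> Dict[str, List[str]]:
    """Map analyzer type ("index"/"query") to its tokenizer and filter classes."""
    root = ET.fromstring(schema_xml.strip())
    chains: Dict[str, List[str]] = {}
    for ft in root.iter():
        # Older schemas spell the element "fieldtype"; accept both.
        if ft.tag not in ("fieldType", "fieldtype") or ft.get("name") != field_type:
            continue
        for analyzer in ft.findall("analyzer"):
            # An <analyzer> without a type attribute applies to both phases.
            kind = analyzer.get("type", "both")
            chains[kind] = [el.get("class") for el in analyzer
                            if el.tag in ("tokenizer", "filter")]
    return chains


if __name__ == "__main__":
    for kind, chain in analyzer_chains(SCHEMA_XML, "text").items():
        print(kind, chain)
```

The "query" chain printed here is the one a custom Analyzer named in "fgsindex.analyzer" would need to mirror.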
Is there a configuration option in GSearch to indicate what filters
should be used for the query? Or, do we need to write our own Analyzer
which utilizes the same filters as Solr and tell GSearch to use it with
the "fgsindex.analyzer" property? Are there any other ways of dealing
with this?
Thanks,
Matt
----------------------
Matt Cordial
Digital Library Software Engineer
Informatics and Cyberinfrastructure Services
Arizona State University Libraries
480.965.9094
_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users