While it's possible that Lucene (or Solr) is faster for keyword
searches, I wouldn't be convinced until I saw a comparison done on a
reasonably large data set between Lucene and the ProductKeyword table
using a few different keyword combinations. With ProductKeyword we're
using a database index on the keywords to look up productIds, which is
basically what Lucene does with its own inverted index.
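To make the comparison concrete, here is a rough sketch of that kind of
keyword lookup. It is not the actual OFBiz code: the table and column
names (product_keyword, product_id, keyword), the index name, and the
sample rows are all made up for illustration, and it uses Python's
built-in sqlite3 only so that it runs anywhere.

```python
import sqlite3

def build_keyword_table(conn, rows):
    """Create a hypothetical keyword table and a database index on the keyword column."""
    conn.execute(
        "CREATE TABLE product_keyword ("
        " product_id TEXT NOT NULL,"
        " keyword TEXT NOT NULL,"
        " PRIMARY KEY (product_id, keyword))"
    )
    # This index plays the role of the inverted index: keyword -> product ids.
    conn.execute("CREATE INDEX pk_keyword_idx ON product_keyword (keyword)")
    conn.executemany("INSERT INTO product_keyword VALUES (?, ?)", rows)

def search_all_keywords(conn, keywords):
    """Return the ids of products that carry every one of the given keywords."""
    keywords = sorted({k.lower() for k in keywords})
    if not keywords:
        return []
    placeholders = ", ".join("?" for _ in keywords)
    sql = (
        "SELECT product_id FROM product_keyword "
        f"WHERE keyword IN ({placeholders}) "
        "GROUP BY product_id "
        "HAVING COUNT(DISTINCT keyword) = ? "
        "ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, (*keywords, len(keywords)))]

# Made-up sample data, purely for illustration.
conn = sqlite3.connect(":memory:")
build_keyword_table(conn, [
    ("P1", "red"), ("P1", "widget"),
    ("P2", "red"),
    ("P3", "widget"), ("P3", "blue"),
])
```

Calling search_all_keywords(conn, ["red", "widget"]) on this data
returns only P1, the one product that has both keywords. A fair
benchmark would run this kind of query against a Lucene index over the
same data, at a realistic size.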
Lucene does do some cool search expression stuff that our current
product searching doesn't support. However, the current product search
does support various features like stem removal and thesaurus
expansion (which has been mentioned in this thread).
One of the really big problems with moving to Lucene is how to handle
the parametric searching and flexible sorting that we currently do.
That takes advantage of a dozen or so tables in the database to search
on features associated with products, categories (optionally including
all sub-categories), prices, catalogs, and stores, and on top of that
it's easy to add constraints for just about anything else you might
associate with a product.
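As a rough illustration of what I mean by combining constraints in the
database, the sketch below puts a keyword, a set of categories, and a
price ceiling into one query and sorts by price. The three tables, their
columns, and the sample rows are hypothetical simplifications, not the
real OFBiz entities, and it uses Python's sqlite3 just to be runnable.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only.
SCHEMA = """
CREATE TABLE product_keyword (
    product_id TEXT, keyword TEXT, PRIMARY KEY (product_id, keyword));
CREATE TABLE product_category_member (
    product_id TEXT, category_id TEXT, PRIMARY KEY (product_id, category_id));
CREATE TABLE product_price (product_id TEXT PRIMARY KEY, price REAL);
CREATE INDEX keyword_idx ON product_keyword (keyword);
"""

def parametric_search(db, keyword, category_ids, max_price):
    """Keyword + category + price constraints in a single query, cheapest first."""
    if not category_ids:
        return []
    marks = ", ".join("?" for _ in category_ids)
    sql = (
        "SELECT DISTINCT pk.product_id, pp.price "
        "FROM product_keyword pk "
        "JOIN product_category_member pcm ON pcm.product_id = pk.product_id "
        "JOIN product_price pp ON pp.product_id = pk.product_id "
        f"WHERE pk.keyword = ? AND pcm.category_id IN ({marks}) "
        "AND pp.price <= ? "
        "ORDER BY pp.price, pk.product_id"
    )
    return list(db.execute(sql, (keyword, *category_ids, max_price)))

# Made-up sample data.
shop = sqlite3.connect(":memory:")
shop.executescript(SCHEMA)
shop.executemany("INSERT INTO product_keyword VALUES (?, ?)", [
    ("P1", "widget"), ("P2", "widget"), ("P3", "widget"), ("P4", "gadget")])
shop.executemany("INSERT INTO product_category_member VALUES (?, ?)", [
    ("P1", "TOOLS"), ("P2", "TOOLS_HAND"), ("P3", "TOYS"), ("P4", "TOOLS")])
shop.executemany("INSERT INTO product_price VALUES (?, ?)", [
    ("P1", 30.0), ("P2", 10.0), ("P3", 5.0), ("P4", 1.0)])
```

Here the caller passes a category together with its sub-categories as
one list (for example ["TOOLS", "TOOLS_HAND"]). Adding another
constraint is just another join and another WHERE term, which is the
part that would be awkward to reproduce outside the database.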
The option of doing a Lucene search first to get a set of matching
productIds and then passing that to the database in a possibly massive
IN expression would work, but might perform horribly because of all of
the data that needs to be moved around between the two.
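A sketch of what that two-step approach might look like, assuming the
search engine has already handed back a list of ids. The product_price
table, the chunk size, and the sample data are hypothetical; the point
is only to show that every id has to be shipped to the database, in
batches if the list is longer than the database allows bind parameters
in one statement.

```python
import sqlite3

def filter_ids_by_price(conn, product_ids, max_price, chunk_size=500):
    """Apply a price constraint to ids returned by an external search step.

    The ids are sent to the database in chunks, one IN expression per chunk.
    """
    matched = []
    for start in range(0, len(product_ids), chunk_size):
        chunk = product_ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = (
            "SELECT product_id FROM product_price "
            f"WHERE price <= ? AND product_id IN ({placeholders})"
        )
        matched.extend(row[0] for row in conn.execute(sql, (max_price, *chunk)))
    return sorted(matched)

# Made-up data: 2000 products priced 1.0 .. 2000.0.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE product_price (product_id TEXT PRIMARY KEY, price REAL NOT NULL)"
)
demo.executemany(
    "INSERT INTO product_price VALUES (?, ?)",
    [(f"P{i}", float(i)) for i in range(1, 2001)],
)

# Stand-in for the ids a search engine might return (every odd-numbered product).
engine_hits = [f"P{i}" for i in range(1, 2001, 2)]
cheap = filter_ids_by_price(demo, engine_hits, 10.0)
```

With this data, 1000 ids go across in two batches just to end up with
five products. Loading the ids into a temporary table and joining
against it is another option, but the ids still have to be moved.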
If Solr supports this sort of parametric search it might be
interesting, but it would be a LOT of redundant data to keep track of,
and I don't really like that a whole lot...
So, back to the beginning: unless someone can show that Lucene beats
the keyword indexing that a good database (properly configured, so
that the keyword index is actually being used) does with the
ProductKeyword table, I wouldn't even want to start going in this
direction.
-David
On Sep 13, 2008, at 6:43 AM, Patrick Antivackis wrote:
Hello,
Just to shed some light on the product search.
Main class involved:
applications/product/src/org/ofbiz/product/product/ProductSearch.java
It's 100% database (DBMS) based, not Lucene or anything else.
As a reminder, there is an entity in OFBiz called ProductKeyword, whose
primary key is ProductId and Keyword (varchar(60)), and which is filled
at each creation or update of the product's characteristics, name,
fields, ...
So is it today the best and most efficient way to do search? Hmm, not
sure, you are right. But for products only, it's usually enough
(speaking of boolean search). Now, if we also need to index files that
are associated with products, and maybe (I don't know if this exists
already, as I never looked) index CMS content and files uploaded
through the CMS, a solution based on a real search engine should be
far superior.
Regards
2008/9/11 madppiper <[EMAIL PROTECTED]>
BJ Freeman wrote:
You have stated what caused the responses, when you made assumptions.
[I have worked with Solr, not lucene.]
You have not investigated how ofbiz works.
I think that comments like that are not only unnecessary, but unhealthy
for any open discussion. (Please read my original message again,
replace the term "proprietary" with "native", keep in mind that OFBiz
does NOT use Lucene for searching, as I have now been told several
times, and then skip through the original question at hand.)
@Jacques: Thanks for the response, but not quite. There are actually
two questions at hand:
1) What search engine, if any, is used by OFBiz to generate keyword
search results for Products?
2) If 1) can be answered with "NO search engine per se", which would
imply that we are doing real database queries right now (perhaps ones
that use full-text query algorithms), would it not be a good idea to
move to a standalone search engine such as Solr?
--
View this message in context:
http://www.nabble.com/Replacing-Lucene-with-Solr-tp19412826p19429281.html
Sent from the OFBiz - Dev mailing list archive at Nabble.com.