While it's possible that Lucene (or Solr) is faster for keyword searches, I wouldn't be convinced until I saw a comparison on a reasonably large data set between Lucene and the ProductKeyword table, using a few different keyword combinations. With ProductKeyword we're using a database index on the keywords to look up productIds, which is basically what Lucene does with its own inverted index.
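
To make that comparison concrete, here is a minimal sketch (Python with an in-memory SQLite database, purely for illustration). The table and column names, the index name, and the sample rows are assumptions, not the actual OFBiz schema. It runs the same AND-search two ways: through a database index on the keyword column, and through a hand-rolled inverted index of the kind a search engine maintains.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (product_id, keyword). Names are assumptions.
ROWS = [
    ("P1", "red"), ("P1", "shirt"),
    ("P2", "blue"), ("P2", "shirt"),
    ("P3", "red"), ("P3", "hat"),
]

def db_lookup(keywords):
    """AND-search using a database index on the keyword column."""
    keywords = sorted(set(keywords))
    if not keywords:
        return []
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_keyword ("
        " product_id TEXT NOT NULL, keyword TEXT NOT NULL,"
        " PRIMARY KEY (product_id, keyword))"
    )
    conn.execute("CREATE INDEX pk_keyword_idx ON product_keyword (keyword)")
    conn.executemany("INSERT INTO product_keyword VALUES (?, ?)", ROWS)
    marks = ", ".join("?" for _ in keywords)
    sql = (
        f"SELECT product_id FROM product_keyword WHERE keyword IN ({marks})"
        " GROUP BY product_id HAVING COUNT(DISTINCT keyword) = ?"
        " ORDER BY product_id"
    )
    found = [row[0] for row in conn.execute(sql, [*keywords, len(keywords)])]
    conn.close()
    return found

def inverted_lookup(keywords):
    """AND-search using an in-memory inverted index (keyword -> product ids)."""
    postings = defaultdict(set)
    for product_id, keyword in ROWS:
        postings[keyword].add(product_id)
    sets = [postings[k] for k in set(keywords)]
    return sorted(set.intersection(*sets)) if sets else []

print(db_lookup(["red", "shirt"]), inverted_lookup(["red", "shirt"]))
```

Both functions return the same productIds; which one is faster on a large data set is exactly the question that would need measuring.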

Lucene does do some cool search expression stuff that our current product searching doesn't support. However, the current product search does support various features like stem removal and thesaurus expansion (which has been mentioned in this thread).
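
As a rough illustration of what stem removal and thesaurus expansion mean for a query (this is not the actual OFBiz implementation; the suffix list, the thesaurus entries, and the function names are all made up for the example):

```python
# Hypothetical suffixes and thesaurus entries, for illustration only.
SUFFIXES = ("ing", "es", "s")
THESAURUS = {"tee": ["shirt"], "cap": ["hat"]}

def strip_stem(word):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def expand_query(text):
    """Lower-case, strip stems, then add thesaurus alternatives in order."""
    terms = []
    for raw in text.lower().split():
        base = strip_stem(raw)
        for term in [base, *THESAURUS.get(base, [])]:
            if term not in terms:
                terms.append(term)
    return terms

print(expand_query("Tees caps"))
```

The expanded term list would then be what gets matched against the stored keywords.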

One of the really big problems with moving to Lucene is how to handle the parametric searching and flexible sorting that we currently do by taking advantage of a dozen or so tables in the database to search on features associated with products and categories (optionally including all sub-categories) and prices and catalogs and stores, and on top of that it's easy to add constraints for just about anything else you might associate with a product.
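
A minimal sketch of the kind of query I mean, again in Python with SQLite for illustration. The three tables, their columns, and the feature ids are hypothetical stand-ins for the dozen or so real tables; the point is only that keyword, feature, and price constraints plus sorting are resolved in a single database query.

```python
import sqlite3

def build_demo_db():
    """Create hypothetical keyword, feature, and price tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_keyword (product_id TEXT, keyword TEXT);
        CREATE TABLE product_feature (product_id TEXT, feature TEXT);
        CREATE TABLE product_price (product_id TEXT, price REAL);
        INSERT INTO product_keyword VALUES
            ('P1', 'shirt'), ('P2', 'shirt'), ('P3', 'shirt');
        INSERT INTO product_feature VALUES
            ('P1', 'COLOR_RED'), ('P2', 'COLOR_RED'), ('P3', 'COLOR_BLUE');
        INSERT INTO product_price VALUES
            ('P1', 25.0), ('P2', 15.0), ('P3', 10.0);
    """)
    return conn

def parametric_search(conn, keyword, feature, max_price):
    """Keyword + feature + price constraints, sorted by price, in one query."""
    sql = """
        SELECT k.product_id, p.price
        FROM product_keyword k
        JOIN product_feature f ON f.product_id = k.product_id
        JOIN product_price p ON p.product_id = k.product_id
        WHERE k.keyword = ? AND f.feature = ? AND p.price <= ?
        ORDER BY p.price ASC
    """
    return conn.execute(sql, (keyword, feature, max_price)).fetchall()

conn = build_demo_db()
print(parametric_search(conn, "shirt", "COLOR_RED", 30.0))
```

Adding another constraint is one more join and one more WHERE clause, which is what makes this approach easy to extend.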

The option of doing a Lucene search first to get a set of productIds that match and then passing that to the database with a possibly massive IN expression would work, but might perform horribly because of all of the data that needs to be moved around and such.
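
Roughly, the two-step approach would look like the sketch below (Python with SQLite for illustration; the table, the helper names, and the chunk size of 500 are assumptions). The id list stands in for whatever the external search returns. Because many databases cap the number of bind parameters or IN-list items per statement, the ids are sent in chunks, and the final sort then has to happen outside the database.

```python
import sqlite3

def make_price_table(count):
    """Hypothetical price table with ids P0..P<count-1>, price equal to the number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_price (product_id TEXT PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO product_price VALUES (?, ?)",
        [(f"P{i}", float(i)) for i in range(count)],
    )
    return conn

def fetch_by_ids(conn, product_ids, chunk_size=500):
    """Fetch (product_id, price) for externally found ids via chunked IN lists."""
    rows = []
    for start in range(0, len(product_ids), chunk_size):
        chunk = product_ids[start:start + chunk_size]
        marks = ", ".join("?" for _ in chunk)
        sql = (
            "SELECT product_id, price FROM product_price"
            f" WHERE product_id IN ({marks})"
        )
        rows.extend(conn.execute(sql, chunk).fetchall())
    # Chunking breaks any ORDER BY in the database, so sort the merged rows here.
    return sorted(rows, key=lambda row: row[1])

conn = make_price_table(2000)
hits = [f"P{i}" for i in range(0, 2000, 2)]  # stand-in for search-engine results
print(len(fetch_by_ids(conn, hits)))
```

Every matching id crosses the wire twice (out of the search engine, then into the database), which is the data movement I'm worried about.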

If Solr supports this sort of parametric search it might be interesting, but it would be a LOT of redundant data to keep track of, and I don't really like that a whole lot...

So, back to the beginning: unless someone can show that Lucene beats the keyword indexing that a good database does with the ProductKeyword table (with the database properly configured so that the keyword index is actually being used), I wouldn't even want to start going in this direction.

-David


On Sep 13, 2008, at 6:43 AM, Patrick Antivackis wrote:

Hello,
Just to shed some light on the product search.
Main class involved:
applications/product/src/org/ofbiz/product/product/ProductSearch.java

It's 100% DBMS-based, not Lucene or anything else.

As a reminder, there is an entity in OFBiz called ProductKeyword whose primary key is ProductId and Keyword (varchar(60)), and which is filled on each creation or update of the product's characteristics, name, fields, ...
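
A minimal sketch of that fill step, in Python with SQLite for illustration only. This is not the OFBiz code; the tokenizing rule, table layout, and function name are assumptions. The only detail taken from the entity is the 60-character keyword limit.

```python
import re
import sqlite3

def index_product(conn, product_id, *texts):
    """Rebuild the keyword rows for one product from its text fields."""
    words = set()
    for text in texts:
        # Truncate to 60 characters to fit the varchar(60) keyword column.
        words.update(w[:60] for w in re.findall(r"[a-z0-9]+", text.lower()))
    with conn:  # one transaction: drop stale keywords, insert the fresh ones
        conn.execute(
            "DELETE FROM product_keyword WHERE product_id = ?", (product_id,)
        )
        conn.executemany(
            "INSERT INTO product_keyword (product_id, keyword) VALUES (?, ?)",
            [(product_id, w) for w in sorted(words)],
        )
    return sorted(words)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_keyword ("
    " product_id TEXT NOT NULL, keyword TEXT NOT NULL,"
    " PRIMARY KEY (product_id, keyword))"
)
print(index_product(conn, "P1", "Red Shirt", "A red cotton shirt"))
```

Calling it again after a product update replaces the old keyword rows for that product.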

So is it today the best and most efficient way to do search? Hmm, not sure you are right. But for products only, it's usually enough (speaking of boolean search). Now, if there is also a need to index files associated with products, and maybe (I don't know whether this exists already, as I never looked) to index CMS content and files uploaded through the CMS, a solution based on a real search engine should be far superior.

Regards

2008/9/11 madppiper <[EMAIL PROTECTED]>



BJ Freeman wrote:

You have stated what caused the responses, when you made assumptions.
[I have worked with Solr, not Lucene.]

You have not investigated how ofbiz works.

I think that comments like that are not only unnecessary, but unhealthy for any open discussion. (Please read my original message again, replace the term "proprietary" with "native", keep in mind that OFBiz does NOT use Lucene for searching, as I have been told several times now, and then skim through the original question at hand.)



@Jacques: Thanks for the response - not quite. There are actually two
questions at hand:

1)
What search engine, if any, is used by OFBiz to generate keyword search
results for Products?

2)
If 1) can be answered with "no search engine per se", which would imply that we are doing real database queries right now (perhaps ones that use full-text query algorithms), would it not be a good idea to move to a standalone search engine such as Solr?


--
View this message in context:
http://www.nabble.com/Replacing-Lucene-with-Solr-tp19412826p19429281.html
Sent from the OFBiz - Dev mailing list archive at Nabble.com.


