I understand that "Documents are the primary retrievable units from a Lucene query", but I don't know whether I want 12 documents in the Lucene index that all represent the same business object, or whether I should place 12 different business documents within the Lucene index.

Here is the background:
I want to index a product catalog (some data is in a database and some is on the filesystem, and I have cross-references between the two). Each product is associated with attributes, categories, and one or more PDF/MS Word documents, HTML descriptions, images, etc.
A single product could have 12 different files associated with it.

Is it okay if I create one Lucene document per asset that I want to return from a search, and add a field to each document tying it back to the product it is associated with? Is that the right approach?
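
To make the question concrete, here is a minimal sketch of the one-document-per-asset layout I have in mind. It is plain Java with maps standing in for index documents, not the Lucene API, and the search is a naive substring match rather than a real query. The field names (productId, assetType, path, contents), the sample product IDs, and the file paths are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AssetPerDocumentSketch {

    // Stand-in for one index document: a flat bag of field name -> value.
    // Every asset gets its own document and carries the ID of its product.
    static Map<String, String> assetDocument(String productId, String assetType,
                                             String path, String contents) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("productId", productId);   // hypothetical back-reference field
        doc.put("assetType", assetType);   // e.g. "pdf", "html", "image"
        doc.put("path", path);             // where the asset lives on disk
        doc.put("contents", contents);     // extracted text to search on
        return doc;
    }

    // Naive stand-in for a query: case-insensitive substring match on "contents".
    static List<Map<String, String>> search(List<Map<String, String>> index, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : index) {
            if (doc.get("contents").toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Collapse asset-level hits back to products: productId -> matching asset paths.
    static Map<String, List<String>> groupByProduct(List<Map<String, String>> hits) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map<String, String> doc : hits) {
            grouped.computeIfAbsent(doc.get("productId"), k -> new ArrayList<>())
                   .add(doc.get("path"));
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(assetDocument("P-100", "pdf", "docs/p100-manual.pdf",
                "Installation manual for the widget"));
        index.add(assetDocument("P-100", "html", "html/p100.html",
                "Widget description and specifications"));
        index.add(assetDocument("P-200", "doc", "docs/p200-datasheet.doc",
                "Gadget datasheet"));

        // Hits come back per asset; grouping shows which products they belong to.
        System.out.println(groupByProduct(search(index, "widget")));
    }
}
```

With this layout a search returns individual assets, and the product ID on each hit lets me look up the product in the database or group several hits under one product.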

Thanks, it's keeping me up at night.


BTW, I am working on a release of a professional-grade, open-source e-commerce suite (Apache License), and I wouldn't mind help on the Lucene/search side. There's plenty more for me to do: 120+ tables, and it goes to production for a client this weekend (without search ;). Contact me!




