: A document (in our case an xml that has many metadata) can have more : than one date, each date with 2 attributes:
: <date type="document" art="geburt">00-00-1886</date> : : In the date index I have for every <date> in the input xml a document : with fields: type (document |other), date, art (birthday | deportation | : death...). For example if I merge all the dates that correspond to a : document then the new type field will contain all the values. So if I so make a seperate field for each type of date, lucene is very good at supporting documents with heterogenous sets of fields -- and it's even better if you use OMIT_NORMS (which makes perfect sense for a date field where norm values are meaningless anyway) : A couple of suggestions... : : 1) don't use multiple indexes. create one index, with one document per : "thing" you want to return (in this case it sounds like books) and index : all of the relevent data about each thing in that doc. If multiple : people : worked on a book, add all of their names to the same field. addd all of : the dates to the book doc -- if you need to distibguish the differnet : types of dates, make a seaprete field for each type. : : If you *must* cross refrence... : : 2) make sure you aren't useing the Hits API to iterate over all the : results when gathering IDs -- use a lower level api (like a : HitCollector) : : 3) use the FieldCache to get the IDs instead of he stored Document : fields. : : 4) don't extract full ID lists from all of then indexes and then search : on one of the indexes again with the ID list ... use the ID lists : generated from the supporting indexes (people and dates) to build a : Filter : that you can use when searching the main index. : : : : : Date: Mon, 12 Jun 2006 12:22:30 +0300 : : From: Mile Rosu <[EMAIL PROTECTED]> : : Reply-To: java-user@lucene.apache.org : : To: java-user@lucene.apache.org : : Subject: Using more than one index : : : : Hello, : : : : We have an application dealing with historical books. The books have : : metadata consisting of event dates, and person names among others. : : The FullText, Person and Date indexes were split until we realized : that : : for a larger number of documents (400K) the combination of the : : sequential search hits took a way too long time to complete (15 min). : : The date index was built using the suggestion found at: : : http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing : (big : : thanks for the hint) : : : : Is there a recommended approach to combining results from different : : indexes (with different fields)? : : : : The indexes structure: : : MainIndex: : : Fields: : : @ID@ - keyword (document id) : : @FULLTEXT@ - tokenized (used for full text6 search) : : Ptitle - tokenized (used for full text publication title : : search) : : Dtitle - tokenized (used for full text document title : : search) : : Type - keyword - (used for document type) : : : : PersonIndex: : : @ID@ - keyword (document id == [EMAIL PROTECTED]@) : : Person - tokenized (full text person name search) : : DateIndex: : : @ID@ - keyword (document id == [EMAIL PROTECTED]@) : : Date - date as YYYYMMDD - keyword : : Type - type of date (document date, birth day, etc...) : : @YYYY@ - year of date : : @YYYYMM@ - year and month of date : : @DDD@ - decade : : @CC@ - century of date : : : : : : Eg: : : If I want to search for documents that contain: person "John", full : text : : "book" and date: before 06/12/2005 : : Step 1: search in personIndex for John - retrieve all @ID@ from the : hit : : list : : Step 2: search in DateIndex for documents that have dates before : : 06/12/2005 - retrieve id from the hit list : : Step 3: search in mainIndex for "book" - retrieve all @ID@ : : Step 4: combine all the lists : : Step 5: search mainIndex for documents with the @ID@ from the combined : : id list : : : : Each search takes less then 1 second, but retrieving @ID@ from the : index : : takes a lot more - the time increases by the number of hits. This is : : because when retrieving a field value from a document hit, the Lucene : : engine loads all the fields from the index (the entire document). So : if : : in one search I get 300.000 hits cont, I have to iterate through all : and : : retrieve the @ID@ field value - this takes a lot of time. : : : : Regards, : : Mile Rosu : : : : --------------------------------------------------------------------- : : To unsubscribe, e-mail: [EMAIL PROTECTED] : : For additional commands, e-mail: [EMAIL PROTECTED] : : : : : : -Hoss : : : --------------------------------------------------------------------- : To unsubscribe, e-mail: [EMAIL PROTECTED] : For additional commands, e-mail: [EMAIL PROTECTED] : : : --------------------------------------------------------------------- : To unsubscribe, e-mail: [EMAIL PROTECTED] : For additional commands, e-mail: [EMAIL PROTECTED] : -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]