Greetings all. I have read many posts concerning similar use cases, but I am still a little hazy on the best way to achieve what I need to do. Here is the background:
2 million documents with multiple sections, some sections contain structured data, some unstructured. We parse the docs and place the structured stuff in oracle where each section is a table and one master table to relate them all. We index the unstructured sections with lucene where each section is a document (meaning a total of about ~30 million documents) with extra fields including one for the primary key of the master table and then some meta fields to describe the section - type, date, etc. For a common use case, say we have a table called demographics with a number field that represents age (overly simplistic but gets the point across). So say we want all people over the age of 50 who may have visited Panama: -- We have our lucene index and we want to search the section text for the word "panama" AND We want to select from the demographics table where age > 50. -- Now I need to intersect the master table IDs from my lucene hits and my table results. I have a java stored procedure that runs the lucene query and creates a temporary table with a single column where I insert the master id from the hits of my lucene query. I then can do a join with my structured query results. The problem here is obviously the speed of iterating through the hits to extract the single field that I need. Notes: - I must be able to get a full set of results, though I only need the one id field - We originally went with Oracle text which was simple, but limited and quite slow for most queries I have read a little about the hitcollector class and the fieldselector api, but I am still not sure how they may help me or even if they can. I have also tooled around with the idea of using termdocs, but the queries may get a little complex with various ors/ands/nots, though probably not spans and so forth. Any suggestions will be greatly apreciated. Thanks, J -- View this message in context: http://www.nabble.com/retrieve-all-docs-efficiently---just-one-field-tp17766268p17766268.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]