David Johnson wrote:
I think I was again focusing on range queries and giving Lucene some way of
filtering out subsets of the document set, so that the whole document set
wouldn't have to be walked.  For the date range query the from and to dates
would most likely share some set of most significant bytes - these bytes
could just be passed to Lucene as a direct match, thereby reducing the subset
of the collection that would be walked.  If the range query is fixed, this
"optimization" would be unnecessary.  Nevertheless, I still wonder if there
is additional information that could be stored in Lucene to augment the
index and improve query processing.
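The shared most-significant-bytes idea can be sketched in plain Java (this is an illustration, not Jackrabbit code): if dates are indexed as yyyyMMdd strings, the two bounds of a range often agree on a leading prefix, and that prefix could be handed to Lucene as a direct (prefix) match so only documents under it need the full range check.

```java
public class RangePrefix {

    // Returns the longest common prefix of the two range bounds.
    // For "20070701".."20070731" that is "200707", i.e. one month.
    static String commonPrefix(String from, String to) {
        int i = 0;
        int max = Math.min(from.length(), to.length());
        while (i < max && from.charAt(i) == to.charAt(i)) {
            i++;
        }
        return from.substring(0, i);
    }

    public static void main(String[] args) {
        // yyyyMMdd strings sort lexicographically, so the shared
        // prefix bounds the candidate subset for the range walk.
        System.out.println(commonPrefix("20070701", "20070731")); // 200707
    }
}
```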

Ah, now I see. Yes, that might help in some cases. For example, you could say: get me all documents with a year value of 2007 and a month value of 7, which would be equivalent to a range query from 2007-07-01 to 2007-07-31.
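The equivalence between a (year, month) value match and the corresponding date range can be made concrete with java.time (a sketch, not anything from the Jackrabbit codebase):

```java
import java.time.LocalDate;
import java.time.YearMonth;

public class YearMonthRange {

    // Maps a (year, month) value match to the equivalent date range:
    // the first and last day of that month.
    static LocalDate[] toRange(int year, int month) {
        YearMonth ym = YearMonth.of(year, month);
        return new LocalDate[] { ym.atDay(1), ym.atEndOfMonth() };
    }

    public static void main(String[] args) {
        LocalDate[] r = toRange(2007, 7);
        // year=2007, month=7 covers exactly 2007-07-01 .. 2007-07-31
        System.out.println(r[0] + " .. " + r[1]);
    }
}
```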

In this case I was considering using the node UUID as the cross-index join
parameter.  Still, there is the problem of combining the results from two
different indexes.

there are two issues with this approach:
1) getting the UUID requires lucene to load the document
2) implementing an *efficient* join across system boundaries is not easy, even if the documents are sorted.
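To illustrate point 2: even in the best case, where both indexes can deliver their UUIDs in sorted order, the join is a merge over the two streams. This is a minimal single-process sketch (hypothetical names, not Jackrabbit code); across system boundaries each advance of a cursor may cost a round trip, which is what makes an efficient distributed join hard.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoin {

    // Merge join over two sorted UUID lists: linear in the combined
    // size, advancing whichever side holds the smaller key.
    static List<String> join(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                out.add(left.get(i)); // UUID present in both indexes
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("a", "b", "d"), List.of("b", "c", "d")));
    }
}
```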

3) Use the database to provide the indexing structures.

To me this seems to be a very interesting option, though it requires
considerable effort.

Yes, I agree, this is an interesting option, and does seem that it would
take a fair amount of effort.  Your comments on the user list in this same
thread seem like a start to the thought process needed.  I am not very
familiar with the details of the PM, although I do think that bringing
together data storage and indexing will help with improving query processing
speed, as well as help with some data integrity issues that have been
discussed in other threads.

Over the weekend, I will see if I can come up with a solution to the range
query issue discussed above.

great.

regards
 marcel
