On Aug 15, 2007, at 10:46 AM, Peter Keegan wrote:

Grant,

I built an index as described here:
http://www.nabble.com/SpanQuery-and-database-join-tf4262902.html

Many documents have only 1 or 2 rows, some have dozens.
Here is a typical query without spans:

+((+contents:quaker +contents:cereal) (+boost50:quaker +boost50:cereal))
+literals:co$us), sort=<custom:"feedbabe":
[EMAIL PROTECTED]>,"dateactiveR"!


Here is a typical query with spans:

+spanNear([adliterals:jb$1, adliterals:co$us], 8, false)
+(+((+contents:quaker +contents:cereal) (+boost50:quaker +boost50:cereal))
+literals:co$us), sort=<custom:"feedbabe":
[EMAIL PROTECTED]>,"dateactiveR"!

The addition of the spanNear clause caused a 10X decrease in throughput. I could probably change the way rows are indexed and use ordered terms, which
seems to be a bit faster (only a 5X decrease).

In looking at the code, it makes sense that an ordered SpanNearQuery would be faster.

I am still trying to dig into the logistics of the unordered SpanNearQuery, as it is the only thing holding me up on adding payload access to Spans. I need to step through it in a debugger. As your stack trace showed, a lot of work goes into managing the priority queue that is created. I just don't understand the relation between the SpanCells, the "ordered" List and the PriorityQueue "queue" yet. It seems the SpanCells make a linked list, the "ordered" list is for getting the spans from the sub queries, and the queue seems to rearrange the ordered list.

If anyone wants to chip in with pseudocode explaining what is going on in NearSpansUnordered.java it would be helpful.
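
As a starting point, here is a rough sketch of the general technique an unordered near match can be built on: one cursor per term, a priority queue keyed on the current position, and a running maximum. This is not the code in NearSpansUnordered.java. The names UnorderedNearSketch, Cursor and findMatches are hypothetical, it handles a single document with single-term clauses only, and it assumes slop is measured as the window width minus the number of terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class UnorderedNearSketch {

    /** Hypothetical cursor over the sorted positions of one term in one document. */
    static final class Cursor {
        final int[] positions;
        int index = 0;

        Cursor(int[] positions) { this.positions = positions; }

        int current() { return positions[index]; }

        /** Moves to the next position; returns false when the term is exhausted. */
        boolean advance() { return ++index < positions.length; }
    }

    /**
     * Returns [start, end) windows that contain every term, in any order.
     * Assumption: a window matches when (end - start - numberOfTerms) <= slop.
     */
    public static List<int[]> findMatches(int[][] termPositions, int slop) {
        List<int[]> matches = new ArrayList<>();
        // Min-heap: the cursor with the smallest current position is on top.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        int maxPos = Integer.MIN_VALUE;

        for (int[] positions : termPositions) {
            if (positions.length == 0) {
                return matches; // a missing term means no match is possible
            }
            Cursor cursor = new Cursor(positions);
            queue.add(cursor);
            maxPos = Math.max(maxPos, cursor.current());
        }

        while (true) {
            // The heap top is the window start; the tracked maximum is its end.
            Cursor min = queue.poll();
            int start = min.current();
            int end = maxPos + 1;
            if (end - start - termPositions.length <= slop) {
                matches.add(new int[] {start, end});
            }
            // Only advancing the smallest cursor can shrink the window.
            if (!min.advance()) {
                return matches;
            }
            maxPos = Math.max(maxPos, min.current());
            queue.add(min); // re-insert so the heap re-orders on the new position
        }
    }
}
```

For example, findMatches(new int[][] {{2, 20}, {5}}, 8) returns the single window [2, 6). The cost the stack trace points at would correspond to the poll and add on every step, since each advance of the smallest cursor forces the heap to re-order.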