On Aug 15, 2007, at 10:46 AM, Peter Keegan wrote:
Grant,
I built an index as described here:
http://www.nabble.com/SpanQuery-and-database-join-tf4262902.html
Many documents have only 1 or 2 rows, some have dozens.
Here is a typical query without spans:
+((+contents:quaker +contents:cereal) (+boost50:quaker +boost50:cereal))
+literals:co$us), sort=<custom:"feedbabe": [EMAIL PROTECTED]>,"dateactiveR"!
Here is a typical query with spans:
+spanNear([adliterals:jb$1, adliterals:co$us], 8, false)
+(+((+contents:quaker +contents:cereal) (+boost50:quaker +boost50:cereal))
+literals:co$us), sort=<custom:"feedbabe": [EMAIL PROTECTED]>,"dateactiveR"!
The addition of the spanNear clause caused a 10X decrease in
throughput. I could probably change the way rows are indexed and use
ordered terms, which seems to be a bit faster (only a 5X decrease).
In looking at the code, it makes sense that an ordered SpanNearQuery
would be faster.
I am still trying to dig into the logistics of the Unordered
SpanNearQuery, as it is the only thing hanging me up on adding
payload access to Spans. I need to step through and debug. As your
stack trace showed, there is a lot of work taking place to manage the
priority queue that is created. I just don't understand the relation
between the SpanCells, the "ordered" List and the PriorityQueue
"queue" just yet. It seems the SpanCells make a linked list, the
"ordered" list is for getting the spans from the sub queries, and the
queue seems to rearrange the ordered list.
If anyone wants to chip in with pseudocode explaining what is going
on in NearSpansUnordered.java it would be helpful.
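
As a starting point, here is a rough sketch of how I currently picture
the unordered matching loop. It is not the actual NearSpansUnordered
code: the class and method names (UnorderedNearSketch, findMatches) are
made up for illustration, it only models a single document, and it
assumes every clause is a plain term whose spans have length 1. It also
leaves out the SpanCell linked list and the "ordered" list entirely.
The idea is that a priority queue keeps the clause with the smallest
start position on top, a running maximum tracks the furthest end, and a
match is reported whenever the gap between the two (minus the combined
span length) fits within the slop.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of unordered near-span matching.
// Not Lucene code; all names here are assumptions for illustration.
final class UnorderedNearSketch {

    // positions[c] holds the sorted term positions of clause c in one document.
    // Each occurrence is treated as a span [p, p + 1).
    // Returns the candidate matches as {start, end} pairs.
    static int[][] findMatches(int[][] positions, int slop) {
        int n = positions.length;
        if (n == 0) {
            return new int[0][];
        }
        int[] cursor = new int[n]; // current occurrence index per clause

        // The queue orders clauses by the start of their current occurrence.
        PriorityQueue<Integer> queue = new PriorityQueue<>(
            Comparator.comparingInt((Integer c) -> positions[c][cursor[c]]));

        int maxEnd = Integer.MIN_VALUE; // furthest end among current occurrences
        for (int c = 0; c < n; c++) {
            if (positions[c].length == 0) {
                return new int[0][]; // a clause with no occurrences can never match
            }
            maxEnd = Math.max(maxEnd, positions[c][0] + 1);
            queue.add(c);
        }

        int totalLength = n; // sum of span lengths, each assumed to be 1
        List<int[]> matches = new ArrayList<>();
        while (true) {
            int min = queue.poll(); // clause with the smallest current start
            int minStart = positions[min][cursor[min]];

            // Slop check: positions between the outermost spans not covered by them.
            if (maxEnd - minStart - totalLength <= slop) {
                matches.add(new int[] {minStart, maxEnd});
            }

            // Advance the minimum clause; stop once any clause is exhausted.
            cursor[min]++;
            if (cursor[min] == positions[min].length) {
                break;
            }
            maxEnd = Math.max(maxEnd, positions[min][cursor[min]] + 1);
            queue.add(min); // re-insert so the queue reorders around the new start
        }
        return matches.toArray(new int[0][]);
    }
}
```

If this picture is roughly right, every step pays for a queue removal
and re-insertion, which would be consistent with the priority queue
work showing up in the stack trace. An ordered match could presumably
walk the clauses left to right without that bookkeeping.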