> Another question: having gotten a result from a reverse search at the full
> document level, is there a way to know *which* queries matched? If so then it
> would be easy enough to apply those queries to the relevant elements to do
> additional filtering (although I suppose that might get us back to the same
> place).

I'm a little confused. You're putting multiple serialized queries into each
document? If you have just one serialized query in a document it's going to
be obvious which query was the reverse match -- it was that one.

> In particular, if I have 125,000 reverse queries applied to a single document
> (assuming that total database volume doesn’t affect query speed in this case)
> on a modern fast server with appropriate indexes in place, how fast should I
> expect that query to take? 1ms?, 10ms?, 100ms? 1 second?

If you have 125,000 documents each with a serialized query in it and you do
a reverse query for one document against those serialized queries and
there's no hits, it should be extremely fast. More hits will slow things a
little bit because hits involve a little work. The IMLS paper explains what
the algorithm has to do. I suspect (but haven't measured) that it's a lot
like forward queries in that the timing depends a lot on number of matches.

> Our corpus has about 25 million elements that would be fragments per the
> advice above (about 1.5 million full documents).

If you have 25 million elements you want to run against 125,000 serialized
queries, wouldn't forward queries be faster? You'd only have to do 125,000
search calls instead of 25,000,000. :)

> I’ve never done much with fragments in MarkLogic so I’m not sure what the
> full implication of making these subelements into fragments would be for
> other processing.

Yeah, fragmentation is not to be done lightly.

-jh-
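
P.S. To make the two points above concrete, here is a toy sketch in plain Python. It is not MarkLogic code and does not use any MarkLogic API: the `StoredQuery` class, the URIs, and the "all words must appear" matching rule are made up for illustration, and the linear scan says nothing about how the real index-backed algorithm behaves. It only shows that with one stored query per document the hit list itself tells you which queries matched, and it works out the call-count comparison from the numbers in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Toy stand-in for a document holding ONE serialized query."""
    uri: str                    # hypothetical URI of the query document
    required_words: frozenset   # toy query: every word must appear in the text


def matches(query: StoredQuery, text: str) -> bool:
    """True if all of the query's words occur in the text."""
    return query.required_words <= set(text.lower().split())


def reverse_match(stored: list, text: str) -> list:
    """Return the URIs of the stored queries that match one document.

    With one query per stored document, each returned URI identifies
    exactly which query matched.
    """
    return [q.uri for q in stored if matches(q, text)]


stored = [
    StoredQuery("/queries/q1.xml", frozenset({"reverse", "query"})),
    StoredQuery("/queries/q2.xml", frozenset({"fragment"})),
    StoredQuery("/queries/q3.xml", frozenset({"forward", "search"})),
]
doc = "A reverse query finds stored queries that match this fragment"
hits = reverse_match(stored, doc)
print(hits)

# Call counts from the thread: one reverse query per element versus
# one forward search per stored query.
elements = 25_000_000
queries = 125_000
reverse_calls = elements
forward_calls = queries
ratio = reverse_calls // forward_calls
print(f"{reverse_calls:,} reverse calls vs {forward_calls:,} forward calls "
      f"({ratio}x fewer calls going forward)")
```

Fewer calls does not by itself mean less total time, since the cost of each call depends on how many matches it produces, so this is only the counting half of the argument.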
