Bilal -

Thanks for the excellent explanation of what you're trying to accomplish. It is quite believable that your search may return more results than the frequency analysis. Here are some things to consider:

1) Depending on the collation used, element-values() will return results that are case- and diacritic-distinct. In this case, I'm expecting that the collation is either the codepoint or the root collation. In either case, element-values() will, for instance, return "water" as distinct from "Water".

2) If element-values() returns a lower-case, diacritic-free result as the most frequently occurring value, and you then pass it as an argument to element-query(), element-word-query() or element-value-query(), the default search rules will match it on a case-insensitive, diacritic-insensitive basis. This could be one cause of more results being returned. You can override this by passing options to the query constructor that force case-sensitive, diacritic-sensitive, etc. matching. There is a code pattern in the developer guide (chapter on lexicons) that shows how to do this correctly.

3) element-query() does not perform a QName-to-value match. It performs a "contains" match within the named element (the QName and its descendants). So there are a couple of things that may be going on here:

a) If <kwd> has descendants, you may get unexpected matches. However, if <kwd> has descendants, the lexicon itself could be a bit wacky, so let's assume that's not the problem.

b) If a <kwd> element contains multiple words (let's say "water works"), it will match an argument of either "water" or "works". This could be a cause of more results being returned. I suspect you should be using element-value-query(), not element-query().

My suggestion is that you actually run the cts:search(), and in a FLWOR over all the matching docs, return only those which do not exactly match your expected $elem.
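A sketch of points 2 and 3(b) plus the FLWOR check, applied to Bilal's revised query (quoted further down in this thread). It replaces cts:element-query() with cts:element-value-query() using the "case-sensitive" and "diacritic-sensitive" options, then lists the <kwd> elements from documents that the original, looser search matched but where no <kwd> equals $elem exactly. It assumes a string range index on the no-namespace element kwd, as Bilal's code does; the variable names $exact, $loose and $suspects are made up for illustration, and it has not been run against 3.2, so option support there may differ.

```xquery
(: Sketch only: assumes a string range index on the no-namespace
   element kwd, as in Bilal's query. Not tested against 3.2. :)
let $q := cts:word-query("water")
let $elem := cts:element-values(xs:QName("kwd"), "",
               ("frequency-order", "fragment-frequency"), $q)[1]
let $freq := cts:frequency($elem)
(: Points 2 and 3(b): match the whole kwd value rather than words
   inside it, and force case- and diacritic-sensitive matching. :)
let $exact := cts:element-value-query(xs:QName("kwd"), string($elem),
                ("case-sensitive", "diacritic-sensitive"))
let $cnt := xdmp:estimate(cts:search(doc(), cts:and-query(($q, $exact))))
(: Diagnostic: documents matched by the original element-query()
   in which no kwd element is codepoint-for-codepoint equal to $elem.
   Comparing codepoints avoids depending on the default collation. :)
let $loose := cts:and-query(($q, cts:element-query(xs:QName("kwd"), $elem)))
let $suspects :=
  for $d in cts:search(doc(), $loose)
  where not(some $k in $d//kwd satisfies
              deep-equal(string-to-codepoints(string($k)),
                         string-to-codepoints(string($elem))))
  return $d//kwd
return (
  concat($elem, " (freq: ", $freq, ") (search: ", $cnt, ")"),
  (: the first 20 kwd elements from the unexpected matches :)
  subsequence($suspects, 1, 20)
)
```

The <kwd> values in the second part of the result are the "non-matching" ones; their pattern (case variants, multi-word values, nested elements) should point to which of the causes above applies.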
By looking at the pattern of the "non-matching" results, you'll pretty rapidly get an idea of which of the above (or some other case I didn't think of off the top of my head) is happening to you, and your course of action should be pretty clear. If it's not, then post again, telling us the value of $elem and giving us a sample of the <kwd> elements returned by the search that you don't think should be matching.

Cheers
ian
________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bilal Khalid
Sent: Thursday, May 24, 2007 10:43 AM
To: [email protected]
Subject: Re: [MarkLogic Dev General] Frequency and Search Results in 3.2

Ian,

Thanks for pointing out the oddness in the query I pasted. I've attempted to modify the code to fix that as well as another glaring oversight, and hopefully with a little annotation my question will make more sense. Please do correct any misconceptions I have at any stage.

(: Say we want to retrieve the single most frequently used keyword tied to
   the word "water" from our database. I would do it thusly, using the value
   lexicon on the element "kwd" that we've created beforehand. :)
let $q := cts:word-query("water")
let $elem := cts:element-values(xs:QName("kwd"), "",
               ("frequency-order", "fragment-frequency"), $q)[1]
let $freq := cts:frequency($elem)
(: Now, I would like to see all documents in my database that use this
   retrieved keyword ($elem). My understanding is that the search I perform
   should return the same number of results as is indicated by the frequency
   of the keyword $elem. I've attempted this by using a cts:and-query of my
   original word-query and a cts:element-query. :)
let $cnt := xdmp:estimate(
              cts:search(doc(),
                cts:and-query((
                  $q,
                  cts:element-query(xs:QName("kwd"), $elem)
                ))))
(: When I compare the two numerical results, however, I find that they
   aren't the same. The search results tend to be greater than the
   frequency count. :)
return concat($elem, " (freq: ", $freq, ") (search: ", $cnt, ")")

I think my misunderstanding lies in the query I'm using to perform the search, so it would be great if I could be pointed in the right direction with regard to that.
Thanks,
-Bilal

From: Ian Small <[EMAIL PROTECTED]>
Date: May 23, 2007 5:47 PM
Subject: RE: [MarkLogic Dev General] Frequency and Search Results in 3.2
To: General Mark Logic Developer Discussion <[email protected]>

Bilal -

It's not clear what you're trying to accomplish from the code fragment below, since your use of cts:element-query() on QName kwd as a filter to the value lexicon on element QName kwd is puzzling to me. If you'd like to explain what your objective is, we might be able to suggest a simpler code fragment, and then address why things do or don't match.

At a glance, the code fragment you've provided is (sort of) doing two different things between the element-values()/frequency call and the xdmp:estimate() call. But I'd rather start from your objective than try to unwind this code sample.

Cheers
ian
________________________________

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bilal Khalid
Sent: Wednesday, May 23, 2007 2:40 PM
To: [email protected]
Subject: [MarkLogic Dev General] Frequency and Search Results in 3.2

Hello All,

I've been trying to resolve the frequency count (retrieved through a function like cts:element-values) into actual search results in 3.2, but the number of results does not match the frequency count. The example below shows the query I've been trying out in CQ.

let $elem := cts:element-values(xs:QName("kwd"), "",
               ("frequency-order", "fragment-frequency"),
               cts:element-query(xs:QName("kwd"), "water"))[1]
let $freq := cts:frequency($elem)
let $cnt := xdmp:estimate(cts:search(doc(), cts:element-query(xs:QName("kwd"), $elem)))
return concat($elem, " (freq: ", $freq, ") (search: ", $cnt, ")")

Please note that I use cts:element-query in both the lexicon and search queries. The only difference is that the search parameter in the second query is changed to the first item of the results from the lexicon.
-Bilal

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
