RE: [MarkLogic Dev General] Frequency and Search Results in 3.2

Ian Small Fri, 25 May 2007 21:51:58 -0700

Bilal -
 
Thanks for the excellent explanation of what you're trying to
accomplish.  It is quite believable that your search may return more
results than the frequency analysis.  Here are some things to consider:
 
1) Depending on the collation used, element-values() will return
results that are case and diacritic distinct.  In this case, I'm
expecting that the collation is either the codepoint or root collation.
In either case, element-values() will, for instance, return "water" as
distinct from "Water".
 
2) If element-values() returns a lower-case, diacritic-less result as
the most frequently occurring result, and then you pass this as an
argument to element-query(), element-word-query() or
element-value-query(), then the default search rules will cause that to
match on a case-insensitive, diacritic-insensitive basis.  This could be
a cause of more results being returned.  You can override this by
providing options to the query constructor that force a case-sensitive,
diacritic-sensitive, etc. basis.  There is a code pattern illustrated in
the developer guide (chapter on lexicons) that shows you how to do this
correctly.  
 
3) element-query() does not perform a QName to value match.  It matches
the query you pass in anywhere within the named element (the QName and
its descendants).  So there are a couple of things here that may be
going on:
a) if $kwd has descendants, you may get unexpected matches.  however, if
$kwd has descendants, the lexicon itself could be a bit wacky.  so let's
assume that's not the problem.
b) if $kwd has multiple words in it (let's say "water works"), then it
would match an argument of "water" or "works".  this could be a cause of
more results being returned.  i suspect you should be using
element-value-query(), not element-query()
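
A minimal sketch of points 2 and 3 combined, reusing $q and the kwd
lexicon from the query quoted further down.  It assumes kwd is in no
namespace, and the option names are the ones the cts: query constructors
are expected to accept; check them against the 3.2 documentation.  It
has not been run against this data.

```xquery
(: Sketch only: assumes a value lexicon on kwd, with kwd in no namespace. :)
let $q := cts:word-query("water")
let $elem := cts:element-values(xs:QName("kwd"), "",
  ("frequency-order", "fragment-frequency"), $q)[1]
let $freq := cts:frequency($elem)
(: Match the whole element value exactly as the lexicon reported it,
   rather than any word inside kwd or its descendants. :)
let $exact := cts:element-value-query(xs:QName("kwd"), $elem,
  ("case-sensitive", "diacritic-sensitive", "punctuation-sensitive"))
let $cnt := xdmp:estimate(cts:search(doc(), cts:and-query(($q, $exact))))
return concat($elem, " (freq: ", $freq, ") (search: ", $cnt, ")")
```

If the two numbers still differ with this form, the cause is probably
something other than case, diacritics, or word-level matching.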
 
My suggestion is that you should actually do the cts:search(), and in a
FLWOR over all the matching docs, return only those which do not exactly
match your expected $elem.  By looking at the pattern of the
"non-matching" results, you'll pretty rapidly get an idea of which of
the above (or some other case I didn't think of off the top of my head)
is happening to you, and your course of action should be pretty clear.
If it's not, then post again, telling us the value of $elem, and giving
us a sample of the <kwd> elements that are being returned by the search
that you don't think should be matching.
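
One way the diagnostic described above might look, as a sketch: it runs
the original search and returns the <kwd> elements of every matching
document that has no kwd exactly equal to $elem.  It assumes kwd is in
no namespace and that the comparison uses a case-sensitive default
collation; both are assumptions, not facts about this database.

```xquery
(: Sketch only: list the kwd values of documents that matched the
   search but contain no kwd exactly equal to $elem. :)
let $q := cts:word-query("water")
let $elem := cts:element-values(xs:QName("kwd"), "",
  ("frequency-order", "fragment-frequency"), $q)[1]
for $doc in cts:search(doc(), cts:and-query(($q,
  cts:element-query(xs:QName("kwd"), $elem))))
(: The general comparison is true if any kwd in the document equals $elem. :)
where not($doc//kwd = $elem)
return $doc//kwd
```

The values that come back show which kind of near-match the search is
picking up, for example "Water" versus "water", or a longer keyword
that merely contains the words in $elem.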
 
Cheers
ian
 
 
________________________________


From: [EMAIL PROTECTED] On Behalf Of Bilal Khalid
Sent: Thursday, May 24, 2007 10:43 AM
To: [email protected]
Subject: Re: [MarkLogic Dev General] Frequency and Search Results in 3.2


Ian,

Thanks for pointing out the oddness in the query I pasted. I've
attempted to modify the code to fix that as well as another glaring
oversight, and hopefully with a little annotation my question will make
more sense. Please do correct any misconceptions I have at any stage. 


(: Say we want to retrieve the most frequently used keyword tied to
the word "water" from our database. I would do it thusly, using the
value lexicon on the element "kwd" that we've created beforehand. :) 

let $q := cts:word-query("water")
let $elem := cts:element-values(xs:QName("kwd"), "",
("frequency-order","fragment-frequency"), $q)[1]
let $freq := cts:frequency($elem) 

(: Now, I would like to see all documents in my database that use this
retrieved keyword ($elem). My understanding is that the search I perform
should return the same number of results as is indicated by the
frequency of the keyword $elem. I've attempted this by using a
cts:and-query of my original word-query and a cts:element-query :) 

let $cnt := xdmp:estimate(cts:search(doc(), cts:and-query( ($q,
cts:element-query(xs:QName("kwd"), $elem)) ) ))

(: When I compare the two numerical results however, I find that they
aren't the same. The search results tend to be greater than the
frequency count :) 

return concat ($elem, " (freq: ", $freq, ") (search: ", $cnt, ")")


I think my misunderstanding lies in the query I'm using to perform the
search, so it would be great if I could be pointed in the right
direction with regards to that. 

Thanks,

-Bilal




        From: Ian Small <[EMAIL PROTECTED]>
        Date: May 23, 2007 5:47 PM
        Subject: RE: [MarkLogic Dev General] Frequency and Search Results in 3.2
        To: General Mark Logic Developer Discussion <[email protected]>
        
        
        Bilal -
         
        It's not clear what you're trying to accomplish from the code
fragment below, since your use of cts:element-query() on QName kwd as a
filter to the value lexicon on element QName kwd is puzzling to me.
         
        If you'd like to explain what your objective is, we might be
able to suggest a simpler code fragment, and then address why things do
or don't match.  At a glance, the code fragment you've provided is (sort
of) doing two different things between the element-values()/frequency
call and the xdmp:estimate() call.  But I'd rather start from your
objective than try to unwind this code sample.
         
        Cheers
        ian

________________________________

        From: [EMAIL PROTECTED] On Behalf Of Bilal Khalid
        Sent: Wednesday, May 23, 2007 2:40 PM
        To: [email protected]
        Subject: [MarkLogic Dev General] Frequency and Search Results in 3.2
        
        
        
        Hello All,
        
        I've been trying to resolve the frequency count (retrieved
through a function like cts:element-values) into actual search results
in 3.2, but the number of results does not match the frequency count.
The example below shows the query I've been trying out in CQ. 
        
        let $elem := cts:element-values(xs:QName("kwd"), "",
("frequency-order","fragment-frequency"),
cts:element-query(xs:QName("kwd"), "water"))[1]
        let $freq := cts:frequency($elem) 
        let $cnt := xdmp:estimate(cts:search(doc(),
cts:element-query(xs:QName("kwd"), $elem)))
        return concat ($elem, " (freq: ", $freq, ") (search: ", $cnt,
")")
        
        Please note that I use cts:element-query in both lexicon and
search queries. The only difference is that the search parameter in the
second query is changed to the first item of the results from the
lexicon. 
        
        -Bilal 

        _______________________________________________
        General mailing list
        [email protected] 
        http://xqzone.com/mailman/listinfo/general
