David,

This is what I thought you were trying to do:

declare function local:new-text($query){
  let $query-text := cts:word-query-text($query)
  let $link := fn:concat("default.xqy?search=",$query-text)
  let $count := xdmp:estimate(cts:search(collection(),$query))
  return
    <a href="{$link}">{$query-text} [{$count}]</a>
};
let $text := <doc>The anterior glandular lobe of the pituitary gland, also 
known as the adenohypophysis. It secretes the ADENOHYPOPHYSEAL HORMONES that 
regulate vital functions such as GROWTH; METABOLISM; and REPRODUCTION.</doc>
let $q := cts:or-query(("ADENOHYPOPHYSEAL HORMONES","GROWTH","METABOLISM",
"REPRODUCTION"))
return
cts:highlight($text,$q,local:new-text($cts:queries))

--> 
<doc>The anterior glandular lobe of the pituitary gland, also known as the 
adenohypophysis. It secretes the <a href="default.xqy?search=ADENOHYPOPHYSEAL 
HORMONES">ADENOHYPOPHYSEAL HORMONES [0]</a> that regulate vital functions such 
as <a href="default.xqy?search=GROWTH">GROWTH [0]</a>; <a 
href="default.xqy?search=METABOLISM">METABOLISM [0]</a>; and <a 
href="default.xqy?search=REPRODUCTION">REPRODUCTION [0]</a>.</doc>

(I don't have any documents in my database that match those terms)

I think the key piece you were missing is the $cts:queries variable, which 
tells you which query matched the text. From the query that is returned you can 
use other functions to get the text from the query, etc.

In the example above counts are generated for each word, which requires an 
additional search for each word. If you wanted to link to the first matching 
document for each word, I would simply pass the appropriate text to a new 
request and run the query when the user clicks on the word rather than running 
all the searches and providing direct links to the document. That way, you 
could build the page in one search instead of potentially many searches.
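
As a rough sketch of that approach (not code from this thread), the page the links point to could read the term from the request and run the search only when a link is clicked. It assumes the target page is the default.xqy used in the links above and that the request field is named "search"; the ten-result page size and the ul/li markup are arbitrary choices for illustration.

```xquery
xquery version "1.0-ml";
(: default.xqy (sketch): runs only when the user clicks a highlighted term :)
let $term := xdmp:get-request-field("search")
return
  <ul>{
    (: first page of matches for the clicked term; the page size is an assumption :)
    for $doc in cts:search(fn:collection(), cts:word-query($term))[1 to 10]
    return <li>{xdmp:node-uri($doc)}</li>
  }</ul>
```

When building the link, passing the term through xdmp:url-encode would probably be wise, so that multi-word terms such as ADENOHYPOPHYSEAL HORMONES survive in the URL.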

Kelly

Message: 1
Date: Wed, 18 Nov 2009 14:38:38 -0800
From: "Lee, David" <d...@epocrates.com>
Subject: RE: [MarkLogic Dev General] FreeText searching - Take Two
To: "General Mark Logic Developer Discussion"
        <general@developer.marklogic.com>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a713f37...@postoffice>
Content-Type: text/plain; charset="us-ascii"

Thanks, but I think I'm missing something really subtle (or really
obvious!).
I don't want to highlight the results of the search ... I want to highlight
the results of a *previous* search given a new search.

I think this is hard to explain.

 

Let me try again.

Suppose a *previous search* got me a document with an element with this
string which I am displaying

 

 

"The anterior glandular lobe of the pituitary gland, also known as the
adenohypophysis. It secretes the ADENOHYPOPHYSEAL HORMONES that regulate
vital functions such as GROWTH; METABOLISM; and REPRODUCTION."

 

Now I want to take terms within this string, such as "GROWTH", find matches
for each across the entire database, then link "GROWTH" to the result of
that search,

but "METABOLISM" to the results of the matches for "METABOLISM".

 

So suppose I do a cts:search on  "GROWTH" , "ADENOHYPOPHYSEAL HORMONES"
, "METABOLISM" 

and get a bunch of results to many different documents

 

The result I want would ultimately be

 

"The anterior glandular lobe of the pituitary gland, also known as the
adenohypophysis. It secretes the
<A href="get.xquery?doc3">ADENOHYPOPHYSEAL HORMONES</A> that regulate
vital functions such as <A href="get.xquery?doc1">GROWTH</A>; <A
href="get.xquery?doc1">METABOLISM</A>; and REPRODUCTION."

 

I don't want to highlight the results of the search; I want to highlight
the original text and insert, say, the document URI of the
corresponding hit for each keyword.

 

 

So if I did as suggested

 

 

   let $paragraph := "... the above paragraph"
   let $query := (list of upper case words in $paragraph)
   for $res in cts:search(collection(), $query)
   return
     cts:highlight($paragraph, $query, <A href="?????">{$cts:text}</A>)

 

What's the missing ?????

Where do I find which result corresponded to which word in the original
search?

I can't find any API that, given a search result, points to the part of
the query it matched.

The only way I can figure to do this is to do a new cts:search() for
each of the terms that I want to match, one by one.
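
A minimal sketch of one way the ????? might be filled in, not a confirmed solution. It relies on the $cts:queries variable that cts:highlight binds for each match and looks up the top-scoring hit for that matched sub-query. It assumes the terms are passed as word queries and that get.xquery accepts a "uri" parameter (a hypothetical name; the links above use a different, unspecified form).

```xquery
xquery version "1.0-ml";
let $paragraph := <p>It secretes the ADENOHYPOPHYSEAL HORMONES that regulate
vital functions such as GROWTH; METABOLISM; and REPRODUCTION.</p>
let $terms := ("ADENOHYPOPHYSEAL HORMONES", "GROWTH", "METABOLISM")
let $query := cts:or-query(for $t in $terms return cts:word-query($t))
return
  cts:highlight($paragraph, $query,
    (: $cts:queries holds the sub-query that matched this piece of text :)
    let $hit := cts:search(fn:collection(), $cts:queries[1])[1]
    return
      if (fn:exists($hit))
      then <A href="get.xquery?uri={xdmp:url-encode(xdmp:node-uri($hit))}">{$cts:text}</A>
      else $cts:text  (: no document matched: leave the text unlinked :)
  )
```

This still runs one search per highlighted match, so it does not avoid the per-term cost; it only shows where the matched term becomes available.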


From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Danny
Sokolsky
Sent: Wednesday, November 18, 2009 4:46 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] FreeText searching - Take Two

 

And when you do the cts:highlight, you typically do it on a subset of
the results (the first page, for example) so that you do not need to get
every result from your search, which as you said, would be expensive.
So the pseudo-code is something like:

 

let $query := "hello"

for $res in cts:search(collection(), $query)[1 to 10]

return

cts:highlight($res, $query, <b>{$cts:text}</b>)

 

The Search API (search:search, for example) does a lot of this for you,
so you might want to look into that for a more complete solution.
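
For reference, a minimal sketch of calling the Search API, assuming the standard module location of a default MarkLogic install; the query string "hello" is the one from the pseudo-code above.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: returns a search:response element with paged, snippeted results :)
search:search("hello")
```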

 

-Danny

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Mike
Sokolov
Sent: Wednesday, November 18, 2009 1:30 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] FreeText searching - Take Two

 

If I understood you right, what you want is:

let $query := cts:word-query (("ADENOHYPOPHYSEAL HORMONES", "GROWTH",
...))
for $result in cts:search (collection(), $query)
return cts:highlight ($result, $query, <match>{$cts:text}</match>)

and then arrange for <match> nodes to be highlighted in your output?

You'd get a list of (whatever you searched for - documents?) with the
highlighting applied...

I'm confused because you seem to have already written that almost, so
maybe I'm missing something?

maybe what you're looking for is more consolidated output like:

return <result uri="{base-uri($result)}">{cts:highlight ( ... ) //
hit}</result> ?

-Mike 

Lee, David wrote: 

This is a refinement on the question I asked the other day.   I'm
getting better at formulating my questions so maybe the advice might be
closer :)

 

Suppose I do a query and get an XML document which has a field that has
text that looks like this:

 

"The anterior glandular lobe of the pituitary gland, also known as the
adenohypophysis. It secretes the ADENOHYPOPHYSEAL HORMONES that regulate
vital functions such as GROWTH; METABOLISM; and REPRODUCTION."

 

Those things in all upper case are likely terms that exist in other
documents.

What I'd like to do is to do a search for each of those terms across the
entire DB, and if found, create links to either the highest scored find,
or to a results page (either will do).

 

I've poked around and found many little things that are part of a
possible solution, but nothing that does exactly what I want.

 

For example: cts:highlight() could be used to add the links to the words.

and I could find a consolidated result set by using 

cts:search( ... , cts:word-query ( fn:tokenize(phrase) ) )   

 

to find all matches etc.    I can easily get all the upper case words and
create a search.

 

But my problem is this.  Suppose I create a search on all the upper case
words ("ADENOHYPOPHYSEAL" , "GROWTH" , "METABOLISM") 

and get a result back with cts:search()

 

How can I match up which nodes of the result matched which word so I can
highlight them?

e.g if I did

                for $result in  cts:search( ..... all the words )

              // Which word did $result match ??? 

 

 

The only thing I can think of is that I would have to loop
through the terms and do a search one by one.

 

    for $word in   ( big oh list of search words )

       for $result in cts:search( ... , $word )

                cts:highlight( $phrase , $word , { link the word } )

 

My guess is that this will perform horribly.    I'd rather do a single
consolidated search and then do some magic like

      

       for $result in cts:search( ... , all the words )

              cts:highlight( $phrase , the word that matched $result )

 

 

Does this make any sense ? Is there an API or design pattern to do this
? Or should I do the outer loop instead ?

 

I looked at cts:walk but it looked like using it for this would
still involve looping on the cts:search() for each term matched.

 

 

Thanks for any advice.


----------------------------------------

David A. Lee

Senior Principal Software Engineer

Epocrates, Inc.

d...@epocrates.com

812-482-5224

_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general
  

End of General Digest, Vol 65, Issue 47
***************************************