Re: [MarkLogic Dev General] Exact match in search:search implementation

Erik Hennum Wed, 14 Jan 2015 05:25:45 -0800

Hi, Shruti:

MarkLogic calculates facets over indexes.  To calculate facets over the 
filtered result set, the server would have to read every document in the result 
set on every request, which would scale poorly.


The best approach is to tune the indexes so an unfiltered search closely 
approximates the desired result set and, if necessary, use filtering to weed 
out a few exception cases.
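
One rough way to judge how closely the unfiltered results approximate the 
filtered ones is to compare the index-based estimate with a filtered count. 
The sketch below is only a diagnostic and assumes the phrase query used 
earlier in this thread; the filtered count reads every candidate document, so 
it is not something to run on each request.

```xquery
xquery version "1.0-ml";

(: Sketch only: reuses the phrase from this thread as the sample query. :)
let $query := cts:word-query("Hello  world",
  ("case-sensitive", "whitespace-sensitive", "unstemmed"))
(: Index-only estimate; no documents are read. :)
let $estimated := xdmp:estimate(cts:search(fn:collection(), $query))
(: Filtered count; every candidate document is read and checked. :)
let $filtered := fn:count(cts:search(fn:collection(), $query, "filtered"))
return
  <comparison estimated="{$estimated}" filtered="{$filtered}"/>
```

If the two numbers are close, the indexes are already doing most of the work 
and facet counts should be close to the filtered results as well.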

If the result set is very small, you could extract the XML elements or JSON 
properties for the facets and do the facet counts in your own code.  For most 
adopters, however, the result set is much too large for that to be a 
possibility.
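
For the small-result-set case, a minimal sketch of counting facet values in 
your own code might look like the following. The element name "category" is 
hypothetical and stands in for whatever element holds the facet value; the 
query is the phrase from this thread. The counts are per document, and the 
approach reads every matching document, so it only suits small result sets.

```xquery
xquery version "1.0-ml";

(: Sketch only: "category" is a hypothetical element holding the facet value. :)
let $query := cts:word-query("Hello  world",
  ("case-sensitive", "whitespace-sensitive", "unstemmed"))
(: Filtered search: each candidate document is read and verified. :)
let $results := cts:search(fn:collection(), $query, "filtered")
for $value in fn:distinct-values($results//category/fn:string())
(: Count the matching documents that contain this value. :)
let $count := fn:count($results[.//category = $value])
order by $count descending
return <facet-value name="{$value}" count="{$count}"/>
```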


Erik Hennum

________________________________
From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of shruti kapoor 
[shrutikapoor....@gmail.com]
Sent: Tuesday, January 13, 2015 10:04 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Exact match in search:search implementation

Hi Erik,

I am calculating the total number of results by counting the search:result 
elements.  For the search facets, I use the count attribute of each 
search:facet-value, which is not in sync with the search results count.
What could I do to create facets from only the filtered search results?


Thanks
Shruti Kapoor

On Tue, Jan 13, 2015 at 8:44 PM, Erik Hennum <erik.hen...@marklogic.com> wrote:
Hi, Shruti:

In general, the total is an estimate based on the indexes, which will be exactly 
accurate only if the indexes are tuned so the search can run unfiltered. [1]

To generate an accurate count for a filtered search, the Search API would have 
to retrieve and count every document, which would not scale.

Can you expand on how you're using the total?  You may be able to solve the 
problem in other ways. For instance, you can terminate paging on the first page 
with fewer than page-length results.
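
A minimal sketch of that paging check, assuming a page length of 10 and a 
hypothetical $start value supplied by the application:

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>filtered</search-option>
  </options>
(: Hypothetical paging values; the application would supply these. :)
let $start := 1
let $page-length := 10
let $response := search:search('"Hello  world"', $options, $start, $page-length)
(: Count the results actually returned instead of trusting the total. :)
let $returned := fn:count($response/search:result)
return
  if ($returned lt $page-length)
  then "last page: do not offer a next-page link"
  else "offer a next-page link"
```

If the number of matches is an exact multiple of the page length, the last 
page is full, so the application would find out only when the following page 
comes back empty.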


Erik Hennum

[1]  The exception is that, starting in 8.0, the Search API calculates the 
total on the last page based on the number of documents paged.


________________________________
From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of shruti kapoor 
[shrutikapoor....@gmail.com]
Sent: Tuesday, January 13, 2015 2:00 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Exact match in search:search implementation

Hi

Thanks Peter & David. However I have one more problem.
When I apply <search-option>filtered</search-option>, the total search result 
count and the facet count do not match. How can I return the facet count on the 
basis of the filtered search results?



Thanks
Shruti Kapoor

On Tue, Jan 6, 2015 at 3:48 PM, David Ennis <david.en...@hinttech.com> wrote:
Hi.

Peter, you are correct about what was missing in the search, and you are right 
to be confused about the search:qtext.

Brace yourself: this is another Query Console gotcha.  The 'auto' output format 
renders the result as HTML, which collapses the extra whitespace.  When viewed 
in raw format, your search:report reflects the double space and verifies your 
solution.

Your example, expanded to show the output of search:parse as well, yields the 
following (expected) whitespace when viewed in raw format:

xquery version "1.0-ml";
declare namespace host = "http://marklogic.com/xdmp/status/host";
import module namespace search = "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

let $options := <options xmlns="http://marklogic.com/appservices/search">
      <term>
                <term-option>case-sensitive</term-option>
                <term-option>diacritic-sensitive</term-option>
                <term-option>punctuation-sensitive</term-option>
                <term-option>whitespace-sensitive</term-option>
                <term-option>unstemmed</term-option>
                <term-option>unwildcarded</term-option>
      </term>
      <debug>true</debug>
      <search-option>filtered</search-option>
    </options>

return (
  search:search('"Hello  world"',$options),
  search:parse('"Hello  world"',$options)
)


<search:response snippet-format="snippet" total="0" start="1" page-length="10" 
xmlns:search="http://marklogic.com/appservices/search">
  <search:qtext>"Hello  world"</search:qtext>
  <search:report id="SEARCH-FLWOR">(cts:search(fn:collection(), 
cts:word-query("Hello  world", 
("case-sensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","unstemmed","unwildcarded","lang=en"),
 1), ("filtered",cts:score-order("descending")), 1))[1 to 10]</search:report>
  <search:metrics>
    <search:query-resolution-time>PT0.00387S</search:query-resolution-time>
    <search:facet-resolution-time>PT0.00005S</search:facet-resolution-time>
    <search:snippet-resolution-time>PT0S</search:snippet-resolution-time>
    <search:total-time>PT0.018433S</search:total-time>
  </search:metrics>
</search:response>
<cts:word-query xmlns:cts="http://marklogic.com/cts">
  <cts:text xml:lang="en">Hello  world</cts:text>
  <cts:option>case-sensitive</cts:option>
  <cts:option>diacritic-sensitive</cts:option>
  <cts:option>punctuation-sensitive</cts:option>
  <cts:option>whitespace-sensitive</cts:option>
  <cts:option>unstemmed</cts:option>
  <cts:option>unwildcarded</cts:option>
</cts:word-query>






Kind Regards,
David Ennis


David Ennis
Content Engineer

[HintTech] <http://www.hinttech.com/>
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80


On 6 January 2015 at 10:06, Peter Kester <peter.kes...@marklogic.com> wrote:
Hi Shruti,

MarkLogic does what it is supposed to do. See the sample below:

xquery version "1.0-ml";

xdmp:document-insert("/foo.xml", <foo>Hello  world</foo>);
xdmp:document-insert("/foo2.xml", <foo>Hello world</foo>);

xquery version "1.0-ml";
declare namespace host = "http://marklogic.com/xdmp/status/host";
import module namespace search = "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

let $options := <options xmlns="http://marklogic.com/appservices/search">
      <term>
                <term-option>case-sensitive</term-option>
                <term-option>diacritic-sensitive</term-option>
                <term-option>punctuation-sensitive</term-option>
                <term-option>whitespace-sensitive</term-option>
                <term-option>unstemmed</term-option>
                <term-option>unwildcarded</term-option>
      </term>
      <debug>true</debug>
      <search-option>filtered</search-option>
    </options>

return search:search('"Hello  world"',$options)

This will give you:
<search:response snippet-format="snippet" total="1" start="1" page-length="10" 
xmlns:xs="http://www.w3.org/2001/XMLSchema" 
xmlns="" xmlns:search="http://marklogic.com/appservices/search">
<search:result index="1" uri="/foo.xml" path="fn:doc("/foo.xml")" score="47104" 
confidence="0.5532626" fitness="0.6769772">
<search:snippet>
<search:match path="fn:doc("/foo.xml")/foo">
<search:highlight>
Hello world
</search:highlight>
</search:match>
</search:snippet>
</search:result>
<search:qtext>
"Hello world"
</search:qtext>
<search:report id="SEARCH-FLWOR">
(cts:search(fn:collection(), cts:word-query("Hello world", 
("case-sensitive","diacritic-sensitive","punctuation-sensitive","whitespace-sensitive","unstemmed","unwildcarded","lang=en"),
 1), ("filtered"), 1))[1 to 10]
</search:report>
<search:metrics>
<search:query-resolution-time>
PT0.004736S
</search:query-resolution-time>
<search:facet-resolution-time>
PT0.000051S
</search:facet-resolution-time>
<search:snippet-resolution-time>
PT0.000603S
</search:snippet-resolution-time>
<search:total-time>
PT0.021619S
</search:total-time>
</search:metrics>
</search:response>

As you can see, it returns the correct document, foo.xml, and not foo2.xml. The 
difference is that you also need to specify filtered in the options section.
I’m not sure why the double spaces are not reflected in the search:qtext or the 
cts:word-query in the search:report.

Hope this helps.

Peter


Peter Kester
Senior Consultant
peter.kes...@marklogic.com
+31 611188543
http://nl.linkedin.com/in/peetkes/

MarkLogic Corporation
Graadt van Roggenweg, 328-334, 3531 AH Utrecht
http://www.marklogic.com/


New generation databases, you just need to think differently 
www.nosqlfordummies.com<http://t.co/YKkJ0Wxseo>




_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general




--

Regards,
Shruti Kapoor

Software Engineer
Innodata India Pvt. Ltd.

7th floor, Stellar IT Park,

Sector 62, Noida, Uttar Pradesh 201309, India
Cell: (+91) 9990340628

Email: skapo...@innodata.com | Web: www.innodata.com
