Mike - This is also what I have found. The search:parse function has to actually 
return this "empty" query for the <empty> option to have any effect:

<cts:and-query qtextempty="1" xmlns:cts="http://marklogic.com/cts"/>

When it is passed punctuation text and "punctuation-insensitive" in options it 
returns:

<cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
  <cts:text>,</cts:text>
  <cts:option>punctuation-insensitive</cts:option>
</cts:word-query>

The same problem occurs with "whitespace-insensitive" in options and 
search:parse("&nbsp;",$options):

<cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
  <cts:text> </cts:text>
  <cts:option>whitespace-insensitive</cts:option>
</cts:word-query>

Both these queries are unaffected by <empty apply="all-results"/> and return no 
results. I don't think this is desirable for any application. Ideally, I think 
the Search API would provide an option to behave like your parser, or 
search:parse would return empty queries for these scenarios.
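
One way to approximate the second idea is to post-process the parsed query before resolving it. The following is only a sketch: local:prune and local:clean are hypothetical helper names, the regular expression is an assumption about what counts as "punctuation only", and it has not been tested against search:resolve.

```xquery
xquery version "1.0-ml";
declare namespace cts = "http://marklogic.com/cts";

(: Hypothetical helper: drop word-queries whose text is only punctuation
   or separator characters, and copy everything else unchanged. :)
declare function local:prune($node as node()) as node()?
{
  typeswitch ($node)
  case element(cts:word-query) return
    if (every $t in $node/cts:text
        satisfies fn:matches($t, '^[\p{P}\p{Z}]*$'))
    then ()
    else $node
  case element() return
    element { fn:node-name($node) } {
      $node/@*,
      for $child in $node/node() return local:prune($child)
    }
  default return $node
};

(: Hypothetical wrapper: fall back to the empty and-query shown above
   when the whole query is pruned away. :)
declare function local:clean($query as element()) as element()
{
  (local:prune($query),
   <cts:and-query qtextempty="1"/>)[1]
};

local:clean(
  <cts:word-query qtextref="cts:text">
    <cts:text>,</cts:text>
    <cts:option>punctuation-insensitive</cts:option>
  </cts:word-query>)
```

The sketch only handles the case where the whole query disappears. A punctuation-only word-query nested inside another query is simply removed from its parent, which may or may not be what you want for something like a cts:not-query.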

Stripping out punctuation from the input query is a decent workaround, but we 
have to be careful not to strip out characters that could be part of a 
constraint, phrase, custom grammar, etc., so the regex gets uglier.
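
As a rough illustration of a more conservative filter, the sketch below drops only whitespace-delimited tokens that consist entirely of punctuation or separator characters, and leaves alone any token containing a double quote or a parenthesis. The function name local:drop-punctuation-tokens is hypothetical, and the list of protected characters is an assumption that would need extending for a custom grammar.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: remove whitespace-delimited tokens made up only of
   punctuation or separator characters, but keep any token that contains a
   double quote or parenthesis, since those may belong to the grammar. :)
declare function local:drop-punctuation-tokens($qtext as xs:string) as xs:string
{
  fn:string-join(
    for $token in fn:tokenize(fn:normalize-space($qtext), ' ')
    where fn:not(
      fn:matches($token, '^[\p{P}\p{Z}]+$')
      and fn:not(fn:matches($token, '[()"]')))
    return $token,
    ' ')
};

local:drop-punctuation-tokens('"metal" , "locker"')
```

For the query from this thread, the lone comma is dropped and both quoted terms are left in place. Input that consists only of punctuation becomes the empty string, which is the case the <empty> option is meant to handle.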

-Will

 
-----Original Message-----
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Michael Blakeley
Sent: Wednesday, February 01, 2012 10:56 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] element-query with punctuation insensitive 
and punctuation marks as cts:text

In cases like this it's worth looking at the query output. The search:parse 
function produces this:

  <cts:and-query strength="20" qtextjoin="" qtextgroup="( )"
      xmlns:cts="http://marklogic.com/cts">
    <cts:word-query qtextpre="&quot;" qtextref="cts:text" qtextpost="&quot;">
      <cts:text>metal</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>unstemmed</cts:option>
      <cts:option>punctuation-insensitive</cts:option>
    </cts:word-query>
    <cts:and-query strength="20" qtextjoin="" qtextgroup="( )">
      <cts:word-query qtextref="cts:text">
        <cts:text>,</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>unstemmed</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
      </cts:word-query>
      <cts:word-query qtextpre="&quot;" qtextref="cts:text" qtextpost="&quot;">
        <cts:text>locker</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>unstemmed</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
      </cts:word-query>
    </cts:and-query>
  </cts:and-query>

See the cts:text entry for ','? After some testing with 5.0-2, my guess is that 
since ',' is the only character in that punctuation-insensitive word-query, 
that word-query term ends up not matching anything. I think it should match 
*everything*, which would also cause problems if search:parse created that 
query. But whether the existing behavior is a bug or not, the workaround should 
be simple: rewrite the input query so that it does not contain any punctuation. 
This might be suitable:

  replace($query, '[^\w\s]', ' ')

Or you might look into using https://github.com/mblakele/xqysp with 
search:resolve(). XQYSP ignores unexpected punctuation unless it is part of a 
quoted term.
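
The replace() workaround above could be wired into search:parse roughly as follows. This is a sketch only: the input string and options are taken from the examples in this thread, and the variable names are illustrative.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Input and options taken from the examples in this thread. :)
let $qtext := '"metal" , "locker"'
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>
(: Replace everything that is not a word character or whitespace with a
   space, then collapse the runs of spaces that this leaves behind.
   Note that this also removes the double quotes, so phrases are lost. :)
let $clean := fn:normalize-space(fn:replace($qtext, '[^\w\s]', ' '))
return search:parse($clean, $options)
```

Because the double quotes are stripped too, this simple form is only suitable when quoted phrases and constraint syntax do not matter.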

-- Mike

On 1 Feb 2012, at 09:21 , Will Thompson wrote:

> Abhishek - I recently had a very similar issue with empty searches and 
> punctuation, and the solution appeared to be adding <empty 
> apply="all-results" /> to search options. However, after further testing, I 
> am also getting empty results. For example,
>  
> let $options :=
> <options xmlns="http://marklogic.com/appservices/search">
>   <term>
>       <empty apply="all-results" />
>       <term-option>punctuation-insensitive</term-option>
>   </term>
>   <searchable-expression>//doc</searchable-expression>
> </options>
> let $empty :=
> <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
>   <cts:text>;</cts:text>
>   <cts:option>punctuation-insensitive</cts:option>
> </cts:word-query>
> return
> search:resolve($empty,$options)
>  
> This returns no results, and the value of @apply does not seem to have any 
> effect. I think this is probably a bug.
>  
> -Will
>  
>  
> From: general-boun...@developer.marklogic.com 
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Abhishek53 S
> Sent: Wednesday, February 01, 2012 2:55 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] element-query with punctuation 
> insensitive and punctuation marks as cts:text
>  
> 
> Hi Geert, 
> 
> Here is the sample query I used 
> 
> import module namespace search = "http://marklogic.com/appservices/search"
>   at "/MarkLogic/appservices/search/search.xqy";
> 
> let $parsed-query := search:parse('"metal" , "locker"',
>   <options xmlns="http://marklogic.com/appservices/search">
>     <search-option>unfiltered</search-option>
>     <term>
>       <empty apply="all-results" />
>       <term-option>case-insensitive</term-option>
>       <term-option>unstemmed</term-option>
>       <term-option>punctuation-insensitive</term-option>
>     </term>
>   </options>)
> let $query := cts:element-query(xs:QName("data"), cts:query($parsed-query))
> return
>   xdmp:estimate(cts:search(fn:doc(), $query))
> 
> 
> 
> Thanks 
> Abhishek Srivastav
> Tata Consultancy Services
> Cell:- +91-9883389968
> Mailto: abhishek5...@tcs.com
> Website: http://www.tcs.com
> ____________________________________________
> Experience certainty.        IT Services
>                        Business Solutions
>                        Outsourcing
> ____________________________________________ 
> 
> 
> From:
> Abhishek53 S <abhishek5...@tcs.com>
> To:
> General MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date:
> 02/01/2012 04:17 PM
> Subject:
> Re: [MarkLogic Dev General] element-query with punctuation insensitive and 
> punctuation marks as cts:text
> Sent by:
> general-boun...@developer.marklogic.com
>  
> 
> 
> 
> 
> Hi Geert, 
> 
> Thanks for your response. For now I would rather not remove the 
> punctuation-only word-query from the main query (I would do that only as a 
> last resort). I am using the search:parse function to parse the search term. 
> 
> I tried your third option but still cannot get the expected result 
> [count without punctuation (,) = count with punctuation (,) when 
> punctuation-insensitive]. If I recall correctly, this term option controls 
> whether results are returned when the term is empty, so I am not sure how it 
> would help me in this case... 
> 
> Thanks for your help! 
> 
> Abhishek Srivastav
> 
> From:
> Geert Josten <geert.jos...@dayon.nl>
> To:
> General MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date:
> 02/01/2012 03:26 PM
> Subject:
> Re: [MarkLogic Dev General] element-query with punctuation insensitive and 
> punctuation marks as cts:text
> Sent by:
> general-boun...@developer.marklogic.com
>  
> 
> 
> 
> 
> Hi Abhishek, 
>  
> What is happening here is that you pass ',' as the search term to a 
> word-query with the 'punctuation-insensitive' option. That option 
> effectively strips the comma out of the search term, leaving an empty search 
> term. A cts:word-query with an empty search term returns nothing. 
>  
> I think you have a few options: 
> 1. Don't tokenize the search string yourself (at least, if that is what 
>    you are doing), and pass in 'metal,' or ', metal' as the search term with 
>    punctuation-insensitive. That is effectively the same as searching for 
>    'metal'. 
> 2. Strip punctuation yourself before parsing the string into a <cts:query> 
>    element structure (or post-process the query element structure to filter 
>    out punctuation-only queries). 
> 3. Add <empty apply="all-results" /> to your search options (I'm guessing 
>    you are using search:parse, so to the options you pass in there). 
>  
> Kind regards, 
> Geert 
>  
> From: general-boun...@developer.marklogic.com 
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Abhishek53 S
> Sent: Wednesday, 1 February 2012 10:30
> To: General MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] element-query with punctuation insensitive 
> and punctuation marks as cts:text 
>  
> 
> Hi Folks, 
> 
> I may be explaining this badly, but I have an issue with 
> punctuation-insensitive search when a punctuation mark is the cts:text of a 
> word-query inside an element-query. When I run the query below I get a count 
> of zero, because the punctuation mark is not ignored during the search (even 
> with punctuation-insensitive). Our application is expected to be 
> punctuation-insensitive at all times. If I remove the word-query containing 
> the punctuation mark, the query returns a count based on the remaining search 
> criteria. On the other hand, a word-query with the punctuation-sensitive 
> option behaves as if it were ignored in the search criteria. 
> 
> Please let me know how to make this element-query punctuation-insensitive 
> even when punctuation marks are present in the cts:text node of a word-query. 
> 
> xdmp:estimate(cts:search(fn:doc(),
>   cts:query(
>     <cts:element-query>
>       <cts:element xmlns="">data</cts:element>
>       <cts:and-query>
>         <cts:word-query>
>           <cts:text xml:lang="en">,</cts:text>
>           <cts:option>case-insensitive</cts:option>
>           <cts:option>punctuation-insensitive</cts:option>
>           <cts:option>unstemmed</cts:option>
>         </cts:word-query>
>         <cts:word-query>
>           <cts:text xml:lang="en">metal</cts:text>
>           <cts:option>case-insensitive</cts:option>
>           <cts:option>punctuation-insensitive</cts:option>
>           <cts:option>unstemmed</cts:option>
>         </cts:word-query>
>       </cts:and-query>
>     </cts:element-query>
>   )))
> 
> Thanks & Regards 
> Abhishek Srivastav
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general