Thanks Danny, but I'm not sure I follow. Maybe mine was not the best 
explanation. Rather than have dashes behave like hyphens, I just want a search 
for something like "Venue ― Motion to Transfer" to ignore the dash when 
parsed. Instead, the dash appears to be treated as a word and is not ignored:

cts:and-query(
  (cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("―", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("to", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1)),
  ())

-Will

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Thursday, January 26, 2012 5:35 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Hi Will,

One thing you can do is change your search grammar to use a joiner other than 
the negative sign.

Here is the default grammar:

http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 4:34 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] en/em dashes punctuation?

Our search autocomplete pulls from doc titles, some of which contain en or em 
dashes. However, if the dash is "floating" (i.e., set off by spaces, as in 
"Venue - Motion to Transfer"), search:parse parses it into the query, even 
though <term-option>punctuation-insensitive</term-option> is included in the 
<term> section of the search options node. I thought it might simply be 
ignored when the query is evaluated, but it is definitely limiting the query.

I can confirm they are punctuation:

cts:tokenize("hyphen-en-em-bar―")[. instance of cts:punctuation] => "- - - ―"

But is there an exception here (the same way hyphens are always parsed to 
negate)? Do I just need to remove these from the query string before calling 
search:parse? If there is a cleaner way, that would be great.
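
A minimal sketch of the "remove them before calling search:parse" approach 
asked about above. The helper name local:strip-floating-dashes, the sample 
query string, and the reduced options node are assumptions made for 
illustration; they are not part of the Search API or of the original 
application. The regular expression relies on the Unicode dash-punctuation 
category (\p{Pd}), which should cover hyphens, en and em dashes, and the 
horizontal bar, and it only touches dashes that have whitespace on both sides, 
so a leading hyphen used for negation (as in "-term") is left alone.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper, not part of the Search API.
   Replaces any run of dash characters that has whitespace on both sides
   with a single space, then collapses leftover whitespace. :)
declare function local:strip-floating-dashes($q as xs:string) as xs:string
{
  fn:normalize-space(fn:replace($q, "\s\p{Pd}+\s", " "))
};

(: Assumed options node, reduced to the term option mentioned in the thread. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>

(: Sample input taken from the thread. :)
let $raw := "Venue ― Motion to Transfer"

return search:parse(local:strip-floating-dashes($raw), $options)
```

With the floating dash removed first, the string handed to search:parse 
contains only the four words, so no separate word query for the dash should 
be generated. Whether this is cleaner than changing the grammar depends on 
how much control you have over the code that calls search:parse.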


Best,

Will
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
