That is a good idea, but unfortunately I couldn't get it to work. I added a
custom dictionary entry for "judgement" and then reindexed. Maybe you can't do
this with stemming? The documentation suggests you can, giving "aluminum" and
"aluminium" as examples, but there's no code example.
cdict:dictionary-read("en") =>
<cdict:dictionary xmlns:cdict="http://marklogic.com/xdmp/custom-dictionary"
    xml:lang="en">
  <cdict:entry>
    <cdict:word>Judgement</cdict:word>
    <cdict:stem>Judgment</cdict:stem>
    <cdict:pos>Nn</cdict:pos>
  </cdict:entry>
</cdict:dictionary>
xdmp:estimate(cts:search(//document, cts:word-query("judgement", "case-insensitive")))
=> 0
xdmp:estimate(cts:search(//document, cts:word-query("judgment", "case-insensitive")))
=> 3220
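One way to narrow this down might be to ask the server for the stems directly
and to request a stemmed match explicitly. The following is only a diagnostic
sketch: it assumes stemmed searches are enabled on the database and that "en"
is the language the content is indexed under, neither of which is confirmed
above.

```xquery
xquery version "1.0-ml";

(: Diagnostic sketch; assumes stemmed searches are enabled for "en". :)
(
  (: If the custom entry is in effect, both calls should report a common stem. :)
  cts:stem("judgement", "en"),
  cts:stem("judgment", "en"),
  (: Same estimate as above, but explicitly asking for a stemmed English match. :)
  xdmp:estimate(
    cts:search(//document,
      cts:word-query("judgement", ("case-insensitive", "stemmed", "lang=en")))
  )
)
```

If the two cts:stem calls share no stem, the dictionary entry is probably not
being applied; if they do but the estimate is still 0, the index or the query
options are the more likely culprit.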
-Will
From: [email protected] On Behalf Of Harry B.
Sent: Wednesday, July 18, 2012 4:48 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] spell:suggest behavior
Interesting caveat. I guess that's why spell check didn't scream at me. I
wonder if language-specific stemming (specifying en-us) could handle this? In
my mind this is similar to color vs. colour: relying on stemming, you could get
results with both spellings.
On Jul 18, 2012 5:31 PM, "Will Thompson" <[email protected]> wrote:
Harry - I think in the UK the extra "e" is correct, but "judgment" is the
correct spelling in the US. This is really the only word giving us trouble (our
content is law-related), and I have some simple logic in place to check and
provide the correct suggestion, but I thought there might be a better way.
Thank you for this info!
-Will
From: [email protected] On Behalf Of Harry B.
Sent: Wednesday, July 18, 2012 12:52 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] spell:suggest behavior
Are you sure you have these the right way around? Judgement is spelled
correctly...
It looks like the double metaphones may be carrying more weight in the
suggestions than you'd want in this case. The word distances and Levenshtein
distances are higher for the suggestions than for judgment. I don't know of a
way around this sort of thing. I've run across words from time to time where
the suggestions aren't what I'd expect, or where none are available. In my test
dictionary, taking out the other words so that judgement was the only entry
still didn't correct judgment to judgement. I think this is a word where you'll
need separate logic to catch this specific misspelling: before using the
dictionary to check spelling, look the word up in another list. That can be
done efficiently. In fact, regular expressions run extremely fast, and you
could keep a list of the words you come across like this that need forced
suggestions.
spell:double-metaphone("judgment") => jtkmnt atkmnt
spell:double-metaphone("judgement") => jjmnt ajmnt
spell:double-metaphone("augment") => akmnt
spell:double-metaphone("oddment") => akmnt
spell:double-metaphone("element") => almnt
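A minimal sketch of that pre-check, for illustration only. The override list,
its element names, and the local:suggest wrapper are all hypothetical, and it
uses a plain lowercase lookup rather than regular expressions; only
spell:suggest and the dictionary URI come from this thread.

```xquery
xquery version "1.0-ml";

(: Hypothetical override list: misspelling -> forced suggestion.
   The element and attribute names are made up for this example. :)
declare variable $forced-suggestions as element(overrides) :=
  <overrides>
    <override from="judgement" to="judgment"/>
  </overrides>;

(: Consult the override list first; fall back to spell:suggest otherwise. :)
declare function local:suggest(
  $dictionary as xs:string,
  $word as xs:string
) as xs:string*
{
  let $forced :=
    $forced-suggestions/override[@from eq fn:lower-case($word)]/@to/fn:string(.)
  return
    if (fn:exists($forced))
    then $forced
    else spell:suggest($dictionary, $word)
};

local:suggest("jmp-dictionary.xml", "judgement")
```

Words that are not in the override list go straight through to
spell:suggest, so the list only needs to hold the handful of problem words.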
On Wed, Jul 18, 2012 at 12:55 PM, Will Thompson <[email protected]> wrote:
Is there a way to force different behavior of spell:suggest()? For example,
although the correct spelling, "judgment," is in the dictionary, these are the
suggestions for the most common misspelling:
spell:suggest("jmp-dictionary.xml", "judgement")
=> augment element oddment Sagemont easement regiment
As far as I can tell, this is not correctable with a custom dictionary.
-Will
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general