Quite right you are. There were some errors in our Unicode tables,
which were also built against an older version of the Unicode standard.
As of MLS 4.2: matches("ˌ","\p{L}") => true()
//Mary
On Thu, 03 Jun 2010 17:03:23 -0700, Michael Sokolov <[email protected]>
wrote:
> I ran across an anomaly in MarkLogic this week while trying to evaluate a
> regular expression replacement using the Letter class:
>
> replace ($string, "\P{L}", "")
>
> Some characters that, AFAICT, Unicode classifies as letters are not treated
> as letters by MarkLogic. For example, ˌ (U+02CC, "MODIFIER LETTER LOW
> VERTICAL LINE") is treated as a non-letter.
>
> This link spells out the details:
> http://www.fileformat.info/info/unicode/char/02cc/index.htm
>
> I wouldn't even have noticed if Saxon hadn't done something different from
> ML (and I think Java would do the same, based on the evidence at the link
> above; I haven't tested it myself). In Saxon I had to use the "modifier
> letter" class, \P{Lm}, to remove these characters.
>
> I have to say, it doesn't look like a letter to me (it's a little line, a
> stress marker), and at first MarkLogic behaved as I expected, but that's
> only because I am not a walking Unicode standard. I'd prefer that ML adhere
> closely to the Unicode standard in cases like this, even when the result is
> counterintuitive, if only so that it behaves the same as other
> standards-compliant software.
>
> -Mike
>
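Mike's guess about Java can be checked against the JDK directly. What follows is a minimal standalone sketch. The class and method names are hypothetical and chosen only for illustration. It relies only on `java.util.regex.Pattern` and `Character`, so it shows the JDK's own Unicode tables, not Saxon's or MarkLogic's. It shows three things: U+02CC reports as a modifier letter (Lm), removing `\P{L}` keeps it, and adding `\p{Lm}` to the character class is one way to drop it as well. That last pattern is an assumption on our part, not necessarily the expression Mike used.

```java
import java.util.regex.Pattern;

public class ModifierLetterCheck {
    // \P{L} matches any character that is NOT in a letter category (Lu, Ll, Lt, Lm, Lo)
    private static final Pattern NON_LETTER = Pattern.compile("\\P{L}");
    // Union of non-letters and modifier letters (Lm), so U+02CC is removed too
    private static final Pattern NON_LETTER_OR_LM = Pattern.compile("[\\P{L}\\p{Lm}]");

    // Rough Java analogue of replace($string, "\P{L}", "") from the thread
    public static String lettersOnly(String s) {
        return NON_LETTER.matcher(s).replaceAll("");
    }

    // Also strips modifier letters such as the low vertical line
    public static String lettersWithoutModifiers(String s) {
        return NON_LETTER_OR_LM.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        char stressMark = '\u02CC'; // MODIFIER LETTER LOW VERTICAL LINE

        // General category as reported by the JDK
        System.out.println(Character.getType(stressMark) == Character.MODIFIER_LETTER); // true
        // isLetter covers every L* category, including Lm
        System.out.println(Character.isLetter(stressMark)); // true

        String input = "a\u02CCb-1";
        // Only '-' and '1' are removed; the modifier letter survives
        System.out.println(lettersOnly(input).equals("a\u02CCb")); // true
        System.out.println(lettersWithoutModifiers(input)); // ab
    }
}
```

The comparisons print booleans so that the output does not depend on the console's ability to display ˌ.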
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general