On 06/08/2012 05:16 AM, Ulf Zibis wrote:


Is there any spec on whether the Java Regex API has a general contract with 16-bit chars or Unicode code points?

The regex spec says Pattern and Matcher work on a character sequence, with reference to the CharSequence interface, but the pattern itself does support Unicode characters via various regex constructs and flags. An empty String pattern is really a corner case here: it says nothing about "character", and the current implementation interprets it as matching at each and every stop as you iterate through the target CharSequence. That might not be desirable for some
use scenarios, but it is not unreasonable.
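
To make "each and every stop" concrete, here is a minimal sketch that lists where the empty pattern matches. The class and method names (EmptyPatternPositions, positions) are made up for illustration; only Pattern and Matcher come from java.util.regex. It covers plain BMP text only and says nothing about supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyPatternPositions {
    // Returns the start index of every match of the empty pattern in the input.
    // (Hypothetical helper, not part of the regex API.)
    static List<Integer> positions(String input) {
        Matcher m = Pattern.compile("").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // One match per char index, plus one at index length().
        System.out.println(positions("Peter")); // prints [0, 1, 2, 3, 4, 5]
    }
}
```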


Additionally I'd like to discuss "any possible zero-width position of the target String". If the String length is l, it's arguable that position l is not a valid position in the String.

If you consider the "boundary matcher" regex constructs, it might be reasonable to treat this "invalid position" as a valid one when using regex. I think most other
regex engines do the same thing, for example Perl:

$mystring="Peter";
$mystring =~ s// /g;
printf "[%s]\n", $mystring;
[ P e t e r ]
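
For comparison, a rough Java counterpart, assuming default flags. It checks that the boundary matcher "$" matches at position l (the String length) and that the empty pattern gives the same output as the Perl run. EndPositionDemo and firstMatchStart are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndPositionDemo {
    // Returns the start index of the first match of regex in input, or -1 if none.
    // (Hypothetical helper, not part of the regex API.)
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String s = "Peter";
        // "$" is zero-width and matches at index s.length(), i.e. position l.
        System.out.println(firstMatchStart("$", s) == s.length()); // prints true
        // The empty pattern also matches at position l, hence the trailing space.
        System.out.println("[" + s.replaceAll("", " ") + "]"); // prints [ P e t e r ]
    }
}
```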

But I have to say you might have a point here:-)

-Sherman

From the use case point of view, I think "P e t e r" as result of "Peter".replaceAll("", " ") is the most useful.
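
If "P e t e r" is the result you want, one way to get it with the current behavior is to match only positions that have a character on both sides. This is just a sketch: InnerSpaces and spaceOut are made-up names, and since "." does not match line terminators by default, multi-line input would behave differently.

```java
public class InnerSpaces {
    // Inserts a space only at zero-width positions with a character before and after,
    // so positions 0 and length() are skipped. (Hypothetical helper.)
    static String spaceOut(String input) {
        return input.replaceAll("(?<=.)(?=.)", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("Peter")); // prints P e t e r
    }
}
```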

