At 11:05 AM 3/23/2001 -0600, Garrett Goebel wrote:

>From: Nicholas Clark [mailto:[EMAIL PROTECTED]]
> >
> > On Thu, Mar 22, 2001 at 04:10:28PM -0500, Dan Sugalski wrote:
> > > 1) All Unicode data perl does regular expressions against
> > >    will be in Normalization Form C, except for...
> > > 2) Regexes tagged to run against a decomposed form will
> > >    instead be run against data in Normalization Form D.
> > >   (What the tag is at the perl level is up for grabs. I'd
> > >   personally choose a D suffix)
> > > 3) Perl won't otherwise force any normalization on data
> > >    already in Unicode format.
> >
> > So if I understand that correctly, running a regexp against a
> > scalar will cause that scalar to become normalized in a
> > defined way (C or D, depending on regexp)
>
>I'm not sure whether to read that as resulting in the scalar being
>normalized, or if the "data perl does the regular expressions against"
>would be a normalized copy of that scalar's value.

It could be either way.
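
As a rough illustration of the "normalized copy" reading, here is a minimal
sketch in Python (not Perl, and not a proposed perl API). The helper name
match_normalized is hypothetical, and normalizing the pattern text as well
as the data is an assumption made only for this example.

```python
import re
import unicodedata


def match_normalized(pattern, text, form="NFC"):
    """Search a normalized copy of text; the caller's string is untouched.

    Hypothetical helper: normalizing the pattern too is an assumption of
    this sketch, and is only safe for patterns made of literal characters.
    """
    normalized_text = unicodedata.normalize(form, text)
    normalized_pattern = unicodedata.normalize(form, pattern)
    return re.search(normalized_pattern, normalized_text)


# Mixed input: decomposed "e" + COMBINING ACUTE ACCENT, "t", precomposed "é".
original = "e\u0301t\u00e9"

# A pattern written with precomposed characters misses the raw data...
raw_match = re.search("\u00e9t\u00e9", original)

# ...but matches once both sides are put into Normalization Form C.
nfc_match = match_normalized("\u00e9t\u00e9", original, form="NFC")

# A decomposed pattern matches when both sides are put into Form D.
nfd_match = match_normalized("e\u0301", original, form="NFD")
```

Under this reading the scalar keeps whatever code points it started with,
and only the regex engine sees the normalized form. The other reading would
store the normalized result back into the scalar.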

>Wouldn't normalizing the scalar lose information? I don't know Unicode, 
>but surely someone must have a use for storing strings in both NFC and 
>NFD. Is it valid to intermix both forms? Isn't there a need to preserve 
>the data in its original encoding? I don't like the idea of the language 
>losing information without the programmer's permission.

Whether normalizing loses information seems to depend on your definition of
"lose". When you take a Unicode string and put it into either NFC or NFD,
the result is equivalent to the original, but not necessarily the same
sequence of code points. The Unicode standard specifies which characters
and character sequences are equivalent. When you're dealing with Unicode
data, you're not supposed to care about the actual code points, as far as
I can tell (with the possible exception of general requirements like
"must be NFC" or "must be NFD").
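
To make "equivalent, but not the same" concrete, here is a small sketch
using Python's standard unicodedata module (chosen only for illustration;
the variable names are made up for the example). It shows that the two
forms differ as code point sequences yet normalize to each other, and that
a string mixing both forms cannot be recovered exactly after normalizing.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Different code point sequences, so a plain comparison says "not equal".
same_code_points = composed == decomposed

# Each form normalizes to the other, which is what "equivalent" means here.
to_nfc = unicodedata.normalize("NFC", decomposed)
to_nfd = unicodedata.normalize("NFD", composed)

# A string that mixes both forms: this is where information can be "lost".
mixed = composed + decomposed
mixed_nfc = unicodedata.normalize("NFC", mixed)
mixed_nfd = unicodedata.normalize("NFD", mixed)

# Neither normalized form equals the original mixed sequence, and nothing
# in the normalized string records which parts were composed originally.
recoverable = mixed in (mixed_nfc, mixed_nfd)
```

So the loss, if you call it that, is only the original choice of code
points; the text itself is the same as far as the Unicode equivalence
rules are concerned.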


                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
