[fpc-devel] _wcsicmp

2014-12-04 Thread Adriaan van Os
Is there an RTL or package equivalent for _wcsicmp et al.
http://msdn.microsoft.com/en-us/library/k59z8dwe.aspx or should the libc declaration be added to
the application? Or is there another recommendation for comparing wide (so-called Unicode) strings
case-insensitively?


Regards,

Adriaan van Os

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] _wcsicmp

2014-12-04 Thread Jonas Maebe


On 04 Dec 2014, at 13:02, Adriaan van Os wrote:

Is there an RTL or package equivalent for _wcsicmp et al.
http://msdn.microsoft.com/en-us/library/k59z8dwe.aspx or should the
libc declaration be added to the application? Or is there another
recommendation for comparing wide (so-called Unicode) strings
case-insensitively?


The closest equivalent is probably sysutils.UnicodeCompareText(). Note
that it uses CompareStringW on Windows, while on Unix it converts both
strings to uppercase in a locale-sensitive way and compares the results.
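
A minimal sketch of how such a call might look, assuming a reasonably
recent FPC and, on Unix, the cwstring unit to install a locale-aware
widestring manager. The program name and the test strings are made up
for illustration only:

```pascal
program CompareDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // locale-aware widestring manager on Unix
  SysUtils;
var
  s1, s2: UnicodeString;
begin
  s1 := 'Hello World';
  s2 := 'HELLO WORLD';
  // UnicodeCompareText returns 0 when both strings are equal ignoring case,
  // a negative value when s1 sorts before s2, and a positive value otherwise.
  if UnicodeCompareText(s1, s2) = 0 then
    WriteLn('equal (ignoring case)')
  else
    WriteLn('different');
end.
```

For non-ASCII text the result depends on the platform and the active
locale, as described above.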



Jonas


[fpc-devel] AnsiUpperCase problems

2014-12-04 Thread Hans-Peter Diettrich
The following console program demonstrates various problems with the new 
(encoded) AnsiStrings (FPC trunk):


program litTest2;
{.$codepage UTF8} //off for now
uses Classes,SysUtils;
var A: AnsiString;
begin
  a := 'äöü';
  //a := a+' '; //uncomment later
  WriteLn(a,'äöü');
  WriteLn(AnsiUpperCase(a),AnsiUpperCase('äöü'));
end.

The output varies depending on (at least) the file encoding and target 
platform (tested only on Windows, using Lazarus).


With an ANSI-encoded source file the last line shows 'ÄÖÜÄÖÜ', as
expected. In the first line the variable also shows as 'äöü', but the
literal does not (it shows as 3 graphical characters). In all other
(tested) cases something different is shown, with no uppercase letters
at all.


With a UTF-8 source file (with BOM) both the variable and the literal
show as 'äöü', but unfortunately never in upper case.


Adding {$codepage UTF8} requires a UTF-8 source file. That is compatible
with the Lazarus defaults, so the further tests (here) use this
combination. Please note that (currently) Lazarus sets or leaves
DefaultSystemCodePage according to the actual OS, i.e. 1252 for my
installation, regardless of $codepage.


Now all items are shown as 'äöü', but again never in uppercase. How come?


AnsiUpperCase finally calls Win32AnsiUpperCase (on Windows), declared as
  function Win32AnsiUpperCase(const s: string): string;
which in turn calls CharUpperBuffA.
This explains why no uppercase conversion is performed when S has a
dynamic encoding different from the (WinAPI) CP_ACP that
CharUpperBuffA expects. In fact I found the *dynamic* encoding of both
A and S to be CP_UTF8, even though their static encoding is CP_ACP (or
1252).


Consequently AnsiUpperCase should convert S to the WinAPI CP_ACP
(GetACP) before passing it to CharUpperBuffA. The same applies to all
other functions with AnsiString arguments that call external (OS
API...) routines expecting a specific encoding, on all platforms, and
to user code that relies on every string having its declared encoding,
as in:

  str1[1]:=str2[1]; //both strings of same type

IMO such additional checks and conversions should be avoided, because
they bloat the library code and cost runtime. Note that SetCodePage
requires a RawByteString (var parameter), and thus cannot be used
directly to adjust the dynamic codepage of an AnsiString.
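
To observe the dynamic code page discussed here, something like the
following sketch can be used. It assumes an FPC version with
codepage-aware AnsiStrings (trunk at the time of writing) and uses a
RawByteString typecast as a workaround for the var parameter; the
program name and the test string are illustrative only, and whether the
typecast is the recommended approach is an assumption:

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}
var
  a: AnsiString;
begin
  a := 'abc';
  // StringCodePage reports the dynamic code page stored with the string data.
  WriteLn('dynamic code page before: ', StringCodePage(a));
  // Workaround (assumption): the typecast lets the AnsiString be passed as
  // the var RawByteString parameter. Convert=True asks for the payload to be
  // transcoded if needed, not merely relabelled.
  SetCodePage(RawByteString(a), DefaultSystemCodePage, True);
  WriteLn('dynamic code page after:  ', StringCodePage(a));
  WriteLn('DefaultSystemCodePage:    ', DefaultSystemCodePage);
end.
```

The printed values depend on the source file encoding, the $codepage
setting and the platform, which is exactly the variability described in
this message.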



Now let's add (uncomment) the line
  a := a+' ';
and voilà, AnsiUpperCase works, because now the string has the expected
CP_ACP instead of UTF-8. The same effect occurs when A is assigned from
a UnicodeString variable.


Is it really intended that AnsiString behaviour depends on such details?


The simplest solution would be to disallow differing static and dynamic
encodings for AnsiStrings, except for RawByteString. Then no additional
checks and conversions are required, except the one when assigning a
RawByteString to an AnsiString of a different type, and everything else
can be determined by the compiler from the known static=dynamic encoding
of strings.


More checks and conversions can be avoided if the dynamic encoding of
string literals is the actual encoding used by the compiler for the
stored literal, not a Delphi-incompatible placeholder like CP_ACP. Then
TranslatePlaceholderCP is required only for explicitly given encoding
values, and no longer for the dynamic encoding of strings.


DoDi



Re: [fpc-devel] _wcsicmp

2014-12-04 Thread Chris Dryburgh

On 04/12/14 07:16 AM, Jonas Maebe wrote:


The closest equivalent is probably sysutils.UnicodeCompareText(). Note
that it uses CompareStringW on Windows, while on Unix it converts both
strings to uppercase in a locale-sensitive way and compares the results.


As a general rule it is better to convert to lower case before comparing
Unicode characters. Accents can get lost in the conversion to upper
case. Accent loss does not happen, or is much rarer, in the conversion
to lower case.
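
A sketch of what comparing via lower case could look like in FPC,
assuming sysutils.UnicodeLowerCase and, on Unix, the cwstring unit.
SameTextLower is a hypothetical helper name, not an RTL routine:

```pascal
program LowerFoldDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // locale-aware widestring manager on Unix
  SysUtils;

// Hypothetical helper: folds both strings to lower case, then compares them.
function SameTextLower(const s1, s2: UnicodeString): Boolean;
begin
  Result := UnicodeLowerCase(s1) = UnicodeLowerCase(s2);
end;

begin
  if SameTextLower('Free Pascal', 'FREE PASCAL') then
    WriteLn('equal (ignoring case)')
  else
    WriteLn('different');
end.
```

How accented characters are handled still depends on the widestring
manager and the locale in use.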



[fpc-devel] _wcsicmp solved

2014-12-04 Thread Adriaan van Os
Is there an RTL or package equivalent for _wcsicmp et al.
http://msdn.microsoft.com/en-us/library/k59z8dwe.aspx or should the libc declaration be added to
the application? Or is there another recommendation for comparing wide (so-called Unicode) strings
case-insensitively?


Sorry, I found WideCompareText after sending my message.

Regards,

Adriaan van Os

