Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Jonas Maebe
On 28/11/14 21:30, Hans-Peter Diettrich wrote:
> I prefer to specify and document everything *before* coding, so that
> everybody can expect that the code will behave as specified.

If certain behaviour is explicitly undefined, it *is* specified and
documented. It means that your program is buggy if it triggers such
behaviour, and that the effect of triggering it could be anything.

This is standard practice in computer science. E.g., pretty much every
manual of every processor contains descriptions of explicitly undefined
behaviour (search e.g. for "undefined" in the Intel or ARM architecture
manuals).

An example from FPC itself is accessing an array beyond its bounds when
range checking is switched off. *Some* of the possible outcomes are
accessing a value from a variable declared before/after it, accessing
random data that has nothing to do with any of those variables, a
program crash, or actually accessing an element of the array anyway. We
don't guarantee that any of those possibilities will happen, we don't
say that those are the only possibilities, we don't say they stay the
same across compiler or OS versions, or even across program executions.
Hence, it's undefined.

Exactly the same goes for converting strings with code page CP_NONE to a
different code page: your program is broken when it tries to do that,
and we cannot guarantee any outcome. This is exactly what "the behaviour
is undefined" means.
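
A minimal sketch of why no outcome can be guaranteed, written in Python purely for illustration (it does not use or model the FPC RTL): bytes that carry no code page information yield different text depending on which code page the converter happens to assume. The byte values and the three candidate code pages are arbitrary choices for this example.

```python
# Bytes with no code page information attached (the CP_NONE situation).
raw = bytes([0xE4, 0xF6, 0xFC])

# A converter has to assume *some* source code page; each guess gives
# a different result, and none of them is "the" right one.
as_cp1252 = raw.decode("cp1252")
as_cp437 = raw.decode("cp437")

try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None  # not even valid under this assumption

print(repr(as_cp1252), repr(as_cp437), as_utf8)
```

Which of these results (or something else entirely) a program gets depends on what the conversion routine assumes, which is exactly what is left unspecified.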


Jonas
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicodesupport"

2014-11-28 Thread Hans-Peter Diettrich

Marco van de Voort wrote:

> In our previous episode, Hans-Peter Diettrich said:
>> While it certainly is a stupid (Microsoft) idea to use UTF-16 for file
>> storage, we'll have to take that into account.
>
> (16-bit codepages were designed into OS/2 and Windows NT before utf-8 even
> existed)

Right, both systems were developed by Microsoft :-]

No problem, as long as proper host/network byteorder conversion is 
applied in reading/writing such files. But in former times every 
computer manufacturer was proud of *his* clever text processing 
features, with characters stored in 6 up to 9 bit registers. In those 
times it was an essential *marketing* feature, when files could *not* be 
read by competing systems, due to different bytesize, bit-/byteorder, 
character sets, file formats etc.
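
To illustrate the byte-order point, here is a small Python sketch (Python is used only for illustration; the helper name decode_utf16 is hypothetical). It shows that the same text has two UTF-16 byte layouts, and that a reader needs the byte order mark, or outside knowledge, to pick the right one.

```python
import codecs

text = "Pascal"
le = text.encode("utf-16-le")  # low byte first
be = text.encode("utf-16-be")  # high byte first ("network" order)


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 file content, using the BOM to select the byte order."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    raise ValueError("no BOM: byte order must be known from elsewhere")


print(le[:2], be[:2])
print(decode_utf16(codecs.BOM_UTF16_BE + be))
```

Reading the little-endian bytes as big-endian (or vice versa) does not fail loudly; it silently produces different characters, which is why the conversion has to be applied deliberately.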


But times have changed, nowadays the Internet requires certain common 
standards (e.g. 8-bit bytes = octets, HTML, Unicode and more), which 
allow for data exchange across machine and country boundaries.


The lack of far-east support already forced the Japanese to invent their 
own BIOS, codepages etc. Nowadays continued use of UCS2 would have forced the 
Chinese to invent their own character encoding, which then would be used 
by more people than UCS2. Guess what would happen to the rest of the 
world, then...



Or will the Chinese government enforce such a development soon, to 
eliminate the need for continued censorship of foreign web pages, 
because legal equipment then only could present genuine Chinese pages, 
but no more HTML, JavaScript and Unicode? What would the official Chinese 
programming language look like?



DoDi



Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Hans-Peter Diettrich

Michael Schnell wrote:

> I fear that there will be code that relies on the "flawed" behavior of
> RawByteString ("it's a feature, not a bug") and using the same name with
> different behavior would break it. And a really usable DynamicString
> would not adhere to that description.


How can somebody "rely" on behaviour *stated* as undefined, or not 
working as defined?


DoDi



Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Hans-Peter Diettrich

Jonas Maebe wrote:

> I'm sorry, but I simply cannot discuss with people that, when I
> literally state "the result is undefined", think that I may actually
> have meant "the result is defined and if you change the
> implementation and/or keep it stable across compiler releases, then
> it will also conform to whatever you think that this defined
> behaviour should be". I don't have the energy nor the patience for
> that.


I also have no use for continuing such discussions.

I prefer to specify and document everything *before* coding, so that 
everybody can expect that the code will behave as specified.


DoDi



Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Hans-Peter Diettrich

Michael Schnell wrote:

> On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
>> An *efficient* implementation would be based on a single program-wide
>> string representation, with different encodings being handled only in
>> an exchange with external data sources.
>
> Yep. But it would result in severe user code portability issues (see
> above). IMHO using DynamicString at the correct locations would not be
> (noticeably) less efficient but a lot more versatile.


You suggested using "string" as UTF-16 on Windows, and UTF-8 on Linux. 
That's what I understand as a single program-wide string representation 
(not sourcecode-wide, instead program as *compiled*). Then I cannot see 
any need or use for another DynamicString type.



> I also don't think we will ever see a fix for the poor implementation of
> RawByteString (avoiding the word flaw and the suggestion of a bad
> purpose), because it would break existing user code.


Nothing can be broken, as long as the Delphi behaviour is undefined. 
Code relying on specific compiler/library bugs is bound to that 
compiler, not portable in any way.


> Regarding fpc, "correcting the flaws" and keeping the name RawByteString
> would result in incompatibility issues vs Delphi and breaking code that
> will be ported from Delphi.


Same as above. When application code works properly with strings of 
*sometimes* different static and dynamic encoding, it will not stop 
working with strings of *never* different encodings.


Of course the opposite is not true. When some code works properly (only) 
with strings of the same static and dynamic encoding, it will stop 
working when compiled with Delphi. Then the coder has to insert explicit 
checks for the dynamic encoding of *all* strings, all over his code.


Applied to FPC/Lazarus code (compiler, libraries, IDE...) this means 
that it's obviously easier to *prevent* possibly different 
static/dynamic encodings than to *check for and react to* such 
flaws throughout the entire codebase. Apart from that, any 
encoding-tolerant code will execute much slower than code without a need 
for checks and conversions everywhere.


I seriously doubt that the FPC developers ever realized these 
consequences, and the amount of time required for finding, reporting and 
fixing the bugs in all affected pieces of their code :-(


> That is why fpc would need to define an additional type name (e.g.
> "DynamicString") and encoding brand number (e.g. "CP_ANY" = $FF00) for a
> decently usable type for intermediately holding a String content.


This again would make *FPC* programs incompatible with Delphi. While 
fixing the RawByteString flaw would at least allow *compiling* FPC code 
with Delphi, the use of a different encoding value would definitely 
prevent compilation of such code with Delphi. Which is the more serious 
incompatibility?



> RawXxxString can be used for really "uncoded" data as done with
> old-style strings in a lot of applications.


Such a feature would be appreciated by many users, indeed :-)

DoDi



[fpc-devel] RFC: proper interpretation and implementation of Unicode Support

2014-11-28 Thread Hans-Peter Diettrich

In response to another thread (this should start a new thread):


"CP_NONE: this value indicates that no code page information has been
associated with the string data. The result of any explicit or implicit
operation that converts this data to another code page is undefined."


After rereading I found this definition incorrect, the entire section 
(and more) deserves a correction/clarification. The implementation may 
have to be changed accordingly.


This is my interpretation of the Delphi API around encoded AnsiStrings, 
as documented and implemented there, with added clarifications and notes 
on omissions and possible problems on non-Windows platforms.


I do not expect that the FPC developers fully agree with this 
interpretation, but I expect that all items of a revised version of the 
following draft become part of the FPC documentation, somehow.




1) CP_ACP, CP_OEM and CP_NONE are "generic" encodings (placeholders), 
applicable as *static* string encodings inside a program only, they 
never can denote a dynamic string encoding.


Note: "codepage" here means byte-based ANSI/ISO codepages, applicable to 
AnsiStrings, not Unicode codepages (BMP...). While CP_UTF16 (and BE/LE 
variations) can be used to specify a concrete (string,textfile...) 
*encoding*, they do not describe codepages (neither Ansi nor Unicode).


Note: these identifiers (names) should be used with extreme care in 
documentation/discussions. In most cases CP_ACP stands for the *actual* 
default encoding, equivalent to the value of a hypothetical *variable* 
named CP_ACP, i.e. currently (see below) should be understood as 
DefaultSystemCodePage. It should be made clear that the value of the 
CP_ACP *constant* identifier (=0) is meant and usable only in a few cases, 
like in the declaration of a string type; it may also be acceptable in 
explicit conversion requests, and to denote the encoding to use in 
file/stream I/O, where the functions replace CP_ACP by the actual 
(DefaultSystemCodePage) value internally.


Note: in compiler, library and application code a value of CP_ACP should 
be considered equal to (be mapped into) the actual 
(DefaultSystemCodePage) encoding.


2) A platform (or Unicode library) may or may not provide its own 
*generic* values (constants) for application (CP_ACP) and console 
(CP_OEM) encoding, as well as further constants for e.g. filenames.


Note: CP_ACP is zero on Windows, possibly different on other platforms 
or libraries. Thus AnsiString(0) may be different from 
AnsiString(CP_ACP). It may be required to distinguish between a named 
Pascal constant CP_ACP=0, and the value of the generic 
application/default encoding in API calls (CP_SYS?).


3) The *actually* associated codepages are defined by the platform and 
can possibly be changed by the user (admin). A program may or may not 
be allowed to change the associated codepages, either locally (process 
wide) or globally (system wide).


Note: the name "DefaultSystemCodePage" should be reserved for the 
*system* defined codepage. When this setting can be different from an 
application-wide setting, another DefaultApplicationCodePage variable 
should be added. See the comments on Modifications and Notes on 
DefaultSystemCodePage in the Wiki page!


Note: a process should determine (retrieve) the platform settings 
*before* any attempt to interpret system-provided strings (commandline, 
environment variables...). Depending on the platform, more generic 
settings may apply to specific strings, like for filenames. In all 
external API calls, the RTL is responsible for the correct encoding of 
all string arguments, as expected by the called function. This applies 
in detail to CP_ACP, when this encoding can be changed inside a program 
to something different from the external (platform...) setting.


4) A RawByteString variable, of the static encoding CP_NONE, can hold 
strings of *any* dynamic encoding. No conversion is performed when a 
string is assigned to such a variable. In the opposite direction the 
standard handling should apply, i.e. different static encodings require 
a conversion into the static target encoding.
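
The rule proposed in item 4 can be modelled in a few lines. This is only a sketch in Python, not FPC or Delphi code: TaggedBytes, assign and the CODECS table are hypothetical names invented here, and the model covers just two code pages. It describes the behaviour this draft asks for, not what any compiler currently does.

```python
from dataclasses import dataclass

# Hypothetical model: a byte string that carries its dynamic code page.
CODECS = {1252: "cp1252", 65001: "utf-8"}
CP_NONE = 0xFFFF  # static encoding of RawByteString, never a dynamic one


@dataclass(frozen=True)
class TaggedBytes:
    data: bytes
    codepage: int  # dynamic encoding of the payload


def assign(source: TaggedBytes, target_static_cp: int) -> TaggedBytes:
    """Model an assignment to a variable with the given static encoding."""
    if target_static_cp == CP_NONE:
        # RawByteString target: bytes and dynamic encoding stay untouched.
        return source
    if source.codepage == target_static_cp:
        return source
    # Typed target: convert the payload into the target's static encoding.
    text = source.data.decode(CODECS[source.codepage])
    return TaggedBytes(text.encode(CODECS[target_static_cp]), target_static_cp)


ansi = TaggedBytes("ä".encode("cp1252"), 1252)
raw = assign(ansi, CP_NONE)   # no conversion on the way in
utf8 = assign(raw, 65001)     # standard conversion on the way out
print(raw.codepage, utf8.codepage, utf8.data)
```

In this model a value never ends up with CP_NONE as its dynamic encoding, which matches item 1 above.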


Note: It's known that Delphi does not always convert a RawByteString in 
an assignment to a variable of a different type. This flaw should be 
fixed in FPC. Is the corresponding Delphi behaviour *defined* anywhere?


5) Use StringCodePage to get an actual (dynamic) string encoding. 
StringCodePage never returns one of the generic values. The dynamic 
codepage of an unassigned (empty) string is assumed (by Delphi) as the 
actually selected CP_ACP codepage for AnsiString arguments, CP_UTF16 (or 
whatever applicable) for UnicodeString arguments.


Note: while an unassigned (empty) string variable has a static encoding, 
known to the compiler, this encoding is unknown to StringCodePage. The 
overloaded Ansi/Unicode versions of StringCodePage only know about the 
basic string type (Ansi/Unicode) of their arguments, but cannot 
determine a s

Re: [fpc-devel] Windows DirectX9

2014-11-28 Thread Sven Barth
On 28.11.2014 17:52, "Adriaan van Os" wrote:
>
> Sven Barth wrote:
>
>>> Is there a fixed policy to include packages like these with FPC (or
>>> not) ? The license is Mozilla Public License 1.1.
>>
>> The policy is that we'd like to reduce the amount of packages we ship
>> with FPC directly. For this there is a package repository which can be
>> accessed using fppkg. Currently there is only a lnet package, but if you
>> take a look at the wiki entry for fppkg you might be able to add a new
>> package description file for the DirectX headers so that we can add it
>> there.
>
> So, these "extra" packages are not in svn trunk ? But on separate servers
> ? But how are they synced with compiler changes ? Who creates and maintains
> them ?

Currently quite some are (lazarus-ccr also contains some), but we want to
change this in the future so that not every package under the sun is
distributed with FPC. Also those packages are normally not that sensitive
to compiler changes (and if something needs to be adjusted it would be the
job of the maintainer to do that). The release cycle of the compiler is one
of the reasons why we want to change the current way: the compiler gets a
new release every 1 to 1 1/2 years, but some packages (e.g. MySQL) might
have a quicker iteration or might not be in "sync" with FPC's release cycle
(imagine a new MySQL version released shortly after a FPC release... Great
-.- ).
Both creators and maintainers could (and for some packages even should)
come from the community. That would be one of the benefits of fppkg: less
work for us in the long term :)

> I will note that DirectX Pascal bindings do seem to be distributed with
> Delphi.

That is no reason whatsoever to distribute them with FPC as well.
Currently everyone gets the full package of packages even if they aren't
needed. So it's more useful to have a repository where everyone can
download the packages they need. Other languages have similar approaches
(e.g. Python and Ruby if I remember correctly).

Regards,
Sven


Re: [fpc-devel] Windows DirectX9

2014-11-28 Thread Adriaan van Os

Sven Barth wrote:

>> Is there a fixed policy to include packages like these with FPC (or
>> not) ? The license is Mozilla Public License 1.1.
>
> The policy is that we'd like to reduce the amount of packages we ship
> with FPC directly. For this there is a package repository which can be
> accessed using fppkg. Currently there is only a lnet package, but if you
> take a look at the wiki entry for fppkg you might be able to add a new
> package description file for the DirectX headers so that we can add it
> there.


So, these "extra" packages are not in svn trunk ? But on separate servers ? But how are they synced 
with compiler changes ? Who creates and maintains them ?


I will note that DirectX Pascal bindings do seem to be distributed with Delphi.

Regards,

Adriaan van Os


Re: [fpc-devel] Windows DirectX9

2014-11-28 Thread Sven Barth
On 28.11.2014 16:25, "Adriaan van Os" wrote:
>
> Looking on the internet for DirectShow Pascal bindings, I came across
> <http://code.google.com/p/dspack/source/browse/#svn%2Ftrunk%2Fsrc%2FDirectX9>.
> With a few modifications, this does compile with fpc-2.6.4, e.g.
>
> adriaan% fpc -MDelphi -Twin32 DirectShow9.pas
> Free Pascal Compiler version 2.6.4 [2014/02/26] for i386
> Copyright (c) 1993-2014 by Florian Klaempfl and others
> Target OS: Win32 for i386
> Compiling DirectShow9.pas
> Compiling DirectDraw.pas
> Compiling DirectSound.pas
> Compiling DXTypes.pas
> Compiling Direct3D9.pas
> 53273 lines compiled, 2.3 sec
>
> Is there a fixed policy to include packages like these with FPC (or not)
> ? The license is Mozilla Public License 1.1.

The policy is that we'd like to reduce the amount of packages we ship with
FPC directly. For this there is a package repository which can be accessed
using fppkg. Currently there is only a lnet package, but if you take a look
at the wiki entry for fppkg you might be able to add a new package
description file for the DirectX headers so that we can add it there.

Regards
Sven


[fpc-devel] Windows DirectX9

2014-11-28 Thread Adriaan van Os
Looking on the internet for DirectShow Pascal bindings, I came across 
<http://code.google.com/p/dspack/source/browse/#svn%2Ftrunk%2Fsrc%2FDirectX9>. 
With a few modifications, this does compile with fpc-2.6.4, e.g.


adriaan% fpc -MDelphi -Twin32 DirectShow9.pas
Free Pascal Compiler version 2.6.4 [2014/02/26] for i386
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling DirectShow9.pas
Compiling DirectDraw.pas
Compiling DirectSound.pas
Compiling DXTypes.pas
Compiling Direct3D9.pas
53273 lines compiled, 2.3 sec

Is there a fixed policy to include packages like these with FPC (or not) ? The license is Mozilla 
Public License 1.1.


Regards,

Adriaan van Os


Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Michael Schnell

On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote:

> Michael Schnell wrote:
>> E.g. there are (at least) two "Code pages" for UTF-16 ("LE", and
>> "BE"), that would be worth supporting.
>
> You are confusing codepages and encodings :-(

That is why I put quotation marks around "Code pages". I used this wording 
because fpc (and Delphi ?) uses it abbreviated as "CP" in the constant 
names "CP_UTF8", "CP_UTF16" and "CP_UTF16BE" [see Jonas' post: 
"CP_UTF16 and CP_UTF16BE can be returned by StringCodePage() when called 
on a unicodestring, and that's it."]





> See it as a multi-level protocol for text processing.

Yep. I see that it is workable and I understand the (supposedly mostly 
historical) reasons. But IMHO it is not a good (i.e. crafted from the ground 
up) concept.




> It's known that the Delphi AnsiString implementation is flawed,...

And hence it's frustrating to see that fpc needs to follow for 
compatibility reasons. That is why I suggested an improved 
implementation (see -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support). 
The seriously flawed, Delphi-compatible use of the dynamic 
encoding-brand (and bytes-per-element) information (only implemented 
with RawByteString) can be left as it is, while a decent implementation 
with a new DynamicString type (CP_ANY) should be crafted.




> I see no problem in using the same names and values. Delphi documents
> clearly state: ...

I fear that there will be code that relies on the "flawed" behavior of 
RawByteString ("it's a feature, not a bug") and using the same name with 
different behavior would break it. And a really usable DynamicString 
would not adhere to that description.


-Michael


Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Michael Schnell

On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
>> The "universal paradigm" would allow for extensions (e.g. UTF-32,
>> multiple 16 Bit Code pages, an additional fully dynamic String type,
>> n-byte "un-encoded" string types), as I described in the Wiki page.
>
> Even if feasible, such arbitrary string storage can dramatically
> increase the number of implicit string conversions.


Of course it can do harm in that respect, if the user is silly enough to 
*explicitly* define variables in a brand without thinking about what he 
is doing. But this is exactly the same when he just uses the stuff 
currently offered by Delphi and fpc. If you arbitrarily define code pages 
for variables for your 8 bit ("ANSI") strings you will enforce many 
conversions.


Currently in Delphi, if you don't define special code pages, everything will 
be UTF-16. So no unnecessary conversions.


In fpc (and maybe Lazarus, as well) I suppose the approach currently in the 
works is (when not changing the default behavior by certain options):
 - when compiling for Windows, "String" is UTF-16, and the RTL and LCL 
ubiquitously use "String": So no unnecessary conversion
 - when compiling for Linux,  "String" is UTF-8, and the RTL and LCL 
ubiquitously use "String": So no unnecessary conversion, either.


If this is done in the libraries (e.g. RTL and LCL) and in user code, 
this would allow for as few conversions as possible and thus best 
performance. Here, you would need different library binaries, which might 
or might not be a problem.


But of course the portability is very questionable (including, but not 
limited to, the fact that the result of "pos" is different).
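
A small Python sketch of the "pos" difference (for illustration only; code_unit_pos is a hypothetical helper, not an RTL function): the 1-based position of a substring, counted in code units as a Pascal-style "pos" would count it, depends on whether the string is held as UTF-8 or UTF-16.

```python
def code_unit_pos(needle: str, haystack: str, encoding: str) -> int:
    """1-based position of needle in haystack, counted in code units.

    encoding must be "utf-8" or a BOM-less UTF-16 variant such as
    "utf-16-le". Returns 0 if needle does not occur.
    """
    unit = 1 if encoding == "utf-8" else 2  # bytes per code unit
    hay = haystack.encode(encoding)
    pat = needle.encode(encoding)
    i = hay.find(pat)
    # For UTF-16, skip matches that start in the middle of a code unit.
    while i != -1 and i % unit != 0:
        i = hay.find(pat, i + 1)
    return 0 if i == -1 else i // unit + 1


s = "Grüße, world"
print(code_unit_pos("w", s, "utf-8"), code_unit_pos("w", s, "utf-16-le"))
```

Code that stores such a position and later indexes with it only stays correct as long as the string representation is the same on every target.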


When (on top of this) the interfaces to libraries (including 
TStrings) are done with "DynamicString" (encoding brand "CP_ANY"), no 
additional conversions would be necessary: all other strings use the 
same encoding brand (either UTF-16 or UTF-8, depending on the OS), and 
hence the dynamic encoding of all DynamicStrings used would always be 
exactly that brand. Hence, IMHO, this would not harm at all, as the 
overhead the compiler needs to implement to just check the dynamic type 
brand and find that no conversion is necessary is extremely small.


But now the user has a choice !

 - If he does not do anything regarding the encoding brand of his 
strings, he will not notice the existence of the DynamicString Type at 
all. Not even Performance-wise. (But he might encounter portability issues.)
 - if he decides that he wants to use a dedicated encoding brand in all 
or parts of his code, he of course needs to know what he is doing. This 
can result

   - in improved portability (if decently done)
   - in improved performance (if decently done) e.g. by using one-byte 
strings for compactly storing the information and two-byte strings for 
e.g. search loops, or using the best fitting encoding in the loops in 
the user code while allowing auto-conversion when accessing the 
libraries in case the underlying OS enforces a different encoding.
   - in disastrous increase of auto-conversions and thus performance 
degradation, (if not decently done).



> An *efficient* implementation would be based on a single program-wide
> string representation, with different encodings being handled only in
> an exchange with external data sources.

Yep. But it would result in severe user code portability issues (see 
above). IMHO using DynamicString at the correct locations would not be 
(noticeably) less efficient but a lot more versatile.




> After all I have the impression that the known RawByteString flaws
> will never be fixed in Delphi, in order to encourage the users to take
> the step to UnicodeString. Now the question is whether these flaws are
> fixed in FPC, or whether Lazarus will become the first project that
> definitely requires a complete move to UnicodeString, for reliable
> operation.
>
> For best support of non-UTF-16 platforms I'd suggest to fix the flaws...

I also don't think we will ever see a fix for the poor implementation of 
RawByteString (avoiding the word flaw and the suggestion of a bad 
purpose), because it would break existing user code.
Regarding fpc, "correcting the flaws" and keeping the name RawByteString 
would result in incompatibility issues vs Delphi and breaking code that 
will be ported from Delphi.


That is why fpc would need to define an additional type name (e.g. 
"DynamicString") and encoding brand number (e.g. "CP_ANY" = $FF00) for a 
decently usable type for intermediately holding a String content. (see 
Wiki -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support 
)


RawXxxString can be used for really "uncoded" data as done with 
old-style strings in a lot of applications. Even if "seriously flawed" 
auto-conversion might be implemented in fpc for RawByteString (for 
Delphi-compatibility), the user can easily avoid it by not directly 
combining RAW and differently statically encoded strings in an operation.


-Michael




Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicodesupport"

2014-11-28 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said:
> While it certainly is a stupid (Microsoft) idea to use UTF-16 for file 
> storage, we'll have to take that into account.

(16-bit codepages were designed into OS/2 and Windows NT before utf-8 even
existed)
 


Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Thread Jonas Maebe


On 27 Nov 2014, at 17:11, Hans-Peter Diettrich  wrote:

> Such statements come only from writers that do not believe that their words 
> can be understood in various ways ;-)

I'm sorry, but I simply cannot discuss with people that, when I literally state 
"the result is undefined", think that I may actually have meant "the result is 
defined and if you change the implementation and/or keep it stable across 
compiler releases, then it will also conform to whatever you think that this 
defined behaviour should be". I don't have the energy nor the patience for that.


Jonas