On 11/16/2011 05:24 PM, Marco van de Voort wrote:

> The original proposal was like (A) but only for base unicode encodings
> (utf8/16 and maybe 32), but went down due to either excess conversions and
> need for overloading.  The amount of overloading for the current 3-4
> stringtypes is already a bit much.  (short/ansi/wide/unicodestring)
> ...

This is exactly what I meant to say. (It's a viable definition, but...)

> (B) was a counter proposal floated by Florian. The cons were pretty much
> that you had to guard every encoding sensitive routine (e.g.  every API/OS
> call) to enforce the string contained the encoding you expected.  Combining
> one and two byte types also cast doubt on the [] operator's performance.
> ...

This is exactly what I meant to say. (It's a viable definition, but...)

> Then Yury proposed to combine A and B, in retrospect a bit like the current
> Delphi implementation but with one and two byte encodings in one type.
> ...

Yep. But IMHO the wording I proposed as (C), such as "object-alike", leads to a more "understandable" definition. In effect it provides identical (or at least very similar) results and leads to an (at most) identical implementation, as most of the differences might be considered "implementation dependent", i.e. not defined by the pure, documented definition of the behavior (such as what happens with "intersexual" variables).

> Note that the Delphi2009 definition is theoretically capable of combining one
> and two bytes in one type (like Yury's).

As I don't have such a Delphi, please help me to understand:

Is there a general type dedicated to holding any encoding (be it ANSIxyz, UTF-8 or UTF-16)?

Of course, when assigning something to a "strictly encoded" String (the type denotes the encoding), what is supposed to happen is clear and obvious. Whether the type name or the dynamic encoding of the target (even if Length=0) is used to decide about a conversion is an "implementation detail".

Is there a clear definition of what happens if the "general" string type is the target? Here, IMHO, it would be very hard to understand if the history of the target variable (i.e. whether a string of some encoding has been assigned to it before) decided about a conversion. IMHO a General string type needs to be handled as fully dynamically encoded and thus, as a target, always needs to take the source's encoding.

Such an "assignment" can happen with ":=" and with function calls. With function calls there are "value" and "var" parameters. All of these should behave identically; any other behavior would be very hard to understand.

And on top of this: what is the type "String"? Of course the general String type would be an obvious choice, but (depending on the implementation) this might result in worse performance in certain use cases, and thus some strict (specifically encoded) type could be chosen. (In fact I will never again use "String" in any project, but a proprietary type defined in some central unit, so that at any time I can switch centrally to some specific string type.)

> Embarcadero kept the two types separate,

This makes a decently clear definition of the behavior (from a user's view) rather complicated.

> - backwards compatibility (and thus the hurdle to upgrade)

This does not seem to have worked. Everybody I asked who had migrated a large project to the new strings was very unhappy.

> Explain parent-child for explicitly this context. This kind of stuff is
> what I meant with self contained. Don't use terms that you don't fully
> describe elsewhere.

Sorry that I seemingly failed in my intention to clarify what I meant by stating the similarity to the objects' parent-child relationship.

I just meant that a "General" (or "Raw") string type needs to exist that can hold any encoding and needs no conversion when a strictly encoded variable is assigned to it (via ":=", value parameter or var parameter). Similar to a parent object, it "is" any strictly encoded string type, so that when it is used as a formal parameter of a function, it can take any strict string type (and of course the general type, too) without conversion. Similar to an object's runtime type (checked via "is" and "as"), the encoding of a General string can be detected and handled when appropriate (e.g. combining it with another (strict or general) string or assigning it to a strict string variable might require a conversion).

>> the RAW string type and the types supposed to hold a specific encoding.
> Explain RAW.

See above. "General" or "not Strict" would be more appropriate. (I took the term "Raw" from other recent discussions on the issue.)

> Yes, I never really considered (B) a workable solution. It would break
> existing code, and the ways to deal with the other problems was hackish at
> best.

Yep. But I was told by unhappy coders that the new Delphi way breaks a lot of existing code as well. So a new FPC way has a chance of being better. :) This might (or might not) be a way to do this.

> I think the A-B hybrid is better than either A or B. And that is what is
> being implemented.

Yep. Only the definition of its behavior is of course a lot more complex. In fact with "C" I tried (and failed) to find a proper basic definition of exactly this.

> Then describe how that should work. What should happen if I pass such marked
> raw string to a function that wants encoding<y>?

I hope I did this some lines up. But better see below.

>> So IMHO the Parent-Child (alike) relationship between RAW and any other
>> new string type is quite obvious.
> No it is not. And you don't make the situation any clearer by writing yet
> another message without a concrete description (either using specs or with
> examples), and not defining RAW and exactly how the parent-child relation
> works.

I hope I did this some lines up. But better see below.

> I've been doing OOP for 15 years now, but I've no idea whatsoever.

Obviously it was not as good an idea as I thought to compare the relation between a single "General" and multiple "strictly encoded" string types with that between a Parent object and multiple Child objects. I am not at all against dropping this analogy and just using a "self-contained" definition. Moreover, I think we agree on dropping the term "Raw" for the general string type.

So the wording could be similar to:
- There is a General String type that can hold any encoding (and any width of the code elements).
- There are lots of Strict String types that are supposed to hold strings in a predefined encoding (somebody else might describe in detail how these are defined).
- There are the appropriate single-character types corresponding to all of the above string types.
- A variable of the General string type or the General character type only has a defined encoding if it has previously been the target of an assignment of a non-empty string or a character with a defined encoding.
- If just using strict types, the conversion rules are obvious.
- If assigning a value to a General string or character (via ":=", value or var parameter), no conversion is done.
- There are means to detect the actual encoding of a General String or Character variable.
- If combining any Strings/Chars with General Strings or Chars, the encoding of the General ones is fetched from their embedded dynamic encoding information to decide upon conversion.
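
To make the rules above a bit more tangible, here is a minimal sketch of them as a small model. It is written in Python rather than Pascal only so that it runs without any of the compiler support under discussion; it is not FPC or Delphi code. The names Encoded, StrictVar, GeneralVar and concat are made up for this illustration. Two points are my assumptions, because the wording above leaves them open: assigning an empty string leaves a General variable without a defined encoding, and combining two strings yields the encoding of the left operand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Encoded:
    """A string payload together with the encoding it is stored in."""
    data: bytes
    encoding: str

    def converted_to(self, encoding: str) -> "Encoded":
        # Convert only if the encodings differ; otherwise reuse the payload.
        if encoding == self.encoding:
            return self
        text = self.data.decode(self.encoding)
        return Encoded(text.encode(encoding), encoding)


class StrictVar:
    """Variable of a Strict string type: the type fixes the encoding."""

    def __init__(self, encoding: str) -> None:
        self.encoding = encoding
        self.value = Encoded(b"", encoding)

    def assign(self, source: Encoded) -> None:
        # The target's type decides; the source is converted if necessary.
        self.value = source.converted_to(self.encoding)


class GeneralVar:
    """Variable of the General string type: it takes the source's encoding."""

    def __init__(self) -> None:
        self.value: Optional[Encoded] = None  # no defined encoding yet

    @property
    def encoding(self) -> Optional[str]:
        # The "means to detect the actual encoding" from the list above.
        return None if self.value is None else self.value.encoding

    def assign(self, source: Encoded) -> None:
        # No conversion, and no dependence on what was assigned before.
        # Assumption: an empty source leaves the encoding undefined.
        self.value = source if source.data else None


def concat(left: Encoded, right: Encoded) -> Encoded:
    # Assumption: the result takes the left operand's encoding, so the
    # right operand is converted only if its dynamic encoding differs.
    return Encoded(left.data + right.converted_to(left.encoding).data,
                   left.encoding)


ansi = Encoded("Grüße".encode("latin-1"), "latin-1")
utf8 = Encoded("Grüße".encode("utf-8"), "utf-8")

general = GeneralVar()
general.assign(ansi)   # general now holds latin-1, bytes untouched
general.assign(utf8)   # history is ignored: general now holds utf-8

wide = StrictVar("utf-16-le")
wide.assign(general.value)  # converted, because the target is strict
```

In this model the same assign rule would apply to ":=", value parameters and var parameters alike, which is the whole point of the definition: the target's history never decides about a conversion.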

I hope this is more like what you'd like to see.

Note that the recent discussion about how variable passing with RAW / not RAW strings is implemented might be decided by such a definition.

Note that Delphi seemingly introduced the encoding values $0000 for "None"/"to be assigned" and $FFFF for "Raw". This allows a string variable to be "General" or "Raw" with different meanings. How can this be used to implement proper handling of General variables that hold a certain encoding but are still strictly General, so that they get a different encoding with the next assignment? I don't see whether this helps implementing the above definition, or whether it contradicts it and/or introduces nasty ambiguity.

> Of course you can try to create some object based stringtype like C++, but
> then you will have to deal with all its problems, and the fact that Pascal's
> object model is not the same as C++'s. Also stuff that we take for granted
> (like copy-on-write) would be hard.

Of course I agree.

Thanks,
-Michael
