Martin Friebe wrote:
Just to make sure, all of this discussion is based on various collation
No part of this discussion is based on collation.
I am going to leave out the object question for now. I said all I can
say in earlier mails.
That's good. Thank you.
And also from your comments it appears more a question of collation
being stored with the string, substring, or even each char.
Martin, are you doing this on purpose? I mean, are you intentionally
driving me up the wall?
Seriously. Can't you forget/drop this 'collation' word?!
And, then, think a little deeper.
Here is a scenario for you:
You have multi-language text as data. Someone has asked you to search it
and see whether a certain string (in a given language) exists in it.
This search needs to be NOT case-sensitive.
How can you do this?
Is it doable if TCharacter (or whatever you call it) has no 'language'
attribute?
[Note that, here 'TCharacter' isn't necessarily an object; it might as
well be a simple record structure.]
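
As a rough illustration of that scenario, here is a minimal sketch in
Python (chosen only for brevity; it is not FPC code). It assumes the text
is stored as runs of (language, text) pairs, which is just one hypothetical
way of giving every character a 'language' attribute. The names `contains`
and `runs` are made up for this example, and `str.casefold` is
language-neutral Unicode folding, so language-specific rules are ignored.

```python
def contains(runs, needle, lang):
    """Return True if needle occurs, ignoring case, in a run tagged lang.

    runs is a list of (language_code, text) pairs; this layout is an
    assumption made for the sketch, not an existing FPC structure.
    """
    folded_needle = needle.casefold()
    return any(
        run_lang == lang and folded_needle in run_text.casefold()
        for run_lang, run_text in runs
    )


runs = [("en", "Welcome to "), ("de", "FoolStraße"), ("en", " in town")]
found_in_german = contains(runs, "foolstrasse", "de")
found_in_english = contains(runs, "foolstrasse", "en")
```

Without the language tag on each run, the search could not be restricted
to "the German parts" of the text at all.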
As found in the last mail, there is currently no standard for handling
cross-collation in any string function (that is, any string function
that could be collation-based).
1) IMHO only a few people would need this. For the majority it would be
unwanted overhead.
2) Among those few, there would be too many different expectations as to
what the "standard" should be. If FPC chose one such standard at will,
it would benefit almost no one.
You're still stuck with that wretched word 'collation'.
The best FPC could do is provide storage for something that is not
handled or obeyed in any function handling the data. This doesn't sound
desirable to me. If anyone who needs it will have to implement the
functions, then they may add their own storage for it too.
Besides, instead of storing it per char, you can use unused Unicode code
points as start/stop markers. So it can be implemented on top of a string
that stores Unicode chars (and chars only, no attributes).
Are there, in Unicode, start/stop markers that denote 'language'?
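
To make the marker idea concrete, here is a hedged sketch in Python (not
FPC code). It assumes two Private Use Area code points, U+E000 and U+E001,
as a home-made "language tag" bracket; this is purely a hypothetical
convention for the example, not something Unicode or FPC defines. The
function name `split_runs` and the default tag "und" are likewise
assumptions.

```python
# Hypothetical in-band markers from the Private Use Area; an assumption
# for this sketch, not a Unicode or FPC convention.
TAG_OPEN = "\uE000"   # starts a language tag
TAG_CLOSE = "\uE001"  # ends a language tag


def split_runs(marked, default_lang="und"):
    """Split a marked string into (language, text) runs."""
    runs = []
    lang = default_lang
    pos = 0
    while pos < len(marked):
        start = marked.find(TAG_OPEN, pos)
        if start == -1:
            # No further tags: the rest belongs to the current language.
            runs.append((lang, marked[pos:]))
            break
        if start > pos:
            runs.append((lang, marked[pos:start]))
        end = marked.index(TAG_CLOSE, start)  # raises ValueError if unclosed
        lang = marked[start + 1:end]
        pos = end + 1
    return runs


marked = "I am on " + TAG_OPEN + "de" + TAG_CLOSE + "FoolStraße"
runs = split_runs(marked)
```

Note that every string function then has to know about the markers, or
they will be counted, copied, and compared as if they were ordinary chars.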
All the others are not an intrinsic part of a char at all --they
vary by context.
Why is language intrinsic to the text? An "A" is an "A" in any language.
At best, language is intrinsic to sorting/comparing (case-sensitive or
case-insensitive) text.
Comparing is a lot more important an operation than collating --or,
rather, collation is achievable only if you can do proper comparisons.
Take this, for example:
"if SameText(SomeString, SomeOtherString) then do ..."
For this to work properly, in both 'SomeString' and 'SomeOtherString',
you need to know which language *each* character belongs to.
If you don't have that information, you might as well not have a
SameText() function in FPC.
Please note the 'case-INsensitive' keyword there.
Well, I needed an actual example where case sense differs by language
(assuming we talk about languages using the same charset, not comparing
Chinese with English).
Here is a simple example for you:
"if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ..."
Now.. how are you going to decide that SameText() function here returns
true unless you have information that the substring 'FoolStraße' is in
German?
I know that this is a very simple example --that 'ß' exists only in
German, and that you could infer that when you met that char.
But this highlights the problem --and there are times when you cannot
infer.
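
For what it is worth, the outcome of this comparison depends entirely on
which folding rule is applied. The following Python sketch (used only
because its Unicode case mapping is easy to check; it says nothing about
what FPC's SameText() does) compares the two strings with simple
lowercasing and with full, language-neutral Unicode case folding. Whether
the second result is the "right" answer is exactly what is being debated.

```python
a = "I am on FoolStrasse"
b = "I am on FoolStraße"

# Simple lowercasing leaves 'ß' untouched, so the strings differ.
same_with_lower = a.lower() == b.lower()

# Full case folding expands 'ß' to 'ss', so the strings match.
same_with_casefold = a.casefold() == b.casefold()
```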
In any case, I can write up several different algorithms for doing that.
Please do. SameText(), for one, will need all the help it can get.
What I cannot do (or what I do not want to do) is decide which of
them other people want to use.
But, isn't this just that: IOW, you're deciding what other people will
NOT want to use if you throw the 'language' attribute (for each char)
out of the window.
Or, if this is not what you have in mind, please clarify by example.
Here is another typical example:
SameText('Istanbul', 'istanbul') can only return true when both
'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.
Otherwise, the same SameText() has to return false.
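
A minimal sketch of that behaviour, again in Python rather than Pascal.
The function `same_text` and its `lang` parameter are hypothetical and
only mimic the idea of a language-aware SameText(); for simplicity the
language is given per call instead of per character. The only
language-specific rule modelled is the Turkish/Azerbaijani dotted and
dotless i.

```python
def same_text(a, b, lang="und"):
    """Case-insensitive equality under the case rules of one language.

    Hypothetical helper: lang is a language code such as "tr" or "en".
    """
    def fold(s):
        if lang in ("tr", "az"):
            # Turkic rule: capital dotted I (U+0130) pairs with 'i',
            # plain capital 'I' pairs with dotless i (U+0131).
            s = s.replace("\u0130", "i").replace("I", "\u0131")
        return s.casefold()

    return fold(a) == fold(b)


default_result = same_text("Istanbul", "istanbul")
turkish_result = same_text("Istanbul", "istanbul", lang="tr")
turkish_dotted = same_text("\u0130stanbul", "istanbul", lang="tr")
```

The same pair of strings compares equal or unequal depending only on the
language information passed in.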
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel