Re: Major performance problem with std.array.front()

Marc Schütz Sun, 09 Mar 2014 07:15:51 -0700

On Friday, 7 March 2014 at 23:13:50 UTC, H. S. Teoh wrote:

On Fri, Mar 07, 2014 at 10:35:46PM +0000, Sarath Kodali wrote:
On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote:
>On Friday, 7 March 2014 at 19:57:38 UTC, Andrei Alexandrescu
>wrote:
[...]
>>Clearly one might argue that their app has no business dealing
>>with diacriticals or Asian characters. But that's the typical
>>provincial view that marred many languages' approach to UTF and
>>internationalization.
>
>So is yours, if you think that making everything magically a dchar
>is going to solve all problems.
>
>The TDPL example only showcases the problem. Yes, it works with
>Swedish. Now try it again with Sanskrit.
+1
In Indian languages, a character consists of one or more UNICODE
code points. For example, in Sanskrit "ddhrya"
http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg
consists of 7 UNICODE code points. So to search for this char I have
to use string search.
[...]
That's what I've been arguing for. The most general form of character
searching in Unicode requires substring searching, and similarly many
character-based operations on Unicode strings are effectively
substring-based operations, because said "character" may be a multibyte
code point, or, in your case, multiple code points. Since that's the
case, we might as well just forget about the distinction between
"character" and "string", and treat all such operations as substring
operations (even if the operand is supposedly "just 1 character long").
This would allow us to get rid of the hackish auto-decoding of narrow
strings, and thus eliminate the needless overhead of always decoding.

That won't work, because your needle might be in a different normalization form than your haystack, thus a byte-by-byte comparison will not be able to find it.
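
To make the objection concrete, here is a minimal sketch (not taken from Phobos or from anyone's proposal in this thread). It assumes `std.uni.normalize` and `std.algorithm.searching.canFind` as found in current Phobos; the helper names `containsRaw` and `containsNormalized` are hypothetical and exist only for illustration. It shows that a plain substring search does find a multi-code-point "character" such as "ddhrya" when both sides use the same encoding, but misses a needle whose normalization form differs from the haystack's until both are normalized to the same form:

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : NFC, normalize;

/// Hypothetical helper: plain substring search, no normalization applied.
bool containsRaw(string haystack, string needle)
{
    return haystack.canFind(needle);
}

/// Hypothetical helper: bring both sides to NFC before searching.
/// Normalizing on every call is costly; this is only a sketch.
bool containsNormalized(string haystack, string needle)
{
    return normalize!NFC(haystack).canFind(normalize!NFC(needle));
}

void main()
{
    // "ddhrya" written out as its 7 code points:
    // da, virama, dha, virama, ra, virama, ya.
    string ddhrya = "\u0926\u094D\u0927\u094D\u0930\u094D\u092F";
    assert(ddhrya.walkLength == 7); // code points (string ranges auto-decode)
    assert(ddhrya.length == 21);    // UTF-8 code units, 3 per code point

    // Same encoding on both sides: substring search is enough.
    assert(containsRaw("x" ~ ddhrya ~ "y", ddhrya));

    // Different normalization forms: the haystack spells e-acute as
    // 'e' + combining acute (NFD), the needle uses the precomposed
    // code point U+00E9 (NFC).
    string haystack = "cafe\u0301";
    string needle = "\u00E9";
    assert(!containsRaw(haystack, needle));      // raw search misses it
    assert(containsNormalized(haystack, needle)); // found after normalizing

    writeln("ok");
}
```

Whether normalization should happen once at the input boundary or on every search is a separate design question; the sketch only illustrates why an un-normalized comparison is not sufficient in general.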
