Re: Major performance problem with std.array.front()

Andrea Fontana Mon, 10 Mar 2014 07:06:35 -0700

In Italian we need Unicode too. We have several accented letters, and programming languages often don't handle UTF-8 and other encodings so well...

In D I never had any problem with this, and I work a lot on text processing.

So my question: is there any problem I'm missing in D with Unicode support, or is it just a performance problem in algorithms?

If the problem is performance in algorithms that use .front() but don't care about the meaning of the data, why don't we add a .rawFront() property, implemented only where it makes sense, plus a "fallback" like:

    auto rawFront(R)(R range)
        if ( ... isrange ... && !__traits(compiles, range.rawFront))
    {
        return range.front;
    }

This way copy() and other algorithms can use rawFront(), and it stays backward compatible with other ranges too.
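To make the idea concrete, here is a minimal sketch of such a fallback. rawFront is a hypothetical name taken from this post, not an existing Phobos function. The sketch assumes hasMember (instead of __traits(compiles, ...)) to detect a range's own rawFront member, and adds a second, equally hypothetical overload that returns the first code unit of a narrow string without decoding it:

```d
import std.range.primitives : front, isInputRange;
import std.traits : hasMember, isNarrowString;

// Hypothetical fallback: an input range that does not define its own
// rawFront member simply forwards to front.
auto rawFront(R)(R range)
if (isInputRange!R && !isNarrowString!R && !hasMember!(R, "rawFront"))
{
    return range.front;
}

// Hypothetical overload for string and wstring: return the first code
// unit as-is, without decoding it to a dchar.
auto rawFront(S)(S str)
if (isNarrowString!S)
{
    return str[0];
}

void main()
{
    string s = "\u00E8 un test";                  // starts with 'è' (U+00E8)

    static assert(is(typeof(s.front) == dchar));  // front decodes a code point
    static assert(is(typeof(s.rawFront) : char)); // rawFront yields a code unit

    assert(s.front == '\u00E8');
    assert(s.rawFront == 0xC3);                   // first UTF-8 byte of 'è'

    int[] a = [1, 2, 3];
    assert(a.rawFront == 1);                      // plain ranges fall back to front
}
```

A range that declares its own rawFront member is picked up automatically, because member lookup takes priority over the free function. More recent Phobos releases also provide std.utf.byCodeUnit, which exposes a string as a range of code units and covers much of the same need.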


But I guess I'm missing the point :)


On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:

On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote:
I'm not sure I understood the point of this (long) thread.
Is the main problem that decode() is called even when it's not needed?
I'd like to offer up one D 'user' perspective; it's just a single data point but perhaps useful. I write applications that process Arabic, and I'm thinking about converting one of those apps from Python to D, for performance reasons.
My app deals with Unicode Arabic text that is 'out there', and the Unicode™ support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary. Therefore my code has to cater to the various ways that other developers have sequenced the code points.
So, my needs as a 'user' are:
* I want to encode all incoming data immediately into Unicode, usually UTF-8, if it isn't already.
* I want to iterate over code points. I don't care about the raw data.
* When I get the length of my string it should be the number of code points.
* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points
... you get the drift.
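For reference, a minimal sketch of how these wishes map onto D as it stands, assuming only std.conv.to and std.range.walkLength. The sample text is an arbitrary four-letter Arabic word written with escapes so that the logical order of the code points is unambiguous:

```d
import std.conv : to;
import std.range : walkLength;

void main()
{
    // Four Arabic letters in logical order: seen, lam, alef, meem.
    string s = "\u0633\u0644\u0627\u0645";

    assert(s.length == 8);      // .length counts UTF-8 code units (bytes)
    assert(s.walkLength == 4);  // walking the range counts code points, O(n)

    // A foreach with an explicit dchar variable decodes one code point per step.
    size_t n;
    foreach (dchar c; s)
        ++n;
    assert(n == 4);

    // A UTF-32 copy gives per-code-point length and indexing,
    // at the cost of four bytes per code point.
    dstring d = s.to!dstring;
    assert(d.length == 4);
    assert(d[1] == '\u0644');   // the second code point, lam
}
```

So iteration already works per code point, while .length and indexing on a string stay per code unit; converting to dstring once on input is one way to get the rest of the list.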
If I want to access the raw data, which I don't, then I'm very happy to cast to ubyte etc.
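A short sketch of that raw access, assuming std.string.representation as the cast-free alternative. Both forms view the same memory and copy nothing:

```d
import std.string : representation;

void main()
{
    string s = "\u0633\u0644";                  // seen, lam

    // representation returns the UTF-8 bytes as immutable(ubyte)[].
    immutable(ubyte)[] raw = s.representation;
    assert(raw.length == 4);
    assert(raw[0] == 0xD8 && raw[1] == 0xB3);   // UTF-8 encoding of U+0633

    // An explicit cast yields the same slice of the same memory.
    auto viaCast = cast(immutable(ubyte)[]) s;
    assert(viaCast is raw);
}
```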
If encode/decode is a performance issue then perhaps there could be a cache for recently used strings where the code point representation is held.
BTW to answer a question in the thread, yes the data is left-to-right and visualised right-to-left.
