On 6/17/13 6:14 PM, TommiT wrote:
On Monday, 17 June 2013 at 17:43:25 UTC, Andrei Alexandrescu wrote:
On 6/17/13 1:36 PM, TommiT wrote:
std.algorithm.copy adheres to the specification of the C++ standard
library function std::copy. That specification states that the source
and target ranges may not overlap. Yet, all the major C++ standard
libraries (libc++, libstdc++, Dinkum C++ Library) implement std::copy so
that it becomes a memmove if the source and target element types are the
same and the element type satisfies std::is_trivially_copy_assignable.
Thus, the current C++ specification seems to be too strict.

The current std.algorithm.copy implementation doesn't get optimized into
a memmove when it would be safe to do so. It really should be, because a
memmove is the fastest way to get the job done, and it allows the ranges
to overlap. The documentation should also be changed to note this
relaxation.
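
Here's a minimal sketch of what such a fast path could look like (the
name copyFast and the exact template constraints are my assumptions,
not Phobos code); it gates the memmove on element types with no
elaborate copy semantics:

import core.stdc.string : memmove;
import std.traits : hasElaborateAssign, hasElaborateCopyConstructor;

// Hypothetical fast path for copy: when assigning a T is trivial, a
// single memmove is both the fastest option and safe for overlapping
// source and target ranges.
T[] copyFast(T)(const(T)[] source, T[] target)
    if (!hasElaborateAssign!T && !hasElaborateCopyConstructor!T)
{
    assert(target.length >= source.length);
    memmove(target.ptr, source.ptr, source.length * T.sizeof);
    // Mirror std.algorithm.copy: return the unfilled tail of target.
    return target[source.length .. $];
}

unittest
{
    int[] buf = [1, 2, 3, 4, 5];
    // Overlapping copy: shift the first four elements right by one.
    copyFast(buf[0 .. 4], buf[1 .. 5]);
    assert(buf == [1, 1, 2, 3, 4]);
}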

There are some benchmarks of memmove vs. other methods here:
http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly

Please file this in Bugzilla. FWIW, I did notice that Duff's device can
sometimes beat memmove, but that was some 10 years ago.

Andrei

Another thing that comes to mind is that...
R find(alias pred = "a == b", R, E)(R haystack, E needle)
...could be optimized to call the C function memchr instead whenever
pred is "a == b", R is an array of E's, and E is char, byte, or ubyte.
It's a bit tricky though: quite recently a bug was found in this very
optimization in the standard library that ships with Visual Studio, as
described in this interesting lecture:

http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n
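
As a rough sketch of the idea (findFast is a hypothetical name, and
this is only the single-byte array case, not a full Phobos patch):

import core.stdc.string : memchr;

// Hypothetical specialization of find for pred "a == b" on arrays of
// single-byte elements: let the C library do the scanning.
T[] findFast(T)(T[] haystack, T needle)
    if (T.sizeof == 1)
{
    auto p = cast(T*) memchr(haystack.ptr, cast(ubyte) needle,
                             haystack.length);
    if (p is null)
        return haystack[$ .. $]; // not found: empty range, like find
    return haystack[p - haystack.ptr .. $];
}

unittest
{
    ubyte[] data = [10, 20, 30, 40];
    auto hit = findFast(data, cast(ubyte) 30);
    assert(hit.length == 2 && hit[0] == 30);
    assert(findFast(data, cast(ubyte) 99).length == 0);
}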

More generally, what I'm saying is that we should probably just go
through some good C++ standard library implementation and copy all the
optimizations they have done. I'm sure there are many more.

In my experience, this is an area that ends up being deferred to the
compiler (via a builtin, or via direct use of mem* that the compiler is
smart enough to recognize and implement itself) to let it generate the
optimal code. The thresholds for inline vs. function call and the myriad
optimization techniques end up being extremely platform specific, and
duplicating all that complication essentially guarantees that you're not
going to do as good a job. As an example, there are people at Intel who
are paid to do little more than improve the code generation in GCC, and
they often submit new versions of the mem* routines both to GCC and to
glibc. It's actually amazing how often that code is tweaked, both for
older generations of CPUs and to add new ones.
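
As an illustration (a sketch, not verified compiler output): GDC and LDC
build on the GCC and LLVM backends, whose loop-idiom passes can
recognize a naive copy loop like the one below and, with bounds checks
disabled, replace it with a call to the platform-tuned memcpy, so the
hand-optimized library routine gets used without the source ever
mentioning it.

// A naive element-by-element copy over non-overlapping buffers. An
// optimizing backend can spot this loop idiom and lower the whole
// loop to a single memcpy call.
void naiveCopy(const(ubyte)[] src, ubyte[] dst)
{
    assert(dst.length >= src.length);
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
}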

- Brad
