Hi,

I think a good solution in this case would be to disable this
optimization (-ftree-loop-distribute-patterns) globally. If our
implementation is a trivial loop, the optimization makes no sense
anyway: replacing one loop with a call to the same kind of loop is not
really improving anything.
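
To make it concrete, here is a minimal sketch (not the exact HelenOS
source) of the kind of trivial byte loop we are talking about. With
-O2 -ftree-loop-distribute-patterns, gcc recognizes the loop body as a
memset pattern and replaces it with a call to memset(), i.e. with a
call to the very function being compiled:

#include <stddef.h>

/* Sketch of a trivial byte-loop memset(); roughly what a generic C
 * implementation looks like. Under -ftree-loop-distribute-patterns
 * the loop below is rewritten into a call to memset(), which is this
 * very function, so instead of filling the buffer we get endless
 * recursion. */
void *memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    while (n-- > 0)
        *p++ = (unsigned char) c;

    return dst;
}

Disabling the optimization globally should then be just a matter of
adding -fno-tree-loop-distribute-patterns to the common CFLAGS.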

Given that these routines cannot be meaningfully optimized in C, we can
re-enable the optimization (per arch?) once we provide optimized asm
implementations. I have not checked the current asm implementations,
but implementing trivial byte loops in asm just for the sake of
re-enabling this optimization is IMO not the best way to go.

Note that this particular optimization is causing problems for glibc as
well [0], so we might see some generic gcc-provided solution in the
future (a memset/memcpy attribute?).
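
For reference, the per-function variant (the one discussed in the glibc
thread, and, if I read the committed fix right, roughly what it does as
well) would look something like the sketch below. The macro name is the
COMPILER_RECOGNIZED_PATTERN Jakub suggested; the attribute string is
simply the negated form of the offending gcc option:

#include <stddef.h>

/* Sketch only: the macro name follows Jakub's suggestion and the exact
 * spelling in the tree may differ. The attribute has to be visible at
 * the point where the functions are defined in order to take effect. */
#ifdef __GNUC__
#define COMPILER_RECOGNIZED_PATTERN \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define COMPILER_RECOGNIZED_PATTERN
#endif

extern void *memset(void *dst, int c, size_t n)
    COMPILER_RECOGNIZED_PATTERN;
extern void *memcpy(void *dst, const void *src, size_t n)
    COMPILER_RECOGNIZED_PATTERN;
extern void *memmove(void *dst, const void *src, size_t n)
    COMPILER_RECOGNIZED_PATTERN;

This keeps the optimization enabled everywhere else, but it has to list
every function gcc may recognize, which is one more reason I would
rather just turn the pass off in CFLAGS until the asm versions are in.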

jan

[0] http://sourceware-org.1504.n7.nabble.com/GCC-4-8-and-ftree-loop-distribute-patterns-td222259.html

On Mon, Apr 1, 2013 at 10:03 AM, Jakub Jermar <[email protected]> wrote:
> On 04/01/2013 02:08 AM, Martin Decky wrote:
>>>> builds that use -O2 -ftree-loop-distribute-patterns show the original
>>>> problem
>>>
>>> Thanks, this helped. I committed the fix to mainline,1789. Looks like
>>> there can be a problem with memmove() too.
>>
>> I don't get it. In what sense is using such a highly-specific
>> optimization attribute and sticking to the sub-optimal C implementations
>> of the mem*() functions the _best_ solution?
>>
>> I have the assembler versions of memset() for all platforms almost
>> ready, we already have memcpy() for all platforms. So what's the point
>> of reverting back to revision 1783?
>
> There are a few points to consider:
>
> - the fix is generic and only requires each prototype of these risky
> functions (memset(), memcpy(), memmove(), others?) to be decorated by the
> attribute; the attribute is exactly the one which causes/solves the
> trouble (it is even documented as such, so we are not suppressing some
> side effect of it) and can be abstracted away in some compiler-dependent
> macro, such as COMPILER_RECOGNIZED_PATTERN
>
> - on the other hand, we would have to provide asm implementations for
> triplets [boot, kernel, uspace] of 7 x (3 + ?) functions, out of which
> we only have memcpy() for all platforms and now you report you are close
> to having memset() for all; how about memmove()? Are we going to rewrite
> all functions that are 'recognized' by gcc into asm? I think this is
> highly impractical
>
> - you say sub-optimal C implementations; it seems to me that the
> compiler is very well aware of what the function is doing and will
> optimize the function heavily anyway (-O3 still applies here - just see
> the difference between a -O0 and -O3 version), only in a non-recursive
> way; it looks easier to me to write an optimized e.g. memmove() in C than
> an optimized memmove() in 7 assemblers by hand
>
> - we have had bugs in our C memmove() implementation: easy to fix in C,
> much more difficult to fix in all asm implementations; or conversely, we
> can have a latent bug in only one of the hand-written asm
> implementations (hasn't this already been the case with a very
> hard-to-find bug in our mips32 memcpy_from/to_uspace?); I see this as a
> maintenance burden
>
> - anyway, this universal fix can be considered only a stop-gap solution
> which can be easily reverted once we have all the asm implementations
> ready and believe they provide the best solution; having this solution
> in place allowed me to see another problem exhibited by the new
> toolchain: mips32, ia64 and sparc64 have an alignment issue in
> fat_lfn_size()
>
> Jakub
>