On Wed, Mar 20, 2013 at 1:15 PM, Erik Schnetter <
[email protected]> wrote:

> On Wed, Mar 20, 2013 at 12:06 PM, Pekka Jääskeläinen <
> [email protected]> wrote:
>
>> On 03/20/2013 05:49 PM, Erik Schnetter wrote:
>>
>>> With gcc, memcpy is completely optimized away. With clang as well -- I
>>> am using memcpy internally e.g. to convert doubles into integers to
>>> access certain bits, and this translates to no instruction at all,
>>> things are just kept in the same register. I would therefore hope that
>>> the pocl->vecmathlib transition would be similarly ideal.
>>>
>>
>> Let's hope so. Anyway, the generic type version is useful to ensure
>> portability to other targets. After all, the most important thing is to
>> have an inlineable math library. Other optimizations are secondary
>> at this point.
>>
>> I cannot compile vecmathlib separately to produce the 'test' binary:
>>
>> [  2%] Building CXX object CMakeFiles/bench.dir/bench.cc.o
>> make[2]: clang++-mp-3.3: Command not found
>> make[2]: *** [CMakeFiles/bench.dir/bench.cc.o] Error 127
>> make[1]: *** [CMakeFiles/bench.dir/all] Error 2
>>
>
> There is a file CMakeLists.txt that hard-codes (didn't use autoconf here)
> the clang compiler name and compiler options. If you modify these manually,
> you should be able to build.
>
>
>> But I know it uses the SSE2 optimized header, as I inserted a
>> #warning where it includes them. I do not have AVX.
>>
>> cat /proc/cpuinfo
>> ...
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
>> nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>> nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
>> ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat dtherm
>> tpr_shadow vnmi flexpriority ept vpid
>
>
> On this system, round() should translate to a single machine instruction.
> With optimization, clang should inline all function calls, and there should
> not be a long string of calls.
>
> I'm continuing to investigate.
>

I think I found the problems. The C++ compiler does not know that long and
double are to be supported, since the C++ code does not include types.h.
Therefore, only round(float) is generated, and not round(double);
presumably, round(double) is then taken from somewhere else. Also, the C++
compiler doesn't seem to see the optimization settings and therefore
produces unoptimized code. As a result, the calls to memcpy remain, and the
call chain within VML is not inlined.
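
For reference, the memcpy idiom discussed above looks roughly like the
sketch below. This is an illustration only: the helper names as_bits and
sign_bit are made up for this example and are not taken from vecmathlib,
and the sketch assumes a 64-bit IEEE 754 double. With optimization
enabled, compilers such as gcc and clang typically reduce the memcpy to a
register move; without optimization, an actual call to memcpy can remain.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not vecmathlib's API): reinterpret the bits of a
// double as a 64-bit unsigned integer without violating aliasing rules.
static inline std::uint64_t as_bits(double x) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "this sketch assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // copy the object representation
  return bits;
}

// Hypothetical helper: read the sign bit (bit 63), assuming IEEE 754 layout.
static inline bool sign_bit(double x) {
  return (as_bits(x) >> 63) != 0;
}
```

Comparing the assembler output of such a function at -O0 and -O2 is a
quick way to check whether the optimization settings reach the compiler.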

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]