Thanks to all!

I am not sure I fully understand why it behaves the way it does, but I
changed the output format from ASCII to FLOAT and the run time was
reduced considerably (by roughly 90%).

I may do some more tests in the future. Thanks to all!

Germán

2017-05-10 19:41 GMT-03:00 Ian Ashdown <ian_ashd...@helios32.com>:

> > You could be suffering from memory allocation costs, not so much the
> > operations themselves.  The rmtxop program uses doubles and 3 components
> > per matrix entry, so that's (3 x 2048 x 2305 x 8 bytes) or 108 MBytes for
> > each of your matrices.  When you multiply one such matrix by a sky vector,
> > you only have the additional memory needed by the vector (54 KBytes).
> > When you add matrices, rmtxop keeps two of them in memory at a time, or
> > 216 MBytes of memory.  That's not a lot for most PCs these days, but the
> > allocation and freeing of that much space may take some time if malloc is
> > not efficient.
>
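The sizes quoted above are easy to sanity-check. A short Python sketch (illustrative only, not Radiance code), using the 2048 x 2305 dimensions from this thread:

```python
# Back-of-the-envelope check of the rmtxop matrix sizes quoted above:
# 3 components per entry, 8-byte doubles, 2048 x 2305 entries.
components = 3
entry_bytes = 8            # sizeof(double)
rows, cols = 2048, 2305

matrix_bytes = components * rows * cols * entry_bytes
vector_bytes = components * cols * entry_bytes  # one sky vector

print(f"one matrix:   {matrix_bytes / 2**20:.0f} MBytes")  # ~108
print(f"two matrices: {2 * matrix_bytes / 2**20:.0f} MBytes")  # ~216
print(f"sky vector:   {vector_bytes / 2**10:.0f} KBytes")  # ~54
```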
> The problem here is not the amount of memory, but the CPU’s access to
> it. When the CPU is accessing the arrays, the data is stored in a hierarchy
> of caches. For a modern Intel Core i7, for example, there are typically
> four L2 caches of 256 KB each and a slower L3 cache of 8 MB that is shared
> by the CPU cores.
>
> If the arrays can be stored in the L2 cache, the processor can usually run
> at full speed. If not, then the CPU will typically have to wait while the
> data is retrieved from the slower L3 cache.
>
> The caches are on-chip. If the arrays exceed the L3 cache capacity, then
> the data will need to be retrieved from the much slower main memory and
> transferred over the memory bus. With 216 MB of array data to contend with,
> this is most likely the culprit.
>
> Performance optimization typically involves arranging the array data in
> memory such that it can be loaded in cache lines, organizing the array
> stride, splitting the arrays into subarrays for multithreaded processing,
> and so on. However, these are mostly specific to the processor family. (For Intel
> CPUs, SSE and AVX instructions are also available to improve parallelism.
> Optimizing compilers can help here, but hand coding may be needed for
> optimal performance on specific processors.)
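The stride point can be demonstrated with a rough Python/NumPy sketch (illustrative, not Radiance code): summing a row-major array along its rows walks memory contiguously and uses full cache lines, while summing along columns jumps a whole row stride on every access. Absolute timings vary by machine, but the column-wise pass is typically slower once the array no longer fits in cache.

```python
import time
import numpy as np

# Same shape as the matrices discussed in this thread; C (row-major) order.
a = np.random.rand(2048, 2305)

t0 = time.perf_counter()
row_total = sum(a[i, :].sum() for i in range(a.shape[0]))  # contiguous walk
t1 = time.perf_counter()
col_total = sum(a[:, j].sum() for j in range(a.shape[1]))  # strided walk
t2 = time.perf_counter()

print(f"row-wise:    {t1 - t0:.3f} s")
print(f"column-wise: {t2 - t1:.3f} s")
# Both traversals compute the same total; only the access pattern differs.
```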
>
> If the arrays are stored in ASCII rather than binary, you will typically
> see a performance hit of several hundred times as the CPU spends most of
> its time parsing the strings into floating-point data.
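The parsing overhead described above can be shown with a minimal Python sketch (illustrative, not Radiance code) that stores the same values as ASCII text and as packed binary doubles, then times the read-back. The exact ratio depends on the machine and runtime, but the text parse is reliably the slower path.

```python
import struct
import time

N = 500_000
values = [i * 0.001 for i in range(N)]

# The same data in both encodings.
ascii_blob = " ".join(repr(v) for v in values).encode()
binary_blob = struct.pack(f"<{N}d", *values)

t0 = time.perf_counter()
parsed_ascii = [float(tok) for tok in ascii_blob.split()]   # string parsing
t1 = time.perf_counter()
parsed_binary = list(struct.unpack(f"<{N}d", binary_blob))  # raw copy
t2 = time.perf_counter()

print(f"ASCII parse:   {t1 - t0:.3f} s")
print(f"binary unpack: {t2 - t1:.3f} s")
```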
>
> Ian Ashdown, P. Eng. (Ret.), FIES
>
> Senior Scientist
>
> SunTracker Technologies Ltd.
>
> www.suntrackertech.com
>
> _______________________________________________
> Radiance-dev mailing list
> Radiance-dev@radiance-online.org
> https://www.radiance-online.org/mailman/listinfo/radiance-dev
>
>