PING**1.2

Yet another slightly updated patch is attached. Compared to the previous
version, it now has specializations for sizes 12 and 16 as well. For the
real(10) benchmark, with the previous v3 patch (please disregard the
absolute values in the post quoted below; they were wrong due to a
bug):

  Unformatted sequential write/read performance test
 Record size           Write MB/s                 Read MB/s
 ==========================================================
           4   80.578833140738340        127.33074266188656
           8   137.61682156650559        184.49033790407984
          16   202.72871312800621        275.98801561061816
          32   275.33538767460863        413.43956672052303
          64   341.04488670485119        555.13744525826564
         128   384.77917051919820        671.44655208024699
         256   410.97208129045833        763.97660513918527
         512   425.76619227779878        826.41086693364593
        1024   430.77035999730009        840.30757120448550
        2048   438.30318459339475        885.50033810296600
        4096   455.79422809097599        919.78265920652086
        8192   465.74499205886326        959.06963983370918
       16384   472.48133493971142        991.11244162081744
       32768   471.00024619567603        1015.7428144049615
       65536   474.91235280949985        1021.2150519080892
      131072   475.18664487440901        1006.3701982554830
      262144   478.00435092846868        985.17141300594039
      524288   476.72837201590363        991.74226579987987

With the new v4 patch:

 Unformatted sequential write/read performance test
 Record size           Write MB/s                 Read MB/s
 ==========================================================
           4   87.353141847504133        145.09410391177835
           8   166.95093628370549        223.60877830048437
          16   272.20937208187746        364.91673986840277
          32   415.26016354252715        599.41744252952310
          64   592.97676703528009        900.53345964312450
         128   748.27218547147686        1189.7131837787238
         256   874.83098506714384        1561.3649529261234
         512   935.69494481144284        1823.1760143164879
        1024   983.51689491813215        1931.8773088107300
        2048   1009.5491761651396        1971.6978586130062
        4096   1115.5862027658552        2119.4151169997808
        8192   1172.9400229568287        2184.1403983641089
       16384   1222.6659284153168        2258.5490449229878
       32768   1242.2417626697293        2251.8159046253918
       65536   1227.9967555594396        2313.4106672387143
      131072   1204.4295656544052        2129.1309150039478
      262144   1135.7905614378458        2154.7146453789856
      524288   1075.5769074402640        2170.5151501933169
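
For readers without the attachment at hand, here is a rough sketch of
what size-specialized byte reversal with a generic fallback can look
like. This is NOT the code from the patch; the function name
bswap_element and its layout are hypothetical, and it assumes a
GCC-compatible compiler that provides the __builtin_bswap16/32/64
builtins. How the real patch handles sizes 12 and 16 is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not taken from the attached patch: copy one
   element of SIZE bytes from SRC to DST with its byte order reversed.
   SRC and DST must not overlap.  */
static void
bswap_element (void *dst, const void *src, size_t size)
{
  switch (size)
    {
    case 2:
      {
        uint16_t v;
        memcpy (&v, src, sizeof (v));
        v = __builtin_bswap16 (v);
        memcpy (dst, &v, sizeof (v));
      }
      break;
    case 4:
      {
        uint32_t v;
        memcpy (&v, src, sizeof (v));
        v = __builtin_bswap32 (v);
        memcpy (dst, &v, sizeof (v));
      }
      break;
    case 8:
      {
        uint64_t v;
        memcpy (&v, src, sizeof (v));
        v = __builtin_bswap64 (v);
        memcpy (dst, &v, sizeof (v));
      }
      break;
    default:
      {
        /* Generic fallback: reverse the bytes one at a time.  */
        const unsigned char *s = src;
        unsigned char *d = dst;
        for (size_t i = 0; i < size; i++)
          d[i] = s[size - 1 - i];
      }
      break;
    }
}
```

The fixed-size cases give the compiler a constant-size memcpy plus a
single swap instruction, while the default case works for any element
size at the cost of a byte loop.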


On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist
<blomqvist.ja...@gmail.com> wrote:
> PING.
>
> Slightly updated patch attached, which further improves the generic
> size fallback that is used when the element size is not 2/4/8 bytes.
> Changing the us_perf benchmark to use real(10), with the v2 patch the
> performance is:
>
>  Unformatted sequential write/read performance test
>  Record size           Write MB/s                 Read MB/s
>  ==========================================================
>            4   59.028550429522085        86.019754350948787
>            8   79.028327063130590        95.803502000733374
>           16   99.980457395413296        138.68367462874946
>           32   122.56886206338788        180.05609910155042
>           64   152.00478266944486        212.69931319407567
>          128   197.74137934940202        235.19728791956828
>          256   155.36245780017779        244.60578379215929
>          512   157.13385845966246        245.07467397691480
>         1024   177.26553799130201        260.44908357795623
>         2048   208.22852888945587        260.21587143113527
>         4096   222.88410474980634        262.66162209490591
>         8192   226.71167580652920        265.81191407123663
>        16384   206.51818241747065        263.59395165591724
>        32768   230.18707026455866        265.88990325026526
>        65536   229.19783089391504        268.04485112932684
>       131072   231.12215662044449        267.40543904427710
>       262144   230.72012123598142        267.60086931504122
>       524288   230.48959460456055        268.78750211303725
>
> With the new v3 patch I get
>
>  Unformatted sequential write/read performance test
>  Record size           Write MB/s                 Read MB/s
>  ==========================================================
>            4   59.779061121239941        92.777125264010024
>            8   92.727504266051341        126.64775563782673
>           16   128.94793911163904        184.69194300482837
>           32   169.78916283536847        267.06752001266767
>           64   209.50296476919556        341.60515130910238
>          128   236.36709738360679        416.73212655882151
>          256   251.79029695383340        465.46804746749740
>          512   259.62269939828633        500.87346060356265
>         1024   265.08842337586458        508.95530627428275
>         2048   268.71795530051884        532.12211365683640
>         4096   280.86546884821030        546.88907054369884
>         8192   286.96049684823578        569.60958187426183
>        16384   292.04368984868103        608.11503416324865
>        32768   292.96677387959392        629.80651297065833
>        65536   291.69098580137114        624.27103478079641
>       131072   292.75666234956418        605.99766136491496
>       262144   291.35520038228975        611.59061455535834
>       524288   292.15446100501691        623.76232623081580
>
>
> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
> <blomqvist.ja...@gmail.com> wrote:
>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <sch...@linux-m68k.org> wrote:
>>>> Janne Blomqvist <blomqvist.ja...@gmail.com> writes:
>>>>
>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>> index c8ecc3a..bf2250a 100644
>>>>> --- a/libgfortran/io/file_pos.c
>>>>> +++ b/libgfortran/io/file_pos.c
>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>>       }
>>>>>        else
>>>>>       {
>>>>> +       uint32_t u32;
>>>>> +       uint64_t u64;
>>>>>         switch (length)
>>>>>           {
>>>>>           case sizeof(GFC_INTEGER_4):
>>>>> -           reverse_memcpy (&m4, p, sizeof (m4));
>>>>> +           memcpy (&u32, p, sizeof (u32));
>>>>> +           u32 = __builtin_bswap32 (u32);
>>>>> +           m4 = *(GFC_INTEGER_4*)&u32;
>>>>
>>>> Isn't that an aliasing violation?
>>>
>>> It looks like one.  Why not simply do
>>>
>>>    m4 = (GFC_INTEGER_4) u32;
>>>
>>> ?  I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>
>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>> the range of the signed variable, the result is
>> implementation-defined. Though I suppose the sensible
>> "implementation-defined behavior" in this case on a two's complement
>> target is to just do a bitwise copy.
>>
>> Anyway, to be really safe one could use memcpy instead; the compiler
>> optimizes small fixed size memcpy's just fine. Updated patch attached.
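
To illustrate the memcpy approach described above, a minimal sketch
follows. The helper name load_swapped_i32 is hypothetical and is not a
libgfortran function; the sketch assumes a GCC-compatible compiler for
__builtin_bswap32 and uses int32_t in place of GFC_INTEGER_4, which the
message above says is a typedef for it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of libgfortran: read a 32-bit value
   from P in the opposite byte order and return it as a signed integer,
   without the pointer cast that raised the aliasing concern and without
   an unsigned-to-signed conversion.  */
static int32_t
load_swapped_i32 (const void *p)
{
  uint32_t u32;
  int32_t m4;

  memcpy (&u32, p, sizeof (u32));   /* load the raw bytes */
  u32 = __builtin_bswap32 (u32);    /* reverse the byte order */
  memcpy (&m4, &u32, sizeof (m4));  /* bitwise copy into the signed type */
  return m4;
}
```

Both memcpy calls have a small constant size, so an optimizing compiler
would normally turn each one into a plain load or register move.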
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist

Attachment: bswap4.diff
Description: Binary data
