Hi,

Continuing the investigation of the bootstrap failures, I found the next
problem (besides the unknown-alignment problem described above): there is
a mix-up between size_needed and epilogue_size_needed when we generate an
epilogue loop that also uses SSE moves but is not unrolled - that is
probably the cause of the failures we saw.

Please check the attached patch - full testing is not finished yet, but
bootstraps look OK, as does the arrayarg.f90 test (with sse_loop
enabled).

On 19 November 2011 05:38, Jan Hubicka <hubi...@ucw.cz> wrote:
>> Given that x86 memset/memcpy is still broken, I think we should revert
>> it for now.
>
> Well, looking into the code, the SSE alignment issues need work - the
> alignment test merely checks whether some alignment is known, not whether
> 16-byte alignment is known; that is the cause of the failures in the 32-bit
> bootstrap.  I originally convinced myself that this was safe since we shoot
> for unaligned loads/stores anyway.
>
>
> I've committed the following patch, which disables the SSE codegen and
> unbreaks the Atom bootstrap.  This seems more sensible to me given that the
> patch accumulated some good improvements on the non-SSE path as well, and we
> can return to the SSE alignment issues incrementally.  There is still a
> failure in the Fortran testcase that I am convinced is a previously latent
> issue.
>
> I will be offline tomorrow.  If there are further serious problems, just
> feel free to revert the changes and we can look into them for the next
> stage1.
>
> Honza
>
>        * i386.c (atom_cost): Disable SSE loop until alignment issues are fixed.
> Index: i386.c
> ===================================================================
> --- i386.c      (revision 181479)
> +++ i386.c      (working copy)
> @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
>    /* stringop_algs for memcpy.
>       SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant
>       if that fails.  */
> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
> -   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
> +   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
> +    {libcall, {{2048, unrolled_loop},
>                {-1, libcall}}}}},
>
>    /* stringop_algs for memset.  */
> -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}},
> -   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop},         /* Unknown alignment.  */
> +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
> +    {libcall, {{4096, unrolled_loop}, {-1, libcall}}}},
> +   {{libcall, {{1024, unrolled_loop},   /* Unknown alignment.  */
>                {-1, libcall}}},
> -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
> +    {libcall, {{2048, unrolled_loop},
>                {-1, libcall}}}}},
>    1,                                   /* scalar_stmt_cost.  */
>    1,                                   /* scalar load_cost.  */



--
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.

Attachment: memfunc_epilogue_loops.patch
Description: Binary data
