Re: Memset/memcpy patch

2012-09-26 Thread Michael Zolotukhin
Hi HJ,
Last year's patch is almost useless by now: rebasing it would take about
as much effort as rewriting it from scratch. I had hoped to prepare a
patch covering at least a subset of the cases, but unfortunately I
haven't had time even for that yet.

How much time do we have for it now? When does stage1 finish?

Thanks, Michael

On 26 September 2012 19:00, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 31, 2012 at 1:54 AM, Jan Hubicka hubi...@ucw.cz wrote:
 On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Any update?
 
  I will look into it today, but anyway I think it is stage1 material, so 
  we have some time to progress on it.
 
  Honza

 Hi Honza,

 The old patch was reverted and the new patch was posted at

 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html

 Have you had a chance to review it?

 I am in China until the 5th; I will try to get back to it shortly after I return.
 Ping me again if I don't - there seems to be a lot of work left on this patch...


 Hi Honza, Michael,

 Any chance of getting it into GCC 4.8?

 Thanks.


 --
 H.J.



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: Memset/memcpy patch

2012-08-31 Thread Jan Hubicka
 On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Any update?
 
  I will look into it today, but anyway I think it is stage1 material, so we 
  have some time to progress on it.
 
  Honza
 
 Hi Honza,
 
 The old patch was reverted and the new patch was posted at
 
 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html
 
 Have you had a chance to review it?

I am in China until the 5th; I will try to get back to it shortly after I return.
Ping me again if I don't - there seems to be a lot of work left on this patch...

Honza


Re: Memset/memcpy patch

2012-08-30 Thread H.J. Lu
On Mon, Dec 12, 2011 at 6:02 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Any update?

 I will look into it today, but anyway I think it is stage1 material, so we 
 have some time to progress on it.

 Honza

Hi Honza,

The old patch was reverted and the new patch was posted at

http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00336.html

Have you had a chance to review it?

Thanks.


-- 
H.J.


Re: Memset/memcpy patch

2011-12-12 Thread Michael Zolotukhin
Any update?

On 5 December 2011 15:14, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
 Hi Jan,
 I debugged the changes, and I think I've hunted down all the bugs.
 I slightly refactored the code - now all new SSE-related code is more
 localized. Also, I fixed some alignment issues.
 Please find the new patch in the attachment (it's made against rev
 181709) - is it ok for trunk?

 Bootstrap and 'make check' passed on Atom and Corei7 (32,64 bits). I
 also checked specs2000, eembc1_1 and eembc2_0 on Atom.

 On 26 November 2011 09:18, Jan Hubicka hubi...@ucw.cz wrote:
 On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
  I found and fixed another problem in the latest memcpy/memset changes
  - with this fix all the failing tests mentioned in #51134 started
  passing. Bootstraps are also ok.
  Though I still see failures in 32-bit make check, so it'd probably be
  better to revert the changes until these failures are fixed.
 

 I will revert it for now.

 OK.  I guess I can break out the simple fixes and commit them for 4.7 and we
 could revisit this for next stage1. Probably not by adding all the features
 together, but extending prologues/epilogues first and adding SSE loops with
 the new alignment logic next.

 Honza

 --
 H.J.



 --
 ---
 Best regards,
 Michael V. Zolotukhin,
 Software Engineer
 Intel Corporation.



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: Memset/memcpy patch

2011-12-12 Thread Jan Hubicka
 Any update?

I will look into it today, but anyway I think it is stage1 material, so we have 
some time to progress on it.

Honza


Re: Memset/memcpy patch

2011-11-25 Thread Jan Hubicka
 On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
  I found and fixed another problem in the latest memcpy/memset changes
  - with this fix all the failing tests mentioned in #51134 started
  passing. Bootstraps are also ok.
  Though I still see failures in 32-bit make check, so it'd probably be
  better to revert the changes until these failures are fixed.
 
 
 I will revert it for now.

OK.  I guess I can break out the simple fixes and commit them for 4.7 and we
could revisit this for next stage1. Probably not by adding all the features
together, but extending prologues/epilogues first and adding SSE loops with
the new alignment logic next.

Honza
 
 -- 
 H.J.


Re: Memset/memcpy patch

2011-11-24 Thread H.J. Lu
On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
 I found and fixed another problem in the latest memcpy/memset changes
 - with this fix all the failing tests mentioned in #51134 started
 passing. Bootstraps are also ok.
 Though I still see failures in 32-bit make check, so it'd probably be
 better to revert the changes until these failures are fixed.


I will revert it for now.

-- 
H.J.


Re: Memset/memcpy patch

2011-11-23 Thread Michael Zolotukhin
I found and fixed another problem in the latest memcpy/memset changes
- with this fix all the failing tests mentioned in #51134 started
passing. Bootstraps are also ok.
Though I still see failures in 32-bit make check, so it'd probably be
better to revert the changes until these failures are fixed.

On 21 November 2011 20:36, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
 Hi,

 While continuing to investigate the bootstrap failures I found another
 problem (besides the problem with unknown alignment described above):
 size_needed and epilogue_size_needed get mixed up when we generate an
 epilogue loop that also uses SSE moves but is not unrolled - that's
 probably the reason for the failures we saw.

 Please check the attached patch, though full testing isn't finished
 yet. Bootstraps seem to be ok, as does the arrayarg.f90 test (with
 sse_loop enabled).

 On 19 November 2011 05:38, Jan Hubicka hubi...@ucw.cz wrote:
 Given that x86 memset/memcpy is still broken, I think we should revert
 it for now.

 Well, looking into the code, the SSE alignment issues need work - the
 alignment test merely tests whether some alignment is known, not whether
 16 byte alignment is known, and that is the cause of the failures in the
 32bit bootstrap.  I originally convinced myself that this is safe since
 we shoot for unaligned load/stores anyway.


 I've committed the following patch that disables SSE codegen and unbreaks the
 atom bootstrap.  This seems more sensible to me given that the patch accumulated
 some good improvements on the non-SSE path as well, and we can return to the
 SSE alignment issues incrementally.  There is still a failure in the fortran
 testcase that I am convinced is a previously latent issue.

 I will be offline tomorrow.  If there are further serious problems, just feel
 free to revert the changes and we can look into them for the next stage1.

 Honza

        * i386.c (atom_cost): Disable SSE loop until alignment issues are 
 fixed.
 Index: i386.c
 ===
 --- i386.c      (revision 181479)
 +++ i386.c      (working copy)
 @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
   /* stringop_algs for memcpy.
      SSE loops works best on Atom, but fall back into non-SSE unrolled loop 
 variant
      if that fails.  */
 -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
 Known alignment.  */
 -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
 -   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* 
 Unknown alignment.  */
 -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
 +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  
 */
 +    {libcall, {{4096, unrolled_loop}, {-1, libcall,
 +   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown 
 alignment.  */
 +    {libcall, {{2048, unrolled_loop},
               {-1, libcall},

   /* stringop_algs for memset.  */
 -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
 Known alignment.  */
 -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
 -   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop},         /* Unknown 
 alignment.  */
 +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  
 */
 +    {libcall, {{4096, unrolled_loop}, {-1, libcall,
 +   {{libcall, {{1024, unrolled_loop},   /* Unknown alignment.  */
               {-1, libcall}}},
 -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
 +    {libcall, {{2048, unrolled_loop},
               {-1, libcall},
   1,                                   /* scalar_stmt_cost.  */
   1,                                   /* scalar load_cost.  */



 --
 ---
 Best regards,
 Michael V. Zolotukhin,
 Software Engineer
 Intel Corporation.



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: Memset/memcpy patch

2011-11-21 Thread Michael Zolotukhin
Hi,

While continuing to investigate the bootstrap failures I found another
problem (besides the problem with unknown alignment described above):
size_needed and epilogue_size_needed get mixed up when we generate an
epilogue loop that also uses SSE moves but is not unrolled - that's
probably the reason for the failures we saw.

Please check the attached patch, though full testing isn't finished
yet. Bootstraps seem to be ok, as does the arrayarg.f90 test (with
sse_loop enabled).

On 19 November 2011 05:38, Jan Hubicka hubi...@ucw.cz wrote:
 Given that x86 memset/memcpy is still broken, I think we should revert
 it for now.

 Well, looking into the code, the SSE alignment issues need work - the
 alignment test merely tests whether some alignment is known, not whether
 16 byte alignment is known, and that is the cause of the failures in the
 32bit bootstrap.  I originally convinced myself that this is safe since
 we shoot for unaligned load/stores anyway.


 I've committed the following patch that disables SSE codegen and unbreaks the
 atom bootstrap.  This seems more sensible to me given that the patch accumulated
 some good improvements on the non-SSE path as well, and we can return to the
 SSE alignment issues incrementally.  There is still a failure in the fortran
 testcase that I am convinced is a previously latent issue.

 I will be offline tomorrow.  If there are further serious problems, just feel
 free to revert the changes and we can look into them for the next stage1.

 Honza

        * i386.c (atom_cost): Disable SSE loop until alignment issues are 
 fixed.
 Index: i386.c
 ===
 --- i386.c      (revision 181479)
 +++ i386.c      (working copy)
 @@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
   /* stringop_algs for memcpy.
      SSE loops works best on Atom, but fall back into non-SSE unrolled loop 
 variant
      if that fails.  */
 -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
 Known alignment.  */
 -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
 -   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* 
 Unknown alignment.  */
 -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
 +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  
 */
 +    {libcall, {{4096, unrolled_loop}, {-1, libcall,
 +   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment. 
  */
 +    {libcall, {{2048, unrolled_loop},
               {-1, libcall},

   /* stringop_algs for memset.  */
 -  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
 Known alignment.  */
 -    {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
 -   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop},         /* Unknown 
 alignment.  */
 +  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  
 */
 +    {libcall, {{4096, unrolled_loop}, {-1, libcall,
 +   {{libcall, {{1024, unrolled_loop},   /* Unknown alignment.  */
               {-1, libcall}}},
 -    {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
 +    {libcall, {{2048, unrolled_loop},
               {-1, libcall},
   1,                                   /* scalar_stmt_cost.  */
   1,                                   /* scalar load_cost.  */



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


memfunc_epilogue_loops.patch
Description: Binary data


Re: Memset/memcpy patch

2011-11-18 Thread Michael Zolotukhin
I found another bug in the current implementation. A patch for it doesn't
cure the i686-linux bootstrap, but it fixes failures in some tests (see
attached).

The problem was that we tried to add runtime tests for alignment even
if both SRC and DST had unknown alignment - in this case it may be
impossible to make them both aligned simultaneously, so I think it's
easier not to try to use aligned SSE moves at all. Prologues with
runtime tests can be generated only if at least one alignment is
known - otherwise it's incorrect. Probably, generation of
such prologues could be removed from MEMMOV altogether for now.
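
To illustrate why both pointers cannot always be aligned by one prologue, here is a minimal standalone sketch in plain C. It is not GCC code: prologue_bytes and can_align_both are hypothetical helper names, and addresses are passed as integers only to keep the example easy to check. The assumption is that a shared prologue advances SRC and DST by the same number of bytes, so both can end up aligned only if they need the same prologue length.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes a prologue has to handle before ADDR reaches an
   ALIGN boundary.  ALIGN is assumed to be a power of two.
   Hypothetical helper, not part of GCC.  */
uintptr_t
prologue_bytes (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (align - (addr & (align - 1))) & (align - 1);
}

/* Nonzero if a single prologue of some length can leave both SRC and
   DST aligned to ALIGN, i.e. if both need the same prologue length.  */
int
can_align_both (uintptr_t src, uintptr_t dst, uintptr_t align)
{
  return prologue_bytes (src, align) == prologue_bytes (dst, align);
}
```

With a 16-byte alignment, addresses 0x1003 and 0x2013 both need 13 prologue bytes and can be aligned together, while 0x1003 and 0x2008 cannot.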

Though, even with this fix the i686 bootstrap still fails. The configure
line for reproducing the bootstrap failure:
CC="gcc -m32" CXX="g++ -m32" ../configure --with-arch=core2
--with-cpu=atom --prefix=`pwd` i686-linux --with-fpmath=sse
--enable-languages=c,c++,fortran

On 18 November 2011 06:23, Jan Hubicka hubi...@ucw.cz wrote:
  
   The current x86 memset/memcpy expansion is broken. It miscompiles
   many programs, including GCC itself.  Should it be reverted for now?
 
  There was a problem in the new code doing loopy epilogues.
  I am currently testing the following patch that should fix the problem.
  We could either revert now and I will apply a combined patch, or I hope to
  fix that tonight.

 To expand a little bit: I was looking into the code for most of the day today,
 and the patch combines several fixes:
    1) The new loopy epilogue code was quite broken. It did not work for memset
       at all because the promoted value was not always initialized; I fixed
       that in the version of the patch that is in mainline now. It however
       also misses a bounds check in some cases.  This is fixed by the
       expand_set_or_movmem_via_loop_with_iter change.
    2) I misupdated the atom description so 32bit memset was not expanded
       inline; this is fixed by the memset changes.
    3) decide_alg was broken in two ways - first, it gave complex algorithms
       for -O0, and second, it chose the wrong variant when sse_loop is used.
    4) The epilogue loop was output even in cases where it is not needed - e.g.
       when the unrolled loop handles 16 bytes at once and the block size is
       39. This is the ix86_movmem and ix86_setmem change.
    5) The implementations of ix86_movmem/ix86_setmem diverged for no reason,
       so I got them back in sync. For some reason SSE code in movmem was not
       output for 64bit unaligned memcpy; that is fixed too.
    6) It seems that both bdver and core are good enough at handling misaligned
       blocks that the alignment prologues can be omitted. This greatly
       improves and reduces the size of the inline sequence. I will however
       break this out into an independent patch.

 Life would be easier if the changes were made in multiple incremental steps;
 stringops expansion is a relatively tricky business and relatively easy to get
 wrong in some cases, since there are so many of them depending on knowledge of
 size/alignment and target architecture.

 Hi,
 this is the patch I committed after bootstrapping/regtesting x86_64-linux and
 --with-arch=core2 --with-cpu=atom.  The
 gfortran.fortran-torture/execute/arrayarg.f90
 failure stays. As I've explained in the PR log, I believe it is a previously
 latent problem elsewhere that is now triggered by inline memset expansion that
 is later unrolled.  I would welcome help from someone who understands the
 testcase on whether it is aliasing safe or not.

 Honza

        PR bootstrap/51134
        * i386.c (atom_cost): Fix 32bit memset description.
        (expand_set_or_movmem_via_loop_with_iter): Output proper bounds check
        for epilogue loops.
        (expand_movmem_epilogue): Handle epilogues up to size 15 w/o producing
        byte loop.
        (decide_alg): sse_loop is not usable when SSE2 is disabled; when not
        optimizing always use rep movsb or libcall; do not produce word sized
        loops when optimizing memset for size (to avoid need for large
        constants).
        (ix86_expand_movmem): Get into sync with ix86_expand_setmem; choose
        unroll factors better; always do 128bit moves when producing SSE loops;
        do not produce loopy epilogue when size is too small.
        (promote_duplicated_reg_to_size): Do not look into desired alignments
        when doing vector expansion.
        (ix86_expand_setmem): Track better when promoted value is available;
        choose unroll factors more sanely; output loopy epilogue only when
        needed.
 Index: config/i386/i386.c
 ===
 *** config/i386/i386.c  (revision 181407)
 --- config/i386/i386.c  (working copy)
 *** struct processor_costs atom_cost = {
 *** 1785,1791 
       if that fails.  */
    {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
 Known alignment.  */
      {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
 !    {{libcall, {{-1, libcall}}},                      

Re: Memset/memcpy patch

2011-11-18 Thread Jan Hubicka
 I found another bug in the current implementation. A patch for it doesn't
 cure the i686-linux bootstrap, but it fixes failures in some tests (see
 attached).
 
 The problem was that we tried to add runtime tests for alignment even
 if both SRC and DST had unknown alignment - in this case it may be
 impossible to make them both aligned simultaneously, so I think it's
 easier not to try to use aligned SSE moves at all. Prologues with
 runtime tests can be generated only if at least one alignment is
 known - otherwise it's incorrect. Probably, generation of
 such prologues could be removed from MEMMOV altogether for now.

The prologues always align the destination as it helps more than aligning
the source on most chips.  I do not see a problem with that.  But for SSE we
should either arrange for unaligned load opcodes (that is what I see in the
generated code, but I guess it depends on the -march setting) or simply
disqualify the sse_loop algorithm in decide_alg when the alignment is not known.
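
A minimal sketch of the scheme described above, in plain C rather than GCC's RTL expanders: the prologue aligns only the destination, the main loop then stores to aligned addresses while reading the source at whatever alignment it happens to have, and an epilogue finishes the tail. copy_align_dst and CHUNK are names invented for this example, and the fixed-size memcpy merely stands in for an unaligned 16-byte load paired with an aligned store; whether a compiler actually emits such a pair depends on the target and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16  /* bytes moved per main-loop iteration (assumed) */

/* Copy N bytes from SRC to DST (non-overlapping), aligning only DST.  */
void *
copy_align_dst (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Prologue: byte copies until the destination is CHUNK-aligned.  */
  while (n > 0 && ((uintptr_t) d % CHUNK) != 0)
    {
      *d++ = *s++;
      n--;
    }

  /* Main loop: D is aligned here; S may or may not be.  */
  while (n >= CHUNK)
    {
      assert (((uintptr_t) d % CHUNK) == 0);
      memcpy (d, s, CHUNK);
      d += CHUNK;
      s += CHUNK;
      n -= CHUNK;
    }

  /* Epilogue: fewer than CHUNK bytes remain.  */
  while (n > 0)
    {
      *d++ = *s++;
      n--;
    }
  return dst;
}
```

The source pointer is never tested for alignment, which is why this shape only works if the loads in the main loop tolerate misaligned addresses.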
 
 Though, even with this fix the i686 bootstrap still fails. The configure
 line for reproducing the bootstrap failure:
 CC="gcc -m32" CXX="g++ -m32" ../configure --with-arch=core2
 --with-cpu=atom --prefix=`pwd` i686-linux --with-fpmath=sse
 --enable-languages=c,c++,fortran

The default i686-linux bootstrap was working for me. I will try your settings,
but my time this evening and over the weekend is limited.

Honza


Re: Memset/memcpy patch

2011-11-18 Thread Jan Hubicka
 Given that x86 memset/memcpy is still broken, I think we should revert
 it for now.

Well, looking into the code, the SSE alignment issues need work - the
alignment test merely tests whether some alignment is known, not whether
16 byte alignment is known, and that is the cause of the failures in the
32bit bootstrap.  I originally convinced myself that this is safe since
we shoot for unaligned load/stores anyway.
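
The distinction can be sketched in a few lines of plain C. The exact condition used in i386.c is not shown in this thread, so both predicates below are hypothetical stand-ins: one accepts any known alignment beyond a single byte, the other insists on a multiple of the 16 bytes that aligned SSE moves need.

```c
#include <assert.h>

#define SSE_ALIGN_BYTES 16  /* alignment required by aligned SSE moves */

/* Too weak for SSE: true as soon as anything better than byte
   alignment is known.  Hypothetical stand-in for the criticized test.  */
int
some_alignment_known (unsigned known_align_bytes)
{
  assert (known_align_bytes != 0);
  return known_align_bytes > 1;
}

/* Stricter: true only if the known alignment is a multiple of
   16 bytes.  Also a hypothetical helper.  */
int
sse_alignment_known (unsigned known_align_bytes)
{
  assert (known_align_bytes != 0);
  return known_align_bytes % SSE_ALIGN_BYTES == 0;
}
```

A pointer known to be 4-byte aligned passes the first test but not the second, which is the case that matters for 32bit code.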


I've committed the following patch that disables SSE codegen and unbreaks the
atom bootstrap.  This seems more sensible to me given that the patch accumulated
some good improvements on the non-SSE path as well, and we can return to the
SSE alignment issues incrementally.  There is still a failure in the fortran
testcase that I am convinced is a previously latent issue.

I will be offline tomorrow.  If there are further serious problems, just feel
free to revert the changes and we can look into them for the next stage1.

Honza

* i386.c (atom_cost): Disable SSE loop until alignment issues are fixed.
Index: i386.c
===
--- i386.c  (revision 181479)
+++ i386.c  (working copy)
@@ -1783,18 +1783,18 @@ struct processor_costs atom_cost = {
   /* stringop_algs for memcpy.  
  SSE loops works best on Atom, but fall back into non-SSE unrolled loop variant
  if that fails.  */
-  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
-{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
-   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
-{libcall, {{2048, sse_loop}, {2048, unrolled_loop},
+  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
+{libcall, {{4096, unrolled_loop}, {-1, libcall,
+   {{libcall, {{2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
+{libcall, {{2048, unrolled_loop},
   {-1, libcall},
 
   /* stringop_algs for memset.  */
-  {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
-{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
-   {{libcall, {{1024, sse_loop}, {1024, unrolled_loop}, /* Unknown alignment.  */
+  {{{libcall, {{4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
+{libcall, {{4096, unrolled_loop}, {-1, libcall,
+   {{libcall, {{1024, unrolled_loop},   /* Unknown alignment.  */
   {-1, libcall}}},
-{libcall, {{2048, sse_loop}, {2048, unrolled_loop},
+{libcall, {{2048, unrolled_loop},
   {-1, libcall},
   1,   /* scalar_stmt_cost.  */
   1,   /* scalar load_cost.  */
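
For readers unfamiliar with these tables, the following is a simplified, standalone sketch of how a list of {max size, algorithm} pairs can be searched. It is not the real decide_alg: the enum, the struct and pick_alg are names invented here, the real tables have more dimensions, and the sse_ok flag is an assumption used to model the "fall back to the non-SSE unrolled loop" behaviour mentioned in the comment in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum alg { ALG_LIBCALL, ALG_UNROLLED_LOOP, ALG_SSE_LOOP };

/* One table row: use ALG for blocks of up to MAX bytes; -1 means
   "any size".  */
struct size_entry
{
  long max;
  enum alg alg;
};

/* Rows modelled on the memcpy entries before and after the patch.  */
const struct size_entry with_sse[] =
  { {4096, ALG_SSE_LOOP}, {4096, ALG_UNROLLED_LOOP}, {-1, ALG_LIBCALL} };
const struct size_entry without_sse[] =
  { {4096, ALG_UNROLLED_LOOP}, {-1, ALG_LIBCALL} };

/* Return the first usable algorithm whose size limit covers SIZE.
   Rows asking for the SSE loop are skipped when SSE_OK is zero.  */
enum alg
pick_alg (const struct size_entry *table, size_t n, long size, int sse_ok)
{
  assert (table != NULL && n > 0);
  for (size_t i = 0; i < n; i++)
    {
      if (table[i].alg == ALG_SSE_LOOP && !sse_ok)
        continue;
      if (table[i].max == -1 || size <= table[i].max)
        return table[i].alg;
    }
  return ALG_LIBCALL;  /* nothing matched: call the library */
}
```

Under these assumptions, removing the sse_loop rows as the patch does has the same effect as never allowing the SSE loop: a 100-byte copy picks the unrolled loop and anything above 4096 bytes becomes a library call.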


Re: Memset/memcpy patch

2011-11-17 Thread H.J. Lu
On Mon, Nov 14, 2011 at 3:48 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this is hopefully the final variant of the patch. The epilogue code was broken
 in some scenarios for memset, but should work safely now.  I also fixed the
 tables for core/bulldozer/amdfam10 chips.

 But before it can be committed, we need to resolve copyright assignment
 issues.  You don't seem to be listed as having a copyright assignment; does
 your company have one?  Otherwise, please try to get one soon.

 Honza

 2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
            Jan Hubicka  j...@suse.cz

        * gcc.target/i386/sw-1.c: Force rep;movsb.

        * config/i386/i386.h (processor_costs): Add second dimension to
        stringop_algs array.
        * config/i386/i386.c (cost models): Initialize second dimension of
        stringop_algs arrays.
        (core_cost): New costs based on generic64 costs with updated stringop
        values.
        (promote_duplicated_reg): Add support for vector modes, add
        declaration.
        (promote_duplicated_reg_to_size): Likewise.
        (processor_target): Set core costs for core variants.
        (expand_set_or_movmem_via_loop_with_iter): New function.
        (expand_set_or_movmem_via_loop): Enable reuse of the same iters in
        different loops, produced by this function.
        (emit_strset): New function.
        (expand_movmem_epilogue): Add epilogue generation for bigger sizes,
        use SSE-moves where possible.
        (expand_setmem_epilogue): Likewise.
        (expand_movmem_prologue): Likewise for prologue.
        (expand_setmem_prologue): Likewise.
        (expand_constant_movmem_prologue): Likewise.
        (expand_constant_setmem_prologue): Likewise.
        (decide_alg): Add new argument align_unknown.  Fix algorithm of
        strategy selection if TARGET_INLINE_ALL_STRINGOPS is set; Skip 
 sse_loop
        (decide_alignment): Update desired alignment according to chosen move
        mode.
        (ix86_expand_movmem): Change unrolled_loop strategy to use SSE-moves.
        (ix86_expand_setmem): Likewise.
        (ix86_slow_unaligned_access): Implementation of new hook
        slow_unaligned_access.
        * config/i386/i386.md (strset): Enable half-SSE moves.
        * config/i386/sse.md (vec_dupv4si): Add expand for vec_dupv4si.
        (vec_dupv2di): Add expand for vec_dupv2di.

 This may have caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51134


The current x86 memset/memcpy expansion is broken. It miscompiles
many programs, including GCC itself.  Should it be reverted for now?

-- 
H.J.


Re: Memset/memcpy patch

2011-11-17 Thread Jan Hubicka
 
 The current x86 memset/memcpy expansion is broken. It miscompiles
 many programs, including GCC itself.  Should it be reverted for now?

There was a problem in the new code doing loopy epilogues.
I am currently testing the following patch that should fix the problem.
We could either revert now and I will apply a combined patch, or I hope to fix
that tonight.
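
The bounds-check problem with a reused iterator can be sketched in plain C. This is not the RTL that expand_set_or_movmem_via_loop_with_iter emits, just a hypothetical two-loop memset (fill_two_loops is an invented name) with bottom-tested loops. Since such a loop runs its body at least once, the epilogue loop that picks up the main loop's counter needs an explicit check before entry, or it would write one byte past the end when nothing is left.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill COUNT bytes at DST with VALUE using a 16-byte main loop and a
   byte-wise epilogue loop that reuses the same iterator.  Returns the
   final iterator value.  */
size_t
fill_two_loops (unsigned char *dst, unsigned char value, size_t count)
{
  size_t i = 0;
  size_t main_bound = count & ~(size_t) 15;

  assert (dst != NULL || count == 0);

  /* Main loop, guarded because the loop itself is bottom-tested.  */
  if (main_bound != 0)
    do
      {
        memset (dst + i, value, 16);
        i += 16;
      }
    while (i < main_bound);

  /* Epilogue loop reusing I.  Without this check the body would run
     once even when I has already reached COUNT.  */
  if (i < count)
    do
      {
        dst[i] = value;
        i++;
      }
    while (i < count);

  return i;
}
```

For a count of 32 the main loop already covers everything, so the guard is what keeps the epilogue from touching byte 32.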

Honza

Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 181442)
+++ config/i386/i386.h  (working copy)
@@ -276,6 +276,7 @@ enum ix86_tune_indices {
   X86_TUNE_PROMOTE_QIMODE,
   X86_TUNE_FAST_PREFIX,
   X86_TUNE_SINGLE_STRINGOP,
+  X86_TUNE_ALIGN_STRINGOP,
   X86_TUNE_QIMODE_MATH,
   X86_TUNE_HIMODE_MATH,
   X86_TUNE_PROMOTE_QI_REGS,
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 181442)
+++ config/i386/i386.md (working copy)
@@ -15944,6 +15944,17 @@
  (clobber (reg:CC FLAGS_REG))])]
   
 {
+  rtx vec_reg;
+  enum machine_mode mode = GET_MODE (operands[2]);
+  if (vector_extensions_used_for_mode (mode)
+      && CONSTANT_P (operands[2]))
+{
+  if (mode == DImode)
+   mode = TARGET_64BIT ? V2DImode : V4SImode;
+  vec_reg = gen_reg_rtx (mode);
+  emit_move_insn (vec_reg, operands[2]);
+  operands[2] = vec_reg;
+}
   if (GET_MODE (operands[1]) != GET_MODE (operands[2]))
 operands[1] = adjust_address_nv (operands[1], GET_MODE (operands[2]), 0);
 
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 181442)
+++ config/i386/i386.c  (working copy)
@@ -1785,7 +1785,7 @@ struct processor_costs atom_cost = {
  if that fails.  */
   {{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* Known alignment.  */
 {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
-   {{libcall, {{-1, libcall}}},   /* Unknown alignment.  */
+   {{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* Unknown alignment.  */
 {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
   {-1, libcall},
 
@@ -2178,6 +2178,9 @@ static unsigned int initial_ix86_tune_fe
   /* X86_TUNE_SINGLE_STRINGOP */
   m_386 | m_P4_NOCONA,
 
+  /* X86_TUNE_ALIGN_STRINGOP */
+  ~(m_BDVER | m_CORE2I7),
+
   /* X86_TUNE_QIMODE_MATH */
   ~0,
 
@@ -3724,6 +3727,14 @@ ix86_option_override_internal (bool main
 target_flags |= MASK_NO_RED_ZONE;
 }
 
+  if (!(target_flags_explicit & MASK_NO_ALIGN_STRINGOPS))
+{
+  if (ix86_tune_features[X86_TUNE_ALIGN_STRINGOP] & ix86_tune_mask)
+target_flags &= ~MASK_NO_ALIGN_STRINGOPS;
+  else
+target_flags |= MASK_NO_ALIGN_STRINGOPS;
+}
+
   /* Keep nonleaf frame pointers.  */
   if (flag_omit_frame_pointer)
 target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER;
@@ -21149,20 +21160,25 @@ expand_set_or_movmem_via_loop_with_iter
 
   top_label = gen_label_rtx ();
   out_label = gen_label_rtx ();
-  if (!reuse_iter)
-iter = gen_reg_rtx (iter_mode);
-
   size = expand_simple_binop (iter_mode, AND, count, piece_size_mask,
- NULL, 1, OPTAB_DIRECT);
-  /* Those two should combine.  */
-  if (piece_size == const1_rtx)
+  NULL, 1, OPTAB_DIRECT);
+  if (!reuse_iter)
 {
-  emit_cmp_and_jump_insns (size, const0_rtx, EQ, NULL_RTX, iter_mode,
+  iter = gen_reg_rtx (iter_mode);
+  /* Those two should combine.  */
+  if (piece_size == const1_rtx)
+   {
+ emit_cmp_and_jump_insns (size, const0_rtx, EQ, NULL_RTX, iter_mode,
+  true, out_label);
+ predict_jump (REG_BR_PROB_BASE * 10 / 100);
+   }
+  emit_move_insn (iter, const0_rtx);
+}
+  else
+{
+  emit_cmp_and_jump_insns (iter, size, GE, NULL_RTX, iter_mode,
   true, out_label);
-  predict_jump (REG_BR_PROB_BASE * 10 / 100);
 }
-  if (!reuse_iter)
-emit_move_insn (iter, const0_rtx);
 
   emit_label (top_label);
 
@@ -21588,17 +21604,28 @@ expand_setmem_epilogue (rtx destmem, rtx
 Remaining part we'll move using Pmode and narrower modes.  */
 
   if (promoted_to_vector_value)
-   while (remainder_size >= 16)
- {
-   if (GET_MODE (destmem) != move_mode)
- destmem = adjust_automodify_address_nv (destmem, move_mode,
- destptr, offset);
-   emit_strset (destmem, promoted_to_vector_value, destptr,
-move_mode, offset);
+   {
+ if (promoted_to_vector_value)
+   {
+ if (max_size >= GET_MODE_SIZE (V4SImode))
+   move_mode = V4SImode;
+ else if (max_size >= GET_MODE_SIZE (DImode))
+   move_mode = DImode;
+   }
+ while (remainder_size >= 

Re: Memset/memcpy patch

2011-11-17 Thread Jan Hubicka
  
  The current x86 memset/memcpy expansion is broken. It miscompiles
  many programs, including GCC itself.  Should it be reverted for now?
 
 There was a problem in the new code doing loopy epilogues.
 I am currently testing the following patch that should fix the problem.
 We could either revert now and I will apply a combined patch, or I hope to
 fix that tonight.

To expand a little bit: I was looking into the code for most of the day today,
and the patch combines several fixes:
   1) The new loopy epilogue code was quite broken. It did not work for memset
      at all because the promoted value was not always initialized; I fixed
      that in the version of the patch that is in mainline now. It however
      also misses a bounds check in some cases.  This is fixed by the
      expand_set_or_movmem_via_loop_with_iter change.
   2) I misupdated the atom description so 32bit memset was not expanded
      inline; this is fixed by the memset changes.
   3) decide_alg was broken in two ways - first, it gave complex algorithms
      for -O0, and second, it chose the wrong variant when sse_loop is used.
   4) The epilogue loop was output even in cases where it is not needed - e.g.
      when the unrolled loop handles 16 bytes at once and the block size is
      39. This is the ix86_movmem and ix86_setmem change.
   5) The implementations of ix86_movmem/ix86_setmem diverged for no reason,
      so I got them back in sync. For some reason SSE code in movmem was not
      output for 64bit unaligned memcpy; that is fixed too.
   6) It seems that both bdver and core are good enough at handling misaligned
      blocks that the alignment prologues can be omitted. This greatly
      improves and reduces the size of the inline sequence. I will however
      break this out into an independent patch.

Life would be easier if the changes were made in multiple incremental steps;
stringops expansion is a relatively tricky business and relatively easy to get
wrong in some cases, since there are so many of them depending on knowledge of
size/alignment and target architecture.
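
The block-size-39 case from item 4 can be worked through with a small standalone C sketch. How GCC actually splits the tail is not spelled out in this mail, so this is only an illustration under the assumption that the tail is handled with one move per set bit of the remainder; epilogue_moves and CHUNK are invented names.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 16  /* bytes handled per main-loop iteration (assumed) */

/* Split the tail left after the CHUNK-byte main loop (COUNT % CHUNK)
   into power-of-two moves, largest first.  Stores the move sizes in
   MOVES (room for 4 entries) and returns how many were stored.  */
size_t
epilogue_moves (size_t count, size_t moves[4])
{
  size_t remainder = count % CHUNK;
  size_t n = 0;

  assert (moves != NULL);
  for (size_t piece = CHUNK / 2; piece != 0; piece /= 2)
    if (remainder & piece)
      moves[n++] = piece;
  return n;
}
```

For a block of 39 bytes the main loop covers 32 and the remaining 7 bytes become three straight-line moves of 4, 2 and 1 bytes, so no epilogue loop is required.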

Honza


Re: Memset/memcpy patch

2011-11-17 Thread Jan Hubicka
   
   The current x86 memset/memcpy expansion is broken. It miscompiles
   many programs, including GCC itself.  Should it be reverted for now?
  
  There was a problem in the new code doing loopy epilogues.
  I am currently testing the following patch that should fix the problem.
  We could either revert now and I will apply a combined patch, or I hope to
  fix that tonight.
 
 To expand a little bit: I was looking into the code for most of the day today,
 and the patch combines several fixes:
 1) The new loopy epilogue code was quite broken. It did not work for memset at
    all because the promoted value was not always initialized; I fixed that in
    the version of the patch that is in mainline now. It however also misses a
    bound check in some cases. This is fixed by the
    expand_set_or_movmem_via_loop_with_iter change.
 2) I misupdated the atom description so 32bit memset was not expanded inline;
    this is fixed by the memset changes.
 3) decide_alg was broken in two ways: first, it gave complex algorithms for
    -O0, and second, it chose the wrong variant when sse_loop is used.
 4) The epilogue loop was output even in the case it is not needed, i.e. when
    the unrolled loops handle 16 bytes at once and the block size is 39. This
    is the ix86_movmem and ix86_setmem change.
 5) The implementations of ix86_movmem/ix86_setmem had diverged for no reason,
    so I got them back in sync. For some reason SSE code in movmem was not
    output for 64bit unaligned memcpy; that is fixed too.
 6) It seems that both bdver and core are good enough at handling misaligned
    blocks that the alignment prologues can be omitted. This greatly improves
    and reduces the size of the inline sequence. I will however break this out
    into an independent patch.
 
 Life would be easier if the changes were made in multiple incremental steps;
 stringops expansion is relatively tricky business and relatively easy to get
 wrong in some cases, since there are so many of them depending on knowledge of
 size/alignment and target architecture.

Hi,
this is the patch I committed after bootstrapping/regtesting x86_64-linux and
--with-arch=core2 --with-cpu=atom. The
gfortran.fortran-torture/execute/arrayarg.f90 failure stays. As I've explained
in the PR log, I believe it is a previously latent problem elsewhere that is now
triggered by the inline memset expansion that is later unrolled.  I would
welcome help from someone who understands the testcase on whether it is
aliasing safe or not.

Honza
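
Items 1 and 4 in the list above can be pictured with a small C sketch: a main loop whose bound is the count rounded down to the chunk size, followed by an epilogue that handles the remaining 0..15 bytes with straight-line 8/4/2/1 moves rather than a byte loop. This is only an illustration in plain C, not the RTL that i386.c emits; the function name copy_with_epilogue and the CHUNK constant are made up for the example, and the 16-byte chunk is taken from the 39-byte case mentioned in item 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration; not code from i386.c.  */
enum { CHUNK = 16 };

/* Copy N bytes from SRC to DST.  Returns the number of bytes that were
   left for the epilogue.  */
static size_t
copy_with_epilogue (unsigned char *dst, const unsigned char *src, size_t n)
{
  /* Bound for the main loop: N rounded down to a multiple of CHUNK.
     When N < CHUNK this is 0 and the loop body never runs.  */
  size_t main_bytes = n & ~(size_t) (CHUNK - 1);
  size_t i;

  for (i = 0; i < main_bytes; i += CHUNK)
    memcpy (dst + i, src + i, CHUNK);

  /* Epilogue: fewer than CHUNK bytes remain, so one test per power of
     two is enough and no loop is needed.  */
  size_t tail = n - main_bytes;
  assert (tail < CHUNK);
  if (tail & 8) { memcpy (dst + i, src + i, 8); i += 8; }
  if (tail & 4) { memcpy (dst + i, src + i, 4); i += 4; }
  if (tail & 2) { memcpy (dst + i, src + i, 2); i += 2; }
  if (tail & 1) { dst[i] = src[i]; i += 1; }
  assert (i == n);
  return tail;
}
```

With n = 39 the main loop copies 32 bytes and the epilogue copies the remaining 7 as 4 + 2 + 1.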

PR bootstrap/51134
* i386.c (atom_cost): Fix 32bit memset description.
(expand_set_or_movmem_via_loop_with_iter): Output proper bounds check
for epilogue loops.
(expand_movmem_epilogue): Handle epilogues up to size 15 w/o producing
byte loop.
(decide_alg): sse_loop is not usable when SSE2 is disabled; when not
optimizing always use rep movsb or libcall; do not produce word sized
loops when optimizing memset for size (to avoid need for large constants).
(ix86_expand_movmem): Get into sync with ix86_expand_setmem; choose
unroll factors better; always do 128bit moves when producing SSE loops;
do not produce loopy epilogue when size is too small.
(promote_duplicated_reg_to_size): Do not look into desired alignments
when doing vector expansion.
(ix86_expand_setmem): Track better when promoted value is available;
choose unroll factors more sanely; output loopy epilogue only when needed.
Index: config/i386/i386.c
===
*** config/i386/i386.c  (revision 181407)
--- config/i386/i386.c  (working copy)
*** struct processor_costs atom_cost = {
*** 1785,1791 
   if that fails.  */
{{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
Known alignment.  */
  {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
!{{libcall, {{-1, libcall}}},  /* Unknown 
alignment.  */
  {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
   {-1, libcall},
  
--- 1785,1791 
   if that fails.  */
{{{libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall}}}, /* 
Known alignment.  */
  {libcall, {{4096, sse_loop}, {4096, unrolled_loop}, {-1, libcall,
!{{libcall, {{2048, sse_loop}, {2048, unrolled_loop}, {-1, libcall}}}, /* 
Unknown alignment.  */
  {libcall, {{2048, sse_loop}, {2048, unrolled_loop},
   {-1, libcall},
  
*** expand_set_or_movmem_via_loop_with_iter 
*** 21149,21168 
  
top_label = gen_label_rtx ();
out_label = gen_label_rtx ();
-   if (!reuse_iter)
- iter = gen_reg_rtx (iter_mode);
- 
size = expand_simple_binop (iter_mode, AND, count, piece_size_mask,
! NULL, 1, OPTAB_DIRECT);
!   /* Those two should combine.  */
!   if (piece_size == 

Re: Memset/memcpy patch

2011-11-15 Thread Michael Zolotukhin
 Looks like we have a bootstrap issue, thus sorry if my message may appear
 stupid nitpicking: why Zolotukhin Michael instead of Michael Zolotukhin in
 the ChangeLog? Is Michael the family name?
Michael is the first name, Zolotukhin - last name. I probably swapped
them accidentally in the changelog.

Michael


Re: Memset/memcpy patch

2011-11-15 Thread Paolo Carlini

On 11/15/2011 04:12 PM, Michael Zolotukhin wrote:

Looks like we have a bootstrap issue, thus sorry if my message may appear
stupid nitpicking: why Zolotukhin Michael instead of Michael Zolotukhin in the
ChangeLog? Is Michael the family name?

Michael is the first name, Zolotukhin - last name. I probably swapped
them accidentally in the changelog.
Ah, ok, thanks. Many years ago I learned this funny (from my parochial 
Italian point of view, sorry) story:


http://en.wikipedia.org/wiki/Bui_Tuong_Phong

and I'm still quite sensitive to the issue.

Paolo.


Re: Memset/memcpy patch

2011-11-14 Thread H.J. Lu
On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this is hopefully the final variant of the patch. The epilogue code was broken
 in some scenarios for memset, but should work safely now.  I also fixed the
 tables for core/bulldozer/amdfam10 chips.

 But before it can be committed, we need to resolve copyright assignment
 issues.  You don't seem to be listed as having copyright assignment; does your
 company have one?  Otherwise, please try to get one soon.

 Honza

 2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
            Jan Hubicka  j...@suse.cz


Zolotukhin Michael works for Intel and has copyright assignment with FSF.

-- 
H.J.


Re: Memset/memcpy patch

2011-11-14 Thread Jan Hubicka
 On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
  this is hopefully the final variant of the patch. The epilogue code was broken
  in some scenarios for memset, but should work safely now.  I also fixed the
  tables for core/bulldozer/amdfam10 chips.
 
  But before it can be committed, we need to resolve copyright assignment
  issues.  You don't seem to be listed as having copyright assignment; does your
  company have one?  Otherwise, please try to get one soon.
 
  Honza
 
  2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
             Jan Hubicka  j...@suse.cz
 
 
 Zolotukhin Michael works for Intel and has copyright assignment with FSF.

Thank you.  I went ahead and committed the patch then.

Honza


Re: Memset/memcpy patch

2011-11-14 Thread H.J. Lu
2011/11/14 Jan Hubicka hubi...@ucw.cz:
 On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
  this is hopefully the final variant of the patch. The epilogue code was broken
  in some scenarios for memset, but should work safely now.  I also fixed the
  tables for core/bulldozer/amdfam10 chips.
 
  But before it can be committed, we need to resolve copyright assignment
  issues.  You don't seem to be listed as having copyright assignment; does your
  company have one?  Otherwise, please try to get one soon.
 
  Honza
 
  2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
             Jan Hubicka  j...@suse.cz
 

 Zolotukhin Michael works for Intel and has copyright assignment with FSF.

 Thank you.  I went ahead and committed the patch then.


GCC failed to bootstrap:

../../src-trunk/libiberty/sort.c:100:14: internal compiler error: in
decide_alg, at config/i386/i386.c:22094
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
make[6]: *** [sort.o] Error 1

-- 
H.J.


Re: Memset/memcpy patch

2011-11-14 Thread Iain Sandoe


On 14 Nov 2011, at 20:36, H.J. Lu wrote:


2011/11/14 Jan Hubicka hubi...@ucw.cz:

On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:

Hi,
this is hopefully the final variant of the patch. The epilogue code was
broken in some scenarios for memset, but should work safely now.  I also
fixed the tables for core/bulldozer/amdfam10 chips.

But before it can be committed, we need to resolve copyright assignment
issues.  You don't seem to be listed as having copyright assignment; does
your company have one?  Otherwise, please try to get one soon.

Honza

2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
   Jan Hubicka  j...@suse.cz

Zolotukhin Michael works for Intel and has copyright assignment with FSF.

Thank you.  I went ahead and committed the patch then.



GCC failed to bootstrap:

../../src-trunk/libiberty/sort.c:100:14: internal compiler error: in
decide_alg, at config/i386/i386.c:22094
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
make[6]: *** [sort.o] Error 1



Assuming that the target is a core processor:

I'm testing a patch from Honza for this - which he has asked to be  
checked in if it works out OK.


just a pasto...




Index: i386.c
===
--- i386.c  (revision 181360)
+++ i386.c  (working copy)
@@ -1877,10 +1877,10 @@ struct processor_costs core_cost = {
{libcall, {{16, loop}, {24, unrolled_loop}, {1024,  
rep_prefix_8_byte}, {-1, libcall},


  /* stringop_algs for memset.  */
-  {{{libcall, {{256, rep_prefix_4_byte}}}, /* Known alignment.  */
-{libcall, {{256, rep_prefix_8_byte,
-   {{libcall, {{256, rep_prefix_4_byte}}}, /* Unknown alignment.  */
-{libcall, {{256, rep_prefix_8_byte},
+  {{{libcall, {{256, rep_prefix_4_byte}, {-1 libcall}}}, /* Known  
alignment.  */

+{libcall, {{256, rep_prefix_8_byte}, {-1 libcall,
+   {{libcall, {{256, rep_prefix_4_byte}, {-1 libcall}}}, /* Unknown  
alignment.  */

+{libcall, {{256, rep_prefix_8_byte}, {-1 libcall},
  1,/* scalar_stmt_cost.  */
  1,/* scalar load_cost.  */
  1,/* scalar_store_cost.  */


Re: Memset/memcpy patch

2011-11-14 Thread H.J. Lu
On Mon, Nov 14, 2011 at 12:40 PM, Iain Sandoe
develo...@sandoe-acoustics.co.uk wrote:

 On 14 Nov 2011, at 20:36, H.J. Lu wrote:

 2011/11/14 Jan Hubicka hubi...@ucw.cz:

 On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:

 Hi,
 this is hopefully the final variant of the patch. The epilogue code was broken
 in some scenarios for memset, but should work safely now.  I also fixed the
 tables for core/bulldozer/amdfam10 chips.

 But before it can be committed, we need to resolve copyright assignment
 issues.  You don't seem to be listed as having copyright assignment; does your
 company have one?  Otherwise, please try to get one soon.

 Honza

 2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
           Jan Hubicka  j...@suse.cz


 Zolotukhin Michael works for Intel and has copyright assignment with
 FSF.

 Thank you.  I went ahead and committed the patch then.


 GCC failed to bootstrap:

 ../../src-trunk/libiberty/sort.c:100:14: internal compiler error: in
 decide_alg, at config/i386/i386.c:22094
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See http://gcc.gnu.org/bugs.html for instructions.
 make[6]: *** [sort.o] Error 1


 Assuming that the target is a core processor:

 I'm testing a patch from Honza for this - which he has asked to be checked
 in if it works out OK.

 just a pasto...




 Index: i386.c
 ===
 --- i386.c      (revision 181360)
 +++ i386.c      (working copy)
 @@ -1877,10 +1877,10 @@ struct processor_costs core_cost = {
    {libcall, {{16, loop}, {24, unrolled_loop}, {1024, rep_prefix_8_byte},
 {-1, libcall},

  /* stringop_algs for memset.  */
 -  {{{libcall, {{256, rep_prefix_4_byte}}}, /* Known alignment.  */
 -    {libcall, {{256, rep_prefix_8_byte,
 -   {{libcall, {{256, rep_prefix_4_byte}}}, /* Unknown alignment.  */
 -    {libcall, {{256, rep_prefix_8_byte},
 +  {{{libcall, {{256, rep_prefix_4_byte}, {-1 libcall}}}, /* Known
 alignment.  */
 +    {libcall, {{256, rep_prefix_8_byte}, {-1 libcall,
 +   {{libcall, {{256, rep_prefix_4_byte}, {-1 libcall}}}, /* Unknown
 alignment.  */
 +    {libcall, {{256, rep_prefix_8_byte}, {-1 libcall},
  1,                                    /* scalar_stmt_cost.  */
  1,                                    /* scalar load_cost.  */
  1,                                    /* scalar_store_cost.  */


It looks reasonable.

-- 
H.J.


Re: Memset/memcpy patch

2011-11-14 Thread Paolo Carlini
Hi,
 
 2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
Jan Hubicka  j...@suse.cz
 
 
 Zolotukhin Michael works for Intel and has copyright assignment with FSF.

Looks like we have a bootstrap issue, thus sorry if my message may appear
stupid nitpicking: why Zolotukhin Michael instead of Michael Zolotukhin in the
ChangeLog? Is Michael the family name?

Thanks,
Paolo


Re: Memset/memcpy patch

2011-11-14 Thread Iain Sandoe


On 14 Nov 2011, at 20:44, H.J. Lu wrote:


On Mon, Nov 14, 2011 at 12:40 PM, Iain Sandoe
develo...@sandoe-acoustics.co.uk wrote:


On 14 Nov 2011, at 20:36, H.J. Lu wrote:


2011/11/14 Jan Hubicka hubi...@ucw.cz:


On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz  
wrote:


Hi,
this is hopefully the final variant of the patch. The epilogue code was
broken in some scenarios for memset, but should work safely now.  I also
fixed the tables for core/bulldozer/amdfam10 chips.

But before it can be committed, we need to resolve copyright assignment
issues.  You don't seem to be listed as having copyright assignment; does
your company have one?  Otherwise, please try to get one soon.

Honza

2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
  Jan Hubicka  j...@suse.cz

Zolotukhin Michael works for Intel and has copyright assignment with FSF.

Thank you.  I went ahead and committed the patch then.



GCC failed to bootstrap:

../../src-trunk/libiberty/sort.c:100:14: internal compiler error: in
decide_alg, at config/i386/i386.c:22094
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
make[6]: *** [sort.o] Error 1



Assuming that the target is a core processor:

I'm testing a patch from Honza for this - which he has asked to be  
checked

in if it works out OK.

just a pasto...




Index: i386.c
===
--- i386.c  (revision 181360)
+++ i386.c  (working copy)
@@ -1877,10 +1877,10 @@ struct processor_costs core_cost = {
   {libcall, {{16, loop}, {24, unrolled_loop}, {1024,  
rep_prefix_8_byte},

{-1, libcall},

 /* stringop_algs for memset.  */
-  {{{libcall, {{256, rep_prefix_4_byte}}}, /* Known alignment.  */
-{libcall, {{256, rep_prefix_8_byte,
-   {{libcall, {{256, rep_prefix_4_byte}}}, /* Unknown alignment.  */
-{libcall, {{256, rep_prefix_8_byte},
+  {{{libcall, {{256, rep_prefix_4_byte}, {-1 libcall}}}, /* Known
alignment.  */
+{libcall, {{256, rep_prefix_8_byte}, {-1 libcall,
+   {{libcall, {{256, rep_prefix_4_byte}, {-1 libcall}}}, /* Unknown
alignment.  */
+{libcall, {{256, rep_prefix_8_byte}, {-1 libcall},
 1,/* scalar_stmt_cost.  */
 1,/* scalar load_cost.  */
 1,/* scalar_store_cost.  */



It looks reasonable.


bootstrap completed on i686-darwin9, so I've applied the following as  
requested,

Iain

gcc:

2011-11-14  Jan Hubicka  j...@suse.cz

* config/i386/i386.c (core cost model): Correct pasto.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 181364)
+++ gcc/config/i386/i386.c  (working copy)
@@ -1877,10 +1877,10 @@ struct processor_costs core_cost = {
 {libcall, {{16, loop}, {24, unrolled_loop}, {1024,  
rep_prefix_8_byte}, {-1, libcall},


   /* stringop_algs for memset.  */
-  {{{libcall, {{256, rep_prefix_4_byte}}}, /* Known alignment.  */
-{libcall, {{256, rep_prefix_8_byte,
-   {{libcall, {{256, rep_prefix_4_byte}}}, /* Unknown alignment.  */
-{libcall, {{256, rep_prefix_8_byte},
+  {{{libcall, {{256, rep_prefix_4_byte}, {-1, libcall}}}, /* Known  
alignment.  */

+{libcall, {{256, rep_prefix_8_byte}, {-1, libcall,
+   {{libcall, {{256, rep_prefix_4_byte}, {-1, libcall}}}, /* Unknown  
alignment.  */

+{libcall, {{256, rep_prefix_8_byte}, {-1, libcall},
   1,   /* scalar_stmt_cost.  */
   1,   /* scalar load_cost.  */
   1,   /* scalar_store_cost.  */
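
The pasto is easier to see in a simplified model of the table lookup. Each row lists {max, alg} pairs in increasing size order, and the last pair uses max = -1 to mean "any larger size"; without that terminator a block larger than 256 bytes matches nothing, which is presumably the situation behind the internal compiler error reported in decide_alg. The sketch below is a standalone illustration: pick_alg, struct row, struct entry and MAX_ENTRIES are made-up names, and the real decide_alg takes more inputs into account.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the i386.c names; illustration only.  */
enum alg { no_stringop, libcall, rep_prefix_4_byte, rep_prefix_8_byte };

enum { MAX_ENTRIES = 4 };

struct entry { long max; enum alg alg; };
struct row { enum alg unknown_size; struct entry size[MAX_ENTRIES]; };

/* Pick an algorithm for a block of EXPECTED_SIZE bytes (negative means
   the size is unknown).  Entries left zero-initialized have
   alg == no_stringop and mark the end of the row.  */
static enum alg
pick_alg (const struct row *r, long expected_size)
{
  assert (r != NULL);
  if (expected_size < 0)
    return r->unknown_size;
  for (size_t i = 0; i < MAX_ENTRIES; i++)
    {
      if (r->size[i].alg == no_stringop)
        break;
      if (r->size[i].max == -1 || expected_size <= r->size[i].max)
        return r->size[i].alg;
    }
  /* Nothing matched; this sketch reports that as no_stringop.  */
  return no_stringop;
}

/* Row without the terminator, as in the pasto.  */
static const struct row broken_row
  = { libcall, { { 256, rep_prefix_4_byte } } };

/* Row with the {-1, libcall} terminator, as in the applied fix.  */
static const struct row fixed_row
  = { libcall, { { 256, rep_prefix_4_byte }, { -1, libcall } } };
```

For a 1000-byte block, pick_alg finds no entry in broken_row, while fixed_row falls through to the terminator and selects libcall.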




Re: Memset/memcpy patch

2011-11-14 Thread Jan Hubicka
 bootstrap completed on i686-darwin9, so I've applied the following as  
 requested,
Thank you and my apologies for the breakage!
Honza
 Iain

 gcc:

 2011-11-14  Jan Hubicka  j...@suse.cz

   * config/i386/i386.c (core cost model): Correct pasto.

 Index: gcc/config/i386/i386.c
 ===
 --- gcc/config/i386/i386.c(revision 181364)
 +++ gcc/config/i386/i386.c(working copy)
 @@ -1877,10 +1877,10 @@ struct processor_costs core_cost = {
  {libcall, {{16, loop}, {24, unrolled_loop}, {1024,  
 rep_prefix_8_byte}, {-1, libcall},

/* stringop_algs for memset.  */
 -  {{{libcall, {{256, rep_prefix_4_byte}}}, /* Known alignment.  */
 -{libcall, {{256, rep_prefix_8_byte,
 -   {{libcall, {{256, rep_prefix_4_byte}}}, /* Unknown alignment.  */
 -{libcall, {{256, rep_prefix_8_byte},
 +  {{{libcall, {{256, rep_prefix_4_byte}, {-1, libcall}}}, /* Known  
 alignment.  */
 +{libcall, {{256, rep_prefix_8_byte}, {-1, libcall,
 +   {{libcall, {{256, rep_prefix_4_byte}, {-1, libcall}}}, /* Unknown  
 alignment.  */
 +{libcall, {{256, rep_prefix_8_byte}, {-1, libcall},
1, /* scalar_stmt_cost.  */
1, /* scalar load_cost.  */
1, /* scalar_store_cost.  */



Re: Memset/memcpy patch

2011-11-14 Thread H.J. Lu
On Mon, Nov 14, 2011 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this is hopefully the final variant of the patch. The epilogue code was broken
 in some scenarios for memset, but should work safely now.  I also fixed the
 tables for core/bulldozer/amdfam10 chips.

 But before it can be committed, we need to resolve copyright assignment
 issues.  You don't seem to be listed as having copyright assignment; does your
 company have one?  Otherwise, please try to get one soon.

 Honza

 2011-11-14  Zolotukhin Michael  michael.v.zolotuk...@gmail.com
            Jan Hubicka  j...@suse.cz

        * gcc.target/i386/sw-1.c: Force rep;movsb.

        * config/i386/i386.h (processor_costs): Add second dimension to
        stringop_algs array.
        * config/i386/i386.c (cost models): Initialize second dimension of
        stringop_algs arrays.
        (core_cost): New costs based on generic64 costs with updated stringop
        values.
        (promote_duplicated_reg): Add support for vector modes, add
        declaration.
        (promote_duplicated_reg_to_size): Likewise.
        (processor_target): Set core costs for core variants.
        (expand_set_or_movmem_via_loop_with_iter): New function.
        (expand_set_or_movmem_via_loop): Enable reuse of the same iters in
        different loops, produced by this function.
        (emit_strset): New function.
        (expand_movmem_epilogue): Add epilogue generation for bigger sizes,
        use SSE-moves where possible.
        (expand_setmem_epilogue): Likewise.
        (expand_movmem_prologue): Likewise for prologue.
        (expand_setmem_prologue): Likewise.
        (expand_constant_movmem_prologue): Likewise.
        (expand_constant_setmem_prologue): Likewise.
        (decide_alg): Add new argument align_unknown.  Fix algorithm of
        strategy selection if TARGET_INLINE_ALL_STRINGOPS is set; Skip sse_loop
        (decide_alignment): Update desired alignment according to chosen move
        mode.
        (ix86_expand_movmem): Change unrolled_loop strategy to use SSE-moves.
        (ix86_expand_setmem): Likewise.
        (ix86_slow_unaligned_access): Implementation of new hook
        slow_unaligned_access.
        * config/i386/i386.md (strset): Enable half-SSE moves.
        * config/i386/sse.md (vec_dupv4si): Add expand for vec_dupv4si.
        (vec_dupv2di): Add expand for vec_dupv2di.

This may have caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51134

-- 
H.J.
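
The promote_duplicated_reg entries in the ChangeLog above refer to replicating the memset byte across a wider register so that each store writes several bytes at once. A minimal scalar sketch of that idea follows. The helper names promote_byte and set_with_promoted_value are hypothetical, and multiplying by 0x0101... is just one common way to do the replication, not necessarily the sequence GCC emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate the low byte of VAL into all 8 bytes
   of a 64-bit word.  */
static uint64_t
promote_byte (int val)
{
  return (uint64_t) (unsigned char) val * UINT64_C (0x0101010101010101);
}

/* Fill N bytes at DST with VAL, using 8-byte stores where possible.  */
static void
set_with_promoted_value (unsigned char *dst, int val, size_t n)
{
  /* The promoted value has to be computed before any wide store
     uses it.  */
  const uint64_t wide = promote_byte (val);
  size_t i = 0;

  for (; i + 8 <= n; i += 8)
    memcpy (dst + i, &wide, 8);       /* alignment-agnostic 8-byte store */
  for (; i < n; i++)
    dst[i] = (unsigned char) val;     /* remaining 0..7 bytes */
  assert (i == n);
}
```

The point of computing wide up front is the one made in the thread: an epilogue that reuses the promoted value only works if that value has actually been initialized on every path that reaches it.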