[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-30 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-12-30 
13:21 ---
.

-- 
   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-30 Thread cvs-commit at gcc dot gnu dot org

--- Additional Comments From cvs-commit at gcc dot gnu dot org  2004-12-30 
13:16 ---
Subject: Bug 18019

CVSROOT:/cvs/gcc
Module name:gcc
Changes by: [EMAIL PROTECTED]   2004-12-30 13:16:14

Modified files:
gcc: ChangeLog 
gcc/config/i386: i386.md 

Log message:
PR target/18019
* i386.md (movqi_1): Fix -Os instruction choice.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6985&r2=2.6986
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/i386/i386.md.diff?cvsroot=gcc&r1=1.594&r2=1.595



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-22 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-12-22 
12:11 ---
One of these days, someone will have to explain to me how such a small
patch for a bug marked "critical regression" can stay unreviewed for
three days...  This kind of patches should *not* need pinging.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-19 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-19 
19:58 ---
Patch here: .

-- 
   What|Removed |Added

   Keywords||patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-19 Thread jh at suse dot cz

--- Additional Comments From jh at suse dot cz  2004-12-19 16:16 ---
Subject: Re:  [4.0 Regression] -march=pentium4 generates word fetch instead of 
byte fetch

> 
> --- Additional Comments From steven at gcc dot gnu dot org  2004-12-18 
> 22:49 ---
> Honza, 
>  
> PING 

I am still backlogged but I will look into it now.  What about cleaning up 
tree-inline.c and other
mess?  That thing is hell to merge...

Honza
>  
>  
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
> 
> --- You are receiving this mail because: ---
> You are on the CC list for the bug, or are watching someone who is.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-18 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-12-18 
22:49 ---
Honza, 
 
PING 
 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-10 Thread neroden at gcc dot gnu dot org

--- Additional Comments From neroden at gcc dot gnu dot org  2004-12-10 
17:09 ---
Oh, cool.  :-)  Did you get the patch linked up? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-02 Thread jh at suse dot cz

--- Additional Comments From jh at suse dot cz  2004-12-02 09:02 ---
Subject: Re:  [4.0 Regression] -march=pentium4 generates word fetch instead of 
byte fetch

> 
> --- Additional Comments From neroden at gcc dot gnu dot org  2004-12-02 
> 03:35 ---
> Jan's message quoted in the previous comment seems to be orthogonal to the 
> main problem in this bug.  The problem is that a word fetch is being 
> generated 
> which *reads out of bounds memory*. 
>  
> If the last byte in a page is being fetched, you must not extend that to 
> fetch 
> the next byte; if the first byte in a page is being fetched, you must not 
> extend that to fetch the previous byte.  Those are the key failure situations 
> which must be prevented, regardless of whether this is -Os, -O2, or -O0. 
>  
> It doesn't appear to me that the changes proposed in Jan's message actually 
> address that, since none of them seem to feature a alignment or other type of 
> correctness check before converting a QImode move into a (possibly illegal) 
> SImode move. 

We don't need to check that since the code always use zero extension
conversion (unless it hit the buggy -Os conditional) and zero extension
still reads byte, just writes whole register.  It is not good idea to
use full sized read even for aligned sources since we will get partial
memory stalls.

We use full sized moves only for register to register moves where it is
safe.

I will prepare the patch later today.
Honza
>  
> In particular, this statement looks like it's wrong: 
> >while for 
> >TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use 
> >the full moves as long as they don't encode longer.  
>  
> I believe a check is required that the full moves don't cause a segmentation 
> violation.  An alignment check would be sufficient (it would prohibit more 
> cases than strictly necessary, but this is a correctness issue, and I haven't 
> thought of a better check). 
>  
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019
> 
> --- You are receiving this mail because: ---
> You are on the CC list for the bug, or are watching someone who is.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-01 Thread neroden at gcc dot gnu dot org

--- Additional Comments From neroden at gcc dot gnu dot org  2004-12-02 
03:35 ---
Jan's message quoted in the previous comment seems to be orthogonal to the 
main problem in this bug.  The problem is that a word fetch is being generated 
which *reads out of bounds memory*. 
 
If the last byte in a page is being fetched, you must not extend that to fetch 
the next byte; if the first byte in a page is being fetched, you must not 
extend that to fetch the previous byte.  Those are the key failure situations 
which must be prevented, regardless of whether this is -Os, -O2, or -O0. 
 
It doesn't appear to me that the changes proposed in Jan's message actually 
address that, since none of them seem to feature a alignment or other type of 
correctness check before converting a QImode move into a (possibly illegal) 
SImode move. 
 
In particular, this statement looks like it's wrong: 
>while for 
>TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use 
>the full moves as long as they don't encode longer.  
 
I believe a check is required that the full moves don't cause a segmentation 
violation.  An alignment check would be sufficient (it would prohibit more 
cases than strictly necessary, but this is a correctness issue, and I haven't 
thought of a better check). 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-12-01 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-12-02 01:07 
---
Jan emailed this to me privately.  Appended here for completeness.  - stuart

Just to clarify things a bit.  TARGET_MOVX and
TARGET_PARTIAL_REG_DEPENDENCY is not about supporting some feature but
about a way the CPU deals with dependencies on partial registers.  Some
CPUs (Athlon+,P4+) deal with partial register writes as read-modify
operation of the whole thing (TARGET_PARTIAL_REG_DEPENDENCT) and for
some of these it is profitable to do dummy zero extend (TARGET_MOVX)
instead of loads to avoid the dependency, while others (K6, P3) give it
another internal name and don't see the false dependency
(TARGET_PARTIAL_REG_STALL). On the other hand they get penalty if the
result is used as a whole register.  There is unlikely to be ever CPU
spoiled up in both directions..

However the Roger's patch, as I understand it, is about avoiding movx as
it encodes longer on -Os.  It seems to me that for targets not defining
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we should always
produce the straighforward movq as expected, while for
TARGET_PARTIAL_REG_STALL/TARGET_PARTIAL_REG_DEPENDENCY we can still use
the full moves as long as they don't encode longer.  I can't check right
now, but i believe it is only the movl imm, register that comes out
longer and that is the alternative 2.

We can also probably kill the TARGET_QIMODE_MATH as it is no longer
used.

There is the type and mode argument not only to choose the particular
instruction but also to drive scheduling (K6 for instance has limited
supply of units that do 8bit operations).  imovx is 32bit operation so
it needs to get SImode.  I would simply break out the alternative 2 from
both conditionals and would additionally check optimize_size to be
nonzero

I can prepare patch later next week unless someone beats me :)

Honza

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-27 Thread neroden at gcc dot gnu dot org

--- Additional Comments From neroden at gcc dot gnu dot org  2004-11-27 
18:28 ---
I suggest excising the TARGET_PARTIAL clauses ASAP, and reinstating them only 
when alignment checking (and/or something else) is used in them to verify that 
the transform is safe.  That seems to be the most correct thing to do. 
If someone is up to writing the alignment checking right now, of course, 
that's great, but I'm not! 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-16 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-11-16 19:39 
---
Here is the body of an email I sent to Jan Hubicka concerning this bug.  In the 
body of the message, 
'you' refers to Jan.
--
For discussion, here is the pattern in question as it exists on the FSF 
mainline today:

  1503  ;; Situation is quite tricky about when to choose full sized (SImode) 
move
  1504  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
  1505  ;; partial register dependency machines (such as AMD Athlon), where 
QImode
  1506  ;; moves issue extra dependency and for partial register stalls machines
  1507  ;; that don't use QImode patterns (and QImode move cause stall on the 
next
  1508  ;; instruction).
  1509  ;;
  1510  ;; For loads of Q_REG to NONQ_REG we use full sized moves except for 
partial
  1511  ;; register stall machines with, where we use QImode instructions, since
  1512  ;; partial register stall can be caused there.  Then we use movzx.
  1513  (define_insn "*movqi_1"
  1514[(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
  1515  (match_operand:QI 1 "general_operand"  " 
q,qn,qm,q,rn,qm,qn"))]
  1516"GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
  1517  {
  1518switch (get_attr_type (insn))
  1519  {
  1520  case TYPE_IMOVX:
  1521if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
  1522  abort ();
  1523return "movz{bl|x}\t{%1, %k0|%k0, %1}";
  1524  default:
  1525if (get_attr_mode (insn) == MODE_SI)
  1526  return "mov{l}\t{%k1, %k0|%k0, %k1}";
  1527else
  1528  return "mov{b}\t{%1, %0|%0, %1}";
  1529  }
  1530  }
  1531[(set (attr "type")
  1532   (cond [(ne (symbol_ref "optimize_size") (const_int 0))
  1533(const_string "imov")
  1534  (and (eq_attr "alternative" "3")
  1535   (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1536(const_int 0))
  1537(eq (symbol_ref "TARGET_QIMODE_MATH")
  1538(const_int 0
  1539(const_string "imov")
  1540  (eq_attr "alternative" "3,5")
  1541(const_string "imovx")
  1542  (and (ne (symbol_ref "TARGET_MOVX")
  1543   (const_int 0))
  1544   (eq_attr "alternative" "2"))
  1545(const_string "imovx")
  1546 ]
  1547 (const_string "imov")))
  1548 (set (attr "mode")
  1549(cond [(eq_attr "alternative" "3,4,5")
  1550 (const_string "SI")
  1551   (eq_attr "alternative" "6")
  1552 (const_string "QI")
  1553   (eq_attr "type" "imovx")
  1554 (const_string "SI")
  1555   (and (eq_attr "type" "imov")
  1556(and (eq_attr "alternative" "0,1,2")
  1557 (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
  1558 (const_int 0
  1559 (const_string "SI")
  1560   ;; Avoid partial register stalls when not using QImode 
arithmetic
  1561   (and (eq_attr "type" "imov")
  1562(and (eq_attr "alternative" "0,1,2")
  1563 (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1564  (const_int 0))
  1565  (eq (symbol_ref "TARGET_QIMODE_MATH")
  1566  (const_int 0)
  1567 (const_string "SI")
  1568 ]
  1569 (const_string "QI")))])

Roger added lines 1532-1533 in January of this year.  It looks like you added 
lines 1555-1567 in 2000.

The combination of lines 1532-1533 (use "imov" if -Os) and lines 1555-1559 (use 
SImode if "imov" and 
byte-load and K8/P4/Nocona) means we generate a "movl" that should be a "movb". 
 (The testcase is 
strcpy(); see the Bugzilla.)  For the following discussion, note that GCC 
currently matches "movqi_1" 
alternative #2 ("q" and "qm" in the attribute list) on the critical 
byte-fetch-from-memory in the strcpy() 
testcase.

It appears to me that the 1555-1559 clause depends upon any CPU with 
TARGET_MOVX always 
choosing "movx|movbl" over "imove".  If some not-yet-existing CPU had 
TARGET_PARTIAL_REG_DEPENDENCY but did /not/ have TARGET_MOVX support, I think 
this would 
generate a 'movl' where a 'movb' is required.  (Yes, I agree no such CPU 
exists, nor is one likely to be 
built.)  Roger's patch made it choose "imove" because it's a smaller 
instruction than imovx (one byte 
versus two, plus whatever additional mod/r/m glop).

Furthermore, as Roger pointed out in the Bugzilla, if we've chosen to use 
imovx, what do we gain by 
marking it with SImode?  It appears to g

[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-09 Thread roger at eyesopen dot com

--- Additional Comments From roger at eyesopen dot com  2004-11-09 20:56 
---
I agree with Stuart :>

Although both of the patches he investigated will cure the problem, reverting
mine will generate a movzbl, whilst disabling the TARGET_PARTIAL_... bits will
generate a "movb".  As hinted in the original problem description, when
optimizing for size we ideally want the smaller "movb" (2 bytes vs. 3 bytes). 
But I still don't think disabling the TARGET_PARTIAL_... bits completely is
an ideal solution.  It would be better to take advantage of any alignment
information on the MEM to determine whether this transformation is safe, and
still benefit from this optimization when it is.

I'm also still not convinced that there aren't any non -Os cases where the
existing TARGET_PARTIAL_... code is incorrect without checking alignment.
Perhaps initially we should disable TARGET_PARTIAL_... only when optimizing
for size? I'm not convinced its perfect, but it would resolve this failure
and not hurt performance at -O2/-O3.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-09 Thread stuart at apple dot com

--- Additional Comments From stuart at apple dot com  2004-11-09 19:34 
---
I agree with Roger.

I'm suspicious of the "0,1,2 ... TARGET_PARTIAL_xx" clauses of the "*movqi_1" 
pattern.  (Also the 
analogous parts of "*movhi_1".)

I've tried reverting Roger's patch, and excising the TARGET_PARTIAL_xx clauses; 
either change appears 
to fix the problem.  I've also successfully regression-tested the excision.

I will invite Jan Hubicka, author of the TARGET_PARTIAL_xx clauses (i386.md, 
v1.503), to look at this.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-09 Thread roger at eyesopen dot com

--- Additional Comments From roger at eyesopen dot com  2004-11-09 18:10 
---
I believe that the bug is a latent problem that was only exposed by my patch.
The *movqi_1 (and *movhi_1) pattern(s) have two attributes "type" and "mode".
The mode attribute indicates the RTL machine mode the move should be done in
QImode or SImode, and the type attribute is either IMOVX or IMOV indicating with
or without extension respectively.  My patch tweaked the type attribute, such
that when optimizing for size IMOV is smaller on x86 than IMOVX.  The
independent underlying problem is that the mode is incorrectly set to SImode,
even if the alignment is unknown on partial register stall machines.

IMOVX  SImode   -> movzbl
IMOVX  QImode   -> movzbl
IMOV   SImode   -> movl
IMOV   QImode   -> movb

This issue was hidden prior to my patch as fortuituously even if the mode was
incorrectly SImode, using a IMOVX instruction would only read a single byte.

The issue appears to be when to use SImode and when QImode.  Clearly, partial
register stall machines get a performance benefit from using SImode, but it
looks like there's no attempt to check the alignment of the access when making
this optimization.

I'm happy to revert my patch, clearly a minor code size tweak is much less
important than a correctness issue.  But I'm trying to convince myself that
this isn't papering over the more fundamental problem in mode selection.

Any help/advice/opinions would be greatly appreciated.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-11-08 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-09 
04:50 ---
No this is not related to PR 17949 at all.

The behavior changed between 2004-01-18 and 2004-01-23.

Which means most likely this:
2004-01-19  Roger Sayle  <[EMAIL PROTECTED]>

* config/i386/i386.md (*movhi_1, *movqi_1): When optimizing for
size, don't use the larger zero-extending loads.

Roger this patch can cause bad code if done on a page boundary and the next 
page is protected.

-- 
   What|Removed |Added

 CC||roger at eyesopen dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-10-28 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-10-28 15:22 
---
Is this perhaps related to Bug 17949? 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-10-28 Thread steven at gcc dot gnu dot org

--- Additional Comments From steven at gcc dot gnu dot org  2004-10-28 15:19 
---
vrong quode 

-- 
   What|Removed |Added

   Severity|normal  |critical


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019


[Bug target/18019] [4.0 Regression] -march=pentium4 generates word fetch instead of byte fetch

2004-10-15 Thread pinskia at gcc dot gnu dot org

--- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-15 18:36 
---
Confirmed, something is wrong in the .md file.

-- 
   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed||1
   Keywords||wrong-code
   Last reconfirmed|-00-00 00:00:00 |2004-10-15 18:36:39
   date||
Summary|-march=pentium4 generates   |[4.0 Regression] -
   |word fetch instead of byte  |march=pentium4 generates
   |fetch   |word fetch instead of byte
   ||fetch
   Target Milestone|--- |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019