------- Additional Comments From stuart at apple dot com  2004-11-16 19:39 
-------
Here is the body of an email I sent to Jan Hubicka concerning this bug.  In the 
body of the message, 
'you' refers to Jan.
--------------------------------------------------------------
For discussion, here is the pattern in question as it exists on the FSF 
mainline today:

  1503  ;; Situation is quite tricky about when to choose full sized (SImode) 
move
  1504  ;; over QImode moves.  For Q_REG -> Q_REG move we use full size only for
  1505  ;; partial register dependency machines (such as AMD Athlon), where 
QImode
  1506  ;; moves issue extra dependency and for partial register stalls machines
  1507  ;; that don't use QImode patterns (and QImode move cause stall on the 
next
  1508  ;; instruction).
  1509  ;;
  1510  ;; For loads of Q_REG to NONQ_REG we use full sized moves except for 
partial
  1511  ;; register stall machines with, where we use QImode instructions, since
  1512  ;; partial register stall can be caused there.  Then we use movzx.
  1513  (define_insn "*movqi_1"
  1514    [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
  1515          (match_operand:QI 1 "general_operand"      " 
q,qn,qm,q,rn,qm,qn"))]
  1516    "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
  1517  {
  1518    switch (get_attr_type (insn))
  1519      {
  1520      case TYPE_IMOVX:
  1521        if (!ANY_QI_REG_P (operands[1]) && GET_CODE (operands[1]) != MEM)
  1522          abort ();
  1523        return "movz{bl|x}\t{%1, %k0|%k0, %1}";
  1524      default:
  1525        if (get_attr_mode (insn) == MODE_SI)
  1526          return "mov{l}\t{%k1, %k0|%k0, %k1}";
  1527        else
  1528          return "mov{b}\t{%1, %0|%0, %1}";
  1529      }
  1530  }
  1531    [(set (attr "type")
  1532       (cond [(ne (symbol_ref "optimize_size") (const_int 0))
  1533                (const_string "imov")
  1534              (and (eq_attr "alternative" "3")
  1535                   (ior (eq (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1536                            (const_int 0))
  1537                        (eq (symbol_ref "TARGET_QIMODE_MATH")
  1538                            (const_int 0))))
  1539                (const_string "imov")
  1540              (eq_attr "alternative" "3,5")
  1541                (const_string "imovx")
  1542              (and (ne (symbol_ref "TARGET_MOVX")
  1543                       (const_int 0))
  1544                   (eq_attr "alternative" "2"))
  1545                (const_string "imovx")
  1546             ]
  1547             (const_string "imov")))
  1548     (set (attr "mode")
  1549        (cond [(eq_attr "alternative" "3,4,5")
  1550                 (const_string "SI")
  1551               (eq_attr "alternative" "6")
  1552                 (const_string "QI")
  1553               (eq_attr "type" "imovx")
  1554                 (const_string "SI")
  1555               (and (eq_attr "type" "imov")
  1556                    (and (eq_attr "alternative" "0,1,2")
  1557                         (ne (symbol_ref "TARGET_PARTIAL_REG_DEPENDENCY")
  1558                             (const_int 0))))
  1559                 (const_string "SI")
  1560               ;; Avoid partial register stalls when not using QImode 
arithmetic
  1561               (and (eq_attr "type" "imov")
  1562                    (and (eq_attr "alternative" "0,1,2")
  1563                         (and (ne (symbol_ref "TARGET_PARTIAL_REG_STALL")
  1564                                  (const_int 0))
  1565                              (eq (symbol_ref "TARGET_QIMODE_MATH")
  1566                                  (const_int 0)))))
  1567                 (const_string "SI")
  1568             ]
  1569             (const_string "QI")))])

Roger added lines 1532-1533 in January of this year.  It looks like you added 
lines 1555-1567 in 2000.

The combination of lines 1532-1533 (use "imov" if -Os) and lines 1555-1559 (use 
SImode if "imov" and 
byte-load and K8/P4/Nocona) means we generate a "movl" that should be a "movb". 
 (The testcase is 
strcpy(); see the Bugzilla.)  For the following discussion, note that GCC 
currently matches "movqi_1" 
alternative #2 ("q" and "qm" in the attribute list) on the critical 
byte-fetch-from-memory in the strcpy() 
testcase.

It appears to me that the 1555-1559 clause depends upon any CPU with 
TARGET_MOVX always 
choosing "movx|movbl" over "imove".  If some not-yet-existing CPU had 
TARGET_PARTIAL_REG_DEPENDENCY but did /not/ have TARGET_MOVX support, I think 
this would 
generate a 'movl' where a 'movb' is required.  (Yes, I agree no such CPU 
exists, nor is one likely to be 
built.)  Roger's patch made it choose "imove" because it's a smaller 
instruction than imovx (one byte 
versus two, plus whatever additional mod/r/m glop).

Furthermore, as Roger pointed out in the Bugzilla, if we've chosen to use 
imovx, what do we gain by 
marking it with SImode?  It appears to generate the same "movx/movzbl" 
instruction either way.  (I am 
really confused by this.)

I can think of several potential fixes.  All are suspect, because I'm not 
convinced I fully understand 
what's going on:

a) revert Rogers patch (I'd prefer to keep it)
b) remove the ",2" from line 1556 (clause won't apply to byte loads)
c) remove lines 1555-1559 (allow byte-register/byte-register mov insns on 
P4-class CPUs? not clear to 
me)
d) mark the bug "behaves correctly"
e) some idea of Jan's that is much smarter than any of the above

If you immediately see how this should work, please make a suggestion, or even 
commit a fix.  I claim 
no expertise in this area.

As a side note, I'm curious about this usage of TARGET_QIMODE_MATH; CVS 
suggests that you made it 
synonymous with TRUE in March 2000.  In the subsequent April, you added lines 
1563-1566.  If 
TARGET_QIMODE_MATH is always true, will the clause on lines 1561-1567 ever be 
used?  Again, I just 
don't understand how this is supposed to work.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18019

Reply via email to