------- Additional Comments From tbptbp at gmail dot com 2005-01-30 08:07 -------
I'm sorry for providing such a poor testcase. Here's the kind of *48 sequence I'm seeing (k8 codegen). It happens at a point where there's already quite some register pressure, and it really doesn't help that another register is needed to hold the constant (gcc has to push/pop whereas icc doesn't).
  402520: mov    (%esi),%ecx
  402522: mov    $0x30,%eax
  402527: mov    0x3c(%esp),%ebx
  40252b: add    $0x4,%esi
  40252e: imul   %eax,%ecx
  402531: add    0x94(%esp),%ecx
  402538: mov    (%ecx),%edi

After that there's a long string of vector operations where gcc computes all addresses upfront (with shifts) and then uses a *1 scale factor; ICC only uses *8 for that and is faster.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680
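
To make the complaint about the extra register concrete: 48 = 3 * 16, so the multiply could in principle be done in place as (i + i*2) << 4, which on x86 maps to one lea plus one shl and needs no register for the 0x30 constant. The C below is only a sketch of that arithmetic identity. The function names are made up for illustration, and it does not claim to be what icc or any gcc version actually emits.

```c
#include <stdint.h>

/* Hypothetical name: the index scaling as the imul sequence computes it
   (constant 0x30 loaded into a second register, then multiplied). */
static uint32_t scale48_mul(uint32_t i)
{
    return i * 48u;
}

/* Hypothetical name: the same value computed in place.
   i + (i << 1) is i * 3 (one lea on x86), and << 4 is * 16 (one shl). */
static uint32_t scale48_lea_shl(uint32_t i)
{
    uint32_t t = i + (i << 1); /* i * 3 */
    return t << 4;             /* (i * 3) * 16 == i * 48, modulo 2^32 */
}
```

Both functions agree for every 32-bit input, including values that wrap, since both are computed modulo 2^32.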