Re: Missing gen_sse2_cvtdq2p in convert splitter?
On Wed, Oct 8, 2008 at 8:29 AM, H.J. Lu <[EMAIL PROTECTED]> wrote:

> I386.md has
>
> (define_split
>   [(set (match_operand:MODEF 0 "register_operand" "")
>         (float:MODEF (match_operand:SI 1 "register_operand" "")))]
>   "TARGET_SSE2 && TARGET_SSE_MATH
>    && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
>    && reload_completed
>    && (SSE_REG_P (operands[0])
>        || (GET_CODE (operands[0]) == SUBREG
>            && SSE_REG_P (operands[0])))"
>   [(const_int 0)]
> {
>   rtx op1 = operands[1];
>
>   operands[3] = simplify_gen_subreg (mode, operands[0],
>                                      mode, 0);
>   if (GET_CODE (op1) == SUBREG)
>     op1 = SUBREG_REG (op1);
>
>   if (GENERAL_REG_P (op1) && TARGET_INTER_UNIT_MOVES)
>     {
>       operands[4] = simplify_gen_subreg (V4SImode, operands[0], mode, 0);
>       emit_insn (gen_sse2_loadld (operands[4],
>                                   CONST0_RTX (V4SImode), operands[1]));
>     }
>   /* We can ignore possible trapping value in the
>      high part of SSE register for non-trapping math.  */
>   else if (SSE_REG_P (op1) && !flag_trapping_math)
>     operands[4] = simplify_gen_subreg (V4SImode, operands[1], SImode, 0);
>   else
>     gcc_unreachable ();
> })
>
> Aren't
>
>   emit_insn
>     (gen_sse2_cvtdq2p (operands[3], operands[4]));
>   DONE;
>
> missing at the end?

Uh, yes. The patch is pre-approved as obvious.

Thanks,
Uros.
Missing gen_sse2_cvtdq2p in convert splitter?
Hi,

I386.md has

(define_split
  [(set (match_operand:MODEF 0 "register_operand" "")
        (float:MODEF (match_operand:SI 1 "register_operand" "")))]
  "TARGET_SSE2 && TARGET_SSE_MATH
   && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
   && reload_completed
   && (SSE_REG_P (operands[0])
       || (GET_CODE (operands[0]) == SUBREG
           && SSE_REG_P (operands[0])))"
  [(const_int 0)]
{
  rtx op1 = operands[1];

  operands[3] = simplify_gen_subreg (mode, operands[0],
                                     mode, 0);
  if (GET_CODE (op1) == SUBREG)
    op1 = SUBREG_REG (op1);

  if (GENERAL_REG_P (op1) && TARGET_INTER_UNIT_MOVES)
    {
      operands[4] = simplify_gen_subreg (V4SImode, operands[0], mode, 0);
      emit_insn (gen_sse2_loadld (operands[4],
                                  CONST0_RTX (V4SImode), operands[1]));
    }
  /* We can ignore possible trapping value in the
     high part of SSE register for non-trapping math.  */
  else if (SSE_REG_P (op1) && !flag_trapping_math)
    operands[4] = simplify_gen_subreg (V4SImode, operands[1], SImode, 0);
  else
    gcc_unreachable ();
})

Aren't

  emit_insn
    (gen_sse2_cvtdq2p (operands[3], operands[4]));
  DONE;

missing at the end?

--
H.J.
register class constraints question
I've got this code:

(define_insn "andhi3_24"
  [(set (match_operand:HI 0 "mra_operand" "=Sd,Sd,*Rhl,*Rhl,RhiSd,??Rmm,RhiSd,??Rmm")
        (and:HI (match_operand:HI 1 "mra_operand" "%0,0,*0,*0,0,0,0,0")
                (match_operand:HI 2 "mrai_operand" "Imb,Imw,*Imb,*Imw,iRhiSd,?Rmm,?Rmm,iRhiSd")))]
  "TARGET_A24"
  "@
   bclr\t%B2,%0
   bclr\t%B2-8,1+%0
   bclr\t%B2,%h0
   bclr\t%B2-8,%H0
   and.w\t%X2,%0
   and.w\t%X2,%0
   and.w\t%X2,%0
   and.w\t%X2,%0"
  [(set_attr "flags" "n,n,n,n,sz,sz,sz,sz")]
  )

Originally, the '*' constraints were missing. It failed:

/greed/dj/ges/gnupro/head/gnupro/gcc/testsuite/gcc.c-torture/execute/pr17133.c: In function 'pure_alloc':
/greed/dj/ges/gnupro/head/gnupro/gcc/testsuite/gcc.c-torture/execute/pr17133.c:19: error: unable to find a register to spill in class 'HL_REGS'
/greed/dj/ges/gnupro/head/gnupro/gcc/testsuite/gcc.c-torture/execute/pr17133.c:19: error: this is the insn:
(insn 31 30 32 6 /greed/dj/ges/gnupro/head/gnupro/gcc/testsuite/gcc.c-torture/execute/pr17133.c:13 (set (reg:HI 0 r0 [41])
        (and:HI (subreg:HI (reg/f:PSI 5 a1 [orig:29 bar.0 ] [29]) 0)
            (const_int -2 [0xfffe]))) 26 {andhi3_24} (expr_list:REG_DEAD (reg/f:PSI 5 a1 [orig:29 bar.0 ] [29])
        (nil)))
/greed/dj/ges/gnupro/head/gnupro/gcc/testsuite/gcc.c-torture/execute/pr17133.c:19: internal compiler error: in spill_failure, at reload1.c:2093

I added the '*' constraints to keep it from using HL_REGS class (HL includes R0 and R1, HI includes R0 through R3) but it seems to be ignoring them. If I remove those alternatives completely, the code compiles properly.

How can I get register allocation to use HI_REGS as the allocation class? I still want the bclr opcodes to be used *if* the constraints hold. I just don't want them to limit register choices.

What am I missing?
Re: cpp found limits.h in FIXED_INCLUDE_DIR, but not in STANDARD_INCLUDE_DIR
Hi, all,

I think I've found the reason. It all comes from this gentoo patch:
http://sources.gentoo.org/viewcvs.py/gentoo-x86/sys-devel/gcc/files/4.1.0/gcc-4.1.0-cross-compile.patch?rev=1.1&view=markup

Specifically:

-elif test "x$TARGET_SYSTEM_ROOT" != x; then
+elif test "x$TARGET_SYSTEM_ROOT" != x -o $build != $host; then
   SYSTEM_HEADER_DIR=$build_system_header_dir
 fi

BTW, I haven't had time to learn more about that debate, but I will.

Since my build != host, SYSTEM_HEADER_DIR is set to $build_system_header_dir, which is in turn CROSS_SYSTEM_HEADER_DIR. So this test fails:

LIMITS_H_TEST = [ -f $(SYSTEM_HEADER_DIR)/limits.h ]

And then the else branch is taken here:

if $(LIMITS_H_TEST) ; then \
  cat $(srcdir)/limitx.h $(srcdir)/glimits.h $(srcdir)/limity.h > tmp-xlimits.h; \
else \
  cat $(srcdir)/glimits.h > tmp-xlimits.h; \
fi; \

The solution is easy: just turn on the 'vanilla' USE flag in Gentoo.

Sorry for the noise.

Zhang Le
Re: [PATCH]: bump minimum MPFR version, (includes some fortran bits)
On Mon, Oct 06, 2008 at 04:10:04PM -0700, Kaveh R. Ghazi wrote:
> From: "Adrian Bunk" <[EMAIL PROTECTED]>
>
>> On Sat, Oct 04, 2008 at 09:33:48PM -0400, Kaveh R. GHAZI wrote:
>>> Since we're in stage3, I'm raising the issue of the MPFR version we
>>> require for GCC, just as in last year's stage3 for gcc-4.3:
>>> http://gcc.gnu.org/ml/gcc/2007-12/msg00298.html
>>>
>>> I'd like to increase the "minimum" MPFR version to 2.3.0, (which has been
>>> released since Aug 2007). The "recommended" version of MPFR can be bumped
>>> to the latest which is 2.3.2.
>>> ...
>>
>> Considering that your patch removes the conditionals on MPFR versions
>> from the code (good!), is there any reason for gcc to keep this unusual
>> minimum/recommended split in the requirement?
>>
>> Either 2.3.0 is good enough, or 2.3.2 contains some critical fix
>> and should be the minimum version.
>
> The last time this came up, the consensus was that we should not hard
> fail the configure script even if the user would then be missing some
> mpfr bugfix in the latest/greatest release. That's why we have the
> minimum/recommended split.

I see the point for the 2.2.1/2.3.0 versions since 2.3.0 introduced
additional functionality gcc can use.

> But I see no reason not to encourage people and/or make them aware of the
> need to upgrade if they are so inclined. Whether a particular fix is
> "critical" can be in the eye of the beholder.

But is there any "need to upgrade" to 2.3.2 since it would fix a bug gcc
ran into?

IMHO it's not "in the eye of the beholder" whether 2.3.2 contains a
"critical" fix _for usage by gcc_.

>--Kaveh

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
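
For readers following the minimum/recommended debate, the policy under discussion can be sketched as a three-way check. This is an illustrative model only, not GCC's configure logic: `Verdict`, `version_num`, and `classify_mpfr` are hypothetical names, and the two thresholds are simply the versions proposed in the thread (minimum 2.3.0, recommended 2.3.2).

```cpp
// Illustrative model of a minimum/recommended version policy.
// All names here are hypothetical; this is not GCC's configure code.
enum class Verdict { Reject, Warn, Accept };

// Pack major.minor.patch into one integer so versions compare with '<'.
constexpr long version_num(int major, int minor, int patch) {
    return (static_cast<long>(major) << 16) | (minor << 8) | patch;
}

constexpr long kMinimum     = version_num(2, 3, 0);  // hard requirement
constexpr long kRecommended = version_num(2, 3, 2);  // soft recommendation

constexpr Verdict classify_mpfr(long found) {
    return found < kMinimum     ? Verdict::Reject   // configure would fail
         : found < kRecommended ? Verdict::Warn     // builds, but warns
                                : Verdict::Accept;
}
```

Adrian's objection corresponds to asking whether the `Warn` band should exist at all: if 2.3.2 fixes something gcc depends on, the two thresholds would collapse into one.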
Re: Status of the DLX backend for GCC?
On Sat, 2008-10-04 at 18:48 +0200, Gerald Pfeifer wrote:
> Thanks for the background on this, Peter, and the background on this
> site disappearing.
>
> The reason I asked was that we have that reference from our site to that
> URL and I failed to find any replacement so far. The first two hits that
> I get in Google actually are mails by you in the gcc archives. ;-)
>
> I guess we'll just have to remove that reference?

I talked with Aaron Sawdey and he still had the tarballs which he has
given me. Let me go through a build process with them to make sure they
still build and then I'll post them somewhere you can link to.

Peter
Re: Help with IA64 profiling bug - g++.dg/tree-prof/indir-call-prof.C
Steve Ellcey <[EMAIL PROTECTED]> writes:

> This is about as far as I have gotten. I am not sure why there is this
> difference or how to fix it. I *think* it may be related to the fact
> that IA64 GCC defines TARGET_VTABLE_USES_DESCRIPTORS but my only reason
> for thinking that is that IA64 is the only platform that defines this
> macro and I think that the profiler must be getting callee addresses out
> of the vtable (though I am not sure about that and I don't know where it
> would be doing it from).

I think to make that work tree_gen_ic_profiler and
tree_gen_ic_func_profiler would have to dereference the function
descriptor to extract the code address, which would then have the
necessary uniqueness which the vtable function descriptor lacks.

Andreas.

--
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: Help with IA64 profiling bug - g++.dg/tree-prof/indir-call-prof.C
Steve Ellcey <[EMAIL PROTECTED]> writes:

> Comparing x86 (where things work) and IA64 (where they do not), I see
> the test case, when compiled with -fprofile-generate, has calls
> __gcov_indirect_call_profiler in both cases. But on IA64, cur_func is
> never equal to callee_func

That's because cur_func points to the function address, but callee_func
to the function descriptor.

Andreas.

--
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
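
The mismatch described in this reply can be modelled in a few lines. This is a toy sketch and not libgcov or GCC code: `FunctionDescriptor`, `same_function_naive`, and `same_function_via_descriptor` are hypothetical names, and the two-word descriptor layout (code address plus a global pointer) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a function descriptor: on descriptor-based ABIs a
// "function pointer" refers to a record like this, not to the code itself.
struct FunctionDescriptor {
    std::uintptr_t code_address;    // entry point of the function
    std::uintptr_t global_pointer;  // per-module data pointer (assumed layout)
};

// Naive check: compares the current function's code address with the
// address of the callee's descriptor. These are different kinds of value,
// so in practice the comparison fails even when both name the same function.
inline bool same_function_naive(std::uintptr_t cur_func_code,
                                const FunctionDescriptor* callee) {
    return cur_func_code == reinterpret_cast<std::uintptr_t>(callee);
}

// Dereference the descriptor first, then compare code addresses.
inline bool same_function_via_descriptor(std::uintptr_t cur_func_code,
                                         const FunctionDescriptor* callee) {
    assert(callee != nullptr);
    return cur_func_code == callee->code_address;
}
```

The second helper corresponds to the idea raised elsewhere in the thread of extracting the code address from the descriptor before the profiler compares the two values.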
Re: A question regarding recognition of nop
Revital1 Eres <[EMAIL PROTECTED]> writes:

> Is there a general way to recognize a nop insn in RTL (using attributes?),
> or should I add a target hook for that?
>
> For example, I would like to recognize the following spu insn as a nop:
>
> (insn 555 210 203 11 (unspec_volatile [
>             (const_int 0 [0x0])
>         ] 14) 393 {lnop} (nil))

There is no general way to recognize a nop as such. There are various
ways to recognize an insn which does nothing; such insns are normally
removed (there is still some code which checks and preserves
set_noop_p, but I have a feeling that code is now obsolete). By gcc's
standards your insn is not a nop, because it is volatile.

Ian
Re: Autovectorizing does not work with classes
Georg Martius wrote:
> Dear gcc developers,
>
> I am new to this list.
> I tried to use the auto-vectorization (4.2.1 (SUSE Linux)) but unfortunately
> with limited success.
> My code is basically a matrix library in C++. The vectorizer does not like
> the member variables. Consider this code compiled with
> gcc -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5
> -funsafe-math-optimizations
> that gives basically "not vectorized: unhandled data-ref"
>
> class P{
> public:
>   P() : m(5),n(3) {
>     double *d = data;
>     for (int i=0; i<m*n; i++)
>       d[i] = i/10.2;
>   }
>   void test(const double& sum);
> private:
>   int m;
>   int n;
>   double data[15];
> };
>
> void P::test(const double& sum) {
>   double *d = this->data;
>   for(int i=0; i<m*n; i++){
>     d[i]+=sum;
>   }
> }
>
> whereas the more or less equivalent C version works just fine:
>
> int m=5;
> int n=3;
> double data[15];
>
> void test(const double& sum) {
>   int mn = m*n;
>   for(int i=0; i<mn; i++){
>     data[i]+=sum;
>   }
> }
>
>
> Is there a fundamental problem in using the vectorizer in C++?
>

I don't see any C code above. As another reply indicated, the most
likely C idiom would be to pass sum by value. Alternatively, you could
use a local copy of sum, in cases where that is a problem.

The only fundamental vectorization problem I can think of which is
specific to C++ is the lack of a standard restrict keyword. In g++,
__restrict__ is available. A local copy (or value parameter) of sum
avoids a need for the compiler to recognize const or restrict as an
assurance of no value modification.

The loop has to have known fixed bounds at entry, in order to vectorize.
If your C++ style doesn't support that, e.g. by calculating the end
value outside the loop, as you show in your latter version, then you do
have a problem with vectorization.
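
A minimal sketch of the two suggestions in this reply, applied to the class from the original post: take a local copy of `sum`, and compute the loop bound once before the loop starts. Whether a particular GCC release then vectorizes the loop is not guaranteed, so the vectorizer's verbose report still needs checking. The accessor `at` is a hypothetical addition, present only so the result can be inspected.

```cpp
#include <cassert>

class P {
public:
    P() : m(5), n(3) {
        const int count = m * n;
        for (int i = 0; i < count; i++)
            data[i] = i / 10.2;
    }

    void test(const double& sum) {
        const double s = sum;     // local copy: cannot alias data[]
        const int count = m * n;  // bound fixed before the loop starts
        double* d = data;
        for (int i = 0; i < count; i++)
            d[i] += s;
    }

    // Hypothetical accessor, added only so the result can be inspected.
    double at(int i) const {
        assert(i >= 0 && i < 15);
        return data[i];
    }

private:
    int m;
    int n;
    double data[15];
};
```

The public signature of `test` is unchanged, so callers are unaffected; only the loop body stops reading through the reference and through the member variables.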
Issue in building the libgcc-Os-4-200.a library for SH target
Hi,

We have built a cross-compiled toolchain for the SH target using the following sources:

gcc-4.3.1 [released]
newlib-1.16.0 [released]
binutils-2.18.50 [snapshot dated 30th July 2008]

We get the following error when building a C++ application with a toolchain built from the above sources:

sh-elf-ld.exe: sh-elf\lib\gcc\sh-elf\4.3.1\ml\m2\libgcc-Os-4-200.a(unwind-dw2-Os-4-200.o): compiled for a big endian system and target is little endian
sh-elf-ld.exe: \sh-elf\lib\gcc\sh-elf\4.3.1\ml\m2\libgcc-Os-4-200.a(unwind-dw2-Os-4-200.o): uses instructions which are incompatible with instructions used in previous modules
sh-elf-ld.exe: failed to merge target specific data of file sh-elf\lib\gcc\sh-elf\4.3.1\ml\m2\libgcc-Os-4-200.a(unwind-dw2-Os-4-200.o)

The libgcc-Os-4-200.a archive built for the SH target consists of the following object files:

udivsi3_i4i-Os-4-200.o
sdivsi3_i4i-Os-4-200.o
unwind-dw2-Os-4-200.o

The object file "unwind-dw2-Os-4-200.o" gets built for a big endian target instead of little endian, whereas the other two object files in the same archive, "udivsi3_i4i-Os-4-200.o" and "sdivsi3_i4i-Os-4-200.o", are correctly built for little endian SH.

The libraries built for the little endian SH-2/SH-3 target series reside at the following path:

sh-elf/lib/gcc/sh-elf/4.3.1/ml/m2

We have also observed that the target-specific options such as ml, m2, ml m2 etc. are not passed to the compiler while building the "unwind-dw2-Os-4-200" object.

It appears that, somewhere in the GCC makefiles, the required options are omitted when building the "unwind-dw2-Os-4-200.o" component.

Has anyone faced a similar problem? Is there any possible workaround?

Regards,
Cecilia Rodrigues
KPIT Cummins Infosystems Ltd.
Pune, India
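
One way to confirm which members of the archive were built with the wrong byte order is to look at the ELF identification bytes of each extracted object. The sketch below is a hypothetical helper (`elf_byte_order` and `ByteOrder` are not part of any toolchain); it relies only on the ELF header layout, where byte 5 of `e_ident` (`EI_DATA`) is 1 for little-endian and 2 for big-endian data.

```cpp
#include <cstddef>

enum class ByteOrder { NotElf, Invalid, Little, Big };

// Inspect the leading bytes of an object file image and report the
// data encoding recorded in its ELF identification bytes.
constexpr ByteOrder elf_byte_order(const unsigned char* bytes, std::size_t size) {
    return (size < 6 || bytes[0] != 0x7f || bytes[1] != 'E'
            || bytes[2] != 'L' || bytes[3] != 'F')
               ? ByteOrder::NotElf
           : bytes[5] == 1 ? ByteOrder::Little   // ELFDATA2LSB
           : bytes[5] == 2 ? ByteOrder::Big      // ELFDATA2MSB
                           : ByteOrder::Invalid;
}
```

Feeding it the first few bytes of each of the three object files listed above would be expected to report `Big` only for the unwind object, if the linker diagnostic is accurate.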
Re: Autovectorizing does not work with classes
[EMAIL PROTECTED] wrote on 07/10/2008 10:48:29:

> Dear gcc developers,
>
> I am new to this list.
> I tried to use the auto-vectorization (4.2.1 (SUSE Linux)) but unfortunately
> with limited success.
> My code is basically a matrix library in C++. The vectorizer does not like
> the member variables. Consider this code compiled with
> gcc -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5
> -funsafe-math-optimizations
> that gives basically "not vectorized: unhandled data-ref"

The unhandled data-ref here is sum. It is invariant in the loop, and
invariant data-refs are currently unsupported by the data dependence
analysis. If you can change your code to pass sum by value, it will get
vectorized (at least with gcc 4.3).

This is not a C++-specific problem (for me your C version does not get
vectorized either, for the same reason).

HTH,
Ira

>
> class P{
> public:
>   P() : m(5),n(3) {
>     double *d = data;
>     for (int i=0; i<m*n; i++)
>       d[i] = i/10.2;
>   }
>   void test(const double& sum);
> private:
>   int m;
>   int n;
>   double data[15];
> };
>
> void P::test(const double& sum) {
>   double *d = this->data;
>   for(int i=0; i<m*n; i++){
>     d[i]+=sum;
>   }
> }
>
> whereas the more or less equivalent C version works just fine:
>
> int m=5;
> int n=3;
> double data[15];
>
> void test(const double& sum) {
>   int mn = m*n;
>   for(int i=0; i<mn; i++){
>     data[i]+=sum;
>   }
> }
>
>
> Is there a fundamental problem in using the vectorizer in C++?
>
> Regards!
> Georg
>
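
The change suggested in this reply amounts to one edit in the signature. A minimal sketch using the free-function version from the original post; the `demo` namespace and the name `test_by_value` are hypothetical, added here only to keep the example self-contained.

```cpp
namespace demo {

int m = 5;
int n = 3;
double data[15];

// 'sum' is passed by value, so inside the loop it is a plain local
// rather than a memory reference that has to be analyzed against data[].
void test_by_value(double sum) {
    int mn = m * n;
    for (int i = 0; i < mn; i++) {
        data[i] += sum;
    }
}

}  // namespace demo
```

Calling `demo::test_by_value(2.0)` adds 2.0 to each of the 15 elements. Whether the loop is actually vectorized depends on the compiler version, as the reply notes.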
A question regarding recognition of nop
Hello,

Is there a general way to recognize a nop insn in RTL (using attributes?),
or should I add a target hook for that?

For example, I would like to recognize the following spu insn as a nop:

(insn 555 210 203 11 (unspec_volatile [
            (const_int 0 [0x0])
        ] 14) 393 {lnop} (nil))

Thanks,
Revital
Autovectorizing does not work with classes
Dear gcc developers,

I am new to this list.
I tried to use the auto-vectorization (4.2.1 (SUSE Linux)) but unfortunately with limited success.
My code is basically a matrix library in C++. The vectorizer does not like the member variables. Consider this code compiled with
gcc -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5 -funsafe-math-optimizations
that gives basically "not vectorized: unhandled data-ref"

class P{
public:
  P() : m(5),n(3) {
    double *d = data;
    for (int i=0; i<m*n; i++)
      d[i] = i/10.2;
  }
  void test(const double& sum);
private:
  int m;
  int n;
  double data[15];
};

void P::test(const double& sum) {
  double *d = this->data;
  for(int i=0; i<m*n; i++){
    d[i]+=sum;
  }
}

whereas the more or less equivalent C version works just fine:

int m=5;
int n=3;
double data[15];

void test(const double& sum) {
  int mn = m*n;
  for(int i=0; i<mn; i++){
    data[i]+=sum;
  }
}

Is there a fundamental problem in using the vectorizer in C++?

Regards!
Georg
Re: [PATCH]: bump minimum MPFR version, (includes some fortran bits)
On Tue, Oct 7, 2008 at 2:15 AM, Kaveh R. Ghazi <[EMAIL PROTECTED]> wrote:
> From: "Richard Guenther" <[EMAIL PROTECTED]>
>
>> On Sun, Oct 5, 2008 at 3:33 AM, Kaveh R. GHAZI <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Okay for mainline?
>>
>> Ok if there are no objections within the week.
>>
>> Thanks,
>> Richard.
>
> Great, thanks. Can I get an explicit ack from a fortran maintainer as well?

Ok.

--
Janne Blomqvist
Re: Help with IA64 profiling bug - g++.dg/tree-prof/indir-call-prof.C
On Tue, Oct 7, 2008 at 1:18 AM, Steve Ellcey <[EMAIL PROTECTED]> wrote:
> I have been looking at why g++.dg/tree-prof/indir-call-prof.C fails on
> IA64 (HP-UX and Linux). It looks like the optimization (turning an
> indirect call into a direct call) does not happen because the initial
> run with -fprofile-generate is not generating any count data about
> indirect calls.
>
> Comparing x86 (where things work) and IA64 (where they do not), I see
> the test case, when compiled with -fprofile-generate, has calls
> __gcov_indirect_call_profiler in both cases. But on IA64, cur_func is
> never equal to callee_func and so __gcov_one_value_profiler_body is
> never called. On x86 we do have cur_func equal to callee_func and so
> __gcov_one_value_profiler_body is called to write out profile
> information.
>
> This is about as far as I have gotten. I am not sure why there is this
> difference or how to fix it. I *think* it may be related to the fact
> that IA64 GCC defines TARGET_VTABLE_USES_DESCRIPTORS but my only reason
> for thinking that is that IA64 is the only platform that defines this
> macro and I think that the profiler must be getting callee addresses out
> of the vtable (though I am not sure about that and I don't know where it
> would be doing it from).
>
> So this is a request to anyone who might know the profiling code to help
> me with some advise about what I should look at next or about how to go
> about fixing this bug.

If these testcases never worked on IA64 I suggest you XFAIL them for IA64
and file a missed-optimization bugreport.

Richard.

> Steve Ellcey
> [EMAIL PROTECTED]
>