The right longer-term fix is the one Richard suggested. For now, you can probably override the peeling parameter for your target (in the target's option_override function):
    maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT, 0,
                           opts->x_param_values, opts_set->x_param_values);

David

On Fri, Nov 15, 2013 at 7:21 AM, Bingfeng Mei <b...@broadcom.com> wrote:
> Hi, Richard,
> Speed difference is 154 cycles (with workaround) vs. 198 cycles. So loop
> peeling is also slower for our processors.
>
> By vectorization_cost, do you mean the
> TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST hook?
>
> In our case, it is easy to make the decision. But in general, if a peeled
> loop is faster but bigger, what is the right balance? How should we handle
> cases that are a bit faster and a lot bigger?
>
> Thanks,
> Bingfeng
> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 15 November 2013 14:02
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Vectorization: Loop peeling with misaligned support.
>
> On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei <b...@broadcom.com> wrote:
>> Hi,
>> In loop vectorization, I found that the vectorizer insists on loop peeling
>> even though our target supports misaligned memory access. This results in
>> much bigger code size for a very simple loop. I defined
>> TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT and also
>> TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST to make misaligned accesses
>> almost as cheap as aligned ones. But the vectorizer still does peeling
>> anyway.
>>
>> In the vect_enhance_data_refs_alignment function, it seems that the result
>> of vect_supportable_dr_alignment is not used in the decision of whether to
>> do peeling.
>>
>>   supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>>   do_peeling = vector_alignment_reachable_p (dr);
>>
>> Later on, there is code to compare load/store costs. But it only decides
>> whether to do peeling for the load or the store, not whether to do peeling
>> at all.
>>
>> Currently I have a workaround. For the following simple loop, the size is
>> 80 bytes vs. 352 bytes without the patch (-O2 -ftree-vectorize, gcc 4.8.3
>> 20131114).
>
> What's the speed difference?
>
>> int A[100];
>> int B[100];
>> void foo2() {
>>   int i;
>>   for (i = 0; i < 100; ++i)
>>     A[i] = B[i] + 100;
>> }
>>
>> What is the best way to tell the vectorizer not to do peeling in such a
>> situation?
>
> Well, the vectorizer should compute the cost without peeling and then,
> when the cost with peeling is not better, not peel. That's
> very easy to check with the vectorization_cost hook by comparing
> vector_load / unaligned_load and vector_store / unaligned_store cost.
>
> Richard.
>
>>
>> Thanks,
>> Bingfeng Mei
>> Broadcom UK
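To see why peeling inflates code size for a loop like foo2, here is a hand-written C sketch of the two loop shapes involved. It is not GCC output: it assumes a hypothetical 4-lane vector width (VF), models each "vector" operation as an inner loop over 4 ints, and the names foo2_peeled and foo2_unaligned are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define N 100
#define VF 4 /* hypothetical vectorization factor: 4 ints per vector */

int A[N];
int B[N];

/* Peeled shape: scalar prologue until &A[i] is vector-aligned (16 bytes
   here), then an aligned vector body, then a scalar epilogue. */
void foo2_peeled(void)
{
    int i = 0;
    /* Prologue: scalar iterations until the store address is aligned. */
    while (i < N && ((uintptr_t)&A[i] % (VF * sizeof(int))) != 0) {
        A[i] = B[i] + 100;
        ++i;
    }
    /* Vector body: each inner loop stands for one aligned vector op. */
    for (; i + VF <= N; i += VF)
        for (int k = 0; k < VF; ++k)
            A[i + k] = B[i + k] + 100;
    /* Epilogue: leftover scalar iterations. */
    for (; i < N; ++i)
        A[i] = B[i] + 100;
}

/* Unpeeled shape: the vector body uses misaligned accesses directly, so
   only the epilogue remains (it runs no iterations here, since N = 100
   is a multiple of VF). */
void foo2_unaligned(void)
{
    int i = 0;
    for (; i + VF <= N; i += VF)
        for (int k = 0; k < VF; ++k)
            A[i + k] = B[i + k] + 100;
    for (; i < N; ++i)
        A[i] = B[i] + 100;
}
```

Both functions produce the same A; the peeled shape carries an extra scalar loop (plus the alignment check), which is part of why peeling grows code size when misaligned vector accesses are already cheap.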
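Richard's rule (peel only when the cost with peeling is actually better) can be sketched as a small cost model. This is a hypothetical C model, not the logic of vect_enhance_data_refs_alignment: the struct vect_costs, the helper functions and should_peel are invented for illustration. In a real port, the per-statement costs would come from the target's TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST hook (its vector_load, unaligned_load, vector_store and unaligned_store cost kinds, among others). For simplicity it assumes A and B share the same misalignment, so peeling aligns both accesses.

```c
#include <stdbool.h>

/* Hypothetical per-statement costs, mirroring a few of the cost kinds
   a target reports through its vectorization cost hook. */
struct vect_costs {
    int scalar_load, scalar_store, scalar_stmt;
    int vector_load, unaligned_load;
    int vector_store, unaligned_store;
    int vector_stmt;
};

/* Cost of A[i] = B[i] + 100 over n elements with vf lanes when the first
   `peel` iterations run as scalar code and the remaining vector
   iterations use aligned accesses (assumes A and B are equally
   misaligned, so peeling aligns both). */
static int cost_with_peeling(const struct vect_costs *c, int n, int vf, int peel)
{
    int scalar_iter = c->scalar_load + c->scalar_stmt + c->scalar_store;
    int vector_iter = c->vector_load + c->vector_stmt + c->vector_store;
    int rest = n - peel;
    return peel * scalar_iter + (rest / vf) * vector_iter + (rest % vf) * scalar_iter;
}

/* Same loop without peeling: every vector iteration uses misaligned
   accesses, and leftover iterations run as scalar code. */
static int cost_without_peeling(const struct vect_costs *c, int n, int vf)
{
    int scalar_iter = c->scalar_load + c->scalar_stmt + c->scalar_store;
    int vector_iter = c->unaligned_load + c->vector_stmt + c->unaligned_store;
    return (n / vf) * vector_iter + (n % vf) * scalar_iter;
}

/* Peel only if that is strictly cheaper than using misaligned accesses. */
bool should_peel(const struct vect_costs *c, int n, int vf, int peel)
{
    return cost_with_peeling(c, n, vf, peel) < cost_without_peeling(c, n, vf);
}
```

With every cost set to 1 (misaligned as cheap as aligned, as in Bingfeng's setup), n = 100, vf = 4 and 3 peeled iterations, peeling costs 84 against 75 without it, so should_peel returns false. The model weighs execution cost only; it does not address the size-versus-speed balance raised in the thread.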