bcl5980 wrote:

> I think we should follow this principle: if a loop is required to be unrolled
> later, we should not destroy the loop count info.

The idea is right. But I think what nikic is saying is that loop unroll should
handle this case (upper-bound unrolling), and it doesn't. We need to find out why
loop unroll fails here (maybe in UnrollRuntimeLoopRemainder), and then decide
whether to fix it in loop unroll or stop SimplifyCFG's transform.

> > AMDGPU cannot unroll this case:
> > https://godbolt.org/z/4Pq3bnzTT
> > But the same code on X86 appears to unroll:
> > https://godbolt.org/z/zr8aTG1KW
> > We may need to keep debugging this.
> 
> X86 also unrolls very conservatively: its upper bound is set to 4 (the default
> is 8). If we do not fold the loop branch, it can fully unroll (16).

So what is the difference that lets X86 partially unroll while AMDGPU cannot
unroll at all?

https://github.com/llvm/llvm-project/pull/74268
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
