On Sat, 2022-11-12 at 15:37 +0800, Lulu Cheng wrote:
> Co-Authored-By: xujiahao <xujia...@loongson.cn>
> 
> gcc/ChangeLog:
> 
>         * config/loongarch/loongarch-def.c: Initial number of parallel
> prefetch.
>         * config/loongarch/loongarch-tune.h (struct loongarch_cache):
>         Define number of parallel prefetch.
>         * config/loongarch/loongarch.cc
> (loongarch_option_override_internal):
>         Set up parameters to be used in prefetching algorithm.
>         (loongarch_prefetch_cookie): Select load or store based on the
> value of write.
>         * config/loongarch/loongarch.md (prefetch): New template.
>         (*prefetch_indexed_<mode>): New template.

Missing config/loongarch/constraints.md.

/* snip */

>  rtx
>  loongarch_prefetch_cookie (rtx write, rtx locality)
>  {
> -  /* store_streamed / load_streamed.  */
> -  if (INTVAL (locality) <= 0)
> -    return GEN_INT (INTVAL (write) + 4);
> +  if (INTVAL (locality) == 1 && INTVAL (write) == 0)
> +    return GEN_INT (INTVAL (write) + 2);

So __builtin_prefetch(ptr, 0, 1) will produce
"preld 2,$r4,0", while the document says

   hint has 32 optional values (0 to 31), 0 represents load to level 1
   Cache, and 8 represents store to level 1 Cache. The remaining hint
   values are not defined and are processed for nop instructions when the
   processor executes.
   
OTOH hint 2 is documented in preldx.  So does preld also support hint 2?
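
To make the case concrete, here is a minimal sketch. It models only the
branch quoted above; the rest of the function is snipped, so
model_prefetch_cookie is a hypothetical stand-in and not the patch's code,
and the -1 return is just a placeholder for the cases I can't see.

```c
#include <assert.h>

/* Hypothetical model of the single branch quoted above.  The other
   cases of loongarch_prefetch_cookie are snipped in this mail, so
   they are represented by a -1 placeholder.  */
static int
model_prefetch_cookie (int write, int locality)
{
  if (locality == 1 && write == 0)
    return write + 2;
  return -1;
}

/* The call in question: a read prefetch with locality 1.  */
void
prefetch_read_locality1 (const char *ptr)
{
  __builtin_prefetch (ptr, 0, 1);
}

/* Check the hint value implied by the quoted branch.  */
static void
check_quoted_branch (void)
{
  assert (model_prefetch_cookie (0, 1) == 2);
}
```

With write == 0 and locality == 1 the branch yields hint 2, which is the
value the quoted preld text does not define.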

/* snip */


> +(define_insn "prefetch"
> +  [(prefetch (match_operand 0 "address_operand" "ZD,ZE")
> +            (match_operand 1 "const_int_operand" "n,n")
> +            (match_operand 2 "const_int_operand" "n,n"))]
> +  ""
> +{
> +  operands[1] = loongarch_prefetch_cookie (operands[1], operands[2]);
> +
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      return "preld\t%1,%a0";
> +    case 1:
> +      return "preldx\t%1,%a0";

void prefetch(char *ptr, int off)
{
        __builtin_prefetch(ptr + off);
}

This is compiled to "preldx 0,$r4,$r5".  I don't think that is correct:
according to the doc, rk should contain several bit-fields rather than a
plain offset.

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
