On 13/05/2021 08:31, Richard Henderson wrote:
On 5/12/21 1:54 PM, matheus.fe...@eldorado.org.br wrote:
+ while (i) {
+ n = ctz64(mask);
+ if (n > i) {
+ n = i;
+ }
+
+ m = (1ll << n) - 1;
+ if (bit) {
+ right = ror64(right | (src & m), n);
+ } else {
+ left = ror64(left | (src & m), n);
+ }
+
+ src >>= n;
+ mask >>= n;
+ i -= n;
+ bit = !bit;
+ mask = ~mask;
+ }
+
+ if (bit) {
+ n = ctpop64(mask);
+ } else {
+ n = 64 - ctpop64(mask);
+ }
+
+ return left | (right >> n);
+}
This doesn't correspond to the algorithm presented in the manual. Thus
this requires lots of extra commentary.
I guess I see how you're trying to process blocks at a time, instead of
single bits at a time. But I don't think the merging of data into
"right" and "left" looks right. I would have expected
right = (right << n) | (src & m);
and similarly for left.
It doesn't look like that the ctpop at the end is correct, given how
mask has been modified. I would have thought that
n = ctpop64(orig_mask);
return (left << n) | right;
would be the correct answer.
I could be wrong about the above, but that's what the missing commentary
should have helped me understand.
It sure worth more comments. Yes, the idea is to process in blocks, and
we negate the mask to avoid deciding between ctz and cto inside the
loop. We use rotate instead of shift so it don't change the number of
zeros and ones, and then we don't need orig_mask.
You'll find my test cases for cfuged and vcfuged on
https://github.com/PPC64/qemu/blob/ferst-tcg-cfuged/tests/tcg/ppc64le/ .
I got the same results by running them with this implementation and with
the Power10 Functional Simulator.
+static bool trans_CFUGED(DisasContext *ctx, arg_X *a)
+{
+ REQUIRE_64BIT(ctx);
+ REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+#if defined(TARGET_PPC64)
+ gen_helper_cfuged(cpu_gpr[a->ra], cpu_gpr[a->rt], cpu_gpr[a->rb]);
+#else
+ gen_invalid(ctx);
+#endif
+ return true;
+}
Given that this helper will also be used by vcfuged, there's no point in
hiding it in a TARGET_PPC64 block, and thus you can drop the ifdefs.
r~
If I remove it, the build for ppc will fail, because cpu_gpr is declared
as TCGv, and the helper uses i64 to match {get,set}_cpu_vsr{l,h}.
REQUIRE_64BIT makes the helper call unreachable for ppc, but it's a
runtime check. At build time, the compiler will check the types anyway,
and give us an error.
--
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software Júnior
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>