vsr implementation?

Stefan Brankovic Thu, 03 Oct 2019 04:12:31 -0700

Please take a look at the following patchhttps://lists.nongnu.org/archive/html/qemu-ppc/2019-10/msg00133.html andlet me know if problem is solved.


On 2.10.19. 16:08, Stefan Brankovic wrote:

Hi Mark,
Thank you for reporting this bug. I was away from office for couple ofdays, so that's why I am answering you a bit late, sorry about that. Iwill start working on a solution and try to fix this problem in nextcouple of days.
On 1.10.19. 20:24, Mark Cave-Ayland wrote:
On 28/09/2019 18:45, Aleksandar Markovic wrote:

Hi Aleksandar,

Thanks for taking a look at this!
Mark and Paul (and Stefan),
Thanks for spotting this and pinpointing the culprit commit. I guessStefan is goingto respond soon, but, in the meantime, I took a look at the commitin question:
https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
I don't have at the moment any dev/test environment handy, but I didmanualinspection of the code, and here is what I found (in order ofimportance, perceived
by me):
1. The code will not work correctly if the shift ammount (variable'sh') is 0. Thisis because, in that case, one of succeeding invocations of TCG shiftfunctions willbe required to shift a 64-bit TCG variable by 64 bits, and theresult of such TCG
operation is undefined (shift amount must be 63 or less) - see
https://github.com/qemu/qemu/blob/master/tcg/README.
Yes I think you're right here - the old helper got around this bydoing an explicitcopy from a to r if the shift value is zero. In fact the case thatPaul reported is
exactly this:

    vsl VRT, VRA, VRB

=> 0x100006e0 <vec_slq+132>: vsl v0,v0,v1
(gdb) p $vr0.uint128
$21 = 0x10111213141516172021222324252650
(gdb) p $vr1.uint128
$22 = 0x0
(gdb) stepi
0x00000000100006e4 in vec_slq ()
1: x/i $pc
=> 0x100006e4 <vec_slq+136>: xxlor vs0,vs32,vs32
(gdb) p $vr0.uint128
$23 = 0x10111213141516172021222324252650
I guess the solution is check for sh == 0 and if this is the case,execute a copy
instead.
I agree with you. This will be changed in upcoming patch.
2. Variable naming is better in the old helper than in the newtranslator. In thatlight, I would advise Stefan to change 'sh' to 'shift', and'shifted' to 'carry'.
It looks like the name "sh" comes from the ISA documentation, sowhilst it's a littletricky to compare with the previous implementation it does make sensewhen comparingwith the algorithm shown there. Note: this implementation also dropsthe check foreach byte of VRB having the same shift value - should we care aboutthis?
"sh" is taken from the ISA documentation, so I would leave that as itis now, but I can change some other variable names to be consistentwith previous implementation (e.g. "shifted" -> "carry").
I don't think that we should check each byte of VRB, because we careonly about "defined" behavior. If shift values doesn't match, resultis "undefined" so it doesn't matter what is inside resulting register.
3. Lines

tcg_gen_andi_i64(shifted, shifted, 0x7fULL);

and

tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
appear to be spurious (albait in a harmless way). Therefore, theyshould be deleted,
or, alternatevely, a justification for them should be provided.
I'm not sure why they are needed either - there's certainly nomention of it in the
ISA documentation. Stefan?
This will be removed in upcoming patch.
4. In the commit message, variable names were used without quotationmark, resulting
in puzzling and unclear wording.
5. (a question for Mark) After all recent changes, doesget_avr64(..., ..., true)always (for any endian configuration) return the "high" half of anAltivec register,
and get_avr64(..., ..., false) the "low" one?
Yes - the new functions always return the MSB (high) and LSB (low)correctly
regardless of host endian.
Given all these circumstances, perhaps the most reasonable solutionwould be torevert the commit in question, and allow Stefan enough dev and testtime to hopefully
submit a new, better, version later on.
Given that it has been broken for 3 months now, I don't think we'rein any major rushto revert ASAP. I'd prefer to give Stefan a bit more time first sincehe does report
some substantial speed improvements from these new implementations.


ATB,

Mark.
Best Regards,

Stefan

Re: target/ppc: bug in optimised vsl/vsr implementation?

Reply via email to