On 05.08.22 15:01, Jason A. Donenfeld wrote: > Hi David, > > On Fri, Aug 05, 2022 at 01:28:18PM +0200, David Hildenbrand wrote: >> On 03.08.22 19:15, Jason A. Donenfeld wrote: >>> In order to fully support MSA_EXT_5, we have to also support the SHA-512 >>> special instructions. So implement those. >>> >>> The implementation began as something TweetNacl-like, and then was >>> adjusted to be useful here. It's not very beautiful, but it is quite >>> short and compact, which is what we're going for. >>> >> >> NIT: we could think about reversing the order of patches. IIRC, patch #1 >> itself would trigger a warning when starting QEMU. Having this patch >> first make sense logically. > > Good idea. Will do. > >>> +static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t >>> parameter_block, >>> + uint64_t *message_reg, uint64_t *len_reg, uint8_t >>> *stack_buffer) >>> +{ >>> + enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep >>> interactivity. */ >> >> I'd just use a #define outside of the function for that. > > Why? What does leaking this into file-level scope do? >
I'd say common coding practice in QEMU, but I might be wrong ;) >> >>> + uint64_t z[8], b[8], a[8], w[16], t; >>> + uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, >>> processed = 0; >>> + int i, j, reg_len = 64, blocks = 0, cc = 0; >>> + >>> + if (!(env->psw.mask & PSW_MASK_64)) { >>> + len = (uint32_t)len; >>> + reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24; >>> + } >> [...] > >>> + for (i = 0; i < 8; ++i) { >>> + cpu_stq_be_data_ra(env, wrap_address(env, parameter_block + 8 * >>> i), z[i], ra); >> >> I wonder what happens if we get an exception somewhere in the middle >> here ... fortunately we can only involve 2 pages. > > If this fails, then message_reg and len_reg won't be updated, so it will > have to start over. If it fails part way through, though, then things > are inconsistent. I don't think we want to hassle with trying to restore > the previous state or something insane though. That seems a bit much. Okay, but there could be scenarios where we mess up? > >>> + cc = kimd_sha512(env, ra, parameter_block, message_reg, len_reg, NULL); >>> + if (cc) { >>> + return cc; >>> + } >> >> Doesn't kimd_sha512() update the length register? And if we return with >> cc=3, we'd be in trouble, no? > > cc=3 means partial completion. In that case, klmd also returns with a > partial completion. That's good and expected! It means that the next > time it's called, it'll keep going where it left off. > > I've actually tried this with the Linux implementation, and it works as > expected. > >> One idea could be to simply only process one block at a time. Read all >> inputs first for that block and handle it completely without any >> register modifications. Perform all memory writes in a single call. > > That *is* what already happens. Actually, the memory writes only ever > happen at the very end of kimd_sha512. > >> Further, I wonder if we should factor out the core of kimd_sha512() to >> only work on temp buffers without any loading/storing of memory, and let >> only kimd_sha512/klmd_sha512 perform all loading/storing. Then it's much >> cleaner who modifies what. > > That's not necessary and will complicate things ultimately. See the > above; this is already working as expected. I'll have a closer look and see if I might improve it in the upcomming weeks. I'll be on vacation for ~1.5 weeks. And as history has shown, I need some days afterwards to dig through my overflowing mailbox :) -- Thanks, David / dhildenb