On 2021-08-05 16:16, John Garry wrote:
On 05/08/2021 15:41, Robin Murphy wrote:
I suppose they could be combined into a smaller sub-struct and loaded in a single operation, but it looks messy, and probably without much gain.

Indeed I wouldn't say that saving memory is the primary concern here, and any more convoluted code is hardly going to help performance. Plus it still wouldn't help the other cases where we're just copying the size into a fake queue to do some prod arithmetic - I hadn't fully clocked what was going on there when I skimmed through things earlier.

Disregarding the bogus layout change, though, do you reckon the rest of my idea makes sense?

I tried a similar change to avoid zero-initializing the padding in arm_smmu_cmdq_write_entries() and __arm_smmu_cmdq_poll_set_valid_map(), but the disassembly was the same, so the compiler must have got smart there.

Yeah, in my build __arm_smmu_cmdq_poll_set_valid_map() only uses 32 bytes of stack, so clearly it's managed to see through the macro magic once queue_inc_prod_n() is inlined and elide the whole struct. arm_smmu_cmdq_write_entries() is inlined already, but logically must be the same deal since it's a similarly inlined queue_inc_prod_n().

However, that may all change if different compiler flags or a different compiler lead to different inlining decisions, so I'd argue that if this can matter anywhere then it's worth treating consistently everywhere.

But for the original change in this patch, it did make a difference. It's nice to remove what was a memcpy:

     1770: a9077eff stp xzr, xzr, [x23, #112]
}, head = llq;
     1774: 94000000 bl 0 <memcpy>

And performance was very fractionally better.

Heh, mine was this beauty:

        struct arm_smmu_ll_queue llq = {
    17d4:       a9017f7f        stp     xzr, xzr, [x27, #16]
    17d8:       a9027f7f        stp     xzr, xzr, [x27, #32]
    17dc:       a9037f7f        stp     xzr, xzr, [x27, #48]
    17e0:       a9047f7f        stp     xzr, xzr, [x27, #64]
        }, head = llq;
    17e4:       b900c340        str     w0, [x26, #192]
{
    17e8:       290d0be1        stp     w1, w2, [sp, #104]
        }, head = llq;
    17ec:       a9440f62        ldp     x2, x3, [x27, #64]
    17f0:       a9007f5f        stp     xzr, xzr, [x26]
    17f4:       a9017f5f        stp     xzr, xzr, [x26, #16]
    17f8:       a9027f5f        stp     xzr, xzr, [x26, #32]
    17fc:       a9037f5f        stp     xzr, xzr, [x26, #48]
    1800:       a9040f42        stp     x2, x3, [x26, #64]
        struct arm_smmu_ll_queue llq = {
    1804:       a9057f7f        stp     xzr, xzr, [x27, #80]
        }, head = llq;
    1808:       a9057f5f        stp     xzr, xzr, [x26, #80]
        struct arm_smmu_ll_queue llq = {
    180c:       a9067f7f        stp     xzr, xzr, [x27, #96]
        }, head = llq;
    1810:       a9067f5f        stp     xzr, xzr, [x26, #96]
        struct arm_smmu_ll_queue llq = {
    1814:       a9077f7f        stp     xzr, xzr, [x27, #112]
        }, head = llq;
    1818:       a9077f5f        stp     xzr, xzr, [x26, #112]
        struct arm_smmu_ll_queue llq = {
    181c:       a9087f5f        stp     xzr, xzr, [x26, #128]

As for pre-evaluating "nents", I'm not sure how much that can help, and I am not too optimistic. I can try some testing when I get a chance, though I would also need to check the disassembly.

It'll just turn MOV, LDR, LSL sequences into plain LDRs - a small saving but with no real downside, and a third of it is in the place where doing less work matters most:

add/remove: 0/0 grow/shrink: 0/8 up/down: 0/-100 (-100)
Function                                     old     new   delta
arm_smmu_priq_thread                         532     528      -4
arm_smmu_evtq_thread                         368     364      -4
arm_smmu_device_probe                       4564    4556      -8
__arm_smmu_cmdq_poll_set_valid_map.isra      316     308      -8
arm_smmu_init_one_queue.isra                 320     308     -12
queue_remove_raw                             192     176     -16
arm_smmu_gerror_handler                      752     736     -16
arm_smmu_cmdq_issue_cmdlist                 1812    1780     -32
Total: Before=23776, After=23676, chg -0.42%


Robin.