On Mon, May 16, 2016 at 11:55 AM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
> On 05/13/2016 06:41 PM, Bharata B Rao wrote:
>>
>> On Wed, May 4, 2016 at 12:22 PM, Alexey Kardashevskiy <a...@ozlabs.ru>
>> wrote:
>>
>>> +
>>> +    avail = SPAPR_PCI_DMA_MAX_WINDOWS - spapr_phb_get_active_win_num(sphb);
>>> +
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +    rtas_st(rets, 1, avail);
>>> +    rtas_st(rets, 2, max_window_size);
>>> +    rtas_st(rets, 3, pgmask);
>>> +    rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
>>> +
>>> +    trace_spapr_iommu_ddw_query(buid, addr, avail, max_window_size, pgmask);
>>> +    return;
>>> +
>>> +param_error_exit:
>>> +    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>> +}
>>> +
>>> +static void rtas_ibm_create_pe_dma_window(PowerPCCPU *cpu,
>>> +                                          sPAPRMachineState *spapr,
>>> +                                          uint32_t token, uint32_t nargs,
>>> +                                          target_ulong args,
>>> +                                          uint32_t nret, target_ulong rets)
>>> +{
>>> +    sPAPRPHBState *sphb;
>>> +    sPAPRTCETable *tcet = NULL;
>>> +    uint32_t addr, page_shift, window_shift, liobn;
>>> +    uint64_t buid;
>>> +
>>> +    if ((nargs != 5) || (nret != 4)) {
>>> +        goto param_error_exit;
>>> +    }
>>> +
>>> +    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
>>> +    addr = rtas_ld(args, 0);
>>> +    sphb = spapr_pci_find_phb(spapr, buid);
>>> +    if (!sphb || !sphb->ddw_enabled) {
>>> +        goto param_error_exit;
>>> +    }
>>> +
>>> +    page_shift = rtas_ld(args, 3);
>>> +    window_shift = rtas_ld(args, 4);
>>
>> The kernel has a bug due to which a wrong window_shift gets returned
>> here. I have posted a possible fix here:
>> https://patchwork.ozlabs.org/patch/621497/
>>
>> I have tried to work around this issue in QEMU too:
>> https://lists.nongnu.org/archive/html/qemu-ppc/2016-04/msg00226.html
>>
>> But the above workaround involves changing the memory representation
>> in the DT.
>
> What is wrong with this workaround?
The above workaround results in different representations of memory in the
DT before and after the workaround. Currently, for

    -m 2G -numa node,nodeid=0,mem=1G -numa node,nodeid=1,mem=0.5G

we have the following nodes in the DT:

    memory@0
    memory@40000000
    ibm,dynamic-reconfiguration-memory

ibm,dynamic-memory will have only DR LMBs:

[root@localhost ibm,dynamic-reconfiguration-memory]# hexdump ibm,dynamic-memory
0000000 0000 000a 0000 0000 8000 0000 8000 0008
0000010 0000 0000 ffff ffff 0000 0000 0000 0000
0000020 9000 0000 8000 0009 0000 0000 ffff ffff
0000030 0000 0000 0000 0000 a000 0000 8000 000a
0000040 0000 0000 ffff ffff 0000 0000 0000 0000
0000050 b000 0000 8000 000b 0000 0000 ffff ffff
0000060 0000 0000 0000 0000 c000 0000 8000 000c
0000070 0000 0000 ffff ffff 0000 0000 0000 0000
0000080 d000 0000 8000 000d 0000 0000 ffff ffff
0000090 0000 0000 0000 0000 e000 0000 8000 000e
00000a0 0000 0000 ffff ffff 0000 0000 0000 0000
00000b0 f000 0000 8000 000f 0000 0000 ffff ffff
00000c0 0000 0000 0000 0001 0000 0000 8000 0010
00000d0 0000 0000 ffff ffff 0000 0000 0000 0001
00000e0 1000 0000 8000 0011 0000 0000 ffff ffff
00000f0 0000 0000

The memory region looks like this:

memory-region: system
  0000000000000000-ffffffffffffffff (prio 0, RW): system
    0000000000000000-000000005fffffff (prio 0, RW): ppc_spapr.ram
    0000000080000000-000000011fffffff (prio 0, RW): hotplug-memory

After this workaround, all of this changes as below:

    memory@0
    ibm,dynamic-reconfiguration-memory

All LMBs are now in ibm,dynamic-memory:

[root@localhost ibm,dynamic-reconfiguration-memory]# hexdump ibm,dynamic-memory
0000000 0000 0010 0000 0000 0000 0000 8000 0000
0000010 0000 0000 0000 0000 0000 0080 0000 0000
0000020 1000 0000 8000 0001 0000 0000 0000 0000
0000030 0000 0080 0000 0000 2000 0000 8000 0002
0000040 0000 0000 0000 0000 0000 0080 0000 0000
0000050 3000 0000 8000 0003 0000 0000 0000 0000
0000060 0000 0080 0000 0000 4000 0000 8000 0004
0000070 0000 0000 0000 0001 0000 0008 0000 0000
0000080 5000 0000 8000 0005 0000 0000 0000 0001
0000090 0000 0008 0000 0000 6000 0000 8000 0006
00000a0 0000 0000 ffff ffff 0000 0000 0000 0000
00000b0 7000 0000 8000 0007 0000 0000 ffff ffff
00000c0 0000 0000 0000 0000 8000 0000 8000 0008
00000d0 0000 0000 ffff ffff 0000 0000 0000 0000
00000e0 9000 0000 8000 0009 0000 0000 ffff ffff
00000f0 0000 0000 0000 0000 a000 0000 8000 000a
0000100 0000 0000 ffff ffff 0000 0000 0000 0000
0000110 b000 0000 8000 000b 0000 0000 ffff ffff
0000120 0000 0000 0000 0000 c000 0000 8000 000c
0000130 0000 0000 ffff ffff 0000 0000 0000 0000
0000140 d000 0000 8000 000d 0000 0000 ffff ffff
0000150 0000 0000 0000 0000 e000 0000 8000 000e
0000160 0000 0000 ffff ffff 0000 0000 0000 0000
0000170 f000 0000 8000 000f 0000 0000 ffff ffff
0000180 0000 0000

The hotplug memory region gets a new address range now:

memory-region: system
  0000000000000000-ffffffffffffffff (prio 0, RW): system
    0000000000000000-000000005fffffff (prio 0, RW): ppc_spapr.ram
    0000000060000000-00000000ffffffff (prio 0, RW): hotplug-memory

So when a guest that was booted with an older QEMU is migrated to a newer
QEMU that has this workaround, it will start exhibiting the above changes
after the first reboot post migration. If the user has done memory hotplug
by explicitly specifying an address on the source, then even migration
would fail, because the address specified at the target will not be part
of the hotplug-memory range. Hence I believe we shouldn't work around the
issue in this manner, but should instead keep the workaround in the DDW
code, where the window can easily be fixed.
>
>> Hence I feel that until the guest kernel changes are available, a
>> simpler workaround would be to discard the window_shift value above
>> and recalculate the right value as below:
>>
>>     if (machine->ram_size == machine->maxram_size) {
>>         max_window_size = machine->ram_size;
>>     } else {
>>         MemoryHotplugState *hpms = &spapr->hotplug_memory;
>>         max_window_size = hpms->base + memory_region_size(&hpms->mr);
>>     }
>>     window_shift = max_window_size >> SPAPR_TCE_PAGE_SHIFT;
>>
>> and create the DDW based on this calculated window_shift value. Does
>> that sound reasonable?
>
> The workaround should only do that for the second window, at least, or
> for the default one but with a page size of at least 64K; otherwise it is
> going to be a waste of memory (2MB per each 1GB of guest RAM).

Ok, will sync up with you separately to understand more about the 'two'
windows here.

Regards,
Bharata.