Re: [Qemu-devel] [PATCH v3] s390x/pci: add common function measurement block
Patchew URL: https://patchew.org/QEMU/1544796678-12736-1-git-send-email-pmo...@linux.ibm.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 1544796678-12736-1-git-send-email-pmo...@linux.ibm.com
Type: series
Subject: [Qemu-devel] [PATCH v3] s390x/pci: add common function measurement block

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
3664584 s390x/pci: add common function measurement block

=== OUTPUT BEGIN ===
Checking PATCH 1/1: s390x/pci: add common function measurement block...
ERROR: code indent should never use tabs
#80: FILE: hw/s390x/s390-pci-bus.h:304:
+#define ZPCI_FMB_FORMAT^I0$

ERROR: code indent should never use tabs
#213: FILE: hw/s390x/s390-pci-inst.c:950:
+^Iret = MEMTX_ERROR;$

WARNING: line over 80 characters
#241: FILE: hw/s390x/s390-pci-inst.c:978:
+if (fmb_do_update(pbdev, offset, pbdev->fmb.sample++, sizeof(pbdev->fmb.sample))) {

total: 2 errors, 1 warnings, 257 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/1544796678-12736-1-git-send-email-pmo...@linux.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
[Qemu-devel] [Bug 1809304] Re: qemu-img convert is freezing for some DMG files.
Because the zero chunk table is missing, reading a zero sector returns EIO.

I have submitted a series to fix this problem. Please refer to:

http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05637.html

Thanks,
Yu-Chen Lin

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809304

Title:
  qemu-img convert is freezing for some DMG files.

Status in QEMU:
  New

Bug description:
  Recently, I created a file using hdiutil from MacOS (using Zlib
  compression):

  $ hdiutil create -volname MyVolName -srcfolder /path/to/my/vol/ -ov -format UDZO myvolname.dmg

  But, when I try to convert this volume using qemu-img convert, this
  command is freezing. I'm using the upstream version to test it.

  It is freezing inside the binary search method to retrieve the chunk.
  But, I still don't know why.

  I'm attaching the file as an example. It can be mounted using MacOS or
  other Linux apps like hfsleuth and darling-dmg.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809304/+subscriptions
[Qemu-devel] [PATCH v2 1/3] dmg: fix binary search
The original binary search implementation can hang. For example, with
chunk1 = 4 and chunk2 = 5, chunk3 is computed as 4. If we then take the
else branch, chunk1 is set to chunk3 and stays at 4, so the loop never
terminates.

Signed-off-by: yuchenlin
---
 block/dmg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 50e91aef6d..0e05702f5d 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -572,14 +572,14 @@ static inline uint32_t search_chunk(BDRVDMGState *s, uint64_t sector_num)
 {
     /* binary search */
     uint32_t chunk1 = 0, chunk2 = s->n_chunks, chunk3;
-    while (chunk1 != chunk2) {
+    while (chunk1 <= chunk2) {
         chunk3 = (chunk1 + chunk2) / 2;
         if (s->sectors[chunk3] > sector_num) {
-            chunk2 = chunk3;
+            chunk2 = chunk3 - 1;
         } else if (s->sectors[chunk3] + s->sectorcounts[chunk3] > sector_num) {
             return chunk3;
         } else {
-            chunk1 = chunk3;
+            chunk1 = chunk3 + 1;
         }
     }
     return s->n_chunks; /* error */
--
2.17.1
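For readers following the termination argument, here is a minimal standalone sketch of a chunk lookup that cannot spin. It is not the code from block/dmg.c: the ChunkTable type and find_chunk() name are hypothetical, and it uses a half-open window [lo, hi) instead of the closed window in the patch, which keeps the midpoint in bounds and avoids unsigned underflow when the midpoint is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the chunk table; not QEMU's BDRVDMGState. */
typedef struct ChunkTable {
    uint32_t n_chunks;
    const uint64_t *sectors;       /* first sector of each chunk, ascending */
    const uint64_t *sectorcounts;  /* number of sectors in each chunk */
} ChunkTable;

/*
 * Return the index of the chunk containing sector_num, or t->n_chunks
 * if no chunk covers that sector.
 */
static uint32_t find_chunk(const ChunkTable *t, uint64_t sector_num)
{
    assert(t != NULL);

    uint32_t lo = 0, hi = t->n_chunks;       /* search window is [lo, hi) */

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;   /* lo <= mid < hi, so in bounds */

        if (t->sectors[mid] > sector_num) {
            hi = mid;                        /* window shrinks: mid < hi */
        } else if (sector_num - t->sectors[mid] < t->sectorcounts[mid]) {
            return mid;                      /* sector lies inside this chunk */
        } else {
            lo = mid + 1;                    /* window shrinks: mid >= lo */
        }
    }
    return t->n_chunks;                      /* not found */
}
```

Every iteration either returns or strictly shrinks the window, so the case described above (chunk1 stuck at 4) cannot occur. A sector that falls in a gap between chunks yields n_chunks, the same error convention the patch uses.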
[Qemu-devel] [PATCH v2 2/3] dmg: use enumeration type instead of hard coding number
Signed-off-by: yuchenlin
---
 block/dmg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 0e05702f5d..6b0a057bf8 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -267,7 +267,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,

         /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
          * therefore be unbounded. */
-        if (s->types[i] != 2 && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
+        if (s->types[i] != UDIG && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
             error_report("sector count %" PRIu64 " for chunk %" PRIu32
                          " is larger than max (%u)",
                          s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
@@ -706,7 +706,7 @@ dmg_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         /* Special case: current chunk is all zeroes. Do not perform a memcpy as
          * s->uncompressed_chunk may be too small to cover the large all-zeroes
          * section. dmg_read_chunk is called to find s->current_chunk */
-        if (s->types[s->current_chunk] == 2) { /* all zeroes block entry */
+        if (s->types[s->current_chunk] == UDIG) { /* all zeroes block entry */
             qemu_iovec_memset(qiov, i * 512, 0, 512);
             continue;
         }
--
2.17.1
[Qemu-devel] [PATCH v2 3/3] dmg: don't skip zero chunk
The dmg file has many tables, each of which describes: "from sector XXX
to sector XXX, the compression method is XXX, and the compressed data
resides at XXX".

Each sector in the expanded file should be covered by a table. The table
gives the offset of the compressed data (or raw data, depending on the
type) in the dmg.

For example:

[-----------The expanded file------------]
[---bzip table ---]/* zeros */[---zlib---]
   ^
   | if we want to read this sector.

we find the bzip table that contains this sector, get the compressed
data offset, read the data from the dmg, uncompress it, and finally
write it to the expanded file.

If we skip the zero chunk (table), some sectors cannot find their table.
This causes search_chunk() to return s->n_chunks, dmg_read_chunk() to
return -1, and finally dmg_co_preadv() to return EIO.

See:

[-----------The expanded file------------]
[---bzip table ---]/* zeros */[---zlib---]
                    ^
                    | if we want to read this sector.

Oops, we cannot find the table that contains it...

In the original implementation, we don't have a zero table. When we try
to read a sector inside the zero chunk, we get EIO and skip reading.

After this patch, we treat the zero chunk the same as the ignore chunk:
zeros are written directly, and every sector can find its table.

After this patch:

[-----------The expanded file------------]
[---bzip table ---][--zeros--][---zlib---]

Signed-off-by: yuchenlin
---
 block/dmg.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 6b0a057bf8..137fe9c1ff 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -130,7 +130,8 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
     case UDRW: /* copy */
         uncompressed_sectors = DIV_ROUND_UP(s->lengths[chunk], 512);
         break;
-    case UDIG: /* zero */
+    case UDZE: /* zero */
+    case UDIG: /* ignore */
         /* as the all-zeroes block may be large, it is treated specially: the
          * sector is not copied from a large buffer, a simple memset is used
          * instead. Therefore uncompressed_sectors does not need to be set. */
@@ -199,8 +200,9 @@ typedef struct DmgHeaderState {
 static bool dmg_is_known_block_type(uint32_t entry_type)
 {
     switch (entry_type) {
+    case UDZE:    /* zeros */
     case UDRW:    /* uncompressed */
-    case UDIG:    /* zeroes */
+    case UDIG:    /* ignore */
     case UDZO:    /* zlib */
         return true;
     case UDBZ:    /* bzip2 */
@@ -265,9 +267,10 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
         /* sector count */
         s->sectorcounts[i] = buff_read_uint64(buffer, offset + 0x10);

-        /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
-         * therefore be unbounded. */
-        if (s->types[i] != UDIG && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
+        /* all-zeroes sector (type UDZE and UDIG) does not need to be
+         * "uncompressed" and can therefore be unbounded. */
+        if (s->types[i] != UDZE && s->types[i] != UDIG
+            && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
             error_report("sector count %" PRIu64 " for chunk %" PRIu32
                          " is larger than max (%u)",
                          s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
@@ -671,7 +674,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                 return -1;
             }
             break;
-        case UDIG: /* zero */
+        case UDZE: /* zeros */
+        case UDIG: /* ignore */
             /* see dmg_read, it is treated specially. No buffer needs to be
              * pre-filled, the zeroes can be set directly. */
             break;
@@ -706,7 +710,8 @@ dmg_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         /* Special case: current chunk is all zeroes. Do not perform a memcpy as
          * s->uncompressed_chunk may be too small to cover the large all-zeroes
          * section. dmg_read_chunk is called to find s->current_chunk */
-        if (s->types[s->current_chunk] == UDIG) { /* all zeroes block entry */
+        if (s->types[s->current_chunk] == UDZE
+            || s->types[s->current_chunk] == UDIG) { /* all zeroes block entry */
             qemu_iovec_memset(qiov, i * 512, 0, 512);
             continue;
         }
--
2.17.1
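The failure mode in the commit message can be reproduced with a small self-contained sketch. Everything here is hypothetical (the Chunk type, the KIND_* values and read_sector() are made up for illustration and are not taken from block/dmg.c), and a linear lookup is used for brevity. It only shows why a sector with no covering table entry ends in EIO, while keeping a zero entry lets the read succeed with a plain memset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical chunk kinds; these are not the DMG on-disk type values. */
enum ChunkKind { KIND_DATA, KIND_ZERO, KIND_IGNORE };

typedef struct Chunk {
    uint64_t first_sector;
    uint64_t sector_count;
    enum ChunkKind kind;
} Chunk;

/* Find the chunk covering `sector`, or NULL if none does. */
static const Chunk *lookup(const Chunk *chunks, size_t n, uint64_t sector)
{
    for (size_t i = 0; i < n; i++) {
        if (sector >= chunks[i].first_sector &&
            sector - chunks[i].first_sector < chunks[i].sector_count) {
            return &chunks[i];
        }
    }
    return NULL;
}

/* Fill one sector of `buf`; returns 0, or -EIO if no chunk covers it. */
static int read_sector(const Chunk *chunks, size_t n, uint64_t sector,
                       uint8_t *buf)
{
    assert(buf != NULL);

    const Chunk *c = lookup(chunks, n, sector);
    if (c == NULL) {
        return -EIO;                     /* no table entry for this sector */
    }
    switch (c->kind) {
    case KIND_ZERO:
    case KIND_IGNORE:
        memset(buf, 0, SECTOR_SIZE);     /* zeros are written directly */
        return 0;
    case KIND_DATA:
        memset(buf, 0xAA, SECTOR_SIZE);  /* placeholder for decompression */
        return 0;
    }
    return -EIO;
}
```

With a table of {data 0..7, data 16..23}, reading sector 10 fails with -EIO. Once a zero entry for sectors 8..15 is kept in the table, the same read returns 0 and the buffer is zero-filled.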
[Qemu-devel] [PATCH v2 0/3] dmg: fixing reading in dmg
There are two bugs in dmg reading.

First, it may hang in the binary search. This problem is solved by patch 1.

Second, because the zero chunk table is missing, reading a zero sector
returns EIO. This problem is solved by patches 2 and 3.

Thanks

v1 -> v2:
* fix typos in patch 1
* add patch 2 and patch 3

yuchenlin (3):
  dmg: fix binary search
  dmg: use enumeration type instead of hard coding number
  dmg: don't skip zero chunk

 block/dmg.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

--
2.17.1
Re: [Qemu-devel] [PATCH v3 8/9] target/ppc: move FP and VMX registers into aligned vsr register array
On 12/22/18 10:09 AM, Mark Cave-Ayland wrote:
> Do you want these helpers used just within
> linux-user/ppc/signal.c or also within the other files touched by this patch
> e.g. arch_dump.c, gdbstub.c etc.?

Everywhere.  Thanks!

r~
[Qemu-devel] [Bug 1793275] Re: Hosts fail to start after update to QEMU 3.0
This bug is not present in QEMU-3.1.0. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793275 Title: Hosts fail to start after update to QEMU 3.0 Status in QEMU: New Bug description: Host OS: Archlinux Host Architecture: AMD64 Guest OS: FreeBSD-11.2 (x2) and Archlinux (x1) Guest Architecture: AMD64 I have been using QEMU 2.x without issue for a number of years but since updating to QEMU 3.0 my guests do not complete startup. FreeBSD 11.2 guest failure symptom: The two FreeBSD-11.2 guests output repeated messages of "unexpected cache type 4". This appears to be an internal error message and I've not found any instances of it through Google search. Archlinux guest failure symptom: The single Archlinux guest gets no further than the message "uncompressing initial ramdisk". The guests are started by a qemu-kvm invokation. No virtual machine managers are used. The command lines used (from ps awx) to launch the VMs are: [neil@optimus ~]$ ps awx |grep qemu 1492 ?Sl 3:19 /usr/bin/qemu-system-x86_64 -daemonize -pidfile /run/qemu_vps1.pid -enable-kvm -cpu host -smp 2 -k en-gb -boot order=c -drive file=/dev/system/vps1,cache=none,format=raw,if=virtio,index=0,media=disk -m 1024 -name FreeBSD_1 -net nic,macaddr=52:54:AD:86:64:00,model=virtio -net vde,sock=/run/vde_switch-tap0.sock -monitor telnet:127.0.0.2:23,server,nowait -vnc 192.168.0.1:0 1510 ?Sl 0:54 /usr/bin/qemu-system-x86_64 -daemonize -pidfile /run/qemu_vps2.pid -enable-kvm -cpu host -smp 2 -k en-gb -boot order=c -drive file=/dev/system/vps2,cache=none,format=raw,if=virtio,index=0,media=disk -m 1024 -name Archlinux -net nic,macaddr=52:54:AD:86:64:01,model=virtio -net vde,sock=/run/vde_switch-tap0.sock -monitor telnet:127.0.0.3:23,server,nowait -vnc 192.168.0.1:1 1529 ?Sl 3:07 /usr/bin/qemu-system-x86_64 -daemonize -pidfile /run/qemu_vps3.pid -enable-kvm -cpu host -smp 2 -k en-gb -boot order=c -drive 
file=/dev/system/vps3,cache=none,format=raw,if=virtio,index=0,media=disk -m 1024 -name FreeBSD_2 -net nic,macaddr=52:54:AD:86:64:02,model=virtio -net vde,sock=/run/vde_switch-tap0.sock -monitor telnet:127.0.0.4:23,server,nowait -vnc 192.168.0.1:2 The VMs were installed to LVM volumes on the host machine (hence the /dev/system/vpsN device names). Networking is over a Linux tap interface connected to a VDE2 virtual network switch. Currently working version of QEMU: qemu-headless 2.12.1-1 Failing version of QEMU: qemu-headless-3.0.0-1 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1793275/+subscriptions
Re: [Qemu-devel] [PATCH v3 8/9] target/ppc: move FP and VMX registers into aligned vsr register array
On 20/12/2018 17:57, Richard Henderson wrote:

> On 12/20/18 8:31 AM, Mark Cave-Ayland wrote:
>> The VSX register array is a block of 64 128-bit registers where the first 32
>> registers consist of the existing 64-bit FP registers extended to 128-bit
>> using new VSR registers, and the last 32 registers are the VMX 128-bit
>> registers as shown below:
>>
>>             64-bit               64-bit
>>     +--------------------+--------------------+
>>     |        FP0         |                    |  VSR0
>>     +--------------------+--------------------+
>>     |        FP1         |                    |  VSR1
>>     +--------------------+--------------------+
>>     |        ...         |        ...         |  ...
>>     +--------------------+--------------------+
>>     |        FP30        |                    |  VSR30
>>     +--------------------+--------------------+
>>     |        FP31        |                    |  VSR31
>>     +--------------------+--------------------+
>>     |                  VMX0                   |  VSR32
>>     +-----------------------------------------+
>>     |                  VMX1                   |  VSR33
>>     +-----------------------------------------+
>>     |                  ...                    |  ...
>>     +-----------------------------------------+
>>     |                  VMX30                  |  VSR62
>>     +-----------------------------------------+
>>     |                  VMX31                  |  VSR63
>>     +-----------------------------------------+
>>
>> In order to allow for future conversion of VSX instructions to use TCG vector
>> operations, recreate the same layout using an aligned version of the existing
>> vsr register array.
>>
>> Since the old fpr and avr register arrays are removed, the existing callers
>> must also be updated to use the correct offset in the vsr register array. This
>> also includes switching the relevant VMState fields over to using subarrays
>> to make sure that migration is preserved.
>>
>> Signed-off-by: Mark Cave-Ayland
>> Reviewed-by: Richard Henderson
>> Acked-by: David Gibson
>> ---
>>  linux-user/ppc/signal.c             | 24 ++---
>>  target/ppc/arch_dump.c              | 12 +++
>>  target/ppc/cpu.h                    |  9 ++---
>>  target/ppc/gdbstub.c                |  8 ++---
>>  target/ppc/internal.h               | 18 +++---
>>  target/ppc/machine.c                | 72 ++---
>>  target/ppc/monitor.c                |  4 +--
>>  target/ppc/translate.c              | 14
>>  target/ppc/translate/dfp-impl.inc.c |  2 +-
>>  target/ppc/translate/vmx-impl.inc.c |  7 +++-
>>  target/ppc/translate/vsx-impl.inc.c |  4 +--
>>  target/ppc/translate_init.inc.c     | 24 ++---
>>  12 files changed, 126 insertions(+), 72 deletions(-)
>>
>> diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
>> index 2ae120a2bc..a053dd5b84 100644
>> --- a/linux-user/ppc/signal.c
>> +++ b/linux-user/ppc/signal.c
>> @@ -258,8 +258,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
>>      /* Save Altivec registers if necessary.  */
>>      if (env->insns_flags & PPC_ALTIVEC) {
>>          uint32_t *vrsave;
>> -        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
>> -            ppc_avr_t *avr = &env->avr[i];
>> +        for (i = 0; i < 32; i++) {
>> +            ppc_avr_t *avr = &env->vsr[32 + i];
>
> Because of our subsequent discussion re endianness within these vectors, I
> think it would be helpful to add some helpers here.
>
> static inline ppc_avr_t *cpu_avr_ptr(CPUPPCState *env, int i)
> {
>     return &env->vsr[32 + i];
> }
>
>>      /* Save VSX second halves */
>>      if (env->insns_flags2 & PPC2_VSX) {
>>          uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
>> -        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
>> -            __put_user(env->vsr[i], &vsregs[i]);
>> +        for (i = 0; i < 32; i++) {
>> +            __put_user(env->vsr[i].u64[1], &vsregs[i]);
>
> static inline uint64_t *cpu_vsrl_ptr(CPUPPCState *env, int i)
> {
>     return &env->vsr[i].u64[1];
> }
>
>>      /* Save floating point registers.  */
>>      if (env->insns_flags & PPC_FLOAT) {
>> -        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
>> -            __put_user(env->fpr[i], &frame->mc_fregs[i]);
>> +        for (i = 0; i < 32; i++) {
>> +            __put_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
>
> static inline uint64_t *cpu_fpr_ptr(CPUPPCState *env, int i)
> {
>     return &env->vsr[i].u64[0];
> }
>
>
> Eventually, we will want to make these last two functions be dependent on host
> endianness, so that we can remove getVSR and putVSR.  At which point VSR and
> AVX registers will have the same representation.  Because at present they
> don't, which IMO is, if not a bug, at least a severe mis-feature.

Okay I can do that. Do you want these helpers used just within
linux-user/ppc/signal.c or also within the other files touched by this patch
e.g. arch_dump.c, gdbstub.c etc.?
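To make the layout and the proposed accessors concrete, here is a small standalone sketch. The Vec128 and CpuState types are simplified, hypothetical stand-ins for QEMU's ppc_avr_t and CPUPPCState. The helper names follow the ones suggested in the review, but the bodies are only an illustration of the indexing, with no host-endianness handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for QEMU's ppc_avr_t. */
typedef union Vec128 {
    uint64_t u64[2];
    uint8_t  u8[16];
} Vec128;

/* Hypothetical, simplified stand-in for CPUPPCState. */
typedef struct CpuState {
    _Alignas(16) Vec128 vsr[64];   /* VSR0..31 hold the FPRs, VSR32..63 the VMX regs */
} CpuState;

/* FPn is the first 64-bit half of VSRn (n in 0..31). */
static inline uint64_t *cpu_fpr_ptr(CpuState *env, int i)
{
    assert(i >= 0 && i < 32);
    return &env->vsr[i].u64[0];
}

/* The second 64-bit half of VSRn (n in 0..31). */
static inline uint64_t *cpu_vsrl_ptr(CpuState *env, int i)
{
    assert(i >= 0 && i < 32);
    return &env->vsr[i].u64[1];
}

/* VMXn is the whole of VSR(32 + n) (n in 0..31). */
static inline Vec128 *cpu_avr_ptr(CpuState *env, int i)
{
    assert(i >= 0 && i < 32);
    return &env->vsr[32 + i];
}
```

Callers such as the signal-frame save loops would then go through these accessors instead of hard-coding `32 + i` or `.u64[0]`, so a later change of representation only has to touch the helpers.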
Re: [Qemu-devel] [PATCH 0/2] Fix TABs in many files
Patchew URL: https://patchew.org/QEMU/20181213223737.11793-1-pbonz...@redhat.com/ Hi, This series seems to have some coding style problems. See output below for more information: Message-id: 20181213223737.11793-1-pbonz...@redhat.com Type: series Subject: [Qemu-devel] [PATCH 0/2] Fix TABs in many files === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 303caa2 avoid TABs in files that only contain a few aab0b24 remove space-tab sequences === OUTPUT BEGIN === Checking PATCH 1/2: remove space-tab sequences... 
ERROR: code indent should never use tabs #21: FILE: bsd-user/x86_64/target_syscall.h:15: +^Iabi_ulong r11;$ ERROR: code indent should never use tabs #34: FILE: crypto/aes.c:1074: +^Iint i = 0;$ ERROR: code indent should never use tabs #43: FILE: crypto/aes.c:1163: +^I^I}$ ERROR: code indent should never use tabs #52: FILE: crypto/aes.c:1250: +^I/* round 2: */$ ERROR: code indent should never use tabs #61: FILE: crypto/aes.c:1260: +^I/* round 4: */$ ERROR: code indent should never use tabs #70: FILE: crypto/aes.c:1270: +^I/* round 6: */$ ERROR: code indent should never use tabs #79: FILE: crypto/aes.c:1280: +^I/* round 8: */$ ERROR: code indent should never use tabs #88: FILE: crypto/aes.c:1572: +^Is0 =$ ERROR: code indent should never use tabs #94: FILE: crypto/aes.c:1577: +^I^Irk[0];$ ERROR: code indent should never use tabs #97: FILE: crypto/aes.c:1579: +^Is1 =$ ERROR: code indent should never use tabs #103: FILE: crypto/aes.c:1584: +^I^Irk[1];$ ERROR: code indent should never use tabs #106: FILE: crypto/aes.c:1586: +^Is2 =$ ERROR: code indent should never use tabs #112: FILE: crypto/aes.c:1591: +^I^Irk[2];$ ERROR: code indent should never use tabs #115: FILE: crypto/aes.c:1593: +^Is3 =$ ERROR: code indent should never use tabs #121: FILE: crypto/aes.c:1598: +^I^Irk[3];$ ERROR: code indent should never use tabs #134: FILE: disas/alpha.c:675: +^I^Iwhich bits in the actual opcode must match OPCODE.$ ERROR: code indent should never use tabs #143: FILE: disas/alpha.c:702: +^I^Ibut with defined results on previous implementations.$ ERROR: code indent should never use tabs #147: FILE: disas/alpha.c:705: +^I^Ipresumably undefined results on previous implementations$ ERROR: code indent should never use tabs #156: FILE: disas/alpha.c:835: +^I^I^I0xFFE0, BASE, { RC } },^I^I/* ev56 but */$ ERROR: code indent should never use tabs #169: FILE: disas/arm.c:1080: +^I^I^I^I(top bit of range being the sign bit)$ ERROR: code indent should never use tabs #182: FILE: 
disas/i386.c:6078: +^I}$ ERROR: code indent should never use tabs #191: FILE: disas/i386.c:6115: +^I}$ ERROR: code indent should never use tabs #204: FILE: disas/m68k.c:353: +^I^I^I^I^I^I(not 0,1,7.2-4)$ ERROR: space required after that ',' (ctx:VxV) #204: FILE: disas/m68k.c:353: + (not 0,1,7.2-4) ^ ERROR: space required after that ',' (ctx:VxV) #204: FILE: disas/m68k.c:353: + (not 0,1,7.2-4) ^ ERROR: spaces required around that '-' (ctx:VxV) #204: FILE: disas/m68k.c:353: + (not 0,1,7.2-4) ^ ERROR: code indent should never use tabs #213: FILE: disas/m68k.c:1650: +^I case 0x18: name = "%psr"; break;$ ERROR: trailing statements should be on next line #213: FILE: disas/m68k.c:1650: + case 0x18: name = "%psr"; break; ERROR: code indent should never use tabs #241: FILE: include/hw/elf_ops.h:346: +^I*pentry = (uint64_t)(elf_sword)ehdr.e_entry;$ ERROR: code indent should never use tabs #254: FILE: linux-user/linuxload.c:57: +^Ibprm->e_uid = st.st_uid;$ ERROR: code indent should never use tabs #267: FILE: linux-user/syscall.c:905: +^Ireturn target_brk;$ ERROR: code indent should never use tabs #280: FILE: linux-user/syscall_defs.h:1810: +^Iabi_long^Ist_blocks;^I/* Number 512-byte blocks allocated. */$ ERROR: code indent should never use tabs #289: FILE: linux-user/syscall_defs.h:1819: +^Iabi_long^I__unused[3];$ ERROR: code indent should never use tabs #302: FILE: linux-user/x86_64/target_syscall.h:15: +^Iabi_ulong r11;$ ERROR: suspect code indent for conditional statements (24, 27) #313: FILE: slirp/ip_input.c:195: if
Re: [Qemu-devel] [PATCH 0/3] Allow hw/audio drivers to pass raw DB values to audio/ drivers
Sorry for the mess, I forgot to add the maintainer.
Gerd, please take a look at these patches.

On Fri, 21 Dec 2018 at 22:40, Yaroslav Isakov wrote:
>
> This patch series introduces the ability for virtual audio drivers to pass
> information about guest-chosen DB values to backend audio drivers.
>
> For now, supported virtual driver is hda-codec, and backend is pulseaudio, as
> they both support DB values.
>
> Without these patches, emulated Windows has a very short range of hearable
> sound, as range in the guest is much smaller than in Pulseaudio.
>
> Yaroslav Isakov (3):
>   Allow audio driver to pass DB value to underlying drivers
>   Pass raw DB values from hda-codec.c to audio driver
>   If raw DB values are known, use them in paaudio
>
>  audio/audio.c               | 15 +--
>  audio/audio.h               |  6 --
>  audio/mixeng.h              |  6 --
>  audio/paaudio.c             |  9 +++--
>  hw/audio/ac97.c             |  4 ++--
>  hw/audio/hda-codec-common.h |  2 +-
>  hw/audio/hda-codec.c        | 12 ++--
>  hw/audio/lm4549.c           |  2 +-
>  hw/audio/wm8750.c           | 18 --
>  hw/display/xlnx_dp.c        |  3 ++-
>  hw/usb/dev-audio.c          |  6 --
>  11 files changed, 60 insertions(+), 23 deletions(-)
>
> --
> 2.18.1
>
[Qemu-devel] [Bug 1809546] Re: Writing a byte to a pl011 SFR overwrites the whole SFR
Adding the link script. ** Attachment added: "linkscript.ld" https://bugs.launchpad.net/qemu/+bug/1809546/+attachment/5224337/+files/linkscript.ld -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1809546 Title: Writing a byte to a pl011 SFR overwrites the whole SFR Status in QEMU: New Bug description: The bug is present in QEMU 2.8.1 and, if my analysis is correct, also on master. I first noticed that a PL011 UART driver, which is fine on real hardware, fails to enable the RX interrupt in the IMSC register when running in QEMU. However, the problem only comes up if the code is compiled without optimizations. I think I've narrowed it down to a minimal example that will exhibit the problem if run as a bare-metal application. Given: pl011_addr: .word 0x10009000 The following snippet will be problematic: ldr r3, pl011_addr ldrb r2, [r3, #0x38]// IMSC mov r2, #0 orr r2, r2, #0x10 // R2 == 0x10 strb r2, [r3, #0x38]// Whole word reads correctly after this ldrb r2, [r3, #0x39] mov r2, #0 strb r2, [r3, #0x39]// Problem here! Overwrites offset 0x38 as well After the first strb instruction, which writes to 0x10009038, everything is fine. It can be seen in the QEMU monitor: (qemu) xp 0x10009038 10009038: 0x0010 After the second strb instruction, the write to 0x10009039 clears the entire word: (qemu) xp 0x10009038 10009038: 0x QEMU command-line, using the vexpress-a9 which has the PL011 at 0x10009000: qemu-system-arm -S -M vexpress-a9 -m 32M -no-reboot -nographic -monitor telnet:127.0.0.1:1234,server,nowait -kernel pl011-sfr.bin -gdb tcp::2159 -serial mon:stdio Compiling the original C code with optimizations makes the driver work. It compiles down to assembly that only does a single write: ldr r3, pl011_addr mov r2, #0x10 str r2, [r3, #0x38] Attached is the an assembly file, and linkscript, that shows the problem, and also includes the working code. 
I haven't debugged inside of QEMU itself but it seems to me that the problem is in pl011_write in pl011.c - the functions looks at which offset is being written, and then writes the entire SFR that offset falls under, which means that changing a single byte will change the whole SFR. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1809546/+subscriptions
[Qemu-devel] [Bug 1809546] [NEW] Writing a byte to a pl011 SFR overwrites the whole SFR
Public bug reported:

The bug is present in QEMU 2.8.1 and, if my analysis is correct, also on
master.

I first noticed that a PL011 UART driver, which is fine on real
hardware, fails to enable the RX interrupt in the IMSC register when
running in QEMU. However, the problem only comes up if the code is
compiled without optimizations. I think I've narrowed it down to a
minimal example that will exhibit the problem if run as a bare-metal
application.

Given:

pl011_addr:
    .word 0x10009000

The following snippet will be problematic:

    ldr r3, pl011_addr
    ldrb r2, [r3, #0x38]        // IMSC
    mov r2, #0
    orr r2, r2, #0x10           // R2 == 0x10
    strb r2, [r3, #0x38]        // Whole word reads correctly after this
    ldrb r2, [r3, #0x39]
    mov r2, #0
    strb r2, [r3, #0x39]        // Problem here! Overwrites offset 0x38 as well

After the first strb instruction, which writes to 0x10009038, everything
is fine. It can be seen in the QEMU monitor:

(qemu) xp 0x10009038
10009038: 0x00000010

After the second strb instruction, the write to 0x10009039 clears the
entire word:

(qemu) xp 0x10009038
10009038: 0x00000000

QEMU command-line, using the vexpress-a9 which has the PL011 at
0x10009000:

qemu-system-arm -S -M vexpress-a9 -m 32M -no-reboot -nographic -monitor telnet:127.0.0.1:1234,server,nowait -kernel pl011-sfr.bin -gdb tcp::2159 -serial mon:stdio

Compiling the original C code with optimizations makes the driver work.
It compiles down to assembly that only does a single write:

    ldr r3, pl011_addr
    mov r2, #0x10
    str r2, [r3, #0x38]

Attached is an assembly file, and linkscript, that shows the problem,
and also includes the working code.

I haven't debugged inside of QEMU itself but it seems to me that the
problem is in pl011_write in pl011.c - the function looks at which
offset is being written, and then writes the entire SFR that offset
falls under, which means that changing a single byte will change the
whole SFR.
** Affects: qemu
     Importance: Undecided
         Status: New

** Attachment added: "startup.s"
   https://bugs.launchpad.net/bugs/1809546/+attachment/5224336/+files/startup.s

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809546

Title:
  Writing a byte to a pl011 SFR overwrites the whole SFR

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809546/+subscriptions
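As an illustration of the behaviour the reporter expected, the sketch below merges a sub-word write into the 32-bit register that contains it. It is not QEMU's pl011_write, and it makes no claim about what the PL011 hardware specification requires for byte accesses. The function name merge_subword_write and its parameters are hypothetical, and a little-endian register layout is assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge a write of `size` bytes (1, 2 or 4) at byte address `addr` into the
 * 32-bit little-endian register value `old` that contains that address.
 * Bytes of the register not covered by the write are preserved.
 */
static uint32_t merge_subword_write(uint32_t old, uint32_t addr,
                                    uint32_t value, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    assert((addr & 3) + size <= 4);     /* write must not cross the register */

    if (size == 4) {
        return value;                   /* full-word write replaces everything */
    }

    unsigned shift = (addr & 3) * 8;    /* bit position of the first byte */
    uint32_t mask = ((1u << (size * 8)) - 1) << shift;

    return (old & ~mask) | ((value << shift) & mask);
}
```

Replaying the snippet from the report against this helper: a byte write of 0x10 at offset 0x38 gives 0x00000010, and a following byte write of 0 at offset 0x39 leaves that value unchanged instead of clearing the word.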
Re: [Qemu-devel] [PATCH v12 01/31] block: Use bdrv_refresh_filename() to pull
On Mon 17 Dec 2018 11:43:18 PM CET, Max Reitz wrote:
> @@ -3327,6 +,7 @@ static int img_rebase(int argc, char **argv)
>              qdict_put_bool(options, BDRV_OPT_FORCE_SHARE, true);
>          }
>
> +        bdrv_refresh_filename(bs);
>          overlay_filename = bs->exact_filename[0] ? bs->exact_filename
>                                                   : bs->filename;
>          out_real_path = g_malloc(PATH_MAX);
> --
> 2.19.2

I also doubt that these new hunks are necessary, but it doesn't hurt to
be consistent :)

Reviewed-by: Alberto Garcia

Berto
Re: [Qemu-devel] [PATCH PULL 00/31] RDMA queue
Hi Peter,

On 12/22/18 3:59 PM, Peter Maydell wrote:
> On Sat, 22 Dec 2018 at 09:50, Marcel Apfelbaum wrote:
>> The following changes since commit 891ff9f4a371da2dbd5244590eb35e8d803e18d8:
>>
>>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into staging (2018-12-21 15:49:59 +)
>>
>> are available in the Git repository at:
>>
>>   https://github.com/marcel-apf/qemu tags/rdma-pull-request
>>
>> for you to fetch changes up to f1e2e38ee0136b7710a2caa347049818afd57a1b:
>>
>>   pvrdma: check return value from pvrdma_idx_ring_has_ routines (2018-12-22 11:09:57 +0200)
>>
>> RDMA queue
>>  * Add support for RDMA MAD
>>  * Various fixes for the pvrdma backend
>>
> Applied, thanks.
>
> Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
> for any user-visible changes.

Done.

Thanks,
Marcel

> -- PMM
Re: [Qemu-devel] [PATCH PULL 00/31] RDMA queue
On Sat, 22 Dec 2018 at 09:50, Marcel Apfelbaum wrote:
>
> The following changes since commit 891ff9f4a371da2dbd5244590eb35e8d803e18d8:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into staging (2018-12-21 15:49:59 +)
>
> are available in the Git repository at:
>
>   https://github.com/marcel-apf/qemu tags/rdma-pull-request
>
> for you to fetch changes up to f1e2e38ee0136b7710a2caa347049818afd57a1b:
>
>   pvrdma: check return value from pvrdma_idx_ring_has_ routines (2018-12-22 11:09:57 +0200)
>
> RDMA queue
>  * Add support for RDMA MAD
>  * Various fixes for the pvrdma backend
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
for any user-visible changes.

-- PMM
Re: [Qemu-devel] [PULL v4 00/35] Misc patches for 2018-12-21
On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini wrote:
>
> On 21/12/18 22:09, Peter Maydell wrote:
> > I don't really understand what's going on here, or why
> > it only happens with this one system (my main x86-64
> > Linux Ubuntu 16.04.5 box) and not the various others I'm
> > running test builds on. But it does seem to be 100%
> > reliable with any of these pullreqs with the new test
> > driver in them :-(
>
> I'm afraid something in your setup is causing make's stdout to have
> O_NONBLOCK set.  Make doesn't use O_NONBLOCK at all, so it must be
> something above it.  I also checked Perl with strace and, at least here,
> it doesn't set O_NONBLOCK.
>
> So here are some ideas...  First, can you try applying something like
> this to reproduce?

OK, I'll give these a go, but it'll have to be in January now.

thanks
-- PMM
[Qemu-devel] [PATCH v2] qemu: avoid memory leak when removing a disk
vhost_dev_cleanup() memsets the vhost_dev structure to zero, which
sets dev.vqs to NULL. The g_free(dev.vqs) call that follows therefore
frees nothing, and the vqs allocation is leaked.

vqs cannot be released inside vhost_dev_cleanup() itself, because
vhost_net also calls that function and vhost_net's vqs is an array
rather than a separate allocation.

To solve this, save the vqs pointer first and free it after
vhost_dev_cleanup() has been called.

Signed-off-by: Jian Wang
---
 hw/block/vhost-user-blk.c | 7 +--
 hw/scsi/vhost-scsi.c      | 3 ++-
 hw/scsi/vhost-user-scsi.c | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940..c3af28f 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -250,6 +250,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
     VhostUserState *user;
+    struct vhost_virtqueue *vqs = NULL;
     int i, ret;
 
     if (!s->chardev.chr) {
@@ -288,6 +289,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
     s->dev.vq_index = 0;
     s->dev.backend_features = 0;
+    vqs = s->dev.vqs;
 
     vhost_dev_set_config_notifier(&s->dev, _ops);
 
@@ -314,7 +316,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 vhost_err:
     vhost_dev_cleanup(&s->dev);
 virtio_err:
-    g_free(s->dev.vqs);
+    g_free(vqs);
     virtio_cleanup(vdev);
 
     vhost_user_cleanup(user);
@@ -326,10 +328,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(dev);
+    struct vhost_virtqueue *vqs = s->dev.vqs;
 
     vhost_user_blk_set_status(vdev, 0);
     vhost_dev_cleanup(&s->dev);
-    g_free(s->dev.vqs);
+    g_free(vqs);
     virtio_cleanup(vdev);
 
     if (s->vhost_user) {
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 7f21b4f..61e2e57 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -215,6 +215,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostSCSICommon *vsc = VHOST_SCSI_COMMON(dev);
+    struct vhost_virtqueue *vqs = vsc->dev.vqs;
 
     migrate_del_blocker(vsc->migration_blocker);
     error_free(vsc->migration_blocker);
@@ -223,7 +224,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error **errp)
     vhost_scsi_set_status(vdev, 0);
 
     vhost_dev_cleanup(&vsc->dev);
-    g_free(vsc->dev.vqs);
+    g_free(vqs);
 
     virtio_scsi_common_unrealize(dev, errp);
 }
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 2e1ba4a..6728878 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -121,12 +121,13 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserSCSI *s = VHOST_USER_SCSI(dev);
     VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+    struct vhost_virtqueue *vqs = vsc->dev.vqs;
 
     /* This will stop the vhost backend. */
     vhost_user_scsi_set_status(vdev, 0);
 
     vhost_dev_cleanup(&vsc->dev);
-    g_free(vsc->dev.vqs);
+    g_free(vqs);
 
     virtio_scsi_common_unrealize(dev, errp);
 
--
1.8.3.1
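The ordering problem this patch describes can be reproduced outside QEMU. The sketch below is an illustration under stated assumptions, not the vhost code: struct demo_dev, demo_dev_cleanup and demo_dev_teardown are invented names, and plain free() stands in for g_free(). It shows why the pointer has to be saved before a cleanup routine that zeroes the whole structure.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct vhost_dev; only the fields needed here. */
struct demo_dev {
    int *vqs;   /* heap-allocated array owned by the caller */
    int nvqs;
};

/* Mimics the relevant side effect of vhost_dev_cleanup(): zero the struct. */
static void demo_dev_cleanup(struct demo_dev *dev)
{
    memset(dev, 0, sizeof(*dev));
}

/*
 * Tear down dev without leaking vqs. The pointer is saved first because
 * dev->vqs is NULL after cleanup, so free(dev->vqs) would release nothing.
 * Returns 1 if a real allocation was freed, 0 if there was none.
 */
static int demo_dev_teardown(struct demo_dev *dev)
{
    int *vqs = dev->vqs;            /* save before the memset */
    int had_alloc = (vqs != NULL);

    demo_dev_cleanup(dev);
    free(vqs);                      /* frees the saved allocation */
    return had_alloc;
}
```

Calling free(dev->vqs) after demo_dev_cleanup() instead would compile and run without complaint, which is why a leak of this kind is easy to miss.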
[Qemu-devel] [PATCH PULL 24/31] docs: Update pvrdma device documentation
From: Yuval Shaia Interface with the device is changed with the addition of support for MAD packets. Adjust documentation accordingly. While there fix a minor mistake which may lead to think that there is a relation between using RXE on host and the compatibility with bare-metal peers. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- docs/pvrdma.txt | 126 1 file changed, 107 insertions(+), 19 deletions(-) diff --git a/docs/pvrdma.txt b/docs/pvrdma.txt index 5599318159..5175251b47 100644 --- a/docs/pvrdma.txt +++ b/docs/pvrdma.txt @@ -9,8 +9,9 @@ It works with its Linux Kernel driver AS IS, no need for any special guest modifications. While it complies with the VMware device, it can also communicate with bare -metal RDMA-enabled machines and does not require an RDMA HCA in the host, it -can work with Soft-RoCE (rxe). +metal RDMA-enabled machines as peers. + +It does not require an RDMA HCA in the host, it can work with Soft-RoCE (rxe). It does not require the whole guest RAM to be pinned allowing memory over-commit and, even if not implemented yet, migration support will be @@ -78,29 +79,116 @@ the required RDMA libraries. 3. Usage + + +3.1 VM Memory settings +== Currently the device is working only with memory backed RAM and it must be mark as "shared": -m 1G \ -object memory-backend-ram,id=mb1,size=1G,share \ -numa node,memdev=mb1 \ -The pvrdma device is composed of two functions: - - Function 0 is a vmxnet Ethernet Device which is redundant in Guest - but is required to pass the ibdevice GID using its MAC. - Examples: - For an rxe backend using eth0 interface it will use its mac: - -device vmxnet3,addr=.0,multifunction=on,mac= - For an SRIOV VF, we take the Ethernet Interface exposed by it: - -device vmxnet3,multifunction=on,mac= - - Function 1 is the actual device: - -device pvrdma,addr=.1,backend-dev=,backend-gid-idx=,backend-port= - where the ibdevice can be rxe or RDMA VF (e.g. 
mlx5_4) - Note: Pay special attention that the GID at backend-gid-idx matches vmxnet's MAC. - The rules of conversion are part of the RoCE spec, but since manual conversion - is not required, spotting problems is not hard: -Example: GID: fe80::::7efe:90ff:fecb:743a - MAC: 7c:fe:90:cb:74:3a -Note the difference between the first byte of the MAC and the GID. + +3.2 MAD Multiplexer +=== +MAD Multiplexer is a service that exposes MAD-like interface for VMs in +order to overcome the limitation where only single entity can register with +MAD layer to send and receive RDMA-CM MAD packets. + +To build rdmacm-mux run +# make rdmacm-mux + +The application accepts 3 command line arguments and exposes a UNIX socket +to pass control and data to it. +-d rdma-device-name Name of RDMA device to register with +-s unix-socket-path Path to unix socket to listen (default /var/run/rdmacm-mux) +-p rdma-device-port Port number of RDMA device to register with (default 1) +The final UNIX socket file name is a concatenation of the 3 arguments so +for example for device mlx5_0 on port 2 this /var/run/rdmacm-mux-mlx5_0-2 +will be created. + +pvrdma requires this service. + +Please refer to contrib/rdmacm-mux for more details. + + +3.3 Service exposed by libvirt daemon += +The control over the RDMA device's GID table is done by updating the +device's Ethernet function addresses. +Usually the first GID entry is determined by the MAC address, the second by +the first IPv6 address and the third by the IPv4 address. Other entries can +be added by adding more IP addresses. The opposite is the same, i.e. +whenever an address is removed, the corresponding GID entry is removed. +The process is done by the network and RDMA stacks. Whenever an address is +added the ib_core driver is notified and calls the device driver add_gid +function which in turn update the device. 
+To support this in pvrdma device the device hooks into the create_bind and +destroy_bind HW commands triggered by pvrdma driver in guest. + +Whenever changed is made to the pvrdma port's GID table a special QMP +messages is sent to be processed by libvirt to update the address of the +backend Ethernet device. + +pvrdma requires that libvirt service will be up. + + +3.4 PCI devices settings + +RoCE device exposes two functions - an Ethernet and RDMA. +To support it, pvrdma device is composed of two PCI functions, an Ethernet +device of type vmxnet3 on PCI slot 0 and a PVRDMA device on PCI slot 1. The +Ethernet function can be used for other Ethernet purposes such as IP. + + +3.5 Device parameters += +- netdev: Specifies the Ethernet device function name on the host for + example enp175s0f0. For Soft-RoCE device (rxe) this would be the Ethernet + device used to create it. +-
[Qemu-devel] [PATCH PULL 25/31] pvrdma: release device resources in case of an error
From: Prasad J Pandit

If during pvrdma device initialisation an error occurs,
pvrdma_realize() does not release memory resources, leading to
memory leakage.

Reported-by: Li Qiang
Signed-off-by: Prasad J Pandit
Message-Id: <20181212175817.815-1-ppan...@redhat.com>
Reviewed-by: Yuval Shaia
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 23dc9926e3..64de16fb52 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -573,7 +573,7 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
 
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
-    int rc;
+    int rc = 0;
     PVRDMADev *dev = PVRDMA_DEV(pdev);
     Object *memdev_root;
     bool ram_shared = false;
@@ -649,6 +649,7 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 
 out:
     if (rc) {
+        pvrdma_fini(pdev);
         error_append_hint(errp, "Device fail to load\n");
     }
 }
--
2.17.1
[Qemu-devel] [PATCH PULL 23/31] hw/rdma: Do not call rdma_backend_del_gid on an empty gid
From: Yuval Shaia

When the device goes down, the function fini_ports loops over all
entries in the GID table regardless of whether an entry is valid.

If an entry is not valid, skip any further processing in the backend
device.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/rdma_rm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index ca127c8c26..f5b1295890 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -555,6 +555,10 @@ int rdma_rm_del_gid(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev,
 {
     int rc;
 
+    if (!dev_res->port.gid_tbl[gid_idx].gid.global.interface_id) {
+        return 0;
+    }
+
     rc = rdma_backend_del_gid(backend_dev, ifname,
                               &dev_res->port.gid_tbl[gid_idx].gid);
     if (rc) {
--
2.17.1
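The early return added by this patch can be pictured with a small standalone model. This is a sketch with invented names (struct demo_gid, demo_del_gid, demo_fini, demo_backend_calls), not the rdma_rm code; it assumes, as the patch does, that a zero interface_id marks a slot that was never populated.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_GIDS 4

/* Simplified GID entry with the two 64-bit halves of a GID. */
struct demo_gid {
    uint64_t subnet_prefix;
    uint64_t interface_id;
};

static int demo_backend_calls;  /* counts how often the "backend" is reached */

/* Stand-in for the backend delete call; always succeeds. */
static int demo_backend_del_gid(const struct demo_gid *gid)
{
    (void)gid;
    demo_backend_calls++;
    return 0;
}

/* Delete one entry, returning early for a slot that was never populated. */
static int demo_del_gid(struct demo_gid *tbl, int idx)
{
    if (!tbl[idx].interface_id) {
        return 0;               /* empty slot: nothing to tell the backend */
    }
    if (demo_backend_del_gid(&tbl[idx])) {
        return -1;
    }
    memset(&tbl[idx], 0, sizeof(tbl[idx]));
    return 0;
}

/* Same shape as fini_ports(): walk every slot, valid or not. */
static void demo_fini(struct demo_gid *tbl)
{
    for (int i = 0; i < DEMO_MAX_GIDS; i++) {
        demo_del_gid(tbl, i);
    }
}
```

With one populated slot out of four, demo_fini() reaches the backend once instead of four times.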
[Qemu-devel] [PATCH PULL 31/31] pvrdma: check return value from pvrdma_idx_ring_has_ routines
From: Prasad J Pandit

The pvrdma_idx_ring_has_[data/space] routines can also return the
invalid index PVRDMA_INVALID_IDX (= -1) if the ring has no data/space.
Check the return value from these routines to avoid plausible
infinite loops.

Reported-by: Li Qiang
Signed-off-by: Prasad J Pandit
Reviewed-by: Yuval Shaia
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_dev_ring.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_dev_ring.c b/hw/rdma/vmw/pvrdma_dev_ring.c
index 01247fc041..e8e5b502f6 100644
--- a/hw/rdma/vmw/pvrdma_dev_ring.c
+++ b/hw/rdma/vmw/pvrdma_dev_ring.c
@@ -73,23 +73,16 @@ out:
 
 void *pvrdma_ring_next_elem_read(PvrdmaRing *ring)
 {
+    int e;
     unsigned int idx = 0, offset;
 
-    /*
-    pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
-           ring->ring_state->cons_head);
-    */
-
-    if (!pvrdma_idx_ring_has_data(ring->ring_state, ring->max_elems, &idx)) {
+    e = pvrdma_idx_ring_has_data(ring->ring_state, ring->max_elems, &idx);
+    if (e <= 0) {
         pr_dbg("No more data in ring\n");
         return NULL;
     }
 
     offset = idx * ring->elem_sz;
-    /*
-    pr_dbg("idx=%d\n", idx);
-    pr_dbg("offset=%d\n", offset);
-    */
     return ring->pages[offset / TARGET_PAGE_SIZE] + (offset % TARGET_PAGE_SIZE);
 }
 
@@ -105,20 +98,20 @@ void pvrdma_ring_read_inc(PvrdmaRing *ring)
 
 void *pvrdma_ring_next_elem_write(PvrdmaRing *ring)
 {
-    unsigned int idx, offset, tail;
+    int idx;
+    unsigned int offset, tail;
 
-    /*
-    pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
-           ring->ring_state->cons_head);
-    */
-
-    if (!pvrdma_idx_ring_has_space(ring->ring_state, ring->max_elems, &tail)) {
+    idx = pvrdma_idx_ring_has_space(ring->ring_state, ring->max_elems, &tail);
+    if (idx <= 0) {
         pr_dbg("CQ is full\n");
         return NULL;
     }
 
     idx = pvrdma_idx(&ring->ring_state->prod_tail, ring->max_elems);
-    /* TODO: tail == idx */
+    if (idx < 0 || tail != idx) {
+        pr_dbg("invalid idx\n");
+        return NULL;
+    }
 
     offset = idx * ring->elem_sz;
     return ring->pages[offset / TARGET_PAGE_SIZE] + (offset % TARGET_PAGE_SIZE);
--
2.17.1
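The reason a plain "!ret" test is not enough for a routine that can return -1 is easy to show in isolation. The code below is a sketch only: demo_ring_has_data is a hypothetical probe loosely modeled on the description in this patch (1 for data, 0 for none, -1 for an invalid index), and its validity rule is an assumption made for the example.

```c
#define DEMO_INVALID_IDX (-1)

/*
 * Hypothetical three-way probe: 1 if data is available, 0 if the ring is
 * empty, DEMO_INVALID_IDX if either index is out of the assumed range.
 */
static int demo_ring_has_data(unsigned int prod, unsigned int cons,
                              unsigned int max_elems)
{
    if (prod >= 2 * max_elems || cons >= 2 * max_elems) {
        return DEMO_INVALID_IDX;
    }
    return prod != cons;
}

/* Old-style check: -1 is non-zero, so an invalid index passes as "data". */
static int demo_can_read_buggy(unsigned int prod, unsigned int cons,
                               unsigned int max_elems)
{
    return !demo_ring_has_data(prod, cons, max_elems) ? 0 : 1;
}

/* Check in the style of this patch: only a positive result is readable. */
static int demo_can_read_fixed(unsigned int prod, unsigned int cons,
                               unsigned int max_elems)
{
    return demo_ring_has_data(prod, cons, max_elems) > 0;
}
```

A caller looping "while readable" on the buggy form never sees the loop end for a corrupted index, which matches the infinite loop the commit message warns about.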
[Qemu-devel] [PATCH PULL 28/31] pvrdma: check number of pages when creating rings
From: Prasad J Pandit

When creating CQ/QP rings, an object can have up to
PVRDMA_MAX_FAST_REG_PAGES 8 pages. Check 'npages' parameter
to avoid excessive memory allocation or a null dereference.

Reported-by: Li Qiang
Signed-off-by: Prasad J Pandit
Reviewed-by: Yuval Shaia
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_cmd.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 3b94545761..f236ac4795 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -259,6 +259,11 @@ static int create_cq_ring(PCIDevice *pci_dev , PvrdmaRing **ring,
     int rc = -EINVAL;
     char ring_name[MAX_RING_NAME_SZ];
 
+    if (!nchunks || nchunks > PVRDMA_MAX_FAST_REG_PAGES) {
+        pr_dbg("invalid nchunks: %d\n", nchunks);
+        return rc;
+    }
+
     pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)pdir_dma);
     dir = rdma_pci_dma_map(pci_dev, pdir_dma, TARGET_PAGE_SIZE);
     if (!dir) {
@@ -372,6 +377,12 @@ static int create_qp_rings(PCIDevice *pci_dev, uint64_t pdir_dma,
     char ring_name[MAX_RING_NAME_SZ];
     uint32_t wqe_sz;
 
+    if (!spages || spages > PVRDMA_MAX_FAST_REG_PAGES
+        || !rpages || rpages > PVRDMA_MAX_FAST_REG_PAGES) {
+        pr_dbg("invalid pages: %d, %d\n", spages, rpages);
+        return rc;
+    }
+
     pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)pdir_dma);
     dir = rdma_pci_dma_map(pci_dev, pdir_dma, TARGET_PAGE_SIZE);
     if (!dir) {
--
2.17.1
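The check added by this patch is a range test on a guest-controlled count before that count is used to size an allocation. A minimal sketch of the same idea follows; DEMO_MAX_PAGES, demo_check_npages and demo_alloc_page_table are names invented for the example, and the limit of 8 is an assumption taken from the wording of the commit message rather than from the QEMU headers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PAGES 8    /* assumed limit, for this sketch only */

/* Reject a zero count (nothing to map) and anything above the limit. */
static int demo_check_npages(uint32_t npages)
{
    if (!npages || npages > DEMO_MAX_PAGES) {
        return -EINVAL;
    }
    return 0;
}

/*
 * Allocate a table of page pointers only after the count was validated,
 * so a hostile count can neither request a huge allocation nor produce
 * a zero-length table that is dereferenced later.
 */
static void **demo_alloc_page_table(uint32_t npages)
{
    if (demo_check_npages(npages)) {
        return NULL;
    }
    return calloc(npages, sizeof(void *));
}
```

Both failure modes named in the commit message map onto the two halves of the condition: the upper bound covers excessive allocation, the zero test covers the later NULL dereference.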
[Qemu-devel] [PATCH PULL 27/31] pvrdma: add uar_read routine
From: Prasad J Pandit

Define skeleton 'uar_read' routine. Avoid NULL dereference.

Reported-by: Li Qiang
Signed-off-by: Prasad J Pandit
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 64de16fb52..838ad8a949 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -448,6 +448,11 @@ static const MemoryRegionOps regs_ops = {
     },
 };
 
+static uint64_t uar_read(void *opaque, hwaddr addr, unsigned size)
+{
+    return 0x;
+}
+
 static void uar_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
     PVRDMADev *dev = opaque;
@@ -489,6 +494,7 @@ static void uar_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 }
 
 static const MemoryRegionOps uar_ops = {
+    .read = uar_read,
     .write = uar_write,
     .endianness = DEVICE_LITTLE_ENDIAN,
     .impl = {
--
2.17.1
[Qemu-devel] [PATCH PULL 30/31] rdma: remove unused VENDOR_ERR_NO_SGE macro
From: Prasad J Pandit

With commit 4481985c (rdma: check num_sge does not exceed MAX_SGE)
macro VENDOR_ERR_NO_SGE is no longer in use - delete it.

Signed-off-by: Prasad J Pandit
Reviewed-by: Yuval Shaia
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index bd4710d16f..c28bfbd44d 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -37,12 +37,11 @@
 #define VENDOR_ERR_TOO_MANY_SGES    0x202
 #define VENDOR_ERR_NOMEM            0x203
 #define VENDOR_ERR_QP0              0x204
-#define VENDOR_ERR_NO_SGE           0x205
+#define VENDOR_ERR_INV_NUM_SGE      0x205
 #define VENDOR_ERR_MAD_SEND         0x206
 #define VENDOR_ERR_INVLKEY          0x207
 #define VENDOR_ERR_MR_SMALL         0x208
 #define VENDOR_ERR_INV_MAD_BUFF     0x209
-#define VENDOR_ERR_INV_NUM_SGE      0x210
 
 #define THR_NAME_LEN 16
 #define THR_POLL_TO  5000
--
2.17.1
[Qemu-devel] [PATCH PULL 26/31] rdma: check num_sge does not exceed MAX_SGE
From: Prasad J Pandit

The rdma back-end has a scatter/gather array, ibv_sge[MAX_SGE], with
MAX_SGE = 4 elements. A guest could send a 'PvrdmaSqWqe' ring element
with 'num_sge' set to > MAX_SGE, which may lead to an out-of-bounds
(OOB) access. Add a check to avoid it.

Reported-by: Saar Amar
Signed-off-by: Prasad J Pandit
Reviewed-by: Yuval Shaia
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index ae1e4dcb29..bd4710d16f 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -476,9 +476,9 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
     }
 
     pr_dbg("num_sge=%d\n", num_sge);
-    if (!num_sge) {
-        pr_dbg("num_sge=0\n");
-        complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx);
+    if (!num_sge || num_sge > MAX_SGE) {
+        pr_dbg("invalid num_sge=%d\n", num_sge);
+        complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_INV_NUM_SGE, ctx);
         return;
     }
 
@@ -603,9 +603,9 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
     }
 
     pr_dbg("num_sge=%d\n", num_sge);
-    if (!num_sge) {
-        pr_dbg("num_sge=0\n");
-        complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx);
+    if (!num_sge || num_sge > MAX_SGE) {
+        pr_dbg("invalid num_sge=%d\n", num_sge);
+        complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_INV_NUM_SGE, ctx);
         return;
     }
 
--
2.17.1
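The out-of-bounds access described in this patch comes from copying a guest-supplied number of entries into a fixed-size array. The sketch below shows the guarded copy in isolation; struct demo_sge, DEMO_MAX_SGE and demo_build_sge are illustrative names, not the rdma_backend code, and the error value is simplified to -1.

```c
#include <stdint.h>

#define DEMO_MAX_SGE 4  /* capacity of the fixed scatter/gather array */

/* Simplified scatter/gather element. */
struct demo_sge {
    uint64_t addr;
    uint32_t length;
};

/*
 * Copy num_sge guest-supplied elements into a fixed-size array.
 * Returns 0 on success, -1 if the count is zero or exceeds the capacity,
 * in which case dst is left untouched.
 */
static int demo_build_sge(struct demo_sge dst[DEMO_MAX_SGE],
                          const struct demo_sge *src, uint32_t num_sge)
{
    if (!num_sge || num_sge > DEMO_MAX_SGE) {
        return -1;      /* without this check the loop overruns dst */
    }
    for (uint32_t i = 0; i < num_sge; i++) {
        dst[i] = src[i];
    }
    return 0;
}
```

Folding the zero case and the upper bound into one condition is also what lets the series reuse a single error code for both, as the 30/31 patch does.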
[Qemu-devel] [PATCH PULL 22/31] hw/rdma: Do not use bitmap_zero_extend to free bitmap
From: Yuval Shaia

bitmap_zero_extend is designed to work for extending, not for
shrinking. Use g_free instead.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/rdma_rm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index b7d4ebe972..ca127c8c26 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -43,7 +43,7 @@ static inline void res_tbl_free(RdmaRmResTbl *tbl)
 {
     qemu_mutex_destroy(&tbl->lock);
     g_free(tbl->tbl);
-    bitmap_zero_extend(tbl->bitmap, tbl->tbl_sz, 0);
+    g_free(tbl->bitmap);
 }
 
 static inline void *res_tbl_get(RdmaRmResTbl *tbl, uint32_t handle)
--
2.17.1
[Qemu-devel] [PATCH PULL 21/31] hw/pvrdma: Clean device's resource when system is shutdown
From: Yuval Shaia

In order to clean up some external resources such as GIDs, QPs, etc.,
register to receive a notification when the VM is shut down.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma.h      | 2 ++
 hw/rdma/vmw/pvrdma_main.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma.h b/hw/rdma/vmw/pvrdma.h
index 10a3c4fb7c..ffae36986e 100644
--- a/hw/rdma/vmw/pvrdma.h
+++ b/hw/rdma/vmw/pvrdma.h
@@ -17,6 +17,7 @@
 #define PVRDMA_PVRDMA_H
 
 #include "qemu/units.h"
+#include "qemu/notify.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/msix.h"
 #include "chardev/char-fe.h"
@@ -87,6 +88,7 @@ typedef struct PVRDMADev {
     RdmaDeviceResources rdma_dev_res;
     CharBackend mad_chr;
     VMXNET3State *func0;
+    Notifier shutdown_notifier;
 } PVRDMADev;
 #define PVRDMA_DEV(dev) OBJECT_CHECK(PVRDMADev, (dev), PVRDMA_HW_NAME)
 
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 150404dfa6..23dc9926e3 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -24,6 +24,7 @@
 #include "hw/qdev-properties.h"
 #include "cpu.h"
 #include "trace.h"
+#include "sysemu/sysemu.h"
 
 #include "../rdma_rm.h"
 #include "../rdma_backend.h"
@@ -334,6 +335,9 @@ static void pvrdma_fini(PCIDevice *pdev)
     if (msix_enabled(pdev)) {
         uninit_msix(pdev, RDMA_MAX_INTRS);
     }
+
+    pr_dbg("Device %s %x.%x is down\n", pdev->name, PCI_SLOT(pdev->devfn),
+           PCI_FUNC(pdev->devfn));
 }
 
 static void pvrdma_stop(PVRDMADev *dev)
@@ -559,6 +563,14 @@ static int pvrdma_check_ram_shared(Object *obj, void *opaque)
     return 0;
 }
 
+static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
+{
+    PVRDMADev *dev = container_of(n, PVRDMADev, shutdown_notifier);
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+    pvrdma_fini(pci_dev);
+}
+
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
     int rc;
@@ -632,6 +644,9 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
         goto out;
     }
 
+    dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
+    qemu_register_shutdown_notifier(&dev->shutdown_notifier);
+
 out:
     if (rc) {
         error_append_hint(errp, "Device fail to load\n");
--
2.17.1
[Qemu-devel] [PATCH PULL 20/31] vl: Introduce shutdown_notifiers
From: Yuval Shaia

Introduce a notifier list that signals the shutdown event, informing
listeners that the system is shutting down.

This allows devices and other components to run any cleanup code
needed before the VM is shut down.

Signed-off-by: Yuval Shaia
Reviewed-by: Cornelia Huck
Signed-off-by: Marcel Apfelbaum
---
 include/sysemu/sysemu.h | 1 +
 vl.c                    | 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c8efdeb376..e0d15da937 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -62,6 +62,7 @@ void qemu_register_wakeup_support(void);
 void qemu_system_shutdown_request(ShutdownCause reason);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
+void qemu_register_shutdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
 void qemu_system_vmstop_request(RunState reason);
 void qemu_system_vmstop_request_prepare(void);
diff --git a/vl.c b/vl.c
index 46ebf813b3..8353d3c718 100644
--- a/vl.c
+++ b/vl.c
@@ -1577,6 +1577,8 @@ static NotifierList suspend_notifiers =
     NOTIFIER_LIST_INITIALIZER(suspend_notifiers);
 static NotifierList wakeup_notifiers =
     NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
+static NotifierList shutdown_notifiers =
+    NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
 ShutdownCause qemu_shutdown_requested_get(void)
@@ -1828,6 +1830,12 @@ static void qemu_system_powerdown(void)
     notifier_list_notify(&powerdown_notifiers, NULL);
 }
 
+static void qemu_system_shutdown(ShutdownCause cause)
+{
+    qapi_event_send_shutdown(shutdown_caused_by_guest(cause), cause);
+    notifier_list_notify(&shutdown_notifiers, &cause);
+}
+
 void qemu_system_powerdown_request(void)
 {
     trace_qemu_system_powerdown_request();
@@ -1840,6 +1848,11 @@ void qemu_register_powerdown_notifier(Notifier *notifier)
     notifier_list_add(&powerdown_notifiers, notifier);
 }
 
+void qemu_register_shutdown_notifier(Notifier *notifier)
+{
+    notifier_list_add(&shutdown_notifiers, notifier);
+}
+
 void qemu_system_debug_request(void)
 {
     debug_requested = 1;
@@ -1867,7 +1880,7 @@ static bool main_loop_should_exit(void)
     request = qemu_shutdown_requested();
     if (request) {
         qemu_kill_report();
-        qapi_event_send_shutdown(shutdown_caused_by_guest(request), request);
+        qemu_system_shutdown(request);
         if (no_shutdown) {
             vm_stop(RUN_STATE_SHUTDOWN);
         } else {
--
2.17.1
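The register-then-notify pattern this patch adds, together with the container_of lookup that the pvrdma patch in this series uses in its callback, can be modeled in a few lines of plain C. What follows is a simplified sketch, not QEMU's implementation: the real Notifier and NotifierList live in qemu/notify.h and are built on QLIST, whereas this version uses a hand-rolled singly linked list, and every demo_ name is invented for the example.

```c
#include <stddef.h>

/* Simplified notifier: a callback plus a link to the next entry. */
struct demo_notifier {
    void (*notify)(struct demo_notifier *n, void *data);
    struct demo_notifier *next;
};

struct demo_notifier_list {
    struct demo_notifier *head;
};

/* Register a notifier by pushing it onto the front of the list. */
static void demo_notifier_list_add(struct demo_notifier_list *list,
                                   struct demo_notifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Invoke every registered callback with the same event data. */
static void demo_notifier_list_notify(struct demo_notifier_list *list,
                                      void *data)
{
    for (struct demo_notifier *n = list->head; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Recover the enclosing object from a pointer to one of its members. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A device that embeds its notifier, as PVRDMADev does in this series. */
struct demo_device {
    int resources_held;
    struct demo_notifier shutdown_notifier;
};

/* Shutdown callback: find the owning device and release its resources. */
static void demo_device_shutdown(struct demo_notifier *n, void *data)
{
    struct demo_device *dev =
        DEMO_CONTAINER_OF(n, struct demo_device, shutdown_notifier);

    (void)data;     /* would carry the shutdown cause */
    dev->resources_held = 0;
}
```

Embedding the notifier in the device structure means no separate lookup table is needed: the callback receives only the notifier pointer and derives the device from it.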
[Qemu-devel] [PATCH PULL 29/31] pvrdma: release ring object in case of an error
From: Prasad J Pandit create_cq and create_qp routines allocate ring object, but it's not released in case of an error, leading to memory leakage. Reported-by: Li Qiang Signed-off-by: Prasad J Pandit Reviewed-by: Yuval Shaia Signed-off-by: Marcel Apfelbaum --- hw/rdma/vmw/pvrdma_cmd.c | 37 ++--- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c index f236ac4795..89920887bf 100644 --- a/hw/rdma/vmw/pvrdma_cmd.c +++ b/hw/rdma/vmw/pvrdma_cmd.c @@ -313,6 +313,14 @@ out: return rc; } +static void destroy_cq_ring(PvrdmaRing *ring) +{ +pvrdma_ring_free(ring); +/* ring_state was in slot 1, not 0 so need to jump back */ +rdma_pci_dma_unmap(ring->dev, --ring->ring_state, TARGET_PAGE_SIZE); +g_free(ring); +} + static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req, union pvrdma_cmd_resp *rsp) { @@ -335,6 +343,10 @@ static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req, rc = rdma_rm_alloc_cq(>rdma_dev_res, >backend_dev, cmd->cqe, >cq_handle, ring); +if (rc) { +destroy_cq_ring(ring); +} + resp->cqe = cmd->cqe; return rc; @@ -356,10 +368,7 @@ static int destroy_cq(PVRDMADev *dev, union pvrdma_cmd_req *req, } ring = (PvrdmaRing *)cq->opaque; -pvrdma_ring_free(ring); -/* ring_state was in slot 1, not 0 so need to jump back */ -rdma_pci_dma_unmap(PCI_DEVICE(dev), --ring->ring_state, TARGET_PAGE_SIZE); -g_free(ring); +destroy_cq_ring(ring); rdma_rm_dealloc_cq(>rdma_dev_res, cmd->cq_handle); @@ -457,6 +466,17 @@ out: return rc; } +static void destroy_qp_rings(PvrdmaRing *ring) +{ +pr_dbg("sring=%p\n", [0]); +pvrdma_ring_free([0]); +pr_dbg("rring=%p\n", [1]); +pvrdma_ring_free([1]); + +rdma_pci_dma_unmap(ring->dev, ring->ring_state, TARGET_PAGE_SIZE); +g_free(ring); +} + static int create_qp(PVRDMADev *dev, union pvrdma_cmd_req *req, union pvrdma_cmd_resp *rsp) { @@ -486,6 +506,7 @@ static int create_qp(PVRDMADev *dev, union pvrdma_cmd_req *req, cmd->max_recv_sge, cmd->recv_cq_handle, rings, >qpn); if 
(rc) { +destroy_qp_rings(rings); return rc; } @@ -558,13 +579,7 @@ static int destroy_qp(PVRDMADev *dev, union pvrdma_cmd_req *req, rdma_rm_dealloc_qp(>rdma_dev_res, cmd->qp_handle); ring = (PvrdmaRing *)qp->opaque; -pr_dbg("sring=%p\n", [0]); -pvrdma_ring_free([0]); -pr_dbg("rring=%p\n", [1]); -pvrdma_ring_free([1]); - -rdma_pci_dma_unmap(PCI_DEVICE(dev), ring->ring_state, TARGET_PAGE_SIZE); -g_free(ring); +destroy_qp_rings(ring); return 0; } -- 2.17.1
[Qemu-devel] [PATCH PULL 19/31] hw/rdma: Remove unneeded code that handles more than one port
From: Yuval Shaia Device supports only one port, let's remove a dead code that handles more than one port. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/rdma_rm.c | 34 -- hw/rdma/rdma_rm.h | 2 +- hw/rdma/rdma_rm_defs.h | 4 ++-- 3 files changed, 19 insertions(+), 21 deletions(-) diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c index 250254561c..b7d4ebe972 100644 --- a/hw/rdma/rdma_rm.c +++ b/hw/rdma/rdma_rm.c @@ -545,7 +545,7 @@ int rdma_rm_add_gid(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev, return -EINVAL; } -memcpy(_res->ports[0].gid_tbl[gid_idx].gid, gid, sizeof(*gid)); +memcpy(_res->port.gid_tbl[gid_idx].gid, gid, sizeof(*gid)); return 0; } @@ -556,15 +556,15 @@ int rdma_rm_del_gid(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev, int rc; rc = rdma_backend_del_gid(backend_dev, ifname, - _res->ports[0].gid_tbl[gid_idx].gid); + _res->port.gid_tbl[gid_idx].gid); if (rc) { pr_dbg("Fail to delete gid\n"); return -EINVAL; } -memset(dev_res->ports[0].gid_tbl[gid_idx].gid.raw, 0, - sizeof(dev_res->ports[0].gid_tbl[gid_idx].gid)); -dev_res->ports[0].gid_tbl[gid_idx].backend_gid_index = -1; +memset(dev_res->port.gid_tbl[gid_idx].gid.raw, 0, + sizeof(dev_res->port.gid_tbl[gid_idx].gid)); +dev_res->port.gid_tbl[gid_idx].backend_gid_index = -1; return 0; } @@ -577,16 +577,16 @@ int rdma_rm_get_backend_gid_index(RdmaDeviceResources *dev_res, return -EINVAL; } -if (unlikely(dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index == -1)) { -dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index = +if (unlikely(dev_res->port.gid_tbl[sgid_idx].backend_gid_index == -1)) { +dev_res->port.gid_tbl[sgid_idx].backend_gid_index = rdma_backend_get_gid_index(backend_dev, - _res->ports[0].gid_tbl[sgid_idx].gid); + _res->port.gid_tbl[sgid_idx].gid); } pr_dbg("backend_gid_index=%d\n", - dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index); + dev_res->port.gid_tbl[sgid_idx].backend_gid_index); -return 
dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index; +return dev_res->port.gid_tbl[sgid_idx].backend_gid_index; } static void destroy_qp_hash_key(gpointer data) @@ -596,15 +596,13 @@ static void destroy_qp_hash_key(gpointer data) static void init_ports(RdmaDeviceResources *dev_res) { -int i, j; +int i; -memset(dev_res->ports, 0, sizeof(dev_res->ports)); +memset(_res->port, 0, sizeof(dev_res->port)); -for (i = 0; i < MAX_PORTS; i++) { -dev_res->ports[i].state = IBV_PORT_DOWN; -for (j = 0; j < MAX_PORT_GIDS; j++) { -dev_res->ports[i].gid_tbl[j].backend_gid_index = -1; -} +dev_res->port.state = IBV_PORT_DOWN; +for (i = 0; i < MAX_PORT_GIDS; i++) { +dev_res->port.gid_tbl[i].backend_gid_index = -1; } } @@ -613,7 +611,7 @@ static void fini_ports(RdmaDeviceResources *dev_res, { int i; -dev_res->ports[0].state = IBV_PORT_DOWN; +dev_res->port.state = IBV_PORT_DOWN; for (i = 0; i < MAX_PORT_GIDS; i++) { rdma_rm_del_gid(dev_res, backend_dev, ifname, i); } diff --git a/hw/rdma/rdma_rm.h b/hw/rdma/rdma_rm.h index a7169b4e89..3c602c04c0 100644 --- a/hw/rdma/rdma_rm.h +++ b/hw/rdma/rdma_rm.h @@ -79,7 +79,7 @@ int rdma_rm_get_backend_gid_index(RdmaDeviceResources *dev_res, static inline union ibv_gid *rdma_rm_get_gid(RdmaDeviceResources *dev_res, int sgid_idx) { -return _res->ports[0].gid_tbl[sgid_idx].gid; +return _res->port.gid_tbl[sgid_idx].gid; } #endif diff --git a/hw/rdma/rdma_rm_defs.h b/hw/rdma/rdma_rm_defs.h index 7b3435f991..0ba61d1838 100644 --- a/hw/rdma/rdma_rm_defs.h +++ b/hw/rdma/rdma_rm_defs.h @@ -18,7 +18,7 @@ #include "rdma_backend_defs.h" -#define MAX_PORTS 1 +#define MAX_PORTS 1 /* Do not change - we support only one port */ #define MAX_PORT_GIDS 255 #define MAX_GIDS MAX_PORT_GIDS #define MAX_PORT_PKEYS1 @@ -97,7 +97,7 @@ typedef struct RdmaRmPort { } RdmaRmPort; typedef struct RdmaDeviceResources { -RdmaRmPort ports[MAX_PORTS]; +RdmaRmPort port; RdmaRmResTbl pd_tbl; RdmaRmResTbl mr_tbl; RdmaRmResTbl uc_tbl; -- 2.17.1
[Qemu-devel] [PATCH PULL 18/31] hw/pvrdma: Fill error code in command's response
From: Yuval Shaia Driver checks error code let's set it. In addition, for code simplification purposes, set response's fields ack, response and err outside of the scope of command handlers. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/vmw/pvrdma_cmd.c | 197 ++- 1 file changed, 90 insertions(+), 107 deletions(-) diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c index 0d3c818c20..3b94545761 100644 --- a/hw/rdma/vmw/pvrdma_cmd.c +++ b/hw/rdma/vmw/pvrdma_cmd.c @@ -128,6 +128,9 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req *req, struct pvrdma_port_attr attrs = {0}; pr_dbg("port=%d\n", cmd->port_num); +if (cmd->port_num > MAX_PORTS) { +return -EINVAL; +} if (rdma_backend_query_port(>backend_dev, (struct ibv_port_attr *))) { @@ -135,9 +138,6 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req *req, } memset(resp, 0, sizeof(*resp)); -resp->hdr.response = cmd->hdr.response; -resp->hdr.ack = PVRDMA_CMD_QUERY_PORT_RESP; -resp->hdr.err = 0; resp->attrs.state = dev->func0->device_active ? attrs.state : PVRDMA_PORT_DOWN; @@ -160,12 +160,16 @@ static int query_pkey(PVRDMADev *dev, union pvrdma_cmd_req *req, struct pvrdma_cmd_query_pkey_resp *resp = >query_pkey_resp; pr_dbg("port=%d\n", cmd->port_num); +if (cmd->port_num > MAX_PORTS) { +return -EINVAL; +} + pr_dbg("index=%d\n", cmd->index); +if (cmd->index > MAX_PKEYS) { +return -EINVAL; +} memset(resp, 0, sizeof(*resp)); -resp->hdr.response = cmd->hdr.response; -resp->hdr.ack = PVRDMA_CMD_QUERY_PKEY_RESP; -resp->hdr.err = 0; resp->pkey = PVRDMA_PKEY; pr_dbg("pkey=0x%x\n", resp->pkey); @@ -178,17 +182,15 @@ static int create_pd(PVRDMADev *dev, union pvrdma_cmd_req *req, { struct pvrdma_cmd_create_pd *cmd = >create_pd; struct pvrdma_cmd_create_pd_resp *resp = >create_pd_resp; +int rc; pr_dbg("context=0x%x\n", cmd->ctx_handle ? 
cmd->ctx_handle : 0); memset(resp, 0, sizeof(*resp)); -resp->hdr.response = cmd->hdr.response; -resp->hdr.ack = PVRDMA_CMD_CREATE_PD_RESP; -resp->hdr.err = rdma_rm_alloc_pd(>rdma_dev_res, >backend_dev, - >pd_handle, cmd->ctx_handle); +rc = rdma_rm_alloc_pd(>rdma_dev_res, >backend_dev, + >pd_handle, cmd->ctx_handle); -pr_dbg("ret=%d\n", resp->hdr.err); -return resp->hdr.err; +return rc; } static int destroy_pd(PVRDMADev *dev, union pvrdma_cmd_req *req, @@ -210,10 +212,9 @@ static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req *req, struct pvrdma_cmd_create_mr_resp *resp = >create_mr_resp; PCIDevice *pci_dev = PCI_DEVICE(dev); void *host_virt = NULL; +int rc = 0; memset(resp, 0, sizeof(*resp)); -resp->hdr.response = cmd->hdr.response; -resp->hdr.ack = PVRDMA_CMD_CREATE_MR_RESP; pr_dbg("pd_handle=%d\n", cmd->pd_handle); pr_dbg("access_flags=0x%x\n", cmd->access_flags); @@ -224,22 +225,18 @@ static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req *req, cmd->length); if (!host_virt) { pr_dbg("Failed to map to pdir\n"); -resp->hdr.err = -EINVAL; -goto out; +return -EINVAL; } } -resp->hdr.err = rdma_rm_alloc_mr(>rdma_dev_res, cmd->pd_handle, - cmd->start, cmd->length, host_virt, - cmd->access_flags, >mr_handle, - >lkey, >rkey); -if (resp->hdr.err && host_virt) { +rc = rdma_rm_alloc_mr(>rdma_dev_res, cmd->pd_handle, cmd->start, + cmd->length, host_virt, cmd->access_flags, + >mr_handle, >lkey, >rkey); +if (rc && host_virt) { munmap(host_virt, cmd->length); } -out: -pr_dbg("ret=%d\n", resp->hdr.err); -return resp->hdr.err; +return rc; } static int destroy_mr(PVRDMADev *dev, union pvrdma_cmd_req *req, @@ -317,28 +314,25 @@ static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req, struct pvrdma_cmd_create_cq *cmd = >create_cq; struct pvrdma_cmd_create_cq_resp *resp = >create_cq_resp; PvrdmaRing *ring = NULL; +int rc; memset(resp, 0, sizeof(*resp)); -resp->hdr.response = cmd->hdr.response; -resp->hdr.ack = PVRDMA_CMD_CREATE_CQ_RESP; resp->cqe = cmd->cqe; 
-resp->hdr.err = create_cq_ring(PCI_DEVICE(dev), , cmd->pdir_dma, - cmd->nchunks, cmd->cqe); -if (resp->hdr.err) { -goto out; +rc = create_cq_ring(PCI_DEVICE(dev), , cmd->pdir_dma, cmd->nchunks, +cmd->cqe); +if (rc) { +
[Qemu-devel] [PATCH PULL 16/31] hw/pvrdma: Make device state depend on Ethernet function state
From: Yuval Shaia

User should be able to control the device by changing Ethernet function
state so if user runs 'ifconfig ens3 down' the PVRDMA function should be
down as well.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_cmd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 2979582fac..0d3c818c20 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -139,7 +139,8 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req *req,
     resp->hdr.ack = PVRDMA_CMD_QUERY_PORT_RESP;
     resp->hdr.err = 0;
 
-    resp->attrs.state = attrs.state;
+    resp->attrs.state = dev->func0->device_active ? attrs.state :
+                                                    PVRDMA_PORT_DOWN;
     resp->attrs.max_mtu = attrs.max_mtu;
     resp->attrs.active_mtu = attrs.active_mtu;
     resp->attrs.phys_state = attrs.phys_state;
-- 
2.17.1
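For readers skimming the thread, the rule this patch introduces can be sketched in isolation. The following is only an illustration under stated assumptions: `struct eth_func`, `enum port_state` and `guest_port_state` are hypothetical stand-ins, not the QEMU types (`VMXNET3State`, `PVRDMA_PORT_DOWN`) used in the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real port states; values are illustrative. */
enum port_state { PORT_DOWN = 1, PORT_ACTIVE = 4 };

/* Hypothetical stand-in for the Ethernet function on PCI function 0. */
struct eth_func {
    bool device_active;
};

/*
 * Report the backend's port state to the guest only while the paired
 * Ethernet function is active; otherwise report the port as down.
 */
static enum port_state guest_port_state(const struct eth_func *func0,
                                        enum port_state backend_state)
{
    return func0->device_active ? backend_state : PORT_DOWN;
}
```

With this shape, bringing the Ethernet interface down in the guest is enough to make the RDMA port report down, without touching the backend device.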
[Qemu-devel] [PATCH PULL 12/31] hw/pvrdma: Add support to allow guest to configure GID table
From: Yuval Shaia The control over the RDMA device's GID table is done by updating the device's Ethernet function addresses. Usually the first GID entry is determined by the MAC address, the second by the first IPv6 address and the third by the IPv4 address. Other entries can be added by adding more IP addresses. The opposite is the same, i.e. whenever an address is removed, the corresponding GID entry is removed. The process is done by the network and RDMA stacks. Whenever an address is added the ib_core driver is notified and calls the device driver add_gid function which in turn update the device. To support this in pvrdma device we need to hook into the create_bind and destroy_bind HW commands triggered by pvrdma driver in guest. Whenever a change is made to the pvrdma port's GID table a special QMP message is sent to be processed by libvirt to update the address of the backend Ethernet device. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/rdma_backend.c | 344 +--- hw/rdma/rdma_backend.h | 22 +-- hw/rdma/rdma_backend_defs.h | 11 +- hw/rdma/rdma_rm.c | 104 ++- hw/rdma/rdma_rm.h | 17 +- hw/rdma/rdma_rm_defs.h | 9 +- hw/rdma/rdma_utils.h| 16 ++ hw/rdma/vmw/pvrdma.h| 2 +- hw/rdma/vmw/pvrdma_cmd.c| 55 +++--- hw/rdma/vmw/pvrdma_main.c | 25 +-- hw/rdma/vmw/pvrdma_qp_ops.c | 20 +++ 11 files changed, 462 insertions(+), 163 deletions(-) diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c index c6dedda555..1d496bbd95 100644 --- a/hw/rdma/rdma_backend.c +++ b/hw/rdma/rdma_backend.c @@ -15,15 +15,18 @@ #include "qemu/osdep.h" #include "qemu/error-report.h" +#include "sysemu/sysemu.h" #include "qapi/error.h" #include "qapi/qmp/qlist.h" #include "qapi/qmp/qnum.h" +#include "qapi/qapi-events-rdma.h" #include #include #include #include +#include "contrib/rdmacm-mux/rdmacm-mux.h" #include "trace.h" #include "rdma_utils.h" #include "rdma_rm.h" @@ -160,6 +163,77 @@ static void *comp_handler_thread(void *arg) return 
NULL; } +static inline void disable_rdmacm_mux_async(RdmaBackendDev *backend_dev) +{ +atomic_set(_dev->rdmacm_mux.can_receive, 0); +} + +static inline void enable_rdmacm_mux_async(RdmaBackendDev *backend_dev) +{ +atomic_set(_dev->rdmacm_mux.can_receive, sizeof(RdmaCmMuxMsg)); +} + +static inline int rdmacm_mux_can_process_async(RdmaBackendDev *backend_dev) +{ +return atomic_read(_dev->rdmacm_mux.can_receive); +} + +static int check_mux_op_status(CharBackend *mad_chr_be) +{ +RdmaCmMuxMsg msg = {0}; +int ret; + +pr_dbg("Reading response\n"); +ret = qemu_chr_fe_read_all(mad_chr_be, (uint8_t *), sizeof(msg)); +if (ret != sizeof(msg)) { +pr_dbg("Invalid message size %d, expecting %ld\n", ret, sizeof(msg)); +return -EIO; +} + +pr_dbg("msg_type=%d\n", msg.hdr.msg_type); +pr_dbg("op_code=%d\n", msg.hdr.op_code); +pr_dbg("err_code=%d\n", msg.hdr.err_code); + +if (msg.hdr.msg_type != RDMACM_MUX_MSG_TYPE_RESP) { +pr_dbg("Invalid message type %d\n", msg.hdr.msg_type); +return -EIO; +} + +if (msg.hdr.err_code != RDMACM_MUX_ERR_CODE_OK) { +pr_dbg("Operation failed in mux, error code %d\n", msg.hdr.err_code); +return -EIO; +} + +return 0; +} + +static int exec_rdmacm_mux_req(RdmaBackendDev *backend_dev, RdmaCmMuxMsg *msg) +{ +int rc = 0; + +pr_dbg("Executing request %d\n", msg->hdr.op_code); + +msg->hdr.msg_type = RDMACM_MUX_MSG_TYPE_REQ; +disable_rdmacm_mux_async(backend_dev); +rc = qemu_chr_fe_write(backend_dev->rdmacm_mux.chr_be, + (const uint8_t *)msg, sizeof(*msg)); +if (rc != sizeof(*msg)) { +enable_rdmacm_mux_async(backend_dev); +pr_dbg("Fail to send request to rdmacm_mux (rc=%d)\n", rc); +return -EIO; +} + +rc = check_mux_op_status(backend_dev->rdmacm_mux.chr_be); +if (rc) { +pr_dbg("Fail to execute rdmacm_mux request %d (rc=%d)\n", + msg->hdr.op_code, rc); +} + +enable_rdmacm_mux_async(backend_dev); + +return 0; +} + static void stop_backend_thread(RdmaBackendThread *thread) { thread->run = false; @@ -300,11 +374,11 @@ static int build_host_sge_array(RdmaDeviceResources 
*rdma_dev_res, return 0; } -static int mad_send(RdmaBackendDev *backend_dev, struct ibv_sge *sge, -uint32_t num_sge) +static int mad_send(RdmaBackendDev *backend_dev, uint8_t sgid_idx, +union ibv_gid *sgid, struct ibv_sge *sge, uint32_t num_sge) { -struct backend_umad umad = {0}; -char *hdr, *msg; +RdmaCmMuxMsg msg = {0}; +char *hdr, *data; int ret; pr_dbg("num_sge=%d\n", num_sge); @@ -313,26 +387,31 @@ static int mad_send(RdmaBackendDev *backend_dev, struct ibv_sge *sge, return -EINVAL; } -
[Qemu-devel] [PATCH PULL 15/31] hw/rdma: Initialize node_guid from vmxnet3 mac address
From: Yuval Shaia node_guid should be set once device is load. Make node_guid be GID format (32 bit) of PCI function 0 vmxnet3 device's MAC. A new function was added to do the conversion. So for example the MAC 56:b6:44:e9:62:dc will be converted to GID 54b6:44ff:fee9:62dc. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/rdma_utils.h | 9 + hw/rdma/vmw/pvrdma_cmd.c | 10 -- hw/rdma/vmw/pvrdma_main.c | 5 - 3 files changed, 13 insertions(+), 11 deletions(-) diff --git a/hw/rdma/rdma_utils.h b/hw/rdma/rdma_utils.h index 062e2cd688..4490ea0b94 100644 --- a/hw/rdma/rdma_utils.h +++ b/hw/rdma/rdma_utils.h @@ -63,4 +63,13 @@ extern unsigned long pr_dbg_cnt; void *rdma_pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t plen); void rdma_pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len); +static inline void addrconf_addr_eui48(uint8_t *eui, const char *addr) +{ +memcpy(eui, addr, 3); +eui[3] = 0xFF; +eui[4] = 0xFE; +memcpy(eui + 5, addr + 3, 3); +eui[0] ^= 2; +} + #endif diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c index a334f6205e..2979582fac 100644 --- a/hw/rdma/vmw/pvrdma_cmd.c +++ b/hw/rdma/vmw/pvrdma_cmd.c @@ -592,16 +592,6 @@ static int create_bind(PVRDMADev *dev, union pvrdma_cmd_req *req, return -EINVAL; } -/* TODO: Since drivers stores node_guid at load_dsr phase then this - * assignment is not relevant, i need to figure out a way how to - * retrieve MAC of our netdev */ -if (!cmd->index) { -dev->node_guid = -dev->rdma_dev_res.ports[0].gid_tbl[0].gid.global.interface_id; -pr_dbg("dev->node_guid=0x%llx\n", - (long long unsigned int)be64_to_cpu(dev->node_guid)); -} - return 0; } diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c index b35b5dc5f0..150404dfa6 100644 --- a/hw/rdma/vmw/pvrdma_main.c +++ b/hw/rdma/vmw/pvrdma_main.c @@ -264,7 +264,7 @@ static void init_dsr_dev_caps(PVRDMADev *dev) dsr->caps.sys_image_guid = 0; pr_dbg("sys_image_guid=%" PRIx64 "\n", 
dsr->caps.sys_image_guid); -dsr->caps.node_guid = cpu_to_be64(dev->node_guid); +dsr->caps.node_guid = dev->node_guid; pr_dbg("node_guid=%" PRIx64 "\n", be64_to_cpu(dsr->caps.node_guid)); dsr->caps.phys_port_cnt = MAX_PORTS; @@ -588,6 +588,9 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp) } dev->func0 = VMXNET3(func0); +addrconf_addr_eui48((unsigned char *)>node_guid, +(const char *)>func0->conf.macaddr.a); + memdev_root = object_resolve_path("/objects", NULL); if (memdev_root) { object_child_foreach(memdev_root, pvrdma_check_ram_shared, _shared); -- 2.17.1
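The MAC-to-GUID conversion described in this patch can be tried outside QEMU. The sketch below mirrors the byte operations of the helper in the diff (insert FF:FE in the middle, flip the universal/local bit), but `mac_to_eui64` is a hypothetical name chosen for illustration. Note that the result is 8 bytes (64 bits) wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an 8-byte interface identifier from a 6-byte MAC address:
 * copy the first three bytes, insert FF:FE, copy the last three bytes,
 * then flip the universal/local bit (0x02) of the first byte.
 */
static void mac_to_eui64(uint8_t eui[8], const uint8_t mac[6])
{
    memcpy(eui, mac, 3);
    eui[3] = 0xFF;
    eui[4] = 0xFE;
    memcpy(eui + 5, mac + 3, 3);
    eui[0] ^= 2;
}
```

Applied to the MAC from the commit message, 56:b6:44:e9:62:dc, this yields the bytes 54 b6 44 ff fe e9 62 dc, which is the GID 54b6:44ff:fee9:62dc quoted there.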
[Qemu-devel] [PATCH PULL 07/31] hw/pvrdma: Make function reset_device return void
From: Yuval Shaia

This function cannot fail - fix it to return void

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 6c8c0154fa..fc2abd34af 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -369,13 +369,11 @@ static int unquiesce_device(PVRDMADev *dev)
     return 0;
 }
 
-static int reset_device(PVRDMADev *dev)
+static void reset_device(PVRDMADev *dev)
 {
     pvrdma_stop(dev);
 
     pr_dbg("Device reset complete\n");
-
-    return 0;
 }
 
 static uint64_t regs_read(void *opaque, hwaddr addr, unsigned size)
-- 
2.17.1
[Qemu-devel] [PATCH PULL 14/31] hw/pvrdma: Make sure PCI function 0 is vmxnet3
From: Yuval Shaia Guest driver enforces it, we should also. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/vmw/pvrdma.h | 2 ++ hw/rdma/vmw/pvrdma_main.c | 12 2 files changed, 14 insertions(+) diff --git a/hw/rdma/vmw/pvrdma.h b/hw/rdma/vmw/pvrdma.h index b019cb843a..10a3c4fb7c 100644 --- a/hw/rdma/vmw/pvrdma.h +++ b/hw/rdma/vmw/pvrdma.h @@ -20,6 +20,7 @@ #include "hw/pci/pci.h" #include "hw/pci/msix.h" #include "chardev/char-fe.h" +#include "hw/net/vmxnet3_defs.h" #include "../rdma_backend_defs.h" #include "../rdma_rm_defs.h" @@ -85,6 +86,7 @@ typedef struct PVRDMADev { RdmaBackendDev backend_dev; RdmaDeviceResources rdma_dev_res; CharBackend mad_chr; +VMXNET3State *func0; } PVRDMADev; #define PVRDMA_DEV(dev) OBJECT_CHECK(PVRDMADev, (dev), PVRDMA_HW_NAME) diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c index ac8c092db0..b35b5dc5f0 100644 --- a/hw/rdma/vmw/pvrdma_main.c +++ b/hw/rdma/vmw/pvrdma_main.c @@ -565,6 +565,7 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp) PVRDMADev *dev = PVRDMA_DEV(pdev); Object *memdev_root; bool ram_shared = false; +PCIDevice *func0; init_pr_dbg(); @@ -576,6 +577,17 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp) return; } +func0 = pci_get_function_0(pdev); +/* Break if not vmxnet3 device in slot 0 */ +if (strcmp(object_get_typename(>qdev.parent_obj), TYPE_VMXNET3)) { +pr_dbg("func0 type is %s\n", + object_get_typename(>qdev.parent_obj)); +error_setg(errp, "Device on %x.0 must be %s", PCI_SLOT(pdev->devfn), + TYPE_VMXNET3); +return; +} +dev->func0 = VMXNET3(func0); + memdev_root = object_resolve_path("/objects", NULL); if (memdev_root) { object_child_foreach(memdev_root, pvrdma_check_ram_shared, _shared); -- 2.17.1
[Qemu-devel] [PATCH PULL 17/31] hw/pvrdma: Fill all CQE fields
From: Yuval Shaia Add ability to pass specific WC attributes to CQE such as GRH_BIT flag. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/rdma_backend.c | 59 +++-- hw/rdma/rdma_backend.h | 4 +-- hw/rdma/vmw/pvrdma_qp_ops.c | 31 +++ 3 files changed, 58 insertions(+), 36 deletions(-) diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c index 1d496bbd95..ae1e4dcb29 100644 --- a/hw/rdma/rdma_backend.c +++ b/hw/rdma/rdma_backend.c @@ -60,13 +60,24 @@ struct backend_umad { char mad[RDMA_MAX_PRIVATE_DATA]; }; -static void (*comp_handler)(int status, unsigned int vendor_err, void *ctx); +static void (*comp_handler)(void *ctx, struct ibv_wc *wc); -static void dummy_comp_handler(int status, unsigned int vendor_err, void *ctx) +static void dummy_comp_handler(void *ctx, struct ibv_wc *wc) { pr_err("No completion handler is registered\n"); } +static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err, + void *ctx) +{ +struct ibv_wc wc = {0}; + +wc.status = status; +wc.vendor_err = vendor_err; + +comp_handler(ctx, ); +} + static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq) { int i, ne; @@ -91,7 +102,7 @@ static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq) } pr_dbg("Processing %s CQE\n", bctx->is_tx_req ? 
"send" : "recv"); -comp_handler(wc[i].status, wc[i].vendor_err, bctx->up_ctx); +comp_handler(bctx->up_ctx, [i]); rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id); g_free(bctx); @@ -256,8 +267,8 @@ static void start_comp_thread(RdmaBackendDev *backend_dev) comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED); } -void rdma_backend_register_comp_handler(void (*handler)(int status, -unsigned int vendor_err, void *ctx)) +void rdma_backend_register_comp_handler(void (*handler)(void *ctx, + struct ibv_wc *wc)) { comp_handler = handler; } @@ -451,14 +462,14 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev, if (!qp->ibqp) { /* This field does not get initialized for QP0 and QP1 */ if (qp_type == IBV_QPT_SMI) { pr_dbg("QP0 unsupported\n"); -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_QP0, ctx); +complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_QP0, ctx); } else if (qp_type == IBV_QPT_GSI) { pr_dbg("QP1\n"); rc = mad_send(backend_dev, sgid_idx, sgid, sge, num_sge); if (rc) { -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx); +complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx); } else { -comp_handler(IBV_WC_SUCCESS, 0, ctx); +complete_work(IBV_WC_SUCCESS, 0, ctx); } } return; @@ -467,7 +478,7 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev, pr_dbg("num_sge=%d\n", num_sge); if (!num_sge) { pr_dbg("num_sge=0\n"); -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx); +complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx); return; } @@ -478,21 +489,21 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev, rc = rdma_rm_alloc_cqe_ctx(backend_dev->rdma_dev_res, _id, bctx); if (unlikely(rc)) { pr_dbg("Failed to allocate cqe_ctx\n"); -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx); +complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx); goto out_free_bctx; } rc = build_host_sge_array(backend_dev->rdma_dev_res, new_sge, sge, num_sge); if (rc) { pr_dbg("Error: Failed to build host SGE array\n"); 
-comp_handler(IBV_WC_GENERAL_ERR, rc, ctx); +complete_work(IBV_WC_GENERAL_ERR, rc, ctx); goto out_dealloc_cqe_ctx; } if (qp_type == IBV_QPT_UD) { wr.wr.ud.ah = create_ah(backend_dev, qp->ibpd, sgid_idx, dgid); if (!wr.wr.ud.ah) { -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx); +complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx); goto out_dealloc_cqe_ctx; } wr.wr.ud.remote_qpn = dqpn; @@ -510,7 +521,7 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev, if (rc) { pr_dbg("Fail (%d, %d) to post send WQE to qpn %d\n", rc, errno, qp->ibqp->qp_num); -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx); +complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx); goto out_dealloc_cqe_ctx; } @@ -579,13 +590,13 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
[Qemu-devel] [PATCH PULL 10/31] hw/pvrdma: Set the correct opcode for send completion
From: Yuval Shaia

opcode for WC should be set by the device and not taken from work
element.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_qp_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 7b0f440fda..3388be1926 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -154,7 +154,7 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
         comp_ctx->cq_handle = qp->send_cq_handle;
         comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
         comp_ctx->cqe.qp = qp_handle;
-        comp_ctx->cqe.opcode = wqe->hdr.opcode;
+        comp_ctx->cqe.opcode = IBV_WC_SEND;
 
         rdma_backend_post_send(&dev->backend_dev, &qp->backend_qp, qp->qp_type,
                                (struct ibv_sge *)&wqe->sge[0], wqe->hdr.num_sge,
-- 
2.17.1
[Qemu-devel] [PATCH PULL 13/31] vmxnet3: Move some definitions to header file
From: Yuval Shaia pvrdma setup requires vmxnet3 device on PCI function 0 and PVRDMA device on PCI function 1. pvrdma device needs to access vmxnet3 device object for several reasons: 1. Make sure PCI function 0 is vmxnet3. 2. To monitor vmxnet3 device state. 3. To configure node_guid accoring to vmxnet3 device's MAC address. To be able to access vmxnet3 device the definition of VMXNET3State is moved to a new header file. Signed-off-by: Yuval Shaia Reviewed-by: Dmitry Fleytman Signed-off-by: Marcel Apfelbaum --- hw/net/vmxnet3.c | 116 +--- hw/net/vmxnet3_defs.h | 133 ++ 2 files changed, 134 insertions(+), 115 deletions(-) create mode 100644 hw/net/vmxnet3_defs.h diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c index 3648630386..54746a4030 100644 --- a/hw/net/vmxnet3.c +++ b/hw/net/vmxnet3.c @@ -18,7 +18,6 @@ #include "qemu/osdep.h" #include "hw/hw.h" #include "hw/pci/pci.h" -#include "net/net.h" #include "net/tap.h" #include "net/checksum.h" #include "sysemu/sysemu.h" @@ -29,6 +28,7 @@ #include "migration/register.h" #include "vmxnet3.h" +#include "vmxnet3_defs.h" #include "vmxnet_debug.h" #include "vmware_utils.h" #include "net_tx_pkt.h" @@ -131,23 +131,11 @@ typedef struct VMXNET3Class { DeviceRealize parent_dc_realize; } VMXNET3Class; -#define TYPE_VMXNET3 "vmxnet3" -#define VMXNET3(obj) OBJECT_CHECK(VMXNET3State, (obj), TYPE_VMXNET3) - #define VMXNET3_DEVICE_CLASS(klass) \ OBJECT_CLASS_CHECK(VMXNET3Class, (klass), TYPE_VMXNET3) #define VMXNET3_DEVICE_GET_CLASS(obj) \ OBJECT_GET_CLASS(VMXNET3Class, (obj), TYPE_VMXNET3) -/* Cyclic ring abstraction */ -typedef struct { -hwaddr pa; -uint32_t size; -uint32_t cell_size; -uint32_t next; -uint8_t gen; -} Vmxnet3Ring; - static inline void vmxnet3_ring_init(PCIDevice *d, Vmxnet3Ring *ring, hwaddr pa, @@ -245,108 +233,6 @@ vmxnet3_dump_rx_descr(struct Vmxnet3_RxDesc *descr) descr->rsvd, descr->dtype, descr->ext1, descr->btype); } -/* Device state and helper functions */ -#define VMXNET3_RX_RINGS_PER_QUEUE (2) - -typedef 
struct { -Vmxnet3Ring tx_ring; -Vmxnet3Ring comp_ring; - -uint8_t intr_idx; -hwaddr tx_stats_pa; -struct UPT1_TxStats txq_stats; -} Vmxnet3TxqDescr; - -typedef struct { -Vmxnet3Ring rx_ring[VMXNET3_RX_RINGS_PER_QUEUE]; -Vmxnet3Ring comp_ring; -uint8_t intr_idx; -hwaddr rx_stats_pa; -struct UPT1_RxStats rxq_stats; -} Vmxnet3RxqDescr; - -typedef struct { -bool is_masked; -bool is_pending; -bool is_asserted; -} Vmxnet3IntState; - -typedef struct { -PCIDevice parent_obj; -NICState *nic; -NICConf conf; -MemoryRegion bar0; -MemoryRegion bar1; -MemoryRegion msix_bar; - -Vmxnet3RxqDescr rxq_descr[VMXNET3_DEVICE_MAX_RX_QUEUES]; -Vmxnet3TxqDescr txq_descr[VMXNET3_DEVICE_MAX_TX_QUEUES]; - -/* Whether MSI-X support was installed successfully */ -bool msix_used; -hwaddr drv_shmem; -hwaddr temp_shared_guest_driver_memory; - -uint8_t txq_num; - -/* This boolean tells whether RX packet being indicated has to */ -/* be split into head and body chunks from different RX rings */ -bool rx_packets_compound; - -bool rx_vlan_stripping; -bool lro_supported; - -uint8_t rxq_num; - -/* Network MTU */ -uint32_t mtu; - -/* Maximum number of fragments for indicated TX packets */ -uint32_t max_tx_frags; - -/* Maximum number of fragments for indicated RX packets */ -uint16_t max_rx_frags; - -/* Index for events interrupt */ -uint8_t event_int_idx; - -/* Whether automatic interrupts masking enabled */ -bool auto_int_masking; - -bool peer_has_vhdr; - -/* TX packets to QEMU interface */ -struct NetTxPkt *tx_pkt; -uint32_t offload_mode; -uint32_t cso_or_gso_size; -uint16_t tci; -bool needs_vlan; - -struct NetRxPkt *rx_pkt; - -bool tx_sop; -bool skip_current_tx_pkt; - -uint32_t device_active; -uint32_t last_command; - -uint32_t link_status_and_speed; - -Vmxnet3IntState interrupt_states[VMXNET3_MAX_INTRS]; - -uint32_t temp_mac; /* To store the low part first */ - -MACAddr perm_mac; -uint32_t vlan_table[VMXNET3_VFT_SIZE]; -uint32_t rx_mode; -MACAddr *mcast_list; -uint32_t mcast_list_len; -uint32_t 
mcast_list_buff_size; /* needed for live migration. */ - -/* Compatibility flags for migration */ -uint32_t compat_flags; -} VMXNET3State; - /* Interrupt management */ /* diff --git a/hw/net/vmxnet3_defs.h b/hw/net/vmxnet3_defs.h new file mode 100644 index 00..6c19d29b12 --- /dev/null
[Qemu-devel] [PATCH PULL 06/31] hw/rdma: Add support for MAD packets
From: Yuval Shaia MAD (Management Datagram) packets are widely used by various modules both in kernel and in user space for example the rdma_* API which is used to create and maintain "connection" layer on top of RDMA uses several types of MAD packets. For more information please refer to chapter 13.4 in Volume 1 Architecture Specification, Release 1.1 available here: https://www.infinibandta.org/ibta-specifications-download/ To support MAD packets the device uses an external utility (contrib/rdmacm-mux) to relay packets from and to the guest driver. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/rdma_backend.c | 250 +++- hw/rdma/rdma_backend.h | 4 +- hw/rdma/rdma_backend_defs.h | 10 +- hw/rdma/vmw/pvrdma.h| 2 + hw/rdma/vmw/pvrdma_main.c | 4 +- 5 files changed, 260 insertions(+), 10 deletions(-) diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c index 1e148398a2..c6dedda555 100644 --- a/hw/rdma/rdma_backend.c +++ b/hw/rdma/rdma_backend.c @@ -16,8 +16,13 @@ #include "qemu/osdep.h" #include "qemu/error-report.h" #include "qapi/error.h" +#include "qapi/qmp/qlist.h" +#include "qapi/qmp/qnum.h" #include +#include +#include +#include #include "trace.h" #include "rdma_utils.h" @@ -33,16 +38,25 @@ #define VENDOR_ERR_MAD_SEND 0x206 #define VENDOR_ERR_INVLKEY 0x207 #define VENDOR_ERR_MR_SMALL 0x208 +#define VENDOR_ERR_INV_MAD_BUFF 0x209 +#define VENDOR_ERR_INV_NUM_SGE 0x210 #define THR_NAME_LEN 16 #define THR_POLL_TO 5000 +#define MAD_HDR_SIZE sizeof(struct ibv_grh) + typedef struct BackendCtx { -uint64_t req_id; void *up_ctx; bool is_tx_req; +struct ibv_sge sge; /* Used to save MAD recv buffer */ } BackendCtx; +struct backend_umad { +struct ib_user_mad hdr; +char mad[RDMA_MAX_PRIVATE_DATA]; +}; + static void (*comp_handler)(int status, unsigned int vendor_err, void *ctx); static void dummy_comp_handler(int status, unsigned int vendor_err, void *ctx) @@ -286,6 +300,61 @@ static int 
build_host_sge_array(RdmaDeviceResources *rdma_dev_res, return 0; } +static int mad_send(RdmaBackendDev *backend_dev, struct ibv_sge *sge, +uint32_t num_sge) +{ +struct backend_umad umad = {0}; +char *hdr, *msg; +int ret; + +pr_dbg("num_sge=%d\n", num_sge); + +if (num_sge != 2) { +return -EINVAL; +} + +umad.hdr.length = sge[0].length + sge[1].length; +pr_dbg("msg_len=%d\n", umad.hdr.length); + +if (umad.hdr.length > sizeof(umad.mad)) { +return -ENOMEM; +} + +umad.hdr.addr.qpn = htobe32(1); +umad.hdr.addr.grh_present = 1; +umad.hdr.addr.gid_index = backend_dev->backend_gid_idx; +memcpy(umad.hdr.addr.gid, backend_dev->gid.raw, sizeof(umad.hdr.addr.gid)); +umad.hdr.addr.hop_limit = 0xFF; + +hdr = rdma_pci_dma_map(backend_dev->dev, sge[0].addr, sge[0].length); +if (!hdr) { +pr_dbg("Fail to map to sge[0]\n"); +return -ENOMEM; +} +msg = rdma_pci_dma_map(backend_dev->dev, sge[1].addr, sge[1].length); +if (!msg) { +pr_dbg("Fail to map to sge[1]\n"); +rdma_pci_dma_unmap(backend_dev->dev, hdr, sge[0].length); +return -ENOMEM; +} + +pr_dbg_buf("mad_hdr", hdr, sge[0].length); +pr_dbg_buf("mad_data", data, sge[1].length); + +memcpy([0], hdr, sge[0].length); +memcpy([sge[0].length], msg, sge[1].length); + +rdma_pci_dma_unmap(backend_dev->dev, msg, sge[1].length); +rdma_pci_dma_unmap(backend_dev->dev, hdr, sge[0].length); + +ret = qemu_chr_fe_write(backend_dev->mad_chr_be, (const uint8_t *), +sizeof(umad)); + +pr_dbg("qemu_chr_fe_write=%d\n", ret); + +return (ret != sizeof(umad)); +} + void rdma_backend_post_send(RdmaBackendDev *backend_dev, RdmaBackendQP *qp, uint8_t qp_type, struct ibv_sge *sge, uint32_t num_sge, @@ -304,9 +373,13 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev, comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_QP0, ctx); } else if (qp_type == IBV_QPT_GSI) { pr_dbg("QP1\n"); -comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx); +rc = mad_send(backend_dev, sge, num_sge); +if (rc) { +comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx); +} else 
{ +comp_handler(IBV_WC_SUCCESS, 0, ctx); +} } -pr_dbg("qp->ibqp is NULL for qp_type %d!!!\n", qp_type); return; } @@ -370,6 +443,48 @@ out_free_bctx: g_free(bctx); } +static unsigned int save_mad_recv_buffer(RdmaBackendDev *backend_dev, + struct ibv_sge *sge, uint32_t num_sge, + void *ctx) +{ +BackendCtx *bctx; +int
[Qemu-devel] [PATCH PULL 03/31] hw/rdma: Add ability to force notification without re-arm
From: Yuval Shaia Upon completion of incoming packet the device pushes CQE to driver's RX ring and notify the driver (msix). While for data-path incoming packets the driver needs the ability to control whether it wished to receive interrupts or not, for control-path packets such as incoming MAD the driver needs to be notified anyway, it even do not need to re-arm the notification bit. Enhance the notification field to support this. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Signed-off-by: Marcel Apfelbaum --- hw/rdma/rdma_rm.c | 12 ++-- hw/rdma/rdma_rm_defs.h | 8 +++- hw/rdma/vmw/pvrdma_qp_ops.c | 6 -- 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c index 8d59a42cd1..4f10fcabcc 100644 --- a/hw/rdma/rdma_rm.c +++ b/hw/rdma/rdma_rm.c @@ -263,7 +263,7 @@ int rdma_rm_alloc_cq(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev, } cq->opaque = opaque; -cq->notify = false; +cq->notify = CNT_CLEAR; rc = rdma_backend_create_cq(backend_dev, >backend_cq, cqe); if (rc) { @@ -291,7 +291,10 @@ void rdma_rm_req_notify_cq(RdmaDeviceResources *dev_res, uint32_t cq_handle, return; } -cq->notify = notify; +if (cq->notify != CNT_SET) { +cq->notify = notify ? 
CNT_ARM : CNT_CLEAR; +} + pr_dbg("notify=%d\n", cq->notify); } @@ -349,6 +352,11 @@ int rdma_rm_alloc_qp(RdmaDeviceResources *dev_res, uint32_t pd_handle, return -EINVAL; } +if (qp_type == IBV_QPT_GSI) { +scq->notify = CNT_SET; +rcq->notify = CNT_SET; +} + qp = res_tbl_alloc(_res->qp_tbl, _qpn); if (!qp) { return -ENOMEM; diff --git a/hw/rdma/rdma_rm_defs.h b/hw/rdma/rdma_rm_defs.h index 7228151239..9b399063d3 100644 --- a/hw/rdma/rdma_rm_defs.h +++ b/hw/rdma/rdma_rm_defs.h @@ -49,10 +49,16 @@ typedef struct RdmaRmPD { uint32_t ctx_handle; } RdmaRmPD; +typedef enum CQNotificationType { +CNT_CLEAR, +CNT_ARM, +CNT_SET, +} CQNotificationType; + typedef struct RdmaRmCQ { RdmaBackendCQ backend_cq; void *opaque; -bool notify; +CQNotificationType notify; } RdmaRmCQ; /* MR (DMA region) */ diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c index c668afd0ed..762700a205 100644 --- a/hw/rdma/vmw/pvrdma_qp_ops.c +++ b/hw/rdma/vmw/pvrdma_qp_ops.c @@ -89,8 +89,10 @@ static int pvrdma_post_cqe(PVRDMADev *dev, uint32_t cq_handle, pvrdma_ring_write_inc(>dsr_info.cq); pr_dbg("cq->notify=%d\n", cq->notify); -if (cq->notify) { -cq->notify = false; +if (cq->notify != CNT_CLEAR) { +if (cq->notify == CNT_ARM) { +cq->notify = CNT_CLEAR; +} post_interrupt(dev, INTR_VEC_CMD_COMPLETION_Q); } -- 2.17.1
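The three-state notification field introduced by this patch behaves like a small state machine. The sketch below reuses the enum from the diff, but `req_notify` and `cqe_posted` are hypothetical helper names that only model the logic of `rdma_rm_req_notify_cq` and `pvrdma_post_cqe`; the boolean return value stands in for posting the interrupt.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum CQNotificationType {
    CNT_CLEAR,  /* no interrupt on the next completion */
    CNT_ARM,    /* interrupt once, then fall back to CNT_CLEAR */
    CNT_SET,    /* always interrupt, no re-arm needed (GSI/MAD CQs) */
} CQNotificationType;

/* Guest asks to arm or disarm notification; a CQ pinned to CNT_SET ignores it. */
static void req_notify(CQNotificationType *notify, bool arm)
{
    if (*notify != CNT_SET) {
        *notify = arm ? CNT_ARM : CNT_CLEAR;
    }
}

/* A CQE was pushed to the ring; returns true if the guest should be interrupted. */
static bool cqe_posted(CQNotificationType *notify)
{
    if (*notify == CNT_CLEAR) {
        return false;
    }
    if (*notify == CNT_ARM) {
        *notify = CNT_CLEAR;    /* one-shot: the guest must re-arm */
    }
    return true;
}
```

A data-path CQ therefore interrupts once per arm request, while a CQ attached to a GSI QP interrupts on every completion.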
[Qemu-devel] [PATCH PULL 11/31] qapi: Define new QMP message for pvrdma
From: Yuval Shaia pvrdma requires that the same GID attached to it will be attached to the backend device in the host. A new QMP messages is defined so pvrdma device can broadcast any change made to its GID table. This event is captured by libvirt which in turn will update the GID table in the backend device. Signed-off-by: Yuval Shaia Reviewed-by: Marcel Apfelbaum Acked-by: Markus Armbruster Signed-off-by: Marcel Apfelbaum --- MAINTAINERS | 1 + Makefile.objs | 3 ++- qapi/qapi-schema.json | 1 + qapi/rdma.json| 38 ++ 4 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 qapi/rdma.json diff --git a/MAINTAINERS b/MAINTAINERS index 856d379b0a..180695f5d3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2413,6 +2413,7 @@ F: hw/rdma/* F: hw/rdma/vmw/* F: docs/pvrdma.txt F: contrib/rdmacm-mux/* +F: qapi/rdma.json Build and test automation - diff --git a/Makefile.objs b/Makefile.objs index 319f14d937..bc5b8a8442 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -1,5 +1,6 @@ QAPI_MODULES = block-core block char common crypto introspect job migration -QAPI_MODULES += misc net rocker run-state sockets tpm trace transaction ui +QAPI_MODULES += misc net rdma rocker run-state sockets tpm trace transaction +QAPI_MODULES += ui ### # Common libraries for tools and emulators diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json index 65b6dc2f6f..3bbdfcee84 100644 --- a/qapi/qapi-schema.json +++ b/qapi/qapi-schema.json @@ -86,6 +86,7 @@ { 'include': 'char.json' } { 'include': 'job.json' } { 'include': 'net.json' } +{ 'include': 'rdma.json' } { 'include': 'rocker.json' } { 'include': 'tpm.json' } { 'include': 'ui.json' } diff --git a/qapi/rdma.json b/qapi/rdma.json new file mode 100644 index 00..b58105b1b6 --- /dev/null +++ b/qapi/rdma.json @@ -0,0 +1,38 @@ +# -*- Mode: Python -*- +# + +## +# = RDMA device +## + +## +# @RDMA_GID_STATUS_CHANGED: +# +# Emitted when guest driver adds/deletes GID to/from device +# +# @netdev: RoCE Network Device name +# +# @gid-status: 
Add or delete indication +# +# @subnet-prefix: Subnet Prefix +# +# @interface-id : Interface ID +# +# Since: 4.0 +# +# Example: +# +# <- {"timestamp": {"seconds": 1541579657, "microseconds": 986760}, +# "event": "RDMA_GID_STATUS_CHANGED", +# "data": +# {"netdev": "bridge0", +# "interface-id": 15880512517475447892, +# "gid-status": true, +# "subnet-prefix": 33022}} +# +## +{ 'event': 'RDMA_GID_STATUS_CHANGED', + 'data': { 'netdev': 'str', +'gid-status': 'bool', +'subnet-prefix' : 'uint64', +'interface-id' : 'uint64' } } -- 2.17.1
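The two integers in the example event can be turned back into a readable GID. The sketch below rests on an assumption that the patch does not state: that each value is a big-endian GID half which was read as a native integer on a little-endian host, so its least-significant byte is the first byte of the GID. The sample values fit that reading (33022 is 0x80FE, giving the fe80:: link-local prefix), but treat it as an inference. `le_bytes` and `gid_to_str` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Emit the 8 bytes of v least-significant first (little-endian memory order). */
static void le_bytes(uint64_t v, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Format subnet-prefix and interface-id as eight colon-separated 16-bit
 * groups. buf must hold at least 40 bytes (39 characters plus the NUL).
 */
static void gid_to_str(uint64_t subnet_prefix, uint64_t interface_id,
                       char buf[40])
{
    uint8_t raw[16];

    le_bytes(subnet_prefix, raw);
    le_bytes(interface_id, raw + 8);
    for (int i = 0; i < 8; i++) {
        snprintf(buf + 5 * i, 6, "%02x%02x%s",
                 raw[2 * i], raw[2 * i + 1], i < 7 ? ":" : "");
    }
}
```

Under that assumption, the values from the example event (subnet-prefix 33022, interface-id 15880512517475447892) format as fe80:0000:0000:0000:54b4:44ff:fee9:62dc.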
[Qemu-devel] [PATCH PULL 05/31] hw/rdma: Abort send-op if fail to create addr handler
From: Yuval Shaia

Function create_ah might return NULL, let's exit with an error.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index d7a4bbd91f..1e148398a2 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -338,6 +338,10 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
     if (qp_type == IBV_QPT_UD) {
         wr.wr.ud.ah = create_ah(backend_dev, qp->ibpd,
                                 backend_dev->backend_gid_idx, dgid);
+        if (!wr.wr.ud.ah) {
+            comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
+            goto out_dealloc_cqe_ctx;
+        }
         wr.wr.ud.remote_qpn = dqpn;
         wr.wr.ud.remote_qkey = dqkey;
     }
-- 
2.17.1
[Qemu-devel] [PATCH PULL 08/31] hw/pvrdma: Make default pkey 0xFFFF
From: Yuval Shaia

Commit 6e7dba23af ("hw/pvrdma: Make default pkey 0xFFFF") exports default
pkey as external definition but omits the change from 0x7FFF to 0xFFFF.

Fixes: 6e7dba23af ("hw/pvrdma: Make default pkey 0xFFFF")

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma.h b/hw/rdma/vmw/pvrdma.h
index e3742d893a..15c3f28b86 100644
--- a/hw/rdma/vmw/pvrdma.h
+++ b/hw/rdma/vmw/pvrdma.h
@@ -52,7 +52,7 @@
 #define PVRDMA_FW_VERSION    14
 
 /* Some defaults */
-#define PVRDMA_PKEY          0x7FFF
+#define PVRDMA_PKEY          0xFFFF
 
 typedef struct DSRInfo {
     dma_addr_t dma;
-- 
2.17.1
[Qemu-devel] [PATCH PULL 09/31] hw/pvrdma: Set the correct opcode for recv completion
From: Yuval Shaia

The function pvrdma_post_cqe populates CQE entry with opcode from the
given completion element. For receive operation value was not set. Fix
it by setting it to IBV_WC_RECV.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_qp_ops.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 762700a205..7b0f440fda 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -196,8 +196,9 @@ int pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)
         comp_ctx = g_malloc(sizeof(CompHandlerCtx));
         comp_ctx->dev = dev;
         comp_ctx->cq_handle = qp->recv_cq_handle;
-        comp_ctx->cqe.qp = qp_handle;
         comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
+        comp_ctx->cqe.qp = qp_handle;
+        comp_ctx->cqe.opcode = IBV_WC_RECV;
 
         rdma_backend_post_recv(&dev->backend_dev, &dev->rdma_dev_res,
                                &qp->backend_qp, qp->qp_type,
-- 
2.17.1
[Qemu-devel] [PATCH PULL 01/31] hw/pvrdma: Check the correct return value
From: Yuval Shaia

Return value of 0 means ok, we want to free the memory only in case of
error.

Signed-off-by: Yuval Shaia
Message-Id: <20181025061700.17050-1-yuval.sh...@oracle.com>
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/vmw/pvrdma_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 4faeb21631..57d6f41ae6 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -232,7 +232,7 @@ static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req *req,
                                      cmd->start, cmd->length, host_virt,
                                      cmd->access_flags, >mr_handle,
                                      >lkey, >rkey);
-    if (host_virt && !resp->hdr.err) {
+    if (resp->hdr.err && host_virt) {
         munmap(host_virt, cmd->length);
     }
 
-- 
2.17.1
[Qemu-devel] [PATCH PULL 00/31] RDMA queue
The following changes since commit 891ff9f4a371da2dbd5244590eb35e8d803e18d8:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into staging (2018-12-21 15:49:59 +)

are available in the Git repository at:

  https://github.com/marcel-apf/qemu tags/rdma-pull-request

for you to fetch changes up to f1e2e38ee0136b7710a2caa347049818afd57a1b:

  pvrdma: check return value from pvrdma_idx_ring_has_ routines (2018-12-22 11:09:57 +0200)

RDMA queue
 * Add support for RDMA MAD
 * Various fixes for the pvrdma backend

Prasad J Pandit (7):
  pvrdma: release device resources in case of an error
  rdma: check num_sge does not exceed MAX_SGE
  pvrdma: add uar_read routine
  pvrdma: check number of pages when creating rings
  pvrdma: release ring object in case of an error
  rdma: remove unused VENDOR_ERR_NO_SGE macro
  pvrdma: check return value from pvrdma_idx_ring_has_ routines

Yuval Shaia (24):
  hw/pvrdma: Check the correct return value
  contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexer
  hw/rdma: Add ability to force notification without re-arm
  hw/rdma: Return qpn 1 if ibqp is NULL
  hw/rdma: Abort send-op if fail to create addr handler
  hw/rdma: Add support for MAD packets
  hw/pvrdma: Make function reset_device return void
  hw/pvrdma: Make default pkey 0xFFFF
  hw/pvrdma: Set the correct opcode for recv completion
  hw/pvrdma: Set the correct opcode for send completion
  qapi: Define new QMP message for pvrdma
  hw/pvrdma: Add support to allow guest to configure GID table
  vmxnet3: Move some definitions to header file
  hw/pvrdma: Make sure PCI function 0 is vmxnet3
  hw/rdma: Initialize node_guid from vmxnet3 mac address
  hw/pvrdma: Make device state depend on Ethernet function state
  hw/pvrdma: Fill all CQE fields
  hw/pvrdma: Fill error code in command's response
  hw/rdma: Remove unneeded code that handles more that one port
  vl: Introduce shutdown_notifiers
  hw/pvrdma: Clean device's resource when system is shutdown
  hw/rdma: Do not use bitmap_zero_extend to free bitmap
  hw/rdma: Do not call rdma_backend_del_gid on an empty gid
  docs: Update pvrdma device documentation

 MAINTAINERS                      |   2 +
 Makefile                         |   3 +
 Makefile.objs                    |   4 +-
 contrib/rdmacm-mux/Makefile.objs |   4 +
 contrib/rdmacm-mux/main.c        | 798 +++
 contrib/rdmacm-mux/rdmacm-mux.h  |  61 +++
 docs/pvrdma.txt                  | 126 ++-
 hw/net/vmxnet3.c                 | 116 +-
 hw/net/vmxnet3_defs.h            | 133 +++
 hw/rdma/rdma_backend.c           | 524 +
 hw/rdma/rdma_backend.h           |  28 +-
 hw/rdma/rdma_backend_defs.h      |  19 +-
 hw/rdma/rdma_rm.c                | 120 +-
 hw/rdma/rdma_rm.h                |  17 +-
 hw/rdma/rdma_rm_defs.h           |  21 +-
 hw/rdma/rdma_utils.h             |  25 ++
 hw/rdma/vmw/pvrdma.h             |  10 +-
 hw/rdma/vmw/pvrdma_cmd.c         | 273 +++---
 hw/rdma/vmw/pvrdma_dev_ring.c    |  29 +-
 hw/rdma/vmw/pvrdma_main.c        |  70 ++--
 hw/rdma/vmw/pvrdma_qp_ops.c      |  62 ++-
 include/sysemu/sysemu.h          |   1 +
 qapi/qapi-schema.json            |   1 +
 qapi/rdma.json                   |  38 ++
 vl.c                             |  15 +-
 25 files changed, 2082 insertions(+), 418 deletions(-)
 create mode 100644 contrib/rdmacm-mux/Makefile.objs
 create mode 100644 contrib/rdmacm-mux/main.c
 create mode 100644 contrib/rdmacm-mux/rdmacm-mux.h
 create mode 100644 hw/net/vmxnet3_defs.h
 create mode 100644 qapi/rdma.json

-- 
2.17.1
[Qemu-devel] [PATCH PULL 04/31] hw/rdma: Return qpn 1 if ibqp is NULL
From: Yuval Shaia

Device is not supporting QP0, only QP1.

Signed-off-by: Yuval Shaia
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index 86e8fe8ab6..3ccc9a2494 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -33,7 +33,7 @@ static inline union ibv_gid *rdma_backend_gid(RdmaBackendDev *dev)
 
 static inline uint32_t rdma_backend_qpn(const RdmaBackendQP *qp)
 {
-    return qp->ibqp ? qp->ibqp->qp_num : 0;
+    return qp->ibqp ? qp->ibqp->qp_num : 1;
 }
 
 static inline uint32_t rdma_backend_mr_lkey(const RdmaBackendMR *mr)
-- 
2.17.1
[Qemu-devel] [PATCH PULL 02/31] contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexer
From: Yuval Shaia

RDMA MAD kernel module (ibcm) disallows more than one MAD-agent for a
given MAD class.
This does not go hand in hand with the qemu pvrdma device's requirements,
where each VM is a MAD agent.
Fix it by adding an implementation of an RDMA MAD multiplexer service
which on one hand registers as the sole MAD agent with the kernel module
and on the other hand gives service to more than one VM.

Design Overview:

A server process is registered to the UMAD framework (for this to work the
rdma_cm kernel module needs to be unloaded) and creates a unix socket to
listen to incoming requests from clients.
A client process (such as QEMU) connects to this unix socket and
registers with its own GID.

TX:

When a client needs to send an rdma_cm MAD message it constructs it the
same way as without this multiplexer, i.e. creates a umad packet, but this
time it writes its content to the socket instead of calling umad_send().
The server, upon receiving such a message, fetches local_comm_id from it so
a context for this session can be maintained, and relays the message to the
UMAD layer by calling umad_send().

RX:

The server creates a worker thread to process incoming rdma_cm MAD
messages. When an incoming message arrives (umad_recv()) the server,
depending on the message type (attr_id), looks for the target client by
searching either in the gid->fd table or in the local_comm_id->fd table.
With the extracted fd the server relays the incoming message to the client.

Signed-off-by: Yuval Shaia
Reviewed-by: Shamir Rabinovitch
Signed-off-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum
---
 MAINTAINERS                      |   1 +
 Makefile                         |   3 +
 Makefile.objs                    |   1 +
 contrib/rdmacm-mux/Makefile.objs |   4 +
 contrib/rdmacm-mux/main.c        | 798 +++
 contrib/rdmacm-mux/rdmacm-mux.h  |  61 +++
 6 files changed, 868 insertions(+)
 create mode 100644 contrib/rdmacm-mux/Makefile.objs
 create mode 100644 contrib/rdmacm-mux/main.c
 create mode 100644 contrib/rdmacm-mux/rdmacm-mux.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3b31e07b26..856d379b0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2412,6 +2412,7 @@ S: Maintained
 F: hw/rdma/*
 F: hw/rdma/vmw/*
 F: docs/pvrdma.txt
+F: contrib/rdmacm-mux/*
 
 Build and test automation
 -
diff --git a/Makefile b/Makefile
index 038780c6d0..dd53965f77 100644
--- a/Makefile
+++ b/Makefile
@@ -362,6 +362,7 @@ dummy := $(call unnest-vars,, \
                 elf2dmp-obj-y \
                 ivshmem-client-obj-y \
                 ivshmem-server-obj-y \
+                rdmacm-mux-obj-y \
                 libvhost-user-obj-y \
                 vhost-user-scsi-obj-y \
                 vhost-user-blk-obj-y \
@@ -579,6 +580,8 @@ vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) libvhost-user.a
 	$(call LINK, $^)
 vhost-user-blk$(EXESUF): $(vhost-user-blk-obj-y) libvhost-user.a
 	$(call LINK, $^)
+rdmacm-mux$(EXESUF): $(rdmacm-mux-obj-y) $(COMMON_LDADDS)
+	$(call LINK, $^)
 
 module_block.h: $(SRC_PATH)/scripts/modules/module_block.py config-host.mak
 	$(call quiet-command,$(PYTHON) $< $@ \
diff --git a/Makefile.objs b/Makefile.objs
index 56af0347d3..319f14d937 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -133,6 +133,7 @@ vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
 vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
 vhost-user-scsi-obj-y = contrib/vhost-user-scsi/
 vhost-user-blk-obj-y = contrib/vhost-user-blk/
+rdmacm-mux-obj-y = contrib/rdmacm-mux/
 
 ##
 trace-events-subdirs =
diff --git a/contrib/rdmacm-mux/Makefile.objs b/contrib/rdmacm-mux/Makefile.objs
new file mode 100644
index 00..be3eacb6f7
--- /dev/null
+++ b/contrib/rdmacm-mux/Makefile.objs
@@ -0,0 +1,4 @@
+ifdef CONFIG_PVRDMA
+CFLAGS += -libumad -Wno-format-truncation
+rdmacm-mux-obj-y = main.o
+endif
diff --git a/contrib/rdmacm-mux/main.c b/contrib/rdmacm-mux/main.c
new file mode 100644
index 00..835a7f9214
--- /dev/null
+++ b/contrib/rdmacm-mux/main.c
@@ -0,0 +1,798 @@
+/*
+ * QEMU paravirtual RDMA - rdmacm-mux implementation
+ *
+ * Copyright (C) 2018 Oracle
+ * Copyright (C) 2018 Red Hat Inc
+ *
+ * Authors:
+ * Yuval Shaia
+ * Marcel Apfelbaum
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sys/poll.h"
+#include "sys/ioctl.h"
+#include "pthread.h"
+#include "syslog.h"
+
+#include "infiniband/verbs.h"
+#include "infiniband/umad.h"
+#include "infiniband/umad_types.h"
+#include "infiniband/umad_sa.h"
+#include "infiniband/umad_cm.h"
+
+#include "rdmacm-mux.h"
+
+#define SCALE_US 1000
+#define COMMID_TTL 2 /* How many SCALE_US a context of MAD session is saved */
+#define SLEEP_SECS 5 /* This is used both in poll() and thread */
+#define
Re: [Qemu-devel] [RFC PATCH 0/7] virtio-fs: shared file system for virtual machines
On 2018/12/11 1:31, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"
>
> Hi,
> This is the first RFC for the QEMU side of 'virtio-fs';
> a new mechanism for mounting host directories into the guest
> in a fast, consistent and secure manner. Our primary use
> case is kata containers, but it should be usable in other scenarios
> as well.
>
> There are corresponding patches being posted to Linux kernel,
> libfuse and kata lists.
>
> For a fuller design description, and benchmark numbers, please see
> Vivek's posting of the kernel set here:
>
> https://marc.info/?l=linux-kernel=154446243024251=2
>
> We've got a small website with instructions on how to use it, here:
>
> https://virtio-fs.gitlab.io/
>
> and all the code is available on gitlab at:
>
> https://gitlab.com/virtio-fs
>
> QEMU's changes
> --
>
> The QEMU changes are pretty small;
>
> There's a new vhost-user device, which is used to carry a stream of
> FUSE messages to an external daemon that actually performs
> all the file IO. The FUSE daemon is an external process in order to
> achieve better isolation for security and resource control (e.g. number
> of file descriptors) and also because it's cleaner than trying to
> integrate libfuse into QEMU.
>
> This device has an extra BAR that contains (up to) 3 regions:
>
> a) a DAX mapping range ('the cache') - into which QEMU mmap's
>    files on behalf of the external daemon; those files are
>    then directly mapped by the guest in a way similar to a DAX
>    backed file system; one advantage of this is that multiple
>    guests all accessing the same files should all be sharing
>    those pages of host cache.
>
> b) An experimental set of mappings for use by a metadata versioning
>    daemon; this mapping is shared between multiple guests and
>    the daemon, but only contains a set of version counters that
>    allow a guest to quickly tell if its metadata is stale.
>
> TODO
>
> This is the first RFC, we know we have a bunch of things to clear up:
>
> a) The virtio device specification is still in flux and is expected
>    to change
>
> b) We'd like to find ways of reducing the map/unmap latency for DAX
>
> c) The metadata versioning scheme needs to settle out.
>
> d) mmap'ing host files has some interesting side effects; for example
>    if the file gets truncated by the host and then the guest accesses
>    the mapping, KVM can fail the guest hard.
>
> Dr. David Alan Gilbert (6):
>   virtio: Add shared memory capability
>   virtio-fs: Add cache BAR
>   virtio-fs: Add vhost-user slave commands for mapping
>   virtio-fs: Fill in slave commands for mapping
>   virtio-fs: Allow mapping of meta data version table
>   virtio-fs: Allow mapping of journal
>
> Stefan Hajnoczi (1):
>   virtio: add vhost-user-fs-pci device
>
>  configure                                   |  10 +
>  contrib/libvhost-user/libvhost-user.h       |   3 +
>  docs/interop/vhost-user.txt                 |  35 ++
>  hw/virtio/Makefile.objs                     |   1 +
>  hw/virtio/vhost-user-fs.c                   | 517
>  hw/virtio/vhost-user.c                      |  16 +
>  hw/virtio/virtio-pci.c                      | 115 +
>  hw/virtio/virtio-pci.h                      |  19 +
>  include/hw/pci/pci.h                        |   1 +
>  include/hw/virtio/vhost-user-fs.h           |  79 +++
>  include/standard-headers/linux/virtio_fs.h  |  48 ++
>  include/standard-headers/linux/virtio_ids.h |   1 +
>  include/standard-headers/linux/virtio_pci.h |   9 +
>  13 files changed, 854 insertions(+)
>  create mode 100644 hw/virtio/vhost-user-fs.c
>  create mode 100644 include/hw/virtio/vhost-user-fs.h
>  create mode 100644 include/standard-headers/linux/virtio_fs.h
>

Hi Dave,

I ran into a problem after running qemu with virtio-fs: I can only mount
virtio-fs using one of the following commands:

mount -t virtio_fs /dev/null /mnt/virtio_fs/ -o tag=myfs,rootmode=04,user_id=0,group_id=0

or

mount -t virtio_fs /dev/null /mnt/virtio_fs/ -o tag=myfs,rootmode=04,user_id=0,group_id=0,dax

How do I use "cache=always" or "cache=none", or even "cache=auto" or
"cache=writeback"?

Thanks,
Yiwen.
Re: [Qemu-devel] Segfaults in chardev due to races
On 21/12/18 23:31, Max Reitz wrote:
> I suppose the issue is that QMP events are sent by one thread, and
> client disconnects are handled by a different one. So if a QMP event is
> sent while a client disconnects concurrently, races may occur; and the
> only protection against concurrent access appears to be the
> chr_write_lock, which I don't think is enough.

I think disconnection (tcp_chr_disconnect) has to take the
chr_write_lock too.

Paolo
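
To illustrate the suggestion (this is a toy Python sketch, not QEMU code): the point is that the writer and the disconnect path must serialize on the same lock, so teardown can never run in the middle of a write. The `Channel` class, its `sink` attribute and `run_demo` are hypothetical names made up for this example; only the idea of taking the write lock in the disconnect path comes from the message above.

```python
import threading


class Channel:
    """Toy stand-in for a character device; not QEMU's Chardev."""

    def __init__(self):
        self.write_lock = threading.Lock()  # plays the role of chr_write_lock
        self.sink = []                      # stands in for the connected socket

    def write(self, data):
        # The writer checks and uses the connection while holding the lock.
        with self.write_lock:
            if self.sink is None:
                return False                # peer already gone, drop the event
            self.sink.append(data)
            return True

    def disconnect(self):
        # Teardown takes the same lock, so it cannot interleave with write().
        with self.write_lock:
            self.sink = None


def run_demo(n_events=1000):
    """Race one event-sending thread against one disconnecting thread."""
    chan = Channel()
    results = []
    writer = threading.Thread(
        target=lambda: results.extend(chan.write(i) for i in range(n_events)))
    closer = threading.Thread(target=chan.disconnect)
    writer.start()
    closer.start()
    writer.join()
    closer.join()
    return chan, results
```

Whatever the interleaving, every write either lands completely before the disconnect or is rejected cleanly after it; without the lock in `disconnect()`, a writer could pass the `None` check and then touch a connection that has just been torn down.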
Re: [Qemu-devel] [PATCH] i386: remove the 'INTEL_PT' CPUID bit from named CPU models
On 22/12/18 02:01, Robert Hoo wrote:
> On Fri, 2018-12-21 at 16:27 +0100, Paolo Bonzini wrote:
>> On 21/12/18 16:22, Philippe Mathieu-Daudé wrote:
>>> Hi Paolo,
>>>
>>> On 12/21/18 7:30 AM, Paolo Bonzini wrote:
>>>> From: Robert Hoo
>>>>
>>>> Processor tracing is not yet implemented for KVM and it will be an
>>>> opt in feature requiring a special module parameter.
>>>> Disable it, because it is wrong to enable it by default and
>>>> it is impossible that no one has ever used it.
>>>>
>>>> Cc: qemu-sta...@nongnu.org
>>>
>>> Does this patch misses Robert S-o-b?
>>>
>>>> Signed-off-by: Robert Hoo
>
> Paolo's right. It didn't come from me.
>>
>> No, the author is wrong, it should be me. "git commit -c" apparently
>> copies the author from the original commit.
>>
>> Paolo
>
> Hi Paolo, would you hold on INTEL_PT removal for a moment? I think I
> need Luwei's double confirm.

I'm aware of Luwei's patches, they will be in 4.21. As mentioned in the
commit message, they will be an opt-in feature, not enabled by default;
the default is system-wide tracing and no INTEL_PT CPUID bit available
in the guest.

Paolo
Re: [Qemu-devel] [PULL v4 00/35] Misc patches for 2018-12-21
On 21/12/18 22:09, Peter Maydell wrote:
> I don't really understand what's going on here, or why
> it only happens with this one system (my main x86-64
> Linux Ubuntu 16.04.5 box) and not the various others I'm
> running test builds on. But it does seem to be 100%
> reliable with any of these pullreqs with the new test
> driver in them :-(

I'm afraid something in your setup is causing make's stdout to have
O_NONBLOCK set. Make doesn't use O_NONBLOCK at all, so it must be
something above it. I also checked Perl with strace and, at least here,
it doesn't set O_NONBLOCK.

So here are some ideas... First, can you try applying something like
this to reproduce?

--- a/Makefile
+++ b/Makefile
@@ -17,9 +17,13 @@ print-%:
 # All following code might depend on configuration variables
 ifneq ($(wildcard config-host.mak),)
 # Put the all: rule here so that config-host.mak can contain dependencies.
-all:
+all: lotsofoutput
 include config-host.mak
 
+.PHONY: lotsofoutput
+lotsofoutput:
+	yes 1234567890 | head -n 1
+
 git-submodule-update:
 
 .PHONY: git-submodule-update

And please try applying this, which is a bit of a shot in the dark but
1) it is a good idea anyway; 2) it may help, if not alone, together with
the workarounds below:

diff --git a/scripts/tap-driver.pl b/scripts/tap-driver.pl
index 5e59b5db49..6621a5cd67 100755
--- a/scripts/tap-driver.pl
+++ b/scripts/tap-driver.pl
@@ -313,6 +313,7 @@ sub main ()
   my $iterator = TAP::Parser::Iterator::Stream->new(\*STDIN);
   my $parser = TAP::Parser->new ({iterator => $iterator });
 
+  STDOUT->autoflush(1);
   while (defined (my $cur = $parser->next))
     {
       # Parsing of TAP input should stop after a "Bail out!" directive.
diff --git a/scripts/tap-merge.pl b/scripts/tap-merge.pl
index 59e3fa5007..10ccf57bb2 100755
--- a/scripts/tap-merge.pl
+++ b/scripts/tap-merge.pl
@@ -53,6 +53,7 @@ sub main ()
   my $testno = 0;     # Number of test results seen so far.
   my $bailed_out = 0; # Whether a "Bail out!" directive has been seen.
 
+  STDOUT->autoflush(1);
   while (defined (my $cur = $parser->next))
     {
       if ($cur->is_bailout)

Possible workarounds include:

- using "make -Oline" or "make -Onone" (for -Oline, it may require the
  above autoflush patch).

- running this Python script before invoking make

    import os
    from fcntl import *
    fcntl(1, F_SETFL, fcntl(1, F_GETFL) & ~os.O_NONBLOCK)

Paolo
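
To check whether a descriptor really has O_NONBLOCK set before and after applying the Python workaround, something along these lines may help. It is only a sketch: the helper names `is_nonblocking`, `clear_nonblocking` and `demo` are made up for illustration, and the demo uses a throwaway pipe rather than the real stdout so that it has no side effects on the calling shell.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set in the file status flags of fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def clear_nonblocking(fd):
    """Clear O_NONBLOCK on fd, leaving the other status flags untouched."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)


def demo():
    """Set O_NONBLOCK on a scratch pipe, clear it again, report both states."""
    r, w = os.pipe()
    try:
        os.set_blocking(w, False)       # simulate the misbehaving parent
        before = is_nonblocking(w)
        clear_nonblocking(w)
        after = is_nonblocking(w)
        return before, after
    finally:
        os.close(r)
        os.close(w)
```

Calling `is_nonblocking(1)` from a script started in the same terminal or CI job as make would show whether stdout inherited the flag; `clear_nonblocking(1)` is equivalent to the three-line script in the message.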