date:20181222

Re: [Qemu-devel] [PATCH v3] s390x/pci: add common function measurement block

2018-12-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/1544796678-12736-1-git-send-email-pmo...@linux.ibm.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 1544796678-12736-1-git-send-email-pmo...@linux.ibm.com
Type: series
Subject: [Qemu-devel] [PATCH v3] s390x/pci: add common function measurement block

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
3664584 s390x/pci: add common function measurement block

=== OUTPUT BEGIN ===
Checking PATCH 1/1: s390x/pci: add common function measurement block...
ERROR: code indent should never use tabs
#80: FILE: hw/s390x/s390-pci-bus.h:304:
+#define ZPCI_FMB_FORMAT^I0$

ERROR: code indent should never use tabs
#213: FILE: hw/s390x/s390-pci-inst.c:950:
+^Iret = MEMTX_ERROR;$

WARNING: line over 80 characters
#241: FILE: hw/s390x/s390-pci-inst.c:978:
+if (fmb_do_update(pbdev, offset, pbdev->fmb.sample++, sizeof(pbdev->fmb.sample))) {

total: 2 errors, 1 warnings, 257 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1544796678-12736-1-git-send-email-pmo...@linux.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [Bug 1809304] Re: qemu-img convert is freezing for some DMG files.

2018-12-22 Thread yuchenlin

Because the zero chunk table is missing, reading a zero sector returns EIO.
I have submitted a series to fix this problem.

Please refer to this series:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05637.html

Thanks,
Yu-Chen Lin

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809304

Title:
  qemu-img convert is freezing for some DMG files.

Status in QEMU:
  New

Bug description:
  Recently, I created a file using hdiutil from MacOS (using Zlib
  compression):

  $ hdiutil create -volname MyVolName -srcfolder /path/to/my/vol/ -ov
  -format UDZO myvolname.dmg

  But, when I try to convert this volume using qemu-img convert, this
  command is freezing.

  I'm using the upstream version to test it.

  It is freezing inside the binary search method to retrieve the chunk.

  But, I still don't know why.

  I'm attaching the file as an example.

  It can be mounted using MacOS or other Linux apps like hfsleuth and
  darling-dmg.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809304/+subscriptions

[Qemu-devel] [PATCH v2 1/3] dmg: fix binary search

2018-12-22 Thread yuchenlin

The original binary search implementation can hang. For example, if
chunk1 = 4 and chunk2 = 5, then chunk3 = 4. If we then take the else case,
chunk1 is set to chunk3, so it stays at 4 and the loop never terminates.

Signed-off-by: yuchenlin 
---
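For illustration only, not part of this patch: a minimal standalone sketch
of a chunk search that cannot stall. It assumes a simplified chunk table
(struct chunk_table, its field names and search_chunk_sketch() are
hypothetical, not the real BDRVDMGState), and it uses a half-open interval
[lo, hi) rather than the closed interval used in the diff below, which is
one common way to guarantee that the range shrinks on every iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the chunk table in block/dmg.c. */
struct chunk_table {
    uint32_t n_chunks;
    const uint64_t *sectors;      /* first sector of each chunk, ascending */
    const uint64_t *sectorcounts; /* number of sectors in each chunk */
};

/*
 * Return the index of the chunk containing sector_num, or t->n_chunks if
 * no chunk covers it. The search range is the half-open interval
 * [lo, hi), so mid is always a valid index and the range shrinks on
 * every iteration.
 */
static uint32_t search_chunk_sketch(const struct chunk_table *t,
                                    uint64_t sector_num)
{
    uint32_t lo = 0, hi = t->n_chunks;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;

        assert(mid < t->n_chunks);
        if (t->sectors[mid] > sector_num) {
            hi = mid;                 /* target lies before chunk mid */
        } else if (sector_num - t->sectors[mid] < t->sectorcounts[mid]) {
            return mid;               /* chunk mid covers the sector */
        } else {
            lo = mid + 1;             /* target lies after chunk mid */
        }
    }
    return t->n_chunks; /* not found */
}
```

With lo = 4 and hi = 5, mid is 4. The else case then sets lo to 5, and the
loop exits instead of repeating with the same values.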
 block/dmg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 50e91aef6d..0e05702f5d 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -572,14 +572,14 @@ static inline uint32_t search_chunk(BDRVDMGState *s, uint64_t sector_num)
 {
     /* binary search */
     uint32_t chunk1 = 0, chunk2 = s->n_chunks, chunk3;
-    while (chunk1 != chunk2) {
+    while (chunk1 <= chunk2) {
         chunk3 = (chunk1 + chunk2) / 2;
         if (s->sectors[chunk3] > sector_num) {
-            chunk2 = chunk3;
+            chunk2 = chunk3 - 1;
         } else if (s->sectors[chunk3] + s->sectorcounts[chunk3] > sector_num) {
             return chunk3;
         } else {
-            chunk1 = chunk3;
+            chunk1 = chunk3 + 1;
         }
     }
     return s->n_chunks; /* error */
-- 
2.17.1

[Qemu-devel] [PATCH v2 2/3] dmg: use enumeration type instead of hard coding number

2018-12-22 Thread yuchenlin

Signed-off-by: yuchenlin 
---
 block/dmg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 0e05702f5d..6b0a057bf8 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -267,7 +267,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
 
         /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
          * therefore be unbounded. */
-        if (s->types[i] != 2 && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
+        if (s->types[i] != UDIG && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
             error_report("sector count %" PRIu64 " for chunk %" PRIu32
                          " is larger than max (%u)",
                          s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
@@ -706,7 +706,7 @@ dmg_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         /* Special case: current chunk is all zeroes. Do not perform a memcpy as
          * s->uncompressed_chunk may be too small to cover the large all-zeroes
          * section. dmg_read_chunk is called to find s->current_chunk */
-        if (s->types[s->current_chunk] == 2) { /* all zeroes block entry */
+        if (s->types[s->current_chunk] == UDIG) { /* all zeroes block entry */
             qemu_iovec_memset(qiov, i * 512, 0, 512);
             continue;
         }
-- 
2.17.1

[Qemu-devel] [PATCH v2 3/3] dmg: don't skip zero chunk

2018-12-22 Thread yuchenlin

A dmg file contains many tables, each of which says: "from sector XXX to
sector XXX, the compression method is XXX, and the compressed data
resides at XXX".

Each sector in the expanded file should be covered by a table. The table
gives the offset of the compressed data (or of the raw data, depending on
the type) in the dmg.

For example:

[-----------The expanded file------------]
[---bzip table ---]/* zeros */[---zlib---]
   ^
   | if we want to read this sector.

We find the bzip table that contains this sector, get the compressed data
offset, read the data from the dmg, uncompress it, and finally write it to
the expanded file.

If we skip the zero chunk (table), some sectors have no table. This causes
search_chunk() to return s->n_chunks, dmg_read_chunk() to return -1, and
finally dmg_co_preadv() to return EIO.

See:

[-----------The expanded file------------]
[---bzip table ---]/* zeros */[---zlib---]
                      ^
                      | if we want to read this sector.

Oops, no table contains it...

The original implementation has no zero table, so an attempt to read a
sector inside the zero chunk gets EIO and the read is skipped.

After this patch, a zero chunk is treated the same as an ignore chunk:
zeroes are written directly, and no sector is left without a table.

After this patch:

[-----------The expanded file------------]
[---bzip table ---][--zeros--][---zlib---]

Signed-off-by: yuchenlin 
---
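For illustration only, not part of this patch: a minimal sketch of why a
missing zero chunk turns into EIO. The chunk kinds, struct chunk and
read_sector_sketch() are hypothetical names for this example, and the
lookup is a linear scan for brevity rather than the binary search in
block/dmg.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical chunk kinds for this sketch; not the dmg.c enum values. */
enum chunk_kind { CHUNK_ZERO, CHUNK_RAW, CHUNK_IGNORE };

struct chunk {
    enum chunk_kind kind;
    uint64_t first_sector;
    uint64_t sector_count;
    const uint8_t *data;   /* backing bytes for CHUNK_RAW, else NULL */
};

/*
 * Fill buf with one sector. Returns 0 on success, or -EIO when no chunk
 * covers the sector, which is what happens if zero chunks are dropped
 * from the table at parse time.
 */
static int read_sector_sketch(const struct chunk *chunks, size_t n_chunks,
                              uint64_t sector, uint8_t *buf)
{
    assert(chunks != NULL && buf != NULL);

    for (size_t i = 0; i < n_chunks; i++) {
        const struct chunk *c = &chunks[i];

        if (sector < c->first_sector ||
            sector - c->first_sector >= c->sector_count) {
            continue; /* this chunk does not cover the sector */
        }
        switch (c->kind) {
        case CHUNK_ZERO:
        case CHUNK_IGNORE:
            /* No stored data: the sector reads back as zeroes. */
            memset(buf, 0, SECTOR_SIZE);
            return 0;
        case CHUNK_RAW:
            memcpy(buf, c->data + (sector - c->first_sector) * SECTOR_SIZE,
                   SECTOR_SIZE);
            return 0;
        }
    }
    return -EIO;
}
```

With a table that keeps the zero chunk, a read inside the zero range fills
the buffer with zeroes. With the same table minus that entry, the same read
returns -EIO.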
 block/dmg.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 6b0a057bf8..137fe9c1ff 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -130,7 +130,8 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
     case UDRW: /* copy */
         uncompressed_sectors = DIV_ROUND_UP(s->lengths[chunk], 512);
         break;
-    case UDIG: /* zero */
+    case UDZE: /* zero */
+    case UDIG: /* ignore */
         /* as the all-zeroes block may be large, it is treated specially: the
          * sector is not copied from a large buffer, a simple memset is used
          * instead. Therefore uncompressed_sectors does not need to be set. */
@@ -199,8 +200,9 @@ typedef struct DmgHeaderState {
 static bool dmg_is_known_block_type(uint32_t entry_type)
 {
     switch (entry_type) {
+    case UDZE:    /* zeros */
     case UDRW:    /* uncompressed */
-    case UDIG:    /* zeroes */
+    case UDIG:    /* ignore */
     case UDZO:    /* zlib */
         return true;
     case UDBZ:    /* bzip2 */
@@ -265,9 +267,10 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
         /* sector count */
         s->sectorcounts[i] = buff_read_uint64(buffer, offset + 0x10);
 
-        /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
-         * therefore be unbounded. */
-        if (s->types[i] != UDIG && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
+        /* all-zeroes sector (type UDZE and UDIG) does not need to be
+         * "uncompressed" and can therefore be unbounded. */
+        if (s->types[i] != UDZE && s->types[i] != UDIG
+            && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
             error_report("sector count %" PRIu64 " for chunk %" PRIu32
                          " is larger than max (%u)",
                          s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
@@ -671,7 +674,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                 return -1;
             }
             break;
-        case UDIG: /* zero */
+        case UDZE: /* zeros */
+        case UDIG: /* ignore */
             /* see dmg_read, it is treated specially. No buffer needs to be
              * pre-filled, the zeroes can be set directly. */
             break;
@@ -706,7 +710,8 @@ dmg_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         /* Special case: current chunk is all zeroes. Do not perform a memcpy as
          * s->uncompressed_chunk may be too small to cover the large all-zeroes
          * section. dmg_read_chunk is called to find s->current_chunk */
-        if (s->types[s->current_chunk] == UDIG) { /* all zeroes block entry */
+        if (s->types[s->current_chunk] == UDZE
+            || s->types[s->current_chunk] == UDIG) { /* all zeroes block entry */
             qemu_iovec_memset(qiov, i * 512, 0, 512);
             continue;
         }
-- 
2.17.1

[Qemu-devel] [PATCH v2 0/3] dmg: fixing reading in dmg

2018-12-22 Thread yuchenlin

There are two bugs in dmg reading.

First, the binary search may hang. This problem is solved by patch 1.
Second, because the zero chunk table is missing, reading a zero sector
returns EIO. This problem is solved by patches 2 and 3.

Thanks

v1 -> v2:
* fix typos in patch 1
* add patch 2 and patch 3

yuchenlin (3):
  dmg: fix binary search
  dmg: use enumeration type instead of hard coding number
  dmg: don't skip zero chunk

 block/dmg.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

-- 
2.17.1

Re: [Qemu-devel] [PATCH v3 8/9] target/ppc: move FP and VMX registers into aligned vsr register array

2018-12-22 Thread Richard Henderson

On 12/22/18 10:09 AM, Mark Cave-Ayland wrote:
> Do you want these helpers used just within
> linux-user/ppc/signal.c or also within the other files touched by this patch 
> e.g.
> arch_dump.c, gdbstub.c etc.?

Everywhere.  Thanks!


r~

[Qemu-devel] [Bug 1793275] Re: Hosts fail to start after update to QEMU 3.0

2018-12-22 Thread Neil Darlow

This bug is not present in QEMU-3.1.0.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793275

Title:
  Hosts fail to start after update to QEMU 3.0

Status in QEMU:
  New

Bug description:
  Host OS: Archlinux
  Host Architecture: AMD64
  Guest OS: FreeBSD-11.2 (x2) and Archlinux (x1)
  Guest Architecture: AMD64

  I have been using QEMU 2.x without issue for a number of years but
  since updating to QEMU 3.0 my guests do not complete startup.

  FreeBSD 11.2 guest failure symptom:
  The two FreeBSD-11.2 guests output repeated messages of "unexpected
  cache type 4". This appears to be an internal error message and I've
  not found any instances of it through Google search.

  Archlinux guest failure symptom:
  The single Archlinux guest gets no further than the message
  "uncompressing initial ramdisk".

  The guests are started by a qemu-kvm invocation. No virtual machine
  managers are used. The command lines used (from ps awx) to launch the
  VMs are:

  [neil@optimus ~]$ ps awx |grep qemu
   1492 ?Sl 3:19 /usr/bin/qemu-system-x86_64 -daemonize -pidfile 
/run/qemu_vps1.pid -enable-kvm -cpu host -smp 2 -k en-gb -boot order=c -drive 
file=/dev/system/vps1,cache=none,format=raw,if=virtio,index=0,media=disk -m 
1024 -name FreeBSD_1 -net nic,macaddr=52:54:AD:86:64:00,model=virtio -net 
vde,sock=/run/vde_switch-tap0.sock -monitor telnet:127.0.0.2:23,server,nowait 
-vnc 192.168.0.1:0
   1510 ?Sl 0:54 /usr/bin/qemu-system-x86_64 -daemonize -pidfile 
/run/qemu_vps2.pid -enable-kvm -cpu host -smp 2 -k en-gb -boot order=c -drive 
file=/dev/system/vps2,cache=none,format=raw,if=virtio,index=0,media=disk -m 
1024 -name Archlinux -net nic,macaddr=52:54:AD:86:64:01,model=virtio -net 
vde,sock=/run/vde_switch-tap0.sock -monitor telnet:127.0.0.3:23,server,nowait 
-vnc 192.168.0.1:1
   1529 ?Sl 3:07 /usr/bin/qemu-system-x86_64 -daemonize -pidfile 
/run/qemu_vps3.pid -enable-kvm -cpu host -smp 2 -k en-gb -boot order=c -drive 
file=/dev/system/vps3,cache=none,format=raw,if=virtio,index=0,media=disk -m 
1024 -name FreeBSD_2 -net nic,macaddr=52:54:AD:86:64:02,model=virtio -net 
vde,sock=/run/vde_switch-tap0.sock -monitor telnet:127.0.0.4:23,server,nowait 
-vnc 192.168.0.1:2

  The VMs were installed to LVM volumes on the host machine (hence the
  /dev/system/vpsN device names). Networking is over a Linux tap
  interface connected to a VDE2 virtual network switch.

  Currently working version of QEMU: qemu-headless 2.12.1-1
  Failing version of QEMU: qemu-headless-3.0.0-1

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793275/+subscriptions

Re: [Qemu-devel] [PATCH v3 8/9] target/ppc: move FP and VMX registers into aligned vsr register array

2018-12-22 Thread Mark Cave-Ayland

On 20/12/2018 17:57, Richard Henderson wrote:

> On 12/20/18 8:31 AM, Mark Cave-Ayland wrote:
>> The VSX register array is a block of 64 128-bit registers where the first 32
>> registers consist of the existing 64-bit FP registers extended to 128-bit
>> using new VSR registers, and the last 32 registers are the VMX 128-bit
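>> registers as shown below:
>>
>>             64-bit               64-bit
>>     +--------------------+--------------------+
>>     |        FP0         |                    |  VSR0
>>     +--------------------+--------------------+
>>     |        FP1         |                    |  VSR1
>>     +--------------------+--------------------+
>>     |        ...         |        ...         |  ...
>>     +--------------------+--------------------+
>>     |        FP30        |                    |  VSR30
>>     +--------------------+--------------------+
>>     |        FP31        |                    |  VSR31
>>     +--------------------+--------------------+
>>     |                  VMX0                   |  VSR32
>>     +-----------------------------------------+
>>     |                  VMX1                   |  VSR33
>>     +-----------------------------------------+
>>     |                  ...                    |  ...
>>     +-----------------------------------------+
>>     |                  VMX30                  |  VSR62
>>     +-----------------------------------------+
>>     |                  VMX31                  |  VSR63
>>     +-----------------------------------------+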
>>
>> In order to allow for future conversion of VSX instructions to use TCG vector
>> operations, recreate the same layout using an aligned version of the existing
>> vsr register array.
>>
>> Since the old fpr and avr register arrays are removed, the existing callers
>> must also be updated to use the correct offset in the vsr register array.
>> This also includes switching the relevant VMState fields over to using
>> subarrays to make sure that migration is preserved.
>>
>> Signed-off-by: Mark Cave-Ayland 
>> Reviewed-by: Richard Henderson 
>> Acked-by: David Gibson 
>> ---
>>  linux-user/ppc/signal.c | 24 ++---
>>  target/ppc/arch_dump.c  | 12 +++
>>  target/ppc/cpu.h|  9 ++---
>>  target/ppc/gdbstub.c|  8 ++---
>>  target/ppc/internal.h   | 18 +++---
>>  target/ppc/machine.c| 72 
>> ++---
>>  target/ppc/monitor.c|  4 +--
>>  target/ppc/translate.c  | 14 
>>  target/ppc/translate/dfp-impl.inc.c |  2 +-
>>  target/ppc/translate/vmx-impl.inc.c |  7 +++-
>>  target/ppc/translate/vsx-impl.inc.c |  4 +--
>>  target/ppc/translate_init.inc.c | 24 ++---
>>  12 files changed, 126 insertions(+), 72 deletions(-)
>>
>> diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
>> index 2ae120a2bc..a053dd5b84 100644
>> --- a/linux-user/ppc/signal.c
>> +++ b/linux-user/ppc/signal.c
>> @@ -258,8 +258,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
>>      /* Save Altivec registers if necessary.  */
>>      if (env->insns_flags & PPC_ALTIVEC) {
>>          uint32_t *vrsave;
>> -        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
>> -            ppc_avr_t *avr = &env->avr[i];
>> +        for (i = 0; i < 32; i++) {
>> +            ppc_avr_t *avr = &env->vsr[32 + i];
> 
> Because of our subsequent discussion re endianness within these vectors, I
> think it would be helpful to add some helpers here.
> 
> static inline ppc_avr_t *cpu_avr_ptr(CPUPPCState *env, int i)
> {
>     return &env->vsr[32 + i];
> }
> 
>>      /* Save VSX second halves */
>>      if (env->insns_flags2 & PPC2_VSX) {
>>          uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
>> -        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
>> -            __put_user(env->vsr[i], &vsregs[i]);
>> +        for (i = 0; i < 32; i++) {
>> +            __put_user(env->vsr[i].u64[1], &vsregs[i]);
> 
> static inline uint64_t *cpu_vsrl_ptr(CPUPPCState *env, int i)
> {
>     return &env->vsr[i].u64[1];
> }
> 
>>      /* Save floating point registers.  */
>>      if (env->insns_flags & PPC_FLOAT) {
>> -        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
>> -            __put_user(env->fpr[i], &frame->mc_fregs[i]);
>> +        for (i = 0; i < 32; i++) {
>> +            __put_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
> 
> static inline uint64_t *cpu_fpr_ptr(CPUPPCState *env, int i)
> {
>     return &env->vsr[i].u64[0];
> }
> 
> 
> Eventually, we will want to make these last two functions be dependent on host
> endianness, so that we can remove getVSR and putVSR.  At which point VSR and
> AVX registers will have the same representation.  Because at present they
> don't, which IMO is, if not a bug, at least a severe mis-feature.

Okay, I can do that. Do you want these helpers used just within
linux-user/ppc/signal.c or also within the other files touched by this patch,
e.g. arch_dump.c, gdbstub.c etc.?

Re: [Qemu-devel] [PATCH 0/2] Fix TABs in many files

2018-12-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20181213223737.11793-1-pbonz...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20181213223737.11793-1-pbonz...@redhat.com
Type: series
Subject: [Qemu-devel] [PATCH 0/2] Fix TABs in many files

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
303caa2 avoid TABs in files that only contain a few
aab0b24 remove space-tab sequences

=== OUTPUT BEGIN ===
Checking PATCH 1/2: remove space-tab sequences...
ERROR: code indent should never use tabs
#21: FILE: bsd-user/x86_64/target_syscall.h:15:
+^Iabi_ulong r11;$

ERROR: code indent should never use tabs
#34: FILE: crypto/aes.c:1074:
+^Iint i = 0;$

ERROR: code indent should never use tabs
#43: FILE: crypto/aes.c:1163:
+^I^I}$

ERROR: code indent should never use tabs
#52: FILE: crypto/aes.c:1250:
+^I/* round 2: */$

ERROR: code indent should never use tabs
#61: FILE: crypto/aes.c:1260:
+^I/* round 4: */$

ERROR: code indent should never use tabs
#70: FILE: crypto/aes.c:1270:
+^I/* round 6: */$

ERROR: code indent should never use tabs
#79: FILE: crypto/aes.c:1280:
+^I/* round 8: */$

ERROR: code indent should never use tabs
#88: FILE: crypto/aes.c:1572:
+^Is0 =$

ERROR: code indent should never use tabs
#94: FILE: crypto/aes.c:1577:
+^I^Irk[0];$

ERROR: code indent should never use tabs
#97: FILE: crypto/aes.c:1579:
+^Is1 =$

ERROR: code indent should never use tabs
#103: FILE: crypto/aes.c:1584:
+^I^Irk[1];$

ERROR: code indent should never use tabs
#106: FILE: crypto/aes.c:1586:
+^Is2 =$

ERROR: code indent should never use tabs
#112: FILE: crypto/aes.c:1591:
+^I^Irk[2];$

ERROR: code indent should never use tabs
#115: FILE: crypto/aes.c:1593:
+^Is3 =$

ERROR: code indent should never use tabs
#121: FILE: crypto/aes.c:1598:
+^I^Irk[3];$

ERROR: code indent should never use tabs
#134: FILE: disas/alpha.c:675:
+^I^Iwhich bits in the actual opcode must match OPCODE.$

ERROR: code indent should never use tabs
#143: FILE: disas/alpha.c:702:
+^I^Ibut with defined results on previous implementations.$

ERROR: code indent should never use tabs
#147: FILE: disas/alpha.c:705:
+^I^Ipresumably undefined results on previous implementations$

ERROR: code indent should never use tabs
#156: FILE: disas/alpha.c:835:
+^I^I^I0xFFE0, BASE, { RC } },^I^I/* ev56 but */$

ERROR: code indent should never use tabs
#169: FILE: disas/arm.c:1080:
+^I^I^I^I(top bit of range being the sign bit)$

ERROR: code indent should never use tabs
#182: FILE: disas/i386.c:6078:
+^I}$

ERROR: code indent should never use tabs
#191: FILE: disas/i386.c:6115:
+^I}$

ERROR: code indent should never use tabs
#204: FILE: disas/m68k.c:353:
+^I^I^I^I^I^I(not 0,1,7.2-4)$

ERROR: space required after that ',' (ctx:VxV)
#204: FILE: disas/m68k.c:353:
+   (not 0,1,7.2-4)
  ^

ERROR: space required after that ',' (ctx:VxV)
#204: FILE: disas/m68k.c:353:
+   (not 0,1,7.2-4)
^

ERROR: spaces required around that '-' (ctx:VxV)
#204: FILE: disas/m68k.c:353:
+   (not 0,1,7.2-4)
^

ERROR: code indent should never use tabs
#213: FILE: disas/m68k.c:1650:
+^I  case 0x18: name = "%psr"; break;$

ERROR: trailing statements should be on next line
#213: FILE: disas/m68k.c:1650:
+ case 0x18: name = "%psr"; break;

ERROR: code indent should never use tabs
#241: FILE: include/hw/elf_ops.h:346:
+^I*pentry = (uint64_t)(elf_sword)ehdr.e_entry;$

ERROR: code indent should never use tabs
#254: FILE: linux-user/linuxload.c:57:
+^Ibprm->e_uid = st.st_uid;$

ERROR: code indent should never use tabs
#267: FILE: linux-user/syscall.c:905:
+^Ireturn target_brk;$

ERROR: code indent should never use tabs
#280: FILE: linux-user/syscall_defs.h:1810:
+^Iabi_long^Ist_blocks;^I/* Number 512-byte blocks allocated. */$

ERROR: code indent should never use tabs
#289: FILE: linux-user/syscall_defs.h:1819:
+^Iabi_long^I__unused[3];$

ERROR: code indent should never use tabs
#302: FILE: linux-user/x86_64/target_syscall.h:15:
+^Iabi_ulong r11;$

ERROR: suspect code indent for conditional statements (24, 27)
#313: FILE: slirp/ip_input.c:195:
if

Re: [Qemu-devel] [PATCH 0/3] Allow hw/audio drivers to pass raw DB values to audio/ drivers

2018-12-22 Thread Yaroslav Isakov

Sorry for the mess, I forgot to add the maintainer. Gerd, please take a
look at these patches.

On Fri, 21 Dec 2018 at 22:40, Yaroslav Isakov wrote:
>
> This patch series introduces the ability for virtual audio drivers to pass
> information about guest-chosen DB values to backend audio drivers.
>
> For now, the supported virtual driver is hda-codec and the backend is
> pulseaudio, as they both support DB values.
>
> Without these patches, emulated Windows has a very short range of audible
> sound, as the range in the guest is much smaller than in PulseAudio.
>
> Yaroslav Isakov (3):
>   Allow audio driver to pass DB value to underlying drivers
>   Pass raw DB values from hda-codec.c to audio driver
>   If raw DB values are known, use them in paaudio
>
>  audio/audio.c   | 15 +--
>  audio/audio.h   |  6 --
>  audio/mixeng.h  |  6 --
>  audio/paaudio.c |  9 +++--
>  hw/audio/ac97.c |  4 ++--
>  hw/audio/hda-codec-common.h |  2 +-
>  hw/audio/hda-codec.c| 12 ++--
>  hw/audio/lm4549.c   |  2 +-
>  hw/audio/wm8750.c   | 18 --
>  hw/display/xlnx_dp.c|  3 ++-
>  hw/usb/dev-audio.c  |  6 --
>  11 files changed, 60 insertions(+), 23 deletions(-)
>
> --
> 2.18.1
>

[Qemu-devel] [Bug 1809546] Re: Writing a byte to a pl011 SFR overwrites the whole SFR

2018-12-22 Thread Daniels Umanovskis

Adding the link script.

** Attachment added: "linkscript.ld"
   
https://bugs.launchpad.net/qemu/+bug/1809546/+attachment/5224337/+files/linkscript.ld

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809546

Title:
  Writing a byte to a pl011 SFR overwrites the whole SFR

Status in QEMU:
  New

Bug description:
  The bug is present in QEMU 2.8.1 and, if my analysis is correct, also
  on master.

  I first noticed that a PL011 UART driver, which is fine on real
  hardware, fails to enable the RX interrupt in the IMSC register when
  running in QEMU. However, the problem only comes up if the code is
  compiled without optimizations. I think I've narrowed it down to a
  minimal example that will exhibit the problem if run as a bare-metal
  application.

  Given:

  pl011_addr: .word 0x10009000

  The following snippet will be problematic:

   ldr r3, pl011_addr
   ldrb r2, [r3, #0x38]// IMSC
   mov r2, #0
   orr r2, r2, #0x10   // R2 == 0x10
   strb r2, [r3, #0x38]// Whole word reads correctly after this
   ldrb r2, [r3, #0x39]
   mov r2, #0
   strb r2, [r3, #0x39]// Problem here! Overwrites offset 0x38 as well

  After the first strb instruction, which writes to 0x10009038,
  everything is fine. It can be seen in the QEMU monitor:

  (qemu) xp 0x10009038
  0000000010009038: 0x00000010

  After the second strb instruction, the write to 0x10009039 clears the
  entire word:

  (qemu) xp 0x10009038
  0000000010009038: 0x00000000

  QEMU command-line, using the vexpress-a9 which has the PL011 at
  0x10009000:

  qemu-system-arm -S -M vexpress-a9 -m 32M -no-reboot -nographic
  -monitor telnet:127.0.0.1:1234,server,nowait -kernel pl011-sfr.bin
  -gdb tcp::2159 -serial mon:stdio

  Compiling the original C code with optimizations makes the driver
  work. It compiles down to assembly that only does a single write:

  ldr r3, pl011_addr
  mov r2, #0x10
  str r2, [r3, #0x38]

  Attached are an assembly file and a link script that show the
  problem and also include the working code.

  I haven't debugged inside of QEMU itself, but it seems to me that the
  problem is in pl011_write in pl011.c: the function looks at which
  offset is being written, and then writes the entire SFR that offset
  falls under, which means that changing a single byte will change the
  whole SFR.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809546/+subscriptions

[Qemu-devel] [Bug 1809546] [NEW] Writing a byte to a pl011 SFR overwrites the whole SFR

2018-12-22 Thread Daniels Umanovskis

Public bug reported:

The bug is present in QEMU 2.8.1 and, if my analysis is correct, also on
master.

I first noticed that a PL011 UART driver, which is fine on real
hardware, fails to enable the RX interrupt in the IMSC register when
running in QEMU. However, the problem only comes up if the code is
compiled without optimizations. I think I've narrowed it down to a
minimal example that will exhibit the problem if run as a bare-metal
application.

Given:

pl011_addr: .word 0x10009000

The following snippet will be problematic:

 ldr r3, pl011_addr
 ldrb r2, [r3, #0x38]// IMSC
 mov r2, #0
 orr r2, r2, #0x10   // R2 == 0x10
 strb r2, [r3, #0x38]// Whole word reads correctly after this
 ldrb r2, [r3, #0x39]
 mov r2, #0
 strb r2, [r3, #0x39]// Problem here! Overwrites offset 0x38 as well

After the first strb instruction, which writes to 0x10009038, everything
is fine. It can be seen in the QEMU monitor:

(qemu) xp 0x10009038
0000000010009038: 0x00000010

After the second strb instruction, the write to 0x10009039 clears the
entire word:

(qemu) xp 0x10009038
0000000010009038: 0x00000000

QEMU command-line, using the vexpress-a9 which has the PL011 at
0x10009000:

qemu-system-arm -S -M vexpress-a9 -m 32M -no-reboot -nographic -monitor
telnet:127.0.0.1:1234,server,nowait -kernel pl011-sfr.bin -gdb tcp::2159
-serial mon:stdio

Compiling the original C code with optimizations makes the driver work.
It compiles down to assembly that only does a single write:

ldr r3, pl011_addr
mov r2, #0x10
str r2, [r3, #0x38]

Attached are an assembly file and a link script that show the
problem and also include the working code.

I haven't debugged inside of QEMU itself, but it seems to me that the
problem is in pl011_write in pl011.c: the function looks at which
offset is being written, and then writes the entire SFR that offset
falls under, which means that changing a single byte will change the
whole SFR.
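
The suspected behaviour can be sketched in isolation. This is a minimal
model, assuming a single 32-bit register at offset 0x38; struct uart_model
and both write functions are hypothetical names for illustration and are
not taken from pl011.c. The first function ignores the access size, as
hypothesised above. The second merges only the bytes covered by the access,
which is one possible alternative rather than a statement of what the
hardware specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-register device model; 0x38 is the IMSC offset. */
struct uart_model {
    uint32_t imsc;
};

/*
 * Behaviour described in the report: the handler picks the register from
 * the offset and stores the written value as the whole register, so a
 * one-byte write replaces all four bytes.
 */
static void write_whole_register(struct uart_model *s, uint32_t offset,
                                 uint32_t value, unsigned size)
{
    (void)size; /* access size is ignored */
    if ((offset & ~3u) == 0x38) {
        s->imsc = value;
    }
}

/*
 * Byte-lane-aware alternative: only the bytes covered by the access are
 * replaced (little-endian lanes; the access must not cross the register).
 */
static void write_byte_lanes(struct uart_model *s, uint32_t offset,
                             uint32_t value, unsigned size)
{
    unsigned shift = (offset & 3u) * 8;
    uint32_t mask;

    assert(size >= 1 && (offset & 3u) + size <= 4);
    mask = (size == 4 ? 0xffffffffu : ((1u << (size * 8)) - 1)) << shift;
    if ((offset & ~3u) == 0x38) {
        s->imsc = (s->imsc & ~mask) | ((value << shift) & mask);
    }
}
```

Replaying the two strb instructions from the snippet (0x10 to offset 0x38,
then 0 to offset 0x39) leaves imsc at 0 in the first model and at 0x10 in
the second.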

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "startup.s"
   https://bugs.launchpad.net/bugs/1809546/+attachment/5224336/+files/startup.s

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809546

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809546/+subscriptions

Re: [Qemu-devel] [PATCH v12 01/31] block: Use bdrv_refresh_filename() to pull

2018-12-22 Thread Alberto Garcia

On Mon 17 Dec 2018 11:43:18 PM CET, Max Reitz  wrote:
> @@ -3327,6 +3333,7 @@ static int img_rebase(int argc, char **argv)
>  qdict_put_bool(options, BDRV_OPT_FORCE_SHARE, true);
>  }
>  
> +bdrv_refresh_filename(bs);
>  overlay_filename = bs->exact_filename[0] ? bs->exact_filename
>   : bs->filename;
>  out_real_path = g_malloc(PATH_MAX);
> -- 
> 2.19.2

I also doubt that these new hunks are necessary, but it doesn't hurt to
be consistent :)

Reviewed-by: Alberto Garcia 

Berto

Re: [Qemu-devel] [PATCH PULL 00/31] RDMA queue

2018-12-22 Thread Marcel Apfelbaum


Hi Peter,

On 12/22/18 3:59 PM, Peter Maydell wrote:
> On Sat, 22 Dec 2018 at 09:50, Marcel Apfelbaum wrote:
>> The following changes since commit 891ff9f4a371da2dbd5244590eb35e8d803e18d8:
>>
>>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221'
>>   into staging (2018-12-21 15:49:59 +0000)
>>
>> are available in the Git repository at:
>>
>>   https://github.com/marcel-apf/qemu tags/rdma-pull-request
>>
>> for you to fetch changes up to f1e2e38ee0136b7710a2caa347049818afd57a1b:
>>
>>   pvrdma: check return value from pvrdma_idx_ring_has_ routines
>>   (2018-12-22 11:09:57 +0200)
>>
>> RDMA queue
>>  * Add support for RDMA MAD
>>  * Various fixes for the pvrdma backend
>
> Applied, thanks.
>
> Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
> for any user-visible changes.

Done.

Thanks,
Marcel

> -- PMM

Re: [Qemu-devel] [PATCH PULL 00/31] RDMA queue

2018-12-22 Thread Peter Maydell

On Sat, 22 Dec 2018 at 09:50, Marcel Apfelbaum
 wrote:
>
> The following changes since commit 891ff9f4a371da2dbd5244590eb35e8d803e18d8:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' 
> into staging (2018-12-21 15:49:59 +)
>
> are available in the Git repository at:
>
>   https://github.com/marcel-apf/qemu tags/rdma-pull-request
>
> for you to fetch changes up to f1e2e38ee0136b7710a2caa347049818afd57a1b:
>
>   pvrdma: check return value from pvrdma_idx_ring_has_ routines (2018-12-22 
> 11:09:57 +0200)
>
> 
> RDMA queue
>  * Add support for RDMA MAD
>  * Various fixes for the pvrdma backend
>
> 
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
for any user-visible changes.

-- PMM

Re: [Qemu-devel] [PULL v4 00/35] Misc patches for 2018-12-21

2018-12-22 Thread Peter Maydell

On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini  wrote:
>
> On 21/12/18 22:09, Peter Maydell wrote:
> > I don't really understand what's going on here, or why
> > it only happens with this one system (my main x86-64
> > Linux Ubuntu 16.04.5 box) and not the various others I'm
> > running test builds on. But it does seem to be 100%
> > reliable with any of these pullreqs with the new test
> > driver in them :-(
>
> I'm afraid something in your setup is causing make's stdout to have
> O_NONBLOCK set.  Make doesn't use O_NONBLOCK at all, so it must be
> something above it.  I also checked Perl with strace and, at least here,
> it doesn't set O_NONBLOCK.
>
> So here are some ideas... First, can you try applying something like
> this to reproduce?

OK, I'll give these a go, but it'll have to be in January now.

thanks
-- PMM

[Qemu-devel] [PATCH v2] qemu: avoid memory leak while remove disk

2018-12-22 Thread w00426999

vhost_dev_cleanup() memsets the vhost_dev structure to zero, which
sets dev.vqs to NULL. The g_free(dev.vqs) call that follows therefore
frees nothing, and the vqs allocation is leaked. vqs cannot be freed
inside vhost_dev_cleanup() itself, because vhost_net also calls this
function and vhost_net's vqs is assigned from an array.
To solve this, save the vqs pointer first and free it after
vhost_dev_cleanup() has been called.

Signed-off-by: Jian Wang 
---
 hw/block/vhost-user-blk.c | 7 +--
 hw/scsi/vhost-scsi.c  | 3 ++-
 hw/scsi/vhost-user-scsi.c | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940..c3af28f 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -250,6 +250,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
 VhostUserState *user;
+struct vhost_virtqueue *vqs = NULL;
 int i, ret;
 
 if (!s->chardev.chr) {
@@ -288,6 +289,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
 s->dev.vq_index = 0;
 s->dev.backend_features = 0;
+vqs = s->dev.vqs;
 
 vhost_dev_set_config_notifier(&s->dev, &blk_ops);
 
@@ -314,7 +316,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 vhost_err:
 vhost_dev_cleanup(&s->dev);
 virtio_err:
-g_free(s->dev.vqs);
+g_free(vqs);
 virtio_cleanup(vdev);
 
 vhost_user_cleanup(user);
@@ -326,10 +328,11 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(dev);
+struct vhost_virtqueue *vqs = s->dev.vqs;
 
 vhost_user_blk_set_status(vdev, 0);
 vhost_dev_cleanup(&s->dev);
-g_free(s->dev.vqs);
+g_free(vqs);
 virtio_cleanup(vdev);
 
 if (s->vhost_user) {
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 7f21b4f..61e2e57 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -215,6 +215,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(dev);
+struct vhost_virtqueue *vqs = vsc->dev.vqs;
 
 migrate_del_blocker(vsc->migration_blocker);
 error_free(vsc->migration_blocker);
@@ -223,7 +224,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error **errp)
 vhost_scsi_set_status(vdev, 0);
 
 vhost_dev_cleanup(&vsc->dev);
-g_free(vsc->dev.vqs);
+g_free(vqs);
 
 virtio_scsi_common_unrealize(dev, errp);
 }
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 2e1ba4a..6728878 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -121,12 +121,13 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserSCSI *s = VHOST_USER_SCSI(dev);
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+struct vhost_virtqueue *vqs = vsc->dev.vqs;
 
 /* This will stop the vhost backend. */
 vhost_user_scsi_set_status(vdev, 0);
 
 vhost_dev_cleanup(&vsc->dev);
-g_free(vsc->dev.vqs);
+g_free(vqs);
 
 virtio_scsi_common_unrealize(dev, errp);
 
-- 
1.8.3.1
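
The ordering problem this patch describes can be reproduced outside QEMU. The following is only a minimal sketch under stated assumptions: struct demo_dev, demo_dev_cleanup() and the two teardown helpers are hypothetical stand-ins, not QEMU's vhost code. It shows why freeing through the structure after it has been zeroed frees nothing, and how saving the pointer first avoids the leak.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct vhost_dev; not QEMU's definition. */
struct demo_dev {
    int *vqs;   /* heap block owned by the device code */
    int nvqs;
};

/* Mimics the behaviour described in the patch: wipe the whole struct. */
static void demo_dev_cleanup(struct demo_dev *dev)
{
    memset(dev, 0, sizeof(*dev));
}

/*
 * Old ordering: clean up first, then free through the struct.
 * Returns the block that was lost so the caller can release it.
 */
static int *demo_teardown_old(struct demo_dev *dev)
{
    int *lost = dev->vqs;

    demo_dev_cleanup(dev);
    free(dev->vqs);     /* dev->vqs is NULL by now, so this frees nothing */
    return lost;
}

/* Patched ordering: remember the pointer, clean up, then free the copy. */
static void demo_teardown_new(struct demo_dev *dev)
{
    int *vqs = dev->vqs;

    demo_dev_cleanup(dev);
    free(vqs);
}
```

In the old ordering the only remaining reference to the block is the copy taken before the cleanup, which is exactly what the patch introduces as the local vqs variable.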

[Qemu-devel] [PATCH PULL 24/31] docs: Update pvrdma device documentation

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The interface with the device changed with the addition of support for
MAD packets.
Adjust the documentation accordingly.

While there, fix a minor mistake that could lead readers to think there
is a relation between using RXE on the host and compatibility with
bare-metal peers.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum 
---
 docs/pvrdma.txt | 126 
 1 file changed, 107 insertions(+), 19 deletions(-)

diff --git a/docs/pvrdma.txt b/docs/pvrdma.txt
index 5599318159..5175251b47 100644
--- a/docs/pvrdma.txt
+++ b/docs/pvrdma.txt
@@ -9,8 +9,9 @@ It works with its Linux Kernel driver AS IS, no need for any special guest
 modifications.
 
 While it complies with the VMware device, it can also communicate with bare
-metal RDMA-enabled machines and does not require an RDMA HCA in the host, it
-can work with Soft-RoCE (rxe).
+metal RDMA-enabled machines as peers.
+
+It does not require an RDMA HCA in the host, it can work with Soft-RoCE (rxe).
 
 It does not require the whole guest RAM to be pinned allowing memory
 over-commit and, even if not implemented yet, migration support will be
@@ -78,29 +79,116 @@ the required RDMA libraries.
 
 3. Usage
 
+
+
+3.1 VM Memory settings
+==
 Currently the device is working only with memory backed RAM
 and it must be mark as "shared":
-m 1G \
-object memory-backend-ram,id=mb1,size=1G,share \
-numa node,memdev=mb1 \
 
-The pvrdma device is composed of two functions:
- - Function 0 is a vmxnet Ethernet Device which is redundant in Guest
-   but is required to pass the ibdevice GID using its MAC.
-   Examples:
- For an rxe backend using eth0 interface it will use its mac:
-   -device vmxnet3,addr=.0,multifunction=on,mac=
- For an SRIOV VF, we take the Ethernet Interface exposed by it:
-   -device vmxnet3,multifunction=on,mac=
- - Function 1 is the actual device:
-   -device pvrdma,addr=.1,backend-dev=,backend-gid-idx=,backend-port=
-   where the ibdevice can be rxe or RDMA VF (e.g. mlx5_4)
- Note: Pay special attention that the GID at backend-gid-idx matches vmxnet's MAC.
- The rules of conversion are part of the RoCE spec, but since manual conversion
- is not required, spotting problems is not hard:
-Example: GID: fe80::::7efe:90ff:fecb:743a
- MAC: 7c:fe:90:cb:74:3a
-Note the difference between the first byte of the MAC and the GID.
+
+3.2 MAD Multiplexer
+===
+MAD Multiplexer is a service that exposes MAD-like interface for VMs in
+order to overcome the limitation where only single entity can register with
+MAD layer to send and receive RDMA-CM MAD packets.
+
+To build rdmacm-mux run
+# make rdmacm-mux
+
+The application accepts 3 command line arguments and exposes a UNIX socket
+to pass control and data to it.
+-d rdma-device-name  Name of RDMA device to register with
+-s unix-socket-path  Path to unix socket to listen (default /var/run/rdmacm-mux)
+-p rdma-device-port  Port number of RDMA device to register with (default 1)
+The final UNIX socket file name is a concatenation of the 3 arguments so
+for example for device mlx5_0 on port 2 this /var/run/rdmacm-mux-mlx5_0-2
+will be created.
+
+pvrdma requires this service.
+
+Please refer to contrib/rdmacm-mux for more details.
+
+
+3.3 Service exposed by libvirt daemon
+=
+The control over the RDMA device's GID table is done by updating the
+device's Ethernet function addresses.
+Usually the first GID entry is determined by the MAC address, the second by
+the first IPv6 address and the third by the IPv4 address. Other entries can
+be added by adding more IP addresses. The opposite is the same, i.e.
+whenever an address is removed, the corresponding GID entry is removed.
+The process is done by the network and RDMA stacks. Whenever an address is
+added the ib_core driver is notified and calls the device driver add_gid
+function which in turn update the device.
+To support this in pvrdma device the device hooks into the create_bind and
+destroy_bind HW commands triggered by pvrdma driver in guest.
+
+Whenever changed is made to the pvrdma port's GID table a special QMP
+messages is sent to be processed by libvirt to update the address of the
+backend Ethernet device.
+
+pvrdma requires that libvirt service will be up.
+
+
+3.4 PCI devices settings
+
+RoCE device exposes two functions - an Ethernet and RDMA.
+To support it, pvrdma device is composed of two PCI functions, an Ethernet
+device of type vmxnet3 on PCI slot 0 and a PVRDMA device on PCI slot 1. The
+Ethernet function can be used for other Ethernet purposes such as IP.
+
+
+3.5 Device parameters
+=
+- netdev: Specifies the Ethernet device function name on the host for
+  example enp175s0f0. For Soft-RoCE device (rxe) this would be the Ethernet
+  device used to create it.
+-

[Qemu-devel] [PATCH PULL 25/31] pvrdma: release device resources in case of an error

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

If during pvrdma device initialisation an error occurs,
pvrdma_realize() does not release memory resources, leading
to memory leakage.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Message-Id: <20181212175817.815-1-ppan...@redhat.com>
Reviewed-by: Yuval Shaia 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 23dc9926e3..64de16fb52 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -573,7 +573,7 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
 
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
-int rc;
+int rc = 0;
 PVRDMADev *dev = PVRDMA_DEV(pdev);
 Object *memdev_root;
 bool ram_shared = false;
@@ -649,6 +649,7 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 
 out:
 if (rc) {
+pvrdma_fini(pdev);
 error_append_hint(errp, "Device fail to load\n");
 }
 }
-- 
2.17.1

[Qemu-devel] [PATCH PULL 23/31] hw/rdma: Do not call rdma_backend_del_gid on an empty gid

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

When the device goes down, fini_ports loops over all entries in the
gid table regardless of whether an entry is valid. If an entry is
not valid, skip any further processing in the backend device.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_rm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index ca127c8c26..f5b1295890 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -555,6 +555,10 @@ int rdma_rm_del_gid(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev,
 {
 int rc;
 
+if (!dev_res->port.gid_tbl[gid_idx].gid.global.interface_id) {
+return 0;
+}
+
 rc = rdma_backend_del_gid(backend_dev, ifname,
   &dev_res->port.gid_tbl[gid_idx].gid);
 if (rc) {
-- 
2.17.1

[Qemu-devel] [PATCH PULL 31/31] pvrdma: check return value from pvrdma_idx_ring_has_ routines

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

The pvrdma_idx_ring_has_[data/space] routines also return the invalid
index PVRDMA_INVALID_IDX[=-1] if the ring has no data/space. Check the
return value from these routines to avoid possible infinite loops.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Yuval Shaia 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_dev_ring.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_dev_ring.c b/hw/rdma/vmw/pvrdma_dev_ring.c
index 01247fc041..e8e5b502f6 100644
--- a/hw/rdma/vmw/pvrdma_dev_ring.c
+++ b/hw/rdma/vmw/pvrdma_dev_ring.c
@@ -73,23 +73,16 @@ out:
 
 void *pvrdma_ring_next_elem_read(PvrdmaRing *ring)
 {
+int e;
 unsigned int idx = 0, offset;
 
-/*
-pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
-   ring->ring_state->cons_head);
-*/
-
-if (!pvrdma_idx_ring_has_data(ring->ring_state, ring->max_elems, &idx)) {
+e = pvrdma_idx_ring_has_data(ring->ring_state, ring->max_elems, &idx);
+if (e <= 0) {
 pr_dbg("No more data in ring\n");
 return NULL;
 }
 
 offset = idx * ring->elem_sz;
-/*
-pr_dbg("idx=%d\n", idx);
-pr_dbg("offset=%d\n", offset);
-*/
 return ring->pages[offset / TARGET_PAGE_SIZE] + (offset % TARGET_PAGE_SIZE);
 }
 
@@ -105,20 +98,20 @@ void pvrdma_ring_read_inc(PvrdmaRing *ring)
 
 void *pvrdma_ring_next_elem_write(PvrdmaRing *ring)
 {
-unsigned int idx, offset, tail;
+int idx;
+unsigned int offset, tail;
 
-/*
-pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
-   ring->ring_state->cons_head);
-*/
-
-if (!pvrdma_idx_ring_has_space(ring->ring_state, ring->max_elems, &tail)) {
+idx = pvrdma_idx_ring_has_space(ring->ring_state, ring->max_elems, &tail);
+if (idx <= 0) {
 pr_dbg("CQ is full\n");
 return NULL;
 }
 
 idx = pvrdma_idx(&ring->ring_state->prod_tail, ring->max_elems);
-/* TODO: tail == idx */
+if (idx < 0 || tail != idx) {
+pr_dbg("invalid idx\n");
+return NULL;
+}
 
 offset = idx * ring->elem_sz;
 return ring->pages[offset / TARGET_PAGE_SIZE] + (offset % TARGET_PAGE_SIZE);
-- 
2.17.1
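
The change from "if (!ret)" to "if (ret <= 0)" in this patch matters because a negative return value is non-zero and so passes the old test. The sketch below illustrates that under stated assumptions: demo_ring_has_data() and DEMO_INVALID_IDX are hypothetical names modelled on the commit message, not the actual pvrdma routines.

```c
#include <assert.h>

#define DEMO_INVALID_IDX (-1)

/*
 * Hypothetical helper in the style of the has_data/has_space routines:
 * returns 1 and stores the index if an element is available, 0 if the
 * ring is empty, DEMO_INVALID_IDX if the indices are out of range.
 */
static int demo_ring_has_data(unsigned head, unsigned tail,
                              unsigned max_elems, unsigned *out_idx)
{
    if (max_elems == 0 || head >= max_elems || tail >= max_elems) {
        return DEMO_INVALID_IDX;
    }
    if (head == tail) {
        return 0;
    }
    *out_idx = head;
    return 1;
}

/* Old-style test: "if (!ret) bail out". A negative ret slips through. */
static int demo_old_check_proceeds(int ret)
{
    return ret != 0;
}

/* New-style test: "if (ret <= 0) bail out". */
static int demo_new_check_proceeds(int ret)
{
    return ret > 0;
}
```

With an out-of-range head index the helper returns DEMO_INVALID_IDX; the old-style check lets the caller proceed with an index that was never written, while the new-style check stops it.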

[Qemu-devel] [PATCH PULL 28/31] pvrdma: check number of pages when creating rings

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

When creating CQ/QP rings, an object can have up to
PVRDMA_MAX_FAST_REG_PAGES 8 pages. Check 'npages' parameter
to avoid excessive memory allocation or a null dereference.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Yuval Shaia 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_cmd.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 3b94545761..f236ac4795 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -259,6 +259,11 @@ static int create_cq_ring(PCIDevice *pci_dev , PvrdmaRing **ring,
 int rc = -EINVAL;
 char ring_name[MAX_RING_NAME_SZ];
 
+if (!nchunks || nchunks > PVRDMA_MAX_FAST_REG_PAGES) {
+pr_dbg("invalid nchunks: %d\n", nchunks);
+return rc;
+}
+
 pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)pdir_dma);
 dir = rdma_pci_dma_map(pci_dev, pdir_dma, TARGET_PAGE_SIZE);
 if (!dir) {
@@ -372,6 +377,12 @@ static int create_qp_rings(PCIDevice *pci_dev, uint64_t pdir_dma,
 char ring_name[MAX_RING_NAME_SZ];
 uint32_t wqe_sz;
 
+if (!spages || spages > PVRDMA_MAX_FAST_REG_PAGES
+|| !rpages || rpages > PVRDMA_MAX_FAST_REG_PAGES) {
+pr_dbg("invalid pages: %d, %d\n", spages, rpages);
+return rc;
+}
+
 pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)pdir_dma);
 dir = rdma_pci_dma_map(pci_dev, pdir_dma, TARGET_PAGE_SIZE);
 if (!dir) {
-- 
2.17.1

[Qemu-devel] [PATCH PULL 27/31] pvrdma: add uar_read routine

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

Define skeleton 'uar_read' routine. Avoid NULL dereference.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 64de16fb52..838ad8a949 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -448,6 +448,11 @@ static const MemoryRegionOps regs_ops = {
 },
 };
 
+static uint64_t uar_read(void *opaque, hwaddr addr, unsigned size)
+{
+return 0x;
+}
+
 static void uar_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
 PVRDMADev *dev = opaque;
@@ -489,6 +494,7 @@ static void uar_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 }
 
 static const MemoryRegionOps uar_ops = {
+.read = uar_read,
 .write = uar_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .impl = {
-- 
2.17.1

[Qemu-devel] [PATCH PULL 30/31] rdma: remove unused VENDOR_ERR_NO_SGE macro

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

With commit 4481985c (rdma: check num_sge does not exceed MAX_SGE)
macro VENDOR_ERR_NO_SGE is no longer in use - delete it.

Signed-off-by: Prasad J Pandit 
Reviewed-by: Yuval Shaia 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index bd4710d16f..c28bfbd44d 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -37,12 +37,11 @@
 #define VENDOR_ERR_TOO_MANY_SGES0x202
 #define VENDOR_ERR_NOMEM0x203
 #define VENDOR_ERR_QP0  0x204
-#define VENDOR_ERR_NO_SGE   0x205
+#define VENDOR_ERR_INV_NUM_SGE  0x205
 #define VENDOR_ERR_MAD_SEND 0x206
 #define VENDOR_ERR_INVLKEY  0x207
 #define VENDOR_ERR_MR_SMALL 0x208
 #define VENDOR_ERR_INV_MAD_BUFF 0x209
-#define VENDOR_ERR_INV_NUM_SGE  0x210
 
 #define THR_NAME_LEN 16
 #define THR_POLL_TO  5000
-- 
2.17.1

[Qemu-devel] [PATCH PULL 26/31] rdma: check num_sge does not exceed MAX_SGE

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

The rdma back-end's scatter/gather array ibv_sge[MAX_SGE=4] has
4 elements. A guest could send a 'PvrdmaSqWqe' ring element
with 'num_sge' set to > MAX_SGE, which may lead to an OOB access.
Add a check to avoid it.

Reported-by: Saar Amar 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Yuval Shaia 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index ae1e4dcb29..bd4710d16f 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -476,9 +476,9 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 }
 
 pr_dbg("num_sge=%d\n", num_sge);
-if (!num_sge) {
-pr_dbg("num_sge=0\n");
-complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx);
+if (!num_sge || num_sge > MAX_SGE) {
+pr_dbg("invalid num_sge=%d\n", num_sge);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_INV_NUM_SGE, ctx);
 return;
 }
 
@@ -603,9 +603,9 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 }
 
 pr_dbg("num_sge=%d\n", num_sge);
-if (!num_sge) {
-pr_dbg("num_sge=0\n");
-complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx);
+if (!num_sge || num_sge > MAX_SGE) {
+pr_dbg("invalid num_sge=%d\n", num_sge);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_INV_NUM_SGE, ctx);
 return;
 }
 
-- 
2.17.1
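
The check added by this patch follows a common pattern: validate a guest-supplied count against the size of a fixed array before using it as a loop bound. A minimal sketch, assuming hypothetical names (DEMO_MAX_SGE, struct demo_sge, demo_copy_sge) that are not part of the rdma back-end:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_SGE 4

/* Hypothetical scatter/gather element; not the ibv_sge layout. */
struct demo_sge {
    uint64_t addr;
    uint32_t length;
};

/*
 * Copy num_sge guest-supplied elements into a fixed-size array of
 * DEMO_MAX_SGE entries. Returns 0 on success, -1 if num_sge is zero
 * or larger than the destination can hold.
 */
static int demo_copy_sge(struct demo_sge *dst, const struct demo_sge *src,
                         uint32_t num_sge)
{
    uint32_t i;

    if (!num_sge || num_sge > DEMO_MAX_SGE) {
        return -1;      /* reject before touching dst */
    }
    for (i = 0; i < num_sge; i++) {
        dst[i] = src[i];
    }
    return 0;
}
```

Rejecting the request before the loop runs means the out-of-range count is never used as an index, which is the property the patch relies on.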

[Qemu-devel] [PATCH PULL 22/31] hw/rdma: Do not use bitmap_zero_extend to free bitmap

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

bitmap_zero_extend is designed for extending a bitmap, not for
shrinking it.
Use g_free instead.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_rm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index b7d4ebe972..ca127c8c26 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -43,7 +43,7 @@ static inline void res_tbl_free(RdmaRmResTbl *tbl)
 {
 qemu_mutex_destroy(&tbl->lock);
 g_free(tbl->tbl);
-bitmap_zero_extend(tbl->bitmap, tbl->tbl_sz, 0);
+g_free(tbl->bitmap);
 }
 
 static inline void *res_tbl_get(RdmaRmResTbl *tbl, uint32_t handle)
-- 
2.17.1

[Qemu-devel] [PATCH PULL 21/31] hw/pvrdma: Clean device's resource when system is shutdown

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

In order to clean up external resources such as GIDs, QPs, etc.,
register to receive a notification when the VM is shut down.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma.h  |  2 ++
 hw/rdma/vmw/pvrdma_main.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma.h b/hw/rdma/vmw/pvrdma.h
index 10a3c4fb7c..ffae36986e 100644
--- a/hw/rdma/vmw/pvrdma.h
+++ b/hw/rdma/vmw/pvrdma.h
@@ -17,6 +17,7 @@
 #define PVRDMA_PVRDMA_H
 
 #include "qemu/units.h"
+#include "qemu/notify.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/msix.h"
 #include "chardev/char-fe.h"
@@ -87,6 +88,7 @@ typedef struct PVRDMADev {
 RdmaDeviceResources rdma_dev_res;
 CharBackend mad_chr;
 VMXNET3State *func0;
+Notifier shutdown_notifier;
 } PVRDMADev;
 #define PVRDMA_DEV(dev) OBJECT_CHECK(PVRDMADev, (dev), PVRDMA_HW_NAME)
 
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 150404dfa6..23dc9926e3 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -24,6 +24,7 @@
 #include "hw/qdev-properties.h"
 #include "cpu.h"
 #include "trace.h"
+#include "sysemu/sysemu.h"
 
 #include "../rdma_rm.h"
 #include "../rdma_backend.h"
@@ -334,6 +335,9 @@ static void pvrdma_fini(PCIDevice *pdev)
 if (msix_enabled(pdev)) {
 uninit_msix(pdev, RDMA_MAX_INTRS);
 }
+
+pr_dbg("Device %s %x.%x is down\n", pdev->name, PCI_SLOT(pdev->devfn),
+   PCI_FUNC(pdev->devfn));
 }
 
 static void pvrdma_stop(PVRDMADev *dev)
@@ -559,6 +563,14 @@ static int pvrdma_check_ram_shared(Object *obj, void *opaque)
 return 0;
 }
 
+static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
+{
+PVRDMADev *dev = container_of(n, PVRDMADev, shutdown_notifier);
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+pvrdma_fini(pci_dev);
+}
+
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
 int rc;
@@ -632,6 +644,9 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 goto out;
 }
 
+dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
+qemu_register_shutdown_notifier(&dev->shutdown_notifier);
+
 out:
 if (rc) {
 error_append_hint(errp, "Device fail to load\n");
-- 
2.17.1

[Qemu-devel] [PATCH PULL 20/31] vl: Introduce shutdown_notifiers

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The notifier will be used to signal that the system is shutting
down. This allows devices and other components to run any
cleanup code needed before the VM is shut down.

Signed-off-by: Yuval Shaia 
Reviewed-by: Cornelia Huck 
Signed-off-by: Marcel Apfelbaum 
---
 include/sysemu/sysemu.h |  1 +
 vl.c| 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c8efdeb376..e0d15da937 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -62,6 +62,7 @@ void qemu_register_wakeup_support(void);
 void qemu_system_shutdown_request(ShutdownCause reason);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
+void qemu_register_shutdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
 void qemu_system_vmstop_request(RunState reason);
 void qemu_system_vmstop_request_prepare(void);
diff --git a/vl.c b/vl.c
index 46ebf813b3..8353d3c718 100644
--- a/vl.c
+++ b/vl.c
@@ -1577,6 +1577,8 @@ static NotifierList suspend_notifiers =
 NOTIFIER_LIST_INITIALIZER(suspend_notifiers);
 static NotifierList wakeup_notifiers =
 NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
+static NotifierList shutdown_notifiers =
+NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
 ShutdownCause qemu_shutdown_requested_get(void)
@@ -1828,6 +1830,12 @@ static void qemu_system_powerdown(void)
 notifier_list_notify(&powerdown_notifiers, NULL);
 }
 
+static void qemu_system_shutdown(ShutdownCause cause)
+{
+qapi_event_send_shutdown(shutdown_caused_by_guest(cause), cause);
+notifier_list_notify(&shutdown_notifiers, &cause);
+}
+
 void qemu_system_powerdown_request(void)
 {
 trace_qemu_system_powerdown_request();
@@ -1840,6 +1848,11 @@ void qemu_register_powerdown_notifier(Notifier *notifier)
 notifier_list_add(&powerdown_notifiers, notifier);
 }
 
+void qemu_register_shutdown_notifier(Notifier *notifier)
+{
+notifier_list_add(&shutdown_notifiers, notifier);
+}
+
 void qemu_system_debug_request(void)
 {
 debug_requested = 1;
@@ -1867,7 +1880,7 @@ static bool main_loop_should_exit(void)
 request = qemu_shutdown_requested();
 if (request) {
 qemu_kill_report();
-qapi_event_send_shutdown(shutdown_caused_by_guest(request), request);
+qemu_system_shutdown(request);
 if (no_shutdown) {
 vm_stop(RUN_STATE_SHUTDOWN);
 } else {
-- 
2.17.1
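
The register-then-notify flow used by this patch and by the pvrdma patch that builds on it can be sketched in plain C. This is only an illustration under stated assumptions: DemoNotifier, DemoNotifierList, DemoDevice and demo_container_of are hypothetical, simplified stand-ins and not QEMU's Notifier implementation. It shows a device embedding a notifier, registering it on a list, and recovering itself from the notifier pointer when the list is notified.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *n, void *data);
    DemoNotifier *next;
};

typedef struct DemoNotifierList {
    DemoNotifier *head;
} DemoNotifierList;

/* Register a notifier at the head of the list. */
static void demo_list_add(DemoNotifierList *list, DemoNotifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Call every registered notifier with the same event data. */
static void demo_list_notify(DemoNotifierList *list, void *data)
{
    DemoNotifier *n;

    for (n = list->head; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Hypothetical device that holds resources until shutdown. */
typedef struct DemoDevice {
    int resources_held;
    DemoNotifier shutdown_notifier;
} DemoDevice;

/* Shutdown callback: find the device and release what it holds. */
static void demo_device_shutdown(DemoNotifier *n, void *data)
{
    DemoDevice *dev = demo_container_of(n, DemoDevice, shutdown_notifier);

    (void)data;     /* the shutdown cause is not needed in this sketch */
    dev->resources_held = 0;
}
```

Embedding the notifier in the device structure lets the callback reach the device without a separate opaque pointer, which is the same shape as the container_of() call in pvrdma_shutdown_notifier().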

[Qemu-devel] [PATCH PULL 29/31] pvrdma: release ring object in case of an error

2018-12-22 Thread Marcel Apfelbaum

From: Prasad J Pandit 

create_cq and create_qp routines allocate ring object, but it's
not released in case of an error, leading to memory leakage.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Yuval Shaia 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_cmd.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index f236ac4795..89920887bf 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -313,6 +313,14 @@ out:
 return rc;
 }
 
+static void destroy_cq_ring(PvrdmaRing *ring)
+{
+pvrdma_ring_free(ring);
+/* ring_state was in slot 1, not 0 so need to jump back */
+rdma_pci_dma_unmap(ring->dev, --ring->ring_state, TARGET_PAGE_SIZE);
+g_free(ring);
+}
+
 static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
  union pvrdma_cmd_resp *rsp)
 {
@@ -335,6 +343,10 @@ static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
 
 rc = rdma_rm_alloc_cq(&dev->rdma_dev_res, &dev->backend_dev, cmd->cqe,
   &resp->cq_handle, ring);
+if (rc) {
+destroy_cq_ring(ring);
+}
+
 resp->cqe = cmd->cqe;
 
 return rc;
@@ -356,10 +368,7 @@ static int destroy_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
 }
 
 ring = (PvrdmaRing *)cq->opaque;
-pvrdma_ring_free(ring);
-/* ring_state was in slot 1, not 0 so need to jump back */
-rdma_pci_dma_unmap(PCI_DEVICE(dev), --ring->ring_state, TARGET_PAGE_SIZE);
-g_free(ring);
+destroy_cq_ring(ring);
 
 rdma_rm_dealloc_cq(&dev->rdma_dev_res, cmd->cq_handle);
 
@@ -457,6 +466,17 @@ out:
 return rc;
 }
 
+static void destroy_qp_rings(PvrdmaRing *ring)
+{
+pr_dbg("sring=%p\n", &ring[0]);
+pvrdma_ring_free(&ring[0]);
+pr_dbg("rring=%p\n", &ring[1]);
+pvrdma_ring_free(&ring[1]);
+
+rdma_pci_dma_unmap(ring->dev, ring->ring_state, TARGET_PAGE_SIZE);
+g_free(ring);
+}
+
 static int create_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
  union pvrdma_cmd_resp *rsp)
 {
@@ -486,6 +506,7 @@ static int create_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
   cmd->max_recv_sge, cmd->recv_cq_handle, rings,
   &resp->qpn);
 if (rc) {
+destroy_qp_rings(rings);
 return rc;
 }
 
@@ -558,13 +579,7 @@ static int destroy_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
 rdma_rm_dealloc_qp(&dev->rdma_dev_res, cmd->qp_handle);
 
 ring = (PvrdmaRing *)qp->opaque;
-pr_dbg("sring=%p\n", &ring[0]);
-pvrdma_ring_free(&ring[0]);
-pr_dbg("rring=%p\n", &ring[1]);
-pvrdma_ring_free(&ring[1]);
-
-rdma_pci_dma_unmap(PCI_DEVICE(dev), ring->ring_state, TARGET_PAGE_SIZE);
-g_free(ring);
+destroy_qp_rings(ring);
 
 return 0;
 }
-- 
2.17.1

[Qemu-devel] [PATCH PULL 19/31] hw/rdma: Remove unneeded code that handles more that one port

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The device supports only one port, so remove the dead code that handles
more than one port.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_rm.c  | 34 --
 hw/rdma/rdma_rm.h  |  2 +-
 hw/rdma/rdma_rm_defs.h |  4 ++--
 3 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index 250254561c..b7d4ebe972 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -545,7 +545,7 @@ int rdma_rm_add_gid(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev,
 return -EINVAL;
 }
 
-memcpy(&dev_res->ports[0].gid_tbl[gid_idx].gid, gid, sizeof(*gid));
+memcpy(&dev_res->port.gid_tbl[gid_idx].gid, gid, sizeof(*gid));
 
 return 0;
 }
@@ -556,15 +556,15 @@ int rdma_rm_del_gid(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev,
 int rc;
 
 rc = rdma_backend_del_gid(backend_dev, ifname,
-  &dev_res->ports[0].gid_tbl[gid_idx].gid);
+  &dev_res->port.gid_tbl[gid_idx].gid);
 if (rc) {
 pr_dbg("Fail to delete gid\n");
 return -EINVAL;
 }
 
-memset(dev_res->ports[0].gid_tbl[gid_idx].gid.raw, 0,
-   sizeof(dev_res->ports[0].gid_tbl[gid_idx].gid));
-dev_res->ports[0].gid_tbl[gid_idx].backend_gid_index = -1;
+memset(dev_res->port.gid_tbl[gid_idx].gid.raw, 0,
+   sizeof(dev_res->port.gid_tbl[gid_idx].gid));
+dev_res->port.gid_tbl[gid_idx].backend_gid_index = -1;
 
 return 0;
 }
@@ -577,16 +577,16 @@ int rdma_rm_get_backend_gid_index(RdmaDeviceResources *dev_res,
 return -EINVAL;
 }
 
-if (unlikely(dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index == -1)) {
-dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index =
+if (unlikely(dev_res->port.gid_tbl[sgid_idx].backend_gid_index == -1)) {
+dev_res->port.gid_tbl[sgid_idx].backend_gid_index =
 rdma_backend_get_gid_index(backend_dev,
-   &dev_res->ports[0].gid_tbl[sgid_idx].gid);
+   &dev_res->port.gid_tbl[sgid_idx].gid);
 }
 
 pr_dbg("backend_gid_index=%d\n",
-   dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index);
+   dev_res->port.gid_tbl[sgid_idx].backend_gid_index);
 
-return dev_res->ports[0].gid_tbl[sgid_idx].backend_gid_index;
+return dev_res->port.gid_tbl[sgid_idx].backend_gid_index;
 }
 
 static void destroy_qp_hash_key(gpointer data)
@@ -596,15 +596,13 @@ static void destroy_qp_hash_key(gpointer data)
 
 static void init_ports(RdmaDeviceResources *dev_res)
 {
-int i, j;
+int i;
 
-memset(dev_res->ports, 0, sizeof(dev_res->ports));
+memset(&dev_res->port, 0, sizeof(dev_res->port));
 
-for (i = 0; i < MAX_PORTS; i++) {
-dev_res->ports[i].state = IBV_PORT_DOWN;
-for (j = 0; j < MAX_PORT_GIDS; j++) {
-dev_res->ports[i].gid_tbl[j].backend_gid_index = -1;
-}
+dev_res->port.state = IBV_PORT_DOWN;
+for (i = 0; i < MAX_PORT_GIDS; i++) {
+dev_res->port.gid_tbl[i].backend_gid_index = -1;
 }
 }
 
@@ -613,7 +611,7 @@ static void fini_ports(RdmaDeviceResources *dev_res,
 {
 int i;
 
-dev_res->ports[0].state = IBV_PORT_DOWN;
+dev_res->port.state = IBV_PORT_DOWN;
 for (i = 0; i < MAX_PORT_GIDS; i++) {
 rdma_rm_del_gid(dev_res, backend_dev, ifname, i);
 }
diff --git a/hw/rdma/rdma_rm.h b/hw/rdma/rdma_rm.h
index a7169b4e89..3c602c04c0 100644
--- a/hw/rdma/rdma_rm.h
+++ b/hw/rdma/rdma_rm.h
@@ -79,7 +79,7 @@ int rdma_rm_get_backend_gid_index(RdmaDeviceResources *dev_res,
 static inline union ibv_gid *rdma_rm_get_gid(RdmaDeviceResources *dev_res,
  int sgid_idx)
 {
-return &dev_res->ports[0].gid_tbl[sgid_idx].gid;
+return &dev_res->port.gid_tbl[sgid_idx].gid;
 }
 
 #endif
diff --git a/hw/rdma/rdma_rm_defs.h b/hw/rdma/rdma_rm_defs.h
index 7b3435f991..0ba61d1838 100644
--- a/hw/rdma/rdma_rm_defs.h
+++ b/hw/rdma/rdma_rm_defs.h
@@ -18,7 +18,7 @@
 
 #include "rdma_backend_defs.h"
 
-#define MAX_PORTS 1
+#define MAX_PORTS 1 /* Do not change - we support only one port */
 #define MAX_PORT_GIDS 255
 #define MAX_GIDS  MAX_PORT_GIDS
 #define MAX_PORT_PKEYS1
@@ -97,7 +97,7 @@ typedef struct RdmaRmPort {
 } RdmaRmPort;
 
 typedef struct RdmaDeviceResources {
-RdmaRmPort ports[MAX_PORTS];
+RdmaRmPort port;
 RdmaRmResTbl pd_tbl;
 RdmaRmResTbl mr_tbl;
 RdmaRmResTbl uc_tbl;
-- 
2.17.1

[Qemu-devel] [PATCH PULL 18/31] hw/pvrdma: Fill error code in command's response

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The driver checks the error code, so let's set it.
In addition, to simplify the code, set the response's fields ack,
response and err outside the scope of the command handlers.
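
A minimal sketch of the pattern described above, not the code from this
patch: each handler only returns 0 or a negative errno value, and a single
dispatcher fills ack, response and err. The types, the table, the ack value
and the names exec_cmd and SKETCH_MAX_PORTS are hypothetical stand-ins, and
storing the raw return code in err is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the pvrdma command structures. */
struct req_hdr  { uint64_t response; uint32_t cmd; };
struct resp_hdr { uint64_t response; uint32_t ack; int32_t err; };
struct cmd_req  { struct req_hdr hdr; uint32_t port_num; };
struct cmd_resp { struct resp_hdr hdr; uint32_t state; };

#define SKETCH_MAX_PORTS 1 /* assumption for this sketch */

/* Handler: validates input, fills the payload, returns 0 or -errno.
 * It never touches the response header. */
static int query_port(const struct cmd_req *req, struct cmd_resp *rsp)
{
    if (req->port_num > SKETCH_MAX_PORTS) {
        return -EINVAL;
    }
    rsp->state = 1; /* illustrative payload value */
    return 0;
}

struct cmd_entry {
    uint32_t ack; /* ack code reported for this command */
    int (*exec)(const struct cmd_req *req, struct cmd_resp *rsp);
};

static const struct cmd_entry cmd_table[] = {
    [0] = { .ack = 0x80000000u, .exec = query_port }, /* made-up ack value */
};

/* Dispatcher: the only place where response, ack and err are written. */
static int exec_cmd(const struct cmd_req *req, struct cmd_resp *rsp)
{
    size_t n = sizeof(cmd_table) / sizeof(cmd_table[0]);
    int rc = -EINVAL; /* unknown commands are rejected */

    assert(req != NULL && rsp != NULL);
    memset(rsp, 0, sizeof(*rsp));

    if (req->hdr.cmd < n && cmd_table[req->hdr.cmd].exec) {
        rc = cmd_table[req->hdr.cmd].exec(req, rsp);
        rsp->hdr.ack = cmd_table[req->hdr.cmd].ack;
    }

    rsp->hdr.response = req->hdr.response;
    rsp->hdr.err = rc;
    return rc;
}
```

With this layout a handler that bails out early with "return -EINVAL" still
produces a fully populated response header, which is the point of moving
the header handling out of the handlers.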

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_cmd.c | 197 ++-
 1 file changed, 90 insertions(+), 107 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 0d3c818c20..3b94545761 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -128,6 +128,9 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
 struct pvrdma_port_attr attrs = {0};
 
 pr_dbg("port=%d\n", cmd->port_num);
+if (cmd->port_num > MAX_PORTS) {
+return -EINVAL;
+}
 
 if (rdma_backend_query_port(&dev->backend_dev,
 (struct ibv_port_attr *)&attrs)) {
@@ -135,9 +138,6 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
 }
 
 memset(resp, 0, sizeof(*resp));
-resp->hdr.response = cmd->hdr.response;
-resp->hdr.ack = PVRDMA_CMD_QUERY_PORT_RESP;
-resp->hdr.err = 0;
 
 resp->attrs.state = dev->func0->device_active ? attrs.state :
 PVRDMA_PORT_DOWN;
@@ -160,12 +160,16 @@ static int query_pkey(PVRDMADev *dev, union 
pvrdma_cmd_req *req,
 struct pvrdma_cmd_query_pkey_resp *resp = >query_pkey_resp;
 
 pr_dbg("port=%d\n", cmd->port_num);
+if (cmd->port_num > MAX_PORTS) {
+return -EINVAL;
+}
+
 pr_dbg("index=%d\n", cmd->index);
+if (cmd->index > MAX_PKEYS) {
+return -EINVAL;
+}
 
 memset(resp, 0, sizeof(*resp));
-resp->hdr.response = cmd->hdr.response;
-resp->hdr.ack = PVRDMA_CMD_QUERY_PKEY_RESP;
-resp->hdr.err = 0;
 
 resp->pkey = PVRDMA_PKEY;
 pr_dbg("pkey=0x%x\n", resp->pkey);
@@ -178,17 +182,15 @@ static int create_pd(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
 {
 struct pvrdma_cmd_create_pd *cmd = >create_pd;
 struct pvrdma_cmd_create_pd_resp *resp = >create_pd_resp;
+int rc;
 
 pr_dbg("context=0x%x\n", cmd->ctx_handle ? cmd->ctx_handle : 0);
 
 memset(resp, 0, sizeof(*resp));
-resp->hdr.response = cmd->hdr.response;
-resp->hdr.ack = PVRDMA_CMD_CREATE_PD_RESP;
-resp->hdr.err = rdma_rm_alloc_pd(>rdma_dev_res, >backend_dev,
- >pd_handle, cmd->ctx_handle);
+rc = rdma_rm_alloc_pd(>rdma_dev_res, >backend_dev,
+  >pd_handle, cmd->ctx_handle);
 
-pr_dbg("ret=%d\n", resp->hdr.err);
-return resp->hdr.err;
+return rc;
 }
 
 static int destroy_pd(PVRDMADev *dev, union pvrdma_cmd_req *req,
@@ -210,10 +212,9 @@ static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
 struct pvrdma_cmd_create_mr_resp *resp = >create_mr_resp;
 PCIDevice *pci_dev = PCI_DEVICE(dev);
 void *host_virt = NULL;
+int rc = 0;
 
 memset(resp, 0, sizeof(*resp));
-resp->hdr.response = cmd->hdr.response;
-resp->hdr.ack = PVRDMA_CMD_CREATE_MR_RESP;
 
 pr_dbg("pd_handle=%d\n", cmd->pd_handle);
 pr_dbg("access_flags=0x%x\n", cmd->access_flags);
@@ -224,22 +225,18 @@ static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
cmd->length);
 if (!host_virt) {
 pr_dbg("Failed to map to pdir\n");
-resp->hdr.err = -EINVAL;
-goto out;
+return -EINVAL;
 }
 }
 
-resp->hdr.err = rdma_rm_alloc_mr(>rdma_dev_res, cmd->pd_handle,
- cmd->start, cmd->length, host_virt,
- cmd->access_flags, >mr_handle,
- >lkey, >rkey);
-if (resp->hdr.err && host_virt) {
+rc = rdma_rm_alloc_mr(>rdma_dev_res, cmd->pd_handle, cmd->start,
+  cmd->length, host_virt, cmd->access_flags,
+  >mr_handle, >lkey, >rkey);
+if (rc && host_virt) {
 munmap(host_virt, cmd->length);
 }
 
-out:
-pr_dbg("ret=%d\n", resp->hdr.err);
-return resp->hdr.err;
+return rc;
 }
 
 static int destroy_mr(PVRDMADev *dev, union pvrdma_cmd_req *req,
@@ -317,28 +314,25 @@ static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
 struct pvrdma_cmd_create_cq *cmd = >create_cq;
 struct pvrdma_cmd_create_cq_resp *resp = >create_cq_resp;
 PvrdmaRing *ring = NULL;
+int rc;
 
 memset(resp, 0, sizeof(*resp));
-resp->hdr.response = cmd->hdr.response;
-resp->hdr.ack = PVRDMA_CMD_CREATE_CQ_RESP;
 
 resp->cqe = cmd->cqe;
 
-resp->hdr.err = create_cq_ring(PCI_DEVICE(dev), , cmd->pdir_dma,
-   cmd->nchunks, cmd->cqe);
-if (resp->hdr.err) {
-goto out;
+rc = create_cq_ring(PCI_DEVICE(dev), , cmd->pdir_dma, cmd->nchunks,
+cmd->cqe);
+if (rc) {
+

[Qemu-devel] [PATCH PULL 16/31] hw/pvrdma: Make device state depend on Ethernet function state

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The user should be able to control the device by changing the Ethernet
function state, so if the user runs 'ifconfig ens3 down' the PVRDMA
function should go down as well.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_cmd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 2979582fac..0d3c818c20 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -139,7 +139,8 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
 resp->hdr.ack = PVRDMA_CMD_QUERY_PORT_RESP;
 resp->hdr.err = 0;
 
-resp->attrs.state = attrs.state;
+resp->attrs.state = dev->func0->device_active ? attrs.state :
+PVRDMA_PORT_DOWN;
 resp->attrs.max_mtu = attrs.max_mtu;
 resp->attrs.active_mtu = attrs.active_mtu;
 resp->attrs.phys_state = attrs.phys_state;
-- 
2.17.1

[Qemu-devel] [PATCH PULL 12/31] hw/pvrdma: Add support to allow guest to configure GID table

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The control over the RDMA device's GID table is done by updating the
device's Ethernet function addresses.
Usually the first GID entry is determined by the MAC address, the second
by the first IPv6 address and the third by the IPv4 address. Other
entries can be added by adding more IP addresses. The reverse also
holds: whenever an address is removed, the corresponding GID entry is
removed.

The process is done by the network and RDMA stacks. Whenever an address
is added, the ib_core driver is notified and calls the device driver's
add_gid function, which in turn updates the device.

To support this in the pvrdma device we need to hook into the create_bind
and destroy_bind HW commands triggered by the pvrdma driver in the guest.
Whenever a change is made to the pvrdma port's GID table, a special QMP
message is sent to be processed by libvirt to update the address of the
backend Ethernet device.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c  | 344 +---
 hw/rdma/rdma_backend.h  |  22 +--
 hw/rdma/rdma_backend_defs.h |  11 +-
 hw/rdma/rdma_rm.c   | 104 ++-
 hw/rdma/rdma_rm.h   |  17 +-
 hw/rdma/rdma_rm_defs.h  |   9 +-
 hw/rdma/rdma_utils.h|  16 ++
 hw/rdma/vmw/pvrdma.h|   2 +-
 hw/rdma/vmw/pvrdma_cmd.c|  55 +++---
 hw/rdma/vmw/pvrdma_main.c   |  25 +--
 hw/rdma/vmw/pvrdma_qp_ops.c |  20 +++
 11 files changed, 462 insertions(+), 163 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index c6dedda555..1d496bbd95 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -15,15 +15,18 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qlist.h"
 #include "qapi/qmp/qnum.h"
+#include "qapi/qapi-events-rdma.h"
 
 #include 
 #include 
 #include 
 #include 
 
+#include "contrib/rdmacm-mux/rdmacm-mux.h"
 #include "trace.h"
 #include "rdma_utils.h"
 #include "rdma_rm.h"
@@ -160,6 +163,77 @@ static void *comp_handler_thread(void *arg)
 return NULL;
 }
 
+static inline void disable_rdmacm_mux_async(RdmaBackendDev *backend_dev)
+{
+atomic_set(&backend_dev->rdmacm_mux.can_receive, 0);
+}
+
+static inline void enable_rdmacm_mux_async(RdmaBackendDev *backend_dev)
+{
+atomic_set(&backend_dev->rdmacm_mux.can_receive, sizeof(RdmaCmMuxMsg));
+}
+
+static inline int rdmacm_mux_can_process_async(RdmaBackendDev *backend_dev)
+{
+return atomic_read(&backend_dev->rdmacm_mux.can_receive);
+}
+
+static int check_mux_op_status(CharBackend *mad_chr_be)
+{
+RdmaCmMuxMsg msg = {0};
+int ret;
+
+pr_dbg("Reading response\n");
+ret = qemu_chr_fe_read_all(mad_chr_be, (uint8_t *)&msg, sizeof(msg));
+if (ret != sizeof(msg)) {
+pr_dbg("Invalid message size %d, expecting %ld\n", ret, sizeof(msg));
+return -EIO;
+}
+
+pr_dbg("msg_type=%d\n", msg.hdr.msg_type);
+pr_dbg("op_code=%d\n", msg.hdr.op_code);
+pr_dbg("err_code=%d\n", msg.hdr.err_code);
+
+if (msg.hdr.msg_type != RDMACM_MUX_MSG_TYPE_RESP) {
+pr_dbg("Invalid message type %d\n", msg.hdr.msg_type);
+return -EIO;
+}
+
+if (msg.hdr.err_code != RDMACM_MUX_ERR_CODE_OK) {
+pr_dbg("Operation failed in mux, error code %d\n", msg.hdr.err_code);
+return -EIO;
+}
+
+return 0;
+}
+
+static int exec_rdmacm_mux_req(RdmaBackendDev *backend_dev, RdmaCmMuxMsg *msg)
+{
+int rc = 0;
+
+pr_dbg("Executing request %d\n", msg->hdr.op_code);
+
+msg->hdr.msg_type = RDMACM_MUX_MSG_TYPE_REQ;
+disable_rdmacm_mux_async(backend_dev);
+rc = qemu_chr_fe_write(backend_dev->rdmacm_mux.chr_be,
+   (const uint8_t *)msg, sizeof(*msg));
+if (rc != sizeof(*msg)) {
+enable_rdmacm_mux_async(backend_dev);
+pr_dbg("Fail to send request to rdmacm_mux (rc=%d)\n", rc);
+return -EIO;
+}
+
+rc = check_mux_op_status(backend_dev->rdmacm_mux.chr_be);
+if (rc) {
+pr_dbg("Fail to execute rdmacm_mux request %d (rc=%d)\n",
+   msg->hdr.op_code, rc);
+}
+
+enable_rdmacm_mux_async(backend_dev);
+
+return 0;
+}
+
 static void stop_backend_thread(RdmaBackendThread *thread)
 {
 thread->run = false;
@@ -300,11 +374,11 @@ static int build_host_sge_array(RdmaDeviceResources 
*rdma_dev_res,
 return 0;
 }
 
-static int mad_send(RdmaBackendDev *backend_dev, struct ibv_sge *sge,
-uint32_t num_sge)
+static int mad_send(RdmaBackendDev *backend_dev, uint8_t sgid_idx,
+union ibv_gid *sgid, struct ibv_sge *sge, uint32_t num_sge)
 {
-struct backend_umad umad = {0};
-char *hdr, *msg;
+RdmaCmMuxMsg msg = {0};
+char *hdr, *data;
 int ret;
 
 pr_dbg("num_sge=%d\n", num_sge);
@@ -313,26 +387,31 @@ static int mad_send(RdmaBackendDev *backend_dev, struct 
ibv_sge *sge,
 return -EINVAL;
 }
 
-

[Qemu-devel] [PATCH PULL 15/31] hw/rdma: Initialize node_guid from vmxnet3 mac address

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

node_guid should be set once the device is loaded.
Make node_guid the GID format (64 bit) of the MAC of the vmxnet3 device
on PCI function 0.

A new function was added to do the conversion.
So for example the MAC 56:b6:44:e9:62:dc will be converted to GID
54b6:44ff:fee9:62dc.
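
The conversion in the example above can be sketched on its own as follows.
The logic mirrors the addrconf_addr_eui48() helper added by this patch; the
name mac_to_eui64 and the array-typed parameters are hypothetical and only
used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an 8-byte modified EUI-64 from a 6-byte MAC address:
 * copy the first three MAC bytes, insert FF:FE in the middle, copy the
 * last three MAC bytes, then flip the universal/local bit of byte 0.
 */
static void mac_to_eui64(uint8_t eui[8], const uint8_t mac[6])
{
    assert(eui != NULL && mac != NULL);
    memcpy(eui, mac, 3);
    eui[3] = 0xFF;
    eui[4] = 0xFE;
    memcpy(eui + 5, mac + 3, 3);
    eui[0] ^= 2;
}
```

For the MAC 56:b6:44:e9:62:dc the first byte becomes 0x56 ^ 0x02 = 0x54,
which gives the bytes 54 b6 44 ff fe e9 62 dc.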

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_utils.h  |  9 +
 hw/rdma/vmw/pvrdma_cmd.c  | 10 --
 hw/rdma/vmw/pvrdma_main.c |  5 -
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/hw/rdma/rdma_utils.h b/hw/rdma/rdma_utils.h
index 062e2cd688..4490ea0b94 100644
--- a/hw/rdma/rdma_utils.h
+++ b/hw/rdma/rdma_utils.h
@@ -63,4 +63,13 @@ extern unsigned long pr_dbg_cnt;
 void *rdma_pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t plen);
 void rdma_pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len);
 
+static inline void addrconf_addr_eui48(uint8_t *eui, const char *addr)
+{
+memcpy(eui, addr, 3);
+eui[3] = 0xFF;
+eui[4] = 0xFE;
+memcpy(eui + 5, addr + 3, 3);
+eui[0] ^= 2;
+}
+
 #endif
diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index a334f6205e..2979582fac 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -592,16 +592,6 @@ static int create_bind(PVRDMADev *dev, union 
pvrdma_cmd_req *req,
 return -EINVAL;
 }
 
-/* TODO: Since drivers stores node_guid at load_dsr phase then this
- * assignment is not relevant, i need to figure out a way how to
- * retrieve MAC of our netdev */
-if (!cmd->index) {
-dev->node_guid =
-dev->rdma_dev_res.ports[0].gid_tbl[0].gid.global.interface_id;
-pr_dbg("dev->node_guid=0x%llx\n",
-   (long long unsigned int)be64_to_cpu(dev->node_guid));
-}
-
 return 0;
 }
 
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index b35b5dc5f0..150404dfa6 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -264,7 +264,7 @@ static void init_dsr_dev_caps(PVRDMADev *dev)
 dsr->caps.sys_image_guid = 0;
 pr_dbg("sys_image_guid=%" PRIx64 "\n", dsr->caps.sys_image_guid);
 
-dsr->caps.node_guid = cpu_to_be64(dev->node_guid);
+dsr->caps.node_guid = dev->node_guid;
 pr_dbg("node_guid=%" PRIx64 "\n", be64_to_cpu(dsr->caps.node_guid));
 
 dsr->caps.phys_port_cnt = MAX_PORTS;
@@ -588,6 +588,9 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 }
 dev->func0 = VMXNET3(func0);
 
+addrconf_addr_eui48((unsigned char *)&dev->node_guid,
+(const char *)&dev->func0->conf.macaddr.a);
+
 memdev_root = object_resolve_path("/objects", NULL);
 if (memdev_root) {
 object_child_foreach(memdev_root, pvrdma_check_ram_shared, 
&ram_shared);
-- 
2.17.1

[Qemu-devel] [PATCH PULL 07/31] hw/pvrdma: Make function reset_device return void

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

This function cannot fail - fix it to return void

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 6c8c0154fa..fc2abd34af 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -369,13 +369,11 @@ static int unquiesce_device(PVRDMADev *dev)
 return 0;
 }
 
-static int reset_device(PVRDMADev *dev)
+static void reset_device(PVRDMADev *dev)
 {
 pvrdma_stop(dev);
 
 pr_dbg("Device reset complete\n");
-
-return 0;
 }
 
 static uint64_t regs_read(void *opaque, hwaddr addr, unsigned size)
-- 
2.17.1

[Qemu-devel] [PATCH PULL 14/31] hw/pvrdma: Make sure PCI function 0 is vmxnet3

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The guest driver enforces it, so we should too.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma.h  |  2 ++
 hw/rdma/vmw/pvrdma_main.c | 12 
 2 files changed, 14 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma.h b/hw/rdma/vmw/pvrdma.h
index b019cb843a..10a3c4fb7c 100644
--- a/hw/rdma/vmw/pvrdma.h
+++ b/hw/rdma/vmw/pvrdma.h
@@ -20,6 +20,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msix.h"
 #include "chardev/char-fe.h"
+#include "hw/net/vmxnet3_defs.h"
 
 #include "../rdma_backend_defs.h"
 #include "../rdma_rm_defs.h"
@@ -85,6 +86,7 @@ typedef struct PVRDMADev {
 RdmaBackendDev backend_dev;
 RdmaDeviceResources rdma_dev_res;
 CharBackend mad_chr;
+VMXNET3State *func0;
 } PVRDMADev;
 #define PVRDMA_DEV(dev) OBJECT_CHECK(PVRDMADev, (dev), PVRDMA_HW_NAME)
 
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index ac8c092db0..b35b5dc5f0 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -565,6 +565,7 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 PVRDMADev *dev = PVRDMA_DEV(pdev);
 Object *memdev_root;
 bool ram_shared = false;
+PCIDevice *func0;
 
 init_pr_dbg();
 
@@ -576,6 +577,17 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
+func0 = pci_get_function_0(pdev);
+/* Break if not vmxnet3 device in slot 0 */
+if (strcmp(object_get_typename(&func0->qdev.parent_obj), TYPE_VMXNET3)) {
+pr_dbg("func0 type is %s\n",
+   object_get_typename(&func0->qdev.parent_obj));
+error_setg(errp, "Device on %x.0 must be %s", PCI_SLOT(pdev->devfn),
+   TYPE_VMXNET3);
+return;
+}
+dev->func0 = VMXNET3(func0);
+
 memdev_root = object_resolve_path("/objects", NULL);
 if (memdev_root) {
 object_child_foreach(memdev_root, pvrdma_check_ram_shared, 
&ram_shared);
-- 
2.17.1

[Qemu-devel] [PATCH PULL 17/31] hw/pvrdma: Fill all CQE fields

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

Add the ability to pass specific WC attributes to the CQE, such as the
GRH_BIT flag.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c  | 59 +++--
 hw/rdma/rdma_backend.h  |  4 +--
 hw/rdma/vmw/pvrdma_qp_ops.c | 31 +++
 3 files changed, 58 insertions(+), 36 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 1d496bbd95..ae1e4dcb29 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -60,13 +60,24 @@ struct backend_umad {
 char mad[RDMA_MAX_PRIVATE_DATA];
 };
 
-static void (*comp_handler)(int status, unsigned int vendor_err, void *ctx);
+static void (*comp_handler)(void *ctx, struct ibv_wc *wc);
 
-static void dummy_comp_handler(int status, unsigned int vendor_err, void *ctx)
+static void dummy_comp_handler(void *ctx, struct ibv_wc *wc)
 {
 pr_err("No completion handler is registered\n");
 }
 
+static inline void complete_work(enum ibv_wc_status status, uint32_t 
vendor_err,
+ void *ctx)
+{
+struct ibv_wc wc = {0};
+
+wc.status = status;
+wc.vendor_err = vendor_err;
+
+comp_handler(ctx, &wc);
+}
+
 static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 {
 int i, ne;
@@ -91,7 +102,7 @@ static void poll_cq(RdmaDeviceResources *rdma_dev_res, 
struct ibv_cq *ibcq)
 }
 pr_dbg("Processing %s CQE\n", bctx->is_tx_req ? "send" : "recv");
 
-comp_handler(wc[i].status, wc[i].vendor_err, bctx->up_ctx);
+comp_handler(bctx->up_ctx, &wc[i]);
 
 rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
 g_free(bctx);
@@ -256,8 +267,8 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
 }
 
-void rdma_backend_register_comp_handler(void (*handler)(int status,
-unsigned int vendor_err, void *ctx))
+void rdma_backend_register_comp_handler(void (*handler)(void *ctx,
+ struct ibv_wc *wc))
 {
 comp_handler = handler;
 }
@@ -451,14 +462,14 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 if (!qp->ibqp) { /* This field does not get initialized for QP0 and QP1 */
 if (qp_type == IBV_QPT_SMI) {
 pr_dbg("QP0 unsupported\n");
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_QP0, ctx);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_QP0, ctx);
 } else if (qp_type == IBV_QPT_GSI) {
 pr_dbg("QP1\n");
 rc = mad_send(backend_dev, sgid_idx, sgid, sge, num_sge);
 if (rc) {
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx);
 } else {
-comp_handler(IBV_WC_SUCCESS, 0, ctx);
+complete_work(IBV_WC_SUCCESS, 0, ctx);
 }
 }
 return;
@@ -467,7 +478,7 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 pr_dbg("num_sge=%d\n", num_sge);
 if (!num_sge) {
 pr_dbg("num_sge=0\n");
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NO_SGE, ctx);
 return;
 }
 
@@ -478,21 +489,21 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 rc = rdma_rm_alloc_cqe_ctx(backend_dev->rdma_dev_res, _id, bctx);
 if (unlikely(rc)) {
 pr_dbg("Failed to allocate cqe_ctx\n");
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
 goto out_free_bctx;
 }
 
 rc = build_host_sge_array(backend_dev->rdma_dev_res, new_sge, sge, 
num_sge);
 if (rc) {
 pr_dbg("Error: Failed to build host SGE array\n");
-comp_handler(IBV_WC_GENERAL_ERR, rc, ctx);
+complete_work(IBV_WC_GENERAL_ERR, rc, ctx);
 goto out_dealloc_cqe_ctx;
 }
 
 if (qp_type == IBV_QPT_UD) {
 wr.wr.ud.ah = create_ah(backend_dev, qp->ibpd, sgid_idx, dgid);
 if (!wr.wr.ud.ah) {
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
 goto out_dealloc_cqe_ctx;
 }
 wr.wr.ud.remote_qpn = dqpn;
@@ -510,7 +521,7 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 if (rc) {
 pr_dbg("Fail (%d, %d) to post send WQE to qpn %d\n", rc, errno,
 qp->ibqp->qp_num);
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
 goto out_dealloc_cqe_ctx;
 }
 
@@ -579,13 +590,13 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,

[Qemu-devel] [PATCH PULL 10/31] hw/pvrdma: Set the correct opcode for send completion

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The opcode for the WC should be set by the device and not taken from the
work element.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_qp_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 7b0f440fda..3388be1926 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -154,7 +154,7 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
 comp_ctx->cq_handle = qp->send_cq_handle;
 comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
 comp_ctx->cqe.qp = qp_handle;
-comp_ctx->cqe.opcode = wqe->hdr.opcode;
+comp_ctx->cqe.opcode = IBV_WC_SEND;
 
 rdma_backend_post_send(&dev->backend_dev, &qp->backend_qp, qp->qp_type,
(struct ibv_sge *)>sge[0], 
wqe->hdr.num_sge,
-- 
2.17.1

[Qemu-devel] [PATCH PULL 13/31] vmxnet3: Move some definitions to header file

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

pvrdma setup requires vmxnet3 device on PCI function 0 and PVRDMA device
on PCI function 1.
pvrdma device needs to access vmxnet3 device object for several reasons:
1. Make sure PCI function 0 is vmxnet3.
2. To monitor vmxnet3 device state.
3. To configure node_guid according to vmxnet3 device's MAC address.

To be able to access vmxnet3 device the definition of VMXNET3State is
moved to a new header file.

Signed-off-by: Yuval Shaia 
Reviewed-by: Dmitry Fleytman 
Signed-off-by: Marcel Apfelbaum 
---
 hw/net/vmxnet3.c  | 116 +---
 hw/net/vmxnet3_defs.h | 133 ++
 2 files changed, 134 insertions(+), 115 deletions(-)
 create mode 100644 hw/net/vmxnet3_defs.h

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 3648630386..54746a4030 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -18,7 +18,6 @@
 #include "qemu/osdep.h"
 #include "hw/hw.h"
 #include "hw/pci/pci.h"
-#include "net/net.h"
 #include "net/tap.h"
 #include "net/checksum.h"
 #include "sysemu/sysemu.h"
@@ -29,6 +28,7 @@
 #include "migration/register.h"
 
 #include "vmxnet3.h"
+#include "vmxnet3_defs.h"
 #include "vmxnet_debug.h"
 #include "vmware_utils.h"
 #include "net_tx_pkt.h"
@@ -131,23 +131,11 @@ typedef struct VMXNET3Class {
 DeviceRealize parent_dc_realize;
 } VMXNET3Class;
 
-#define TYPE_VMXNET3 "vmxnet3"
-#define VMXNET3(obj) OBJECT_CHECK(VMXNET3State, (obj), TYPE_VMXNET3)
-
 #define VMXNET3_DEVICE_CLASS(klass) \
 OBJECT_CLASS_CHECK(VMXNET3Class, (klass), TYPE_VMXNET3)
 #define VMXNET3_DEVICE_GET_CLASS(obj) \
 OBJECT_GET_CLASS(VMXNET3Class, (obj), TYPE_VMXNET3)
 
-/* Cyclic ring abstraction */
-typedef struct {
-hwaddr pa;
-uint32_t size;
-uint32_t cell_size;
-uint32_t next;
-uint8_t gen;
-} Vmxnet3Ring;
-
 static inline void vmxnet3_ring_init(PCIDevice *d,
 Vmxnet3Ring *ring,
  hwaddr pa,
@@ -245,108 +233,6 @@ vmxnet3_dump_rx_descr(struct Vmxnet3_RxDesc *descr)
   descr->rsvd, descr->dtype, descr->ext1, descr->btype);
 }
 
-/* Device state and helper functions */
-#define VMXNET3_RX_RINGS_PER_QUEUE (2)
-
-typedef struct {
-Vmxnet3Ring tx_ring;
-Vmxnet3Ring comp_ring;
-
-uint8_t intr_idx;
-hwaddr tx_stats_pa;
-struct UPT1_TxStats txq_stats;
-} Vmxnet3TxqDescr;
-
-typedef struct {
-Vmxnet3Ring rx_ring[VMXNET3_RX_RINGS_PER_QUEUE];
-Vmxnet3Ring comp_ring;
-uint8_t intr_idx;
-hwaddr rx_stats_pa;
-struct UPT1_RxStats rxq_stats;
-} Vmxnet3RxqDescr;
-
-typedef struct {
-bool is_masked;
-bool is_pending;
-bool is_asserted;
-} Vmxnet3IntState;
-
-typedef struct {
-PCIDevice parent_obj;
-NICState *nic;
-NICConf conf;
-MemoryRegion bar0;
-MemoryRegion bar1;
-MemoryRegion msix_bar;
-
-Vmxnet3RxqDescr rxq_descr[VMXNET3_DEVICE_MAX_RX_QUEUES];
-Vmxnet3TxqDescr txq_descr[VMXNET3_DEVICE_MAX_TX_QUEUES];
-
-/* Whether MSI-X support was installed successfully */
-bool msix_used;
-hwaddr drv_shmem;
-hwaddr temp_shared_guest_driver_memory;
-
-uint8_t txq_num;
-
-/* This boolean tells whether RX packet being indicated has to */
-/* be split into head and body chunks from different RX rings  */
-bool rx_packets_compound;
-
-bool rx_vlan_stripping;
-bool lro_supported;
-
-uint8_t rxq_num;
-
-/* Network MTU */
-uint32_t mtu;
-
-/* Maximum number of fragments for indicated TX packets */
-uint32_t max_tx_frags;
-
-/* Maximum number of fragments for indicated RX packets */
-uint16_t max_rx_frags;
-
-/* Index for events interrupt */
-uint8_t event_int_idx;
-
-/* Whether automatic interrupts masking enabled */
-bool auto_int_masking;
-
-bool peer_has_vhdr;
-
-/* TX packets to QEMU interface */
-struct NetTxPkt *tx_pkt;
-uint32_t offload_mode;
-uint32_t cso_or_gso_size;
-uint16_t tci;
-bool needs_vlan;
-
-struct NetRxPkt *rx_pkt;
-
-bool tx_sop;
-bool skip_current_tx_pkt;
-
-uint32_t device_active;
-uint32_t last_command;
-
-uint32_t link_status_and_speed;
-
-Vmxnet3IntState interrupt_states[VMXNET3_MAX_INTRS];
-
-uint32_t temp_mac;   /* To store the low part first */
-
-MACAddr perm_mac;
-uint32_t vlan_table[VMXNET3_VFT_SIZE];
-uint32_t rx_mode;
-MACAddr *mcast_list;
-uint32_t mcast_list_len;
-uint32_t mcast_list_buff_size; /* needed for live migration. */
-
-/* Compatibility flags for migration */
-uint32_t compat_flags;
-} VMXNET3State;
-
 /* Interrupt management */
 
 /*
diff --git a/hw/net/vmxnet3_defs.h b/hw/net/vmxnet3_defs.h
new file mode 100644
index 00..6c19d29b12
--- /dev/null

[Qemu-devel] [PATCH PULL 06/31] hw/rdma: Add support for MAD packets

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

MAD (Management Datagram) packets are widely used by various modules
both in the kernel and in user space. For example, the rdma_* API, which
is used to create and maintain a "connection" layer on top of RDMA, uses
several types of MAD packets.

For more information please refer to chapter 13.4 in Volume 1
Architecture Specification, Release 1.1 available here:
https://www.infinibandta.org/ibta-specifications-download/

To support MAD packets the device uses an external utility
(contrib/rdmacm-mux) to relay packets from and to the guest driver.
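
On the send side, the patch expects exactly two scatter/gather elements
(header, then payload) and copies them into one buffer before writing it to
the character device. The following is a simplified sketch of that gather
step only. The names gather_mad, struct seg and MAD_BUF_SIZE are
hypothetical, plain pointers stand in for the DMA mappings, and the length
check is written so that the addition cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAD_BUF_SIZE 256 /* hypothetical; the patch uses RDMA_MAX_PRIVATE_DATA */

/* Hypothetical stand-in for an already-mapped scatter/gather element. */
struct seg {
    const void *addr;
    uint32_t length;
};

/*
 * Copy a two-element list (header, payload) into one contiguous buffer.
 * Returns the total length, -EINVAL for an unexpected element count, or
 * -ENOMEM if the data does not fit.
 */
static int gather_mad(uint8_t out[MAD_BUF_SIZE], const struct seg *sge,
                      uint32_t num_sge)
{
    assert(out != NULL && sge != NULL);

    if (num_sge != 2) {
        return -EINVAL;
    }

    /* Check each length separately so the sum cannot wrap around. */
    if (sge[0].length > MAD_BUF_SIZE ||
        sge[1].length > MAD_BUF_SIZE - sge[0].length) {
        return -ENOMEM;
    }

    memcpy(out, sge[0].addr, sge[0].length);
    memcpy(out + sge[0].length, sge[1].addr, sge[1].length);

    return (int)(sge[0].length + sge[1].length);
}
```

In the real device the two elements are guest addresses that have to be
mapped before the copy and unmapped afterwards; the sketch leaves that out.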

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c  | 250 +++-
 hw/rdma/rdma_backend.h  |   4 +-
 hw/rdma/rdma_backend_defs.h |  10 +-
 hw/rdma/vmw/pvrdma.h|   2 +
 hw/rdma/vmw/pvrdma_main.c   |   4 +-
 5 files changed, 260 insertions(+), 10 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 1e148398a2..c6dedda555 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -16,8 +16,13 @@
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnum.h"
 
 #include 
+#include 
+#include 
+#include 
 
 #include "trace.h"
 #include "rdma_utils.h"
@@ -33,16 +38,25 @@
 #define VENDOR_ERR_MAD_SEND 0x206
 #define VENDOR_ERR_INVLKEY  0x207
 #define VENDOR_ERR_MR_SMALL 0x208
+#define VENDOR_ERR_INV_MAD_BUFF 0x209
+#define VENDOR_ERR_INV_NUM_SGE  0x210
 
 #define THR_NAME_LEN 16
 #define THR_POLL_TO  5000
 
+#define MAD_HDR_SIZE sizeof(struct ibv_grh)
+
 typedef struct BackendCtx {
-uint64_t req_id;
 void *up_ctx;
 bool is_tx_req;
+struct ibv_sge sge; /* Used to save MAD recv buffer */
 } BackendCtx;
 
+struct backend_umad {
+struct ib_user_mad hdr;
+char mad[RDMA_MAX_PRIVATE_DATA];
+};
+
 static void (*comp_handler)(int status, unsigned int vendor_err, void *ctx);
 
 static void dummy_comp_handler(int status, unsigned int vendor_err, void *ctx)
@@ -286,6 +300,61 @@ static int build_host_sge_array(RdmaDeviceResources 
*rdma_dev_res,
 return 0;
 }
 
+static int mad_send(RdmaBackendDev *backend_dev, struct ibv_sge *sge,
+uint32_t num_sge)
+{
+struct backend_umad umad = {0};
+char *hdr, *msg;
+int ret;
+
+pr_dbg("num_sge=%d\n", num_sge);
+
+if (num_sge != 2) {
+return -EINVAL;
+}
+
+umad.hdr.length = sge[0].length + sge[1].length;
+pr_dbg("msg_len=%d\n", umad.hdr.length);
+
+if (umad.hdr.length > sizeof(umad.mad)) {
+return -ENOMEM;
+}
+
+umad.hdr.addr.qpn = htobe32(1);
+umad.hdr.addr.grh_present = 1;
+umad.hdr.addr.gid_index = backend_dev->backend_gid_idx;
+memcpy(umad.hdr.addr.gid, backend_dev->gid.raw, sizeof(umad.hdr.addr.gid));
+umad.hdr.addr.hop_limit = 0xFF;
+
+hdr = rdma_pci_dma_map(backend_dev->dev, sge[0].addr, sge[0].length);
+if (!hdr) {
+pr_dbg("Fail to map to sge[0]\n");
+return -ENOMEM;
+}
+msg = rdma_pci_dma_map(backend_dev->dev, sge[1].addr, sge[1].length);
+if (!msg) {
+pr_dbg("Fail to map to sge[1]\n");
+rdma_pci_dma_unmap(backend_dev->dev, hdr, sge[0].length);
+return -ENOMEM;
+}
+
+pr_dbg_buf("mad_hdr", hdr, sge[0].length);
+pr_dbg_buf("mad_data", data, sge[1].length);
+
+memcpy(&umad.mad[0], hdr, sge[0].length);
+memcpy(&umad.mad[sge[0].length], msg, sge[1].length);
+
+rdma_pci_dma_unmap(backend_dev->dev, msg, sge[1].length);
+rdma_pci_dma_unmap(backend_dev->dev, hdr, sge[0].length);
+
+ret = qemu_chr_fe_write(backend_dev->mad_chr_be, (const uint8_t *)&umad,
+sizeof(umad));
+
+pr_dbg("qemu_chr_fe_write=%d\n", ret);
+
+return (ret != sizeof(umad));
+}
+
 void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 RdmaBackendQP *qp, uint8_t qp_type,
 struct ibv_sge *sge, uint32_t num_sge,
@@ -304,9 +373,13 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_QP0, ctx);
 } else if (qp_type == IBV_QPT_GSI) {
 pr_dbg("QP1\n");
-comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx);
+rc = mad_send(backend_dev, sge, num_sge);
+if (rc) {
+comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx);
+} else {
+comp_handler(IBV_WC_SUCCESS, 0, ctx);
+}
 }
-pr_dbg("qp->ibqp is NULL for qp_type %d!!!\n", qp_type);
 return;
 }
 
@@ -370,6 +443,48 @@ out_free_bctx:
 g_free(bctx);
 }
 
+static unsigned int save_mad_recv_buffer(RdmaBackendDev *backend_dev,
+ struct ibv_sge *sge, uint32_t num_sge,
+ void *ctx)
+{
+BackendCtx *bctx;
+int

[Qemu-devel] [PATCH PULL 03/31] hw/rdma: Add ability to force notification without re-arm

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

Upon completion of an incoming packet the device pushes a CQE to the
driver's RX ring and notifies the driver (msix).
While for data-path incoming packets the driver needs the ability to
control whether it wishes to receive interrupts or not, for control-path
packets such as incoming MAD the driver needs to be notified anyway; it
does not even need to re-arm the notification bit.

Enhance the notification field to support this.
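
The resulting three-state behaviour can be sketched in isolation as
follows. The enum values match the ones introduced by this patch; the two
helper functions and their names are hypothetical and only model the state
transitions, not the interrupt delivery itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum CQNotificationType {
    CNT_CLEAR, /* no notification requested */
    CNT_ARM,   /* one-shot: notify on the next CQE, then clear */
    CNT_SET,   /* sticky: always notify, no re-arm needed */
} CQNotificationType;

/* Hypothetical helper: the driver arms or disarms notification.
 * A sticky (CNT_SET) queue ignores the request. */
static void cq_req_notify(CQNotificationType *state, bool notify)
{
    assert(state != NULL);
    if (*state != CNT_SET) {
        *state = notify ? CNT_ARM : CNT_CLEAR;
    }
}

/* Hypothetical helper: called when a CQE is posted.
 * Returns true if an interrupt should be raised. */
static bool cq_should_interrupt(CQNotificationType *state)
{
    assert(state != NULL);
    if (*state == CNT_CLEAR) {
        return false;
    }
    if (*state == CNT_ARM) {
        *state = CNT_CLEAR; /* one-shot: the driver must re-arm */
    }
    return true;
}
```

A data-path CQ therefore fires once per arm request, while a CQ attached to
a GSI QP (set to CNT_SET in rdma_rm_alloc_qp) fires on every completion.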

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_rm.c   | 12 ++--
 hw/rdma/rdma_rm_defs.h  |  8 +++-
 hw/rdma/vmw/pvrdma_qp_ops.c |  6 --
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index 8d59a42cd1..4f10fcabcc 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -263,7 +263,7 @@ int rdma_rm_alloc_cq(RdmaDeviceResources *dev_res, 
RdmaBackendDev *backend_dev,
 }
 
 cq->opaque = opaque;
-cq->notify = false;
+cq->notify = CNT_CLEAR;
 
 rc = rdma_backend_create_cq(backend_dev, &cq->backend_cq, cqe);
 if (rc) {
@@ -291,7 +291,10 @@ void rdma_rm_req_notify_cq(RdmaDeviceResources *dev_res, 
uint32_t cq_handle,
 return;
 }
 
-cq->notify = notify;
+if (cq->notify != CNT_SET) {
+cq->notify = notify ? CNT_ARM : CNT_CLEAR;
+}
+
 pr_dbg("notify=%d\n", cq->notify);
 }
 
@@ -349,6 +352,11 @@ int rdma_rm_alloc_qp(RdmaDeviceResources *dev_res, 
uint32_t pd_handle,
 return -EINVAL;
 }
 
+if (qp_type == IBV_QPT_GSI) {
+scq->notify = CNT_SET;
+rcq->notify = CNT_SET;
+}
+
 qp = res_tbl_alloc(_res->qp_tbl, _qpn);
 if (!qp) {
 return -ENOMEM;
diff --git a/hw/rdma/rdma_rm_defs.h b/hw/rdma/rdma_rm_defs.h
index 7228151239..9b399063d3 100644
--- a/hw/rdma/rdma_rm_defs.h
+++ b/hw/rdma/rdma_rm_defs.h
@@ -49,10 +49,16 @@ typedef struct RdmaRmPD {
 uint32_t ctx_handle;
 } RdmaRmPD;
 
+typedef enum CQNotificationType {
+CNT_CLEAR,
+CNT_ARM,
+CNT_SET,
+} CQNotificationType;
+
 typedef struct RdmaRmCQ {
 RdmaBackendCQ backend_cq;
 void *opaque;
-bool notify;
+CQNotificationType notify;
 } RdmaRmCQ;
 
 /* MR (DMA region) */
diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index c668afd0ed..762700a205 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -89,8 +89,10 @@ static int pvrdma_post_cqe(PVRDMADev *dev, uint32_t 
cq_handle,
 pvrdma_ring_write_inc(&dev->dsr_info.cq);
 
 pr_dbg("cq->notify=%d\n", cq->notify);
-if (cq->notify) {
-cq->notify = false;
+if (cq->notify != CNT_CLEAR) {
+if (cq->notify == CNT_ARM) {
+cq->notify = CNT_CLEAR;
+}
 post_interrupt(dev, INTR_VEC_CMD_COMPLETION_Q);
 }
 
-- 
2.17.1
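
The three-state notification field introduced by the patch above can be
modelled in a few lines. This is only a sketch in Python, not the QEMU code:
the enum values and the two decision points are taken from the diff, while
the CQ class and the function names req_notify_cq() and post_cqe() are
hypothetical stand-ins for rdma_rm_req_notify_cq() and the tail of
pvrdma_post_cqe().

```python
from enum import IntEnum


class CQNotificationType(IntEnum):
    CNT_CLEAR = 0  # no interrupt on the next completion
    CNT_ARM = 1    # one-shot: interrupt once, then fall back to CNT_CLEAR
    CNT_SET = 2    # sticky: always interrupt, never needs re-arming


class CQ:
    """Hypothetical stand-in for RdmaRmCQ; only the notify field is modelled."""

    def __init__(self):
        self.notify = CQNotificationType.CNT_CLEAR  # as in rdma_rm_alloc_cq()


def req_notify_cq(cq, notify):
    # A sticky (CNT_SET) CQ ignores arm/clear requests coming from the guest.
    if cq.notify != CQNotificationType.CNT_SET:
        cq.notify = (CQNotificationType.CNT_ARM if notify
                     else CQNotificationType.CNT_CLEAR)


def post_cqe(cq):
    # Returns True when an interrupt would be posted for this completion.
    if cq.notify == CQNotificationType.CNT_CLEAR:
        return False
    if cq.notify == CQNotificationType.CNT_ARM:
        cq.notify = CQNotificationType.CNT_CLEAR  # one-shot is consumed
    return True
```

A data-path CQ therefore raises one interrupt per arm request, whereas the
CQs attached to a GSI QP (set to CNT_SET in rdma_rm_alloc_qp()) raise one
for every completion.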

[Qemu-devel] [PATCH PULL 11/31] qapi: Define new QMP message for pvrdma

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

pvrdma requires that the same GID attached to it will be attached to the
backend device in the host.

A new QMP message is defined so the pvrdma device can broadcast any change
made to its GID table. This event is captured by libvirt, which in turn
updates the GID table in the backend device.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Acked-by: Markus Armbruster 
Signed-off-by: Marcel Apfelbaum 
---
 MAINTAINERS   |  1 +
 Makefile.objs |  3 ++-
 qapi/qapi-schema.json |  1 +
 qapi/rdma.json| 38 ++
 4 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 qapi/rdma.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 856d379b0a..180695f5d3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2413,6 +2413,7 @@ F: hw/rdma/*
 F: hw/rdma/vmw/*
 F: docs/pvrdma.txt
 F: contrib/rdmacm-mux/*
+F: qapi/rdma.json
 
 Build and test automation
 -
diff --git a/Makefile.objs b/Makefile.objs
index 319f14d937..bc5b8a8442 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -1,5 +1,6 @@
 QAPI_MODULES = block-core block char common crypto introspect job migration
-QAPI_MODULES += misc net rocker run-state sockets tpm trace transaction ui
+QAPI_MODULES += misc net rdma rocker run-state sockets tpm trace transaction
+QAPI_MODULES += ui
 
 ###
 # Common libraries for tools and emulators
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 65b6dc2f6f..3bbdfcee84 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -86,6 +86,7 @@
 { 'include': 'char.json' }
 { 'include': 'job.json' }
 { 'include': 'net.json' }
+{ 'include': 'rdma.json' }
 { 'include': 'rocker.json' }
 { 'include': 'tpm.json' }
 { 'include': 'ui.json' }
diff --git a/qapi/rdma.json b/qapi/rdma.json
new file mode 100644
index 00..b58105b1b6
--- /dev/null
+++ b/qapi/rdma.json
@@ -0,0 +1,38 @@
+# -*- Mode: Python -*-
+#
+
+##
+# = RDMA device
+##
+
+##
+# @RDMA_GID_STATUS_CHANGED:
+#
+# Emitted when guest driver adds/deletes GID to/from device
+#
+# @netdev: RoCE Network Device name
+#
+# @gid-status: Add or delete indication
+#
+# @subnet-prefix: Subnet Prefix
+#
+# @interface-id : Interface ID
+#
+# Since: 4.0
+#
+# Example:
+#
+# <- {"timestamp": {"seconds": 1541579657, "microseconds": 986760},
+# "event": "RDMA_GID_STATUS_CHANGED",
+# "data":
+# {"netdev": "bridge0",
+# "interface-id": 15880512517475447892,
+# "gid-status": true,
+# "subnet-prefix": 33022}}
+#
+##
+{ 'event': 'RDMA_GID_STATUS_CHANGED',
+  'data': { 'netdev': 'str',
+'gid-status': 'bool',
+'subnet-prefix' : 'uint64',
+'interface-id'  : 'uint64' } }
-- 
2.17.1
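
A management-side consumer of this event might look roughly like the
following Python sketch. It parses the example message from the schema
comment and dispatches on "gid-status". Two things here are assumptions and
not stated in the patch: that true means "add" and false means "delete", and
that the two uint64 fields carry the raw GID halves as read on a
little-endian host (the example value 33022 is 0x80FE, which gives the
fe80:: link-local prefix under that reading). The function names are
hypothetical.

```python
import ipaddress
import json
import struct

# Example message copied from the schema documentation in the patch.
EXAMPLE_EVENT = """
{"timestamp": {"seconds": 1541579657, "microseconds": 986760},
 "event": "RDMA_GID_STATUS_CHANGED",
 "data": {"netdev": "bridge0",
          "interface-id": 15880512517475447892,
          "gid-status": true,
          "subnet-prefix": 33022}}
"""


def gid_from_event(data):
    # ASSUMPTION: each field is one 8-byte GID half stored little-endian.
    raw = struct.pack("<QQ", data["subnet-prefix"], data["interface-id"])
    return ipaddress.IPv6Address(raw)


def handle_event(line):
    """Return (action, netdev, gid) for a GID event, or None for anything else."""
    msg = json.loads(line)
    if msg.get("event") != "RDMA_GID_STATUS_CHANGED":
        return None
    data = msg["data"]
    # ASSUMPTION: true means the GID was added, false that it was deleted.
    action = "add" if data["gid-status"] else "del"
    return action, data["netdev"], gid_from_event(data)
```

A real consumer would read such lines from the QMP socket and then update
the GID table of the host device named by "netdev"; that part is left out
here.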

[Qemu-devel] [PATCH PULL 05/31] hw/rdma: Abort send-op if fail to create addr handler

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

Function create_ah might return NULL, let's exit with an error.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index d7a4bbd91f..1e148398a2 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -338,6 +338,10 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 if (qp_type == IBV_QPT_UD) {
 wr.wr.ud.ah = create_ah(backend_dev, qp->ibpd,
 backend_dev->backend_gid_idx, dgid);
+if (!wr.wr.ud.ah) {
+comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
+goto out_dealloc_cqe_ctx;
+}
 wr.wr.ud.remote_qpn = dqpn;
 wr.wr.ud.remote_qkey = dqkey;
 }
-- 
2.17.1

[Qemu-devel] [PATCH PULL 08/31] hw/pvrdma: Make default pkey 0xFFFF

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

Commit 6e7dba23af ("hw/pvrdma: Make default pkey 0xFFFF") exports the
default pkey as an external definition but omits the change from 0x7FFF to
0xFFFF.

Fixes: 6e7dba23af ("hw/pvrdma: Make default pkey 0xFFFF")

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma.h b/hw/rdma/vmw/pvrdma.h
index e3742d893a..15c3f28b86 100644
--- a/hw/rdma/vmw/pvrdma.h
+++ b/hw/rdma/vmw/pvrdma.h
@@ -52,7 +52,7 @@
 #define PVRDMA_FW_VERSION14
 
 /* Some defaults */
-#define PVRDMA_PKEY  0x7FFF
+#define PVRDMA_PKEY  0xFFFF
 
 typedef struct DSRInfo {
 dma_addr_t dma;
-- 
2.17.1

[Qemu-devel] [PATCH PULL 09/31] hw/pvrdma: Set the correct opcode for recv completion

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The function pvrdma_post_cqe populates the CQE entry with the opcode from
the given completion element. For receive operations the value was not set.
Fix it by setting it to IBV_WC_RECV.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_qp_ops.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 762700a205..7b0f440fda 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -196,8 +196,9 @@ int pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)
 comp_ctx = g_malloc(sizeof(CompHandlerCtx));
 comp_ctx->dev = dev;
 comp_ctx->cq_handle = qp->recv_cq_handle;
-comp_ctx->cqe.qp = qp_handle;
 comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
+comp_ctx->cqe.qp = qp_handle;
+comp_ctx->cqe.opcode = IBV_WC_RECV;
 
 rdma_backend_post_recv(&dev->backend_dev, &dev->rdma_dev_res,
 &qp->backend_qp, qp->qp_type,
-- 
2.17.1

[Qemu-devel] [PATCH PULL 01/31] hw/pvrdma: Check the correct return value

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

Return value of 0 means ok, we want to free the memory only in case of
error.

Signed-off-by: Yuval Shaia 
Message-Id: <20181025061700.17050-1-yuval.sh...@oracle.com>
Reviewed-by: Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index 4faeb21631..57d6f41ae6 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -232,7 +232,7 @@ static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req 
*req,
  cmd->start, cmd->length, host_virt,
  cmd->access_flags, &resp->mr_handle,
  &resp->lkey, &resp->rkey);
-if (host_virt && !resp->hdr.err) {
+if (resp->hdr.err && host_virt) {
 munmap(host_virt, cmd->length);
 }
 
-- 
2.17.1

[Qemu-devel] [PATCH PULL 00/31] RDMA queue

2018-12-22 Thread Marcel Apfelbaum

The following changes since commit 891ff9f4a371da2dbd5244590eb35e8d803e18d8:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into 
staging (2018-12-21 15:49:59 +)

are available in the Git repository at:

  https://github.com/marcel-apf/qemu tags/rdma-pull-request

for you to fetch changes up to f1e2e38ee0136b7710a2caa347049818afd57a1b:

  pvrdma: check return value from pvrdma_idx_ring_has_ routines (2018-12-22 
11:09:57 +0200)


RDMA queue
 * Add support for RDMA MAD
 * Various fixes for the pvrdma backend


Prasad J Pandit (7):
  pvrdma: release device resources in case of an error
  rdma: check num_sge does not exceed MAX_SGE
  pvrdma: add uar_read routine
  pvrdma: check number of pages when creating rings
  pvrdma: release ring object in case of an error
  rdma: remove unused VENDOR_ERR_NO_SGE macro
  pvrdma: check return value from pvrdma_idx_ring_has_ routines

Yuval Shaia (24):
  hw/pvrdma: Check the correct return value
  contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexer
  hw/rdma: Add ability to force notification without re-arm
  hw/rdma: Return qpn 1 if ibqp is NULL
  hw/rdma: Abort send-op if fail to create addr handler
  hw/rdma: Add support for MAD packets
  hw/pvrdma: Make function reset_device return void
  hw/pvrdma: Make default pkey 0xFFFF
  hw/pvrdma: Set the correct opcode for recv completion
  hw/pvrdma: Set the correct opcode for send completion
  qapi: Define new QMP message for pvrdma
  hw/pvrdma: Add support to allow guest to configure GID table
  vmxnet3: Move some definitions to header file
  hw/pvrdma: Make sure PCI function 0 is vmxnet3
  hw/rdma: Initialize node_guid from vmxnet3 mac address
  hw/pvrdma: Make device state depend on Ethernet function state
  hw/pvrdma: Fill all CQE fields
  hw/pvrdma: Fill error code in command's response
  hw/rdma: Remove unneeded code that handles more that one port
  vl: Introduce shutdown_notifiers
  hw/pvrdma: Clean device's resource when system is shutdown
  hw/rdma: Do not use bitmap_zero_extend to free bitmap
  hw/rdma: Do not call rdma_backend_del_gid on an empty gid
  docs: Update pvrdma device documentation

 MAINTAINERS  |   2 +
 Makefile |   3 +
 Makefile.objs|   4 +-
 contrib/rdmacm-mux/Makefile.objs |   4 +
 contrib/rdmacm-mux/main.c| 798 +++
 contrib/rdmacm-mux/rdmacm-mux.h  |  61 +++
 docs/pvrdma.txt  | 126 ++-
 hw/net/vmxnet3.c | 116 +-
 hw/net/vmxnet3_defs.h| 133 +++
 hw/rdma/rdma_backend.c   | 524 +
 hw/rdma/rdma_backend.h   |  28 +-
 hw/rdma/rdma_backend_defs.h  |  19 +-
 hw/rdma/rdma_rm.c| 120 +-
 hw/rdma/rdma_rm.h|  17 +-
 hw/rdma/rdma_rm_defs.h   |  21 +-
 hw/rdma/rdma_utils.h |  25 ++
 hw/rdma/vmw/pvrdma.h |  10 +-
 hw/rdma/vmw/pvrdma_cmd.c | 273 +++---
 hw/rdma/vmw/pvrdma_dev_ring.c|  29 +-
 hw/rdma/vmw/pvrdma_main.c|  70 ++--
 hw/rdma/vmw/pvrdma_qp_ops.c  |  62 ++-
 include/sysemu/sysemu.h  |   1 +
 qapi/qapi-schema.json|   1 +
 qapi/rdma.json   |  38 ++
 vl.c |  15 +-
 25 files changed, 2082 insertions(+), 418 deletions(-)
 create mode 100644 contrib/rdmacm-mux/Makefile.objs
 create mode 100644 contrib/rdmacm-mux/main.c
 create mode 100644 contrib/rdmacm-mux/rdmacm-mux.h
 create mode 100644 hw/net/vmxnet3_defs.h
 create mode 100644 qapi/rdma.json

-- 
2.17.1

[Qemu-devel] [PATCH PULL 04/31] hw/rdma: Return qpn 1 if ibqp is NULL

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The device does not support QP0, only QP1.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index 86e8fe8ab6..3ccc9a2494 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -33,7 +33,7 @@ static inline union ibv_gid *rdma_backend_gid(RdmaBackendDev 
*dev)
 
 static inline uint32_t rdma_backend_qpn(const RdmaBackendQP *qp)
 {
-return qp->ibqp ? qp->ibqp->qp_num : 0;
+return qp->ibqp ? qp->ibqp->qp_num : 1;
 }
 
 static inline uint32_t rdma_backend_mr_lkey(const RdmaBackendMR *mr)
-- 
2.17.1

[Qemu-devel] [PATCH PULL 02/31] contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexer

2018-12-22 Thread Marcel Apfelbaum

From: Yuval Shaia 

The RDMA MAD kernel module (ibcm) disallows more than one MAD agent for a
given MAD class.
This does not go hand in hand with the qemu pvrdma device's requirements,
where each VM is a MAD agent.
Fix it by adding an implementation of an RDMA MAD multiplexer service which
on one hand registers as the sole MAD agent with the kernel module and on
the other hand gives service to more than one VM.

Design Overview:
Reviewed-by: Shamir Rabinovitch 

A server process is registered to the UMAD framework (for this to work the
rdma_cm kernel module needs to be unloaded) and creates a unix socket to
listen for incoming requests from clients.
A client process (such as QEMU) connects to this unix socket and
registers with its own GID.

TX:

When a client needs to send an rdma_cm MAD message it constructs it the same
way as without this multiplexer, i.e. it creates a umad packet, but this
time it writes its content to the socket instead of calling umad_send().
The server, upon receiving such a message, fetches local_comm_id from it so
that a context for this session can be maintained, and relays the message to
the UMAD layer by calling umad_send().

RX:

The server creates a worker thread to process incoming rdma_cm MAD
messages. When an incoming message arrives (umad_recv()) the server,
depending on the message type (attr_id), looks for the target client by
searching either the gid->fd table or the local_comm_id->fd table. With
the extracted fd the server relays the incoming message to the client.
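
The two lookup tables in the TX and RX descriptions can be pictured with the
toy model below. It is a Python sketch only, not the code in
contrib/rdmacm-mux/main.c: the class and method names are hypothetical, and
the rule "new connection requests are routed by GID, everything else by comm
id" is an assumption, since the text above only says that the choice depends
on attr_id.

```python
class MadRouter:
    """Toy model of the gid->fd and local_comm_id->fd tables."""

    def __init__(self):
        self.gid_to_fd = {}      # filled when a client registers its GID
        self.comm_id_to_fd = {}  # filled when a client sends a MAD (TX path)

    def register(self, gid, fd):
        self.gid_to_fd[gid] = fd

    def on_tx(self, fd, local_comm_id):
        # TX: remember which client owns this session before relaying.
        self.comm_id_to_fd[local_comm_id] = fd

    def on_rx(self, is_new_request, gid=None, comm_id=None):
        # RX: pick the lookup table by message type (ASSUMED split, see above).
        if is_new_request:
            return self.gid_to_fd.get(gid)
        return self.comm_id_to_fd.get(comm_id)
```

Returning None for an unknown GID or comm id lets the caller drop a message
that has no target client.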

Signed-off-by: Yuval Shaia 
Reviewed-by: Shamir Rabinovitch 
Signed-off-by: Marcel Apfelbaum 

Signed-off-by: Marcel Apfelbaum 
---
 MAINTAINERS  |   1 +
 Makefile |   3 +
 Makefile.objs|   1 +
 contrib/rdmacm-mux/Makefile.objs |   4 +
 contrib/rdmacm-mux/main.c| 798 +++
 contrib/rdmacm-mux/rdmacm-mux.h  |  61 +++
 6 files changed, 868 insertions(+)
 create mode 100644 contrib/rdmacm-mux/Makefile.objs
 create mode 100644 contrib/rdmacm-mux/main.c
 create mode 100644 contrib/rdmacm-mux/rdmacm-mux.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3b31e07b26..856d379b0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2412,6 +2412,7 @@ S: Maintained
 F: hw/rdma/*
 F: hw/rdma/vmw/*
 F: docs/pvrdma.txt
+F: contrib/rdmacm-mux/*
 
 Build and test automation
 -
diff --git a/Makefile b/Makefile
index 038780c6d0..dd53965f77 100644
--- a/Makefile
+++ b/Makefile
@@ -362,6 +362,7 @@ dummy := $(call unnest-vars,, \
 elf2dmp-obj-y \
 ivshmem-client-obj-y \
 ivshmem-server-obj-y \
+rdmacm-mux-obj-y \
 libvhost-user-obj-y \
 vhost-user-scsi-obj-y \
 vhost-user-blk-obj-y \
@@ -579,6 +580,8 @@ vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) 
libvhost-user.a
$(call LINK, $^)
 vhost-user-blk$(EXESUF): $(vhost-user-blk-obj-y) libvhost-user.a
$(call LINK, $^)
+rdmacm-mux$(EXESUF): $(rdmacm-mux-obj-y) $(COMMON_LDADDS)
+   $(call LINK, $^)
 
 module_block.h: $(SRC_PATH)/scripts/modules/module_block.py config-host.mak
$(call quiet-command,$(PYTHON) $< $@ \
diff --git a/Makefile.objs b/Makefile.objs
index 56af0347d3..319f14d937 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -133,6 +133,7 @@ vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
 vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
 vhost-user-scsi-obj-y = contrib/vhost-user-scsi/
 vhost-user-blk-obj-y = contrib/vhost-user-blk/
+rdmacm-mux-obj-y = contrib/rdmacm-mux/
 
 ##
 trace-events-subdirs =
diff --git a/contrib/rdmacm-mux/Makefile.objs b/contrib/rdmacm-mux/Makefile.objs
new file mode 100644
index 00..be3eacb6f7
--- /dev/null
+++ b/contrib/rdmacm-mux/Makefile.objs
@@ -0,0 +1,4 @@
+ifdef CONFIG_PVRDMA
+CFLAGS += -libumad -Wno-format-truncation
+rdmacm-mux-obj-y = main.o
+endif
diff --git a/contrib/rdmacm-mux/main.c b/contrib/rdmacm-mux/main.c
new file mode 100644
index 00..835a7f9214
--- /dev/null
+++ b/contrib/rdmacm-mux/main.c
@@ -0,0 +1,798 @@
+/*
+ * QEMU paravirtual RDMA - rdmacm-mux implementation
+ *
+ * Copyright (C) 2018 Oracle
+ * Copyright (C) 2018 Red Hat Inc
+ *
+ * Authors:
+ * Yuval Shaia 
+ * Marcel Apfelbaum 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sys/poll.h"
+#include "sys/ioctl.h"
+#include "pthread.h"
+#include "syslog.h"
+
+#include "infiniband/verbs.h"
+#include "infiniband/umad.h"
+#include "infiniband/umad_types.h"
+#include "infiniband/umad_sa.h"
+#include "infiniband/umad_cm.h"
+
+#include "rdmacm-mux.h"
+
+#define SCALE_US 1000
+#define COMMID_TTL 2 /* How many SCALE_US a context of MAD session is saved */
+#define SLEEP_SECS 5 /* This is used both in poll() and thread */
+#define

Re: [Qemu-devel] [RFC PATCH 0/7] virtio-fs: shared file system for virtual machines

2018-12-22 Thread jiangyiwen

On 2018/12/11 1:31, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Hi,
>   This is the first RFC for the QEMU side of 'virtio-fs';
> a new mechanism for mounting host directories into the guest
> in a fast, consistent and secure manner.  Our primary use
> case is kata containers, but it should be usable in other scenarios
> as well.
> 
> There are corresponding patches being posted to Linux kernel,
> libfuse and kata lists.
> 
> For a fuller design description, and benchmark numbers, please see
> Vivek's posting of the kernel set here:
> 
> https://marc.info/?l=linux-kernel&m=154446243024251&w=2
> 
> We've got a small website with instructions on how to use it, here:
> 
> https://virtio-fs.gitlab.io/
> 
> and all the code is available on gitlab at:
> 
> https://gitlab.com/virtio-fs
> 
> QEMU's changes
> --
> 
> The QEMU changes are pretty small; 
> 
> There's a new vhost-user device, which is used to carry a stream of
> FUSE messages to an external daemon that actually performs
> all the file IO.  The FUSE daemon is an external process in order to
> achieve better isolation for security and resource control (e.g. number
> of file descriptors) and also because it's cleaner than trying to
> integrate libfuse into QEMU.
> 
> This device has an extra BAR that contains (up to) 3 regions:
> 
>  a) a DAX mapping range ('the cache') - into which QEMU mmap's
> files on behalf of the external daemon; those files are
> then directly mapped by the guest in a way similar to a DAX
> backed file system;  one advantage of this is that multiple
> guests all accessing the same files should all be sharing
> those pages of host cache.
> 
>  b) An experimental set of mappings for use by a metadata versioning
> daemon;  this mapping is shared between multiple guests and
> the daemon, but only contains a set of version counters that
> allow a guest to quickly tell if its metadata is stale.
> 
> TODO
> 
> 
> This is the first RFC, we know we have a bunch of things to clear up:
> 
>   a) The virtio device specification is still in flux and is expected
>  to change
> 
>   b) We'd like to find ways of reducing the map/unmap latency for DAX
> 
>   c) The metadata versioning scheme needs to settle out.
> 
>   d) mmap'ing host files has some interesting side effects; for example
>  if the file gets truncated by the host and then the guest accesses
>  the mapping, KVM can fail the guest hard.
> 
> Dr. David Alan Gilbert (6):
>   virtio: Add shared memory capability
>   virtio-fs: Add cache BAR
>   virtio-fs: Add vhost-user slave commands for mapping
>   virtio-fs: Fill in slave commands for mapping
>   virtio-fs: Allow mapping of meta data version table
>   virtio-fs: Allow mapping of journal
> 
> Stefan Hajnoczi (1):
>   virtio: add vhost-user-fs-pci device
> 
>  configure   |  10 +
>  contrib/libvhost-user/libvhost-user.h   |   3 +
>  docs/interop/vhost-user.txt |  35 ++
>  hw/virtio/Makefile.objs |   1 +
>  hw/virtio/vhost-user-fs.c   | 517 
>  hw/virtio/vhost-user.c  |  16 +
>  hw/virtio/virtio-pci.c  | 115 +
>  hw/virtio/virtio-pci.h  |  19 +
>  include/hw/pci/pci.h|   1 +
>  include/hw/virtio/vhost-user-fs.h   |  79 +++
>  include/standard-headers/linux/virtio_fs.h  |  48 ++
>  include/standard-headers/linux/virtio_ids.h |   1 +
>  include/standard-headers/linux/virtio_pci.h |   9 +
>  13 files changed, 854 insertions(+)
>  create mode 100644 hw/virtio/vhost-user-fs.c
>  create mode 100644 include/hw/virtio/vhost-user-fs.h
>  create mode 100644 include/standard-headers/linux/virtio_fs.h
> 

Hi Dave,

I encountered a problem after running QEMU with virtio-fs.

I find I can only mount virtio-fs using one of the following commands:
mount -t virtio_fs /dev/null /mnt/virtio_fs/ -o tag=myfs,rootmode=04,user_id=0,group_id=0
or
mount -t virtio_fs /dev/null /mnt/virtio_fs/ -o tag=myfs,rootmode=04,user_id=0,group_id=0,dax

How can I use "cache=always", "cache=none", "cache=auto" or
"cache=writeback"?

Thanks,
Yiwen.

Re: [Qemu-devel] Segfaults in chardev due to races

2018-12-22 Thread Paolo Bonzini

On 21/12/18 23:31, Max Reitz wrote:
> I suppose the issue is that QMP events are sent by one thread, and
> client disconnects are handled by a different one.  So if a QMP event is
> sent while a client disconnects concurrently, races may occur; and the
> only protection against concurrent access appears to be the
> chr_write_lock, which I don't think is enough.

I think disconnection (tcp_chr_disconnect) has to take the
chr_write_lock too.

Paolo
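
The idea of having the disconnect path take the same lock as the write path
can be illustrated with a small threading model. This is a sketch in Python
under stated assumptions, not the QEMU chardev code: the Chardev class, its
fields and its methods are hypothetical, and only the lock name is borrowed
from the discussion above.

```python
import threading


class Chardev:
    """Toy character device whose writer and disconnect share one lock."""

    def __init__(self):
        self.chr_write_lock = threading.Lock()
        self.connected = True
        self.sent = []

    def write(self, buf):
        # Event sender thread: check the connection state under the lock.
        with self.chr_write_lock:
            if not self.connected:
                return 0
            self.sent.append(buf)
            return len(buf)

    def disconnect(self):
        # Disconnect handler: tear down under the same lock, so a concurrent
        # write() sees either the live connection or the closed one.
        with self.chr_write_lock:
            self.connected = False
```

Because both paths serialize on chr_write_lock, a write can no longer
observe a half-torn-down connection in this model.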

Re: [Qemu-devel] [PATCH] i386: remove the 'INTEL_PT' CPUID bit from named CPU models

2018-12-22 Thread Paolo Bonzini

On 22/12/18 02:01, Robert Hoo wrote:
> On Fri, 2018-12-21 at 16:27 +0100, Paolo Bonzini wrote:
>> On 21/12/18 16:22, Philippe Mathieu-Daudé wrote:
>>> Hi Paolo,
>>>
>>> On 12/21/18 7:30 AM, Paolo Bonzini wrote:
>>>> From: Robert Hoo 
>>>>
>>>> Processor tracing is not yet implemented for KVM and it will be an
>>>> opt in feature requiring a special module parameter.
>>>> Disable it, because it is wrong to enable it by default and
>>>> it is impossible that no one has ever used it.
>>>>
>>>> Cc: qemu-sta...@nongnu.org
>>>
>>> Does this patch misses Robert S-o-b?
>>> Signed-off-by: Robert Hoo 
> 
> Paolo's right. It didn't come from me.
>>
>> No, the author is wrong, it should be me.  "git commit -c" apparently
>> copies the author from the original commit.
>>
>> Paolo
> 
> Hi Paolo, would you hold on INTEL_PT removal for a moment? I think I
> need Luwei's double confirm.

I'm aware of Luwei's patches, they will be in 4.21.  As mentioned in the
commit message, they will be an opt-in feature, not enabled by default;
the default is system-wide tracing and no INTEL_PT CPUID bit available
in the guest.

Paolo

Re: [Qemu-devel] [PULL v4 00/35] Misc patches for 2018-12-21

2018-12-22 Thread Paolo Bonzini

On 21/12/18 22:09, Peter Maydell wrote:
> I don't really understand what's going on here, or why
> it only happens with this one system (my main x86-64
> Linux Ubuntu 16.04.5 box) and not the various others I'm
> running test builds on. But it does seem to be 100%
> reliable with any of these pullreqs with the new test
> driver in them :-(

I'm afraid something in your setup is causing make's stdout to have
O_NONBLOCK set.  Make doesn't use O_NONBLOCK at all, so it must be
something above it.  I also checked Perl with strace and, at least here,
it doesn't set O_NONBLOCK.

So here are some ideas... First, can you try applying something like
this to reproduce?

--- a/Makefile
+++ b/Makefile
@@ -17,9 +17,13 @@ print-%:
 # All following code might depend on configuration variables
 ifneq ($(wildcard config-host.mak),)
 # Put the all: rule here so that config-host.mak can contain dependencies.
-all:
+all: lotsofoutput
 include config-host.mak

+.PHONY: lotsofoutput
+lotsofoutput:
+   yes 1234567890 | head -n 1
+
 git-submodule-update:

 .PHONY: git-submodule-update


And please try applying this, which is a bit of a shot in the dark but
1) it is a good idea anyway; 2) it may help, if not alone, together with
the workarounds below:

diff --git a/scripts/tap-driver.pl b/scripts/tap-driver.pl
index 5e59b5db49..6621a5cd67 100755
--- a/scripts/tap-driver.pl
+++ b/scripts/tap-driver.pl
@@ -313,6 +313,7 @@ sub main ()
   my $iterator = TAP::Parser::Iterator::Stream->new(\*STDIN);
   my $parser = TAP::Parser->new ({iterator => $iterator });

+  STDOUT->autoflush(1);
   while (defined (my $cur = $parser->next))
 {
   # Parsing of TAP input should stop after a "Bail out!" directive.
diff --git a/scripts/tap-merge.pl b/scripts/tap-merge.pl
index 59e3fa5007..10ccf57bb2 100755
--- a/scripts/tap-merge.pl
+++ b/scripts/tap-merge.pl
@@ -53,6 +53,7 @@ sub main ()
   my $testno = 0; # Number of test results seen so far.
   my $bailed_out = 0; # Whether a "Bail out!" directive has been seen.

+  STDOUT->autoflush(1);
   while (defined (my $cur = $parser->next))
 {
   if ($cur->is_bailout)

Possible workarounds include:

- using "make -Oline" or "make -Onone" (for -Oline, it may require the
above autoflush patch).

- running this Python script before invoking make

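import fcntl
import os
# Clear O_NONBLOCK on stdout (fd 1); the flag is shared with whatever
# inherits the same open file, so the make run that follows sees it cleared.
flags = fcntl.fcntl(1, fcntl.F_GETFL)
fcntl.fcntl(1, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)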
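
Before applying the workaround it may help to confirm that the descriptor
really has O_NONBLOCK set. The following is a small sketch using only the
standard fcntl and os modules; the helper names is_nonblocking() and
clear_nonblocking() are hypothetical.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set in the file status flags of fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def clear_nonblocking(fd):
    """Clear O_NONBLOCK on fd, leaving the other status flags untouched."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
```

Calling is_nonblocking(1) from the shell that starts make reports the state
of stdout; if it returns True, clear_nonblocking(1) does the same thing as
the script above.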

Paolo
