Re: [PATCHv7 4/4] virtio_console: Add support for remoteproc serial

2012-11-01 Thread Rusty Russell
Amit Shah  writes:
> On (Tue) 23 Oct 2012 [12:17:49], Rusty Russell wrote:
>> sjur.brandel...@stericsson.com writes:
>> > From: Sjur Brændeland 
>
>> > @@ -1415,7 +1524,16 @@ static void remove_port_data(struct port *port)
>> >  
>> >/* Remove buffers we queued up for the Host to send us data in. */
>> >while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
>> > -  free_buf(buf);
>> > +  free_buf(buf, true);
>> > +
>> > +  /*
>> > +   * Remove buffers from out queue for rproc-serial. We cannot afford
>> > +   * to leak any DMA mem, so reclaim this memory even if this might be
>> > +   * racy for the remote processor.
>> > +   */
>> > +  if (is_rproc_serial(port->portdev->vdev))
>> > +  while ((buf = virtqueue_detach_unused_buf(port->out_vq)))
>> > +  free_buf(buf, true);
>> >  }
>> 
>> This seems wrong; either this is needed even if !is_rproc_serial(), or
>> it's not necessary as the out_vq is empty.
>> 
>> Every path I can see has the device reset (in which case the queues
>> should not be active), or we got a VIRTIO_CONSOLE_PORT_REMOVE event (in
>> which case, the same).
>> 
>> I think we can have non-blocking writes which could leave buffers in
>> out_vq: Amit?
>
> Those get 'reclaimed' just above this hunk:
>
>
> static void remove_port_data(struct port *port)
> {
>   struct port_buffer *buf;
>
>   /* Remove unused data this port might have received. */
>   discard_port_data(port);
>
>   reclaim_consumed_buffers(port);
>
>   /* Remove buffers we queued up for the Host to send us data in. */
>   while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
> free_buf(buf, true);

No, that's pending input buffers, not output buffers.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: linux-next: Tree for Nov 1 (xen)

2012-11-01 Thread Konrad Rzeszutek Wilk
On Thu, Nov 01, 2012 at 10:18:47AM -0700, Randy Dunlap wrote:
> On 10/31/2012 10:36 PM, Stephen Rothwell wrote:
> 
> > Hi all,
> > 
> > New trees: rr-fixes and swiotlb
> > 
> > Changes since 20121031:
> > 
> 
> 
> arch/x86/xen/enlighten.c:109:0: warning: "xen_pvh_domain" redefined
> include/xen/xen.h:23:0: note: this is the location of the previous definition

I think this was due to a patch that was mistakenly put in the tmem.git
tree. It should not surface tomorrow.

> 
> Full randconfig file is attached.
> 
> -- 
> ~Randy

> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86_64 3.7.0-rc3 Kernel Configuration
> #
> CONFIG_64BIT=y
> CONFIG_X86_64=y
> CONFIG_X86=y
> CONFIG_INSTRUCTION_DECODER=y
> CONFIG_OUTPUT_FORMAT="elf64-x86-64"
> CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_LATENCYTOP_SUPPORT=y
> CONFIG_MMU=y
> CONFIG_NEED_DMA_MAP_STATE=y
> CONFIG_NEED_SG_DMA_LENGTH=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_GENERIC_GPIO=y
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> CONFIG_GENERIC_CALIBRATE_DELAY=y
> CONFIG_ARCH_HAS_CPU_RELAX=y
> CONFIG_ARCH_HAS_DEFAULT_IDLE=y
> CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
> CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
> CONFIG_HAVE_SETUP_PER_CPU_AREA=y
> CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
> CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
> CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> CONFIG_ARCH_SUSPEND_POSSIBLE=y
> CONFIG_ZONE_DMA32=y
> CONFIG_AUDIT_ARCH=y
> CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> CONFIG_HAVE_IRQ_WORK=y
> CONFIG_IRQ_WORK=y
> CONFIG_BUILDTIME_EXTABLE_SORT=y
> 
> #
> # General setup
> #
> CONFIG_EXPERIMENTAL=y
> CONFIG_BROKEN_ON_SMP=y
> CONFIG_INIT_ENV_ARG_LIMIT=32
> CONFIG_CROSS_COMPILE=""
> CONFIG_LOCALVERSION=""
> # CONFIG_LOCALVERSION_AUTO is not set
> CONFIG_HAVE_KERNEL_GZIP=y
> CONFIG_HAVE_KERNEL_BZIP2=y
> CONFIG_HAVE_KERNEL_LZMA=y
> CONFIG_HAVE_KERNEL_XZ=y
> CONFIG_HAVE_KERNEL_LZO=y
> # CONFIG_KERNEL_GZIP is not set
> CONFIG_KERNEL_BZIP2=y
> # CONFIG_KERNEL_LZMA is not set
> # CONFIG_KERNEL_XZ is not set
> # CONFIG_KERNEL_LZO is not set
> CONFIG_DEFAULT_HOSTNAME="(none)"
> CONFIG_SYSVIPC=y
> CONFIG_FHANDLE=y
> CONFIG_HAVE_GENERIC_HARDIRQS=y
> 
> #
> # IRQ subsystem
> #
> CONFIG_GENERIC_HARDIRQS=y
> CONFIG_GENERIC_IRQ_PROBE=y
> CONFIG_GENERIC_IRQ_SHOW=y
> CONFIG_IRQ_DOMAIN=y
> # CONFIG_IRQ_DOMAIN_DEBUG is not set
> CONFIG_IRQ_FORCED_THREADING=y
> CONFIG_SPARSE_IRQ=y
> CONFIG_CLOCKSOURCE_WATCHDOG=y
> CONFIG_ARCH_CLOCKSOURCE_DATA=y
> CONFIG_GENERIC_TIME_VSYSCALL=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
> CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
> CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
> CONFIG_GENERIC_CMOS_UPDATE=y
> 
> #
> # Timers subsystem
> #
> CONFIG_TICK_ONESHOT=y
> # CONFIG_NO_HZ is not set
> CONFIG_HIGH_RES_TIMERS=y
> 
> #
> # CPU/Task time and stats accounting
> #
> CONFIG_TICK_CPU_ACCOUNTING=y
> # CONFIG_IRQ_TIME_ACCOUNTING is not set
> # CONFIG_BSD_PROCESS_ACCT is not set
> 
> #
> # RCU Subsystem
> #
> CONFIG_TINY_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> # CONFIG_TREE_RCU_TRACE is not set
> CONFIG_RCU_BOOST=y
> CONFIG_RCU_BOOST_PRIO=1
> CONFIG_RCU_BOOST_DELAY=500
> CONFIG_IKCONFIG=m
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
> CONFIG_CHECKPOINT_RESTORE=y
> CONFIG_NAMESPACES=y
> CONFIG_UTS_NS=y
> # CONFIG_IPC_NS is not set
> # CONFIG_PID_NS is not set
> # CONFIG_SCHED_AUTOGROUP is not set
> CONFIG_SYSFS_DEPRECATED=y
> # CONFIG_SYSFS_DEPRECATED_V2 is not set
> # CONFIG_RELAY is not set
> # CONFIG_BLK_DEV_INITRD is not set
> # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
> CONFIG_ANON_INODES=y
> CONFIG_EXPERT=y
> CONFIG_HAVE_UID16=y
> CONFIG_UID16=y
> CONFIG_SYSCTL_EXCEPTION_TRACE=y
> CONFIG_KALLSYMS=y
> CONFIG_KALLSYMS_ALL=y
> CONFIG_HOTPLUG=y
> # CONFIG_PRINTK is not set
> CONFIG_BUG=y
> CONFIG_ELF_CORE=y
> # CONFIG_PCSPKR_PLATFORM is not set
> CONFIG_HAVE_PCSPKR_PLATFORM=y
> CONFIG_BASE_FULL=y
> CONFIG_FUTEX=y
> # CONFIG_EPOLL is not set
> # CONFIG_SIGNALFD is not set
> # CONFIG_TIMERFD is not set
> # CONFIG_EVENTFD is not set
> # CONFIG_SHMEM is not set
> # CONFIG_AIO is not set
> # CONFIG_EMBEDDED is not set
> CONFIG_HAVE_PERF_EVENTS=y
> 
> #
> # Kernel Performance Events And Counters
> #
> CONFIG_PERF_EVENTS=y
> # CONFIG_DEBUG_PERF_USE_VMALLOC is not set
> CONFIG_VM_EVENT_COUNTERS=y
> CONFIG_SLUB_DEBUG=y
> CONFIG_COMPAT_BRK=y
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> # CONFIG_SLOB is not set
> CONFIG_PROFILING=y
> # CONFIG_OPROFILE is not set
> CONFIG_HAVE_OPROFILE=y
> CONFIG_OPROFILE_NMI_TIMER=y
> CONFIG_KPROBES=y
> CONFIG_JUMP_LABEL=y
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=

[PATCHv3 net-next 5/8] vhost: track zero copy failures using DMA length

2012-11-01 Thread Michael S. Tsirkin
This will be used to disable zerocopy when the error rate
is high.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 7 ---
 drivers/vhost/vhost.h | 4 
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f23cf89..5affce3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -425,7 +425,7 @@ int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
int j = 0;
 
for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if ((vq->heads[i].len == VHOST_DMA_DONE_LEN)) {
+   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
vhost_add_used_and_signal(vq->dev, vq,
  vq->heads[i].id, 0);
@@ -1600,13 +1600,14 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool status)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
 
vhost_poll_queue(&vq->poll);
/* set len to mark this desc buffers done DMA */
-   vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
+   vq->heads[ubuf->desc].len = status ?
+   VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index b6538ee..464469d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -17,6 +17,8 @@
  * For transmit, used buffer len is unused; we override it to track buffer
  * status internally; used for zerocopy tx only.
  */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN   3
 /* Lower device DMA done */
 #define VHOST_DMA_DONE_LEN 2
 /* Lower device DMA in progress */
@@ -24,6 +26,8 @@
 /* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN 0
 
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 struct vhost_device;
 
 struct vhost_work;
-- 
MST
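
For illustration, the len-based status encoding from this patch can be modelled in user space. This is a minimal sketch assuming the constant values above; mark_completion() is a hypothetical stand-in for vhost_zerocopy_callback(), not vhost code. It shows why VHOST_DMA_IS_DONE() compares with >=: both the done and the failed state count as completed, so the descriptor is still returned to the guest when the lower device reports an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in this patch: the used-buffer len doubles as a status. */
#define VHOST_DMA_FAILED_LEN  3
#define VHOST_DMA_DONE_LEN    2
#define VHOST_DMA_IN_PROGRESS 1
#define VHOST_DMA_CLEAR_LEN   0

/* Done and failed are both "completed" from the ring's point of view. */
#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)

/*
 * Hypothetical model of the completion callback after this patch:
 * a non-zero status marks the descriptor failed, zero marks it done.
 * The assert is a check of this model only, not of vhost itself.
 */
static void mark_completion(int *len, int status)
{
	assert(*len == VHOST_DMA_IN_PROGRESS);
	*len = status ? VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
}
```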



[PATCHv3 net-next 4/8] vhost-net: cleanup macros for DMA status tracking

2012-11-01 Thread Michael S. Tsirkin
Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c   |  3 ++-
 drivers/vhost/vhost.c |  2 +-
 drivers/vhost/vhost.h | 12 +---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 072cbba..f80ae5f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -237,7 +237,8 @@ static void handle_tx(struct vhost_net *net)
} else {
struct ubuf_info *ubuf = &vq->ubuf_info[head];
 
-   vq->heads[vq->upend_idx].len = len;
+   vq->heads[vq->upend_idx].len =
+   VHOST_DMA_IN_PROGRESS;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = vq->ubufs;
ubuf->desc = vq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 73d08db..f23cf89 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1606,7 +1606,7 @@ void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool status)
struct vhost_virtqueue *vq = ubufs->vq;
 
vhost_poll_queue(&vq->poll);
-   /* set len = 1 to mark this desc buffers done DMA */
+   /* set len to mark this desc buffers done DMA */
vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2de4ce2..b6538ee 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,9 +13,15 @@
 #include 
 #include 
 
-/* This is for zerocopy, used buffer len is set to 1 when lower device DMA
- * done */
-#define VHOST_DMA_DONE_LEN 1
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN 2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS  1
+/* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN 0
 
 struct vhost_device;
-- 
MST



[PATCHv3 net-next 8/8] vhost-net: reduce vq polling on tx zerocopy

2012-11-01 Thread Michael S. Tsirkin
It seems that, to avoid deadlocks, it is enough to poll the vq before
we are going to use the last buffer. This is faster than
c70aa540c7a9f67add11ad3161096fb95233aa2e.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 93f2d67..28ad775 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -197,8 +197,18 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
-
-   vhost_poll_queue(&vq->poll);
+   int cnt = atomic_read(&ubufs->kref.refcount);
+
+   /*
+* Trigger polling thread if guest stopped submitting new buffers:
+* in this case, the refcount after decrement will eventually reach 1
+* so here it is 2.
+* We also trigger polling periodically after each 16 packets
+* (the value 16 here is more or less arbitrary, it's tuned to trigger
+* less than 10% of times).
+*/
+   if (cnt <= 2 || !(cnt % 16))
+   vhost_poll_queue(&vq->poll);
/* set len to mark this desc buffers done DMA */
vq->heads[ubuf->desc].len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
-- 
MST
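
A minimal sketch of the polling decision in isolation, assuming cnt is the ubufs refcount read before kref_put() as in the hunk above; should_poll() and count_polls() are hypothetical helpers, not vhost code. Once the refcount stays above 2, only multiples of 16 trigger polling, i.e. one in 16 distinct refcount values (6.25%), which is consistent with the "less than 10%" figure in the comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Same condition as the patch: poll when the guest seems to have stopped
 * submitting (refcount about to drop to 1) or on every 16th refcount value. */
static bool should_poll(int cnt)
{
	assert(cnt >= 1);
	return cnt <= 2 || !(cnt % 16);
}

/* Count how many refcount values in [lo, hi] would queue the poll work. */
static int count_polls(int lo, int hi)
{
	int n = 0;

	for (int cnt = lo; cnt <= hi; cnt++)
		if (should_poll(cnt))
			n++;
	return n;
}
```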


[PATCHv3 net-next 7/8] vhost-net: select tx zero copy dynamically

2012-11-01 Thread Michael S. Tsirkin
Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later,
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling the tx zero
copy option leads to higher CPU utilization for guest to
guest and guest to host traffic.

To fix this, suppress zero copy tx after a given number of
packets have triggered a late data copy. Re-enable it
periodically to detect workload changes.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 61 ++---
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 532fc88..93f2d67 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -42,6 +42,21 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Experimental Zero Copy TX");
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
 
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN   3
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN 2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS  1
+/* Buffer unused */
+#define VHOST_DMA_CLEAR_LEN 0
+
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 enum {
VHOST_NET_VQ_RX = 0,
VHOST_NET_VQ_TX = 1,
@@ -62,8 +77,33 @@ struct vhost_net {
 * We only do this when socket buffer fills up.
 * Protected by tx vq lock. */
enum vhost_net_poll_state tx_poll_state;
+   /* Number of TX recently submitted.
+* Protected by tx vq lock. */
+   unsigned tx_packets;
+   /* Number of times zerocopy TX recently failed.
+* Protected by tx vq lock. */
+   unsigned tx_zcopy_err;
 };
 
+static void vhost_net_tx_packet(struct vhost_net *net)
+{
+   ++net->tx_packets;
+   if (net->tx_packets < 1024)
+   return;
+   net->tx_packets = 0;
+   net->tx_zcopy_err = 0;
+}
+
+static void vhost_net_tx_err(struct vhost_net *net)
+{
+   ++net->tx_zcopy_err;
+}
+
+static bool vhost_net_tx_select_zcopy(struct vhost_net *net)
+{
+   return net->tx_packets / 64 >= net->tx_zcopy_err;
+}
+
 static bool vhost_sock_zcopy(struct socket *sock)
 {
return unlikely(experimental_zcopytx) &&
@@ -131,12 +171,15 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
  * guest used idx.
  */
-int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+static int vhost_zerocopy_signal_used(struct vhost_net *net,
+ struct vhost_virtqueue *vq)
 {
int i;
int j = 0;
 
for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
+   if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+   vhost_net_tx_err(net);
if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
vhost_add_used_and_signal(vq->dev, vq,
@@ -150,15 +193,15 @@ int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
return j;
 }
 
-static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
+static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
 
vhost_poll_queue(&vq->poll);
/* set len to mark this desc buffers done DMA */
-   vq->heads[ubuf->desc].len = status ?
-   VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
+   vq->heads[ubuf->desc].len = success ?
+   VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
vhost_ubuf_put(ubufs);
 }
 
@@ -208,7 +251,7 @@ static void handle_tx(struct vhost_net *net)
for (;;) {
/* Release DMAs done buffers first */
if (zcopy)
-   vhost_zerocopy_signal_used(vq);
+   vhost_zerocopy_signal_used(net, vq);
 
head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 ARRAY_SIZE(vq->iov),
@@ -263,7 +306,8 @@ static void handle_tx(struct vhost_net *net)
/* use msg_control to pass vhost zerocopy ubuf info to skb */
if (zcopy) {
vq->heads[vq->upend_idx].id = head;
-   if (len < VHOST_GOODCOPY_LEN) {
+   if (!vhost_net_tx_select_zcopy(net) ||
+   len < VHOST_GOODCOPY_LEN) {
/* copy don't need to wait for DMA done */
vq->heads[vq->upend_idx].len =
VHOST_DMA_DONE_LEN;
@@ -305,8 +349,9 @@ static void handle_tx(struct
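
The selection heuristic added by this patch can be exercised on its own. This is a sketch with a hypothetical struct tx_stats in place of struct vhost_net, and it assumes vhost_net_tx_packet() is called once per transmitted packet (the hunk adding that call is truncated in this archive). A single error at the start of a window disables zero copy until 64 packets have been counted, and the reset every 1024 packets is what re-enables it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters added to struct vhost_net. */
struct tx_stats {
	unsigned tx_packets;   /* packets in the current window */
	unsigned tx_zcopy_err; /* zero copy failures in the current window */
};

/* Same logic as vhost_net_tx_packet(): restart the window every 1024 packets. */
static void tx_packet(struct tx_stats *s)
{
	++s->tx_packets;
	if (s->tx_packets < 1024)
		return;
	s->tx_packets = 0;
	s->tx_zcopy_err = 0;
}

/* Same logic as vhost_net_tx_err(). */
static void tx_err(struct tx_stats *s)
{
	++s->tx_zcopy_err;
}

/* Same test as vhost_net_tx_select_zcopy(): keep zero copy while there is
 * at most one failure per 64 packets in the current window. */
static bool tx_select_zcopy(const struct tx_stats *s)
{
	return s->tx_packets / 64 >= s->tx_zcopy_err;
}
```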

[PATCHv3 net-next 6/8] vhost: move -net specific code out

2012-11-01 Thread Michael S. Tsirkin
Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c   | 45 
 drivers/vhost/tcm_vhost.c |  1 +
 drivers/vhost/vhost.c | 53 +++
 drivers/vhost/vhost.h | 21 +++
 4 files changed, 56 insertions(+), 64 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f80ae5f..532fc88 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -126,6 +126,42 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
net->tx_poll_state = VHOST_NET_POLL_STARTED;
 }
 
+/* In case of DMA done not in order in lower device driver for some reason.
+ * upend_idx is used to track end of used idx, done_idx is used to track head
+ * of used idx. Once lower device DMA done contiguously, we will signal KVM
+ * guest used idx.
+ */
+int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+{
+   int i;
+   int j = 0;
+
+   for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
+   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
+   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+   vhost_add_used_and_signal(vq->dev, vq,
+ vq->heads[i].id, 0);
+   ++j;
+   } else
+   break;
+   }
+   if (j)
+   vq->done_idx = i;
+   return j;
+}
+
+static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
+{
+   struct vhost_ubuf_ref *ubufs = ubuf->ctx;
+   struct vhost_virtqueue *vq = ubufs->vq;
+
+   vhost_poll_queue(&vq->poll);
+   /* set len to mark this desc buffers done DMA */
+   vq->heads[ubuf->desc].len = status ?
+   VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
+   vhost_ubuf_put(ubufs);
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -594,9 +630,18 @@ static int vhost_net_release(struct inode *inode, struct file *f)
struct vhost_net *n = f->private_data;
struct socket *tx_sock;
struct socket *rx_sock;
+   int i;
 
vhost_net_stop(n, &tx_sock, &rx_sock);
vhost_net_flush(n);
+   vhost_dev_stop(&n->dev);
+   for (i = 0; i < n->dev.nvqs; ++i) {
+   /* Wait for all lower device DMAs done. */
+   if (n->dev.vqs[i].ubufs)
+   vhost_ubuf_put_and_wait(n->dev.vqs[i].ubufs);
+
+   vhost_zerocopy_signal_used(n, &n->dev.vqs[i]);
+   }
vhost_dev_cleanup(&n->dev, false);
if (tx_sock)
fput(tx_sock->file);
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index aa31692..23c138f 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -895,6 +895,7 @@ static int vhost_scsi_release(struct inode *inode, struct file *f)
vhost_scsi_clear_endpoint(s, &backend);
}
 
+   vhost_dev_stop(&s->dev);
vhost_dev_cleanup(&s->dev, false);
kfree(s);
return 0;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5affce3..ef8f598 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -26,10 +26,6 @@
 #include 
 #include 
 
-#include 
-#include 
-#include 
-
 #include "vhost.h"
 
 enum {
@@ -414,28 +410,16 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
return 0;
 }
 
-/* In case of DMA done not in order in lower device driver for some reason.
- * upend_idx is used to track end of used idx, done_idx is used to track head
- * of used idx. Once lower device DMA done contiguously, we will signal KVM
- * guest used idx.
- */
-int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+void vhost_dev_stop(struct vhost_dev *dev)
 {
int i;
-   int j = 0;
-
-   for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
-   vhost_add_used_and_signal(vq->dev, vq,
- vq->heads[i].id, 0);
-   ++j;
-   } else
-   break;
+
+   for (i = 0; i < dev->nvqs; ++i) {
+   if (dev->vqs[i].kick && dev->vqs[i].handle_kick) {
+   vhost_poll_stop(&dev->vqs[i].poll);
+   vhost_poll_flush(&dev->vqs[i].poll);
+   }
}
-   if (j)
-   vq->done_idx = i;
-   return j;
 }
 
 /* Caller should have device mutex if and only if locked is set */
@@ -444,17 +428,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
int i;
 
for (i = 0; i < dev->nvqs; ++i) {
-
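
The in-order completion rule described in the comment on vhost_zerocopy_signal_used() in this patch can be modelled with a small ring. This is a user-space sketch: RING_SIZE stands in for UIO_MAXIOV, and struct ring and signal_used() are hypothetical names. Slots that complete out of order are held back until every earlier slot has completed, so the guest always sees used entries in submission order.

```c
#include <assert.h>

#define RING_SIZE 8                      /* stands in for UIO_MAXIOV */
#define DMA_DONE  2                      /* values as in the patch */
#define DMA_CLEAR 0
#define DMA_IS_DONE(len) ((len) >= DMA_DONE)

struct ring {
	int len[RING_SIZE]; /* per-slot DMA status */
	int done_idx;       /* first slot not yet reported to the guest */
	int upend_idx;      /* one past the last submitted slot */
};

/* Report only the contiguous run of completed slots starting at done_idx;
 * a slot that finished early waits behind an unfinished one. */
static int signal_used(struct ring *r)
{
	int i, j = 0;

	for (i = r->done_idx; i != r->upend_idx; i = (i + 1) % RING_SIZE) {
		if (!DMA_IS_DONE(r->len[i]))
			break;
		/* the real code also calls vhost_add_used_and_signal() here */
		r->len[i] = DMA_CLEAR;
		++j;
	}
	if (j)
		r->done_idx = i;
	return j;
}
```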

[PATCHv3 net-next 3/8] tun: report orphan frags errors to zero copy callback

2012-11-01 Thread Michael S. Tsirkin
When tun transmits a zero copy skb, it orphans the frags,
which might need to allocate extra memory in atomic context.
If that fails, notify the ubufs callback before freeing the skb,
as a hint that the device should disable zerocopy mode.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9e28768..b44d7b7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -728,6 +728,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 drop:
dev->stats.tx_dropped++;
+   skb_tx_error(skb);
kfree_skb(skb);
rcu_read_unlock();
return NETDEV_TX_OK;
-- 
MST



[PATCHv3 net-next 2/8] skb: api to report errors for zero copy skbs

2012-11-01 Thread Michael S. Tsirkin
Orphaning frags for zero copy skbs needs to allocate data in atomic
context, so it has a chance to fail. If it does, we currently discard
the skb, which is safe, but we don't report anything to the caller,
so it cannot recover by e.g. disabling zero copy.

Add an API to report such errors before the skb is freed: this is used
by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c  | 20 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e5eae5b..f2af494 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -568,6 +568,7 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 }
 
 extern void kfree_skb(struct sk_buff *skb);
+extern void skb_tx_error(struct sk_buff *skb);
 extern void consume_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4abdf71..d9addea 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -635,6 +635,26 @@ void kfree_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(kfree_skb);
 
 /**
+ * skb_tx_error - report an sk_buff xmit error
+ * @skb: buffer that triggered an error
+ *
+ * Report xmit error if a device callback is tracking this skb.
+ * skb must be freed afterwards.
+ */
+void skb_tx_error(struct sk_buff *skb)
+{
+   if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
+   struct ubuf_info *uarg;
+
+   uarg = skb_shinfo(skb)->destructor_arg;
+   if (uarg->callback)
+   uarg->callback(uarg, false);
+   skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+   }
+}
+EXPORT_SYMBOL(skb_tx_error);
+
+/**
  * consume_skb - free an skbuff
  * @skb: buffer to free
  *
-- 
MST
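
The interaction between skb_tx_error() and the normal release path can be sketched in user space. struct fake_skb and its fields are hypothetical stand-ins for the SKBTX_DEV_ZEROCOPY flag and destructor_arg, and the callback already takes the bool argument introduced in patch 1/8. Clearing the flag after reporting the failure is what keeps skb_release_data() from invoking the callback a second time with zerocopy_success set to true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of struct ubuf_info, with counters for this sketch only. */
struct ubuf_info_model {
	void (*callback)(struct ubuf_info_model *, bool zerocopy_success);
	int ok;
	int failed;
};

struct fake_skb {
	bool dev_zerocopy;              /* models SKBTX_DEV_ZEROCOPY */
	struct ubuf_info_model *uarg;   /* models destructor_arg */
};

static void count_cb(struct ubuf_info_model *u, bool success)
{
	if (success)
		u->ok++;
	else
		u->failed++;
}

/* Same shape as skb_tx_error(): report failure, then clear the flag. */
static void tx_error(struct fake_skb *skb)
{
	if (skb->dev_zerocopy) {
		if (skb->uarg->callback)
			skb->uarg->callback(skb->uarg, false);
		skb->dev_zerocopy = false;
	}
}

/* Same shape as the zero copy branch of skb_release_data(). */
static void release(struct fake_skb *skb)
{
	if (skb->dev_zerocopy && skb->uarg->callback)
		skb->uarg->callback(skb->uarg, true);
}
```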



[PATCHv3 net-next 1/8] skb: report completion status for zero copy skbs

2012-11-01 Thread Michael S. Tsirkin
Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through the zero copy callback:
if this happens a lot, the device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c  | 2 +-
 drivers/vhost/vhost.h  | 2 +-
 include/linux/skbuff.h | 4 +++-
 net/core/skbuff.c  | 4 ++--
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 99ac2cb..73d08db 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1600,7 +1600,7 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool status)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 1125af3..2de4ce2 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -191,7 +191,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
unsigned int log_num, u64 len);
-void vhost_zerocopy_callback(struct ubuf_info *);
+void vhost_zerocopy_callback(struct ubuf_info *, bool);
 int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
 
 #define vq_err(vq, fmt, ...) do {  \
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a2a0bdb..e5eae5b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -235,11 +235,13 @@ enum {
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
+ * The zerocopy_success argument is true if zero copy transmit occurred,
+ * false on data copy or out of memory error caused by data copy attempt.
  * The ctx field is used to track device context.
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-   void (*callback)(struct ubuf_info *);
+   void (*callback)(struct ubuf_info *, bool zerocopy_success);
void *ctx;
unsigned long desc;
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6e04b1f..4abdf71 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -519,7 +519,7 @@ static void skb_release_data(struct sk_buff *skb)
 
uarg = skb_shinfo(skb)->destructor_arg;
if (uarg->callback)
-   uarg->callback(uarg);
+   uarg->callback(uarg, true);
}
 
if (skb_has_frag_list(skb))
@@ -797,7 +797,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
for (i = 0; i < num_frags; i++)
skb_frag_unref(skb, i);
 
-   uarg->callback(uarg);
+   uarg->callback(uarg, false);
 
/* skb frags point to kernel buffers */
for (i = num_frags - 1; i >= 0; i--) {
-- 
MST



[PATCHv3 net-next 0/8] enable/disable zero copy tx dynamically

2012-11-01 Thread Michael S. Tsirkin

tun supports zero copy transmit since 0690899b4d4501b3505be069b9a687e68ccbe15b;
however, you can only enable this mode if you know your workload does not
trigger heavy guest to host/host to guest traffic, otherwise you
get a (minor) performance regression.
This patchset addresses this problem by notifying the owner
device when the callback is invoked because of a data copy.
This makes it possible to detect dynamically whether zero copy is
appropriate: we start in zero copy mode, and when we detect
copied data we disable zero copy for a while.

With this patch applied, I get the same performance for
guest to host and guest to guest both with and without zero copy tx.

Changes from v2:
  change callback parameter from int to bool
  accordingly, drop err parameter from skb_tx_error

Changes from v1:
  Comment fixups in patches 2 and 8 suggested by Vlad Yasevich,
 no changes to other patches

Michael S. Tsirkin (8):
  skb: report completion status for zero copy skbs
  skb: api to report errors for zero copy skbs
  tun: report orphan frags errors to zero copy callback
  vhost-net: cleanup macros for DMA status tracking
  vhost: track zero copy failures using DMA length
  vhost: move -net specific code out
  vhost-net: select tx zero copy dynamically
  vhost-net: reduce vq polling on tx zerocopy

 drivers/net/tun.c |   1 +
 drivers/vhost/net.c   | 111 +++---
 drivers/vhost/tcm_vhost.c |   1 +
 drivers/vhost/vhost.c |  52 +++---
 drivers/vhost/vhost.h |  11 ++---
 include/linux/skbuff.h|   5 ++-
 net/core/skbuff.c |  24 +-
 7 files changed, 144 insertions(+), 61 deletions(-)

-- 
MST


[PATCH 12/12] VMCI: Some header and config files.

2012-11-01 Thread George Zhang
VMCI header and config patch: adds all the necessary files to enable building
the VMCI module with the Linux Makefile and Kconfig systems. Also adds the header
files used for building modules against the driver.


Signed-off-by: George Zhang 
---
 drivers/misc/Kconfig|1 
 drivers/misc/Makefile   |2 
 drivers/misc/vmw_vmci/Kconfig   |   16 +
 drivers/misc/vmw_vmci/Makefile  |4 
 drivers/misc/vmw_vmci/vmci_common_int.h |   34 +
 include/linux/vmw_vmci_api.h|   82 +++
 include/linux/vmw_vmci_defs.h   |  973 +++
 7 files changed, 1112 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/Kconfig
 create mode 100644 drivers/misc/vmw_vmci/Makefile
 create mode 100644 drivers/misc/vmw_vmci/vmci_common_int.h
 create mode 100644 include/linux/vmw_vmci_api.h
 create mode 100644 include/linux/vmw_vmci_defs.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 2661f6e..fe38c7a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -517,4 +517,5 @@ source "drivers/misc/lis3lv02d/Kconfig"
 source "drivers/misc/carma/Kconfig"
 source "drivers/misc/altera-stapl/Kconfig"
 source "drivers/misc/mei/Kconfig"
+source "drivers/misc/vmw_vmci/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 456972f..21ed953 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -51,3 +51,5 @@ obj-y += carma/
 obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
 obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
 obj-$(CONFIG_INTEL_MEI)+= mei/
+obj-$(CONFIG_MAX8997_MUIC) += max8997-muic.o
+obj-$(CONFIG_VMWARE_VMCI)  += vmw_vmci/
diff --git a/drivers/misc/vmw_vmci/Kconfig b/drivers/misc/vmw_vmci/Kconfig
new file mode 100644
index 000..55015e7
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Kconfig
@@ -0,0 +1,16 @@
+#
+# VMware VMCI device
+#
+
+config VMWARE_VMCI
+   tristate "VMware VMCI Driver"
+   depends on X86
+   help
+ This is VMware's Virtual Machine Communication Interface.  It enables
+ high-speed communication between host and guest in a virtual
+ environment via the VMCI virtual device.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called vmw_vmci.
diff --git a/drivers/misc/vmw_vmci/Makefile b/drivers/misc/vmw_vmci/Makefile
new file mode 100644
index 000..4da9893
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o
+vmw_vmci-y += vmci_context.o vmci_datagram.o vmci_doorbell.o \
+   vmci_driver.o vmci_event.o vmci_guest.o vmci_handle_array.o \
+   vmci_host.o vmci_queue_pair.o vmci_resource.o vmci_route.o
diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h b/drivers/misc/vmw_vmci/vmci_common_int.h
new file mode 100644
index 000..77667ec
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_common_int.h
@@ -0,0 +1,34 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_COMMONINT_H_
+#define _VMCI_COMMONINT_H_
+
+#include 
+
+#define ASSERT(cond) BUG_ON(!(cond))
+
+#define PCI_VENDOR_ID_VMWARE   0x15AD
+#define PCI_DEVICE_ID_VMWARE_VMCI  0x0740
+#define VMCI_DRIVER_VERSION_STRING "1.0.0.0-k"
+#define MODULE_NAME "vmw_vmci"
+
+/* Print magic... whee! */
+#ifdef pr_fmt
+#undef pr_fmt
+#define pr_fmt(fmt) MODULE_NAME ": " fmt
+#endif
+
+#endif /* _VMCI_COMMONINT_H_ */
diff --git a/include/linux/vmw_vmci_api.h b/include/linux/vmw_vmci_api.h
new file mode 100644
index 000..193129d
--- /dev/null
+++ b/include/linux/vmw_vmci_api.h
@@ -0,0 +1,82 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef __VMW_VMCI_API_H__
+#define __VMW_VMCI_API_H__
+
+#include 
+#include 
+
+#undef  VMCI_KERNEL_API_VERSION
+#define VMCI_KERNEL_API_VERSION_1 1
+#define VMCI_KERNEL_API_VERSION_2 2
+#define VMCI_KERNEL_API_VERSIO

[PATCH 11/12] VMCI: host side driver implementation.

2012-11-01 Thread George Zhang
VMCI host side driver code implementation.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_host.c | 1036 +
 1 files changed, 1036 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_host.c

diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
new file mode 100644
index 000..82a8ef9
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -0,0 +1,1036 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_handle_array.h"
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+enum {
+   VMCI_NOTIFY_RESOURCE_QUEUE_PAIR = 0,
+   VMCI_NOTIFY_RESOURCE_DOOR_BELL = 1,
+};
+
+enum {
+   VMCI_NOTIFY_RESOURCE_ACTION_NOTIFY = 0,
+   VMCI_NOTIFY_RESOURCE_ACTION_CREATE = 1,
+   VMCI_NOTIFY_RESOURCE_ACTION_DESTROY = 2,
+};
+
+/*
+ * VMCI driver initialization. This block can also be used to
+ * pass initial group membership etc.
+ */
+struct vmci_init_blk {
+   u32 cid;
+   u32 flags;
+};
+
+/* VMCIqueue_pairAllocInfo_VMToVM */
+struct vmci_qp_alloc_info_vmvm {
+   struct vmci_handle handle;
+   u32 peer;
+   u32 flags;
+   u64 produce_size;
+   u64 consume_size;
+   u64 produce_page_file;/* User VA. */
+   u64 consume_page_file;/* User VA. */
+   u64 produce_page_file_size;  /* Size of the file name array. */
+   u64 consume_page_file_size;  /* Size of the file name array. */
+   s32 result;
+   u32 _pad;
+};
+
+/* VMCISetNotifyInfo: Used to pass notify flag's address to the host driver. */
+struct vmci_set_notify_info {
+   u64 notify_uva;
+   s32 result;
+   u32 _pad;
+};
+
+/*
+ * Per-instance host state
+ */
+struct vmci_host_dev {
+   struct vmci_ctx *context;
+   int user_version;
+   enum vmci_obj_type ct_type;
+   struct mutex lock;  /* Mutex lock for vmci context access */
+};
+
+static struct vmci_ctx *host_context;
+static bool vmci_host_device_initialized;
+static atomic_t vmci_host_active_users = ATOMIC_INIT(0);
+
+/*
+ * Determines whether the VMCI host personality is
+ * available. Since the core functionality of the host driver is
+ * always present, all guests could possibly use the host
+ * personality. However, to minimize the deviation from the
+ * pre-unified driver state of affairs, we only consider the host
+ * device active if there is no active guest device or if there
+ * are VMX'en with active VMCI contexts using the host device.
+ */
+bool vmci_host_code_active(void)
+{
+   return vmci_host_device_initialized &&
+   (!vmci_guest_code_active() ||
+atomic_read(&vmci_host_active_users) > 0);
+}
+
+/*
+ * Called on open of /dev/vmci.
+ */
+static int vmci_host_open(struct inode *inode, struct file *filp)
+{
+   struct vmci_host_dev *vmci_host_dev;
+
+   vmci_host_dev = kzalloc(sizeof(struct vmci_host_dev), GFP_KERNEL);
+   if (vmci_host_dev == NULL)
+   return -ENOMEM;
+
+   vmci_host_dev->ct_type = VMCIOBJ_NOT_SET;
+   mutex_init(&vmci_host_dev->lock);
+   filp->private_data = vmci_host_dev;
+
+   return 0;
+}
+
+/*
+ * Called on close of /dev/vmci, most often when the process
+ * exits.
+ */
+static int vmci_host_close(struct inode *inode, struct file *filp)
+{
+   struct vmci_host_dev *vmci_host_dev = filp->private_data;
+
+   if (vmci_host_dev->ct_type == VMCIOBJ_CONTEXT) {
+   vmci_ctx_destroy(vmci_host_dev->context);
+   vmci_host_dev->context = NULL;
+
+   /*
+* The number of active contexts is used to track whether any
+* VMX'en are using the host personality. It is incremented when
+* a context is created through the IOCTL_VMCI_INIT_CONTEXT
+* ioctl.
+*/
+   atomic_dec(&vmci_host_active_users);
+   }
+   vmci_host_dev->ct_type = VMCIOBJ_NOT_SET;
+
+   kfree(vmci_host_dev);
+   filp->private_data = NULL;
+   return 0;
+}
+
+/*
+ * This is us
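The rule in vmci_host_code_active() above is easy to misread: the host personality is reported as active when the host device is initialized and either no guest device exists or at least one VMX holds a context opened through /dev/vmci. The following userspace sketch models that rule under stated assumptions; the globals are hypothetical stand-ins for vmci_host_device_initialized, vmci_host_active_users and vmci_guest_code_active(), and C11 atomics replace the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's state (not kernel symbols). */
static bool host_device_initialized;
static atomic_int host_active_users;   /* zero-initialized (static storage) */
static bool guest_device_present;

/* Active if the host device is up and either there is no guest device
 * or some VMX currently holds a host context. */
static bool host_code_active(void)
{
	return host_device_initialized &&
	       (!guest_device_present ||
		atomic_load(&host_active_users) > 0);
}

/* Models IOCTL_VMCI_INIT_CONTEXT incrementing the user count. */
static void context_created(void)
{
	atomic_fetch_add(&host_active_users, 1);
}

/* Models vmci_host_close() dropping the count for a VMCIOBJ_CONTEXT. */
static void context_destroyed(void)
{
	int prev = atomic_fetch_sub(&host_active_users, 1);

	assert(prev > 0);   /* the count must never go negative */
}
```

With a guest device present, the host side only counts as active while a context is open; without one, it is active as soon as the device is initialized.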

[PATCH 10/12] VMCI: guest side driver implementation.

2012-11-01 Thread George Zhang
VMCI guest side driver code implementation.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_guest.c |  762 
 1 files changed, 762 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_guest.c

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
new file mode 100644
index 000..eedbe4d
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -0,0 +1,762 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+static bool vmci_disable_msi;
+module_param_named(disable_msi, vmci_disable_msi, bool, 0);
+MODULE_PARM_DESC(disable_msi, "Disable MSI use in driver - (default=0)");
+
+static bool vmci_disable_msix;
+module_param_named(disable_msix, vmci_disable_msix, bool, 0);
+MODULE_PARM_DESC(disable_msix, "Disable MSI-X use in driver - (default=0)");
+
+static u32 ctx_update_sub_id = VMCI_INVALID_ID;
+static u32 vm_context_id = VMCI_INVALID_ID;
+
+struct vmci_guest_device {
+   struct device *dev; /* PCI device we are attached to */
+   void __iomem *iobase;
+
+   unsigned int irq;
+   unsigned int intr_type;
+   bool exclusive_vectors;
+   struct msix_entry msix_entries[VMCI_MAX_INTRS];
+
+   struct tasklet_struct datagram_tasklet;
+   struct tasklet_struct bm_tasklet;
+
+   void *data_buffer;
+   void *notification_bitmap;
+};
+
+/* vmci_dev singleton device and supporting data*/
+static struct vmci_guest_device *vmci_dev_g;
+static DEFINE_SPINLOCK(vmci_dev_spinlock);
+
+static atomic_t vmci_num_guest_devices = ATOMIC_INIT(0);
+
+bool vmci_guest_code_active(void)
+{
+   return atomic_read(&vmci_num_guest_devices) != 0;
+}
+
+u32 vmci_get_vm_context_id(void)
+{
+   if (vm_context_id == VMCI_INVALID_ID) {
+   u32 result;
+   struct vmci_datagram get_cid_msg;
+   get_cid_msg.dst =
+   vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+VMCI_GET_CONTEXT_ID);
+   get_cid_msg.src = VMCI_ANON_SRC_HANDLE;
+   get_cid_msg.payload_size = 0;
+   result = vmci_send_datagram(&get_cid_msg);
+   if (result >= 0)
+   vm_context_id = result;
+   }
+   return vm_context_id;
+}
+
+/*
+ * VM to hypervisor call mechanism. We use the standard VMware naming
+ * convention since shared code is calling this function as well.
+ */
+int vmci_send_datagram(struct vmci_datagram *dg)
+{
+   unsigned long flags;
+   int result;
+
+   /* Check args. */
+   if (dg == NULL)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   /*
+* Need to acquire spinlock on the device because the datagram
+* data may be spread over multiple pages and the monitor may
+* interleave device user rpc calls from multiple
+* VCPUs. Acquiring the spinlock precludes that
+* possibility. Disabling interrupts to avoid incoming
+* datagrams during a "rep out" and possibly landing up in
+* this function.
+*/
+   spin_lock_irqsave(&vmci_dev_spinlock, flags);
+
+   if (vmci_dev_g) {
+   iowrite8_rep(vmci_dev_g->iobase + VMCI_DATA_OUT_ADDR,
+dg, VMCI_DG_SIZE(dg));
+   result = ioread32(vmci_dev_g->iobase + VMCI_RESULT_LOW_ADDR);
+   } else {
+   result = VMCI_ERROR_UNAVAILABLE;
+   }
+
+   spin_unlock_irqrestore(&vmci_dev_spinlock, flags);
+
+   return result;
+}
+EXPORT_SYMBOL_GPL(vmci_send_datagram);
+
+/*
+ * Gets called with the new context id when it is updated or the VM
+ * is resumed.
+ */
+static void vmci_guest_cid_update(u32 sub_id,
+ const struct vmci_event_data *event_data,
+ void *client_data)
+{
+   const struct vmci_event_payld_ctx *ev_payload =
+   vmci_event_data_const_payload(event_data);
+
+   if (sub_id != ctx_update_sub_id) {
+   pr_devel("Invalid subscriber (ID=0x%x).\n", sub_id);
+   return;
+   }
+
+   if (!even
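vmci_get_vm_context_id() above queries the hypervisor lazily and caches only a non-negative result, so a failed query is simply retried on the next call. A minimal userspace sketch of that pattern follows; query_context_id() and its fake result are hypothetical stand-ins for sending the VMCI_GET_CONTEXT_ID datagram, and INVALID_ID mirrors VMCI_INVALID_ID.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_ID 0xFFFFFFFFu   /* stands in for VMCI_INVALID_ID */

/* Hypothetical hypercall stub; counts calls so caching is observable. */
static int query_calls;
static int32_t fake_hypercall_result = 42;

static int32_t query_context_id(void)
{
	query_calls++;
	return fake_hypercall_result;
}

static uint32_t cached_cid = INVALID_ID;

/* Query once; cache only successful (non-negative) results. */
static uint32_t get_vm_context_id(void)
{
	if (cached_cid == INVALID_ID) {
		int32_t result = query_context_id();

		if (result >= 0)
			cached_cid = (uint32_t)result;
	}
	return cached_cid;
}
```

In the driver the cached value can also change later, which is what the context-update event handler (vmci_guest_cid_update()) is for; this sketch does not model that path.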

[PATCH 09/12] VMCI: routing implementation.

2012-11-01 Thread George Zhang
VMCI routing code is responsible for routing between various hosts/guests as
well as routing in nested scenarios.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_route.c |  233 
 drivers/misc/vmw_vmci/vmci_route.h |   30 +
 2 files changed, 263 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_route.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_route.h

diff --git a/drivers/misc/vmw_vmci/vmci_route.c b/drivers/misc/vmw_vmci/vmci_route.c
new file mode 100644
index 000..7917d5e
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_route.c
@@ -0,0 +1,233 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_route.h"
+
+/*
+ * Make a routing decision for the given source and destination handles.
+ * This will try to determine the route using the handles and the available
+ * devices.  Will set the source context if it is invalid.
+ */
+int vmci_route(struct vmci_handle *src,
+  const struct vmci_handle *dst,
+  bool from_guest,
+  enum vmci_route *route)
+{
+   bool has_host_device = vmci_host_code_active();
+   bool has_guest_device = vmci_guest_code_active();
+
+   *route = VMCI_ROUTE_NONE;
+
+   /*
+* "from_guest" is only ever set to true by
+* IOCTL_VMCI_DATAGRAM_SEND (or by the vmkernel equivalent),
+* which comes from the VMX, so we know it is coming from a
+* guest.
+*
+* To avoid inconsistencies, test these once.  We will test
+* them again when we do the actual send to ensure that we do
+* not touch a non-existent device.
+*/
+
+   /* Must have a valid destination context. */
+   if (VMCI_INVALID_ID == dst->context)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   /* Anywhere to hypervisor. */
+   if (VMCI_HYPERVISOR_CONTEXT_ID == dst->context) {
+
+   /*
+* If this message already came from a guest then we
+* cannot send it to the hypervisor.  It must come
+* from a local client.
+*/
+   if (from_guest)
+   return VMCI_ERROR_DST_UNREACHABLE;
+
+   /*
+* We must be acting as a guest in order to send to
+* the hypervisor.
+*/
+   if (!has_guest_device)
+   return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+   /* And we cannot send if the source is the host context. */
+   if (VMCI_HOST_CONTEXT_ID == src->context)
+   return VMCI_ERROR_INVALID_ARGS;
+
+   /*
+* If the client passed the ANON source handle then
+* respect it (both context and resource are invalid).
+* However, if they passed only an invalid context,
+* then they probably mean ANY, in which case we
+* should set the real context here before passing it
+* down.
+*/
+   if (VMCI_INVALID_ID == src->context &&
+   VMCI_INVALID_ID != src->resource)
+   src->context = vmci_get_context_id();
+
+   /* Send from local client down to the hypervisor. */
+   *route = VMCI_ROUTE_AS_GUEST;
+   return VMCI_SUCCESS;
+   }
+
+   /* Anywhere to local client on host. */
+   if (VMCI_HOST_CONTEXT_ID == dst->context) {
+   /*
+* If it is not from a guest but we are acting as a
+* guest, then we need to send it down to the host.
+* Note that if we are also acting as a host then this
+* will prevent us from sending from local client to
+* local client, but we accept that restriction as a
+* way to remove any ambiguity from the host context.
+*/
+   if (src->context == VMCI_HYPERVISOR_CONTEXT_ID) {
+   /*
+* If the hypervisor is the source, this is
+* host local communication. The hypervisor
+* may send vmci event datagrams to the host
+* itself, but it will never send datagrams to
+* an "outer host" t
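The "anywhere to hypervisor" branch of vmci_route() is a short chain of checks. The sketch below models only that branch in userspace; the numeric constants and error codes are illustrative assumptions (the real values live in vmw_vmci_defs.h), and every other destination returns a hypothetical ERR_UNHANDLED because the remaining branches are cut off in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; see vmw_vmci_defs.h for the real ones. */
#define INVALID_ID            0xFFFFFFFFu
#define HYPERVISOR_CONTEXT_ID 0u
#define HOST_CONTEXT_ID       2u

enum route { ROUTE_NONE, ROUTE_AS_GUEST };
enum { OK = 0, ERR_INVALID_ARGS = -1, ERR_DST_UNREACHABLE = -2,
       ERR_DEVICE_NOT_FOUND = -3, ERR_UNHANDLED = -4 };

struct handle { uint32_t context; uint32_t resource; };

static int route_to_hypervisor(struct handle *src, const struct handle *dst,
			       bool from_guest, bool has_guest_device,
			       uint32_t my_context_id, enum route *route)
{
	*route = ROUTE_NONE;

	if (dst->context == INVALID_ID)
		return ERR_INVALID_ARGS;
	if (dst->context != HYPERVISOR_CONTEXT_ID)
		return ERR_UNHANDLED;          /* not modelled here */

	if (from_guest)                        /* guest traffic cannot go down again */
		return ERR_DST_UNREACHABLE;
	if (!has_guest_device)                 /* only a guest reaches the hypervisor */
		return ERR_DEVICE_NOT_FOUND;
	if (src->context == HOST_CONTEXT_ID)
		return ERR_INVALID_ARGS;

	/* Invalid context plus valid resource means "any": fill it in.
	 * A fully anonymous source is passed through untouched. */
	if (src->context == INVALID_ID && src->resource != INVALID_ID)
		src->context = my_context_id;

	*route = ROUTE_AS_GUEST;
	return OK;
}
```

Note that the source handle is rewritten in place, which is why vmci_route() takes src as a non-const pointer.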

[PATCH 08/12] VMCI: resource object implementation.

2012-11-01 Thread George Zhang
VMCI resource tracks all used resources within the vmci code.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_resource.c |  232 +
 drivers/misc/vmw_vmci/vmci_resource.h |   59 
 2 files changed, 291 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_resource.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_resource.h

diff --git a/drivers/misc/vmw_vmci/vmci_resource.c b/drivers/misc/vmw_vmci/vmci_resource.c
new file mode 100644
index 000..b241dde
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_resource.c
@@ -0,0 +1,232 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_resource.h"
+#include "vmci_driver.h"
+
+
+#define VMCI_RESOURCE_HASH_BITS 7
+#define VMCI_RESOURCE_HASH_BUCKETS  (1 << VMCI_RESOURCE_HASH_BITS)
+
+struct vmci_hash_table {
+   spinlock_t lock;
+   struct hlist_head entries[VMCI_RESOURCE_HASH_BUCKETS];
+};
+
+static struct vmci_hash_table vmci_resource_table = {
+   .lock = __SPIN_LOCK_UNLOCKED(vmci_resource_table.lock),
+};
+
+static unsigned int vmci_resource_hash(struct vmci_handle handle)
+{
+   return hash_32(VMCI_HANDLE_TO_RESOURCE_ID(handle),
+  VMCI_RESOURCE_HASH_BITS);
+}
+
+/*
+ * Gets a resource (if one exists) matching given handle from the hash table.
+ */
+static struct vmci_resource *vmci_resource_lookup(struct vmci_handle handle)
+{
+   struct vmci_resource *r, *resource = NULL;
+   struct hlist_node *node;
+   unsigned int idx = vmci_resource_hash(handle);
+
+   rcu_read_lock();
+   hlist_for_each_entry_rcu(r, node,
+&vmci_resource_table.entries[idx], node) {
+   u32 rid = VMCI_HANDLE_TO_RESOURCE_ID(r->handle);
+   u32 cid = VMCI_HANDLE_TO_CONTEXT_ID(r->handle);
+
+   if (rid == VMCI_HANDLE_TO_RESOURCE_ID(handle) &&
+   (cid == VMCI_HANDLE_TO_CONTEXT_ID(handle) ||
+cid == VMCI_INVALID_ID)) {
+   resource = r;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return resource;
+}
+
+/*
+ * Find an unused resource ID and return it. The first
+ * VMCI_RESERVED_RESOURCE_ID_MAX are reserved so we start from
+ * its value + 1.
+ * Returns VMCI resource id on success, VMCI_INVALID_ID on failure.
+ */
+static u32 vmci_resource_find_id(u32 context_id)
+{
+   static u32 resource_id = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+   u32 old_rid = resource_id;
+   u32 current_rid;
+
+   /*
+* Generate a unique resource ID.  Keep on trying until we wrap around
+* in the RID space.
+*/
+   BUG_ON(old_rid <= VMCI_RESERVED_RESOURCE_ID_MAX);
+
+   do {
+   struct vmci_handle handle;
+
+   current_rid = resource_id;
+   resource_id++;
+   if (unlikely(resource_id == VMCI_INVALID_ID)) {
+   /* Skip the reserved rids. */
+   resource_id = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+   }
+
+   handle = vmci_make_handle(context_id, current_rid);
+   if (!vmci_resource_lookup(handle))
+   return current_rid;
+   } while (resource_id != old_rid);
+
+   return VMCI_INVALID_ID;
+}
+
+
+int vmci_resource_add(struct vmci_resource *resource,
+ enum vmci_resource_type resource_type,
+ struct vmci_handle handle)
+
+{
+   unsigned int idx;
+   int result;
+
+   spin_lock(&vmci_resource_table.lock);
+
+   if (handle.resource == VMCI_INVALID_ID) {
+   handle.resource = vmci_resource_find_id(handle.context);
+   if (handle.resource == VMCI_INVALID_ID) {
+   result = VMCI_ERROR_NO_HANDLE;
+   goto out;
+   }
+   } else if (vmci_resource_lookup(handle)) {
+   result = VMCI_ERROR_ALREADY_EXISTS;
+   goto out;
+   }
+
+   resource->handle = handle;
+   resource->type = resource_type;
+   INIT_HLIST_NODE(&resource->node);
+   kref_init(&resource->kref);
+   init_completion(&resource->done);
+
+   idx = vmci_resource_hash(resource->handle);
+   BUG_ON(idx >= VMCI_RESOURCE_HASH_BUCKETS);
+   hlist_add_head_rcu(&resource->node, &vmci_
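vmci_resource_find_id() walks the resource ID space from where it last stopped, skips VMCI_INVALID_ID and the reserved range on wrap-around, and gives up after a full circle. The userspace sketch below reproduces that walk plus a hash_32()-style bucket function. The reserved limit, the small busy[] table (standing in for vmci_resource_lookup()) and the golden-ratio constant (the one used by recent kernels' hash_32()) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_ID      0xFFFFFFFFu  /* stands in for VMCI_INVALID_ID */
#define RESERVED_ID_MAX 1023u        /* illustrative reserved limit */
#define HASH_BITS       7

/* Multiplicative hash in the style of hash_32(): top HASH_BITS bits. */
static unsigned int bucket_of(uint32_t rid)
{
	return (uint32_t)(rid * 0x61C88647u) >> (32 - HASH_BITS);
}

/* Tiny stand-in for the hash-table lookup: listed IDs are "taken". */
static uint32_t busy[8];
static int busy_count;

static bool rid_in_use(uint32_t rid)
{
	for (int i = 0; i < busy_count; i++)
		if (busy[i] == rid)
			return true;
	return false;
}

static uint32_t next_rid = RESERVED_ID_MAX + 1;

/* Return a free ID, or INVALID_ID after walking the whole space. */
static uint32_t find_free_rid(void)
{
	uint32_t start = next_rid;

	assert(start > RESERVED_ID_MAX);
	do {
		uint32_t candidate = next_rid++;

		if (next_rid == INVALID_ID)  /* wrap, skipping reserved IDs */
			next_rid = RESERVED_ID_MAX + 1;
		if (!rid_in_use(candidate))
			return candidate;
	} while (next_rid != start);

	return INVALID_ID;
}
```

Hashing on the resource ID alone keeps lookups for handles with an invalid context cheap, since vmci_resource_lookup() compares the context separately.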

[PATCH 06/12] VMCI: handle array implementation.

2012-11-01 Thread George Zhang
VMCI handle array code adds support for dynamic arrays that grow as needed.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_handle_array.c |  142 +
 drivers/misc/vmw_vmci/vmci_handle_array.h |   52 +++
 2 files changed, 194 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.h

diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.c b/drivers/misc/vmw_vmci/vmci_handle_array.c
new file mode 100644
index 000..9122373
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_handle_array.c
@@ -0,0 +1,142 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include "vmci_handle_array.h"
+
+static size_t handle_arr_calc_size(size_t capacity)
+{
+   return sizeof(struct vmci_handle_arr) +
+   capacity * sizeof(struct vmci_handle);
+}
+
+struct vmci_handle_arr *vmci_handle_arr_create(size_t capacity)
+{
+   struct vmci_handle_arr *array;
+
+   if (capacity == 0)
+   capacity = VMCI_HANDLE_ARRAY_DEFAULT_SIZE;
+
+   array = kmalloc(handle_arr_calc_size(capacity), GFP_ATOMIC);
+   if (!array)
+   return NULL;
+
+   array->capacity = capacity;
+   array->size = 0;
+
+   return array;
+}
+
+void vmci_handle_arr_destroy(struct vmci_handle_arr *array)
+{
+   kfree(array);
+}
+
+void vmci_handle_arr_append_entry(struct vmci_handle_arr **array_ptr,
+ struct vmci_handle handle)
+{
+   struct vmci_handle_arr *array = *array_ptr;
+
+   if (unlikely(array->size >= array->capacity)) {
+   /* reallocate. */
+   struct vmci_handle_arr *new_array;
+   size_t new_capacity = array->capacity * VMCI_ARR_CAP_MULT;
+   size_t new_size = handle_arr_calc_size(new_capacity);
+
+   new_array = krealloc(array, new_size, GFP_ATOMIC);
+   if (!new_array)
+   return;
+
+   new_array->capacity = new_capacity;
+   *array_ptr = array = new_array;
+   }
+
+   array->entries[array->size] = handle;
+   array->size++;
+}
+
+/*
+ * Handle that was removed, VMCI_INVALID_HANDLE if entry not found.
+ */
+struct vmci_handle vmci_handle_arr_remove_entry(struct vmci_handle_arr *array,
+   struct vmci_handle entry_handle)
+{
+   struct vmci_handle handle = VMCI_INVALID_HANDLE;
+   size_t i;
+
+   for (i = 0; i < array->size; i++) {
+   if (VMCI_HANDLE_EQUAL(array->entries[i], entry_handle)) {
+   handle = array->entries[i];
+   array->size--;
+   array->entries[i] = array->entries[array->size];
+   array->entries[array->size] = VMCI_INVALID_HANDLE;
+   break;
+   }
+   }
+
+   return handle;
+}
+
+/*
+ * Handle that was removed, VMCI_INVALID_HANDLE if array was empty.
+ */
+struct vmci_handle vmci_handle_arr_remove_tail(struct vmci_handle_arr *array)
+{
+   struct vmci_handle handle = VMCI_INVALID_HANDLE;
+
+   if (array->size) {
+   array->size--;
+   handle = array->entries[array->size];
+   array->entries[array->size] = VMCI_INVALID_HANDLE;
+   }
+
+   return handle;
+}
+
+/*
+ * Handle at given index, VMCI_INVALID_HANDLE if invalid index.
+ */
+struct vmci_handle
+vmci_handle_arr_get_entry(const struct vmci_handle_arr *array, size_t index)
+{
+   if (unlikely(index >= array->size))
+   return VMCI_INVALID_HANDLE;
+
+   return array->entries[index];
+}
+
+bool vmci_handle_arr_has_entry(const struct vmci_handle_arr *array,
+  struct vmci_handle entry_handle)
+{
+   size_t i;
+
+   for (i = 0; i < array->size; i++)
+   if (VMCI_HANDLE_EQUAL(array->entries[i], entry_handle))
+   return true;
+
+   return false;
+}
+
+/*
+ * NULL if the array is empty. Otherwise, a pointer to the array
+ * of VMCI handles in the handle array.
+ */
+struct vmci_handle *vmci_handle_arr_get_handles(struct vmci_handle_arr *array)
+{
+   if (array->size)
+   return array->entries;
+
+   return NULL;
+}
diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.h b/drivers/misc/vmw_vmci/vmci_handle_array.h
new file mode 100644
index 000..b5f3a
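The handle array above grows geometrically with krealloc() and removes entries in O(1) by moving the last entry into the freed slot. A userspace sketch of the same two ideas, assuming malloc()/realloc() in place of the GFP_ATOMIC allocations and an assumed default capacity of 4 and growth factor of 2 (the header defining VMCI_HANDLE_ARRAY_DEFAULT_SIZE and VMCI_ARR_CAP_MULT is truncated here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct handle { uint32_t context, resource; };
static const struct handle INVALID_HANDLE = { 0xFFFFFFFFu, 0xFFFFFFFFu };

struct handle_arr {
	size_t capacity;
	size_t size;
	struct handle entries[];          /* flexible array member */
};

#define DEFAULT_CAPACITY 4               /* assumption */
#define CAP_MULT 2                       /* assumption */

static struct handle_arr *arr_create(size_t capacity)
{
	if (capacity == 0)
		capacity = DEFAULT_CAPACITY;
	struct handle_arr *a = malloc(sizeof(*a) + capacity * sizeof(struct handle));
	if (!a)
		return NULL;
	a->capacity = capacity;
	a->size = 0;
	return a;
}

/* Append, growing geometrically; on allocation failure the entry is
 * dropped and the old array stays valid, as in the patch. */
static void arr_append(struct handle_arr **ap, struct handle h)
{
	struct handle_arr *a = *ap;

	if (a->size >= a->capacity) {
		size_t cap = a->capacity * CAP_MULT;
		struct handle_arr *n = realloc(a, sizeof(*n) + cap * sizeof(struct handle));
		if (!n)
			return;
		n->capacity = cap;
		*ap = a = n;
	}
	a->entries[a->size++] = h;
}

static bool same(struct handle x, struct handle y)
{
	return x.context == y.context && x.resource == y.resource;
}

/* O(1) removal: overwrite the match with the last entry, then shrink. */
static struct handle arr_remove(struct handle_arr *a, struct handle h)
{
	for (size_t i = 0; i < a->size; i++) {
		if (same(a->entries[i], h)) {
			struct handle found = a->entries[i];

			a->size--;
			a->entries[i] = a->entries[a->size];
			a->entries[a->size] = INVALID_HANDLE;
			return found;
		}
	}
	return INVALID_HANDLE;
}
```

The swap-with-last removal does not preserve insertion order, which is acceptable here because callers treat the array as a set of handles.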

[PATCH 05/12] VMCI: event handling implementation.

2012-11-01 Thread George Zhang
VMCI event code manages event handlers and invokes callbacks when specific
events fire.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_event.c |  229 
 drivers/misc/vmw_vmci/vmci_event.h |   25 
 2 files changed, 254 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_event.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_event.h

diff --git a/drivers/misc/vmw_vmci/vmci_event.c b/drivers/misc/vmw_vmci/vmci_event.c
new file mode 100644
index 000..e956c18
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_event.c
@@ -0,0 +1,229 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define EVENT_MAGIC 0xEABE
+#define VMCI_EVENT_MAX_ATTEMPTS 10
+
+struct vmci_subscription {
+   u32 id;
+   u32 event;
+   vmci_event_cb callback;
+   void *callback_data;
+   struct list_head node;  /* on one of subscriber lists */
+};
+
+static struct list_head subscriber_array[VMCI_EVENT_MAX];
+static DEFINE_MUTEX(subscriber_mutex);
+
+int __init vmci_event_init(void)
+{
+   int i;
+
+   for (i = 0; i < VMCI_EVENT_MAX; i++)
+   INIT_LIST_HEAD(&subscriber_array[i]);
+
+   return VMCI_SUCCESS;
+}
+
+void vmci_event_exit(void)
+{
+   int e;
+
+   /* We free all memory at exit. */
+   for (e = 0; e < VMCI_EVENT_MAX; e++) {
+   struct vmci_subscription *cur, *p2;
+   list_for_each_entry_safe(cur, p2, &subscriber_array[e], node) {
+
+   /*
+* We should never get here because all events
+* should have been unregistered before we try
+* to unload the driver module.
+*/
+   pr_warn("Unexpected free events occurring.\n");
+   list_del(&cur->node);
+   kfree(cur);
+   }
+   }
+}
+
+/*
+ * Find entry. Assumes subscriber_mutex is held.
+ */
+static struct vmci_subscription *event_find(u32 sub_id)
+{
+   int e;
+
+   for (e = 0; e < VMCI_EVENT_MAX; e++) {
+   struct vmci_subscription *cur;
+   list_for_each_entry(cur, &subscriber_array[e], node) {
+   if (cur->id == sub_id)
+   return cur;
+   }
+   }
+   return NULL;
+}
+
+/*
+ * Actually delivers the events to the subscribers.
+ * The callback function for each subscriber is invoked.
+ */
+static void event_deliver(struct vmci_event_msg *event_msg)
+{
+   struct vmci_subscription *cur;
+   struct list_head *subscriber_list;
+
+   rcu_read_lock();
+   subscriber_list = &subscriber_array[event_msg->event_data.event];
+   list_for_each_entry_rcu(cur, subscriber_list, node) {
+   BUG_ON(cur->event != event_msg->event_data.event);
+
+   cur->callback(cur->id, &event_msg->event_data,
+ cur->callback_data);
+   }
+   rcu_read_unlock();
+}
+
+/*
+ * Dispatcher for the VMCI_EVENT_RECEIVE datagrams. Calls all
+ * subscribers for given event.
+ */
+int vmci_event_dispatch(struct vmci_datagram *msg)
+{
+   struct vmci_event_msg *event_msg = (struct vmci_event_msg *)msg;
+
+   BUG_ON(msg->src.context != VMCI_HYPERVISOR_CONTEXT_ID ||
+  msg->dst.resource != VMCI_EVENT_HANDLER);
+
+   if (msg->payload_size < sizeof(u32) ||
+   msg->payload_size > sizeof(struct vmci_event_data_max))
+   return VMCI_ERROR_INVALID_ARGS;
+
+   if (!VMCI_EVENT_VALID(event_msg->event_data.event))
+   return VMCI_ERROR_EVENT_UNKNOWN;
+
+   event_deliver(event_msg);
+   return VMCI_SUCCESS;
+}
+
+/*
+ * vmci_event_subscribe() - Subscribe to a given event.
+ * @event:  The event to subscribe to.
+ * @callback:   The callback to invoke upon the event.
+ * @callback_data:  Data to pass to the callback.
+ * @subscription_id:ID used to track subscription.  Used with
+ *  vmci_event_unsubscribe()
+ *
+ * Subscribes to the provided event. The callback specified will be
+ * fired from RCU critical section and therefore must not sleep.
+ */
+int vmci_event_subscribe(u32 event,
+vmci_event_cb callback,
+void *callback_data,
+ 
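The event code keeps one subscriber list per event type and, on dispatch, validates the event number before calling every callback on that list. A single-threaded userspace sketch of that structure follows; EVENT_MAX, the error codes and count_cb() are hypothetical, plain singly-linked lists replace list_head, and there is no mutex or RCU, so it does not model the rule that kernel callbacks run inside an RCU read-side section and must not sleep. The payload-size check in vmci_event_dispatch() is also omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EVENT_MAX 4   /* illustrative stand-in for VMCI_EVENT_MAX */

enum { OK = 0, ERR_INVALID_ARGS = -1, ERR_EVENT_UNKNOWN = -2, ERR_NO_MEM = -3 };

typedef void (*event_cb)(uint32_t sub_id, uint32_t event, void *data);

struct subscription {
	uint32_t id;
	uint32_t event;
	event_cb callback;
	void *callback_data;
	struct subscription *next;
};

/* One list per event type, like subscriber_array[] in the patch. */
static struct subscription *subscribers[EVENT_MAX];
static uint32_t next_sub_id = 1;

static int event_subscribe(uint32_t event, event_cb cb, void *data, uint32_t *id)
{
	if (event >= EVENT_MAX || !cb)
		return ERR_INVALID_ARGS;
	struct subscription *s = malloc(sizeof(*s));
	if (!s)
		return ERR_NO_MEM;
	s->id = next_sub_id++;
	s->event = event;
	s->callback = cb;
	s->callback_data = data;
	s->next = subscribers[event];
	subscribers[event] = s;
	*id = s->id;
	return OK;
}

/* Validate the event number, then invoke every subscriber on its list. */
static int event_dispatch(uint32_t event)
{
	if (event >= EVENT_MAX)
		return ERR_EVENT_UNKNOWN;
	for (struct subscription *s = subscribers[event]; s; s = s->next) {
		assert(s->event == event);
		s->callback(s->id, event, s->callback_data);
	}
	return OK;
}

/* Example callback: counts how often it fired. */
static void count_cb(uint32_t sub_id, uint32_t event, void *data)
{
	(void)sub_id;
	(void)event;
	++*(int *)data;
}
```

Indexing the lists by event number means a dispatch only touches the subscribers for that one event instead of scanning all of them.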

[PATCH 04/12] VMCI: device driver implementation.

2012-11-01 Thread George Zhang
VMCI driver code implements both the host and guest personalities of
the VMCI driver.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_driver.c |  116 +++
 drivers/misc/vmw_vmci/vmci_driver.h |   50 +++
 2 files changed, 166 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_driver.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_driver.h

diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
new file mode 100644
index 000..8c5db38
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_driver.c
@@ -0,0 +1,116 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+static bool vmci_disable_host;
+module_param_named(disable_host, vmci_disable_host, bool, 0);
+MODULE_PARM_DESC(disable_host,
+"Disable driver host personality (default=enabled)");
+
+static bool vmci_disable_guest;
+module_param_named(disable_guest, vmci_disable_guest, bool, 0);
+MODULE_PARM_DESC(disable_guest,
+"Disable driver guest personality (default=enabled)");
+
+static bool vmci_guest_personality_initialized;
+static bool vmci_host_personality_initialized;
+
+/*
+ * vmci_get_context_id() - Gets the current context ID.
+ *
+ * Returns the current context ID: the VM's context ID when the guest
+ * personality is active, VMCI_HOST_CONTEXT_ID when only the host
+ * personality is active, and VMCI_INVALID_ID otherwise.
+ */
+u32 vmci_get_context_id(void)
+{
+   if (vmci_guest_code_active())
+   return vmci_get_vm_context_id();
+   else if (vmci_host_code_active())
+   return VMCI_HOST_CONTEXT_ID;
+
+   return VMCI_INVALID_ID;
+}
+EXPORT_SYMBOL_GPL(vmci_get_context_id);
+
+static int __init vmci_drv_init(void)
+{
+   int vmci_err;
+   int error;
+
+   vmci_err = vmci_event_init();
+   if (vmci_err < VMCI_SUCCESS) {
+   pr_err("Failed to initialize VMCIEvent (result=%d).\n", vmci_err);
+   return -EINVAL;
+   }
+
+   if (!vmci_disable_guest) {
+   error = vmci_guest_init();
+   if (error) {
+   pr_warn("Failed to initialize guest personality (err=%d).\n",
+   error);
+   } else {
+   vmci_guest_personality_initialized = true;
+   pr_info("Guest personality initialized and is %s\n",
+   vmci_guest_code_active() ?
+   "active" : "inactive");
+   }
+   }
+
+   if (!vmci_disable_host) {
+   error = vmci_host_init();
+   if (error) {
+   pr_warn("Unable to initialize host personality 
(err=%d).\n",
+   error);
+   } else {
+   vmci_host_personality_initialized = true;
+   pr_info("Initialized host personality\n");
+   }
+   }
+
+   if (!vmci_guest_personality_initialized &&
+   !vmci_host_personality_initialized) {
+   vmci_event_exit();
+   return -ENODEV;
+   }
+
+   return 0;
+}
+module_init(vmci_drv_init);
+
+static void __exit vmci_drv_exit(void)
+{
+   if (vmci_guest_personality_initialized)
+   vmci_guest_exit();
+
+   if (vmci_host_personality_initialized)
+   vmci_host_exit();
+
+   vmci_event_exit();
+}
+module_exit(vmci_drv_exit);
+
+MODULE_AUTHOR("VMware, Inc.");
+MODULE_DESCRIPTION("VMware Virtual Machine Communication Interface.");
+MODULE_VERSION(VMCI_DRIVER_VERSION_STRING);
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/misc/vmw_vmci/vmci_driver.h b/drivers/misc/vmw_vmci/vmci_driver.h
new file mode 100644
index 000..f69156a
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_driver.h
@@ -0,0 +1,50 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public

[PATCH 03/12] VMCI: doorbell implementation.

2012-11-01 Thread George Zhang
VMCI doorbell code allows for notifications between host and guest.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_doorbell.c |  645 +
 drivers/misc/vmw_vmci/vmci_doorbell.h |   53 +++
 2 files changed, 698 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.h

diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c b/drivers/misc/vmw_vmci/vmci_doorbell.c
new file mode 100644
index 000..a54ec1e
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_doorbell.c
@@ -0,0 +1,645 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_resource.h"
+#include "vmci_driver.h"
+#include "vmci_route.h"
+
+
+#define VMCI_DOORBELL_INDEX_BITS   6
+#define VMCI_DOORBELL_INDEX_TABLE_SIZE (1 << VMCI_DOORBELL_INDEX_BITS)
+#define VMCI_DOORBELL_HASH(_idx)   hash_32(_idx, VMCI_DOORBELL_INDEX_BITS)
+
+/*
+ * struct dbell_entry describes a doorbell notification handle allocated by
+ * the host.
+ */
+struct dbell_entry {
+	struct vmci_resource resource;
+	struct hlist_node node;
+	struct work_struct work;
+	vmci_callback notify_cb;
+	void *client_data;
+	u32 idx;
+	u32 priv_flags;
+	bool run_delayed;
+	atomic_t active;	/* Only used by guest personality */
+};
+
+/* The VMCI index table keeps track of currently registered doorbells. */
+struct dbell_index_table {
+	spinlock_t lock;	/* Index table lock */
+   struct hlist_head entries[VMCI_DOORBELL_INDEX_TABLE_SIZE];
+};
+
+static struct dbell_index_table vmci_doorbell_it = {
+   .lock = __SPIN_LOCK_UNLOCKED(vmci_doorbell_it.lock),
+};
+
+/*
+ * The max_notify_idx is one larger than the currently known bitmap index in
+ * use, and is used to determine how much of the bitmap needs to be scanned.
+ */
+static u32 max_notify_idx;
+
+/*
+ * The notify_idx_count is used for determining whether there are free entries
+ * within the bitmap (if notify_idx_count + 1 < max_notify_idx).
+ */
+static u32 notify_idx_count;
+
+/*
+ * The last_notify_idx_reserved is used to track the last index handed out - in
+ * the case where multiple handles share a notification index, we hand out
+ * indexes round robin based on last_notify_idx_reserved.
+ */
+static u32 last_notify_idx_reserved;
+
+/* This is a one-entry cache used by the index allocation. */
+static u32 last_notify_idx_released = PAGE_SIZE;
+
+
+/*
+ * Utility function that retrieves the privilege flags associated
+ * with a given doorbell handle. For guest endpoints, the
+ * privileges are determined by the context ID, but for host
+ * endpoints privileges are associated with the complete
+ * handle. Hypervisor endpoints are not yet supported.
+ */
+int vmci_dbell_get_priv_flags(struct vmci_handle handle, u32 *priv_flags)
+{
+	if (priv_flags == NULL || handle.context == VMCI_INVALID_ID)
+		return VMCI_ERROR_INVALID_ARGS;
+
+	if (handle.context == VMCI_HOST_CONTEXT_ID) {
+		struct dbell_entry *entry;
+		struct vmci_resource *resource;
+
+		resource = vmci_resource_by_handle(handle,
+						   VMCI_RESOURCE_TYPE_DOORBELL);
+		if (!resource)
+			return VMCI_ERROR_NOT_FOUND;
+
+		entry = container_of(resource, struct dbell_entry, resource);
+		*priv_flags = entry->priv_flags;
+		vmci_resource_put(resource);
+	} else if (handle.context == VMCI_HYPERVISOR_CONTEXT_ID) {
+		/*
+		 * Hypervisor endpoints for notifications are not
+		 * supported (yet).
+		 */
+		return VMCI_ERROR_INVALID_ARGS;
+	} else {
+		*priv_flags = vmci_context_get_priv_flags(handle.context);
+	}
+
+	return VMCI_SUCCESS;
+}
+
+/*
+ * Find doorbell entry by bitmap index.
+ */
+static struct dbell_entry *dbell_index_table_find(u32 idx)
+{
+	u32 bucket = VMCI_DOORBELL_HASH(idx);
+	struct dbell_entry *dbell;
+	struct hlist_node *node;
+
+	hlist_for_each_entry(dbell, node, &vmci_doorbell_it.entries[bucket],
+			     node) {
+   if (
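
The comments on max_notify_idx, notify_idx_count, last_notify_idx_reserved and last_notify_idx_released describe an allocation policy for notification indexes, but the allocator itself is cut off in this excerpt. Below is a userspace sketch reconstructed only from those comments, not the patch's code: a per-index user count stands in for dbell_index_table_find(), the table is shrunk to 4 slots, and the function names are hypothetical. As an assumption of the sketch, it scans for a hole whenever notify_idx_count < max_notify_idx; the patch comment states the stricter notify_idx_count + 1 < max_notify_idx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size for illustration; the driver bounds indexes by PAGE_SIZE. */
#define SKETCH_BITMAP_SIZE 4u

static uint32_t idx_users[SKETCH_BITMAP_SIZE]; /* handles sharing each index */
static uint32_t max_notify_idx;           /* one past the highest index in use */
static uint32_t notify_idx_count;         /* live reservations, shared ones included */
static uint32_t last_notify_idx_reserved; /* round-robin cursor */
static uint32_t last_notify_idx_released = SKETCH_BITMAP_SIZE; /* cache empty */

static uint32_t sketch_reserve_idx(void)
{
	uint32_t idx;

	if (notify_idx_count < SKETCH_BITMAP_SIZE) {
		if (last_notify_idx_released < max_notify_idx &&
		    idx_users[last_notify_idx_released] == 0) {
			/* Fast path: reuse the one-entry cache. */
			idx = last_notify_idx_released;
			last_notify_idx_released = SKETCH_BITMAP_SIZE;
		} else if (notify_idx_count < max_notify_idx) {
			/* Fewer reservations than slots below max: a hole exists. */
			uint32_t n;

			idx = last_notify_idx_reserved % max_notify_idx;
			for (n = 0; n < max_notify_idx && idx_users[idx] != 0; n++)
				idx = (idx + 1) % max_notify_idx;
		} else {
			/* No holes: grow the used area by one slot. */
			idx = max_notify_idx;
		}
	} else {
		/* Every slot taken: share indexes round robin. */
		idx = (last_notify_idx_reserved + 1) % SKETCH_BITMAP_SIZE;
	}

	idx_users[idx]++;
	notify_idx_count++;
	last_notify_idx_reserved = idx;
	if (idx + 1 > max_notify_idx)
		max_notify_idx = idx + 1;
	return idx;
}

static void sketch_release_idx(uint32_t idx)
{
	assert(idx < SKETCH_BITMAP_SIZE && idx_users[idx] > 0);

	idx_users[idx]--;
	notify_idx_count--;
	last_notify_idx_released = idx;

	/* Prune max_notify_idx back to one past the highest index in use. */
	while (max_notify_idx > 0 && idx_users[max_notify_idx - 1] == 0)
		max_notify_idx--;
}
```

Counting users per index, rather than keeping one bit, is what lets the round-robin branch share an index once every slot is taken. It also makes the hole test safe: an index is occupied only if it has at least one user, so fewer reservations than max_notify_idx implies a free slot below it.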

[PATCH 02/12] VMCI: datagram implementation.

2012-11-01 Thread George Zhang
VMCI datagram implements datagrams to allow data to be sent between host
and guest.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_datagram.c |  506 +
 drivers/misc/vmw_vmci/vmci_datagram.h |   52 +++
 2 files changed, 558 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.h

diff --git a/drivers/misc/vmw_vmci/vmci_datagram.c b/drivers/misc/vmw_vmci/vmci_datagram.c
new file mode 100644
index 000..d755e31
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_datagram.c
@@ -0,0 +1,506 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+#include "vmci_route.h"
+
+/*
+ * struct datagram_entry describes the datagram entity. It is used for datagram
+ * entities created only on the host.
+ */
+struct datagram_entry {
+   struct vmci_resource resource;
+   u32 flags;
+   bool run_delayed;
+   vmci_datagram_recv_cb recv_cb;
+   void *client_data;
+   u32 priv_flags;
+};
+
+struct delayed_datagram_info {
+   struct datagram_entry *entry;
+   struct vmci_datagram msg;
+   struct work_struct work;
+   bool in_dg_host_queue;
+};
+
+/* Number of in-flight host->host datagrams */
+static atomic_t delayed_dg_host_queue_size = ATOMIC_INIT(0);
+
+/*
+ * Create a datagram entry given a handle pointer.
+ */
+static int dg_create_handle(u32 resource_id,
+			    u32 flags,
+			    u32 priv_flags,
+			    vmci_datagram_recv_cb recv_cb,
+			    void *client_data, struct vmci_handle *out_handle)
+{
+	int result;
+	u32 context_id;
+	struct vmci_handle handle;
+	struct datagram_entry *entry;
+
+	BUG_ON(!recv_cb);
+	BUG_ON(!out_handle);
+	BUG_ON(priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS);
+
+	if ((flags & VMCI_FLAG_WELLKNOWN_DG_HND) != 0)
+		return VMCI_ERROR_INVALID_ARGS;
+
+	if ((flags & VMCI_FLAG_ANYCID_DG_HND) != 0) {
+		context_id = VMCI_INVALID_ID;
+	} else {
+		context_id = vmci_get_context_id();
+		if (context_id == VMCI_INVALID_ID)
+			return VMCI_ERROR_NO_RESOURCES;
+	}
+
+	handle = vmci_make_handle(context_id, resource_id);
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		pr_warn("Failed allocating memory for datagram entry.\n");
+		return VMCI_ERROR_NO_MEM;
+	}
+
+	entry->run_delayed = (flags & VMCI_FLAG_DG_DELAYED_CB) ? true : false;
+	entry->flags = flags;
+	entry->recv_cb = recv_cb;
+	entry->client_data = client_data;
+	entry->priv_flags = priv_flags;
+
+	/* Make datagram resource live. */
+	result = vmci_resource_add(&entry->resource,
+				   VMCI_RESOURCE_TYPE_DATAGRAM,
+				   handle);
+	if (result != VMCI_SUCCESS) {
+		pr_warn("Failed to add new resource (handle=0x%x:0x%x), error: %d\n",
+			handle.context, handle.resource, result);
+		kfree(entry);
+		return result;
+	}
+
+	*out_handle = vmci_resource_handle(&entry->resource);
+	return VMCI_SUCCESS;
+}
+
+/*
+ * Internal utility function with the same purpose as
+ * vmci_datagram_get_priv_flags that also takes a context_id.
+ */
+static int vmci_datagram_get_priv_flags(u32 context_id,
+					struct vmci_handle handle,
+					u32 *priv_flags)
+{
+	if (context_id == VMCI_INVALID_ID)
+		return VMCI_ERROR_INVALID_ARGS;
+
+	if (context_id == VMCI_HOST_CONTEXT_ID) {
+		struct datagram_entry *src_entry;
+		struct vmci_resource *resource;
+
+		resource = vmci_resource_by_handle(handle,
+						   VMCI_RESOURCE_TYPE_DATAGRAM);
+		if (!resource)
+			return VMCI_ERROR_INVALID_ARGS;
+
+		src_entry = container_of(resource, struct datagram_entry,
+					 resource);
+   

[PATCH 01/12] VMCI: context implementation.

2012-11-01 Thread George Zhang
VMCI Context code maintains state for vmci and allows the driver to communicate
with multiple VMs.


Signed-off-by: George Zhang 
---
 drivers/misc/vmw_vmci/vmci_context.c | 1247 ++
 drivers/misc/vmw_vmci/vmci_context.h |  183 +
 2 files changed, 1430 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/vmw_vmci/vmci_context.c
 create mode 100644 drivers/misc/vmw_vmci/vmci_context.h

diff --git a/drivers/misc/vmw_vmci/vmci_context.c b/drivers/misc/vmw_vmci/vmci_context.c
new file mode 100644
index 000..0df23e0
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_context.c
@@ -0,0 +1,1247 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+/*
+ * List of current VMCI contexts.  Contexts can be added by
+ * vmci_ctx_create() and removed via vmci_ctx_destroy().
+ * These, along with context lookup, are protected by the
+ * list structure's lock.
+ */
+static struct {
+   struct list_head head;
+   spinlock_t lock; /* Spinlock for context list operations */
+} ctx_list = {
+   .head = LIST_HEAD_INIT(ctx_list.head),
+   .lock = __SPIN_LOCK_UNLOCKED(ctx_list.lock),
+};
+
+/* Used by contexts that did not set up notify flag pointers */
+static bool ctx_dummy_notify;
+
+static void ctx_signal_notify(struct vmci_ctx *context)
+{
+   *context->notify = true;
+}
+
+static void ctx_clear_notify(struct vmci_ctx *context)
+{
+   *context->notify = false;
+}
+
+/*
+ * If nothing requires the attention of the guest, clears both
+ * notify flag and call.
+ */
+static void ctx_clear_notify_call(struct vmci_ctx *context)
+{
+	if (context->pending_datagrams == 0 &&
+	    vmci_handle_arr_get_size(context->pending_doorbell_array) == 0)
+		ctx_clear_notify(context);
+}
+
+/*
+ * Sets the context's notify flag iff datagrams are pending for this
+ * context.  Called from vmci_setup_notify().
+ */
+void vmci_ctx_check_signal_notify(struct vmci_ctx *context)
+{
+	spin_lock(&context->lock);
+	if (context->pending_datagrams)
+		ctx_signal_notify(context);
+	spin_unlock(&context->lock);
+}
+
+/*
+ * Allocates and initializes a VMCI context.
+ */
+struct vmci_ctx *vmci_ctx_create(u32 cid, u32 priv_flags,
+				 uintptr_t event_hnd,
+				 int user_version,
+				 const struct cred *cred)
+{
+	struct vmci_ctx *context;
+	int error;
+
+	if (cid == VMCI_INVALID_ID) {
+		pr_devel("Invalid context ID for VMCI context.\n");
+		error = -EINVAL;
+		goto err_out;
+	}
+
+	if (priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS) {
+		pr_devel("Invalid flag (flags=0x%x) for VMCI context.\n",
+			 priv_flags);
+		error = -EINVAL;
+		goto err_out;
+	}
+
+	if (user_version == 0) {
+		pr_devel("Invalid user_version %d\n", user_version);
+		error = -EINVAL;
+		goto err_out;
+	}
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context) {
+		pr_warn("Failed to allocate memory for VMCI context.\n");
+		error = -ENOMEM;
+		goto err_out;
+	}
+
+	kref_init(&context->kref);
+	spin_lock_init(&context->lock);
+	INIT_LIST_HEAD(&context->list_item);
+	INIT_LIST_HEAD(&context->datagram_queue);
+	INIT_LIST_HEAD(&context->notifier_list);
+
+	/* Initialize host-specific VMCI context. */
+	init_waitqueue_head(&context->host_context.wait_queue);
+
+	context->queue_pair_array = vmci_handle_arr_create(0);
+	if (!context->queue_pair_array) {
+		error = -ENOMEM;
+		goto err_free_ctx;
+	}
+
+	context->doorbell_array = vmci_handle_arr_create(0);
+	if (!context->doorbell_array) {
+		error = -ENOMEM;
+		goto err_free_qp_array;
+	}
+
+	context->pending_doorbell_array = vmci_handle_arr_create(0);
+	if (!context->pending_doorbell_array) {
+		error = -ENOMEM;
+		goto err_free_db_array;
+	}
+
+   c
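
The notify-flag helpers earlier in this patch follow a simple rule: raise the flag when datagrams are pending, and lower it only when neither datagrams nor doorbells are pending. Below is a userspace sketch of that rule with simplifications: the context lock is left out, the pending doorbell handle array is reduced to a counter, and all names are hypothetical. It assumes, as the ctx_dummy_notify comment suggests, that a context points at a shared dummy flag until a real notify flag is set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared target for contexts that never registered a notify flag. */
static bool sketch_dummy_notify;

/* Hypothetical reduced context: only the fields the notify logic reads. */
struct sketch_ctx {
	bool *notify;                   /* VM's flag, or the dummy */
	unsigned int pending_datagrams;
	unsigned int pending_doorbells; /* stand-in for the handle array size */
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->notify = &sketch_dummy_notify;
	ctx->pending_datagrams = 0;
	ctx->pending_doorbells = 0;
}

/* Like vmci_ctx_check_signal_notify(): raise the flag if datagrams wait. */
static void sketch_check_signal_notify(struct sketch_ctx *ctx)
{
	if (ctx->pending_datagrams)
		*ctx->notify = true;
}

/* Like ctx_clear_notify_call(): lower it only when nothing is pending. */
static void sketch_clear_notify_call(struct sketch_ctx *ctx)
{
	if (ctx->pending_datagrams == 0 && ctx->pending_doorbells == 0)
		*ctx->notify = false;
}
```

Pointing unconfigured contexts at a dummy flag lets these helpers write through ctx->notify unconditionally instead of checking for NULL each time.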

[PATCH 00/12] VMCI for Linux upstreaming

2012-11-01 Thread George Zhang


* * *
This series of VMCI Linux upstreaming patches includes the latest updates from
VMware.

Summary of changes:
- Use RCU for context lookup.
- Remove redundant init()/empty functions.
- Remove unnecessary ASSERTs.
- Remove delayed event callbacks.
- Cleanup some comments.
- Cleanup makefiles.
- Drop -I$(src)/shared custom flags from Makefile for VSOCK.
- Rename vmci_ctx_init_ctx() to vmci_ctx_create().
- Get rid of vmci_ctx_init().
- Rename vmci_ctx_release() to vmci_ctx_put().



* * *

In an effort to improve the out-of-the-box experience with Linux
kernels for VMware users, VMware is working on readying the Virtual
Machine Communication Interface (vmw_vmci) and VMCI Sockets
(vmw_vsock) kernel modules for inclusion in the Linux kernel. The
purpose of this post is to acquire feedback on the vmw_vmci kernel
module. The vmw_vsock kernel module will be presented in a later post.


* * *

VMCI allows virtual machines to communicate with host kernel modules
and the VMware hypervisors. User level applications both in a virtual
machine and on the host can use vmw_vmci through VMCI Sockets, a socket
address family designed to be compatible with UDP and TCP at the
interface level. Today, VMCI and VMCI Sockets are used by the VMware
shared folders (HGFS) and various VMware Tools components inside the
guest for zero-config, network-less access to VMware host services. In
addition to this, VMware's users are using VMCI Sockets for various
applications, where network access of the virtual machine is
restricted or non-existent. Examples of this are VMs communicating
with device proxies for proprietary hardware running as host
applications and automated testing of applications running within
virtual machines.

In a virtual machine, VMCI is exposed as a regular PCI device. The
primary communication mechanisms supported are a point-to-point
bidirectional transport based on a pair of memory-mapped queues, and
asynchronous notifications in the form of datagrams and
doorbells. These features are available to kernel level components
such as HGFS and VMCI Sockets through the VMCI kernel API. In addition
to this, the VMCI kernel API provides support for receiving events
related to the state of the VMCI communication channels, and the
virtual machine itself.
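
To make the queue-pair idea concrete, the sketch below models it in userspace as two byte rings, one per direction, each driven by free-running producer and consumer counters. The layout, sizes and function names are assumptions for illustration only: VMCI's actual queue header format is not shown in this excerpt, and a real shared-memory queue would also need memory barriers between copying the data and publishing the updated index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUEUE_SIZE 8u /* hypothetical capacity in bytes */

/* One direction of a queue pair; the layout is illustrative only. */
struct sketch_queue {
	uint64_t producer_tail; /* total bytes ever written */
	uint64_t consumer_head; /* total bytes ever read */
	uint8_t data[SKETCH_QUEUE_SIZE];
};

/* Each endpoint produces into one queue and consumes from the other. */
struct sketch_queue_pair {
	struct sketch_queue produce_q;
	struct sketch_queue consume_q;
};

static size_t sketch_enqueue(struct sketch_queue *q, const void *buf, size_t len)
{
	const uint8_t *src = buf;
	size_t space, i;

	assert(q != NULL && (buf != NULL || len == 0));
	space = SKETCH_QUEUE_SIZE - (size_t)(q->producer_tail - q->consumer_head);
	if (len > space)
		len = space; /* short write when the queue is nearly full */
	for (i = 0; i < len; i++)
		q->data[(q->producer_tail + i) % SKETCH_QUEUE_SIZE] = src[i];
	q->producer_tail += len;
	return len;
}

static size_t sketch_dequeue(struct sketch_queue *q, void *buf, size_t len)
{
	uint8_t *dst = buf;
	size_t ready, i;

	assert(q != NULL && (buf != NULL || len == 0));
	ready = (size_t)(q->producer_tail - q->consumer_head);
	if (len > ready)
		len = ready; /* short read when less data is queued */
	for (i = 0; i < len; i++)
		dst[i] = q->data[(q->consumer_head + i) % SKETCH_QUEUE_SIZE];
	q->consumer_head += len;
	return len;
}
```

Because each counter is written by only one side (the producer advances producer_tail, the consumer advances consumer_head), the two ends never write the same field, which is what makes a point-to-point queue workable over shared memory.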

Outside the virtual machine, the host side support of the VMCI kernel
module makes the same VMCI kernel API available to VMCI endpoints on
the host. In addition to this, the host side manages each VMCI device
in a virtual machine through a context object. This context object
serves to identify the virtual machine for communication, and to track
the resource consumption of the given VMCI device. Both operations
related to communication between the virtual machine and the host
kernel, and those related to the management of the VMCI device state
in the host kernel, are invoked by the user level component of the
hypervisor through a set of ioctls on the VMCI device node.  To
provide seamless support for nested virtualization, where a virtual
machine may use both a VMCI PCI device to talk to its hypervisor, and
the VMCI host side support to run nested virtual machines, the VMCI
host and virtual machine support are combined in a single kernel
module.

For additional information about the use of VMCI and in particular
VMCI Sockets, please refer to the VMCI Socket Programming Guide
available at https://www.vmware.com/support/developer/vmci-sdk/.


---

George Zhang (12):
  VMCI: context implementation.
  VMCI: datagram implementation.
  VMCI: doorbell implementation.
  VMCI: device driver implementation.
  VMCI: event handling implementation.
  VMCI: handle array implementation.
  VMCI: queue pairs implementation.
  VMCI: resource object implementation.
  VMCI: routing implementation.
  VMCI: guest side driver implementation.
  VMCI: host side driver implementation.
  VMCI: Some header and config files.


 drivers/misc/Kconfig  |1 
 drivers/misc/Makefile |2 
 drivers/misc/vmw_vmci/Kconfig |   16 
 drivers/misc/vmw_vmci/Makefile|4 
 drivers/misc/vmw_vmci/vmci_common_int.h   |   34 
 drivers/misc/vmw_vmci/vmci_context.c  | 1247 ++
 drivers/misc/vmw_vmci/vmci_context.h  |  183 ++
 drivers/misc/vmw_vmci/vmci_datagram.c |  506 
 drivers/misc/vmw_vmci/vmci_datagram.h |   52 
 drivers/misc/vmw_vmci/vmci_doorbell.c |  645 +
 drivers/misc/vmw_vmci/vmci_doorbell.h |   53 
 drivers/misc/vmw_vmci/vmci_driver.c   |  116 +
 drivers/misc/vmw_vmci/vmci_driver.h   |   50 
 drivers/misc/vmw_vmci/vmci_event.c|  229 ++
 drivers/misc/vmw_vmci/vmci_event.h|   25 
 drivers/misc/vmw_vmci/vmci_guest.c|  762 ++
 drivers/misc/vmw_vmci/vmci_handle_array.c |  142 +
 drivers/misc/vmw_vmci/vmci_handle_array.h |   52 
 drivers/misc/vmw_vmci

Re: linux-next: Tree for Nov 1 (xen)

2012-11-01 Thread Randy Dunlap
On 10/31/2012 10:36 PM, Stephen Rothwell wrote:

> Hi all,
> 
> New trees: rr-fixes and swiotlb
> 
> Changes since 20121031:
> 


arch/x86/xen/enlighten.c:109:0: warning: "xen_pvh_domain" redefined
include/xen/xen.h:23:0: note: this is the location of the previous definition

Full randconfig file is attached.

-- 
~Randy
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 3.7.0-rc3 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_GPIO=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
# CONFIG_KERNEL_GZIP is not set
CONFIG_KERNEL_BZIP2=y
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_FHANDLE=y
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set

#
# RCU Subsystem
#
CONFIG_TINY_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_IKCONFIG=m
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
# CONFIG_IPC_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_SCHED_AUTOGROUP is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_ANON_INODES=y
CONFIG_EXPERT=y
CONFIG_HAVE_UID16=y
CONFIG_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HOTPLUG=y
# CONFIG_PRINTK is not set
CONFIG_BUG=y
CONFIG_ELF_CORE=y
# CONFIG_PCSPKR_PLATFORM is not set
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
# CONFIG_OPROFILE is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_

Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs

2012-11-01 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 1 Nov 2012 18:16:11 +0200

> Do you think it's over-engineering, or a good idea?

Engineer what you need, not what you might need.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs

2012-11-01 Thread Michael S. Tsirkin
On Thu, Nov 01, 2012 at 11:50:24AM -0400, David Miller wrote:
> From: "Michael S. Tsirkin" 
> Date: Wed, 31 Oct 2012 12:31:06 +0200
> 
> > -void vhost_zerocopy_callback(struct ubuf_info *ubuf)
> > +void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
> 
> If you're only reporting true/false values, even just for now,
> please use 'bool' for this.

In fact, the next patch reports -ENOMEM when tun hits OOM, so the callback
can distinguish between a copy (>0 value) and an error (<0 value)
and reduce zerocopy more aggressively in case of errors.

The *callback* in vhost-net currently handles all non-zero
values identically, but I am not sure that's the optimal behaviour,
so I thought it was worth giving callbacks the info.

Do you think it's over-engineering, or a good idea?

-- 
MST


Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs

2012-11-01 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 31 Oct 2012 12:31:06 +0200

> -void vhost_zerocopy_callback(struct ubuf_info *ubuf)
> +void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)

If you're only reporting true/false values, even just for now,
please use 'bool' for this.


[PATCH -next] xen/x86: remove duplicated include from enlighten.c

2012-11-01 Thread Wei Yongjun
From: Wei Yongjun 

Remove duplicated include.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun 
---
 arch/x86/xen/enlighten.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index eb9a567..4c694a7 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -45,7 +45,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 




Re: [PATCHv7 4/4] virtio_console: Add support for remoteproc serial

2012-11-01 Thread Amit Shah
On (Tue) 23 Oct 2012 [12:17:49], Rusty Russell wrote:
> sjur.brandel...@stericsson.com writes:
> > From: Sjur Brændeland 

> > @@ -1415,7 +1524,16 @@ static void remove_port_data(struct port *port)
> >  
> > /* Remove buffers we queued up for the Host to send us data in. */
> > while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
> > -   free_buf(buf);
> > +   free_buf(buf, true);
> > +
> > +   /*
> > +* Remove buffers from out queue for rproc-serial. We cannot afford
> > +* to leak any DMA mem, so reclaim this memory even if this might be
> > +* racy for the remote processor.
> > +*/
> > +   if (is_rproc_serial(port->portdev->vdev))
> > +   while ((buf = virtqueue_detach_unused_buf(port->out_vq)))
> > +   free_buf(buf, true);
> >  }
> 
> This seems wrong; either this is needed even if !is_rproc_serial(), or
> it's not necessary as the out_vq is empty.
> 
> Every path I can see has the device reset (in which case the queues
> should not be active), or we got a VIRTIO_CONSOLE_PORT_REMOVE event (in
> which case, the same).
> 
> I think we can have non-blocking writes which could leave buffers in
> out_vq: Amit?

Those get 'reclaimed' just above this hunk:


static void remove_port_data(struct port *port)
{
	struct port_buffer *buf;

	/* Remove unused data this port might have received. */
	discard_port_data(port);

	reclaim_consumed_buffers(port);

	/* Remove buffers we queued up for the Host to send us data in. */
	while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
		free_buf(buf, true);

	...




> >  static void __exit fini(void)
> >  {
> > +   reclaim_dma_bufs();
> 
> Hmm, you didn't protect it here anyway...
> 
> Cheers,
> Rusty.

Amit


Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings

2012-11-01 Thread Rusty Russell
Sjur Brændeland  writes:
> Zero-Copy data transport on the modem is primary goal for CAIF Virtio.
> In order to achieve Zero-Copy the direction of the Virtio rings are
> flipped in the RX direction. So we have implemented the Virtio
> access-function similar to what is found in vhost.c.

So, this adds another host-side virtqueue implementation.

Can we combine them together conveniently?  You pulled out more stuff
into vring.h which is a start, but it's a bit overloaded.

Perhaps we should separate the common fields into struct vring, and use
it to build:

struct vring_guest {
	struct vring vr;
	u16 last_used_idx;
};

struct vring_host {
	struct vring vr;
	u16 last_avail_idx;
};

I haven't looked closely at vhost to see what it wants, but I would
think we could share more code.
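
To illustrate the proposed split, here is a userspace sketch in which the shared part only describes where the rings live and each side keeps its own private cursor: the guest tracks last_used_idx, and the host tracks last_avail_idx. The ring layout is heavily simplified (buffer ids only, no struct vring_desc table, no memory barriers), and all names are hypothetical rather than the virtio_ring API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RING_NUM 4u /* hypothetical size; must divide 65536 */

/* Hypothetical shared memory, simplified to ids instead of descriptors. */
struct sketch_ring_mem {
	uint16_t avail_idx;                   /* advanced only by the guest */
	uint16_t avail_ring[SKETCH_RING_NUM];
	uint16_t used_idx;                    /* advanced only by the host */
	uint16_t used_ring[SKETCH_RING_NUM];
};

/* Common part: describes where the rings live, like struct vring. */
struct sketch_vring {
	unsigned int num;
	struct sketch_ring_mem *mem;
};

struct sketch_vring_guest {
	struct sketch_vring vr;
	uint16_t last_used_idx;  /* next used entry the guest will reclaim */
};

struct sketch_vring_host {
	struct sketch_vring vr;
	uint16_t last_avail_idx; /* next avail entry the host will consume */
};

static void sketch_guest_add(struct sketch_vring_guest *g, uint16_t id)
{
	struct sketch_ring_mem *m = g->vr.mem;

	/* The guest may not have more than num buffers outstanding. */
	assert((uint16_t)(m->avail_idx - g->last_used_idx) < g->vr.num);
	m->avail_ring[m->avail_idx % g->vr.num] = id;
	m->avail_idx++;
}

static bool sketch_host_get(struct sketch_vring_host *h, uint16_t *id)
{
	struct sketch_ring_mem *m = h->vr.mem;

	if (h->last_avail_idx == m->avail_idx)
		return false; /* nothing new from the guest */
	*id = m->avail_ring[h->last_avail_idx % h->vr.num];
	h->last_avail_idx++;
	return true;
}

static void sketch_host_put(struct sketch_vring_host *h, uint16_t id)
{
	struct sketch_ring_mem *m = h->vr.mem;

	m->used_ring[m->used_idx % h->vr.num] = id;
	m->used_idx++;
}

static bool sketch_guest_get_used(struct sketch_vring_guest *g, uint16_t *id)
{
	struct sketch_ring_mem *m = g->vr.mem;

	if (g->last_used_idx == m->used_idx)
		return false; /* host has not finished anything new */
	*id = m->used_ring[g->last_used_idx % g->vr.num];
	g->last_used_idx++;
	return true;
}
```

Since neither side ever writes the other's cursor, the private indexes can live outside the shared struct. Only the two ring indexes in shared memory are written across the boundary, and each has a single writer.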

Cheers,
Rusty.