date:20110201

[Qemu-devel] [PATCH v2] linux-user: implement sched_{g, s}etaffinity

2011-02-01 Thread Mike Frysinger

Signed-off-by: Mike Frysinger 
---
v2
- handle host/target size mismatches (32 on a 64)

 linux-user/syscall.c |   67 ++
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 64ba768..e405f56 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -235,6 +235,12 @@ _syscall6(int,sys_futex,int *,uaddr,int,op,int,val,
   const struct timespec *,timeout,int *,uaddr2,int,val3)
 #endif
 #endif
+#define __NR_sys_sched_getaffinity __NR_sched_getaffinity
+_syscall3(int, sys_sched_getaffinity, pid_t, pid, unsigned int, len,
+  unsigned long *, user_mask_ptr);
+#define __NR_sys_sched_setaffinity __NR_sched_setaffinity
+_syscall3(int, sys_sched_setaffinity, pid_t, pid, unsigned int, len,
+  unsigned long *, user_mask_ptr);
 
 static bitmask_transtbl fcntl_flags_tbl[] = {
   { TARGET_O_ACCMODE,   TARGET_O_WRONLY,O_ACCMODE,   O_WRONLY,},
@@ -6374,6 +6380,67 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
return value. */
 ret = -TARGET_ENOTDIR;
 break;
+case TARGET_NR_sched_getaffinity:
+{
+unsigned int mask_size;
+unsigned long *mask;
+
+/*
+ * sched_getaffinity needs multiples of ulong, so need to take
+ * care of mismatches between target ulong and host ulong sizes.
+ */
+if (arg2 & (sizeof(abi_ulong) - 1)) {
+ret = -TARGET_EINVAL;
+break;
+}
+mask_size = (arg2 + (sizeof(*mask) - 1)) & ~(sizeof(*mask) - 1);
+
+mask = alloca(mask_size);
+ret = get_errno(sys_sched_getaffinity(arg1, mask_size, mask));
+
+if (!is_error(ret)) {
+if (arg2 > ret) {
+/* Zero out any extra space kernel didn't fill */
+unsigned long zero = arg2 - ret;
+p = alloca(zero);
+memset(p, 0, zero);
+if (copy_to_user(arg3 + zero, p, zero)) {
+goto efault;
+}
+arg2 = ret;
+}
+if (copy_to_user(arg3, mask, arg2)) {
+goto efault;
+}
+ret = arg2;
+}
+}
+break;
+case TARGET_NR_sched_setaffinity:
+{
+unsigned int mask_size;
+unsigned long *mask;
+
+/*
+ * sched_setaffinity needs multiples of ulong, so need to take
+ * care of mismatches between target ulong and host ulong sizes.
+ */
+if (arg2 & (sizeof(abi_ulong) - 1)) {
+ret = -TARGET_EINVAL;
+break;
+}
+mask_size = (arg2 + (sizeof(*mask) - 1)) & ~(sizeof(*mask) - 1);
+
+mask = alloca(mask_size);
+if (!lock_user_struct(VERIFY_READ, p, arg3, 1)) {
+goto efault;
+}
+memcpy(mask, p, arg2);
+unlock_user_struct(p, arg2, 0);
+
+ret = get_errno(sys_sched_setaffinity(arg1, mask_size, mask));
+}
+break;
 case TARGET_NR_sched_setparam:
 {
 struct sched_param *target_schp;
-- 
1.7.4.rc3

Re: [Qemu-devel] [PATCH 0/4] new Blackfin QEMU port

2011-02-01 Thread Alexander Graf

On 01.02.2011, at 05:19, Mike Frysinger wrote:

> On Mon, Jan 31, 2011 at 09:00, Edgar E. Iglesias wrote:
>> * Some operations seem to translate to large amounts of tcg code,
>>  maybe you should consider using helpers for those. An example
>>  is gen_rot_tl.
> 
> seems like generated native code is better even if it is larger

Is this true even when you declare the helper function PURE and CONST? I'm not 
saying it isn't, but it's certainly worth giving it a shot since you seem to 
have implemented the helper way now anyways :)

> 
>> * Maybe you should rename target-bfin/opcode/bfin.h into
>>  target-bfin/opcodes.h? or similiar..
> 
> i prefer existing usage since it mirrors existing sourceware tree
> where this file comes from
> 
>> * gen_hwloop_check() seems to be looking at env->lbreg from the
>>  translator, is that OK? What happens if env->lbreg changes at
>>  runtime?
> 
> this is taken care of already
> 
>> * cpu_get_tb_cpu_state() doesn't define any tb flags?
> 
> no idea what this func is supposed to do

Qemu uses the tb flags as identifier for a translation block. So a tb is only 
reused, when the physical address, flags and code segment (x86 thing) are equal.

This can be used to have privileged operation checks in translated code and 
just put the privileged state flag in the tb flags. Then the same code address 
run in user mode is different from when it's run in kernel mode.

> 
>> I think QEMU should aim to be fast and correct. There is no need to
>> be cycle accurate but emulation should IMO always aim to produce the
>> correct results.
> 
> it does produce the correct results with correct insns.  it is
> referring to invalid insns.  the hardware itself sometimes produces
> undefined behavior with some invalid insns.
> 
>> "- transactional parallel instructions
>> - on the hardware, if a load/store causes an exception, the other
>>   insns do not change register states either.  in qemu, they do,
>>   but since those exceptions will kill the program anyways, who
>>   cares.  no intermediate store buffers!"
>> 
>> This might be true for user emulation but I assume you'd want to fix this
>> for system emulation (at some point in time). Maybe you should just note
>> that it's not supported and leave it at that.
> 
> i dont see the point there either.  in order to handle this, the
> majority of insns would need to store all their results in
> intermediates and then move them to their final destination.

I guess you could just save off the register state on transaction start and 
restore it on transaction abort (page fault)? But yes, this is only really 
useful for system emulation, so you don't really have to care for now.

Alex

Re: [Qemu-devel] [PATCH 0/4] new Blackfin QEMU port

2011-02-01 Thread Peter Maydell

On 1 February 2011 09:38, Alexander Graf  wrote:
> On 01.02.2011, at 05:19, Mike Frysinger wrote:
>> On Mon, Jan 31, 2011 at 09:00, Edgar E. Iglesias wrote:
>>> * cpu_get_tb_cpu_state() doesn't define any tb flags?
>>
>> no idea what this func is supposed to do
>
> Qemu uses the tb flags as identifier for a translation block.
> So a tb is only reused, when the physical address, flags and
> code segment (x86 thing) are equal.

Put another way, when you are generating code, if the code
you generate will change based on the contents of something
in the CPUState struct, then the bit of CPUState you are
looking at has to be one of:
 (1) encoded in the TB flags
  (as Alexander says, "is CPU in privileged mode"
  is a popular choice)
 (2) always constant for the life of the simulation
  (eg env->features if you have some sort of
  "target feature support" flags)
 (3) specially handled so that when it changes then
  all TBs or at least all affected TBs are flushed
  (env->breakpoints is in this category)

...because the CPUState you're passed in is not guaranteed
to be the state of the CPU at the PC which you are starting
translation from.

Otherwise you have case
 (4) a very hard to track down bug (qemu will in some
cases ask you to retranslate a block and if you don't
generate binary-identical output the second time around
things will go badly wrong)

So for instance here:

+static void gen_hwloop_check(DisasContext *dc)
+{
+bool loop1, loop0;
+int endl;
+
+loop1 = (dc->pc == dc->env->lbreg[1]);
+loop0 = (dc->pc == dc->env->lbreg[0]);

I suspect that this check of pc against the lbreg[]
values should be being done in the generated code,
not at translate time.

-- PMM

Re: [Qemu-devel] [V4 PATCH 2/8] Provide chroot environment server side interfaces

2011-02-01 Thread Daniel P. Berrange

On Tue, Feb 01, 2011 at 10:55:39AM +0530, M. Mohan Kumar wrote:
> Implement chroot server side interfaces like sending the file
> descriptor to qemu process, reading the object request from socket etc.
> Also add chroot main function and other helper routines.
> 
> Signed-off-by: M. Mohan Kumar 
> ---
>  Makefile.objs  |1 +
>  hw/9pfs/virtio-9p-chroot.c |  212 
> 
>  hw/9pfs/virtio-9p-chroot.h |   41 +
>  hw/9pfs/virtio-9p.c|   33 +++
>  hw/file-op-9p.h|3 +
>  5 files changed, 290 insertions(+), 0 deletions(-)
>  create mode 100644 hw/9pfs/virtio-9p-chroot.c
>  create mode 100644 hw/9pfs/virtio-9p-chroot.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index bc0344c..3007b6d 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> +/*
> + * Fork a process and chroot into the share path. Communication
> + * between qemu process and chroot process happens via socket
> + * All file descriptors (including stdout and stderr) are closed
> + * except one socket descriptor (which is used for communicating
> + * between qemu process and chroot process)
> + */
> +int v9fs_chroot(FsContext *fs_ctx)
> +{
> +int fd_pair[2], chroot_sock, error;
> +V9fsFileObjectRequest request;
> +pid_t pid;
> +uint64_t code;
> +FdInfo fd_info;
> +
> +if (socketpair(PF_UNIX, SOCK_STREAM, 0, fd_pair) < 0) {
> +error_report("socketpair %s", strerror(errno));
> +return -1;
> +}
> +
> +pid = fork();
> +if (pid < 0) {
> +error_report("fork %s", strerror(errno));
> +return -1;
> +}
> +if (pid != 0) {
> +fs_ctx->chroot_socket = fd_pair[0];
> +close(fd_pair[1]);
> +return 0;
> +}
> +
> +close(fd_pair[0]);
> +chroot_sock = fd_pair[1];
> +if (chroot(fs_ctx->fs_root) < 0) {
> +code = CHROOT_ERROR << 32 | errno;
> +error = qemu_write_full(chroot_sock, &code, sizeof(code));
> +_exit(1);
> +}
> +
> +error = chroot_daemonize(chroot_sock);
> +if (error) {
> +code = SETSID_ERROR << 32 | error;
> +error = qemu_write_full(chroot_sock, &code, sizeof(code));
> +_exit(1);
> +}
> +
> +/*
> + * Write 0 to chroot socket to indicate chroot process creation is
> + * successful
> + */
> +code = 0;
> +if (qemu_write_full(chroot_sock, &code, sizeof(code))
> +!= sizeof(code)) {
> +_exit(1);
> +}
> +/* get the request from the socket */
> +while (1) {
> +memset(&fd_info, 0, sizeof(fd_info));
> +if (chroot_read_request(chroot_sock, &request) == EIO) {
> +fd_info.fi_fd = 0;
> +fd_info.fi_error = EIO;
> +fd_info.fi_flags = FI_SOCKERR;
> +chroot_sendfd(chroot_sock, &fd_info);
> +continue;
> +}
> +qemu_free((void *)request.path.path);
> +if (request.data.oldpath_len) {
> +qemu_free((void *)request.path.old_path);
> +}
> +}
> +}

There is a subtle problem with using fork() in a multi-threaded
program that I was recently made aware of in libvirt. In short
if you have a multi-threaded program that calls fork(), then
the child process must only use POSIX functions that are
declared 'async signal safe', until the child calls exec() or
exit().  In particular any malloc()/free() related functions
are *not* async signal safe.

  http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html

  "If a multi-threaded process calls fork(), the new process shall contain
  a replica of the calling thread and its entire address space, possibly
  including the states of mutexes and other resources. Consequently, to
  avoid errors, the child process may only execute async-signal-safe
  operations until such time as one of the exec functions is called."

One example problem scenario. Thread 1 is currently doing a
malloc() and the malloc() impl is holding a mutex. Thread 2
now does a fork(), and in the child process calls malloc().
The child process will deadlock / hang forever because there
is nothing which will ever release the malloc() mutex that
was originally held by Thread 1. See also this thread which
brought the problem to my attention:

  http://lists.gnu.org/archive/html/coreutils/2011-01/msg00085.html

Regards,
Daniel

[Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support

2011-02-01 Thread Jes . Sorensen

From: Jes Sorensen 

Hi

This is a first attempt to add fsfreeze support to virtagent. The idea
is for the guest agent to walk the list of locally mounted file
systems in the guest, and issuing an ioctl to freeze them. The host
can then do a live snapshot of the guest, obtaining stable file
systems. After the snapshot, the host then calls the thaw function in
virtagent, which goes through the list of previously frozen file
systems and unfreezes them.

The list walking ignores remote file systems such as NFS and CIFS as
well as all pseudo file systems.

The guest agent code is in the first patch, and host agent code is in
the second patch. For now there is only human monitor support, but it
should be pretty straight forward to add QMP support as well.

Patches are against the virtagent-dev git tree.

Comments and suggestions welcome!

Cheers,
Jes


Jes Sorensen (2):
  Add virtagent file system freeze/thaw
  Add monitor commands for fsfreeze support

 hmp-commands.hx|   48 +++
 virtagent-common.h |9 ++
 virtagent-server.c |  196 +++
 virtagent.c|  235 
 virtagent.h|9 ++
 5 files changed, 497 insertions(+), 0 deletions(-)

-- 
1.7.3.5

[Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Jes . Sorensen

From: Jes Sorensen 

Implement freeze/thaw support in the guest, allowing the host to
request the guest freezes all it's file systems before a live snapshot
is performed.
 - fsfreeze(): Walk the list of mounted local real file systems,
   and freeze them.
 - fsthaw():   Walk the list of previously frozen file systems and
   thaw them.
 - fsstatus(): Return the current status of freeze/thaw. The host must
   poll this function, in case fsfreeze() returned with a
   timeout, to wait for the operation to finish.

Signed-off-by: Jes Sorensen 
---
 virtagent-common.h |8 ++
 virtagent-server.c |  196 
 2 files changed, 204 insertions(+), 0 deletions(-)

diff --git a/virtagent-common.h b/virtagent-common.h
index 5d8f5c1..220a4b6 100644
--- a/virtagent-common.h
+++ b/virtagent-common.h
@@ -61,6 +61,14 @@ typedef struct VAContext {
 const char *channel_path;
 } VAContext;
 
+enum vs_fsfreeze_status {
+FREEZE_ERROR = -1,
+FREEZE_THAWED = 0,
+FREEZE_INPROGRESS = 1,
+FREEZE_FROZEN = 2,
+FREEZE_THAWINPROGRESS = 3,
+};
+
 enum va_job_status {
 VA_JOB_STATUS_PENDING = 0,
 VA_JOB_STATUS_OK,
diff --git a/virtagent-server.c b/virtagent-server.c
index 7bb35b2..cf2a3f0 100644
--- a/virtagent-server.c
+++ b/virtagent-server.c
@@ -14,6 +14,13 @@
 #include 
 #include "qemu_socket.h"
 #include "virtagent-common.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 static VAServerData *va_server_data;
 static bool va_enable_syslog = false; /* enable syslog'ing of RPCs */
@@ -217,6 +224,189 @@ static xmlrpc_value *va_hello(xmlrpc_env *env,
 return result;
 }
 
+
+/*
+ * Walk the mount table and build a list of local file systems
+ */
+
+struct direntry {
+char *dirname;
+char *devtype;
+struct direntry *next;
+};
+
+static struct direntry *mount_list;
+static int fsfreeze_status;
+
+static int build_mount_list(void)
+{
+struct mntent *mnt;
+struct direntry *entry;
+struct direntry *next;
+char const *mtab = MOUNTED;
+FILE *fp;
+
+fp = setmntent(mtab, "r");
+if (!fp) {
+   fprintf(stderr, "unable to read mtab\n");
+   goto fail;
+}
+
+while ((mnt = getmntent(fp))) {
+   /*
+* An entry which device name doesn't start with a '/' is
+* either a dummy file system or a network file system.
+* Add special handling for smbfs and cifs as is done by
+* coreutils as well.
+*/
+   if ((mnt->mnt_fsname[0] != '/') ||
+   (strcmp(mnt->mnt_type, "smbfs") == 0) ||
+   (strcmp(mnt->mnt_type, "cifs") == 0)) {
+   continue;
+   }
+
+   entry = qemu_malloc(sizeof(struct direntry));
+   if (!entry) {
+   goto fail;
+   }
+   entry->dirname = qemu_strdup(mnt->mnt_dir);
+   entry->devtype = qemu_strdup(mnt->mnt_type);
+   entry->next = mount_list;
+
+   mount_list = entry;
+}
+
+endmntent(fp);
+
+return 0;
+ 
+fail:
+while(mount_list) {
+   next = mount_list->next;
+   qemu_free(mount_list->dirname);
+   qemu_free(mount_list->devtype);
+   qemu_free(mount_list);
+   mount_list = next;
+}
+
+return -1;
+}
+
+/*
+ * va_fsfreeze(): Walk list of mounted file systems in the guest, and
+ *   freeze the ones which are real local file systems.
+ * rpc return values: Number of file systems frozen, -1 on error.
+ */
+static xmlrpc_value *va_fsfreeze(xmlrpc_env *env,
+ xmlrpc_value *params,
+ void *user_data)
+{
+xmlrpc_int32 ret = 0, i = 0;
+xmlrpc_value *result;
+struct direntry *entry;
+int fd;
+SLOG("va_fsfreeze()");
+
+if (fsfreeze_status == FREEZE_FROZEN) {
+ret = 0;
+goto out;
+}
+
+ret = build_mount_list();
+if (ret < 0) {
+goto out;
+}
+
+fsfreeze_status = FREEZE_INPROGRESS;
+
+entry = mount_list;
+while(entry) {
+fd = qemu_open(entry->dirname, O_RDONLY);
+if (fd == -1) {
+ret = errno;
+goto error;
+}
+ret = ioctl(fd, FIFREEZE);
+if (ret < 0 && ret != EOPNOTSUPP) {
+goto error;
+}
+
+close(fd);
+entry = entry->next;
+i++;
+}
+
+fsfreeze_status = FREEZE_FROZEN;
+ret = i;
+out:
+result = xmlrpc_build_value(env, "i", ret);
+return result;
+error:
+if (i > 0) {
+fsfreeze_status = FREEZE_ERROR;
+}
+goto out;
+}
+
+/*
+ * va_fsthaw(): Walk list of frozen file systems in the guest, and
+ *   thaw them.
+ * rpc return values: Number of file systems thawed on success, -1 on error.
+ */
+static xmlrpc_value *va_fsthaw(xmlrpc_env *env,
+   xmlrpc_value *params,
+   void *user_data)
+{
+xmlrpc_int32 ret;
+xmlrpc_value *result;
+struct direntr

[Qemu-devel] [PATCH 2/2] Add monitor commands for fsfreeze support

2011-02-01 Thread Jes . Sorensen

From: Jes Sorensen 

This patch adds the following monitor commands:

agent_fsfreeze:
 - Freezes all local file systems in the guest. Command will print
   the number of file systems that were frozen.
agent_fsthaw:
 - Thaws all local file systems in the guest. Command will print
   the number of file systems that were thawed.
agent_fsstatus:
 - Prints the current status of file systems in the guest:
   Thawed, frozen, thaw in progress, freeze in progress, error.

Signed-off-by: Jes Sorensen 
---
 hmp-commands.hx|   48 +++
 virtagent-common.h |1 +
 virtagent.c|  235 
 virtagent.h|9 ++
 4 files changed, 293 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9c7ac0b..f4150da 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1310,6 +1310,54 @@ STEXI
 Fetch and re-negotiate guest agent capabilties
 ETEXI
 
+{
+.name   = "agent_fsfreeze",
+.args_type  = "",
+.params = "",
+.help   = "Freeze all local file systems mounted in the guest",
+.user_print = do_agent_fsfreeze_print,
+.mhandler.cmd_async = do_agent_fsfreeze,
+.flags  = MONITOR_CMD_ASYNC,
+},
+
+STEXI
+@item agent_fsfreeze
+@findex agent_fsfreeze
+Freeze all local mounted file systems in guest
+ETEXI
+
+{
+.name   = "agent_fsthaw",
+.args_type  = "",
+.params = "",
+.help   = "Thaw all local file systems mounted in the guest",
+.user_print = do_agent_fsthaw_print,
+.mhandler.cmd_async = do_agent_fsthaw,
+.flags  = MONITOR_CMD_ASYNC,
+},
+
+STEXI
+@item agent_fsthaw
+@findex agent_fsthaw
+Thaw all local mounted file systems in guest
+ETEXI
+
+{
+.name   = "agent_fsstatus",
+.args_type  = "",
+.params = "",
+.help   = "Display status of file system freeze progress in guest",
+.user_print = do_agent_fsstatus_print,
+.mhandler.cmd_async = do_agent_fsstatus,
+.flags  = MONITOR_CMD_ASYNC,
+},
+
+STEXI
+@item agent_fsstatus
+@findex agent_fsstatus
+Get status of file system freeze in guest
+ETEXI
+
 STEXI
 @end table
 ETEXI
diff --git a/virtagent-common.h b/virtagent-common.h
index 220a4b6..383de84 100644
--- a/virtagent-common.h
+++ b/virtagent-common.h
@@ -24,6 +24,7 @@
 #include "monitor.h"
 #include "virtagent-server.h"
 #include "virtagent.h"
+#include "qint.h"
 
 #define DEBUG_VA
 
diff --git a/virtagent.c b/virtagent.c
index b5e7944..4277802 100644
--- a/virtagent.c
+++ b/virtagent.c
@@ -640,3 +640,238 @@ int va_send_hello(void)
 xmlrpc_DECREF(params);
 return ret;
 }
+
+void do_agent_fsfreeze_print(Monitor *mon, const QObject *data)
+{
+TRACE("called");
+
+monitor_printf(mon, "File systems frozen: %" PRId64 "\n",
+   qint_get_int((qobject_to_qint(data;
+}
+
+static void do_agent_fsfreeze_cb(const char *resp_data,
+ size_t resp_data_len,
+ MonitorCompletion *mon_cb,
+ void *mon_data)
+{
+xmlrpc_value *resp = NULL;
+xmlrpc_env env;
+xmlrpc_int32 retval = 0;
+QInt *qint;
+
+TRACE("called");
+
+if (resp_data == NULL) {
+LOG("error handling RPC request");
+return;
+}
+
+xmlrpc_env_init(&env);
+resp = xmlrpc_parse_response(&env, resp_data, resp_data_len);
+if (va_rpc_has_error(&env)) {
+LOG("error parsing RPC response");
+return;
+}
+
+xmlrpc_parse_value(&env, resp, "i", &retval);
+if (va_rpc_has_error(&env)) {
+retval = -1;
+goto out;
+}
+
+out:
+qint = qint_from_int(retval);
+xmlrpc_DECREF(resp);
+if (mon_cb) {
+mon_cb(mon_data, QOBJECT(qint));
+}
+qobject_decref(QOBJECT(qint));
+}
+
+int do_agent_fsfreeze(Monitor *mon, const QDict *mon_params,
+  MonitorCompletion cb, void *opaque)
+{
+xmlrpc_env env;
+xmlrpc_value *params;
+int ret;
+
+TRACE("called");
+
+xmlrpc_env_init(&env);
+params = xmlrpc_build_value(&env, "()");
+if (va_rpc_has_error(&env)) {
+return -1;
+}
+
+ret = va_do_rpc(&env, "va.fsfreeze", params, do_agent_fsfreeze_cb,
+cb, opaque);
+if (ret) {
+qerror_report(QERR_VA_FAILED, ret, strerror(ret));
+}
+xmlrpc_DECREF(params);
+return ret;
+}
+
+void do_agent_fsthaw_print(Monitor *mon, const QObject *data)
+{
+TRACE("called");
+
+monitor_printf(mon, "File systems thawed: %" PRId64 "\n",
+   qint_get_int((qobject_to_qint(data;
+}
+
+static void do_agent_fsthaw_cb(const char *resp_data,
+   size_t resp_data_len,
+   MonitorCompletion *mon_cb,
+   void *mon_data)
+{
+xmlrpc_value *resp = NULL;
+xmlrpc_env env

Re: [Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support

2011-02-01 Thread Vasiliy G Tolstov

On Tue, 2011-02-01 at 11:58 +0100, jes.soren...@redhat.com wrote:
> From: Jes Sorensen 
> 
> Hi
> 
> This is a first attempt to add fsfreeze support to virtagent. The idea
> is for the guest agent to walk the list of locally mounted file
> systems in the guest, and issuing an ioctl to freeze them. The host
> can then do a live snapshot of the guest, obtaining stable file
> systems. After the snapshot, the host then calls the thaw function in
> virtagent, which goes through the list of previously frozen file
> systems and unfreezes them.
> 
> The list walking ignores remote file systems such as NFS and CIFS as
> well as all pseudo file systems.
> 
> The guest agent code is in the first patch, and host agent code is in
> the second patch. For now there is only human monitor support, but it
> should be pretty straight forward to add QMP support as well.
> 
> Patches are against the virtagent-dev git tree.
> 
> Comments and suggestions welcome!
> 
> Cheers,
> Jes

Hello. Very nice feature. Sorry for offropic, but can this feature can
be used to modify partiotion table on already mounted device (for
example root on ext3? )
Thank You.

-- 
Vasiliy G Tolstov 
Selfip.Ru

Re: [Qemu-devel] Re: [PATCH][RFC] Add stdio char device on windows

2011-02-01 Thread Fabien Chouteau


On 01/31/2011 06:43 PM, Paolo Bonzini wrote:

On 01/31/2011 06:09 PM, Fabien Chouteau wrote:

Simple implementation of an stdio char device on Windows.

Signed-off-by: Fabien Chouteau
---
  qemu-char.c |  171 
+++

  1 files changed, 171 insertions(+), 0 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index edc9ad6..c18e668 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1436,6 +1436,11 @@ static CharDriverState 
*qemu_chr_open_pp(QemuOpts *opts)


  #else /* _WIN32 */

+#define STDIO_MAX_CLIENTS 2


Why 2?  If it were 1, you could use a single definition for Unix and 
Win32.



+
+static int stdio_nb_clients;
+static CharDriverState *stdio_clients[STDIO_MAX_CLIENTS];
+
  typedef struct {
  int max_size;
  HANDLE hcom, hrecv, hsend;
@@ -1788,6 +1793,171 @@ static CharDriverState 
*qemu_chr_open_win_file_out(QemuOpts *opts)


  return qemu_chr_open_win_file(fd_out);
  }
+
+static int win_stdio_write(CharDriverState *chr, const uint8_t *buf, 
int len)

+{
+HANDLE *Output = GetStdHandle(STD_OUTPUT_HANDLE);


Please use Hungarian notation for Windows objects, i.e. hStdOut.


+DWORD   size;
+int len1;
+
+len1 = len;
+
+while (len1>  0) {
+if (!WriteFile(Output, buf, len1,&size, NULL)) {
+break;
+}
+buf  += size;
+len1 -= size;
+}
+
+return len - len1;
+}
+
+static HANDLE *Input;


and hStdIn


+
+static void win_stdio_wait_func(void *opaque)
+{
+CharDriverState *chr = opaque;
+INPUT_RECORD buf[4];
+int  ret;
+DWORDsize;
+int  i;
+
+ret = ReadConsoleInput(Input, buf, sizeof(buf)/sizeof(*buf),&size);
+
+if (!ret) {
+/* Avoid error storm */
+qemu_del_wait_object(Input, NULL, NULL);
+return;
+}
+
+for (i = 0; i<  size; i++) {
+KEY_EVENT_RECORD *kev =&buf[i].Event.KeyEvent;
+
+if (buf[i].EventType == KEY_EVENT&&  kev->bKeyDown) {
+int j;
+if (kev->uChar.AsciiChar != 0) {
+for (j = 0; j<  kev->wRepeatCount; j++)
+if (qemu_chr_can_read(chr)) {
+uint8_t c = kev->uChar.AsciiChar;
+qemu_chr_read(chr,&c, 1);
+}
+}
+}
+}
+}
+
+static HANDLE  InputReadyEvent;
+static HANDLE  InputDoneEvent;
+static uint8_t InputBuf;


and hInputReadyEvent, hInputDoneEvent.  Stuff that is not Windows 
objects should use underscores as in win_stdio_buf.



+
+static DWORD WINAPI win_stdio_thread(LPVOID param)
+{
+int   ret;
+DWORD size;
+
+while (1) {
+
+/* Wait for one byte */
+ret = ReadFile(Input,&InputBuf, 1,&size, NULL);
+
+/* Exit in case of error, continue if nothing read */
+if (!ret) {
+break;
+}
+if (!size) {
+continue;
+}
+
+/* Some terminal emulator returns \r\n for Enter, just pass 
\n */

+if (InputBuf == '\r') {
+continue;
+}
+
+/* Signal the main thread and wait until the byte was eaten */
+if (!SetEvent(InputReadyEvent)) {
+break;
+}
+if (WaitForSingleObject(InputDoneEvent, INFINITE) != 
WAIT_OBJECT_0) {

+break;
+}
+}
+
+qemu_del_wait_object(InputReadyEvent, NULL, NULL);
+return 0;
+}
+
+static void win_stdio_thread_wait_func(void *opaque)
+{
+CharDriverState *chr = opaque;
+
+if (qemu_chr_can_read(chr)) {
+qemu_chr_read(chr,&InputBuf, 1);
+}
+
+SetEvent(InputDoneEvent);
+}
+
+static CharDriverState *qemu_chr_open_win_stdio(QemuOpts *opts)
+{
+CharDriverState *chr;
+DWORDmode;
+int  is_console = 0;
+
+Input = GetStdHandle(STD_INPUT_HANDLE);
+if (Input == INVALID_HANDLE_VALUE) {
+fprintf(stderr, "cannot open stdio: invalid handle\n");
+exit(1);
+}
+
+is_console = GetConsoleMode(Input,&mode) != 0;
+
+if (stdio_nb_clients>= STDIO_MAX_CLIENTS
+|| ((display_type != DT_NOGRAPHIC)&&  (stdio_nb_clients != 
0))) {

+return NULL;
+}
+
+chr = qemu_mallocz(sizeof(CharDriverState));
+if (!chr) {
+return NULL;
+}
+
+chr->chr_write = win_stdio_write;
+
+if (stdio_nb_clients == 0) {
+if (is_console) {
+if (qemu_add_wait_object(InputReadyEvent,
+ win_stdio_thread_wait_func, 
chr)) {

+fprintf(stderr, "qemu_add_wait_object: failed\n");
+}
+} else {
+DWORD   id;
+HANDLE *InputThread;
+
+InputReadyEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
+InputDoneEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
+InputThread = CreateThread(NULL, 0, win_stdio_thread, 
chr, 0,&id);

+
+if (InputThread == INVALID_HANDLE_VALUE
+|| Inp

Re: [Qemu-devel] [PATCH 0/4] new Blackfin QEMU port

2011-02-01 Thread Alexander Graf


On 01.02.2011, at 11:31, Peter Maydell wrote:

> On 1 February 2011 09:38, Alexander Graf  wrote:
>> On 01.02.2011, at 05:19, Mike Frysinger wrote:
>>> On Mon, Jan 31, 2011 at 09:00, Edgar E. Iglesias wrote:
 * cpu_get_tb_cpu_state() doesn't define any tb flags?
>>> 
>>> no idea what this func is supposed to do
>> 
>> Qemu uses the tb flags as identifier for a translation block.
>> So a tb is only reused, when the physical address, flags and
>> code segment (x86 thing) are equal.
> 
> Put another way, when you are generating code, if the code
> you generate will change based on the contents of something
> in the CPUState struct, then the bit of CPUState you are
> looking at has to be one of:
> (1) encoded in the TB flags
>  (as Alexander says, "is CPU in privileged mode"
>  is a popular choice)
> (2) always constant for the life of the simulation
>  (eg env->features if you have some sort of
>  "target feature support" flags)
> (3) specially handled so that when it changes then
>  all TBs or at least all affected TBs are flushed
>  (env->breakpoints is in this category)
> 
> ...because the CPUState you're passed in is not guaranteed
> to be the state of the CPU at the PC which you are starting
> translation from.
> 
> Otherwise you have case
> (4) a very hard to track down bug (qemu will in some
> cases ask you to retranslate a block and if you don't
> generate binary-identical output the second time around
> things will go badly wrong)
> 
> So for instance here:
> 
> +static void gen_hwloop_check(DisasContext *dc)
> +{
> +bool loop1, loop0;
> +int endl;
> +
> +loop1 = (dc->pc == dc->env->lbreg[1]);
> +loop0 = (dc->pc == dc->env->lbreg[0]);
> 
> I suspect that this check of pc against the lbreg[]
> values should be being done in the generated code,
> not at translate time.

Ouch. Yes. General rule: never access env from translate.c :). If you need to 
check for anything, go through the tb flags. Or generate inline code that does 
the checks.

Thanks for the nice explanation Peter :)


Alex

Re: [Qemu-devel] [PATCH 0/4] new Blackfin QEMU port

2011-02-01 Thread Peter Maydell

On 1 February 2011 11:46, Alexander Graf  wrote:

> Ouch. Yes. General rule: never access env from translate.c :).

With the arm target I was tempted to restructure translate.c so
that we copied the handful of "safe" bits of env that we used
into the disas context struct right at the start, and then didn't
pass env into the main translation functions at all. That would
make it much harder to accidentally use bits of env that we
should not, but in the end I decided against it because it would
make my qemu-meego merging job that much harder...

-- PMM

Re: [Qemu-devel] [V4 PATCH 2/8] Provide chroot environment server side interfaces

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 10:32 AM, Daniel P. Berrange  wrote:
> There is a subtle problem with using fork() in a multi-threaded
> program that I was recently made aware of in libvirt. In short
> if you have a multi-threaded program that calls fork(), then
> the child process must only use POSIX functions that are
> declared 'async signal safe', until the child calls exec() or
> exit().  In particular any malloc()/free() related functions
> are *not* async signal safe.

In this particular patch the fork() call happens quite early so the
risk should be low but it would be nice to investigate this issue
fully.

Stefan

[Qemu-devel] [PATCH 0.14] tap: safe sndbuf default

2011-02-01 Thread Michael S. Tsirkin

With current sndbuf default value, a blocked
target guest can prevent another guest from
transmitting any packets. While current
sndbuf value (1M) is reported to help some
UDP based workloads, the default should
be safe (0).

Signed-off-by: Michael S. Tsirkin 
---

I think this should go into 0.14.
Comments?

 net/tap-linux.c |   13 +
 qemu-options.hx |2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/tap-linux.c b/net/tap-linux.c
index f7aa904..ff8cad0 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -82,12 +82,17 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 return fd;
 }
 
-/* sndbuf should be set to a value lower than the tx queue
- * capacity of any destination network interface.
+/* sndbuf implements a kind of flow control for tap.
+ * Unfortunately when it's enabled, and packets are sent
+ * to other guests on the same host, the receiver
+ * can lock up the transmitter indefinitely.
+ *
+ * To avoid packet loss, sndbuf should be set to a value lower than the tx
+ * queue capacity of any destination network interface.
  * Ethernet NICs generally have txqueuelen=1000, so 1Mb is
- * a good default, given a 1500 byte MTU.
+ * a good value, given a 1500 byte MTU.
  */
-#define TAP_DEFAULT_SNDBUF 1024*1024
+#define TAP_DEFAULT_SNDBUF 0
 
 int tap_set_sndbuf(int fd, QemuOpts *opts)
 {
diff --git a/qemu-options.hx b/qemu-options.hx
index 11c93a2..ca4d5c9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1057,7 +1057,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "use '[down]script=no' to disable script execution\n"
 "use 'fd=h' to connect to an already opened TAP 
interface\n"
 "use 'sndbuf=nbytes' to limit the size of the send buffer 
(the\n"
-"default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0')\n"
+"default is disabled 'sndbuf=0' to enable flow control set 
'sndbuf=1048576')\n"
 "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n"
 "use vnet_hdr=on to make the lack of IFF_VNET_HDR support 
an error condition\n"
 "use vhost=on to enable experimental in kernel 
accelerator\n"
-- 
1.7.3.2.91.g446ac

[Qemu-devel] Re: [PATCH][RFC] Add stdio char device on windows

2011-02-01 Thread Paolo Bonzini


On 02/01/2011 12:24 PM, Fabien Chouteau wrote:

Otherwise looks good, can you wait for
http://permalink.gmane.org/gmane.comp.emulators.qemu/88490 to be
merged so that you can add the set_echo implementation too?


OK, I can wait. When do you expect your patches to be ready?


They are, I'm waiting for someone to pick them.

Paolo

[Qemu-devel] Re: KVM call agenda for Feb 1

2011-02-01 Thread Paolo Bonzini


On 01/31/2011 10:39 PM, Anthony Liguori wrote:

On 01/31/2011 12:10 PM, Jan Kiszka wrote:

On 2011-01-31 11:02, Juan Quintela wrote:

Please send in any agenda items you are interested incovering.


o KVM upstream merge: status, plans, coordination


o QMP support status for 0.14. Luiz and I already chatted about it today
but would be good to discuss in the call just to see if anyone has
opinions. Basically, declare it fully supported with a few minor caveats
(like human-monitor-passthrough is no more supported than the actual
monitor and recommendations about how to deal with devices in the device
tree).

Reminder: tomorrow is 0.14 stable fork.


Then I may add to the topics: pings of already-posted 0.14 patches. :)

Paolo

Re: [Qemu-devel] Re: KVM call agenda for Feb 1

2011-02-01 Thread Luiz Capitulino

On Mon, 31 Jan 2011 15:39:22 -0600
Anthony Liguori  wrote:

> On 01/31/2011 12:10 PM, Jan Kiszka wrote:
> > On 2011-01-31 11:02, Juan Quintela wrote:
> >
> >> Please send in any agenda items you are interested incovering.
> >>
> >>  
> >   o KVM upstream merge: status, plans, coordination
> >
> 
>   o QMP support status for 0.14.  Luiz and I already chatted about it 
> today but would be good to discuss in the call just to see if anyone has 
> opinions.  Basically, declare it fully supported with a few minor 
> caveats (like human-monitor-passthrough is no more supported than the 
> actual monitor and recommendations about how to deal with devices in the 
> device tree).

o Summer of code 2011

> 
> Reminder: tomorrow is 0.14 stable fork.
> 
> Regards,
> 
> Anthony Liguori
> 
> > Jan
> >
> >
> 
>

Re: [Qemu-devel] Re: KVM call agenda for Feb 1

2011-02-01 Thread Luiz Capitulino

On Tue, 1 Feb 2011 10:53:21 -0200
Luiz Capitulino  wrote:

> On Mon, 31 Jan 2011 15:39:22 -0600
> Anthony Liguori  wrote:
> 
> > On 01/31/2011 12:10 PM, Jan Kiszka wrote:
> > > On 2011-01-31 11:02, Juan Quintela wrote:
> > >
> > >> Please send in any agenda items you are interested incovering.
> > >>
> > >>  
> > >   o KVM upstream merge: status, plans, coordination
> > >
> > 
> >   o QMP support status for 0.14.  Luiz and I already chatted about it 
> > today but would be good to discuss in the call just to see if anyone has 
> > opinions.  Basically, declare it fully supported with a few minor 
> > caveats (like human-monitor-passthrough is no more supported than the 
> > actual monitor and recommendations about how to deal with devices in the 
> > device tree).
> 
> o Summer of code 2011

 Forgot to mention the wiki page:

   http://wiki.qemu.org/Google_Summer_of_Code_2011

[Qemu-devel] Re: [PATCH 2/3] Correct alarm deadline computation

2011-02-01 Thread Jan Kiszka

On 2011-01-31 22:51, Paolo Bonzini wrote:
> When the QEMU_CLOCK_HOST clock was added, computation of its
> deadline was added to qemu_next_deadline, which is correct but
> incomplete.
> 
> I noticed this by reading the very convoluted rules whereby
> qemu_next_deadline_dyntick is computed, which miss QEMU_CLOCK_HOST
> when use_icount is true.  This patch inlines qemu_next_deadline
> into qemu_next_deadline_dyntick, and then corrects the logic to skip
> only QEMU_CLOCK_VIRTUAL when use_icount is true.
> 
> Signed-off-by: Paolo Bonzini 
> Cc: Jan Kiszka 
> ---
>  qemu-timer.c |   15 +++
>  1 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/qemu-timer.c b/qemu-timer.c
> index 60283a8..c19d0a2 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -724,11 +724,18 @@ static uint64_t qemu_next_deadline_dyntick(void)
>  int64_t delta;
>  int64_t rtdelta;
>  
> -if (use_icount)
> +if (!use_icount && active_timers[QEMU_CLOCK_VIRTUAL]) {
> +delta = active_timers[QEMU_CLOCK_VIRTUAL]->expire_time -
> + qemu_get_clock(vm_clock);
> +} else {
>  delta = INT32_MAX;
> -else
> -delta = qemu_next_deadline();
> -
> +}
> +if (active_timers[QEMU_CLOCK_HOST]) {
> +int64_t hdelta = active_timers[QEMU_CLOCK_HOST]->expire_time -
> + qemu_get_clock(host_clock);
> +if (hdelta < delta)
> +delta = hdelta;
> +}
>  if (active_timers[QEMU_CLOCK_REALTIME]) {
>  rtdelta = (active_timers[QEMU_CLOCK_REALTIME]->expire_time -
>   qemu_get_clock_ns(rt_clock));

Looks good to me. I guess this applies without the first patch? Then it
should go in (unless you are working on a new version for 1/3).

Thanks for fixing this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support

2011-02-01 Thread Jes Sorensen

On 02/01/11 12:25, Vasiliy G Tolstov wrote:
> Hello. Very nice feature. Sorry for offropic, but can this feature can
> be used to modify partiotion table on already mounted device (for
> example root on ext3? )
> Thank You.
> 

No it cannot, that would be a totally different issue that wouldn't be
affected by this.

Cheers,
Jes

[Qemu-devel] Re: [PATCH 2/3] Correct alarm deadline computation

2011-02-01 Thread Paolo Bonzini


On 02/01/2011 02:01 PM, Jan Kiszka wrote:

Looks good to me. I guess this applies without the first patch? Then it
should go in (unless you are working on a new version for 1/3).


It's wrong without the first patch (micro instead of nanoseconds). 
However, I read Anthony's message as a suggestion rather than a rejection.


Paolo

[Qemu-devel] Re: [PATCH 2/3] Correct alarm deadline computation

2011-02-01 Thread Jan Kiszka

On 2011-02-01 14:04, Paolo Bonzini wrote:
> On 02/01/2011 02:01 PM, Jan Kiszka wrote:
>> Looks good to me. I guess this applies without the first patch? Then it
>> should go in (unless you are working on a new version for 1/3).
> 
> It's wrong without the first patch (micro instead of nanoseconds). 
> However, I read Anthony's message as a suggestion rather than a rejection.
> 

Yes, it's probably a better idea anyway to do this stepwise.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 12/22] kvm: Call qemu_kvm_eat_signals also under !CONFIG_IOTHREAD

2011-02-01 Thread Marcelo Tosatti

On Thu, Jan 27, 2011 at 02:09:56PM +0100, Jan Kiszka wrote:
> Move qemu_kvm_eat_signals around and call it also when the IO-thread is
> not used. Do not yet process SIGBUS, will be armed in a separate step.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  cpus.c |   88 ---
>  1 files changed, 50 insertions(+), 38 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 9071848..558c0d3 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -261,6 +261,45 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>  }
>  }
>  
> +static void qemu_kvm_eat_signals(CPUState *env)
> +{
> +struct timespec ts = { 0, 0 };
> +siginfo_t siginfo;
> +sigset_t waitset;
> +sigset_t chkset;
> +int r;
> +
> +sigemptyset(&waitset);
> +sigaddset(&waitset, SIG_IPI);
> +sigaddset(&waitset, SIGBUS);
> +
> +do {
> +r = sigtimedwait(&waitset, &siginfo, &ts);
> +if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
> +perror("sigtimedwait");
> +exit(1);
> +}
> +
> +switch (r) {
> +#ifdef CONFIG_IOTHREAD
> +case SIGBUS:
> +if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
> +sigbus_reraise();
> +}
> +break;
> +#endif
> +default:
> +break;
> +}
> +
> +r = sigpending(&chkset);
> +if (r == -1) {
> +perror("sigpending");
> +exit(1);
> +}
> +} while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
> +}
> +
>  #else /* _WIN32 */
>  
>  HANDLE qemu_event_handle;
> @@ -292,6 +331,10 @@ static void qemu_event_increment(void)
>  static void qemu_kvm_init_cpu_signals(CPUState *env)
>  {
>  }
> +
> +static void qemu_kvm_eat_signals(CPUState *env)
> +{
> +}
>  #endif /* _WIN32 */
>  
>  #ifndef CONFIG_IOTHREAD
> @@ -631,43 +674,6 @@ static void sigbus_handler(int n, struct 
> qemu_signalfd_siginfo *siginfo,
>  }
>  }
>  
> -static void qemu_kvm_eat_signals(CPUState *env)
> -{
> -struct timespec ts = { 0, 0 };
> -siginfo_t siginfo;
> -sigset_t waitset;
> -sigset_t chkset;
> -int r;
> -
> -sigemptyset(&waitset);
> -sigaddset(&waitset, SIG_IPI);
> -sigaddset(&waitset, SIGBUS);
> -
> -do {
> -r = sigtimedwait(&waitset, &siginfo, &ts);
> -if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
> -perror("sigtimedwait");
> -exit(1);
> -}
> -
> -switch (r) {
> -case SIGBUS:
> -if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
> -sigbus_reraise();
> -}
> -break;
> -default:
> -break;
> -}
> -
> -r = sigpending(&chkset);
> -if (r == -1) {
> -perror("sigpending");
> -exit(1);
> -}
> -} while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
> -}
> -
>  static void qemu_kvm_wait_io_event(CPUState *env)
>  {
>  while (!cpu_has_work(env))
> @@ -932,6 +938,8 @@ static int qemu_cpu_exec(CPUState *env)
>  
>  bool cpu_exec_all(void)
>  {
> +int r;
> +
>  if (next_cpu == NULL)
>  next_cpu = first_cpu;
>  for (; next_cpu != NULL && !exit_request; next_cpu = next_cpu->next_cpu) 
> {
> @@ -943,7 +951,11 @@ bool cpu_exec_all(void)
>  if (qemu_alarm_pending())
>  break;
>  if (cpu_can_run(env)) {
> -if (qemu_cpu_exec(env) == EXCP_DEBUG) {
> +r = qemu_cpu_exec(env);
> +if (kvm_enabled()) {
> +qemu_kvm_eat_signals(env);
> +}
> +if (r == EXCP_DEBUG) {
>  break;
>  }

As mentioned before, signal processing should be independent of
cpu_can_run (still want to process SIGALRM if vm is stopped, for
example).

[Qemu-devel] Re: [PATCH 12/22] kvm: Call qemu_kvm_eat_signals also under !CONFIG_IOTHREAD

2011-02-01 Thread Marcelo Tosatti

On Tue, Feb 01, 2011 at 10:38:35AM -0200, Marcelo Tosatti wrote:
> > @@ -943,7 +951,11 @@ bool cpu_exec_all(void)
> >  if (qemu_alarm_pending())
> >  break;
> >  if (cpu_can_run(env)) {
> > -if (qemu_cpu_exec(env) == EXCP_DEBUG) {
> > +r = qemu_cpu_exec(env);
> > +if (kvm_enabled()) {
> > +qemu_kvm_eat_signals(env);
> > +}
> > +if (r == EXCP_DEBUG) {
> >  break;
> >  }
> 
> As mentioned before, signal processing should be independent of
> cpu_can_run (still want to process SIGALRM if vm is stopped, for
> example).

Nevermind that comment.

[Qemu-devel] Re: [PATCH 16/22] Introduce VCPU self-signaling service

2011-02-01 Thread Marcelo Tosatti

On Thu, Jan 27, 2011 at 02:10:00PM +0100, Jan Kiszka wrote:
> Introduce qemu_cpu_kick_self to send SIG_IPI to the calling VCPU
> context. First user will be kvm.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  cpus.c|   21 +
>  qemu-common.h |1 +
>  2 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index bba59e5..88bed4e 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -531,6 +531,17 @@ void qemu_cpu_kick(void *env)
>  return;
>  }
>  
> +void qemu_cpu_kick_self(void)
> +{
> +#ifndef _WIN32
> +assert(cpu_single_env);
> +
> +raise(SIG_IPI);
> +#else
> +abort();
> +#endif
> +}
> +
>  void qemu_notify_event(void)
>  {
>  CPUState *env = cpu_single_env;
> @@ -808,6 +819,16 @@ void qemu_cpu_kick(void *_env)
>  }
>  }
>  
> +void qemu_cpu_kick_self(void)
> +{
> +assert(cpu_single_env);
> +
> +if (!cpu_single_env->thread_kicked) {
> +qemu_thread_signal(cpu_single_env->thread, SIG_IPI);
> +cpu_single_env->thread_kicked = true;
> +}
> +}
> +

There is no need to use cpu_single_env, can pass CPUState instead.

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Marcelo Tosatti

On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
> checking for exit_request on vcpu entry and timer signals arriving
> before KVM starts to catch them. Plug it by blocking both timer related
> signals also on !CONFIG_IOTHREAD and process those via signalfd.
> 
> Signed-off-by: Jan Kiszka 
> CC: Stefan Hajnoczi 
> ---
>  cpus.c |   18 ++
>  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index fc3f222..29b1070 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>  pthread_sigmask(SIG_BLOCK, NULL, &set);
>  sigdelset(&set, SIG_IPI);
>  sigdelset(&set, SIGBUS);
> +#ifndef CONFIG_IOTHREAD
> +sigdelset(&set, SIGIO);
> +sigdelset(&set, SIGALRM);
> +#endif

I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
section.

>  r = kvm_set_signal_mask(env, &set);
>  if (r) {
>  fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
> @@ -351,6 +355,12 @@ static void qemu_kvm_eat_signals(CPUState *env)
>  exit(1);
>  }
>  } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
> +
> +#ifndef CONFIG_IOTHREAD
> +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
> +qemu_notify_event();
> +}
> +#endif

Why is this necessary?

You should break out of cpu_exec_all if there's a pending alarm (see
qemu_alarm_pending()).

>  }
>  
>  #else /* _WIN32 */
> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
>  int ret;
>  
>  sigemptyset(&blocked_signals);
> +if (kvm_enabled()) {
> +/*
> + * We need to process timer signals synchronously to avoid a race
> + * between exit_request check and KVM vcpu entry.
> + */
> +sigaddset(&blocked_signals, SIGIO);
> +sigaddset(&blocked_signals, SIGALRM);
> +}

A block_io_signals() function for !IOTHREAD would be nicer.

>  
>  ret = qemu_signalfd_init(blocked_signals);
>  if (ret) {
> -- 
> 1.7.1

[Qemu-devel] Re: [PATCH 12/22] kvm: Call qemu_kvm_eat_signals also under !CONFIG_IOTHREAD

2011-02-01 Thread Jan Kiszka

On 2011-02-01 13:38, Marcelo Tosatti wrote:
> On Thu, Jan 27, 2011 at 02:09:56PM +0100, Jan Kiszka wrote:
>> Move qemu_kvm_eat_signals around and call it also when the IO-thread is
>> not used. Do not yet process SIGBUS, will be armed in a separate step.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  cpus.c |   88 
>> ---
>>  1 files changed, 50 insertions(+), 38 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 9071848..558c0d3 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -261,6 +261,45 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>  }
>>  }
>>  
>> +static void qemu_kvm_eat_signals(CPUState *env)
>> +{
>> +struct timespec ts = { 0, 0 };
>> +siginfo_t siginfo;
>> +sigset_t waitset;
>> +sigset_t chkset;
>> +int r;
>> +
>> +sigemptyset(&waitset);
>> +sigaddset(&waitset, SIG_IPI);
>> +sigaddset(&waitset, SIGBUS);
>> +
>> +do {
>> +r = sigtimedwait(&waitset, &siginfo, &ts);
>> +if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
>> +perror("sigtimedwait");
>> +exit(1);
>> +}
>> +
>> +switch (r) {
>> +#ifdef CONFIG_IOTHREAD
>> +case SIGBUS:
>> +if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
>> +sigbus_reraise();
>> +}
>> +break;
>> +#endif
>> +default:
>> +break;
>> +}
>> +
>> +r = sigpending(&chkset);
>> +if (r == -1) {
>> +perror("sigpending");
>> +exit(1);
>> +}
>> +} while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
>> +}
>> +
>>  #else /* _WIN32 */
>>  
>>  HANDLE qemu_event_handle;
>> @@ -292,6 +331,10 @@ static void qemu_event_increment(void)
>>  static void qemu_kvm_init_cpu_signals(CPUState *env)
>>  {
>>  }
>> +
>> +static void qemu_kvm_eat_signals(CPUState *env)
>> +{
>> +}
>>  #endif /* _WIN32 */
>>  
>>  #ifndef CONFIG_IOTHREAD
>> @@ -631,43 +674,6 @@ static void sigbus_handler(int n, struct 
>> qemu_signalfd_siginfo *siginfo,
>>  }
>>  }
>>  
>> -static void qemu_kvm_eat_signals(CPUState *env)
>> -{
>> -struct timespec ts = { 0, 0 };
>> -siginfo_t siginfo;
>> -sigset_t waitset;
>> -sigset_t chkset;
>> -int r;
>> -
>> -sigemptyset(&waitset);
>> -sigaddset(&waitset, SIG_IPI);
>> -sigaddset(&waitset, SIGBUS);
>> -
>> -do {
>> -r = sigtimedwait(&waitset, &siginfo, &ts);
>> -if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
>> -perror("sigtimedwait");
>> -exit(1);
>> -}
>> -
>> -switch (r) {
>> -case SIGBUS:
>> -if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
>> -sigbus_reraise();
>> -}
>> -break;
>> -default:
>> -break;
>> -}
>> -
>> -r = sigpending(&chkset);
>> -if (r == -1) {
>> -perror("sigpending");
>> -exit(1);
>> -}
>> -} while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
>> -}
>> -
>>  static void qemu_kvm_wait_io_event(CPUState *env)
>>  {
>>  while (!cpu_has_work(env))
>> @@ -932,6 +938,8 @@ static int qemu_cpu_exec(CPUState *env)
>>  
>>  bool cpu_exec_all(void)
>>  {
>> +int r;
>> +
>>  if (next_cpu == NULL)
>>  next_cpu = first_cpu;
>>  for (; next_cpu != NULL && !exit_request; next_cpu = 
>> next_cpu->next_cpu) {
>> @@ -943,7 +951,11 @@ bool cpu_exec_all(void)
>>  if (qemu_alarm_pending())
>>  break;
>>  if (cpu_can_run(env)) {
>> -if (qemu_cpu_exec(env) == EXCP_DEBUG) {
>> +r = qemu_cpu_exec(env);
>> +if (kvm_enabled()) {
>> +qemu_kvm_eat_signals(env);
>> +}
>> +if (r == EXCP_DEBUG) {
>>  break;
>>  }
> 
> As mentioned before, signal processing should be independent of
> cpu_can_run (still want to process SIGALRM if vm is stopped, for
> example).
> 

For those signals that matter, it is.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [RFC][PATCH 00/12] qcow2: Convert qcow2 to use coroutines for async I/O

2011-02-01 Thread Kevin Wolf

Am 24.01.2011 00:31, schrieb Anthony Liguori:
> On 01/22/2011 03:29 AM, Stefan Hajnoczi wrote:
>> This patch series prototypes making QCOW2 fully asynchronous to eliminate the
>> timing jitter and poor performance that has been observed.  QCOW2 has
>> asynchronous I/O code paths for some of the read/write common cases but
>> metadata access is always synchronous.
>>
>> One solution is to rewrite QCOW2 to be fully asynchronous by splitting all
>> functions that perform blocking I/O into a series of callbacks.  Due to the
>> complexity of QCOW2, this conversion and the maintenance prospects are
>> unattractive.
>>
>> This patch series prototypes an alternative solution to make QCOW2
>> asynchronous.  It introduces coroutines, cooperative userspace threads of
>> control, so that each QCOW2 request has its own call stack.  To perform I/O,
>> the coroutine submits an asynchronous I/O request and then yields back to 
>> QEMU.
>> The coroutine stays suspended while the I/O operation is being processed by
>> lower layers of the stack.  When the asynchronous I/O completes, the 
>> coroutine
>> is resumed.
>>
>> The upshot of this is that QCOW2 can be implemented in a sequential fashion
>> without explicit callbacks but all I/O actually happens asynchronously under
>> the covers.
>>
>> This prototype implements reads, writes, and flushes.  Should install or boot
>> VMs successfully.  However, it has the following limitations:
>>
>> 1. QCOW2 requests are serialized because the code is not yet safe for
>> concurrent requests.  See the last patch for details.
>>
>> 2. Coroutines are unoptimized.  We should pool coroutines (and their mmapped
>> stacks) to avoid the cost of coroutine creation.
>>
>> 3. The qcow2_aio_read_cb() and qcow2_aoi_write_cb() functions should be
>> refactored into sequential code now that callbacks are no longer needed.
>>
>> I think this approach can solve the performance and functional problems of 
>> the
>> current QCOW2 implementation.  It does not require invasive changes, much of
>> QCOW2 works unmodified.
>>
>> Kevin: Do you like this approach and do you want to develop it further?
>>
> 
> Couple of additional comments:
> 
> 1) The coroutine code is copied in a number of projects.  I plan on 
> splitting it into a shared library.

When are you going to do so? Should we wait with using coroutines in
qemu until then or just start with maintaining a copy and remove it later?

I think we'll have to carry a copy of the library in the qemu source
anyway for a while if we depend on it and distributions don't have a
package for the library yet.

Kevin

[Qemu-devel] Re: [PATCH V6 1/4] nmi: convert cpu_index to cpu-index

2011-02-01 Thread Luiz Capitulino

On Thu, 27 Jan 2011 16:20:27 +0800
Lai Jiangshan  wrote:

> "cpu-index" which uses hyphen is better name.
> 
> Signed-off-by:  Lai Jiangshan 

It looks ok from a quick pass, but I can't apply it on current master, what
commit HEAD did you?

Btw, please, do include the patch 0/0 with a general description about the
series and a small changelog between changes.

> ---
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 5d4cb9e..e43ac7c 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -721,7 +721,7 @@ ETEXI
>  #if defined(TARGET_I386)
>  {
>  .name   = "nmi",
> -.args_type  = "cpu_index:i",
> +.args_type  = "cpu-index:i",
>  .params = "cpu",
>  .help   = "inject an NMI on the given CPU",
>  .mhandler.cmd = do_inject_nmi,
> diff --git a/monitor.c b/monitor.c
> index 27883f8..a916771 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2545,7 +2545,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
> *qdict)
>  static void do_inject_nmi(Monitor *mon, const QDict *qdict)
>  {
>  CPUState *env;
> -int cpu_index = qdict_get_int(qdict, "cpu_index");
> +int cpu_index = qdict_get_int(qdict, "cpu-index");
>  
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  if (env->cpu_index == cpu_index) {
>

[Qemu-devel] [Bug 550863] Re: MicroBlaze QEMU skips jumps when using single steps in GDB

2011-02-01 Thread Christophe

I'm having a similar issue with Qemu 0.13.0, program counter is just
incremented by 4 when single stepping. I am using GDB 7.2. Stephan, did
you get it to work in single step mode?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/550863

Title:
  MicroBlaze QEMU skips jumps when using single steps in GDB

Status in QEMU:
  New

Bug description:
  I'm trying to debug a MicroBlaze application (with the MicroBlaze
  system qemu, not the user mode qemu) using GDB. When I'm trying to
  single step through instructions all the branches are ignored: the
  program counter is just incremented by 4 (default), even for
  unconditional branches. This only occurs when using single step mode,
  everything runs as expected when using breakpoints only.

  Qemu Versions tested: 0.11.0, 0.12.3, GIT version (March 29, 2010)
  Qemu command used: qemu-system-microblaze -kernel test.elf  -S -s

  GDB Versions used: 7.1 (official MicroBlaze support), 6.5 (from Xilinx EDK 
10.1), 5.3 (from Xilinx EDK 9.1)
  GCC Versions used: 4.1.1 (from Xilinx EDK 11.1)

  I've attached my program, it uses a custom linker script and startup
  code. It runs fine without using singlestep mode.

  The bug looks similar to one submitted for PPC a while ago (181951).

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Jan Kiszka

On 2011-02-01 13:47, Marcelo Tosatti wrote:
> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>> checking for exit_request on vcpu entry and timer signals arriving
>> before KVM starts to catch them. Plug it by blocking both timer related
>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>
>> Signed-off-by: Jan Kiszka 
>> CC: Stefan Hajnoczi 
>> ---
>>  cpus.c |   18 ++
>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index fc3f222..29b1070 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>  pthread_sigmask(SIG_BLOCK, NULL, &set);
>>  sigdelset(&set, SIG_IPI);
>>  sigdelset(&set, SIGBUS);
>> +#ifndef CONFIG_IOTHREAD
>> +sigdelset(&set, SIGIO);
>> +sigdelset(&set, SIGALRM);
>> +#endif
> 
> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> section.

You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?

> 
>>  r = kvm_set_signal_mask(env, &set);
>>  if (r) {
>>  fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
>> @@ -351,6 +355,12 @@ static void qemu_kvm_eat_signals(CPUState *env)
>>  exit(1);
>>  }
>>  } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
>> +
>> +#ifndef CONFIG_IOTHREAD
>> +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>> +qemu_notify_event();
>> +}
>> +#endif
> 
> Why is this necessary?
> 
> You should break out of cpu_exec_all if there's a pending alarm (see
> qemu_alarm_pending()).

qemu_alarm_pending() is not true until the signal is actually taken. The
alarm handler sets the required flags.

> 
>>  }
>>  
>>  #else /* _WIN32 */
>> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
>>  int ret;
>>  
>>  sigemptyset(&blocked_signals);
>> +if (kvm_enabled()) {
>> +/*
>> + * We need to process timer signals synchronously to avoid a race
>> + * between exit_request check and KVM vcpu entry.
>> + */
>> +sigaddset(&blocked_signals, SIGIO);
>> +sigaddset(&blocked_signals, SIGALRM);
>> +}
> 
> A block_io_signals() function for !IOTHREAD would be nicer.

Well, we aren't blocking all I/O signals, so I decided against causing
confusion to people that try to compare the result against real
block_io_signals. If you mean just pushing those lines that set up
blocked_signals into a separate function, then I need to find a good
name for it.

> 
>>  
>>  ret = qemu_signalfd_init(blocked_signals);
>>  if (ret) {
>> -- 
>> 1.7.1

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 16/22] Introduce VCPU self-signaling service

2011-02-01 Thread Jan Kiszka

On 2011-02-01 14:14, Marcelo Tosatti wrote:
> On Thu, Jan 27, 2011 at 02:10:00PM +0100, Jan Kiszka wrote:
>> Introduce qemu_cpu_kick_self to send SIG_IPI to the calling VCPU
>> context. First user will be kvm.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  cpus.c|   21 +
>>  qemu-common.h |1 +
>>  2 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index bba59e5..88bed4e 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -531,6 +531,17 @@ void qemu_cpu_kick(void *env)
>>  return;
>>  }
>>  
>> +void qemu_cpu_kick_self(void)
>> +{
>> +#ifndef _WIN32
>> +assert(cpu_single_env);
>> +
>> +raise(SIG_IPI);
>> +#else
>> +abort();
>> +#endif
>> +}
>> +
>>  void qemu_notify_event(void)
>>  {
>>  CPUState *env = cpu_single_env;
>> @@ -808,6 +819,16 @@ void qemu_cpu_kick(void *_env)
>>  }
>>  }
>>  
>> +void qemu_cpu_kick_self(void)
>> +{
>> +assert(cpu_single_env);
>> +
>> +if (!cpu_single_env->thread_kicked) {
>> +qemu_thread_signal(cpu_single_env->thread, SIG_IPI);
>> +cpu_single_env->thread_kicked = true;
>> +}
>> +}
>> +
> 
> There is no need to use cpu_single_env, can pass CPUState instead.
> 

It's done intentionally this way: function shall not be used for a
remote env.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] qcow2: Really use cache=unsafe for image creation

2011-02-01 Thread Stefan Hajnoczi

On Thu, Jan 27, 2011 at 3:50 PM, Kevin Wolf  wrote:
> For cache=unsafe we also need to set BDRV_O_CACHE_WB, otherwise we have some
> strange unsafe writethrough mode.
>
> Signed-off-by: Kevin Wolf 
> ---
>  block/qcow2.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

Reviewed-by: Stefan Hajnoczi

[Qemu-devel] Re: [PATCH 16/22] Introduce VCPU self-signaling service

2011-02-01 Thread Marcelo Tosatti

On Tue, Feb 01, 2011 at 02:33:45PM +0100, Jan Kiszka wrote:
> >> +++ b/cpus.c
> >> @@ -531,6 +531,17 @@ void qemu_cpu_kick(void *env)
> >>  return;
> >>  }
> >>  
> >> +void qemu_cpu_kick_self(void)
> >> +{
> >> +#ifndef _WIN32
> >> +assert(cpu_single_env);
> >> +
> >> +raise(SIG_IPI);
> >> +#else
> >> +abort();
> >> +#endif
> >> +}
> >> +
> >>  void qemu_notify_event(void)
> >>  {
> >>  CPUState *env = cpu_single_env;
> >> @@ -808,6 +819,16 @@ void qemu_cpu_kick(void *_env)
> >>  }
> >>  }
> >>  
> >> +void qemu_cpu_kick_self(void)
> >> +{
> >> +assert(cpu_single_env);
> >> +
> >> +if (!cpu_single_env->thread_kicked) {
> >> +qemu_thread_signal(cpu_single_env->thread, SIG_IPI);
> >> +cpu_single_env->thread_kicked = true;
> >> +}
> >> +}
> >> +
> > 
> > There is no need to use cpu_single_env, can pass CPUState instead.
> > 
> 
> It's done intentionally this way: function shall not be used for a
> remote env.
> 
> Jan

Can assert on qemu_cpu_self(env) for that.

Re: [Qemu-devel] [PATCH 1/3] FVD: Added support for 'qemu-img update'

2011-02-01 Thread Stefan Hajnoczi

On Mon, Jan 31, 2011 at 2:49 PM, Chunqiang Tang  wrote:
> The paper [2] below from VMware is informative but cannot be adopted by us
> directly, as the problem domain is different. I previously had a paper on
> general congestion control,
> https://sites.google.com/site/tangchq/papers/NCI-USENIX09.pdf?attredirects=0
> , and attended a conference together with an author of paper [2] , and had
> some discussions. It is an interesting paper.
>
> [1] http://suif.stanford.edu/papers/nsdi05.pdf .
> [2] http://www.usenix.org/events/fast09/tech/full_papers/gulati/gulati.pdf

Thanks for the links, I will take a look.  I can see both latency and
throughput being problematic.  With throughput the problem is that the
I/O pattern may be sporadic or bursty.  It's hard for a heuristic to
follow large changes in the guest's I/O pattern.

Stefan

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Jan Kiszka

On 2011-02-01 14:48, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
 Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
 checking for exit_request on vcpu entry and timer signals arriving
 before KVM starts to catch them. Plug it by blocking both timer related
 signals also on !CONFIG_IOTHREAD and process those via signalfd.

 Signed-off-by: Jan Kiszka 
 CC: Stefan Hajnoczi 
 ---
  cpus.c |   18 ++
  1 files changed, 18 insertions(+), 0 deletions(-)

 diff --git a/cpus.c b/cpus.c
 index fc3f222..29b1070 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
  pthread_sigmask(SIG_BLOCK, NULL, &set);
  sigdelset(&set, SIG_IPI);
  sigdelset(&set, SIGBUS);
 +#ifndef CONFIG_IOTHREAD
 +sigdelset(&set, SIGIO);
 +sigdelset(&set, SIGALRM);
 +#endif
>>>
>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>> section.
>>
>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
> 
> Yes, so to avoid #ifdefs spread.

Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
delta though.

> 
 +
 +#ifndef CONFIG_IOTHREAD
 +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
 +qemu_notify_event();
 +}
 +#endif
>>>
>>> Why is this necessary?
>>>
>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>> qemu_alarm_pending()).
>>
>> qemu_alarm_pending() is not true until the signal is actually taken. The
>> alarm handler sets the required flags.
> 
> Right. What i mean is you need to execute the signal handler inside
> cpu_exec_all loop (so that alarm pending is set).
> 
> So, if there is a SIGALRM pending, qemu_run_timers has highest
> priority, not vcpu execution.

We leave the vcpu loop (thanks to notify_event), process the signal in
the event loop and run the timer handler. This pattern is IMO less
invasive to the existing code, specifically as it is about to die
long-term anyway.

> 
  
  #else /* _WIN32 */
 @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
  int ret;
  
  sigemptyset(&blocked_signals);
 +if (kvm_enabled()) {
 +/*
 + * We need to process timer signals synchronously to avoid a race
 + * between exit_request check and KVM vcpu entry.
 + */
 +sigaddset(&blocked_signals, SIGIO);
 +sigaddset(&blocked_signals, SIGALRM);
 +}
>>>
>>> A block_io_signals() function for !IOTHREAD would be nicer.
>>
>> Well, we aren't blocking all I/O signals, so I decided against causing
>> confusion to people that try to compare the result against real
>> block_io_signals. If you mean just pushing those lines that set up
>> blocked_signals into a separate function, then I need to find a good
>> name for it.
> 
> Yes, separate function, similar to CONFIG_IOTHREAD case (feel free to
> rename function).
> 

Will check.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 16/22] Introduce VCPU self-signaling service

2011-02-01 Thread Jan Kiszka

On 2011-02-01 14:50, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 02:33:45PM +0100, Jan Kiszka wrote:
 +++ b/cpus.c
 @@ -531,6 +531,17 @@ void qemu_cpu_kick(void *env)
  return;
  }
  
 +void qemu_cpu_kick_self(void)
 +{
 +#ifndef _WIN32
 +assert(cpu_single_env);
 +
 +raise(SIG_IPI);
 +#else
 +abort();
 +#endif
 +}
 +
  void qemu_notify_event(void)
  {
  CPUState *env = cpu_single_env;
 @@ -808,6 +819,16 @@ void qemu_cpu_kick(void *_env)
  }
  }
  
 +void qemu_cpu_kick_self(void)
 +{
 +assert(cpu_single_env);
 +
 +if (!cpu_single_env->thread_kicked) {
 +qemu_thread_signal(cpu_single_env->thread, SIG_IPI);
 +cpu_single_env->thread_kicked = true;
 +}
 +}
 +
>>>
>>> There is no need to use cpu_single_env, can pass CPUState instead.
>>>
>>
>> It's done intentionally this way: function shall not be used for a
>> remote env.
>>
>> Jan
> 
> Can assert on qemu_cpu_self(env) for that.
> 

We already assert on cpu_single_env which is the right condition.
Removing env from the parameter list avoids that someone even thinks
about misusing it.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Marcelo Tosatti

On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
> On 2011-02-01 13:47, Marcelo Tosatti wrote:
> > On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
> >> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
> >> checking for exit_request on vcpu entry and timer signals arriving
> >> before KVM starts to catch them. Plug it by blocking both timer related
> >> signals also on !CONFIG_IOTHREAD and process those via signalfd.
> >>
> >> Signed-off-by: Jan Kiszka 
> >> CC: Stefan Hajnoczi 
> >> ---
> >>  cpus.c |   18 ++
> >>  1 files changed, 18 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/cpus.c b/cpus.c
> >> index fc3f222..29b1070 100644
> >> --- a/cpus.c
> >> +++ b/cpus.c
> >> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
> >>  pthread_sigmask(SIG_BLOCK, NULL, &set);
> >>  sigdelset(&set, SIG_IPI);
> >>  sigdelset(&set, SIGBUS);
> >> +#ifndef CONFIG_IOTHREAD
> >> +sigdelset(&set, SIGIO);
> >> +sigdelset(&set, SIGALRM);
> >> +#endif
> > 
> > I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> > section.
> 
> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?

Yes, so to avoid #ifdefs spread.

> >> +
> >> +#ifndef CONFIG_IOTHREAD
> >> +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
> >> +qemu_notify_event();
> >> +}
> >> +#endif
> > 
> > Why is this necessary?
> > 
> > You should break out of cpu_exec_all if there's a pending alarm (see
> > qemu_alarm_pending()).
> 
> qemu_alarm_pending() is not true until the signal is actually taken. The
> alarm handler sets the required flags.

Right. What i mean is you need to execute the signal handler inside
cpu_exec_all loop (so that alarm pending is set).

So, if there is a SIGALRM pending, qemu_run_timers has highest
priority, not vcpu execution.

> >>  
> >>  #else /* _WIN32 */
> >> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
> >>  int ret;
> >>  
> >>  sigemptyset(&blocked_signals);
> >> +if (kvm_enabled()) {
> >> +/*
> >> + * We need to process timer signals synchronously to avoid a race
> >> + * between exit_request check and KVM vcpu entry.
> >> + */
> >> +sigaddset(&blocked_signals, SIGIO);
> >> +sigaddset(&blocked_signals, SIGALRM);
> >> +}
> > 
> > A block_io_signals() function for !IOTHREAD would be nicer.
> 
> Well, we aren't blocking all I/O signals, so I decided against causing
> confusion to people that try to compare the result against real
> block_io_signals. If you mean just pushing those lines that set up
> blocked_signals into a separate function, then I need to find a good
> name for it.

Yes, separate function, similar to CONFIG_IOTHREAD case (feel free to
rename function).

[Qemu-devel] Re: [PATCH 2/8] ahci: split ICH and AHCI even more

2011-02-01 Thread Alexander Graf


On 18.01.2011, at 13:19, Kevin Wolf wrote:

> Am 20.12.2010 22:13, schrieb Alexander Graf:
>> Sebastian's patch already did a pretty good job at splitting up ICH-9
>> AHCI code and the AHCI core. We need some more though. Copyright was missing,
>> the lspci dump belongs to ICH-9, we don't need the AHCI core to have its
>> own qdev device duplicate.
>> 
>> So let's split them a bit more in this patch, making things easier to
>> read an understand.
>> 
>> Signed-off-by: Alexander Graf 
> 
> The license header is still missing in ahci.h.

Yes, because header files don't get license headers. Check most other headers 
in qemu :). I can add one if it makes you happy, but numbers are generally not 
licensable.


Alex

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 10:58 AM,   wrote:
> From: Jes Sorensen 
>
> Implement freeze/thaw support in the guest, allowing the host to
> request the guest freezes all it's file systems before a live snapshot
> is performed.
>  - fsfreeze(): Walk the list of mounted local real file systems,
>               and freeze them.
>  - fsthaw():   Walk the list of previously frozen file systems and
>               thaw them.
>  - fsstatus(): Return the current status of freeze/thaw. The host must
>               poll this function, in case fsfreeze() returned with a
>               timeout, to wait for the operation to finish.

It is desirable to minimize the freeze time, which may interrupt or
degrade the service that applications inside the VM can provide.
Polling means we have to choose a fixed value (500 ms?) at which to
check for freeze completion.  In this example we could have up to 500
ms extra time spent in freeze because it completed right after we
polled.  Any thoughts on this?

In terms of the fsfreeze(), fsthaw(), fsstatus() API, are you looking
at Windows Volume Shadow Copy Services and does this API fit that
model (I haven't looked at it in detail yet)?
http://msdn.microsoft.com/en-us/library/bb968832(v=vs.85).aspx

> Signed-off-by: Jes Sorensen 
> ---
>  virtagent-common.h |    8 ++
>  virtagent-server.c |  196 
> 
>  2 files changed, 204 insertions(+), 0 deletions(-)
>
> diff --git a/virtagent-common.h b/virtagent-common.h
> index 5d8f5c1..220a4b6 100644
> --- a/virtagent-common.h
> +++ b/virtagent-common.h
> @@ -61,6 +61,14 @@ typedef struct VAContext {
>     const char *channel_path;
>  } VAContext;
>
> +enum vs_fsfreeze_status {
> +    FREEZE_ERROR = -1,
> +    FREEZE_THAWED = 0,
> +    FREEZE_INPROGRESS = 1,
> +    FREEZE_FROZEN = 2,
> +    FREEZE_THAWINPROGRESS = 3,
> +};
> +
>  enum va_job_status {
>     VA_JOB_STATUS_PENDING = 0,
>     VA_JOB_STATUS_OK,
> diff --git a/virtagent-server.c b/virtagent-server.c
> index 7bb35b2..cf2a3f0 100644
> --- a/virtagent-server.c
> +++ b/virtagent-server.c
> @@ -14,6 +14,13 @@
>  #include 
>  #include "qemu_socket.h"
>  #include "virtagent-common.h"
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
>
>  static VAServerData *va_server_data;
>  static bool va_enable_syslog = false; /* enable syslog'ing of RPCs */
> @@ -217,6 +224,189 @@ static xmlrpc_value *va_hello(xmlrpc_env *env,
>     return result;
>  }
>
> +
> +/*
> + * Walk the mount table and build a list of local file systems
> + */
> +
> +struct direntry {
> +    char *dirname;
> +    char *devtype;
> +    struct direntry *next;
> +};
> +
> +static struct direntry *mount_list;
> +static int fsfreeze_status;
> +
> +static int build_mount_list(void)
> +{
> +    struct mntent *mnt;
> +    struct direntry *entry;
> +    struct direntry *next;
> +    char const *mtab = MOUNTED;
> +    FILE *fp;
> +
> +    fp = setmntent(mtab, "r");
> +    if (!fp) {
> +       fprintf(stderr, "unable to read mtab\n");
> +       goto fail;
> +    }
> +
> +    while ((mnt = getmntent(fp))) {
> +       /*
> +        * An entry which device name doesn't start with a '/' is
> +        * either a dummy file system or a network file system.
> +        * Add special handling for smbfs and cifs as is done by
> +        * coreutils as well.
> +        */
> +       if ((mnt->mnt_fsname[0] != '/') ||
> +           (strcmp(mnt->mnt_type, "smbfs") == 0) ||
> +           (strcmp(mnt->mnt_type, "cifs") == 0)) {
> +           continue;
> +       }
> +
> +       entry = qemu_malloc(sizeof(struct direntry));
> +       if (!entry) {
> +           goto fail;
> +       }

qemu_malloc() never fails.

> +       entry->dirname = qemu_strdup(mnt->mnt_dir);
> +       entry->devtype = qemu_strdup(mnt->mnt_type);
> +       entry->next = mount_list;
> +
> +       mount_list = entry;
> +    }
> +
> +    endmntent(fp);
> +
> +    return 0;
> +
> +fail:
> +    while(mount_list) {
> +       next = mount_list->next;
> +       qemu_free(mount_list->dirname);
> +       qemu_free(mount_list->devtype);
> +       qemu_free(mount_list);
> +       mount_list = next;
> +    }
> +
> +    return -1;
> +}
> +
> +/*
> + * va_fsfreeze(): Walk list of mounted file systems in the guest, and
> + *   freeze the ones which are real local file systems.
> + * rpc return values: Number of file systems frozen, -1 on error.
> + */
> +static xmlrpc_value *va_fsfreeze(xmlrpc_env *env,
> +                                 xmlrpc_value *params,
> +                                 void *user_data)
> +{
> +    xmlrpc_int32 ret = 0, i = 0;
> +    xmlrpc_value *result;
> +    struct direntry *entry;
> +    int fd;
> +    SLOG("va_fsfreeze()");
> +
> +    if (fsfreeze_status == FREEZE_FROZEN) {
> +        ret = 0;
> +        goto out;
> +    }

The only valid status here is FREEZE_THAWED?  Perhaps we should test
for that specifically.

> +
> +    ret = build_mount_list();
> +    if (ret < 0) {
> +

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Marcelo Tosatti

On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
> On 2011-02-01 14:48, Marcelo Tosatti wrote:
> > On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
> >> On 2011-02-01 13:47, Marcelo Tosatti wrote:
> >>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>  Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>  checking for exit_request on vcpu entry and timer signals arriving
>  before KVM starts to catch them. Plug it by blocking both timer related
>  signals also on !CONFIG_IOTHREAD and process those via signalfd.
> 
>  Signed-off-by: Jan Kiszka 
>  CC: Stefan Hajnoczi 
>  ---
>   cpus.c |   18 ++
>   1 files changed, 18 insertions(+), 0 deletions(-)
> 
>  diff --git a/cpus.c b/cpus.c
>  index fc3f222..29b1070 100644
>  --- a/cpus.c
>  +++ b/cpus.c
>  @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>   pthread_sigmask(SIG_BLOCK, NULL, &set);
>   sigdelset(&set, SIG_IPI);
>   sigdelset(&set, SIGBUS);
>  +#ifndef CONFIG_IOTHREAD
>  +sigdelset(&set, SIGIO);
>  +sigdelset(&set, SIGALRM);
>  +#endif
> >>>
> >>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> >>> section.
> >>
> >> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
> > 
> > Yes, so to avoid #ifdefs spread.
> 
> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
> delta though.
> 
> > 
>  +
>  +#ifndef CONFIG_IOTHREAD
>  +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>  +qemu_notify_event();
>  +}
>  +#endif
> >>>
> >>> Why is this necessary?
> >>>
> >>> You should break out of cpu_exec_all if there's a pending alarm (see
> >>> qemu_alarm_pending()).
> >>
> >> qemu_alarm_pending() is not true until the signal is actually taken. The
> >> alarm handler sets the required flags.
> > 
> > Right. What i mean is you need to execute the signal handler inside
> > cpu_exec_all loop (so that alarm pending is set).
> > 
> > So, if there is a SIGALRM pending, qemu_run_timers has highest
> > priority, not vcpu execution.
> 
> We leave the vcpu loop (thanks to notify_event), process the signal in
> the event loop and run the timer handler. This pattern is IMO less
> invasive to the existing code, specifically as it is about to die
> long-term anyway.

You'll probably see poor timer behaviour on smp guests without iothread
enabled.

Re: [Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 10:58 AM,   wrote:
> From: Jes Sorensen 
> This is a first attempt to add fsfreeze support to virtagent. The idea
> is for the guest agent to walk the list of locally mounted file
> systems in the guest, and issuing an ioctl to freeze them. The host
> can then do a live snapshot of the guest, obtaining stable file
> systems. After the snapshot, the host then calls the thaw function in
> virtagent, which goes through the list of previously frozen file
> systems and unfreezes them.

Any plans for a call-out to pre/post freeze and thaw scripts so that
applications can flush cached data to disk and brace themselves for
freeze?

Stefan

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Jan Kiszka

On 2011-02-01 15:10, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
>>> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
 On 2011-02-01 13:47, Marcelo Tosatti wrote:
> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>> checking for exit_request on vcpu entry and timer signals arriving
>> before KVM starts to catch them. Plug it by blocking both timer related
>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>
>> Signed-off-by: Jan Kiszka 
>> CC: Stefan Hajnoczi 
>> ---
>>  cpus.c |   18 ++
>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index fc3f222..29b1070 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>  pthread_sigmask(SIG_BLOCK, NULL, &set);
>>  sigdelset(&set, SIG_IPI);
>>  sigdelset(&set, SIGBUS);
>> +#ifndef CONFIG_IOTHREAD
>> +sigdelset(&set, SIGIO);
>> +sigdelset(&set, SIGALRM);
>> +#endif
>
> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> section.

 You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>>>
>>> Yes, so to avoid #ifdefs spread.
>>
>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>> delta though.
>>
>>>
>> +
>> +#ifndef CONFIG_IOTHREAD
>> +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>> +qemu_notify_event();
>> +}
>> +#endif
>
> Why is this necessary?
>
> You should break out of cpu_exec_all if there's a pending alarm (see
> qemu_alarm_pending()).

 qemu_alarm_pending() is not true until the signal is actually taken. The
 alarm handler sets the required flags.
>>>
>>> Right. What i mean is you need to execute the signal handler inside
>>> cpu_exec_all loop (so that alarm pending is set).
>>>
>>> So, if there is a SIGALRM pending, qemu_run_timers has highest
>>> priority, not vcpu execution.
>>
>> We leave the vcpu loop (thanks to notify_event), process the signal in
>> the event loop and run the timer handler. This pattern is IMO less
>> invasive to the existing code, specifically as it is about to die
>> long-term anyway.
> 
> You'll probably see poor timer behaviour on smp guests without iothread
> enabled.
> 

Still checking, but that would mean the notification mechanism is broken
anyway: If IO events do not force us to process them quickly, we already
suffer from latencies in SMP mode.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [V4 PATCH 5/8] Create support in chroot environment

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 5:27 AM, M. Mohan Kumar  wrote:
> +    if (setfsuid(request->data.uid) < 0) {
> +        fd_info->fi_error = errno;
> +        return;
> +    }
> +    if (setfsgid(request->data.gid) < 0) {
> +        fd_info->fi_error = errno;
> +        goto unset_uid;
> +    }

fsuid is Linux-specific.  Just something to keep in mind if you wanted
this code to be portable (I think the rest *is* portable).

Stefan

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Jes Sorensen

On 02/01/11 15:12, Stefan Hajnoczi wrote:
> On Tue, Feb 1, 2011 at 10:58 AM,   wrote:
>> From: Jes Sorensen 
>>
>> Implement freeze/thaw support in the guest, allowing the host to
>> request the guest freezes all it's file systems before a live snapshot
>> is performed.
>>  - fsfreeze(): Walk the list of mounted local real file systems,
>>   and freeze them.
>>  - fsthaw():   Walk the list of previously frozen file systems and
>>   thaw them.
>>  - fsstatus(): Return the current status of freeze/thaw. The host must
>>   poll this function, in case fsfreeze() returned with a
>>   timeout, to wait for the operation to finish.
> 
> It is desirable to minimize the freeze time, which may interrupt or
> degrade the service that applications inside the VM can provide.
> Polling means we have to choose a fixed value (500 ms?) at which to
> check for freeze completion.  In this example we could have up to 500
> ms extra time spent in freeze because it completed right after we
> polled.  Any thoughts on this?

I have to admit you lost me here, where do you get that 500ms time from?
Is that the XMLRPC polling time or? I just used the example code from
other agent calls.

> In terms of the fsfreeze(), fsthaw(), fsstatus() API, are you looking
> at Windows Volume Shadow Copy Services and does this API fit that
> model (I haven't looked at it in detail yet)?
> http://msdn.microsoft.com/en-us/library/bb968832(v=vs.85).aspx

I haven't looked at it, I designed the calls based on how they fit with
the Linux ioctls.

>> +   entry = qemu_malloc(sizeof(struct direntry));
>> +   if (!entry) {
>> +   goto fail;
>> +   }
> 
> qemu_malloc() never fails.

Good point, we have ugly malloc in qemu :( I wrote the code to handle
this outside QEMU first, to make sure it worked correctly and trying to
see how many times I could crash my laptop in the process. I'll fix it.

>> +static xmlrpc_value *va_fsfreeze(xmlrpc_env *env,
>> + xmlrpc_value *params,
>> + void *user_data)
>> +{
>> +xmlrpc_int32 ret = 0, i = 0;
>> +xmlrpc_value *result;
>> +struct direntry *entry;
>> +int fd;
>> +SLOG("va_fsfreeze()");
>> +
>> +if (fsfreeze_status == FREEZE_FROZEN) {
>> +ret = 0;
>> +goto out;
>> +}
> 
> The only valid status here is FREEZE_THAWED?  Perhaps we should test
> for that specifically.

Good point, I'll fix this.

>> +
>> +ret = build_mount_list();
>> +if (ret < 0) {
>> +goto out;
>> +}
>> +
>> +fsfreeze_status = FREEZE_INPROGRESS;
>> +
>> +entry = mount_list;
>> +while(entry) {
>> +fd = qemu_open(entry->dirname, O_RDONLY);
>> +if (fd == -1) {
>> +ret = errno;
>> +goto error;
>> +}
>> +ret = ioctl(fd, FIFREEZE);
> 
> If you close(fd) here then it won't leak or need extra code in the error path.

Good point, will fix.

>> +static xmlrpc_value *va_fsthaw(xmlrpc_env *env,
>> +   xmlrpc_value *params,
>> +   void *user_data)
>> +{
>> +xmlrpc_int32 ret;
>> +xmlrpc_value *result;
>> +struct direntry *entry;
>> +int fd, i = 0;
>> +SLOG("va_fsthaw()");
>> +
>> +if (fsfreeze_status == FREEZE_THAWED) {
>> +ret = 0;
>> +goto out;
>> +}
> 
> A stricter check would be status FREEZE_FROZEN.

Yep, will fix

>> +
>> +while((entry = mount_list)) {
>> +fd = qemu_open(entry->dirname, O_RDONLY);
>> +if (fd == -1) {
>> +ret = -1;
>> +goto out;
>> +}
>> +ret = ioctl(fd, FITHAW);
> 
> Same thing about close(fd) here.

Thanks for the review, all valid points!

Cheers,
Jes

Re: [Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support

2011-02-01 Thread Jes Sorensen

On 02/01/11 15:16, Stefan Hajnoczi wrote:
> On Tue, Feb 1, 2011 at 10:58 AM,   wrote:
>> From: Jes Sorensen 
>> This is a first attempt to add fsfreeze support to virtagent. The idea
>> is for the guest agent to walk the list of locally mounted file
>> systems in the guest, and issuing an ioctl to freeze them. The host
>> can then do a live snapshot of the guest, obtaining stable file
>> systems. After the snapshot, the host then calls the thaw function in
>> virtagent, which goes through the list of previously frozen file
>> systems and unfreezes them.
> 
> Any plans for a call-out to pre/post freeze and thaw scripts so that
> applications can flush cached data to disk and brace themselves for
> freeze?

Michael and I were discussing this earlier, we need to add it somehow.
It could be done as call-outs from the freeze call, or from separate
agent calls.

Cheers,
Jes

Re: [Qemu-devel] [V4 PATCH 6/8] Support for creating special files

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 5:27 AM, M. Mohan Kumar  wrote:
> +static int passthrough_mknod(FsContext *fs_ctx, const char *path, FsCred 
> *credp)
> +{
> +    V9fsFileObjectRequest request;
> +    int retval, error = 0;
> +
> +    fill_request(&request, path, credp);
> +    request.data.type = T_MKNOD;
> +
> +    retval = v9fs_create_special(fs_ctx, &request, &error);
> +    if (retval < 0) {
> +        errno = error;
> +    }
> +    return retval;
> +}

It would be nice to avoid duplicating all these wrappers.  Would it be
possible to write one generic function which takes the request type
and some optional arguments and runs the request?

Stefan

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 2:26 PM, Jes Sorensen  wrote:
> On 02/01/11 15:12, Stefan Hajnoczi wrote:
>> On Tue, Feb 1, 2011 at 10:58 AM,   wrote:
>>> From: Jes Sorensen 
>>>
>>> Implement freeze/thaw support in the guest, allowing the host to
>>> request the guest freezes all it's file systems before a live snapshot
>>> is performed.
>>>  - fsfreeze(): Walk the list of mounted local real file systems,
>>>               and freeze them.
>>>  - fsthaw():   Walk the list of previously frozen file systems and
>>>               thaw them.
>>>  - fsstatus(): Return the current status of freeze/thaw. The host must
>>>               poll this function, in case fsfreeze() returned with a
>>>               timeout, to wait for the operation to finish.
>>
>> It is desirable to minimize the freeze time, which may interrupt or
>> degrade the service that applications inside the VM can provide.
>> Polling means we have to choose a fixed value (500 ms?) at which to
>> check for freeze completion.  In this example we could have up to 500
>> ms extra time spent in freeze because it completed right after we
>> polled.  Any thoughts on this?
>
> I have to admit you lost me here, where do you get that 500ms time from?
> Is that the XMLRPC polling time or? I just used the example code from
> other agent calls.

500 ms is made up.  I was thinking, "what would a reasonable polling
interval be?" and picked a sub-second number.

Can you explain how the timeout in fsfreeze can happen?  It's probably
because I don't know the virtagent details.

Stefan

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Jes Sorensen

On 02/01/11 15:34, Stefan Hajnoczi wrote:
> On Tue, Feb 1, 2011 at 2:26 PM, Jes Sorensen  wrote:
>> I have to admit you lost me here, where do you get that 500ms time from?
>> Is that the XMLRPC polling time or? I just used the example code from
>> other agent calls.
> 
> 500 ms is made up.  I was thinking, "what would a reasonable polling
> interval be?" and picked a sub-second number.
> 
> Can you explain how the timeout in fsfreeze can happen?  It's probably
> because I don't know the virtagent details.

Ah ok.

>From what I understand, the XMLRPC code is setup to timeout if the guest
doesn't reply within a certain amount of time. In that case, the caller
needs to poll to wait for the guest to complete the freeze. This really
should only happen if you have a guest with a large number of very large
file systems. I don't know how likely it is to happen in real life.

Cheers,
Jes

Re: [Qemu-devel] Re: KVM call agenda for Feb 1

2011-02-01 Thread Alexander Graf


On 01.02.2011, at 13:56, Luiz Capitulino wrote:

> On Tue, 1 Feb 2011 10:53:21 -0200
> Luiz Capitulino  wrote:
> 
>> On Mon, 31 Jan 2011 15:39:22 -0600
>> Anthony Liguori  wrote:
>> 
>>> On 01/31/2011 12:10 PM, Jan Kiszka wrote:
 On 2011-01-31 11:02, Juan Quintela wrote:
 
> Please send in any agenda items you are interested incovering.
> 
> 
  o KVM upstream merge: status, plans, coordination
 
>>> 
>>>  o QMP support status for 0.14.  Luiz and I already chatted about it 
>>> today but would be good to discuss in the call just to see if anyone has 
>>> opinions.  Basically, declare it fully supported with a few minor 
>>> caveats (like human-monitor-passthrough is no more supported than the 
>>> actual monitor and recommendations about how to deal with devices in the 
>>> device tree).
>> 
>> o Summer of code 2011
> 
> Forgot to mention the wiki page:
> 
>   http://wiki.qemu.org/Google_Summer_of_Code_2011

o SeaBIOS update for 0.14 - I'd like to see an AHCI boot capable version there


Alex

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Jan Kiszka

On 2011-02-01 15:21, Jan Kiszka wrote:
> On 2011-02-01 15:10, Marcelo Tosatti wrote:
>> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
 On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>> checking for exit_request on vcpu entry and timer signals arriving
>>> before KVM starts to catch them. Plug it by blocking both timer related
>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>
>>> Signed-off-by: Jan Kiszka 
>>> CC: Stefan Hajnoczi 
>>> ---
>>>  cpus.c |   18 ++
>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index fc3f222..29b1070 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState 
>>> *env)
>>>  pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>  sigdelset(&set, SIG_IPI);
>>>  sigdelset(&set, SIGBUS);
>>> +#ifndef CONFIG_IOTHREAD
>>> +sigdelset(&set, SIGIO);
>>> +sigdelset(&set, SIGALRM);
>>> +#endif
>>
>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>> section.
>
> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?

 Yes, so to avoid #ifdefs spread.
>>>
>>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>>> delta though.
>>>

>>> +
>>> +#ifndef CONFIG_IOTHREAD
>>> +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>> +qemu_notify_event();
>>> +}
>>> +#endif
>>
>> Why is this necessary?
>>
>> You should break out of cpu_exec_all if there's a pending alarm (see
>> qemu_alarm_pending()).
>
> qemu_alarm_pending() is not true until the signal is actually taken. The
> alarm handler sets the required flags.

 Right. What i mean is you need to execute the signal handler inside
 cpu_exec_all loop (so that alarm pending is set).

 So, if there is a SIGALRM pending, qemu_run_timers has highest
 priority, not vcpu execution.
>>>
>>> We leave the vcpu loop (thanks to notify_event), process the signal in
>>> the event loop and run the timer handler. This pattern is IMO less
>>> invasive to the existing code, specifically as it is about to die
>>> long-term anyway.
>>
>> You'll probably see poor timer behaviour on smp guests without iothread
>> enabled.
>>
> 
> Still checking, but that would mean the notification mechanism is broken
> anyway: If IO events do not force us to process them quickly, we already
> suffer from latencies in SMP mode.

Maybe a regression caused by the iothread introduction: we need to break
out of the cpu loop via global exit_request when there is an IO event
pending. Fixing this will also heal the above issue.

Sigh, we need to get rid of those two implementations and focus all
reviewing and testing on one. I bet there are still more corner cases
sleeping somewhere.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Stefan Hajnoczi

On Tue, Feb 1, 2011 at 2:36 PM, Jes Sorensen  wrote:
> On 02/01/11 15:34, Stefan Hajnoczi wrote:
>> On Tue, Feb 1, 2011 at 2:26 PM, Jes Sorensen  wrote:
>>> I have to admit you lost me here, where do you get that 500ms time from?
>>> Is that the XMLRPC polling time or? I just used the example code from
>>> other agent calls.
>>
>> 500 ms is made up.  I was thinking, "what would a reasonable polling
>> interval be?" and picked a sub-second number.
>>
>> Can you explain how the timeout in fsfreeze can happen?  It's probably
>> because I don't know the virtagent details.
>
> Ah ok.
>
> From what I understand, the XMLRPC code is setup to timeout if the guest
> doesn't reply within a certain amount of time. In that case, the caller
> needs to poll to wait for the guest to complete the freeze. This really
> should only happen if you have a guest with a large number of very large
> file systems. I don't know how likely it is to happen in real life.

Perhaps Michael can confirm that the freeze function continues to
execute after timeout but the client is able to send fsstatus()
requests?

Stefan

[Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

2011-02-01 Thread Jan Kiszka

On 2011-02-01 15:37, Jan Kiszka wrote:
> On 2011-02-01 15:21, Jan Kiszka wrote:
>> On 2011-02-01 15:10, Marcelo Tosatti wrote:
>>> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
 On 2011-02-01 14:48, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
 Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
 checking for exit_request on vcpu entry and timer signals arriving
 before KVM starts to catch them. Plug it by blocking both timer related
 signals also on !CONFIG_IOTHREAD and process those via signalfd.

 Signed-off-by: Jan Kiszka 
 CC: Stefan Hajnoczi 
 ---
  cpus.c |   18 ++
  1 files changed, 18 insertions(+), 0 deletions(-)

 diff --git a/cpus.c b/cpus.c
 index fc3f222..29b1070 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState 
 *env)
  pthread_sigmask(SIG_BLOCK, NULL, &set);
  sigdelset(&set, SIG_IPI);
  sigdelset(&set, SIGBUS);
 +#ifndef CONFIG_IOTHREAD
 +sigdelset(&set, SIGIO);
 +sigdelset(&set, SIGALRM);
 +#endif
>>>
>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>> section.
>>
>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>
> Yes, so to avoid #ifdefs spread.

 Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
 delta though.

>
 +
 +#ifndef CONFIG_IOTHREAD
 +if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) 
 {
 +qemu_notify_event();
 +}
 +#endif
>>>
>>> Why is this necessary?
>>>
>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>> qemu_alarm_pending()).
>>
>> qemu_alarm_pending() is not true until the signal is actually taken. The
>> alarm handler sets the required flags.
>
> Right. What i mean is you need to execute the signal handler inside
> cpu_exec_all loop (so that alarm pending is set).
>
> So, if there is a SIGALRM pending, qemu_run_timers has highest
> priority, not vcpu execution.

 We leave the vcpu loop (thanks to notify_event), process the signal in
 the event loop and run the timer handler. This pattern is IMO less
 invasive to the existing code, specifically as it is about to die
 long-term anyway.
>>>
>>> You'll probably see poor timer behaviour on smp guests without iothread
>>> enabled.
>>>
>>
>> Still checking, but that would mean the notification mechanism is broken
>> anyway: If IO events do not force us to process them quickly, we already
>> suffer from latencies in SMP mode.
> 
> Maybe a regression caused by the iothread introduction:

I take it back, the issue is actually much older.

> we need to break
> out of the cpu loop via global exit_request when there is an IO event
> pending. Fixing this will also heal the above issue.
> 
> Sigh, we need to get rid of those two implementations and focus all
> reviewing and testing on one. I bet there are still more corner cases
> sleeping somewhere.
> 
> Jan
> 

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Adam Litke

On Tue, 2011-02-01 at 11:58 +0100, jes.soren...@redhat.com wrote:
> +/*
> + * va_fsfreeze(): Walk list of mounted file systems in the guest, and
> + *   freeze the ones which are real local file systems.
> + * rpc return values: Number of file systems frozen, -1 on error.
> + */
> +static xmlrpc_value *va_fsfreeze(xmlrpc_env *env,
> + xmlrpc_value *params,
> + void *user_data)
> +{
> +xmlrpc_int32 ret = 0, i = 0;
> +xmlrpc_value *result;
> +struct direntry *entry;
> +int fd;
> +SLOG("va_fsfreeze()");
> +
> +if (fsfreeze_status == FREEZE_FROZEN) {
> +ret = 0;
> +goto out;
> +}
> +
> +ret = build_mount_list();
> +if (ret < 0) {
> +goto out;
> +}
> +
> +fsfreeze_status = FREEZE_INPROGRESS;
> +
> +entry = mount_list;
> +while(entry) {
> +fd = qemu_open(entry->dirname, O_RDONLY);
> +if (fd == -1) {
> +ret = errno;
> +goto error;
> +}
> +ret = ioctl(fd, FIFREEZE);
> +if (ret < 0 && ret != EOPNOTSUPP) {
> +goto error;
> +}

Here we silently ignore filesystems that do not support the FIFREEZE
ioctl.  Do we need to have a more complex return value so that we can
communicate which mount points could not be frozen?  Otherwise, an
unsuspecting host could retrieve a corrupted snapshot of that
filesystem, right?

> +
> +close(fd);
> +entry = entry->next;
> +i++;
> +}
> +
> +fsfreeze_status = FREEZE_FROZEN;
> +ret = i;
> +out:
> +result = xmlrpc_build_value(env, "i", ret);
> +return result;
> +error:
> +if (i > 0) {
> +fsfreeze_status = FREEZE_ERROR;
> +}
> +goto out;
> +}

-- 
Thanks,
Adam

[Qemu-devel] [PATCH 0/7] Some more AHCI work v2

2011-02-01 Thread Alexander Graf

Clearly, AHCI as is is not perfect yet (intentionally, release early,
release often, remember?). This patch set makes it work with SeaBIOS
so booting Windows 7 works flawlessly for me. it also adds some speedups
and fixes a level based interrupts, rendering ahci useful on PPC targets.

In preparation of potential non-ich9 implementations, this set also
splits ahci code from concrete ich9 specific code. That way we can
later on create other AHCI adapters while reusing a lot of code.

Git tree (including BIOS patch to enable booting from AHCI):

 git://repo.or.cz/qemu/ahci.git ahci


v1 -> v2:

 - drop dma helper removal
 - drop "free dynamically allocated iovs" patch
 - add "add license header in ahci.h"
 - rephrase interrupt bugfix
 - add comment on d2h delay hack


Alexander Graf (6):
  ahci: add license header in ahci.h
  ahci: split ICH and AHCI even more
  ahci: send init d2h fis on fis enable
  ahci: Implement HBA reset
  ahci: make number of ports runtime determined
  ahci: work around bug with level interrupts

Sebastian Herbszt (1):
  ahci: split ICH9 from core

 Makefile.objs |1 +
 hw/ide/ahci.c |  488 +++--
 hw/ide/ahci.h |  333 +++
 hw/ide/ich.c  |  148 +
 4 files changed, 540 insertions(+), 430 deletions(-)
 create mode 100644 hw/ide/ahci.h
 create mode 100644 hw/ide/ich.c

[Qemu-devel] [PATCH 4/7] ahci: send init d2h fis on fis enable

2011-02-01 Thread Alexander Graf

The drive sends a d2h init fis on initialization. Usually, the guest doesn't
receive fises yet at that point though, so the delivery is deferred.

Let's reflect that by sending the init fis on fis receive enablement.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - add comment on d2h delay hack
---
 hw/ide/ahci.c |   38 +++---
 hw/ide/ahci.h |1 +
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 6822046..e6ac77c 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -47,6 +47,7 @@ static void check_cmd(AHCIState *s, int port);
 static int handle_cmd(AHCIState *s,int port,int slot);
 static void ahci_reset_port(AHCIState *s, int port);
 static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis);
+static void ahci_init_d2h(AHCIDevice *ad);
 
 static uint32_t  ahci_port_read(AHCIState *s, int port, int offset)
 {
@@ -230,6 +231,16 @@ static void  ahci_port_write(AHCIState *s, int port, int 
offset, uint32_t val)
 pr->cmd |= PORT_CMD_FIS_ON;
 }
 
+/* XXX usually the FIS would be pending on the bus here and
+   issuing deferred until the OS enables FIS receival.
+   Instead, we only submit it once - which works in most
+   cases, but is a hack. */
+if ((pr->cmd & PORT_CMD_FIS_ON) &&
+!s->dev[port].init_d2h_sent) {
+ahci_init_d2h(&s->dev[port]);
+s->dev[port].init_d2h_sent = 1;
+}
+
 check_cmd(s, port);
 break;
 case PORT_TFDATA:
@@ -462,12 +473,29 @@ static void ahci_check_cmd_bh(void *opaque)
 check_cmd(ad->hba, ad->port_no);
 }
 
+static void ahci_init_d2h(AHCIDevice *ad)
+{
+uint8_t init_fis[0x20];
+IDEState *ide_state = &ad->port.ifs[0];
+
+memset(init_fis, 0, sizeof(init_fis));
+
+init_fis[4] = 1;
+init_fis[12] = 1;
+
+if (ide_state->drive_kind == IDE_CD) {
+init_fis[5] = ide_state->lcyl;
+init_fis[6] = ide_state->hcyl;
+}
+
+ahci_write_fis_d2h(ad, init_fis);
+}
+
 static void ahci_reset_port(AHCIState *s, int port)
 {
 AHCIDevice *d = &s->dev[port];
 AHCIPortRegs *pr = &d->port_regs;
 IDEState *ide_state = &d->port.ifs[0];
-uint8_t init_fis[0x20];
 int i;
 
 DPRINTF(port, "reset port\n");
@@ -482,6 +510,7 @@ static void ahci_reset_port(AHCIState *s, int port)
 pr->scr_err = 0;
 pr->scr_act = 0;
 d->busy_slot = -1;
+d->init_d2h_sent = 0;
 
 ide_state = &s->dev[port].port.ifs[0];
 if (!ide_state->bs) {
@@ -504,7 +533,6 @@ static void ahci_reset_port(AHCIState *s, int port)
 ncq_tfs->used = 0;
 }
 
-memset(init_fis, 0, sizeof(init_fis));
 s->dev[port].port_state = STATE_RUN;
 if (!ide_state->bs) {
 s->dev[port].port_regs.sig = 0;
@@ -514,8 +542,6 @@ static void ahci_reset_port(AHCIState *s, int port)
 ide_state->lcyl = 0x14;
 ide_state->hcyl = 0xeb;
 DPRINTF(port, "set lcyl = %d\n", ide_state->lcyl);
-init_fis[5] = ide_state->lcyl;
-init_fis[6] = ide_state->hcyl;
 ide_state->status = SEEK_STAT | WRERR_STAT | READY_STAT;
 } else {
 s->dev[port].port_regs.sig = SATA_SIGNATURE_DISK;
@@ -523,9 +549,7 @@ static void ahci_reset_port(AHCIState *s, int port)
 }
 
 ide_state->error = 1;
-init_fis[4] = 1;
-init_fis[12] = 1;
-ahci_write_fis_d2h(d, init_fis);
+ahci_init_d2h(d);
 }
 
 static void debug_print_fis(uint8_t *fis, int cmd_len)
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index d65b5e3..b2786d1 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -282,6 +282,7 @@ struct AHCIDevice {
 int dma_status;
 int done_atapi_packet;
 int busy_slot;
+int init_d2h_sent;
 BlockDriverCompletionFunc *dma_cb;
 AHCICmdHdr *cur_cmd;
 NCQTransferState ncq_tfs[AHCI_MAX_CMDS];
-- 
1.6.0.2

[Qemu-devel] [PATCH 5/7] ahci: Implement HBA reset

2011-02-01 Thread Alexander Graf

The ahci code was missing its soft reset functionality. This wasn't really an
issue for Linux guests, but Windows gets confused when the controller doesn't
reset when it tells it so.

Using this patch I can now successfully boot Windows 7 from AHCI using AHCI
enabled SeaBIOS.

Signed-off-by: Alexander Graf 
---
 hw/ide/ahci.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index e6ac77c..105dd53 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -335,7 +335,7 @@ static void ahci_mem_writel(void *ptr, target_phys_addr_t 
addr, uint32_t val)
 case HOST_CTL: /* R/W */
 if (val & HOST_CTL_RESET) {
 DPRINTF(-1, "HBA Reset\n");
-/* FIXME reset? */
+ahci_reset(container_of(s, AHCIPCIState, ahci));
 } else {
 s->control_regs.ghc = (val & 0x3) | HOST_CTL_AHCI_EN;
 ahci_check_irq(s);
@@ -1134,6 +1134,9 @@ void ahci_reset(void *opaque)
 struct AHCIPCIState *d = opaque;
 int i;
 
+d->ahci.control_regs.irqstatus = 0;
+d->ahci.control_regs.ghc = 0;
+
 for (i = 0; i < SATA_PORTS; i++) {
 ahci_reset_port(&d->ahci, i);
 }
-- 
1.6.0.2

[Qemu-devel] [PATCH 6/7] ahci: make number of ports runtime determined

2011-02-01 Thread Alexander Graf

Different AHCI controllers have a different number of ports, so the core
shouldn't care about the amount of ports available.

This patch makes the number of ports available to the AHCI core runtime
configurable, allowing us to have multiple different AHCI implementations
with different amounts of ports.

Signed-off-by: Alexander Graf 
---
 hw/ide/ahci.c |   29 +++--
 hw/ide/ahci.h |   10 +-
 hw/ide/ich.c  |3 ++-
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 105dd53..98bdf70 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -145,7 +145,7 @@ static void ahci_check_irq(AHCIState *s)
 
 DPRINTF(-1, "check irq %#x\n", s->control_regs.irqstatus);
 
-for (i = 0; i < SATA_PORTS; i++) {
+for (i = 0; i < s->ports; i++) {
 AHCIPortRegs *pr = &s->dev[i].port_regs;
 if (pr->irq_stat & pr->irq_mask) {
 s->control_regs.irqstatus |= (1 << i);
@@ -303,7 +303,8 @@ static uint32_t ahci_mem_readl(void *ptr, 
target_phys_addr_t addr)
 
 DPRINTF(-1, "(addr 0x%08X), val 0x%08X\n", (unsigned) addr, val);
 } else if ((addr >= AHCI_PORT_REGS_START_ADDR) &&
-   (addr < AHCI_PORT_REGS_END_ADDR)) {
+   (addr < (AHCI_PORT_REGS_START_ADDR +
+(s->ports * AHCI_PORT_ADDR_OFFSET_LEN {
 val = ahci_port_read(s, (addr - AHCI_PORT_REGS_START_ADDR) >> 7,
  addr & AHCI_PORT_ADDR_OFFSET_MASK);
 }
@@ -355,7 +356,8 @@ static void ahci_mem_writel(void *ptr, target_phys_addr_t 
addr, uint32_t val)
 DPRINTF(-1, "write to unknown register 0x%x\n", 
(unsigned)addr);
 }
 } else if ((addr >= AHCI_PORT_REGS_START_ADDR) &&
-   (addr < AHCI_PORT_REGS_END_ADDR)) {
+   (addr < (AHCI_PORT_REGS_START_ADDR +
+(s->ports * AHCI_PORT_ADDR_OFFSET_LEN {
 ahci_port_write(s, (addr - AHCI_PORT_REGS_START_ADDR) >> 7,
 addr & AHCI_PORT_ADDR_OFFSET_MASK, val);
 }
@@ -378,16 +380,16 @@ static void ahci_reg_init(AHCIState *s)
 {
 int i;
 
-s->control_regs.cap = (SATA_PORTS - 1) |
+s->control_regs.cap = (s->ports - 1) |
   (AHCI_NUM_COMMAND_SLOTS << 8) |
   (AHCI_SUPPORTED_SPEED_GEN1 << AHCI_SUPPORTED_SPEED) |
   HOST_CAP_NCQ | HOST_CAP_AHCI;
 
-s->control_regs.impl = (1 << SATA_PORTS) - 1;
+s->control_regs.impl = (1 << s->ports) - 1;
 
 s->control_regs.version = AHCI_VERSION_1_0;
 
-for (i = 0; i < SATA_PORTS; i++) {
+for (i = 0; i < s->ports; i++) {
 s->dev[i].port_state = STATE_RUN;
 }
 }
@@ -1096,17 +1098,19 @@ static const IDEDMAOps ahci_dma_ops = {
 .reset = ahci_dma_reset,
 };
 
-void ahci_init(AHCIState *s, DeviceState *qdev)
+void ahci_init(AHCIState *s, DeviceState *qdev, int ports)
 {
 qemu_irq *irqs;
 int i;
 
+s->ports = ports;
+s->dev = qemu_mallocz(sizeof(AHCIDevice) * ports);
 ahci_reg_init(s);
 s->mem = cpu_register_io_memory(ahci_readfn, ahci_writefn, s,
 DEVICE_LITTLE_ENDIAN);
-irqs = qemu_allocate_irqs(ahci_irq_set, s, SATA_PORTS);
+irqs = qemu_allocate_irqs(ahci_irq_set, s, s->ports);
 
-for (i = 0; i < SATA_PORTS; i++) {
+for (i = 0; i < s->ports; i++) {
 AHCIDevice *ad = &s->dev[i];
 
 ide_bus_new(&ad->port, qdev, i);
@@ -1120,6 +1124,11 @@ void ahci_init(AHCIState *s, DeviceState *qdev)
 }
 }
 
+void ahci_uninit(AHCIState *s)
+{
+qemu_free(s->dev);
+}
+
 void ahci_pci_map(PCIDevice *pci_dev, int region_num,
 pcibus_t addr, pcibus_t size, int type)
 {
@@ -1137,7 +1146,7 @@ void ahci_reset(void *opaque)
 d->ahci.control_regs.irqstatus = 0;
 d->ahci.control_regs.ghc = 0;
 
-for (i = 0; i < SATA_PORTS; i++) {
+for (i = 0; i < d->ahci.ports; i++) {
 ahci_reset_port(&d->ahci, i);
 }
 }
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index b2786d1..a4560c4 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -188,11 +188,9 @@
 #define AHCI_GENERIC_HOST_CONTROL_REGS_MAX_ADDR 0x20
 /* Shouldn't this be 0x2c? */
 
-#define SATA_PORTS 4
-
 #define AHCI_PORT_REGS_START_ADDR  0x100
-#define AHCI_PORT_REGS_END_ADDR (AHCI_PORT_REGS_START_ADDR + SATA_PORTS * 0x80)
 #define AHCI_PORT_ADDR_OFFSET_MASK 0x7f
+#define AHCI_PORT_ADDR_OFFSET_LEN  0x80
 
 #define AHCI_NUM_COMMAND_SLOTS 31
 #define AHCI_SUPPORTED_SPEED   20
@@ -289,9 +287,10 @@ struct AHCIDevice {
 };
 
 typedef struct AHCIState {
-AHCIDevice dev[SATA_PORTS];
+AHCIDevice *dev;
 AHCIControlRegs control_regs;
 int mem;
+int ports;
 qemu_irq irq;
 } AHCIState;
 
@@ -323,7 +322,8 @@ typedef struct NCQFrame {
 uint8_t reserved10;
 } __attribute__ ((packed)) NCQFrame;
 
-void ahci_init(AHCIState *s, Devi

[Qemu-devel] [PATCH 3/7] ahci: split ICH and AHCI even more

2011-02-01 Thread Alexander Graf

Sebastian's patch already did a pretty good job at splitting up ICH-9
AHCI code and the AHCI core. We need some more though. Copyright was missing,
the lspci dump belongs to ICH-9, we don't need the AHCI core to have its
own qdev device duplicate.

So let's split them a bit more in this patch, making things easier to
read an understand.

Signed-off-by: Alexander Graf 
---
 hw/ide/ahci.c |  110 -
 hw/ide/ich.c  |   90 +-
 2 files changed, 88 insertions(+), 112 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 28412d0..6822046 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -19,47 +19,6 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see .
  *
- *
- * lspci dump of a ICH-9 real device in IDE mode (hopefully close enough):
- *
- * 00:1f.2 SATA controller [0106]: Intel Corporation 82801IR/IO/IH 
(ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02) (prog-if 01 
[AHCI 1.0])
- * Subsystem: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port 
SATA AHCI Controller [8086:2922]
- * Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
- * Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
- * Capabilities: [b0] Vendor Specific Information 
- * Kernel driver in use: ahci
- * Kernel modules: ahci
- * 00: 86 80 22 29 07 04 b0 02 02 01 06 01 00 00 00 00
- * 10: 01 d0 00 00 01 cc 00 00 81 c8 00 00 01 c8 00 00
- * 20: 81 c4 00 00 00 90 bf fe 00 00 00 00 86 80 22 29
- * 30: 00 00 00 00 80 00 00 00 00 00 00 00 0f 02 00 00
- * 40: 00 80 00 80 00 00 00 00 00 00 00 00 00 00 00 00
- * 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- * 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- * 70: 01 a8 03 40 08 00 00 00 00 00 00 00 00 00 00 00
- * 80: 05 70 09 00 0c f0 e0 fe d9 41 00 00 00 00 00 00
- * 90: 40 00 0f 82 93 01 00 00 00 00 00 00 00 00 00 00
- * a0: ac 00 00 00 0a 00 12 00 12 b0 10 00 48 00 00 00
- * b0: 09 00 06 20 00 00 00 00 00 00 00 00 00 00 00 00
- * c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- * d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- * e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- * f0: 00 00 00 00 00 00 00 00 86 0f 02 00 00 00 00 00
- *
  */
 
 #include 
@@ -1155,72 +1114,3 @@ void ahci_reset(void *opaque)
 ahci_reset_port(&d->ahci, i);
 }
 }
-
-static int pci_ahci_init(PCIDevice *dev)
-{
-struct AHCIPCIState *d;
-d = DO_UPCAST(struct AHCIPCIState, card, dev);
-
-pci_config_set_vendor_id(d->card.config, PCI_VENDOR_ID_INTEL);
-pci_config_set_device_id(d->card.config, PCI_DEVICE_ID_INTEL_82801IR);
-
-pci_config_set_class(d->card.config, PCI_CLASS_STORAGE_SATA);
-pci_config_set_revision(d->card.config, 0x02);
-pci_config_set_prog_interface(d->card.config, AHCI_PROGMODE_MAJOR_REV_1);
-
-d->card.config[PCI_CACHE_LINE_SIZE] = 0x08;  /* Cache line size */
-d->card.config[PCI_LATENCY_TIMER]   = 0x00;  /* Latency timer */
-pci_config_set_interrupt_pin(d->card.config, 1);
-
-/* XXX Software should program this register */
-d->card.config[0x90]   = 1 << 6; /* Address Map Register - AHCI mode */
-
-qemu_register_reset(ahci_reset, d);
-
-/* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
-pci_register_bar(&d->card, 5, 0x1000, PCI_BASE_ADDRESS_SPACE_MEMORY,
- ahci_pci_map);
-
-msi_init(dev, 0x50, 1, true, false);
-
-ahci_init(&d->ahci, &dev->qdev);
-d->ahci.irq = d->card.irq[0];
-
-return 0;
-}
-
-static int pci_ahci_uninit(PCIDevice *dev)
-{
-struct AHCIPCIState *d;
-d = DO_UPCAST(struct AHCIPCIState, card, dev);
-
-if (msi_enabled(dev)) {
-msi_uninit(dev);
-}
-
-qemu_unregister_reset(ahci_reset, d);
-
-return 0;
-}
-
-static void pci_ahci_write_config(PCIDevice *pci, uint32_t addr,
-  uint32_t val, int len)
-{
-pci_default_write_config(pci, addr, val, len);
-msi_write_config(pci, addr, val, len);
-}
-
-static PCIDeviceInfo ahci_info = {
-.qdev.name  = "ahci",
-.qdev.size  = sizeof(AHCIPCIState),
-.init   = pci_ahci_init,
-.exit   = pci_ahci_uninit,
-.config_write = pci_ahci_write_config,
-};
-
-static void ahci_pci_register_devices(void)
-{
-pci_qdev_register(&ahci_info);
-}
-
-device_init(ahci_pci_register_devices)
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 9868b73..70cb766 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -1,3 +1,65 @@
+/*
+ * QEMU ICH Emulation
+ *
+ * Copyright (c) 2010 Sebastian Herbszt 
+ * Copyright (c) 2010 Alexander Graf 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Sof

[Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Alexander Graf

When using level based interrupts, the interrupt is treated the same as an
edge triggered one: leaving the line up does not retrigger the interrupt.

In fact, when not lowering the line, we won't ever get a new interrupt inside
the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
something on the device. This way we're sure the guest doesn't starve on
interrupts until someone fixes the actual interrupt path.

Signed-off-by: Alexander Graf 
---
 hw/ide/ahci.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 98bdf70..bce7fba 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -152,11 +152,10 @@ static void ahci_check_irq(AHCIState *s)
 }
 }
 
+ahci_irq_lower(s, NULL);
 if (s->control_regs.irqstatus &&
 (s->control_regs.ghc & HOST_CTL_IRQ_EN)) {
 ahci_irq_raise(s, NULL);
-} else {
-ahci_irq_lower(s, NULL);
 }
 }
 
-- 
1.6.0.2

[Qemu-devel] [PATCH 2/7] ahci: add license header in ahci.h

2011-02-01 Thread Alexander Graf

Due to popular request, this patch adds a license header to ahci.h

Signed-off-by: Alexander Graf 
---
 hw/ide/ahci.h |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 63ef785..d65b5e3 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -1,3 +1,26 @@
+/*
+ * QEMU AHCI Emulation
+ *
+ * Copyright (c) 2010 qiaoch...@loongson.cn
+ * Copyright (c) 2010 Roland Elek 
+ * Copyright (c) 2010 Sebastian Herbszt 
+ * Copyright (c) 2010 Alexander Graf 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
 #ifndef HW_IDE_AHCI_H
 #define HW_IDE_AHCI_H
 
-- 
1.6.0.2

[Qemu-devel] Re: [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Jes Sorensen

On 02/01/11 15:48, Adam Litke wrote:
> On Tue, 2011-02-01 at 11:58 +0100, jes.soren...@redhat.com wrote:
>> +/*
>> + * va_fsfreeze(): Walk list of mounted file systems in the guest, and
>> + *   freeze the ones which are real local file systems.
>> + * rpc return values: Number of file systems frozen, -1 on error.
>> + */
>> +static xmlrpc_value *va_fsfreeze(xmlrpc_env *env,
>> + xmlrpc_value *params,
>> + void *user_data)
>> +{
>> +xmlrpc_int32 ret = 0, i = 0;
>> +xmlrpc_value *result;
>> +struct direntry *entry;
>> +int fd;
>> +SLOG("va_fsfreeze()");
>> +
>> +if (fsfreeze_status == FREEZE_FROZEN) {
>> +ret = 0;
>> +goto out;
>> +}
>> +
>> +ret = build_mount_list();
>> +if (ret < 0) {
>> +goto out;
>> +}
>> +
>> +fsfreeze_status = FREEZE_INPROGRESS;
>> +
>> +entry = mount_list;
>> +while(entry) {
>> +fd = qemu_open(entry->dirname, O_RDONLY);
>> +if (fd == -1) {
>> +ret = errno;
>> +goto error;
>> +}
>> +ret = ioctl(fd, FIFREEZE);
>> +if (ret < 0 && ret != EOPNOTSUPP) {
>> +goto error;
>> +}
> 
> Here we silently ignore filesystems that do not support the FIFREEZE
> ioctl.  Do we need to have a more complex return value so that we can
> communicate which mount points could not be frozen?  Otherwise, an
> unsuspecting host could retrieve a corrupted snapshot of that
> filesystem, right?

That is correct, however most Linux file systems do support it, and for
the ones that don't, there really isn't anything we can do.

Cheers,
Jes

Re: [Qemu-devel] Re: KVM call agenda for Feb 1

2011-02-01 Thread Anthony Liguori


On 02/01/2011 08:37 AM, Alexander Graf wrote:


o SeaBIOS update for 0.14 - I'd like to see an AHCI boot capable version there
   


I'll update to the latest release before I fork today.

Regards,

Anthony Liguori


Alex

Re: [Qemu-devel] Re: KVM call agenda for Feb 1

2011-02-01 Thread Alexander Graf

On 01.02.2011, at 16:05, Anthony Liguori wrote:

> On 02/01/2011 08:37 AM, Alexander Graf wrote:
>> 
>> o SeaBIOS update for 0.14 - I'd like to see an AHCI boot capable version 
>> there
>>   
> 
> I'll update to the latest release before I fork today.

Last time I checked, AHCI was disabled by default. Also, SeaBIOS should really 
have a stable fork for us to use :).

Alex

[Qemu-devel] [PATCH] Make spice dummy functions inline to fix calls not checking return values

2011-02-01 Thread Jes . Sorensen

From: Jes Sorensen 

qemu_spice_set_passwd() and qemu_spice_set_pw_expire() dummy functions
needs to be inline, in order to handle the case where they are called
without checking the return value.

Signed-off-by: Jes Sorensen 
---
 ui/qemu-spice.h |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h
index 48239c3..920d501 100644
--- a/ui/qemu-spice.h
+++ b/ui/qemu-spice.h
@@ -42,8 +42,16 @@ void do_info_spice(Monitor *mon, QObject **ret_data);
 #else  /* CONFIG_SPICE */
 
 #define using_spice 0
-#define qemu_spice_set_passwd(_p, _f1, _f2) (-1)
-#define qemu_spice_set_pw_expire(_e) (-1)
+static inline int qemu_spice_set_passwd(const char *passwd,
+bool fail_if_connected,
+bool disconnect_if_connected)
+{
+return -1;
+}
+static inline int qemu_spice_set_pw_expire(time_t expires)
+{
+return -1;
+}
 
 #endif /* CONFIG_SPICE */
 
-- 
1.7.3.5

[Qemu-devel] [PATCH 1/7] ahci: split ICH9 from core

2011-02-01 Thread Alexander Graf

From: Sebastian Herbszt 

There are multiple ahci devices out there. The currently implemented ich-9
is only one of the many. So let's split that one out into a separate file
to stress the difference.

Signed-off-by: Sebastian Herbszt 
Signed-off-by: Alexander Graf 
---
 Makefile.objs |1 +
 hw/ide/ahci.c |  305 +---
 hw/ide/ahci.h |  309 +
 hw/ide/ich.c  |   61 +++
 4 files changed, 375 insertions(+), 301 deletions(-)
 create mode 100644 hw/ide/ahci.h
 create mode 100644 hw/ide/ich.c

diff --git a/Makefile.objs b/Makefile.objs
index 93406ff..b15df5e 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -244,6 +244,7 @@ hw-obj-$(CONFIG_IDE_CMD646) += ide/cmd646.o
 hw-obj-$(CONFIG_IDE_MACIO) += ide/macio.o
 hw-obj-$(CONFIG_IDE_VIA) += ide/via.o
 hw-obj-$(CONFIG_AHCI) += ide/ahci.o
+hw-obj-$(CONFIG_AHCI) += ide/ich.o
 
 # SCSI layer
 hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 671b4df..28412d0 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -72,6 +72,7 @@
 #include "cpu-common.h"
 #include "internal.h"
 #include 
+#include 
 
 /* #define DEBUG_AHCI */
 
@@ -83,304 +84,6 @@ do { fprintf(stderr, "ahci: %s: [%d] ", __FUNCTION__, 
port); \
 #define DPRINTF(port, fmt, ...) do {} while(0)
 #endif
 
-#define AHCI_PCI_BAR  5
-#define AHCI_MAX_PORTS32
-#define AHCI_MAX_SG   168 /* hardware max is 64K */
-#define AHCI_DMA_BOUNDARY 0x
-#define AHCI_USE_CLUSTERING   0
-#define AHCI_MAX_CMDS 32
-#define AHCI_CMD_SZ   32
-#define AHCI_CMD_SLOT_SZ  (AHCI_MAX_CMDS * AHCI_CMD_SZ)
-#define AHCI_RX_FIS_SZ256
-#define AHCI_CMD_TBL_CDB  0x40
-#define AHCI_CMD_TBL_HDR_SZ   0x80
-#define AHCI_CMD_TBL_SZ   (AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16))
-#define AHCI_CMD_TBL_AR_SZ(AHCI_CMD_TBL_SZ * AHCI_MAX_CMDS)
-#define AHCI_PORT_PRIV_DMA_SZ (AHCI_CMD_SLOT_SZ + AHCI_CMD_TBL_AR_SZ + \
-   AHCI_RX_FIS_SZ)
-
-#define AHCI_IRQ_ON_SG(1 << 31)
-#define AHCI_CMD_ATAPI(1 << 5)
-#define AHCI_CMD_WRITE(1 << 6)
-#define AHCI_CMD_PREFETCH (1 << 7)
-#define AHCI_CMD_RESET(1 << 8)
-#define AHCI_CMD_CLR_BUSY (1 << 10)
-
-#define RX_FIS_D2H_REG0x40 /* offset of D2H Register FIS data */
-#define RX_FIS_SDB0x58 /* offset of SDB FIS data */
-#define RX_FIS_UNK0x60 /* offset of Unknown FIS data */
-
-/* global controller registers */
-#define HOST_CAP  0x00 /* host capabilities */
-#define HOST_CTL  0x04 /* global host control */
-#define HOST_IRQ_STAT 0x08 /* interrupt status */
-#define HOST_PORTS_IMPL   0x0c /* bitmap of implemented ports */
-#define HOST_VERSION  0x10 /* AHCI spec. version compliancy */
-
-/* HOST_CTL bits */
-#define HOST_CTL_RESET(1 << 0)  /* reset controller; self-clear */
-#define HOST_CTL_IRQ_EN   (1 << 1)  /* global IRQ enable */
-#define HOST_CTL_AHCI_EN  (1 << 31) /* AHCI enabled */
-
-/* HOST_CAP bits */
-#define HOST_CAP_SSC  (1 << 14) /* Slumber capable */
-#define HOST_CAP_AHCI (1 << 18) /* AHCI only */
-#define HOST_CAP_CLO  (1 << 24) /* Command List Override support */
-#define HOST_CAP_SSS  (1 << 27) /* Staggered Spin-up */
-#define HOST_CAP_NCQ  (1 << 30) /* Native Command Queueing */
-#define HOST_CAP_64   (1 << 31) /* PCI DAC (64-bit DMA) support */
-
-/* registers for each SATA port */
-#define PORT_LST_ADDR 0x00 /* command list DMA addr */
-#define PORT_LST_ADDR_HI  0x04 /* command list DMA addr hi */
-#define PORT_FIS_ADDR 0x08 /* FIS rx buf addr */
-#define PORT_FIS_ADDR_HI  0x0c /* FIS rx buf addr hi */
-#define PORT_IRQ_STAT 0x10 /* interrupt status */
-#define PORT_IRQ_MASK 0x14 /* interrupt enable/disable mask */
-#define PORT_CMD  0x18 /* port command */
-#define PORT_TFDATA   0x20 /* taskfile data */
-#define PORT_SIG  0x24 /* device TF signature */
-#define PORT_SCR_STAT 0x28 /* SATA phy register: SStatus */
-#define PORT_SCR_CTL  0x2c /* SATA phy register: SControl */
-#define PORT_SCR_ERR  0x30 /* SATA phy register: SError */
-#define PORT_SCR_ACT  0x34 /* SATA phy register: SActive */
-#define PORT_CMD_ISSUE0x38 /* command issue */
-#define PORT_RESERVED 0x3c /* reserved */
-
-/* PORT_IRQ_{STAT,MASK} bits */
-#define PORT_IRQ_COLD_PRES(1 << 31) /* cold presence detect */
-#define PORT_IRQ_TF_ERR   (1 << 30) /* task file error */
-#define PORT_IRQ_HBUS_ERR (1 << 29) /* host bus fatal error */
-#define PORT_IRQ_

[Qemu-devel] KVM call minutes for Feb 1

2011-02-01 Thread Chris Wright

KVM upstream merge: status, plans, coordination
- Jan has a git tree, consolidating
- qemu-kvm io threading is still an issue
- Anthony wants to just merge
  - concerns with non-x86 arch and merge
  - concerns with big-bang patch merge and following stability
- post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
  a problem if it's not there by then
- testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
- qemu-kvm still evolving, needs to get sync'd or it will keep diverging
- 2 implementations of main init, cpu init, Jan has merged them into one
  - qemu-kvm-x86.c file that's only a few hundred lines
- review as one patch to see the fundamental difference

QMP support status for 0.14
- declare QMP fully supported
  - caveats: specific errors aren't guaranteed yet (primarily documentation)
  - human monitor passthrough command is best effort
- device tree structure is not reliable, use name not path
- will send out patch to update qmp-commands.hx to document this (and Cc
  libvirt)
- schema file (json subset which is python) and code generator to
  generate code with C structures, also generates client library for
  test cases (can test against new and old qmp server to verify hasn't
  changed)
  - HMP implemented in terms of QMP only
  - at the end should have a test framework to test all commands
  - glib/gtest framework

0.14 stable fork today
already posted 0.14 patches?
- will pick up all those patches before forking, fork at the end of the day
- will grab latest SeaBIOS and vgabios

SeaBIOS update for 0.14 (AHCI boot capable version)
- need to check if (and why) AHCI is disabled by default 
  - assuming no fundamental issues, could be enabled and become an
experimental new 0.14 feature

Summer of code 2011
- http://wiki.qemu.org/Google_Summer_of_Code_2011
- update wiki page with project ideas (let Anthony or Luiz know if you
  want to be a mentor)
- application is due at end of the month
- mentors...be prepared that projects may take longer than just the
  summer of code to complete
- join #qemu-gsoc on OFTC for gsoc discussions

Going to FOSDEM?  agraf will be there...

[Qemu-devel] [PATCH] linux-user: avoid gcc array overrun warning for sparc

2011-02-01 Thread Peter Maydell

Suppress a gcc array bounds overrun warning when filling in the SPARC
signal frame by adjusting our definition of the structure so that the
fp and callers_pc membes are part of the ins[] array rather than
separate fields; since qemu has no need to access the fields individually
there is no need to follow the kernel's structure field naming exactly.

Signed-off-by: Peter Maydell 
---
This is a fix for another warning that the armel gcc gives:
linux-user/signal.c:1979: error: array subscript is above array bounds
so if it passes review I think it's a good candidate for putting in 0.14.

 linux-user/signal.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 0664770..b01bd64 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1817,9 +1817,10 @@ struct target_sigcontext {
 /* A Sparc stack frame */
 struct sparc_stackf {
 abi_ulong locals[8];
-abi_ulong ins[6];
-struct sparc_stackf *fp;
-abi_ulong callers_pc;
+abi_ulong ins[8];
+/* It's simpler to treat fp and callers_pc as elements of ins[]
+ * since we never need to access them ourselves.
+ */
 char *structptr;
 abi_ulong xargs[6];
 abi_ulong xxargs[1];
-- 
1.7.1

Re: [Qemu-devel] [PATCH 0.14] tap: safe sndbuf default

2011-02-01 Thread Anthony Liguori


On 02/01/2011 06:25 AM, Michael S. Tsirkin wrote:

With current sndbuf default value, a blocked
target guest can prevent another guest from
transmitting any packets. While current
sndbuf value (1M) is reported to help some
UDP based workloads, the default should
be safe (0).
   


Can you be more specific about the workload this helps and by how much?

Regards,

Anthony Liguori


Signed-off-by: Michael S. Tsirkin
---

I think this should go into 0.14.
Comments?

  net/tap-linux.c |   13 +
  qemu-options.hx |2 +-
  2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/tap-linux.c b/net/tap-linux.c
index f7aa904..ff8cad0 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -82,12 +82,17 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
  return fd;
  }

-/* sndbuf should be set to a value lower than the tx queue
- * capacity of any destination network interface.
+/* sndbuf implements a kind of flow control for tap.
+ * Unfortunately when it's enabled, and packets are sent
+ * to other guests on the same host, the receiver
+ * can lock up the transmitter indefinitely.
+ *
+ * To avoid packet loss, sndbuf should be set to a value lower than the tx
+ * queue capacity of any destination network interface.
   * Ethernet NICs generally have txqueuelen=1000, so 1Mb is
- * a good default, given a 1500 byte MTU.
+ * a good value, given a 1500 byte MTU.
   */
-#define TAP_DEFAULT_SNDBUF 1024*1024
+#define TAP_DEFAULT_SNDBUF 0

  int tap_set_sndbuf(int fd, QemuOpts *opts)
  {
diff --git a/qemu-options.hx b/qemu-options.hx
index 11c93a2..ca4d5c9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1057,7 +1057,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
  "use '[down]script=no' to disable script execution\n"
  "use 'fd=h' to connect to an already opened TAP 
interface\n"
  "use 'sndbuf=nbytes' to limit the size of the send buffer 
(the\n"
-"default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0')\n"
+"default is disabled 'sndbuf=0' to enable flow control set 
'sndbuf=1048576')\n"
  "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n"
  "use vnet_hdr=on to make the lack of IFF_VNET_HDR support an 
error condition\n"
  "use vhost=on to enable experimental in kernel 
accelerator\n"

Re: [Qemu-devel] [PATCH] linux-user: avoid gcc array overrun warning for sparc

2011-02-01 Thread Peter Maydell

On 1 February 2011 15:54, Peter Maydell  wrote:
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -1817,9 +1817,10 @@ struct target_sigcontext {
>  /* A Sparc stack frame */
>  struct sparc_stackf {
>         abi_ulong locals[8];
> -        abi_ulong ins[6];
> -        struct sparc_stackf *fp;
> -        abi_ulong callers_pc;
> +        abi_ulong ins[8];
> +        /* It's simpler to treat fp and callers_pc as elements of ins[]
> +         * since we never need to access them ourselves.
> +         */
>         char *structptr;

Incidentally, I think the presence of a host pointer in a target
structure definition is a (different) bug which might cause problems
when the target and host have different pointer sizes...

-- PMM

Re: [Qemu-devel] [PATCH 0/2] virtagent - fsfreeze support

2011-02-01 Thread Richard W.M. Jones

On Tue, Feb 01, 2011 at 02:25:12PM +0300, Vasiliy G Tolstov wrote:
> On Tue, 2011-02-01 at 11:58 +0100, jes.soren...@redhat.com wrote:
> > From: Jes Sorensen 
> > 
> > Hi
> > 
> > This is a first attempt to add fsfreeze support to virtagent. The idea
> > is for the guest agent to walk the list of locally mounted file
> > systems in the guest, and issuing an ioctl to freeze them. The host
> > can then do a live snapshot of the guest, obtaining stable file
> > systems. After the snapshot, the host then calls the thaw function in
> > virtagent, which goes through the list of previously frozen file
> > systems and unfreezes them.
> > 
> > The list walking ignores remote file systems such as NFS and CIFS as
> > well as all pseudo file systems.
> > 
> > The guest agent code is in the first patch, and host agent code is in
> > the second patch. For now there is only human monitor support, but it
> > should be pretty straight forward to add QMP support as well.
> > 
> > Patches are against the virtagent-dev git tree.
> > 
> > Comments and suggestions welcome!
> > 
> > Cheers,
> > Jes
> 
> Hello. Very nice feature. Sorry for offropic, but can this feature can
> be used to modify partiotion table on already mounted device (for
> example root on ext3? )

There are some experimental patches to libguestfs to do live
filesystem and partition manipulations now:

  https://www.redhat.com/archives/libguestfs/2011-January/msg00096.html

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

[Qemu-devel] Re: [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Michael Roth


On 02/01/2011 04:58 AM, jes.soren...@redhat.com wrote:

From: Jes Sorensen

Implement freeze/thaw support in the guest, allowing the host to
request the guest freezes all it's file systems before a live snapshot
is performed.
  - fsfreeze(): Walk the list of mounted local real file systems,
and freeze them.
  - fsthaw():   Walk the list of previously frozen file systems and
thaw them.
  - fsstatus(): Return the current status of freeze/thaw. The host must
poll this function, in case fsfreeze() returned with a
   timeout, to wait for the operation to finish.

Signed-off-by: Jes Sorensen
---
  virtagent-common.h |8 ++
  virtagent-server.c |  196 
  2 files changed, 204 insertions(+), 0 deletions(-)

diff --git a/virtagent-common.h b/virtagent-common.h
index 5d8f5c1..220a4b6 100644
--- a/virtagent-common.h
+++ b/virtagent-common.h
@@ -61,6 +61,14 @@ typedef struct VAContext {
  const char *channel_path;
  } VAContext;

+enum vs_fsfreeze_status {
+FREEZE_ERROR = -1,
+FREEZE_THAWED = 0,
+FREEZE_INPROGRESS = 1,
+FREEZE_FROZEN = 2,
+FREEZE_THAWINPROGRESS = 3,
+};


Any reason for vs_* vs. va_*?


+
  enum va_job_status {
  VA_JOB_STATUS_PENDING = 0,
  VA_JOB_STATUS_OK,
diff --git a/virtagent-server.c b/virtagent-server.c
index 7bb35b2..cf2a3f0 100644
--- a/virtagent-server.c
+++ b/virtagent-server.c
@@ -14,6 +14,13 @@
  #include
  #include "qemu_socket.h"
  #include "virtagent-common.h"
+#include
+#include
+#include
+#include
+#include
+#include
+#include


Can probably clean these up a bit, I believe fcntl.h/errno.h/stat.h are 
already available at least.




  static VAServerData *va_server_data;
  static bool va_enable_syslog = false; /* enable syslog'ing of RPCs */
@@ -217,6 +224,189 @@ static xmlrpc_value *va_hello(xmlrpc_env *env,
  return result;
  }

+
+/*
+ * Walk the mount table and build a list of local file systems
+ */
+
+struct direntry {
+char *dirname;
+char *devtype;
+struct direntry *next;
+};
+
+static struct direntry *mount_list;
+static int fsfreeze_status;
+
+static int build_mount_list(void)
+{
+struct mntent *mnt;
+struct direntry *entry;
+struct direntry *next;
+char const *mtab = MOUNTED;
+FILE *fp;
+
+fp = setmntent(mtab, "r");
+if (!fp) {
+   fprintf(stderr, "unable to read mtab\n");
+   goto fail;
+}
+
+while ((mnt = getmntent(fp))) {
+   /*
+* An entry which device name doesn't start with a '/' is
+* either a dummy file system or a network file system.
+* Add special handling for smbfs and cifs as is done by
+* coreutils as well.
+*/
+   if ((mnt->mnt_fsname[0] != '/') ||
+   (strcmp(mnt->mnt_type, "smbfs") == 0) ||
+   (strcmp(mnt->mnt_type, "cifs") == 0)) {
+   continue;
+   }
+
+   entry = qemu_malloc(sizeof(struct direntry));
+   if (!entry) {
+   goto fail;
+   }
+   entry->dirname = qemu_strdup(mnt->mnt_dir);
+   entry->devtype = qemu_strdup(mnt->mnt_type);
+   entry->next = mount_list;
+
+   mount_list = entry;
+}
+
+endmntent(fp);
+
+return 0;
+
+fail:
+while(mount_list) {
+   next = mount_list->next;
+   qemu_free(mount_list->dirname);
+   qemu_free(mount_list->devtype);
+   qemu_free(mount_list);
+   mount_list = next;
+}


should be spaces instead of tabs


+
+return -1;
+}
+
+/*
+ * va_fsfreeze(): Walk list of mounted file systems in the guest, and
+ *   freeze the ones which are real local file systems.
+ * rpc return values: Number of file systems frozen, -1 on error.
+ */
+static xmlrpc_value *va_fsfreeze(xmlrpc_env *env,
+ xmlrpc_value *params,
+ void *user_data)
+{
+xmlrpc_int32 ret = 0, i = 0;
+xmlrpc_value *result;
+struct direntry *entry;
+int fd;
+SLOG("va_fsfreeze()");
+
+if (fsfreeze_status == FREEZE_FROZEN) {
+ret = 0;
+goto out;
+}
+
+ret = build_mount_list();
+if (ret<  0) {
+goto out;
+}
+
+fsfreeze_status = FREEZE_INPROGRESS;
+
+entry = mount_list;


I think as we start adding more and more stateful RPCs, free-floating 
state variables can start getting a bit hairy to keep track of. 
Eventually I'd like to have state information that only applies to a 
subset of RPCs consolidated into a single object. I wouldn't focus on 
this too much because I'd like to have an interface to do this in the 
future (mainly so they can state objects can register themselves and 
provide a reset() function that can be called when, for instance, an 
agent disconnects/reconnects), but in the meantime I think it would be 
more readable to have a global va_fsfreeze_state object to track freeze 
status and mount points.



+while(entry) {
+fd = qemu_open(entry->dirname, O_RDONL

Re: [Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Aurelien Jarno

On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
> When using level based interrupts, the interrupt is treated the same as an
> edge triggered one: leaving the line up does not retrigger the interrupt.
> 
> In fact, when not lowering the line, we won't ever get a new interrupt inside
> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
> something on the device. This way we're sure the guest doesn't starve on
> interrupts until someone fixes the actual interrupt path.

Given this issue mostly concerns x86 and not other architectures where
the SATA emulation can probably be used, what about putting the two
versions of the codes like in i8259.c:

| * all targets should do this rather than acking the IRQ in the cpu */
| #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)

The list of architectures here is reduced given the few architectures 
that actually use the i8259, so for ahci.c it should probably be #if not
defined(TARGET_I386) instead.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Aurelien Jarno

On Tue, Feb 01, 2011 at 05:53:43PM +0100, Alexander Graf wrote:
> 
> On 01.02.2011, at 17:34, Aurelien Jarno wrote:
> 
> > On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
> >> When using level based interrupts, the interrupt is treated the same as an
> >> edge triggered one: leaving the line up does not retrigger the interrupt.
> >> 
> >> In fact, when not lowering the line, we won't ever get a new interrupt 
> >> inside
> >> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
> >> something on the device. This way we're sure the guest doesn't starve on
> >> interrupts until someone fixes the actual interrupt path.
> > 
> > Given this issue mostly concerns x86 and not other architectures where
> > the SATA emulation can probably be used, what about putting the two
> > versions of the codes like in i8259.c:
> > 
> > | * all targets should do this rather than acking the IRQ in the cpu */
> > | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
> > 
> > The list of architectures here is reduced given the few architectures 
> > that actually use the i8259, so for ahci.c it should probably be #if not
> > defined(TARGET_I386) instead.
> 
> Because then we'd have to build the ahci code conditionally on the 
> architecture. Right now it builds into libhw :)

If we want to keep it in libhw, it's probably better to disable it on
other architectures than i386.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Alexander Graf


On 01.02.2011, at 18:06, Aurelien Jarno wrote:

> On Tue, Feb 01, 2011 at 05:53:43PM +0100, Alexander Graf wrote:
>> 
>> On 01.02.2011, at 17:34, Aurelien Jarno wrote:
>> 
>>> On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
 When using level based interrupts, the interrupt is treated the same as an
 edge triggered one: leaving the line up does not retrigger the interrupt.
 
 In fact, when not lowering the line, we won't ever get a new interrupt 
 inside
 the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
 something on the device. This way we're sure the guest doesn't starve on
 interrupts until someone fixes the actual interrupt path.
>>> 
>>> Given this issue mostly concerns x86 and not other architectures where
>>> the SATA emulation can probably be used, what about putting the two
>>> versions of the codes like in i8259.c:
>>> 
>>> | * all targets should do this rather than acking the IRQ in the cpu */
>>> | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
>>> 
>>> The list of architectures here is reduced given the few architectures 
>>> that actually use the i8259, so for ahci.c it should probably be #if not
>>> defined(TARGET_I386) instead.
>> 
>> Because then we'd have to build the ahci code conditionally on the 
>> architecture. Right now it builds into libhw :)
> 
> If we want to keep it in libhw, it's probably better to disable it on
> other architectures than i386.

How so? The workaround simply triggers a superfluous interrupt event, but 
shouldn't break anything for other architectures. In fact, last time I checked 
it worked just fine on ppc.


Alex

[Qemu-devel] [PATCH 4/4] add CONFIG_VMPORT option

2011-02-01 Thread Eduardo Habkost

This allows vmport to be easily enabled or disabled at build time.

Signed-off-by: Eduardo Habkost 
---
 Makefile.target|3 ++-
 default-configs/i386-softmmu.mak   |2 ++
 default-configs/x86_64-softmmu.mak |2 ++
 hw/pc_piix.c   |2 ++
 4 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index c184af6..c55951a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -220,12 +220,13 @@ obj-$(CONFIG_KVM) += ivshmem.o
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
 obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
-obj-i386-y += vmport.o hpet.o applesmc.o
+obj-i386-y += hpet.o applesmc.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
 
 obj-i386-$(CONFIG_VMMOUSE) += vmmouse.o
+obj-i386-$(CONFIG_VMPORT) += vmport.o
 
 
 # shared objects
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index a4450bc..e195b47 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -23,4 +23,6 @@ CONFIG_NE2000_ISA=y
 CONFIG_PIIX_PCI=y
 CONFIG_SOUND=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VMPORT=y
+#NOTE: VMMOUSE depends on VMPORT
 CONFIG_VMMOUSE=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 658b249..8782cb9 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -23,4 +23,6 @@ CONFIG_NE2000_ISA=y
 CONFIG_PIIX_PCI=y
 CONFIG_SOUND=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VMPORT=y
+#NOTE: VMMOUSE depends on VMPORT
 CONFIG_VMMOUSE=y
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 7d29d43..c0697e0 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -85,7 +85,9 @@ static void pc_init1(ram_addr_t ram_size,
 
 pc_cpus_init(cpu_model);
 
+#ifdef CONFIG_VMPORT
 vmport_init();
+#endif
 
 /* allocate ram and load rom/bios */
 pc_memory_init(ram_size, kernel_filename, kernel_cmdline, initrd_filename,
-- 
1.7.3.2

[Qemu-devel] Re: KVM call minutes for Feb 1

2011-02-01 Thread Jan Kiszka

On 2011-02-01 17:53, Anthony Liguori wrote:
> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
>> On 2011-02-01 16:54, Chris Wright wrote:
>>
>>> KVM upstream merge: status, plans, coordination
>>> - Jan has a git tree, consolidating
>>> - qemu-kvm io threading is still an issue
>>> - Anthony wants to just merge
>>>- concerns with non-x86 arch and merge
>>>- concerns with big-bang patch merge and following stability
>>> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
>>>a problem if it's not there by then
>>> - testing and nuances are still an issue (e.g. stefan berger's mmio read 
>>> issue)
>>> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
>>> - 2 implementations of main init, cpu init, Jan has merged them into one
>>>- qemu-kvm-x86.c file that's only a few hundred lines
>>> - review as one patch to see the fundamental difference
>>>  
>> More precisely, my current work flow is to pick some function(s), e.g.
>> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
>> upstream so that qemu-kvm could use that implementation?". If they
>> differ, the reasons need to be understood and patched away, either by
>> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
>> changes are merged back, a qemu-kvm patch is posted to switch to that
>> version.
>>
>> Any help will be welcome, either via review of my subtle regressions or
>> on resolving concrete differences.
>>
>> E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
>> of its own iothread code, can we wrap that away or do we need to
>> consolidate the threading code first? Or do we need to fix something in
>> upstream?
>>
> 
> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small 
> diff post your series, I'd be happy to take a look at it and see what I 
> can explain.
> 

Looks like it's around signalfd and its emulation:

[git diff qemu/master..master posix-aio-compat.c]

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index fa5494d..0704064 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -28,6 +28,7 @@
 #include "qemu-common.h"
 #include "trace.h"
 #include "block_int.h"
+#include "compatfd.h"
 
 #include "block/raw-posix-aio.h"
 
@@ -55,7 +56,7 @@ struct qemu_paiocb {
 };
 
 typedef struct PosixAioState {
-int rfd, wfd;
+int fd;
 struct qemu_paiocb *first_aio;
 } PosixAioState;
 
@@ -474,18 +475,29 @@ static int posix_aio_process_queue(void *opaque)
 static void posix_aio_read(void *opaque)
 {
 PosixAioState *s = opaque;
-ssize_t len;
+union {
+struct qemu_signalfd_siginfo siginfo;
+char buf[128];
+} sig;
+size_t offset;
 
-/* read all bytes from signal pipe */
-for (;;) {
-char bytes[16];
+/* try to read from signalfd, don't freak out if we can't read anything */
+offset = 0;
+while (offset < 128) {
+ssize_t len;
 
-len = read(s->rfd, bytes, sizeof(bytes));
+len = read(s->fd, sig.buf + offset, 128 - offset);
 if (len == -1 && errno == EINTR)
-continue; /* try again */
-if (len == sizeof(bytes))
-continue; /* more to read */
-break;
+continue;
+if (len == -1 && errno == EAGAIN) {
+/* there is no natural reason for this to happen,
+ * so we'll spin hard until we get everything just
+ * to be on the safe side. */
+if (offset > 0)
+continue;
+}
+
+offset += len;
 }
 
 posix_aio_process_queue(s);
@@ -499,20 +511,6 @@ static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
-static void aio_signal_handler(int signum)
-{
-if (posix_aio_state) {
-char byte = 0;
-ssize_t ret;
-
-ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
-if (ret < 0 && errno != EAGAIN)
-die("write()");
-}
-
-qemu_service_io();
-}
-
 static void paio_remove(struct qemu_paiocb *acb)
 {
 struct qemu_paiocb **pacb;
@@ -616,9 +614,8 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 
 int paio_init(void)
 {
-struct sigaction act;
+sigset_t mask;
 PosixAioState *s;
-int fds[2];
 int ret;
 
 if (posix_aio_state)
@@ -626,24 +623,21 @@ int paio_init(void)
 
 s = qemu_malloc(sizeof(PosixAioState));
 
-sigfillset(&act.sa_mask);
-act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
-act.sa_handler = aio_signal_handler;
-sigaction(SIGUSR2, &act, NULL);
+/* Make sure to block AIO signal */
+sigemptyset(&mask);
+sigaddset(&mask, SIGUSR2);
+sigprocmask(SIG_BLOCK, &mask, NULL);
 
 s->first_aio = NULL;
-if (qemu_pipe(fds) == -1) {
-fprintf(stderr, "failed to create pipe\n");
+s->fd = qemu_signalfd(&mask);
+if (s->fd == -1) {
+fprintf(stderr, "failed to create signalfd\n");
 return

Re: [Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Alexander Graf


On 01.02.2011, at 17:34, Aurelien Jarno wrote:

> On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
>> When using level based interrupts, the interrupt is treated the same as an
>> edge triggered one: leaving the line up does not retrigger the interrupt.
>> 
>> In fact, when not lowering the line, we won't ever get a new interrupt inside
>> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
>> something on the device. This way we're sure the guest doesn't starve on
>> interrupts until someone fixes the actual interrupt path.
> 
> Given this issue mostly concerns x86 and not other architectures where
> the SATA emulation can probably be used, what about putting the two
> versions of the codes like in i8259.c:
> 
> | * all targets should do this rather than acking the IRQ in the cpu */
> | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
> 
> The list of architectures here is reduced given the few architectures 
> that actually use the i8259, so for ahci.c it should probably be #if not
> defined(TARGET_I386) instead.

Because then we'd have to build the ahci code conditionally on the 
architecture. Right now it builds into libhw :)


Alex

[Qemu-devel] [PATCH 0/4] fix/add CONFIG_* options for VMWare device emulation

2011-02-01 Thread Eduardo Habkost

Hi,

This series makes CONFIG_VMWARE_VGA actually work (today we can't disable the
option without getting a build error).

It also add two new options: CONFIG_VMMOUSE and CONFIG_VMPORT, for vmmouse.o
and vmport.o.

Eduardo Habkost (4):
  Add config-devices.h again
  skip pci_vmsvga_init() calls if CONFIG_VMWARE_VGA is disabled
  add CONFIG_VMMOUSE option
  add CONFIG_VMPORT option

 Makefile   |7 +--
 Makefile.target|8 ++--
 config.h   |   11 +++
 default-configs/i386-softmmu.mak   |3 +++
 default-configs/x86_64-softmmu.mak |3 +++
 hw/mips_malta.c|4 
 hw/pc.c|6 ++
 hw/pc_piix.c   |2 ++
 8 files changed, 40 insertions(+), 4 deletions(-)

-- 
1.7.3.2

[Qemu-devel] [PATCH 3/4] add CONFIG_VMMOUSE option

2011-02-01 Thread Eduardo Habkost

This will allow vmmouse to be disabled at build time if necessary.

Signed-off-by: Eduardo Habkost 
---
 Makefile.target|5 -
 default-configs/i386-softmmu.mak   |1 +
 default-configs/x86_64-softmmu.mak |1 +
 hw/pc.c|2 ++
 4 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 03fc486..c184af6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -220,11 +220,14 @@ obj-$(CONFIG_KVM) += ivshmem.o
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
 obj-i386-y += cirrus_vga.o apic.o ioapic.o piix_pci.o
-obj-i386-y += vmmouse.o vmport.o hpet.o applesmc.o
+obj-i386-y += vmport.o hpet.o applesmc.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
 
+obj-i386-$(CONFIG_VMMOUSE) += vmmouse.o
+
+
 # shared objects
 obj-ppc-y = ppc.o
 obj-ppc-y += vga.o
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index ed00471..a4450bc 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -23,3 +23,4 @@ CONFIG_NE2000_ISA=y
 CONFIG_PIIX_PCI=y
 CONFIG_SOUND=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VMMOUSE=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 5183203..658b249 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -23,3 +23,4 @@ CONFIG_NE2000_ISA=y
 CONFIG_PIIX_PCI=y
 CONFIG_SOUND=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VMMOUSE=y
diff --git a/hw/pc.c b/hw/pc.c
index e872a7b..31ac075 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1059,7 +1059,9 @@ void pc_basic_device_init(qemu_irq *isa_irq,
 a20_line = qemu_allocate_irqs(handle_a20_line_change, first_cpu, 1);
 i8042 = isa_create_simple("i8042");
 i8042_setup_a20_line(i8042, a20_line);
+#ifdef CONFIG_VMMOUSE
 vmmouse_init(i8042);
+#endif
 
 cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1);
 DMA_init(0, cpu_exit_irq);
-- 
1.7.3.2

[Qemu-devel] [PATCH 2/4] skip pci_vmsvga_init() calls if CONFIG_VMWARE_VGA is disabled

2011-02-01 Thread Eduardo Habkost

I was planning to add the check for CONFIG_VMWARE to the command-line
parsing code in vl.c, but vl.c is not built by Makefile.target, so we
can't test for a per-target config option there.

It is not the best solution, but it is better than simply having a
CONFIG_VMWARE_VGA option that doesn't work and can't be disabled. I
don't see a good way to implement it that wouldn't involve heavily
refactoring completely the '-vga' option parsing code.

Signed-off-by: Eduardo Habkost 
---
 hw/mips_malta.c |4 
 hw/pc.c |4 
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/mips_malta.c b/hw/mips_malta.c
index 6be8aa7..b4164a0 100644
--- a/hw/mips_malta.c
+++ b/hw/mips_malta.c
@@ -975,7 +975,11 @@ void mips_malta_init (ram_addr_t ram_size,
 if (cirrus_vga_enabled) {
 pci_cirrus_vga_init(pci_bus);
 } else if (vmsvga_enabled) {
+#ifdef CONFIG_VMWARE_VGA
 pci_vmsvga_init(pci_bus);
+#else
+fprintf(stderr, "%s: vmware_vga support is not compiled in\n", 
__FUNCTION__);
+#endif /* CONFIG_VMWARE_VGA */
 } else if (std_vga_enabled) {
 pci_vga_init(pci_bus);
 }
diff --git a/hw/pc.c b/hw/pc.c
index 119c110..e872a7b 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -987,10 +987,14 @@ void pc_vga_init(PCIBus *pci_bus)
 isa_cirrus_vga_init();
 }
 } else if (vmsvga_enabled) {
+#ifdef CONFIG_VMWARE_VGA
 if (pci_bus)
 pci_vmsvga_init(pci_bus);
 else
 fprintf(stderr, "%s: vmware_vga: no PCI bus\n", __FUNCTION__);
+#else
+fprintf(stderr, "%s: vmware_vga support is not compiled in\n", 
__FUNCTION__);
+#endif /* CONFIG_VMWARE_VGA */
 } else if (std_vga_enabled) {
 if (pci_bus) {
 pci_vga_init(pci_bus);
-- 
1.7.3.2

Re: [Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Aurelien Jarno

On Tue, Feb 01, 2011 at 06:10:56PM +0100, Alexander Graf wrote:
> 
> On 01.02.2011, at 18:06, Aurelien Jarno wrote:
> 
> > On Tue, Feb 01, 2011 at 05:53:43PM +0100, Alexander Graf wrote:
> >> 
> >> On 01.02.2011, at 17:34, Aurelien Jarno wrote:
> >> 
> >>> On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
>  When using level based interrupts, the interrupt is treated the same as 
>  an
>  edge triggered one: leaving the line up does not retrigger the interrupt.
>  
>  In fact, when not lowering the line, we won't ever get a new interrupt 
>  inside
>  the guest. So let's always retrigger an interrupt as soon as the OS 
>  ack'ed
>  something on the device. This way we're sure the guest doesn't starve on
>  interrupts until someone fixes the actual interrupt path.
> >>> 
> >>> Given this issue mostly concerns x86 and not other architectures where
> >>> the SATA emulation can probably be used, what about putting the two
> >>> versions of the codes like in i8259.c:
> >>> 
> >>> | * all targets should do this rather than acking the IRQ in the cpu */
> >>> | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
> >>> 
> >>> The list of architectures here is reduced given the few architectures 
> >>> that actually use the i8259, so for ahci.c it should probably be #if not
> >>> defined(TARGET_I386) instead.
> >> 
> >> Because then we'd have to build the ahci code conditionally on the 
> >> architecture. Right now it builds into libhw :)
> > 
> > If we want to keep it in libhw, it's probably better to disable it on
> > other architectures than i386.
> 
> How so? The workaround simply triggers a superfluous interrupt event, but 
> shouldn't break anything for other architectures. In fact, last time I 
> checked it worked just fine on ppc.
> 

Superfluous interrupt events can have a lot of consequences, it's often
a kernel panic (see recent zilog serial port issues for example), and
when it comes to disk controllers, it might mean data loss.

Anyway the way to go there is cleary to fix the x86 interrupt model,
it's the only one broken among all targets. If we can't find anyone to
do that, we can also declare x86 unsupported :)

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] Re: KVM call minutes for Feb 1

2011-02-01 Thread Christoph Hellwig

On Tue, Feb 01, 2011 at 05:36:13PM +0100, Jan Kiszka wrote:
> kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
> upstream so that qemu-kvm could use that implementation?". If they
> differ, the reasons need to be understood and patched away, either by
> fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
> changes are merged back, a qemu-kvm patch is posted to switch to that
> version.

while I'm not an expert in that area I really like you're approach.  I'd
really prefer to let you finish up all the major work that way before
starting massive revamping like the glib main loop.  Resolving the
qemu/qemu-kvm schisma surely is more important for the overall project
than rewriting existing functionality to look a little nicer.

[Qemu-devel] KVM: Windows 64-bit troubles with user space irqchip

2011-02-01 Thread Jan Kiszka

Hi,

testing my KVM patches, I noticed that none of the 64-bit Windows
versions I have around (early Win7 & 2003 server) boot in KVM mode when
using 2 or more VCPUs and the user space irqchip. This applies to both
upstream KVM and qemu-kvm, with our without any of my current patches. A
subtle difference in the APIC/IOAPIC emulation?

Can anyone confirm this?

Jan

PS: Looks like they boot fine without CONFIG_IOTHREAD (over upstream).

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH] hw/slavio_intctl.c: fix gcc warning about array bounds overrun

2011-02-01 Thread Blue Swirl

Thanks, applied.

On Mon, Jan 31, 2011 at 10:42 AM, Peter Maydell
 wrote:
> The Ubuntu 10.10 gcc for ARM complains that we might be overrunning
> the cpu_irqs[][] array: silence this by correcting the bounds on the
> loop. (In fact we would not have overrun the array because bit
> MAX_PILS in pil_pending and irl_out will always be 0.)
>
> Also add a comment about why the loop's lower bound is OK.
>
> Signed-off-by: Peter Maydell 
> ---
> I've tested that with this change we still boot the sparc
> Debian image from http://people.debian.org/~aurel32/qemu/sparc/
> and the change makes sense according to my understanding of
> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>
>  hw/slavio_intctl.c |    7 ++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
> index fd69354..a83e5b8 100644
> --- a/hw/slavio_intctl.c
> +++ b/hw/slavio_intctl.c
> @@ -289,7 +289,12 @@ static void slavio_check_interrupts(SLAVIO_INTCTLState 
> *s, int set_irqs)
>         pil_pending |= (s->slaves[i].intreg_pending & CPU_SOFTIRQ_MASK) >> 16;
>
>         if (set_irqs) {
> -            for (j = MAX_PILS; j > 0; j--) {
> +            /* Since there is not really an interrupt 0 (and pil_pending
> +             * and irl_out bit zero are thus always zero) there is no need
> +             * to do anything with cpu_irqs[i][0] and it is OK not to do
> +             * the j=0 iteration of this loop.
> +             */
> +            for (j = MAX_PILS-1; j > 0; j--) {
>                 if (pil_pending & (1 << j)) {
>                     if (!(s->slaves[i].irl_out & (1 << j))) {
>                         qemu_irq_raise(s->cpu_irqs[i][j]);
> --
> 1.7.1
>
>

Re: [Qemu-devel] [PATCH] linux-user: avoid gcc array overrun warning for sparc

2011-02-01 Thread Blue Swirl

On Tue, Feb 1, 2011 at 4:00 PM, Peter Maydell  wrote:
> On 1 February 2011 15:54, Peter Maydell  wrote:
>> --- a/linux-user/signal.c
>> +++ b/linux-user/signal.c
>> @@ -1817,9 +1817,10 @@ struct target_sigcontext {
>>  /* A Sparc stack frame */
>>  struct sparc_stackf {
>>         abi_ulong locals[8];
>> -        abi_ulong ins[6];
>> -        struct sparc_stackf *fp;
>> -        abi_ulong callers_pc;
>> +        abi_ulong ins[8];
>> +        /* It's simpler to treat fp and callers_pc as elements of ins[]
>> +         * since we never need to access them ourselves.
>> +         */
>>         char *structptr;
>
> Incidentally, I think the presence of a host pointer in a target
> structure definition is a (different) bug which might cause problems
> when the target and host have different pointer sizes...

Right, it was copied from Linux. I can't see where it was used.
UREG_FP use cases look OK.

Re: [Qemu-devel] [PATCH v2] SPARC: Fix Leon3 cache control

2011-02-01 Thread Blue Swirl

Thanks, applied.

On Mon, Jan 31, 2011 at 10:36 AM, Fabien Chouteau  wrote:
> The "leon3_cache_control_int" (op_helper.c) function is called within leon3.c
> which leads to segfault error with the global "env".
>
> Now cache control is a CPU feature and everything is handled in op_helper.c.
>
> Signed-off-by: Fabien Chouteau 
> ---
>  hw/leon3.c               |    5 ++---
>  target-sparc/cpu.h       |    8 ++--
>  target-sparc/helper.c    |    2 +-
>  target-sparc/op_helper.c |   18 ++
>  4 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/hw/leon3.c b/hw/leon3.c
> index 69d8f3b..919f49f 100644
> --- a/hw/leon3.c
> +++ b/hw/leon3.c
> @@ -56,10 +56,9 @@ static void main_cpu_reset(void *opaque)
>     env->npc    = s->entry + 4;
>  }
>
> -static void leon3_irq_ack(void *irq_manager, int intno)
> +void leon3_irq_ack(void *irq_manager, int intno)
>  {
>     grlib_irqmp_ack((DeviceState *)irq_manager, intno);
> -    leon3_cache_control_int();
>  }
>
>  static void leon3_set_pil_in(void *opaque, uint32_t pil_in)
> @@ -130,7 +129,7 @@ static void leon3_generic_hw_init(ram_addr_t  ram_size,
>     /* Allocate IRQ manager */
>     grlib_irqmp_create(0x8200, env, &cpu_irqs, MAX_PILS, 
> &leon3_set_pil_in);
>
> -    env->qemu_irq_ack = leon3_irq_ack;
> +    env->qemu_irq_ack = leon3_irq_manager;
>
>     /* Allocate RAM */
>     if ((uint64_t)ram_size > (1UL << 30)) {
> diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
> index 6f5990b..320530e 100644
> --- a/target-sparc/cpu.h
> +++ b/target-sparc/cpu.h
> @@ -268,6 +268,8 @@ typedef struct sparc_def_t {
>  #define CPU_FEATURE_GL           (1 << 13)
>  #define CPU_FEATURE_TA0_SHUTDOWN (1 << 14) /* Shutdown on "ta 0x0" */
>  #define CPU_FEATURE_ASR17        (1 << 15)
> +#define CPU_FEATURE_CACHE_CTRL   (1 << 16)
> +
>  #ifndef TARGET_SPARC64
>  #define CPU_DEFAULT_FEATURES (CPU_FEATURE_FLOAT | CPU_FEATURE_SWAP |  \
>                               CPU_FEATURE_MUL | CPU_FEATURE_DIV |     \
> @@ -476,12 +478,14 @@ void cpu_put_cwp64(CPUState *env1, int cwp);
>  int cpu_cwp_inc(CPUState *env1, int cwp);
>  int cpu_cwp_dec(CPUState *env1, int cwp);
>  void cpu_set_cwp(CPUState *env1, int new_cwp);
> -
> -void leon3_cache_control_int(void);
> +void leon3_irq_manager(void *irq_manager, int intno);
>
>  /* sun4m.c, sun4u.c */
>  void cpu_check_irqs(CPUSPARCState *env);
>
> +/* leon3.c */
> +void leon3_irq_ack(void *irq_manager, int intno);
> +
>  #if defined (TARGET_SPARC64)
>
>  static inline int compare_masked(uint64_t x, uint64_t y, uint64_t mask)
> diff --git a/target-sparc/helper.c b/target-sparc/helper.c
> index 2f3d1e6..b2d4d70 100644
> --- a/target-sparc/helper.c
> +++ b/target-sparc/helper.c
> @@ -1289,7 +1289,7 @@ static const sparc_def_t sparc_defs[] = {
>         .mmu_trcr_mask = 0x,
>         .nwindows = 8,
>         .features = CPU_DEFAULT_FEATURES | CPU_FEATURE_TA0_SHUTDOWN |
> -        CPU_FEATURE_ASR17,
> +        CPU_FEATURE_ASR17 | CPU_FEATURE_CACHE_CTRL,
>     },
>  #endif
>  };
> diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
> index d3e1b63..854f168 100644
> --- a/target-sparc/op_helper.c
> +++ b/target-sparc/op_helper.c
> @@ -1653,7 +1653,7 @@ static void dump_asi(const char *txt, target_ulong 
> addr, int asi, int size,
>
>  /* Leon3 cache control */
>
> -void leon3_cache_control_int(void)
> +static void leon3_cache_control_int(void)
>  {
>     uint32_t state = 0;
>
> @@ -1741,11 +1741,17 @@ static uint64_t leon3_cache_control_ld(target_ulong 
> addr, int size)
>         DPRINTF_CACHE_CONTROL("read unknown register %08x\n", addr);
>         break;
>     };
> -    DPRINTF_CACHE_CONTROL("st addr:%08x, ret:%" PRIx64 ", size:%d\n",
> +    DPRINTF_CACHE_CONTROL("ld addr:%08x, ret:0x%" PRIx64 ", size:%d\n",
>                           addr, ret, size);
>     return ret;
>  }
>
> +void leon3_irq_manager(void *irq_manager, int intno)
> +{
> +    leon3_irq_ack(irq_manager, intno);
> +    leon3_cache_control_int();
> +}
> +
>  uint64_t helper_ld_asi(target_ulong addr, int asi, int size, int sign)
>  {
>     uint64_t ret = 0;
> @@ -1760,7 +1766,9 @@ uint64_t helper_ld_asi(target_ulong addr, int asi, int 
> size, int sign)
>         case 0x00:          /* Leon3 Cache Control */
>         case 0x08:          /* Leon3 Instruction Cache config */
>         case 0x0C:          /* Leon3 Date Cache config */
> -            ret = leon3_cache_control_ld(addr, size);
> +            if (env->def->features & CPU_FEATURE_CACHE_CTRL) {
> +                ret = leon3_cache_control_ld(addr, size);
> +            }
>             break;
>         case 0x01c00a00: /* MXCC control register */
>             if (size == 8)
> @@ -1994,7 +2002,9 @@ void helper_st_asi(target_ulong addr, uint64_t val, int 
> asi, int size)
>         case 0x00:          /* Leon3 Cache Control */
>         case 0x08:          /* Leon3 Instruction Cache config */
>         case 0x0C:          /* Leon3 Date Cache config */
> -            leon3_c

[Qemu-devel] Re: [PATCH 15/19] savevm: introduce qemu_savevm_trans_{begin, commit}.

2011-02-01 Thread Yoshiaki Tamura

Paolo,

I refactored the savevm functions.  Could you give me your
comments?

Thanks,

Yoshi

diff --git a/savevm.c b/savevm.c
index 5418280..90aae55 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1602,29 +1602,68 @@ bool qemu_savevm_state_blocked(Monitor *mon)
 return false;
 }

-int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
-int shared)
+/*
+ * section: header to write
+ * inc: if true, forces to pass SECTION_PART instead of SECTION_START
+ * pause: if true, breaks the loop when live handler returned 0
+ */
+static int qemu_savevm_state_live(Monitor *mon, QEMUFile *f, int section,
+  bool inc, bool pause)
 {
 SaveStateEntry *se;
+int skip = 0, ret;

 QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-if(se->set_params == NULL) {
+int len, stage;
+
+if (se->save_live_state == NULL) {
 continue;
-   }
-   se->set_params(blk_enable, shared, se->opaque);
+}
+
+/* Section type */
+qemu_put_byte(f, section);
+qemu_put_be32(f, se->section_id);
+
+if (section == QEMU_VM_SECTION_START) {
+/* ID string */
+len = strlen(se->idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+qemu_put_be32(f, se->instance_id);
+qemu_put_be32(f, se->version_id);
+
+stage = inc ? QEMU_VM_SECTION_PART : QEMU_VM_SECTION_START;
+} else {
+assert(inc);
+stage = section;
+}
+
+ret = se->save_live_state(mon, f, stage, se->opaque);
+if (!ret) {
+skip++;
+if (pause) {
+break;
+}
+}
 }
-
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+return skip;
+}
+
+static void qemu_savevm_state_full(QEMUFile *f)
+{
+SaveStateEntry *se;

 QTAILQ_FOREACH(se, &savevm_handlers, entry) {
 int len;

-if (se->save_live_state == NULL)
+if (se->save_state == NULL && se->vmsd == NULL) {
 continue;
+}

 /* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_START);
+qemu_put_byte(f, QEMU_VM_SECTION_FULL);
 qemu_put_be32(f, se->section_id);

 /* ID string */
@@ -1635,8 +1674,28 @@ int qemu_savevm_state_begin(Monitor *mon,
QEMUFile *f, int blk_enable,
 qemu_put_be32(f, se->instance_id);
 qemu_put_be32(f, se->version_id);

-se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+vmstate_save(f, se);
+}
+
+qemu_put_byte(f, QEMU_VM_EOF);
+}
+
+int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
+int shared)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+if(se->set_params == NULL) {
+continue;
+}
+se->set_params(blk_enable, shared, se->opaque);
 }
+
+qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_START, 0, 0);

 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
@@ -1648,29 +1707,16 @@ int qemu_savevm_state_begin(Monitor *mon,
QEMUFile *f, int blk_enable,

 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 {
-SaveStateEntry *se;
 int ret = 1;

-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-if (se->save_live_state == NULL)
-continue;
-
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_PART);
-qemu_put_be32(f, se->section_id);
-
-ret = se->save_live_state(mon, f, QEMU_VM_SECTION_PART, se->opaque);
-if (!ret) {
-/* Do not proceed to the next vmstate before this one reported
-   completion of the current stage. This serializes the migration
-   and reduces the probability that a faster changing state is
-   synchronized over and over again. */
-break;
-}
-}
-
-if (ret)
+/* Do not proceed to the next vmstate before this one reported
+   completion of the current stage. This serializes the migration
+   and reduces the probability that a faster changing state is
+   synchronized over and over again. */
+ret = qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_PART, 1, 1);
+if (!ret) {
 return 1;
+}

 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
@@ -1682,46 +1728,40 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)

 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 {
-SaveStateEntry *se;
-
 cpu_synchronize_all_states();

-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-if (se->save_live_state == NULL)
-continue;
-
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_END

[Qemu-devel] Re: [PATCH] linux-user: avoid gcc array overrun warning for sparc

2011-02-01 Thread Blue Swirl

Thanks, applied.

On Tue, Feb 1, 2011 at 3:54 PM, Peter Maydell  wrote:
> Suppress a gcc array bounds overrun warning when filling in the SPARC
> signal frame by adjusting our definition of the structure so that the
> fp and callers_pc membes are part of the ins[] array rather than
> separate fields; since qemu has no need to access the fields individually
> there is no need to follow the kernel's structure field naming exactly.
>
> Signed-off-by: Peter Maydell 
> ---
> This is a fix for another warning that the armel gcc gives:
> linux-user/signal.c:1979: error: array subscript is above array bounds
> so if it passes review I think it's a good candidate for putting in 0.14.
>
>  linux-user/signal.c |    7 ---
>  1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 0664770..b01bd64 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -1817,9 +1817,10 @@ struct target_sigcontext {
>  /* A Sparc stack frame */
>  struct sparc_stackf {
>         abi_ulong locals[8];
> -        abi_ulong ins[6];
> -        struct sparc_stackf *fp;
> -        abi_ulong callers_pc;
> +        abi_ulong ins[8];
> +        /* It's simpler to treat fp and callers_pc as elements of ins[]
> +         * since we never need to access them ourselves.
> +         */
>         char *structptr;
>         abi_ulong xargs[6];
>         abi_ulong xxargs[1];
> --
> 1.7.1
>
>

Re: [Qemu-devel] [PATCH 1/4] Add config-devices.h again

2011-02-01 Thread Stefan Weil


Am 01.02.2011 17:53, schrieb Eduardo Habkost:

This reverts part of commit a992fe3d0fc185112677286f7a02204d8245b61e.

We do have code that needs #ifdefs depending on the list of enabled devices,
but currently that code breaks when we try to disable a feature that is enabled
by default.

For example, if we try to disable CONFIG_VMWARE_VGA, we get the following:

LINK  x86_64-softmmu/qemu-system-x86_64
   pc.o: In function `pc_vga_init':
   /home/ehabkost/pessoal/proj/virt/qemu/qemu/hw/pc.c:991: undefined reference 
to `pci_vmsvga_init'
   collect2: ld returned 1 exit status
   make[1]: *** [qemu-system-x86_64] Error 1
   rm config-devices.h-timestamp
   make: *** [subdir-x86_64-softmmu] Error 2

config-devices.h will allow us to add an #ifdef to fix the above error, and
other similar cases.

Signed-off-by: Eduardo Habkost
---
  Makefile|7 +--
  Makefile.target |2 +-
  config.h|   11 +++
  3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 4e120a2..22b53f6 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
  # Makefile for QEMU.

-GENERATED_HEADERS = config-host.h trace.h qemu-options.def
+GENERATED_HEADERS = config-host.h trace.h qemu-options.def config-all-devices.h
  ifeq ($(TRACE_BACKEND),dtrace)
  GENERATED_HEADERS += trace-dtrace.h
  endif
@@ -77,6 +77,9 @@ config-host.h-timestamp: config-host.mak
  qemu-options.def: $(SRC_PATH)/qemu-options.hx
$(call quiet-command,sh $(SRC_PATH)/hxtool -h<  $<  >  $@,"  GEN   $@")

+config-all-devices.h: config-all-devices.h-timestamp
+config-all-devices.h-timestamp: config-all-devices.mak
+
  SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_DIRS))

  subdir-%: $(GENERATED_HEADERS)
@@ -190,7 +193,7 @@ clean:

  distclean: clean
rm -f config-host.mak config-host.h* config-host.ld $(DOCS) 
qemu-options.texi qemu-img-cmds.texi qemu-monitor.texi
-   rm -f config-all-devices.mak
+   rm -f config-all-devices.mak config-all-devices.h*
rm -f roms/seabios/config.mak roms/vgabios/config.mak
rm -f qemu-doc.info qemu-doc.aux qemu-doc.cp qemu-doc.dvi qemu-doc.fn 
qemu-doc.info qemu-doc.ky qemu-doc.log qemu-doc.pdf qemu-doc.pg qemu-doc.toc 
qemu-doc.tp qemu-doc.vr
rm -f qemu-tech.info qemu-tech.aux qemu-tech.cp qemu-tech.dvi 
qemu-tech.fn qemu-tech.info qemu-tech.ky qemu-tech.log qemu-tech.pdf 
qemu-tech.pg qemu-tech.toc qemu-tech.tp qemu-tech.vr
diff --git a/Makefile.target b/Makefile.target
index 2800f47..03fc486 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -1,6 +1,6 @@
  # -*- Mode: makefile -*-

-GENERATED_HEADERS = config-target.h
+GENERATED_HEADERS = config-target.h config-devices.h
  CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)

  include ../config-host.mak
diff --git a/config.h b/config.h
index e20f786..07d79d4 100644
--- a/config.h
+++ b/config.h
@@ -1,2 +1,13 @@
+
  #include "config-host.h"
  #include "config-target.h"
+
+/* We want to include different config files for specific targets
+   And for the common library.  They need a different name because
+   we don't want to rely in paths */
   


on paths?


+
+#if defined(NEED_CPU_H)
+#include "config-devices.h"
+#else
+#include "config-all-devices.h"
+#endif

Re: [Qemu-devel] [PATCH 0/4] new Blackfin QEMU port

2011-02-01 Thread Mike Frysinger

On Tue, Feb 1, 2011 at 12:30, Peter Maydell wrote:
> On 1 February 2011 17:20, Mike Frysinger wrote:
>> On Tue, Feb 1, 2011 at 05:31, Peter Maydell wrote:
>>> I suspect that this check of pc against the lbreg[]
>>> values should be being done in the generated code,
>>> not at translate time.
>>
>> the way i'm doing it atm i believe is safe.  if a lbreg changes, then
>> i invalidate any TBs associated with the old value and any TBs
>> associated with the new value.  thus i force the code to be
>> retranslated, and i can assume the lbreg values are constant when
>> doing so.
>
> That's OK too, that would fall into my category (3).

so the TB invalidation checking can be taken care of implicitly if i
handled things in cpu_get_tb_cpu_state() ?  that would be nice.

but i guess i'm not seeing how i would handle this scenario ... i want
to attach to each TB the state of the two 32bit lbregs when that TB
was created.  then in this state func, make sure the current lbregs
have the same values.  but if i need to encode this information into
"flags", i dont think i have enough space.
-mike

[Qemu-devel] [PATCH 7/7] ahci: work around bug with level interrupts

2011-02-01 Thread Alexander Graf

When using level based interrupts, the interrupt is treated the same as an
edge triggered one: leaving the line up does not retrigger the interrupt.

In fact, when not lowering the line, we won't ever get a new interrupt inside
the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
something on the device. This way we're sure the guest doesn't starve on
interrupts until someone fixes the actual interrupt path.

Signed-off-by: Alexander Graf 

---

v2 -> v3:

  - add comment about the interrupt hack
---
 hw/ide/ahci.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 98bdf70..95e1cf7 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -152,11 +152,15 @@ static void ahci_check_irq(AHCIState *s)
 }
 }
 
+/* XXX We lower the interrupt line here because of a bug with interrupt
+   handling in Qemu. When leaving a line up, the interrupt does
+   not get retriggered automatically currently. Once that bug is fixed,
+   this workaround is not necessary anymore and we only need to lower
+   in the else branch. */
+ahci_irq_lower(s, NULL);
 if (s->control_regs.irqstatus &&
 (s->control_regs.ghc & HOST_CTL_IRQ_EN)) {
 ahci_irq_raise(s, NULL);
-} else {
-ahci_irq_lower(s, NULL);
 }
 }
 
-- 
1.6.0.2

Re: [Qemu-devel] [PATCH 1/3] use nanoseconds everywhere for timeout computation

2011-02-01 Thread Aurelien Jarno

On Mon, Jan 31, 2011 at 10:51:17PM +0100, Paolo Bonzini wrote:
> Suggested by Aurelien Jarno.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  qemu-timer.c |   30 +++---
>  1 files changed, 15 insertions(+), 15 deletions(-)

Reviewed-by: Aurelien Jarno 

> diff --git a/qemu-timer.c b/qemu-timer.c
> index db1ec49..60283a8 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -197,8 +197,8 @@ static void qemu_rearm_alarm_timer(struct 
> qemu_alarm_timer *t)
>  t->rearm(t);
>  }
>  
> -/* TODO: MIN_TIMER_REARM_US should be optimized */
> -#define MIN_TIMER_REARM_US 250
> +/* TODO: MIN_TIMER_REARM_NS should be optimized */
> +#define MIN_TIMER_REARM_NS 25
>  
>  #ifdef _WIN32
>  
> @@ -698,11 +698,11 @@ int64_t qemu_next_deadline(void)
>  
>  if (active_timers[QEMU_CLOCK_VIRTUAL]) {
>  delta = active_timers[QEMU_CLOCK_VIRTUAL]->expire_time -
> - qemu_get_clock(vm_clock);
> + qemu_get_clock_ns(vm_clock);
>  }
>  if (active_timers[QEMU_CLOCK_HOST]) {
>  int64_t hdelta = active_timers[QEMU_CLOCK_HOST]->expire_time -
> - qemu_get_clock(host_clock);
> + qemu_get_clock_ns(host_clock);
>  if (hdelta < delta)
>  delta = hdelta;
>  }
> @@ -727,17 +727,17 @@ static uint64_t qemu_next_deadline_dyntick(void)
>  if (use_icount)
>  delta = INT32_MAX;
>  else
> -delta = (qemu_next_deadline() + 999) / 1000;
> +delta = qemu_next_deadline();
>  
>  if (active_timers[QEMU_CLOCK_REALTIME]) {
>  rtdelta = (active_timers[QEMU_CLOCK_REALTIME]->expire_time -
> - qemu_get_clock(rt_clock))*1000;
> + qemu_get_clock_ns(rt_clock));
>  if (rtdelta < delta)
>  delta = rtdelta;
>  }
>  
> -if (delta < MIN_TIMER_REARM_US)
> -delta = MIN_TIMER_REARM_US;
> +if (delta < MIN_TIMER_REARM_NS)
> +delta = MIN_TIMER_REARM_NS;
>  
>  return delta;
>  }
> @@ -887,8 +887,8 @@ static void dynticks_rearm_timer(struct qemu_alarm_timer 
> *t)
>  {
>  timer_t host_timer = (timer_t)(long)t->priv;
>  struct itimerspec timeout;
> -int64_t nearest_delta_us = INT64_MAX;
> -int64_t current_us;
> +int64_t nearest_delta_ns = INT64_MAX;
> +int64_t current_ns;
>  
>  assert(alarm_has_dynticks(t));
>  if (!active_timers[QEMU_CLOCK_REALTIME] &&
> @@ -896,7 +896,7 @@ static void dynticks_rearm_timer(struct qemu_alarm_timer 
> *t)
>  !active_timers[QEMU_CLOCK_HOST])
>  return;
>  
> -nearest_delta_us = qemu_next_deadline_dyntick();
> +nearest_delta_ns = qemu_next_deadline_dyntick();
>  
>  /* check whether a timer is already running */
>  if (timer_gettime(host_timer, &timeout)) {
> @@ -904,14 +904,14 @@ static void dynticks_rearm_timer(struct 
> qemu_alarm_timer *t)
>  fprintf(stderr, "Internal timer error: aborting\n");
>  exit(1);
>  }
> -current_us = timeout.it_value.tv_sec * 100 + 
> timeout.it_value.tv_nsec/1000;
> -if (current_us && current_us <= nearest_delta_us)
> +current_ns = timeout.it_value.tv_sec * 10LL + 
> timeout.it_value.tv_nsec;
> +if (current_ns && current_ns <= nearest_delta_ns)
>  return;
>  
>  timeout.it_interval.tv_sec = 0;
>  timeout.it_interval.tv_nsec = 0; /* 0 for one-shot timer */
> -timeout.it_value.tv_sec =  nearest_delta_us / 100;
> -timeout.it_value.tv_nsec = (nearest_delta_us % 100) * 1000;
> +timeout.it_value.tv_sec =  nearest_delta_ns / 10;
> +timeout.it_value.tv_nsec = nearest_delta_ns % 10;
>  if (timer_settime(host_timer, 0 /* RELATIVE */, &timeout, NULL)) {
>  perror("settime");
>  fprintf(stderr, "Internal timer error: aborting\n");
> -- 
> 1.7.3.4
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 0/4] new Blackfin QEMU port

2011-02-01 Thread Peter Maydell

On 1 February 2011 18:16, Mike Frysinger  wrote:
> On Tue, Feb 1, 2011 at 12:30, Peter Maydell wrote:
>> That's OK too, that would fall into my category (3).
>
> so the TB invalidation checking can be taken care of implicitly if i
> handled things in cpu_get_tb_cpu_state() ?  that would be nice.

It doesn't invalidate the TB, it just means that qemu can have
two different TBs for the same address and distinguish between
them. So in cpu_get_tb_cpu_state() you encode the relevant bits
of env into the flags, and then in translate.c, instead of looking
at env members, you unpack tb->flags (usually into a disas context
struct) and look at the results of your unpacking.

> but i guess i'm not seeing how i would handle this scenario ... i want
> to attach to each TB the state of the two 32bit lbregs when that TB
> was created.  then in this state func, make sure the current lbregs
> have the same values.  but if i need to encode this information into
> "flags", i dont think i have enough space.

That's right -- you only have a total of 64 bits (you can use flags
and also cs_base since you're not a PC with a weird addressing
setup), so you have to be stingy about what you put in there.
Also if you're likely to repeatedly execute the same bit of code
with different values of lbregs you probably don't want to put them
in tb_flags anyway, because you'd end up repeatedly translating
the code and holding lots of nearly-identical TBs for it.

-- PMM

Re: [Qemu-devel] [PATCH 1/3] use nanoseconds everywhere for timeout computation

2011-02-01 Thread Aurelien Jarno

On Mon, Jan 31, 2011 at 04:17:52PM -0600, Anthony Liguori wrote:
> On 01/31/2011 03:51 PM, Paolo Bonzini wrote:
>> Suggested by Aurelien Jarno.
>>
>> Signed-off-by: Paolo Bonzini
>>
>
> Something I've found is that we have a lot of bugs that are the result  
> of unit conversions when the unit can't be mapped directly to base 10.
>
> This happens in both the PIT and RTC and probably every other fixed bus  
> speed based clock we support.
>
> I think it would be better to have a Unit argument to qemu_get_clock to  
> specify the desired units.  That way, we could request NS as the time  
> unit or something like PC_FREQ0 which would map to the bus speed of the 
> PIT.

It's more or less what is already done with the two versions of the
functions, qemu_get_clock() which can return different unit depending on
what you ask (and in my opinion should simply disappear because it's the
best way to have bugs), and qemu_get_clock_ns() that always return it in
nanoseconds. What you suggests is a generalization of this second
function to different units than ns.

That's an option, but we should pay attention that given we work in
integer, a conversion decrease the precision, so it should usually be 
done at the last moment to keep the precision correct (assuming ns is
precise enough, which I guess is true).

In any case I think this patch series should be applied as it fixes a
real bug. It's already a step in the right direction, as it removes
useless conversions, as all the involved functions use ns in fine.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH 1/4] Add config-devices.h again

2011-02-01 Thread Eduardo Habkost

This reverts part of commit a992fe3d0fc185112677286f7a02204d8245b61e.

We do have code that needs #ifdefs depending on the list of enabled devices,
but currently that code breaks when we try to disable a feature that is enabled
by default.

For example, if we try to disable CONFIG_VMWARE_VGA, we get the following:

   LINK  x86_64-softmmu/qemu-system-x86_64
  pc.o: In function `pc_vga_init':
  /home/ehabkost/pessoal/proj/virt/qemu/qemu/hw/pc.c:991: undefined reference 
to `pci_vmsvga_init'
  collect2: ld returned 1 exit status
  make[1]: *** [qemu-system-x86_64] Error 1
  rm config-devices.h-timestamp
  make: *** [subdir-x86_64-softmmu] Error 2

config-devices.h will allow us to add an #ifdef to fix the above error, and
other similar cases.

Signed-off-by: Eduardo Habkost 
---
 Makefile|7 +--
 Makefile.target |2 +-
 config.h|   11 +++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 4e120a2..22b53f6 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 # Makefile for QEMU.
 
-GENERATED_HEADERS = config-host.h trace.h qemu-options.def
+GENERATED_HEADERS = config-host.h trace.h qemu-options.def config-all-devices.h
 ifeq ($(TRACE_BACKEND),dtrace)
 GENERATED_HEADERS += trace-dtrace.h
 endif
@@ -77,6 +77,9 @@ config-host.h-timestamp: config-host.mak
 qemu-options.def: $(SRC_PATH)/qemu-options.hx
$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $@")
 
+config-all-devices.h: config-all-devices.h-timestamp
+config-all-devices.h-timestamp: config-all-devices.mak
+
 SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_DIRS))
 
 subdir-%: $(GENERATED_HEADERS)
@@ -190,7 +193,7 @@ clean:
 
 distclean: clean
rm -f config-host.mak config-host.h* config-host.ld $(DOCS) 
qemu-options.texi qemu-img-cmds.texi qemu-monitor.texi
-   rm -f config-all-devices.mak
+   rm -f config-all-devices.mak config-all-devices.h*
rm -f roms/seabios/config.mak roms/vgabios/config.mak
rm -f qemu-doc.info qemu-doc.aux qemu-doc.cp qemu-doc.dvi qemu-doc.fn 
qemu-doc.info qemu-doc.ky qemu-doc.log qemu-doc.pdf qemu-doc.pg qemu-doc.toc 
qemu-doc.tp qemu-doc.vr
rm -f qemu-tech.info qemu-tech.aux qemu-tech.cp qemu-tech.dvi 
qemu-tech.fn qemu-tech.info qemu-tech.ky qemu-tech.log qemu-tech.pdf 
qemu-tech.pg qemu-tech.toc qemu-tech.tp qemu-tech.vr
diff --git a/Makefile.target b/Makefile.target
index 2800f47..03fc486 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -1,6 +1,6 @@
 # -*- Mode: makefile -*-
 
-GENERATED_HEADERS = config-target.h
+GENERATED_HEADERS = config-target.h config-devices.h
 CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
 
 include ../config-host.mak
diff --git a/config.h b/config.h
index e20f786..07d79d4 100644
--- a/config.h
+++ b/config.h
@@ -1,2 +1,13 @@
+
 #include "config-host.h"
 #include "config-target.h"
+
+/* We want to include different config files for specific targets
+   And for the common library.  They need a different name because
+   we don't want to rely in paths */
+
+#if defined(NEED_CPU_H)
+#include "config-devices.h"
+#else
+#include "config-all-devices.h"
+#endif
-- 
1.7.3.2

Re: [Qemu-devel] [PATCH 0/4] fix/add CONFIG_* options for VMWare device emulation

2011-02-01 Thread Blue Swirl

On Tue, Feb 1, 2011 at 4:53 PM, Eduardo Habkost  wrote:
> Hi,
>
> This series makes CONFIG_VMWARE_VGA actually work (today we can't disable the
> option without getting a build error).
>
> It also add two new options: CONFIG_VMMOUSE and CONFIG_VMPORT, for vmmouse.o
> and vmport.o.

Nack, see the list archives for the discussion.

One way to solve this which would preserve the device model would be
to add stub devices. For example, hw/vmmouse-stub.c would be:
void *vmmouse_init(void *m)
{
return NULL;
}

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

2011-02-01 Thread Michael Roth


On 02/01/2011 08:41 AM, Stefan Hajnoczi wrote:

On Tue, Feb 1, 2011 at 2:36 PM, Jes Sorensen  wrote:

On 02/01/11 15:34, Stefan Hajnoczi wrote:

On Tue, Feb 1, 2011 at 2:26 PM, Jes Sorensen  wrote:

I have to admit you lost me here, where do you get that 500ms time from?
Is that the XMLRPC polling time or? I just used the example code from
other agent calls.


500 ms is made up.  I was thinking, "what would a reasonable polling
interval be?" and picked a sub-second number.

Can you explain how the timeout in fsfreeze can happen?  It's probably
because I don't know the virtagent details.


Ah ok.

 From what I understand, the XMLRPC code is setup to timeout if the guest
doesn't reply within a certain amount of time. In that case, the caller
needs to poll to wait for the guest to complete the freeze. This really
should only happen if you have a guest with a large number of very large
file systems. I don't know how likely it is to happen in real life.


Perhaps Michael can confirm that the freeze function continues to
execute after timeout but the client is able to send fsstatus()
requests?


Ahh, yeah there's the confusion: we only execute one RPC at a time, so a 
polling function for a previous RPC won't work unless that RPC is being 
done concurrently, via fork()ing or something and communicating status 
via some method of IPC.


I touched on possible approaches to dealing with this in the response I 
just sent to this patch.




Stefan

[Qemu-devel] [PATCH] make tsc stable over migration and machine start

2011-02-01 Thread Glauber Costa

If the machine is stopped, we should not record two different tsc values
upon a save operation. The same problem happens with kvmclock.

But kvmclock is taking a different diretion, being now seen as a separate
device. Since this is unlikely to happen with the tsc, I am taking the
approach here of simply registering a handler for state change, and
using a per-CPUState variable that prevents double updates for the TSC.

Signed-off-by: Glauber Costa 
---
 target-i386/cpu.h |1 +
 target-i386/kvm.c |   19 ++-
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 6d619e8..7f1c4f8 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -732,6 +732,7 @@ typedef struct CPUX86State {
 uint32_t sipi_vector;
 uint32_t cpuid_kvm_features;
 uint32_t cpuid_svm_features;
+uint8_t  update_tsc;
 
 /* in order to simplify APIC support, we leave this pointer to the
user */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ecb8405..c3925be 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -302,6 +302,16 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t 
status,
 
 static int _kvm_arch_init_vcpu(CPUState *env);
 
+static void cpu_update_state(void *opaque, int running, int reason)
+{
+CPUState *env = opaque;
+
+if (!running) {
+env->update_tsc = 1;
+}
+}
+
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
 int r;
@@ -444,6 +454,8 @@ int kvm_arch_init_vcpu(CPUState *env)
 }
 #endif
 
+qemu_add_vm_change_state_handler(cpu_update_state, env);
+
 return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
@@ -1093,7 +1105,12 @@ static int kvm_get_msrs(CPUState *env)
msrs[n++].index = MSR_STAR;
 if (kvm_has_msr_hsave_pa(env))
 msrs[n++].index = MSR_VM_HSAVE_PA;
-msrs[n++].index = MSR_IA32_TSC;
+
+if (env->update_tsc) {
+msrs[n++].index = MSR_IA32_TSC;
+env->update_tsc = 0;
+}
+
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
 msrs[n++].index = MSR_CSTAR;
-- 
1.7.2.3

[Qemu-devel] Re: KVM call minutes for Feb 1

2011-02-01 Thread Jan Kiszka

On 2011-02-01 18:20, Anthony Liguori wrote:
> On 02/01/2011 11:03 AM, Jan Kiszka wrote:
>> On 2011-02-01 17:53, Anthony Liguori wrote:
>>
>>> On 02/01/2011 10:36 AM, Jan Kiszka wrote:
>>>  
 On 2011-02-01 16:54, Chris Wright wrote:


> KVM upstream merge: status, plans, coordination
> - Jan has a git tree, consolidating
> - qemu-kvm io threading is still an issue
> - Anthony wants to just merge
> - concerns with non-x86 arch and merge
> - concerns with big-bang patch merge and following stability
> - post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
> a problem if it's not there by then
> - testing and nuances are still an issue (e.g. stefan berger's mmio read 
> issue)
> - qemu-kvm still evolving, needs to get sync'd or it will keep diverging
> - 2 implementations of main init, cpu init, Jan has merged them into one
> - qemu-kvm-x86.c file that's only a few hundred lines
> - review as one patch to see the fundamental difference
>
>  
 More precisely, my current work flow is to pick some function(s), e.g.
 kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
 upstream so that qemu-kvm could use that implementation?". If they
 differ, the reasons need to be understood and patched away, either by
 fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
 changes are merged back, a qemu-kvm patch is posted to switch to that
 version.

 Any help will be welcome, either via review of my subtle regressions or
 on resolving concrete differences.

 E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
 of its own iothread code, can we wrap that away or do we need to
 consolidate the threading code first? Or do we need to fix something in
 upstream?


>>> I bet it's the eventfd thing.  It's arbitrary.  If you've got a small
>>> diff post your series, I'd be happy to take a look at it and see what I
>>> can explain.
>>>
>>>  
>> Looks like it's around signalfd and its emulation:
>>
> 
> I really meant the compatfd thing.
> 
> signalfd can't really be emulated properly so in upstream we switched to 
> a pipe() which Avi didn't like.
> 
> But with glib, this all goes away anyway so we should just drop the 
> qemu-kvm changes and use the upstream version.  Once we enable I/O 
> thread in qemu.git, we no longer need to use signals for I/O completion 
> which I think everyone would agree is a better solution.

Don't understand: If we do not need SIGIO for AIO emulation in threaded
mode, why wasn't that stubbed out already? If that helps reducing
worries about the signalfd emulation (which is likely a non-issue anyway
as anyone with serious workload should run a kernel with such support).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: KVM call minutes for Feb 1

2011-02-01 Thread Anthony Liguori


On 02/01/2011 10:36 AM, Jan Kiszka wrote:

On 2011-02-01 16:54, Chris Wright wrote:
   

KVM upstream merge: status, plans, coordination
- Jan has a git tree, consolidating
- qemu-kvm io threading is still an issue
- Anthony wants to just merge
   - concerns with non-x86 arch and merge
   - concerns with big-bang patch merge and following stability
- post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
   a problem if it's not there by then
- testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
- qemu-kvm still evolving, needs to get sync'd or it will keep diverging
- 2 implementations of main init, cpu init, Jan has merged them into one
   - qemu-kvm-x86.c file that's only a few hundred lines
- review as one patch to see the fundamental difference
 

More precisely, my current work flow is to pick some function(s), e.g.
kvm_cpu_exec/kvm_run, and start wondering "What needs to be done to
upstream so that qemu-kvm could use that implementation?". If they
differ, the reasons need to be understood and patched away, either by
fixing/enhancing upstream or simplifying qemu-kvm. Once the upstream
changes are merged back, a qemu-kvm patch is posted to switch to that
version.

Any help will be welcome, either via review of my subtle regressions or
on resolving concrete differences.

E.g. posix-aio-compat.c: Why does qemu-kvm differ here? If it's because
of its own iothread code, can we wrap that away or do we need to
consolidate the threading code first? Or do we need to fix something in
upstream?
   


I bet it's the eventfd thing.  It's arbitrary.  If you've got a small 
diff post your series, I'd be happy to take a look at it and see what I 
can explain.


Regards,

Anthony Liguori


Jan

1 2 >

1 - 100 of 148 matches

Mail list logo