On Fri, Apr 27, 2007 at 06:53:40PM +0200, Arnd Bergmann wrote: > On Friday 27 April 2007, Carsten Otte wrote: > > Add interface which allows a process to start a virtual machine. > > Looks pretty good to me already. > > It seems a lot closer to lguest than to kvm at the moment concerning > the kernel interfaces (guest real address == host process, state > is attached to the thread_info, ...).
Guess I should take a look at lguest then :) > > +static DEFINE_MUTEX(s390host_init_mutex); > > + > > +static void s390host_get_data(struct s390host_data *data) > > +{ > > + atomic_inc(&data->count); > > +} > > + > > +void s390host_put_data(struct s390host_data *data) > > +{ > > + int cpu; > > + > > + if (atomic_dec_return(&data->count)) > > + return; > > These should probably use a struct kref instead of an atomic_t > to count the references. That makes sense. > > > +static unsigned long > > +s390host_create_io_area(unsigned long addr, unsigned long flags, > > + unsigned long io_addr, struct s390host_data *data) > > +{ [...] > > + down_write(&mm->mmap_sem); > > + ret = insert_vm_struct(mm, vma); > > + if (ret) { > > + kmem_cache_free(vm_area_cachep, vma); > > + goto out; > > + } > > + s390host_get_data(data); > > + mm->total_vm += 2; > > + vm_insert_page(vma, addr, virt_to_page(io_addr)); > > + > > + ret = split_vma(mm, vma, addr + PAGE_SIZE, 0); > > + if (ret) > > + goto out; > > This open-coded mmap looks a bit scary. I can't see anything wrong in it, > but it may conflict with a user application that tries to do other strange > things with memory maps. Well, there is at least already one bug in it. If the call to split_vma fails there is no cleanup of the vm_insert_page that has already happened. > Maybe you can either make this a real call to mmap(), or have it automatically > mapped using the vdso mechanism. So, yes we need to find something better, since this is the code that I don't like at all. > > + __u16 cpu; > > __u16 is a type for user space interfaces. In the kernel, use u16. Yes. > > + if (current_thread_info()->sie_cpu != -1) > > + return -EINVAL; > > -EBUSY? Yes. > > + if (copy_from_user(&cpu, &sie_template->icpua, sizeof(u16))) > > + return -EFAULT; > > get_user()? Yes. > > + sca_block = data->sca_block; > > + if (sca_block->mcn & (1UL << (S390HOST_MAX_CPUS - 1 - cpu))) { > > __test_bit()? Yes, but I think we should probably add some function to access the mcn bits. Each access looks a bit complicated. > > + ret = -EINVAL; > > -EBUSY? Yes. > > +int sys_s390host_remove_cpu(void) > > +{ > > + struct sca_block *sca_block; > > + int cpu; > > + > > + cpu = current_thread_info()->sie_cpu; > > + if (cpu == -1) > > + return -EINVAL; > > + > > + mutex_lock(&s390host_init_mutex); > > + sca_block = current_thread_info()->s390host_data->sca_block; > > + sca_block->mcn &= ~(1UL << (S390HOST_MAX_CPUS - 1 - cpu)); > > + current_thread_info()->sie_cpu = -1; > > + mutex_unlock(&s390host_init_mutex); > > + return 0; > > +} > > Shouldn't this free the resources that were allocated by add_cpu? Yes, it was supposed to do that. But since it was only a prototype I was too lazy to code it up and so I decided to free things on exit. > If not, this syscall seems pretty useless and I'd simply not > do it at all -- just wait until exit() to clean up. This reveals that we are not clearing the mcn bit on exit without pior call to this syscall. BUG(). > > + if (signal_pending(current)) { > > + ret = -EINTR; > > + goto out; > > + } > > Can't you handle interrupted system calls? I think it would be nicer > to return -ERESTARTSYS here so you don't have to go back to your > user space after a signal. That should work and it is better. > > +struct sie_kernel { > > + struct sie_block sie_block; > > + s390_fp_regs host_fpregs; > > + int host_acrs[NUM_ACRS]; > > +} __attribute__((packed,aligned(4096))); > > Why packed? Not needed. > > +#define SIE_UPDATE_PSW (1UL << 0) > > +#define SIE_FLUSH_TLB (1UL << 1) > > +#define SIE_ISKE (1UL << 2) > > +#define SIE_SSKE (1UL << 3) > > +#define SIE_BLOCK_UPDATE (1UL << 4) > > +#define SIE_VSMXM_LOCAL_UPDATE (1UL << 5) > > +#define SIE_VSMXM_DIST_UPDATE (1UL << 6) > > I find it much more readable to define these like > > #define SIE_UPDATE_PSW 1 > #define SIE_FLUSH_TLB 2 > #define SIE_ISKE 4 > > especially when trying to interpret values in memory. Hmm... don't know. Both make sense, but I have no preference on this. > > +struct sie_user { > > + struct sie_block sie_block; > > + psw_t psw; > > + unsigned long gprs[NUM_GPRS]; > > + s390_fp_regs guest_fpregs; > > + int guest_acrs[NUM_ACRS]; > > + struct sie_skey_parm iske_parm; > > + struct sie_skey_parm sske_parm; > > + int vsmxm_or_local; > > + int vsmxm_and_local; > > + int vsmxm_or; > > + int vsmxm_and; > > + int vsmxm_cpuid; > > +} __attribute__((packed,aligned(4096))); > > Is this data structure extensible? If it is, you probably need > some sort of versioning information to make sure that user space > doesn't rely on fields that the kernel doesn't know about. I don't think we can put in some versioning information here. If the kernel decides to increase the version then old userspace code would break? We rather need some mechanism so userpace can ask the kernel "do you support feature x?" and dependent on the answer some fields are used or unused. > > +struct sca_entry { > > + atomic_t scn; > > + __u64 reserved; > > + __u64 sda; > > + __u64 reserved2[2]; > > +}__attribute__((packed)); > > Is this a hardware data structure? If not, you shouldn't pack it > because the first word is only 32 bits and consequently all following > ones are unaligned. It's a hardware structure. Guess the unaligned u64s were generated when this structure was extended to support 64bit. But of course we shouldn't put an atomic_t in a hardware structure, since it's implementation might change. > > +/* function definitions */ > > +extern int sie64a(struct sie_kernel *); > > +extern void s390host_put_data(struct s390host_data *); > > These should be inside of '#ifdef __KERNEL__', like all declarations that user > space doesn't need to see. Yes, of course. > > > @@ -52,6 +53,8 @@ struct thread_info { > > unsigned int cpu; /* current CPU */ > > int preempt_count; /* 0 => preemptable, <0 => BUG > > */ > > struct restart_block restart_block; > > + struct s390host_data *s390host_data; /* s390host data */ > > + int sie_cpu; /* sie cpu number */ > > }; > > What happens to these variables on execve? They stay... We probably should do some cleanup here e.g. by adding some code to init_new_context(). Thanks for reviewing! ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel