Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Jan Kiszka
On 2012-06-13 14:33, Alexander Graf wrote:
> On 06/13/2012 01:41 PM, Jan Kiszka wrote:
>> On 2012-06-13 13:27, Christian Borntraeger wrote:
>>> On 13/06/12 12:58, Jan Kiszka wrote:
> Thinking about this a bit more, how about
>
> } else if (!kvm_arch_vmalloc(size,&new_block->host)) {
> 
> }
>
>>> I like that. Of course, we have to have a generic kvm_arch_vmalloc 
>>> implementation
>>> then.
>> Then better go for kvm_vmalloc calling kvm_arch_vmalloc (in the s390 case).
>>
>> However, I do not like the variation of parameters and return value
>> compared to normal *alloc. Better:
>>
>> memory = kvm_vmalloc(size);
>> if (!memory)
>>  memory = qemu_vmalloc(size);
>>
>> But more regular (when looking at the Xen block) is guarding the call
>> with kvm_enabled() and embedding qemu_vmalloc in kvm_vmalloc.
> 
> So basically
> 
> #ifdef CONFIG_TARGET_S390X
>} else if (kvm_enabled()) {
>  memory = kvm_vmalloc();
>} else {
> #endif
> 
> or a generic
> 
> } else if (kvm_enabled()) {
>memory = kvm_vmalloc();
> } else {
> 
> ? Because that one would mean we always duplicate the common 
> qemu_vmalloc case. But then again, that one's only a single call, so 
> maybe it's ok. Meh - I'll let you decide :).

I'm fine with the #ifdef-free version if providing some comment which
arch requires this special path.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Alexander Graf

On 06/13/2012 01:41 PM, Jan Kiszka wrote:

On 2012-06-13 13:27, Christian Borntraeger wrote:

On 13/06/12 12:58, Jan Kiszka wrote:

Thinking about this a bit more, how about

} else if (!kvm_arch_vmalloc(size,&new_block->host)) {

}


I like that. Of course, we have to have a generic kvm_arch_vmalloc 
implementation
then.

Then better go for kvm_vmalloc calling kvm_arch_vmalloc (in the s390 case).

However, I do not like the variation of parameters and return value
compared to normal *alloc. Better:

memory = kvm_vmalloc(size);
if (!memory)
memory = qemu_vmalloc(size);

But more regular (when looking at the Xen block) is guarding the call
with kvm_enabled() and embedding qemu_vmalloc in kvm_vmalloc.


So basically

#ifdef CONFIG_TARGET_S390X
  } else if (kvm_enabled()) {
memory = kvm_vmalloc();
  } else {
#endif

or a generic

} else if (kvm_enabled()) {
  memory = kvm_vmalloc();
} else {

? Because that one would mean we always duplicate the common 
qemu_vmalloc case. But then again, that one's only a single call, so 
maybe it's ok. Meh - I'll let you decide :).



Alex




Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Jan Kiszka
On 2012-06-13 13:27, Christian Borntraeger wrote:
> On 13/06/12 12:58, Jan Kiszka wrote:
>>> Thinking about this a bit more, how about
>>>
>>> } else if (!kvm_arch_vmalloc(size, &new_block->host)) {
>>> 
>>> }
>>>
> 
> I like that. Of course, we have to have a generic kvm_arch_vmalloc 
> implementation
> then.

Then better go for kvm_vmalloc calling kvm_arch_vmalloc (in the s390 case).

However, I do not like the variation of parameters and return value
compared to normal *alloc. Better:

memory = kvm_vmalloc(size);
if (!memory)
memory = qemu_vmalloc(size);

But more regular (when looking at the Xen block) is guarding the call
with kvm_enabled() and embedding qemu_vmalloc in kvm_vmalloc.

> 
>>> Then the arch specific code could do the check and the implementation of 
>>> vmalloc, but only has to return -1 if we don't need it and things still 
>>> fall back to the generic code.
>>
>> But you would have to walk a while to find out that only s390x on (old)
>> KVM actually returns success here and does some allocation.
> 
> 
> It that such a problem? What about adding a comment then, otherwise we just
> use ifdef as a comment, which isnt nice either.

Any kind of comment is definitely a good idea.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Christian Borntraeger
On 13/06/12 12:58, Jan Kiszka wrote:
>> Thinking about this a bit more, how about
>>
>> } else if (!kvm_arch_vmalloc(size, &new_block->host)) {
>> 
>> }
>>

I like that. Of course, we have to have a generic kvm_arch_vmalloc 
implementation
then.

>> Then the arch specific code could do the check and the implementation of 
>> vmalloc, but only has to return -1 if we don't need it and things still 
>> fall back to the generic code.
> 
> But you would have to walk a while to find out that only s390x on (old)
> KVM actually returns success here and does some allocation.


It that such a problem? What about adding a comment then, otherwise we just
use ifdef as a comment, which isnt nice either.

Christian




Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Jan Kiszka
On 2012-06-13 12:54, Alexander Graf wrote:
> On 06/13/2012 12:30 PM, Jan Kiszka wrote:
>> On 2012-06-12 14:12, Alexander Graf wrote:
>>> On 06/12/2012 02:02 PM, Christian Borntraeger wrote:
 On 12/06/12 13:57, Alexander Graf wrote:
> Since it lives in an s390 specific branch, the function name should 
> probably be called s390 specific. If we ever need another architecture to 
> have a kvm specific ram allocator, we can make it generic when that time 
> comes. Until then, let's treat s390 as the oddball it is :).
>
> Apart from that, this approach looks a lot nicer, yes.
 But then I have to have a *s390* function declared in kvm.h and your other 
 comment
 hits me. You got me in a trap here, heh? ;-)
>>> Ah, I see what you mean. I was thinking of having a
>>> target-s390x/kvm_s390x.h or so. Then we could add the function
>>> definition there and have everything nicely contained within
>>> target-s390x only.
>>>
>>> Jan, which approach would you think is cleaner? Make this a generic
>>> kvm_arch callback or introduce a special kvm_s390x.h header which would
>>> then have to be explicitly included in exec.c?
>> Maybe somethings like
>>
>> #ifdef __s390__
>>  else if (kvm_enabled())
>>  new_block->host = kvm_arch_vmalloc(size)
>> #endif
>>
>> ? But I have no definitive opinion yet. I think that
>>
>>   - the changes to generic code should make clear that it's an s390+kvm
>> specialty
>>   - actual work should be done in target-s390/kvm.c (e.g. avoid
>> legacy_s390_alloc)
> 
> Thinking about this a bit more, how about
> 
> } else if (!kvm_arch_vmalloc(size, &new_block->host)) {
> 
> }
> 
> Then the arch specific code could do the check and the implementation of 
> vmalloc, but only has to return -1 if we don't need it and things still 
> fall back to the generic code.

But you would have to walk a while to find out that only s390x on (old)
KVM actually returns success here and does some allocation.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Alexander Graf

On 06/13/2012 12:30 PM, Jan Kiszka wrote:

On 2012-06-12 14:12, Alexander Graf wrote:

On 06/12/2012 02:02 PM, Christian Borntraeger wrote:

On 12/06/12 13:57, Alexander Graf wrote:

Since it lives in an s390 specific branch, the function name should probably be 
called s390 specific. If we ever need another architecture to have a kvm 
specific ram allocator, we can make it generic when that time comes. Until 
then, let's treat s390 as the oddball it is :).

Apart from that, this approach looks a lot nicer, yes.

But then I have to have a *s390* function declared in kvm.h and your other 
comment
hits me. You got me in a trap here, heh? ;-)

Ah, I see what you mean. I was thinking of having a
target-s390x/kvm_s390x.h or so. Then we could add the function
definition there and have everything nicely contained within
target-s390x only.

Jan, which approach would you think is cleaner? Make this a generic
kvm_arch callback or introduce a special kvm_s390x.h header which would
then have to be explicitly included in exec.c?

Maybe somethings like

#ifdef __s390__
 else if (kvm_enabled())
 new_block->host = kvm_arch_vmalloc(size)
#endif

? But I have no definitive opinion yet. I think that

  - the changes to generic code should make clear that it's an s390+kvm
specialty
  - actual work should be done in target-s390/kvm.c (e.g. avoid
legacy_s390_alloc)


Thinking about this a bit more, how about

} else if (!kvm_arch_vmalloc(size, &new_block->host)) {

}

Then the arch specific code could do the check and the implementation of 
vmalloc, but only has to return -1 if we don't need it and things still 
fall back to the generic code.



Alex




Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-13 Thread Jan Kiszka
On 2012-06-12 14:12, Alexander Graf wrote:
> On 06/12/2012 02:02 PM, Christian Borntraeger wrote:
>> On 12/06/12 13:57, Alexander Graf wrote:
>>> Since it lives in an s390 specific branch, the function name should 
>>> probably be called s390 specific. If we ever need another architecture to 
>>> have a kvm specific ram allocator, we can make it generic when that time 
>>> comes. Until then, let's treat s390 as the oddball it is :).
>>>
>>> Apart from that, this approach looks a lot nicer, yes.
>> But then I have to have a *s390* function declared in kvm.h and your other 
>> comment
>> hits me. You got me in a trap here, heh? ;-)
> 
> Ah, I see what you mean. I was thinking of having a 
> target-s390x/kvm_s390x.h or so. Then we could add the function 
> definition there and have everything nicely contained within 
> target-s390x only.
> 
> Jan, which approach would you think is cleaner? Make this a generic 
> kvm_arch callback or introduce a special kvm_s390x.h header which would 
> then have to be explicitly included in exec.c?

Maybe somethings like

#ifdef __s390__
else if (kvm_enabled())
new_block->host = kvm_arch_vmalloc(size)
#endif

? But I have no definitive opinion yet. I think that

 - the changes to generic code should make clear that it's an s390+kvm
   specialty
 - actual work should be done in target-s390/kvm.c (e.g. avoid
   legacy_s390_alloc)

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-12 Thread Alexander Graf

On 06/12/2012 02:02 PM, Christian Borntraeger wrote:

On 12/06/12 13:57, Alexander Graf wrote:

Since it lives in an s390 specific branch, the function name should probably be 
called s390 specific. If we ever need another architecture to have a kvm 
specific ram allocator, we can make it generic when that time comes. Until 
then, let's treat s390 as the oddball it is :).

Apart from that, this approach looks a lot nicer, yes.

But then I have to have a *s390* function declared in kvm.h and your other 
comment
hits me. You got me in a trap here, heh? ;-)


Ah, I see what you mean. I was thinking of having a 
target-s390x/kvm_s390x.h or so. Then we could add the function 
definition there and have everything nicely contained within 
target-s390x only.


Jan, which approach would you think is cleaner? Make this a generic 
kvm_arch callback or introduce a special kvm_s390x.h header which would 
then have to be explicitly included in exec.c?



Alex




Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-12 Thread Christian Borntraeger
On 12/06/12 13:57, Alexander Graf wrote:
> Since it lives in an s390 specific branch, the function name should probably 
> be called s390 specific. If we ever need another architecture to have a kvm 
> specific ram allocator, we can make it generic when that time comes. Until 
> then, let's treat s390 as the oddball it is :).
> 
> Apart from that, this approach looks a lot nicer, yes.

But then I have to have a *s390* function declared in kvm.h and your other 
comment
hits me. You got me in a trap here, heh? ;-)






Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-12 Thread Alexander Graf

On 06/12/2012 01:20 PM, Christian Borntraeger wrote:

Is there any way we can move the above code to target-s390x? Having the
branch below is already invasive enough for generic code, but we really
don't need all the special s390 quirks to live here.


Hmm, we have to have a special hook somehow.
What about this approach?

---

By default qemu will use MAP_PRIVATE for guest pages. This will write
protect pages and thus break on s390 systems that dont support this feature.
Therefore qemu has a hack to always use MAP_SHARED for s390. But MAP_SHARED
has other problems (no dirty pages tracking, a lot more swap overhead etc.)
Newer systems allow the distinction via KVM_CAP_S390_COW. With this feature
qemu can use the standard qemu alloc if available, otherwise it will use
the old s390 hack.

Signed-off-by: Christian Borntraeger
---
  exec.c |   19 ---
  kvm.h  |1 +
  oslib-posix.c  |3 +++
  target-s390x/kvm.c |   34 ++
  4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/exec.c b/exec.c
index a587e7a..9b9b8e1 100644
--- a/exec.c
+++ b/exec.c
@@ -2645,26 +2645,15 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
  exit(1);
  #endif
  } else {
-#if defined(TARGET_S390X)&&  defined(CONFIG_KVM)
-/* S390 KVM requires the topmost vma of the RAM to be smaller than
-   an system defined value, which is at least 256GB. Larger systems
-   have larger values. We put the guest between the end of data
-   segment (system break) and this value. We use 32GB as a base to
-   have enough room for the system break to grow. */
-new_block->host = mmap((void*)0x8, size,
-   PROT_EXEC|PROT_READ|PROT_WRITE,
-   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 
0);
-if (new_block->host == MAP_FAILED) {
-fprintf(stderr, "Allocating RAM failed\n");
-abort();
-}
-#else
  if (xen_enabled()) {
  xen_ram_alloc(new_block->offset, size, mr);
+#if defined(TARGET_S390X)
+} else if (kvm_enabled()) {
+new_block->host = kvm_arch_alloc(size);


Since it lives in an s390 specific branch, the function name should 
probably be called s390 specific. If we ever need another architecture 
to have a kvm specific ram allocator, we can make it generic when that 
time comes. Until then, let's treat s390 as the oddball it is :).


Apart from that, this approach looks a lot nicer, yes.

Alex


+#endif
  } else {
  new_block->host = qemu_vmalloc(size);
  }
-#endif
  qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
  }
  }
diff --git a/kvm.h b/kvm.h
index 9c7b0ea..9d50016 100644
--- a/kvm.h
+++ b/kvm.h
@@ -102,6 +102,7 @@ int kvm_vcpu_ioctl(CPUArchState *env, int type, ...);

  extern const KVMCapabilityInfo kvm_arch_required_capabilities[];

+void *kvm_arch_alloc(ram_addr_t size);
  void kvm_arch_pre_run(CPUArchState *env, struct kvm_run *run);
  void kvm_arch_post_run(CPUArchState *env, struct kvm_run *run);

diff --git a/oslib-posix.c b/oslib-posix.c
index b6a3c7f..93902ac 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -41,6 +41,9 @@ extern int daemon(int, int);
therefore we need special code which handles running on Valgrind. */
  #  define QEMU_VMALLOC_ALIGN (512 * 4096)
  #  define CONFIG_VALGRIND
+#elif defined(__linux__)&&  defined(__s390x__)
+   /* Use 1 MiB (segment size) alignment so gmap can be used by KVM. */
+#  define QEMU_VMALLOC_ALIGN (256 * 4096)
  #else
  #  define QEMU_VMALLOC_ALIGN getpagesize()
  #endif
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 90aad61..ccf5daa 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -135,6 +135,40 @@ int kvm_arch_get_registers(CPUS390XState *env)
  return 0;
  }

+/*
+ * Legacy layout for s390:
+ * Older S390 KVM requires the topmost vma of the RAM to be
+ * smaller than an system defined value, which is at least 256GB.
+ * Larger systems have larger values. We put the guest between
+ * the end of data segment (system break) and this value. We
+ * use 32GB as a base to have enough room for the system break
+ * to grow. We also have to use MAP parameters that avoid
+ * read-only mapping of guest pages.
+ */
+static void *legacy_s390_alloc(ram_addr_t size)
+{
+void *mem;
+
+mem = mmap((void *) 0x8ULL, size,
+   PROT_EXEC|PROT_READ|PROT_WRITE,
+   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+if (mem == MAP_FAILED) {
+fprintf(stderr, "Allocating RAM failed\n");
+abort();
+}
+return mem;
+}
+
+void *kvm_arch_alloc(ram_addr_t size)
+{
+if (kvm_check_extension(kvm_state, KVM_CAP_S390_GMAP)&&
+kvm_check_extension(kvm_state, K

Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-12 Thread Christian Borntraeger
> Is there any way we can move the above code to target-s390x? Having the
> branch below is already invasive enough for generic code, but we really
> don't need all the special s390 quirks to live here.


Hmm, we have to have a special hook somehow.
What about this approach?

---

By default qemu will use MAP_PRIVATE for guest pages. This will write
protect pages and thus break on s390 systems that dont support this feature.
Therefore qemu has a hack to always use MAP_SHARED for s390. But MAP_SHARED
has other problems (no dirty pages tracking, a lot more swap overhead etc.)
Newer systems allow the distinction via KVM_CAP_S390_COW. With this feature
qemu can use the standard qemu alloc if available, otherwise it will use
the old s390 hack.

Signed-off-by: Christian Borntraeger 
---
 exec.c |   19 ---
 kvm.h  |1 +
 oslib-posix.c  |3 +++
 target-s390x/kvm.c |   34 ++
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/exec.c b/exec.c
index a587e7a..9b9b8e1 100644
--- a/exec.c
+++ b/exec.c
@@ -2645,26 +2645,15 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
 exit(1);
 #endif
 } else {
-#if defined(TARGET_S390X) && defined(CONFIG_KVM)
-/* S390 KVM requires the topmost vma of the RAM to be smaller than
-   an system defined value, which is at least 256GB. Larger systems
-   have larger values. We put the guest between the end of data
-   segment (system break) and this value. We use 32GB as a base to
-   have enough room for the system break to grow. */
-new_block->host = mmap((void*)0x8, size,
-   PROT_EXEC|PROT_READ|PROT_WRITE,
-   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 
0);
-if (new_block->host == MAP_FAILED) {
-fprintf(stderr, "Allocating RAM failed\n");
-abort();
-}
-#else
 if (xen_enabled()) {
 xen_ram_alloc(new_block->offset, size, mr);
+#if defined(TARGET_S390X)
+} else if (kvm_enabled()) {
+new_block->host = kvm_arch_alloc(size);
+#endif
 } else {
 new_block->host = qemu_vmalloc(size);
 }
-#endif
 qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
 }
 }
diff --git a/kvm.h b/kvm.h
index 9c7b0ea..9d50016 100644
--- a/kvm.h
+++ b/kvm.h
@@ -102,6 +102,7 @@ int kvm_vcpu_ioctl(CPUArchState *env, int type, ...);
 
 extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
 
+void *kvm_arch_alloc(ram_addr_t size);
 void kvm_arch_pre_run(CPUArchState *env, struct kvm_run *run);
 void kvm_arch_post_run(CPUArchState *env, struct kvm_run *run);
 
diff --git a/oslib-posix.c b/oslib-posix.c
index b6a3c7f..93902ac 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -41,6 +41,9 @@ extern int daemon(int, int);
   therefore we need special code which handles running on Valgrind. */
 #  define QEMU_VMALLOC_ALIGN (512 * 4096)
 #  define CONFIG_VALGRIND
+#elif defined(__linux__) && defined(__s390x__)
+   /* Use 1 MiB (segment size) alignment so gmap can be used by KVM. */
+#  define QEMU_VMALLOC_ALIGN (256 * 4096)
 #else
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 90aad61..ccf5daa 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -135,6 +135,40 @@ int kvm_arch_get_registers(CPUS390XState *env)
 return 0;
 }
 
+/*
+ * Legacy layout for s390:
+ * Older S390 KVM requires the topmost vma of the RAM to be
+ * smaller than an system defined value, which is at least 256GB.
+ * Larger systems have larger values. We put the guest between
+ * the end of data segment (system break) and this value. We
+ * use 32GB as a base to have enough room for the system break
+ * to grow. We also have to use MAP parameters that avoid
+ * read-only mapping of guest pages.
+ */
+static void *legacy_s390_alloc(ram_addr_t size)
+{
+void *mem;
+
+mem = mmap((void *) 0x8ULL, size,
+   PROT_EXEC|PROT_READ|PROT_WRITE,
+   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+if (mem == MAP_FAILED) {
+fprintf(stderr, "Allocating RAM failed\n");
+abort();
+}
+return mem;
+}
+
+void *kvm_arch_alloc(ram_addr_t size)
+{
+if (kvm_check_extension(kvm_state, KVM_CAP_S390_GMAP) &&
+kvm_check_extension(kvm_state, KVM_CAP_S390_COW)) {
+return qemu_vmalloc(size);
+} else {
+return legacy_s390_alloc(size);
+}
+}
+
 int kvm_arch_insert_sw_breakpoint(CPUS390XState *env, struct kvm_sw_breakpoint 
*bp)
 {
 static const uint8_t diag_501[] = {0x83, 0x24, 0x05, 0x01};




Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-12 Thread Alexander Graf
Jens Freimann wrote:
> From: Christian Borntraeger 
>
> By default qemu will use MAP_PRIVATE for guest pages. This will write
> protect pages and thus break on s390 systems that dont support this feature.
> Therefore qemu has a hack to always use MAP_SHARED for s390. But MAP_SHARED
> has other problems (no dirty pages tracking, a lot more swap overhead etc.)
> Newer systems allow the distinction via KVM_CAP_S390_COW. With this feature
> qemu can use the standard qemu alloc if available, otherwise it will use
> the old s390 hack.
>
> Signed-off-by: Christian Borntraeger 
> Signed-off-by: Jens Freimann 
> ---
>  exec.c |   54 
> +---
>  kvm.h  |9 +
>  oslib-posix.c  |3 +++
>  target-s390x/kvm.c |6 ++
>  4 files changed, 57 insertions(+), 15 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index a0494c7..8fec680 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2618,6 +2618,43 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char 
> *name, DeviceState *dev)
>  }
>  }
>  
> +/*
> + * lets make sure that we dont have the old s390x limitations regarding
> + * guest mappings
> + */
> +static int legacy_s390x_mem_layout(void)
> +{
> +#if defined(TARGET_S390X)
> +return kvm_has_legacy_s390x_memlayout();
> +#else
> +return 0;
> +#endif
> +}
> +
> +/*
> + * Legacy layout for s390:
> + * Older S390 KVM requires the topmost vma of the RAM to be
> + * smaller than an system defined value, which is at least 256GB.
> + * Larger systems have larger values. We put the guest between
> + * the end of data segment (system break) and this value. We
> + * use 32GB as a base to have enough room for the system break
> + * to grow. We also have to use MAP parameters that avoid
> + * read-only mapping of guest pages.
> + */
> +static void *legacy_s390_alloc(ram_addr_t size)
> +{
> +void *mem;
> +
> +mem = mmap((void *) 0x8ULL, size,
> +   PROT_EXEC|PROT_READ|PROT_WRITE,
> +   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> +if (mem == MAP_FAILED) {
> +fprintf(stderr, "Allocating RAM failed\n");
> +abort();
> +}
> +return mem;
> +}
> +
>   

Is there any way we can move the above code to target-s390x? Having the
branch below is already invasive enough for generic code, but we really
don't need all the special s390 quirks to live here.

>  ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
> MemoryRegion *mr)
>  {
> @@ -2644,26 +2681,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
> void *host,
>  exit(1);
>  #endif
>  } else {
> -#if defined(TARGET_S390X) && defined(CONFIG_KVM)
> -/* S390 KVM requires the topmost vma of the RAM to be smaller 
> than
> -   an system defined value, which is at least 256GB. Larger 
> systems
> -   have larger values. We put the guest between the end of data
> -   segment (system break) and this value. We use 32GB as a base 
> to
> -   have enough room for the system break to grow. */
> -new_block->host = mmap((void*)0x8, size,
> -   PROT_EXEC|PROT_READ|PROT_WRITE,
> -   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, 
> -1, 0);
> -if (new_block->host == MAP_FAILED) {
> -fprintf(stderr, "Allocating RAM failed\n");
> -abort();
> -}
> -#else
>  if (xen_enabled()) {
>  xen_ram_alloc(new_block->offset, size, mr);
>   
#ifdef TARGET_S390X
> +} else if (legacy_s390x_mem_layout()) {
> +new_block->host = legacy_s390_alloc(size);
>   
#endif

maybe? That way you could move everything to target-s390x.
>  } else {
>  new_block->host = qemu_vmalloc(size);
>  }
> -#endif
>  qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
>  }
>  }
> diff --git a/kvm.h b/kvm.h
> index 9c7b0ea..ca0557e 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -62,6 +62,15 @@ int kvm_has_pit_state2(void);
>  int kvm_has_many_ioeventfds(void);
>  int kvm_has_gsi_routing(void);
>  
> +#ifndef CONFIG_KVM 
> +static inline int kvm_has_legacy_s390x_memlayout(void)
>   

An s390 function in generic kvm.h? No way :)

> +{
> +return 0;
> +}
> +#else
> +int kvm_has_legacy_s390x_memlayout(void);
> +#endif
> +
>  int kvm_allows_irq0_override(void);
>  
>  #ifdef NEED_CPU_H
> diff --git a/oslib-posix.c b/oslib-posix.c
> index b6a3c7f..93902ac 100644
> --- a/oslib-posix.c
> +++ b/oslib-posix.c
> @@ -41,6 +41,9 @@ extern int daemon(int, int);
>therefore we need special code which handles running on Valgrind. */
>  #  define QEMU_VMALLOC_ALIGN (512 * 4096)
>  #  define CONFIG_VALGRIND
> +#elif defined(__linux__) && defined(__s390x__)
> +   /* Use 1 MiB (segment size) alignment so gmap can be

[Qemu-devel] [PATCH 2/8] s390: autodetect map private

2012-06-06 Thread Jens Freimann
From: Christian Borntraeger 

By default qemu will use MAP_PRIVATE for guest pages. This will write
protect pages and thus break on s390 systems that dont support this feature.
Therefore qemu has a hack to always use MAP_SHARED for s390. But MAP_SHARED
has other problems (no dirty pages tracking, a lot more swap overhead etc.)
Newer systems allow the distinction via KVM_CAP_S390_COW. With this feature
qemu can use the standard qemu alloc if available, otherwise it will use
the old s390 hack.

Signed-off-by: Christian Borntraeger 
Signed-off-by: Jens Freimann 
---
 exec.c |   54 +---
 kvm.h  |9 +
 oslib-posix.c  |3 +++
 target-s390x/kvm.c |6 ++
 4 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/exec.c b/exec.c
index a0494c7..8fec680 100644
--- a/exec.c
+++ b/exec.c
@@ -2618,6 +2618,43 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char 
*name, DeviceState *dev)
 }
 }
 
+/*
+ * lets make sure that we dont have the old s390x limitations regarding
+ * guest mappings
+ */
+static int legacy_s390x_mem_layout(void)
+{
+#if defined(TARGET_S390X)
+return kvm_has_legacy_s390x_memlayout();
+#else
+return 0;
+#endif
+}
+
+/*
+ * Legacy layout for s390:
+ * Older S390 KVM requires the topmost vma of the RAM to be
+ * smaller than an system defined value, which is at least 256GB.
+ * Larger systems have larger values. We put the guest between
+ * the end of data segment (system break) and this value. We
+ * use 32GB as a base to have enough room for the system break
+ * to grow. We also have to use MAP parameters that avoid
+ * read-only mapping of guest pages.
+ */
+static void *legacy_s390_alloc(ram_addr_t size)
+{
+void *mem;
+
+mem = mmap((void *) 0x8ULL, size,
+   PROT_EXEC|PROT_READ|PROT_WRITE,
+   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+if (mem == MAP_FAILED) {
+fprintf(stderr, "Allocating RAM failed\n");
+abort();
+}
+return mem;
+}
+
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
MemoryRegion *mr)
 {
@@ -2644,26 +2681,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
 exit(1);
 #endif
 } else {
-#if defined(TARGET_S390X) && defined(CONFIG_KVM)
-/* S390 KVM requires the topmost vma of the RAM to be smaller than
-   an system defined value, which is at least 256GB. Larger systems
-   have larger values. We put the guest between the end of data
-   segment (system break) and this value. We use 32GB as a base to
-   have enough room for the system break to grow. */
-new_block->host = mmap((void*)0x8, size,
-   PROT_EXEC|PROT_READ|PROT_WRITE,
-   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 
0);
-if (new_block->host == MAP_FAILED) {
-fprintf(stderr, "Allocating RAM failed\n");
-abort();
-}
-#else
 if (xen_enabled()) {
 xen_ram_alloc(new_block->offset, size, mr);
+} else if (legacy_s390x_mem_layout()) {
+new_block->host = legacy_s390_alloc(size);
 } else {
 new_block->host = qemu_vmalloc(size);
 }
-#endif
 qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
 }
 }
diff --git a/kvm.h b/kvm.h
index 9c7b0ea..ca0557e 100644
--- a/kvm.h
+++ b/kvm.h
@@ -62,6 +62,15 @@ int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
 
+#ifndef CONFIG_KVM 
+static inline int kvm_has_legacy_s390x_memlayout(void)
+{
+return 0;
+}
+#else
+int kvm_has_legacy_s390x_memlayout(void);
+#endif
+
 int kvm_allows_irq0_override(void);
 
 #ifdef NEED_CPU_H
diff --git a/oslib-posix.c b/oslib-posix.c
index b6a3c7f..93902ac 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -41,6 +41,9 @@ extern int daemon(int, int);
   therefore we need special code which handles running on Valgrind. */
 #  define QEMU_VMALLOC_ALIGN (512 * 4096)
 #  define CONFIG_VALGRIND
+#elif defined(__linux__) && defined(__s390x__)
+   /* Use 1 MiB (segment size) alignment so gmap can be used by KVM. */
+#  define QEMU_VMALLOC_ALIGN (256 * 4096)
 #else
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 90aad61..93a8431 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -135,6 +135,12 @@ int kvm_arch_get_registers(CPUS390XState *env)
 return 0;
 }
 
+int kvm_has_legacy_s390x_memlayout(void)
+{
+return !kvm_check_extension(kvm_state, KVM_CAP_S390_GMAP) ||
+   !kvm_check_extension(kvm_state, KVM_CAP_S390_COW);
+}
+
 int kvm_arch_insert_sw_breakpoint(CPUS390XState *env, struct kvm_sw_breakpoint 
*bp)
 {
 static const uint8_t diag_50