Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-27 Thread Dor Laor

On 09/27/2012 11:49 AM, Raghavendra K T wrote:

On 09/25/2012 08:30 PM, Dor Laor wrote:

On 09/24/2012 02:02 PM, Raghavendra K T wrote:

On 09/24/2012 02:12 PM, Dor Laor wrote:

In order to help PLE and pvticketlock converge I thought that a small
piece of test code should be developed to test this in a predictable,
deterministic way.

The idea is to have a guest kernel module that spawns a new thread each
time you write to a /sys/ entry.

Each such thread spins on a spin lock. The specific spin lock is
also chosen via the /sys/ interface. Let's say we have an array of spin
locks, 10 times the number of vcpus.

All the threads are running a
while (1) {

spin_lock(my_lock);
sum += execute_dummy_cpu_computation(time);
spin_unlock(my_lock);

if (sys_tells_thread_to_die()) break;
}

print_result(sum);

Instead of calling the kernel's spin_lock functions, clone them and
make the ticket lock order deterministic and known (like a linear walk
of all the threads trying to catch that lock).
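For illustration only, here is a minimal sketch of what such a guest module's spinner could look like. Everything below is an assumption for the sake of discussion: the spin_test_* names, the lock-array size and the sysfs plumbing (not shown) are hypothetical, not code from this thread; only the spawn-a-spinner-per-/sys-write idea comes from the proposal above.

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/err.h>

#define NR_LOCKS 64                      /* e.g. 10 * number of vcpus */

static spinlock_t spin_test_locks[NR_LOCKS];
static bool spin_test_stop;              /* set via another /sys entry */

struct spin_test_arg {
        int lock_idx;                    /* which lock, chosen via /sys */
        unsigned long hold_loops;        /* dummy work inside the lock  */
};

static int spin_test_worker(void *data)
{
        struct spin_test_arg *arg = data;
        spinlock_t *lock = &spin_test_locks[arg->lock_idx];
        unsigned long long sum = 0;
        volatile unsigned long i;

        while (!kthread_should_stop() && !READ_ONCE(spin_test_stop)) {
                spin_lock(lock);
                for (i = 0; i < arg->hold_loops; i++)
                        sum += i;        /* execute_dummy_cpu_computation() */
                spin_unlock(lock);
        }
        pr_info("spin_test: lock %d sum %llu\n", arg->lock_idx, sum);
        kfree(arg);
        return 0;
}

/* Called from the sysfs ->store() handler (not shown): one write,
 * one new spinner bound to the requested lock index. */
static int spin_test_spawn(int lock_idx, unsigned long hold_loops)
{
        struct spin_test_arg *arg = kmalloc(sizeof(*arg), GFP_KERNEL);

        if (!arg)
                return -ENOMEM;
        arg->lock_idx = lock_idx % NR_LOCKS;
        arg->hold_loops = hold_loops;
        return PTR_ERR_OR_ZERO(kthread_run(spin_test_worker, arg,
                                           "spin_test/%d", lock_idx));
}

static int __init spin_test_init(void)
{
        int i;

        for (i = 0; i < NR_LOCKS; i++)
                spin_lock_init(&spin_test_locks[i]);
        return 0;                        /* sysfs entry creation omitted */
}
module_init(spin_test_init);
MODULE_LICENSE("GPL");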


By cloning, do you mean a hierarchy of locks?


No, I meant to clone the implementation of the current spin lock code in
order to set any order you may like for the ticket selection
(even for a non-pvticket-lock version).

For instance, let's say you have N threads trying to grab the lock; you
can always make the ticket go linearly from 1->2->...->N.
Not sure it's a good idea, just a recommendation.
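As one possible reading of the "clone the lock and fix the ticket order" idea, here is a hedged sketch of a deterministic lock where each spinner owns a fixed ticket and the lock is therefore always handed over in the known order 0 -> 1 -> ... -> N-1. The names and the round-robin hand-over are assumptions, not code from this thread, and it only works while all N threads keep participating.

#include <linux/atomic.h>
#include <asm/processor.h>      /* cpu_relax() */

struct det_ticket_lock {
        atomic_t now_serving;   /* index of the thread allowed in    */
        int nr_threads;         /* N, fixed when the test is set up  */
};

static void det_lock(struct det_ticket_lock *l, int my_ticket)
{
        /* each thread was assigned my_ticket = its index at spawn time */
        while (atomic_read(&l->now_serving) != my_ticket)
                cpu_relax();    /* this PAUSE loop is what PLE watches */
        smp_mb();               /* acquire: order the critical section */
}

static void det_unlock(struct det_ticket_lock *l)
{
        int next = (atomic_read(&l->now_serving) + 1) % l->nr_threads;

        smp_mb();               /* release: publish the work done      */
        atomic_set(&l->now_serving, next);
}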


Also, I believe the time value should be passed via sysfs or hardcoded
for each type of lock we are mimicking.


Yap





This way you can easily calculate:
1. the score of a single vcpu running a single thread
2. the sum of all thread scores when #threads == #vcpus, all
taking the same spin lock. The overall sum should be as close as
possible to #1.
3. Like #2 but #threads > #vcpus, and other variants with #total vcpus
(belonging to all VMs) > #pcpus.
4. Create #threads == #vcpus but let each thread have its own spin
lock
5. Like 4 + 2

Hopefully this way will allow you to judge and evaluate the exact
overhead of scheduling VMs and threads, since you have the ideal result
in hand and you know what the threads are doing.
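To give an idea of how scenario #2 above (#threads == #vcpus, everyone on the same lock) might be driven once such a module exists, a hedged userspace helper follows. All the /sys paths and file names are hypothetical placeholders; whatever interface the module ends up exposing would go there.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void sysfs_write(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                exit(1);
        }
        fprintf(f, "%s\n", val);
        fclose(f);
}

int main(void)
{
        int vcpus = (int)sysconf(_SC_NPROCESSORS_ONLN);
        int i;

        /* scenario #2: every spinner fights over lock 0 */
        sysfs_write("/sys/kernel/spin_test/lock_index", "0");
        sysfs_write("/sys/kernel/spin_test/hold_loops", "10000");
        for (i = 0; i < vcpus; i++)
                sysfs_write("/sys/kernel/spin_test/spawn", "1");

        sleep(60);                      /* let the spinners run */
        sysfs_write("/sys/kernel/spin_test/stop", "1");
        printf("spawned %d spinners on lock 0; sums are in dmesg\n", vcpus);
        return 0;
}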

My 2 cents, Dor



Thank you,
I think this is an excellent idea. (Though I am still trying to put
together all the pieces you mentioned.) So overall we should be able to
measure the performance of pvspinlock/PLE improvements with a
deterministic load in the guest.

The only thing I am missing is how to generate different combinations
of the locks.

Okay, let me see if I can come up with a solid model for this.



Do you mean the various options for PLE/pvticket/other? I haven't
thought about it and assumed it's static, but it can also be controlled
through the temporary /sys interface.



No, I am not there yet.

So in summary, we are suffering from inconsistent benchmark results
while measuring the benefit of our improvements in PLE/pvlock etc.

So the good points from your suggestion are:
- giving predictability to the workload that runs in the guest, so that
we have a pi-pi comparison of the improvement.

- we can easily tune the workload via sysfs, and we can have scripts to
automate it.

What is complicated is:
- How can we simulate a workload close to what we measure with
benchmarks?
- How can we mimic lock-holding times / lock hierarchy close to the way
they are seen with real workloads (e.g. a highly contended zone lru lock
with similar lock-holding times)?


You can spin for the instruction count you're interested in.


- How close would it be once we ignore other types of spinning
(e.g. flush_tlb)?

So I feel it is not as trivial as it looks.
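Picking up the "spin for the instruction count you're interested in" suggestion above, a hedged sketch of the dummy work done while the lock is held; interpreting the argument as a loop count, as well as any loops-per-hold value, are assumptions to be calibrated against the real workload (e.g. measured zone lru lock hold times), not numbers from this thread.

/* Roughly proportional to the instruction count we want to mimic;
 * the volatile index keeps the compiler from removing the loop. */
static unsigned long long execute_dummy_cpu_computation(unsigned long loops)
{
        unsigned long long sum = 0;
        volatile unsigned long i;

        for (i = 0; i < loops; i++)
                sum += i;
        return sum;
}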



Indeed, this is mainly a tool that can serve to optimize a few synthetic
workloads.
I still believe it is worth going through this exercise, since a 100%
predictable and controlled case can help us purely assess the state of
the PLE and pvticket code. Otherwise we're dealing w/ too many parameters
and assumptions at once.


Dor

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-25 Thread Dor Laor

On 09/24/2012 02:02 PM, Raghavendra K T wrote:

On 09/24/2012 02:12 PM, Dor Laor wrote:

In order to help PLE and pvticketlock converge I thought that a small
piece of test code should be developed to test this in a predictable,
deterministic way.

The idea is to have a guest kernel module that spawns a new thread each
time you write to a /sys/ entry.

Each such thread spins on a spin lock. The specific spin lock is
also chosen via the /sys/ interface. Let's say we have an array of spin
locks, 10 times the number of vcpus.

All the threads are running a
while (1) {

spin_lock(my_lock);
sum += execute_dummy_cpu_computation(time);
spin_unlock(my_lock);

if (sys_tells_thread_to_die()) break;
}

print_result(sum);

Instead of calling the kernel's spin_lock functions, clone them and make
the ticket lock order deterministic and known (like a linear walk of all
the threads trying to catch that lock).


By cloning, do you mean a hierarchy of locks?


No, I meant to clone the implementation of the current spin lock code in 
order to set any order you may like for the ticket selection.

(even for a non-pvticket-lock version)

For instance, let's say you have N threads trying to grab the lock; you
can always make the ticket go linearly from 1->2->...->N.

Not sure it's a good idea, just a recommendation.


Also, I believe the time value should be passed via sysfs or hardcoded
for each type of lock we are mimicking.


Yap





This way you can easily calculate:
1. the score of a single vcpu running a single thread
2. the sum of all thread scores when #threads == #vcpus, all
taking the same spin lock. The overall sum should be as close as
possible to #1.
3. Like #2 but #threads > #vcpus, and other variants with #total vcpus
(belonging to all VMs) > #pcpus.
4. Create #threads == #vcpus but let each thread have its own spin
lock
5. Like 4 + 2

Hopefully this way will allow you to judge and evaluate the exact
overhead of scheduling VMs and threads, since you have the ideal result
in hand and you know what the threads are doing.

My 2 cents, Dor



Thank you,
I think this is an excellent idea. (Though I am still trying to put
together all the pieces you mentioned.) So overall we should be able to
measure the performance of pvspinlock/PLE improvements with a
deterministic load in the guest.

The only thing I am missing is how to generate different combinations
of the locks.

Okay, let me see if I can come up with a solid model for this.



Do you mean the various options for PLE/pvticket/other? I haven't
thought about it and assumed it's static, but it can also be controlled
through the temporary /sys interface.


Thanks for following up!
Dor
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-24 Thread Dor Laor
In order to help PLE and pvticketlock converge I thought that a small
piece of test code should be developed to test this in a predictable,
deterministic way.


The idea is to have a guest kernel module that spawns a new thread each
time you write to a /sys/ entry.


Each such thread spins on a spin lock. The specific spin lock is
also chosen via the /sys/ interface. Let's say we have an array of spin
locks, 10 times the number of vcpus.


All the threads are running a
 while (1) {

   spin_lock(my_lock);
   sum += execute_dummy_cpu_computation(time);
   spin_unlock(my_lock);

   if (sys_tells_thread_to_die()) break;
 }

 print_result(sum);

Instead of calling the kernel's spin_lock functions, clone them and make 
the ticket lock order deterministic and known (like a linear walk of all 
the threads trying to catch that lock).


This way you can easily calculate:
 1. the score of a single vcpu running a single thread
 2. the sum of all thread scores when #threads == #vcpus, all
taking the same spin lock. The overall sum should be as close as
possible to #1.
 3. Like #2 but #threads > #vcpus, and other variants with #total vcpus
(belonging to all VMs) > #pcpus.
 4. Create #threads == #vcpus but let each thread have its own spin
lock
 5. Like 4 + 2

Hopefully this way will allow you to judge and evaluate the exact
overhead of scheduling VMs and threads, since you have the ideal result
in hand and you know what the threads are doing.


My 2 cents, Dor

On 09/21/2012 08:36 PM, Raghavendra K T wrote:

On 09/21/2012 06:48 PM, Chegu Vinod wrote:

On 9/21/2012 4:59 AM, Raghavendra K T wrote:

In some special scenarios like #vcpu <= #pcpu, the PLE handler may
prove very costly,


Yes.

because there is no need to iterate over vcpus
and do unsuccessful yield_to calls, burning CPU.

An idea to solve this is:
1) As Avi had proposed, we can modify the hardware ple_window
dynamically to avoid frequent PLE exits.


Yes. We had to do this to get around some scaling issues for large
(>20way) guests (with no overcommitment)


Do you mean you already have some solution tested for this?



As part of some experimentation we even tried "switching off" PLE too :(



Honestly,
this experiment of yours and Andrew Theurer's observations were the
motivation for this patch.





(IMHO, it is difficult to
decide when we have mixed types of VMs).


Agree.

Not sure if the following alternatives have also been looked at:

- Could the behavior associated with the "ple_window" be modified to be
a function of some [new] per-guest attribute (which can be conveyed to
the host as part of the guest launch sequence)? The user can choose to
set this [new] attribute for a given guest. This would help avoid the
frequent exits due to PLE (as Avi had mentioned earlier).


Ccing Drew also. We had a good discussion on this idea last time.
(sorry that I forgot to include in patch series)

Maybe a good idea when we know the load in advance..



- Can the PLE feature (in VT) be "enhanced" to be made a per-guest
attribute?


IMHO, the approach of not taking frequent exits is better than taking
an exit and returning from the handler, etc.


I entirely agree on this point (though I have not tried the above
approaches). Hope to see more expert opinions pouring in.
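For reference, a hedged sketch of what "modify hardware ple_window dynamically" could mean in practice. The grow/shrink heuristic, the factors and the bounds below are illustrative assumptions only, not code from KVM or from this patch series.

#include <linux/types.h>

#define PLE_WINDOW_MIN   4096
#define PLE_WINDOW_MAX   (16 * 4096)

/* Called after a PLE exit: if the directed yield found nobody worth
 * yielding to (typical of undercommit), the exits are pure overhead,
 * so grow the window; if the yield helped (real overcommit), shrink
 * back so PLE keeps reacting quickly. */
static unsigned int adjust_ple_window(unsigned int window, bool yield_worked)
{
        if (yield_worked)
                window /= 2;
        else
                window *= 2;

        if (window < PLE_WINDOW_MIN)
                window = PLE_WINDOW_MIN;
        if (window > PLE_WINDOW_MAX)
                window = PLE_WINDOW_MAX;
        return window;
}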

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/2] virtio: provide a way for host to monitor critical events in the device

2012-07-24 Thread Dor Laor

On 07/24/2012 03:30 PM, Sasha Levin wrote:

On 07/24/2012 10:26 AM, Dor Laor wrote:

On 07/24/2012 07:55 AM, Rusty Russell wrote:

On Mon, 23 Jul 2012 22:32:39 +0200, Sasha Levin  wrote:

As it was discussed recently, there's currently no way for the guest to notify
the host about panics. Furthermore, there's no reasonable way to notify the
host of other critical events such as an OOM kill.


I clearly missed the discussion.  Is this actually useful?  In practice,


Admittedly, this is not a killer feature..


won't you want the log from the guest?  What makes a virtual guest
different from a physical guest?


Most times a virt guest can do better than a physical OS. In that sense, this is
where virtualization shines (live migration, hotplug for any virtual resource
including net/block/cpu/memory/..).

There are plenty of niche but worthwhile small features, such as the
virtio-trace series and others, that allow the host/virt-mgmt to get more insight
into the guest w/o a need to configure the guest.

In theory a guest OOM can trigger a host memory hotplug action. Again, I don't
see it as a key feature..



Guest watchdog functionality might be useful, but that's simpler to


There is already a fully emulated watchdog device in qemu.


There is, but why emulate physical devices when you can take advantage of 
virtio?

You could say the same about the rest of the virtio family - "There is already a 
fully emulated NIC device in qemu".


The single issue virtio-nic solves is performance, beyond what can
be done w/ a fully emulated NIC. The reason is that such NICs tend to
access pio/mmio space a lot, while virtio is designed for virtualization.


The standard watchdog device (isn't it time you tried qemu?) isn't about
performance, and if that's all the functionality you need it should work
fine.


btw: check the virtio-trace series that was just sent in a parallel thread.

Cheers,
Dor



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/2] virtio: provide a way for host to monitor critical events in the device

2012-07-24 Thread Dor Laor

On 07/24/2012 07:55 AM, Rusty Russell wrote:

On Mon, 23 Jul 2012 22:32:39 +0200, Sasha Levin  wrote:

As it was discussed recently, there's currently no way for the guest to notify
the host about panics. Furthermore, there's no reasonable way to notify the
host of other critical events such as an OOM kill.


I clearly missed the discussion.  Is this actually useful?  In practice,


Admittedly, this is not a killer feature..


won't you want the log from the guest?  What makes a virtual guest
different from a physical guest?


Most times a virt guest can do better than a physical OS. In that sense,
this is where virtualization shines (live migration, hotplug for any
virtual resource including net/block/cpu/memory/..).


There are plenty of niche but worthwhile small features, such as the
virtio-trace series and others, that allow the host/virt-mgmt to get more
insight into the guest w/o a need to configure the guest.


In theory a guest OOM can trigger a host memory hotplug action. Again, I
don't see it as a key feature..




Guest watchdog functionality might be useful, but that's simpler to


There is already a fully emulated watchdog device in qemu.
Cheers,
Dor


implement via a virtio watchdog device, and more effective to implement
via a host facility that actually pings guest functionality (rather than
the kernel).

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] Guest kernel hangs in smp kvm for older kernels prior to tsc sync cleanup

2007-12-19 Thread Dor Laor

Amit Shah wrote:


On Wednesday 19 December 2007 21:02:06 Glauber de Oliveira Costa wrote:
> On Dec 19, 2007 12:27 PM, Avi Kivity <[EMAIL PROTECTED]> wrote:
> > Ingo Molnar wrote:
> > > * Avi Kivity <[EMAIL PROTECTED]> wrote:
> > >> Avi Kivity wrote:
> > >>>  Testing shows wrmsr and rdtsc function normally.
> > >>>
> > >>> I'll try pinning the vcpus to cpus and see if that helps.
> > >>
> > >> It does.
> > >
> > > do we let the guest read the physical CPU's TSC? That would be trouble.

> >
> > vmx (and svm) allow us to add an offset to the physical tsc.  We set it
> > on startup to -tsc (so that an rdtsc on boot would return 0), and
> > massage it on vcpu migration so that guest rdtsc is monotonic.
> >
> > The net effect is that tsc on a vcpu can experience large forward jumps
> > and changes in rate, but no negative jumps.
>
> Changes in rate does not sound good. It's possibly what's screwing up
> my paravirt clock implementation in smp.

Do you mean in the case of VM migration, or just starting them on a single
host?


It's the cpu preemption stuff on the local host, and not VM migration.


> Since the host updates guest time prior to putting vcpu to run, two
> vcpus that start running at different times will have different system
> values.
>
> Now if the vcpu that started running later probes the time first,
> we'll se the time going backwards. A constant tsc rate is the only way
> around
> my limited mind sees around the problem (besides, obviously, _not_
> making the system time per-vcpu).
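To make the "massage it on vcpu migration so that guest rdtsc is monotonic" point above concrete, a hedged sketch follows. The helper name and parameters are illustrative assumptions rather than the actual kvm code; the only point is that the new offset is chosen so the guest can see a forward jump but never a backwards one.

#include <linux/types.h>

/* guest_tsc = host_tsc + tsc_offset.  When the vcpu lands on another
 * physical cpu, pick the offset so that the first rdtsc there is at
 * least the last value the guest saw on the old cpu. */
static u64 tsc_offset_after_migration(u64 last_guest_tsc,
                                      u64 host_tsc_on_new_cpu)
{
        return last_guest_tsc - host_tsc_on_new_cpu;
}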

-
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services
for just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
kvm-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kvm-devel



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Performance overhead of get_cycles_sync

2007-12-11 Thread Dor Laor

Ingo Molnar wrote:

* Dor Laor <[EMAIL PROTECTED]> wrote:

  

Here [include/asm-x86/tsc.h]:

/* Like get_cycles, but make sure the CPU is synchronized. */
static __always_inline cycles_t get_cycles_sync(void)
{
        unsigned long long ret;
        unsigned eax, edx;

        /*
         * Use RDTSCP if possible; it is guaranteed to be synchronous
         * and doesn't cause a VMEXIT on Hypervisors
         */
        alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
                       ASM_OUTPUT2("=a" (eax), "=d" (edx)),
                       "a" (0U), "d" (0U) : "ecx", "memory");
        ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
        if (ret)
                return ret;

        /*
         * Don't do an additional sync on CPUs where we know
         * RDTSC is already synchronous:
         */
//      alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
//                     "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
        rdtscll(ret);



The patch below should resolve this - could you please test and Ack it? 
  

It works, actually I already commented it out.

Acked-by: Dor Laor <[EMAIL PROTECTED]>

But this CPUID was present in v2.6.23 too, so why did it only show up in
2.6.24-rc for you?

  

I tried to figure it out, but all the code movements for i386 get in the way.
In the previous email I reported to Andi that the Fedora kernel 2.6.23-8 did
not suffer from it.

Thanks for the ultra fast reply :)
Dor

Ingo

-->
Subject: x86: fix get_cycles_sync() overhead
From: Ingo Molnar <[EMAIL PROTECTED]>

get_cycles_sync() is causing massive overhead in KVM networking:

   http://lkml.org/lkml/2007/12/11/54

remove the explicit CPUID serialization - it causes VM exits and is
pointless: we care about GTOD coherency but that goes to user-space
via a syscall, and syscalls are serialization points anyway.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 include/asm-x86/tsc.h |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-x86.q/include/asm-x86/tsc.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/tsc.h
+++ linux-x86.q/include/asm-x86/tsc.h
@@ -39,8 +39,8 @@ static __always_inline cycles_t get_cycl
unsigned eax, edx;
 
 	/*

-* Use RDTSCP if possible; it is guaranteed to be synchronous
-* and doesn't cause a VMEXIT on Hypervisors
+* Use RDTSCP if possible; it is guaranteed to be synchronous
+* and doesn't cause a VMEXIT on Hypervisors
 */
alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
   ASM_OUTPUT2("=a" (eax), "=d" (edx)),
@@ -50,11 +50,11 @@ static __always_inline cycles_t get_cycl
return ret;
 
 	/*

-* Don't do an additional sync on CPUs where we know
-* RDTSC is already synchronous:
+* Use RDTSC on other CPUs. This might not be fully synchronous,
+* but it's not a problem: the only coherency we care about is
+* the GTOD output to user-space, and syscalls are synchronization
+* points anyway:
 */
-   alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
- "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
rdtscll(ret);
 
 	return ret;


  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Performance overhead of get_cycles_sync

2007-12-11 Thread Dor Laor

Andi Kleen wrote:

[headers rewritten because of gmane crosspost breakage]

  
In the latest kernel (2.6.24-rc3) I noticed a drastic performance 
decrease for KVM networking.



That should not have changed for quite some time.

Also it depends on the CPU of course.
  
I didn't find the exact place of the change but using fedora 2.6.23-8 
there is no problem.
3aefbe0746580a710d4392a884ac1e4aac7c728f turned X86_FEATURE_SYNC_RDTSC
off for most Intel CPUs, but it was committed in May.

  

The reason is many vmexits (exit reason is the cpuid instruction) caused by
calls to gettimeofday that use the tsc clocksource.
read_tsc calls get_cycles_sync, which might call cpuid in order to
serialize the cpu.


Can you explain why the cpu needs to be serialized for every gettime call?



Otherwise RDTSC can be speculated around and happen outside the protection
of the seqlock, and that can sometimes lead to non-monotonic time reporting.
  

What about moving the result into memory and calling mb() instead?

Anyway, after a lot of discussion it turns out there are ways to achieve
this without CPUID, and there is a solution implemented for this in the ff
tree which I will submit for .25. It's a little complicated though
and not a quick fix.

  
Do we need to be that accurate? (It will also slightly improve physical 
hosts).
I believe you have a reason and the answer is yes. In that case can you
replace the serializing instruction with an instruction that does not
trigger a vmexit? Maybe use 'ltr', for example?



ltr doesn't synchronize RDTSC.

  
According to the Intel spec it is a serializing instruction, along with cpuid
and others.

-Andi
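As a hedged illustration of the speculation problem described above (and of the mb()-style alternative raised in this thread), the sketch below uses an lfence to keep rdtsc from being hoisted out of the seqlock-protected region, instead of a serializing cpuid. The helper names, the cyc2ns callback and the choice of lfence are assumptions for illustration, not the .25 solution Andi refers to.

#include <linux/seqlock.h>

static inline unsigned long long rdtsc_ordered_sketch(void)
{
        unsigned int lo, hi;

        /* lfence orders rdtsc against earlier loads without the
         * vmexit that a serializing cpuid costs under a hypervisor */
        asm volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
        return ((unsigned long long)hi << 32) | lo;
}

static unsigned long long
gettime_read_sketch(seqlock_t *lock,
                    unsigned long long (*cyc2ns)(unsigned long long))
{
        unsigned long long t;
        unsigned int seq;

        do {
                seq = read_seqbegin(lock);        /* snapshot scale/shift */
                t = cyc2ns(rdtsc_ordered_sketch());
        } while (read_seqretry(lock, seq));       /* retry if it changed  */
        return t;
}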

  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Performance overhead of get_cycles_sync

2007-12-11 Thread Dor Laor

Ingo Molnar wrote:


* Dor Laor <[EMAIL PROTECTED]> wrote:

> Hi Ingo, Thomas,
>
> In the latest kernel (2.6.24-rc3) I noticed a drastic performance
> decrease for KVM networking. The reason is many vmexits (exit reason is
> the cpuid instruction) caused by calls to gettimeofday that use the tsc
> clocksource. read_tsc calls get_cycles_sync, which might call cpuid in
> order to serialize the cpu.
>
> Can you explain why the cpu needs to be serialized for every gettime
> call? Do we need to be that accurate? (It will also slightly improve
> physical hosts). I believe you have a reason and the answer is yes. In
> that case can you replace the serializing instruction with an
> instruction that does not trigger a vmexit? Maybe use 'ltr', for example?

hm, where exactly does it call CPUID?

Ingo


Here, commented out [include/asm-x86/tsc.h]:
/* Like get_cycles, but make sure the CPU is synchronized. */
static __always_inline cycles_t get_cycles_sync(void)
{
        unsigned long long ret;
        unsigned eax, edx;

        /*
         * Use RDTSCP if possible; it is guaranteed to be synchronous
         * and doesn't cause a VMEXIT on Hypervisors
         */
        alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
                       ASM_OUTPUT2("=a" (eax), "=d" (edx)),
                       "a" (0U), "d" (0U) : "ecx", "memory");
        ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
        if (ret)
                return ret;

        /*
         * Don't do an additional sync on CPUs where we know
         * RDTSC is already synchronous:
         */
//      alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
//                     "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
        rdtscll(ret);

        return ret;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Performance overhead of get_cycles_sync

2007-12-11 Thread Dor Laor

Hi Ingo, Thomas,

In the latest kernel (2.6.24-rc3) I noticed a drastic performance
decrease for KVM networking.
The reason is many vmexits (exit reason is the cpuid instruction) caused by
calls to gettimeofday that use the tsc clocksource.
read_tsc calls get_cycles_sync, which might call cpuid in order to
serialize the cpu.


Can you explain why the cpu needs to be serialized for every gettime call?
Do we need to be that accurate? (It will also slightly improve physical
hosts.)
I believe you have a reason and the answer is yes. In that case can you
replace the serializing instruction with an instruction that does not
trigger a vmexit? Maybe use 'ltr', for example?


Regards,
Dor.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-08 Thread Dor Laor

Anthony Liguori wrote:

This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.


  
While it's a little premature, we can start thinking of irq path
improvements.
The current patch acks a private isr and afterwards the apic eoi will also
be hit, since it's a level-triggered irq. This means 2 vmexits per irq.
We can start with regular pci irqs and move afterwards to msi.
Some other ugly hack options [we're better off using msi]:
   - Read the eoi directly from the apic and save the first private isr ack
   - Convert the specific irq line to edge-triggered and don't share it
What do you guys think?
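To make the "move afterwards to msi" option concrete, a hedged sketch follows; it is written against today's PCI helper names as an assumption (the API available in 2007 differed), and the error unwinding is omitted. With one MSI-X vector per virtqueue there is no shared level-triggered line, so neither the private ISR read nor a second EOI-related exit is needed per interrupt.

#include <linux/pci.h>
#include <linux/interrupt.h>

static int vp_request_msix_sketch(struct pci_dev *pci_dev,
                                  void **per_vq_cookies, int nvqs,
                                  irq_handler_t vq_handler)
{
        int i, err;

        err = pci_alloc_irq_vectors(pci_dev, nvqs, nvqs, PCI_IRQ_MSIX);
        if (err < 0)
                return err;

        for (i = 0; i < nvqs; i++) {
                /* edge semantics: the handler just runs, no ISR read,
                 * no shared-line EOI dance */
                err = request_irq(pci_irq_vector(pci_dev, i), vq_handler,
                                  0, "virtio-vq", per_vq_cookies[i]);
                if (err)
                        return err;     /* cleanup omitted in this sketch */
        }
        return 0;
}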

+/* A small wrapper to also acknowledge the interrupt when it's handled.
+ * I really need an EIO hook for the vring so I can ack the interrupt once we
+ * know that we'll be handling the IRQ but before we invoke the callback since
+ * the callback may notify the host which results in the host attempting to
+ * raise an interrupt that we would then mask once we acknowledged the
+ * interrupt. */
+static irqreturn_t vp_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   u8 isr;
+
+   /* reading the ISR has the effect of also clearing it so it's very
+* important to save off the value. */
+   isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+
+   /* It's definitely not us if the ISR was not high */
+   if (!isr)
+   return IRQ_NONE;
+
+   spin_lock(&vp_dev->lock);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock(&vp_dev->lock);
+
+   return ret;
+}
  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-08 Thread Dor Laor

Anthony Liguori wrote:

Avi Kivity wrote:
  

Anthony Liguori wrote:
  


This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.

  

  

Didn't see support for dma.



Not sure what you're expecting there.  Using dma_ops in virtio_ring?

  

 I think that with Amit's pvdma patches you
can support dma-capable devices as well without too much fuss.
  



What is the use case you're thinking of?  A semi-paravirt driver that 
does dma directly to a device?


Regards,

Anthony Liguori

  
You would also lose performance, since pv-dma will trigger an exit for
each virtio IO, while virtio kicks the hypervisor only after several IOs
were queued.
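A hedged sketch of the "kick the hypervisor after several IOs were queued" point, written against the current virtqueue API as an assumption (the 2007 interface looked different): requests are queued back to back and the host is notified, i.e. the exit is taken, only once per batch.

#include <linux/virtio.h>
#include <linux/scatterlist.h>

static int submit_batch_sketch(struct virtqueue *vq,
                               struct scatterlist *sgs, void **cookies,
                               int nr_reqs)
{
        int i, err;

        for (i = 0; i < nr_reqs; i++) {
                err = virtqueue_add_outbuf(vq, &sgs[i], 1, cookies[i],
                                           GFP_ATOMIC);
                if (err)
                        break;          /* ring full: notify what we have */
        }
        if (i)
                virtqueue_kick(vq);     /* one exit for the whole batch   */
        return i;
}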


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
___
kvm-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kvm-devel

  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-08 Thread Dor Laor

Anthony Liguori wrote:

Avi Kivity wrote:
  

Anthony Liguori wrote:
  


This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.

  

  

Didn't see support for dma.



Not sure what you're expecting there.  Using dma_ops in virtio_ring?

  

 I think that with Amit's pvdma patches you
can support dma-capable devices as well without too much fuss.
  



What is the use case you're thinking of?  A semi-paravirt driver that 
does dma directly to a device?


Regards,

Anthony Liguori

  
You would also lose performance since pv-dma will trigger an exit for 
each virtio io while

virtio kicks the hypervisor after several IOs were queued.


-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now  http://get.splunk.com/
___
kvm-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/kvm-devel

  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-08 Thread Dor Laor

Anthony Liguori wrote:

This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.


  
While it's a little premature, we can start thinking of irq path 
improvements.
The current patch acks a private isr and afterwards apic eoi will also 
be hit since its

a level trig irq. This means 2 vmexits per irq.
We can start with regular pci irqs and move afterwards to msi.
Some other ugly hack options [we're better use msi]:
   - Read the eoi directly from apic and save the first private isr ack
   - Convert the specific irq line to edge triggered and dont share it
What do you guys think?

+/* A small wrapper to also acknowledge the interrupt when it's handled.
+ * I really need an EIO hook for the vring so I can ack the interrupt once we
+ * know that we'll be handling the IRQ but before we invoke the callback since
+ * the callback may notify the host which results in the host attempting to
+ * raise an interrupt that we would then mask once we acknowledged the
+ * interrupt. */
+static irqreturn_t vp_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	u8 isr;
+
+	/* reading the ISR has the effect of also clearing it so it's very
+	 * important to save off the value. */
+	isr = ioread8(vp_dev->ioaddr + VIRTIO_PCI_ISR);
+
+	/* It's definitely not us if the ISR was not high */
+	if (!isr)
+		return IRQ_NONE;
+
+	spin_lock(&vp_dev->lock);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock(&vp_dev->lock);
+
+	return ret;
+}
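
A rough sketch of the msi direction mentioned above: with a dedicated vector per
virtqueue there is nothing to read back from the device, so the ISR access (and
the extra exit it costs) goes away. The handler name and the one-vector-per-queue
wiring are assumptions for illustration, not part of the patch above:

    /* Sketch only: assumes one MSI vector is bound per virtqueue, so the
     * handler gets its queue directly and never touches VIRTIO_PCI_ISR. */
    static irqreturn_t vp_vring_interrupt_msi(int irq, void *opaque)
    {
            struct virtio_pci_vq_info *info = opaque;

            /* no ISR read, no private ack: the vector identifies the queue */
            return vring_interrupt(irq, info->vq);
    }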
  




Re: [kvm-devel] 2.6.23.1-rt4 and kvm 48

2007-11-01 Thread Dor Laor

David Brown wrote:

Uhm, not sure who to send this too...

I thought I'd try out the realtime patch set and it didn't work at all
with kvm. The console didn't dump anything and the system completely
locked up.

Anyone have any suggestions as to how to get more output on this issue?

It got to the point of bringing up the tap interface and attaching it
to the bridge but that was about it for the console messages.

Thanks,
- David Brown

  

I tried to recreate your problem using 2.6.23-1 and the latest rt patch (rt5).
The problem is that the kernel is not stable at all; I can't even
compile the code over vnc - my connection is constantly lost. So it might
not be a kvm problem?
Can you try it with -no-kvm and see if it's working - then it's just a
regular userspace process.

Anyway, if all other things are stable on your end, can you send us
dmesg/strace outputs?

Also try without the in-kernel irqchip (-no-kvm-irqchip).
Regards,
Dor.


  





RE: [Lguest] [V9fs-developer] [kvm-devel] [RFC] 9p: add KVM/QEMUpci transport

2007-08-29 Thread Dor Laor
My current view of the IO stack is the following:

--------------  --------------  -------------   -----------  -----------  ----------
|NET_PCI_BACK|  |BLK_PCI_BACK|  |9P_PCI_BACK|   |NET_FRONT|  |BLK_FRONT|  |9P_FRONT|
--------------  --------------  -------------   -----------  -----------  ----------

               -------------  ---------------  -------------------
               |KVM_PCI_BUS|  |hypercall_ops|  |shared_mem_virtio|
               -------------  ---------------  -------------------

So the 9P implementation should add the front end logic and the
p9_pci_backend that glues the shared_memory, pci_bus and hypercalls
together.
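
A rough sketch of the front-end glue this implies, following the registration
style of the existing net/9p transports; the exact field layout and the
p9_pci_create() callback are assumptions for illustration, not the posted patch:

    /* Register a "pci" transport with net/9p, in the style of trans_fd.c.
     * p9_pci_create() is a hypothetical callback that would map the shared
     * memory exposed by the 9P_PCI_BACK end and wire up the hypercall kick. */
    static struct p9_trans_module p9_pci_trans = {
            .name    = "pci",
            .maxsize = PAGE_SIZE,
            .def     = 0,
            .create  = p9_pci_create,
    };

    static int __init p9_pci_init(void)
    {
            v9fs_register_trans(&p9_pci_trans);
            return 0;
    }
    module_init(p9_pci_init);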



>That's also in our plans. There was no virtio support in KVM when I
>started working in the transport.
>
>Thanks,
>Lucho
>
>On 8/29/07, Anthony Liguori <[EMAIL PROTECTED]> wrote:
>> I think that it would be nicer to implement the p9 transport on top
of
>> virtio instead of directly on top of PCI.  I think your PCI transport
>> would make a pretty nice start of a PCI virtio transport though.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>> On Tue, 2007-08-28 at 13:52 -0500, Eric Van Hensbergen wrote:
>> > From: Latchesar Ionkov <[EMAIL PROTECTED]>
>> >
>> > This adds a shared memory transport for a synthetic 9p device for
>> > paravirtualized file system support under KVM/QEMU.
>> >
>> > Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]>
>> > Signed-off-by: Eric Van Hensbergen <[EMAIL PROTECTED]>
>> > ---
>> >  Documentation/filesystems/9p.txt |2 +
>> >  net/9p/Kconfig   |   10 ++-
>> >  net/9p/Makefile  |4 +
>> >  net/9p/trans_pci.c   |  295
>++
>> >  4 files changed, 310 insertions(+), 1 deletions(-)
>> >  create mode 100644 net/9p/trans_pci.c
>> >
>> > diff --git a/Documentation/filesystems/9p.txt
>b/Documentation/filesystems/9p.txt
>> > index 1a5f50d..e1879bd 100644
>> > --- a/Documentation/filesystems/9p.txt
>> > +++ b/Documentation/filesystems/9p.txt
>> > @@ -46,6 +46,8 @@ OPTIONS
>> >   tcp  - specifying a normal TCP/IP connection
>> >   fd   - used passed file descriptors for
>connection
>> >  (see rfdno and wfdno)
>> > + pci  - use a PCI pseudo device for 9p
>communication
>> > + over shared memory between a guest
and
>host
>> >
>> >uname=name user name to attempt mount as on the remote server.
>The
>> >   server may override or ignore this value.  Certain
>user
>> > diff --git a/net/9p/Kconfig b/net/9p/Kconfig
>> > index 09566ae..8517560 100644
>> > --- a/net/9p/Kconfig
>> > +++ b/net/9p/Kconfig
>> > @@ -16,13 +16,21 @@ menuconfig NET_9P
>> >  config NET_9P_FD
>> >   depends on NET_9P
>> >   default y if NET_9P
>> > - tristate "9P File Descriptor Transports (Experimental)"
>> > + tristate "9p File Descriptor Transports (Experimental)"
>> >   help
>> > This builds support for file descriptor transports for 9p
>> > which includes support for TCP/IP, named pipes, or passed
>> > file descriptors.  TCP/IP is the default transport for 9p,
>> > so if you are going to use 9p, you'll likely want this.
>> >
>> > +config NET_9P_PCI
>> > + depends on NET_9P
>> > + tristate "9p PCI Shared Memory Transport (Experimental)"
>> > + help
>> > +   This builds support for a PCI psuedo-device currently
>available
>> > +   under KVM/QEMU which allows for 9p transactions over shared
>> > +   memory between the guest and the host.
>> > +
>> >  config NET_9P_DEBUG
>> >   bool "Debug information"
>> >   depends on NET_9P
>> > diff --git a/net/9p/Makefile b/net/9p/Makefile
>> > index 7b2a67a..26ce89d 100644
>> > --- a/net/9p/Makefile
>> > +++ b/net/9p/Makefile
>> > @@ -1,5 +1,6 @@
>> >  obj-$(CONFIG_NET_9P) := 9pnet.o
>> >  obj-$(CONFIG_NET_9P_FD) += 9pnet_fd.o
>> > +obj-$(CONFIG_NET_9P_PCI) += 9pnet_pci.o
>> >
>> >  9pnet-objs := \
>> >   mod.o \
>> > @@ -14,3 +15,6 @@ obj-$(CONFIG_NET_9P_FD) += 9pnet_fd.o
>> >
>> >  9pnet_fd-objs := \
>> >   trans_fd.o \
>> > +
>> > +9pnet_pci-objs := \
>> > + trans_pci.o \
>> > diff --git a/net/9p/trans_pci.c b/net/9p/trans_pci.c
>> > new file mode 100644
>> > index 000..36ddc5f
>> > --- /dev/null
>> > +++ b/net/9p/trans_pci.c
>> > @@ -0,0 +1,295 @@
>> > +/*
>> > + * net/9p/trans_pci.c
>> > + *
>> > + * 9P over PCI transport layer. For use with KVM/QEMU.
>> > + *
>> > + *  Copyright (C) 2007 by Latchesar Ionkov <[EMAIL PROTECTED]>
>> > + *
>> > + *  This program is free software; you can redistribute it and/or
>modify
>> > + *  it under the terms of the GNU General Public License version 2
>> > + *  as published by the Free Software Foundation.
>> > + *
>> > + *  This program is distributed in the hope that it will be
useful,
>> > + *  but WITHOUT ANY WARRANTY; without even the implied 


RE: [Lguest] [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport

2007-08-28 Thread Dor Laor
>>
>> Nice driver. I'm hoping we can do a virtio driver using a similar
>> concept.
>>
>> > +#define PCI_VENDOR_ID_9P 0x5002
>> > +#define PCI_DEVICE_ID_9P 0x000D
>>
>> Where do these numbers come from? Can we be sure they don't conflict
>with
>> actual hardware?
>
>I stole the VENDOR_ID from kvm's hypercall driver. There are no any
>guarantees that it doesn't conflict with actual hardware. As it was
>discussed before, there is still no ID assigned for the virtual
>devices.


Currently, 0x5002 is not registered to Qumranet or KVM.
We will do something about it pretty soon.
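
For reference, a match table using these provisional values would look like the
sketch below (PCI_DEVICE() is the standard helper macro; the table name is made
up for illustration):

    /* Sketch: match the provisional 0x5002/0x000D ids discussed above. */
    static const struct pci_device_id p9_pci_ids[] = {
            { PCI_DEVICE(PCI_VENDOR_ID_9P, PCI_DEVICE_ID_9P) },
            { 0 },
    };
    MODULE_DEVICE_TABLE(pci, p9_pci_ids);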


RE: [kvm-devel] [RFC] 9p: add KVM/QEMU pci transport

2007-08-28 Thread Dor Laor
>> > This adds a shared memory transport for a synthetic 9p device for
>> > paravirtualized file system support under KVM/QEMU.
>>
>> Nice driver. I'm hoping we can do a virtio driver using a similar
>> concept.
>>
>
>Yes.  I'm looking at the patches from Dor now, it should be pretty
>straight forward.  The PCI is interesting in its own right for other
>(non-virtual) projects we've been playing with
>
> -eric

Great, we can add lots of pci bus shared functionality into the
kvm_pci_bus.c
--Dor



RE: [kvm-devel] [RFC] Deferred interrupt handling.

2007-07-18 Thread Dor Laor
>> >>   Guest0  -   blocked on I/O
>> >>
>> >>   IRQ14 from your hardware
>> >>   Block IRQ14
>> >>   Sent to guest (guest is blocked)
>> >>
>> >>   IRQ14 from hard disk
>> >>   Ignored (as blocked)
>> >>
>>
>>
>> But now the timer will pop and the hard disk will get its irq.
>> The guest will be released right after.
>
>How do you plan to do this ? If you unmask the interrupt then it will
>immediately jam solid with IRQs from your hardware and the line will be
>disabled.

Hopefully it should work like the following [please correct me if I'm wrong]:
- Make the device the last irqaction in the list.
- Our dummy handler will always return IRQ_HANDLED in case no other
  previous irqaction did so. It will also arm the timer and mask the irq
  in this case (see the sketch after this message).
  The line is temporarily jammed but not disabled - note_interrupt()
  will not consider our irq unhandled and won't disable it.
  btw, if I'm not mistaken, only after 99,900 bad interrupts out of 100,000
  is the irq disabled.
- If the timer pops before the guest acks the irq, the timer handler will
  ack the irq and unmask it. The timer's job is only to prevent deadlocks.

Maybe it's better to code it first and then send an RFC.
We wanted to get feedback beforehand, to hear opinions and to know whether
to use the deferred option or the irq polarity option. Both of them can
lead to the above deadlock without the timer hack.
Best regards, Dor.
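
A minimal sketch of the dummy-handler-plus-timer idea above.
kvm_forward_irq_to_guest() is a placeholder for however the irq gets injected
into the VM, the modern timer_setup() API is used, and detecting whether an
earlier irqaction already claimed the line is glossed over - illustrative only:

    #include <linux/interrupt.h>
    #include <linux/timer.h>
    #include <linux/jiffies.h>

    static struct timer_list kvm_fwd_timer;
    static int kvm_fwd_irq = -1;

    /* registered last in the irqaction chain for the shared line */
    static irqreturn_t kvm_fwd_handler(int irq, void *dev_id)
    {
            /* mask the line, forward to the guest, arm the safety timer so
             * the line cannot stay blocked if the guest never acks */
            disable_irq_nosync(irq);
            kvm_forward_irq_to_guest(irq);  /* placeholder, not a real API */
            kvm_fwd_irq = irq;
            mod_timer(&kvm_fwd_timer, jiffies + msecs_to_jiffies(1));
            return IRQ_HANDLED;             /* keep note_interrupt() happy */
    }

    /* guest did not ack within 1msec: unmask on its behalf */
    static void kvm_fwd_timeout(struct timer_list *t)
    {
            if (kvm_fwd_irq >= 0)
                    enable_irq(kvm_fwd_irq);
    }

    static int __init kvm_fwd_init(void)
    {
            timer_setup(&kvm_fwd_timer, kvm_fwd_timeout, 0);
            return 0;
    }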




RE: [kvm-devel] [RFC] Deferred interrupt handling.

2007-07-18 Thread Dor Laor
>Alan Cox wrote:
>>> What if we will force the specific device to the end of the list.
>Once
>>> IRQ_NONE was returned by the other devices, we will mask the irq,
>>> forward the irq to the guest, issue a timer for 1msec. Motivation:
>>> 1msec is long enough for the guest to ack the irq + host unmask the
>irq
>>>
>>
>> It makes no difference. The deadlock isn't fixable by timing hacks.
>> Consider the following sequence
>>
>>
>>  Guest0  -   blocked on I/O
>>
>>  IRQ14 from your hardware
>>  Block IRQ14
>>  Sent to guest (guest is blocked)
>>
>>  IRQ14 from hard disk
>>  Ignored (as blocked)
>>


But now the timer will pop and the hard disk will get its irq.
The guest will be released right after.

>>  Deadlock
>>
>
>IMO the only reasonable solution is to disallow interrupt forwarding
>with shared irqs.  If someone later comes up with a bright idea, we can
>implement it.  Otherwise the problem will solve itself with hardware
>moving to msi.
>

I thought of that, but the problem is that we'd like to use it with
current hardware devices that are shared.
:(


RE: [kvm-devel] [RFC] Deferred interrupt handling.

2007-07-18 Thread Dor Laor
>> In particular, this requires interrupt handling to be done by the
>guest --
>> The host shouldn't load the corresponding device driver or otherwise
>access
>> the device. Since the host kernel is not aware of the device
semantics
>it
>> cannot acknowledge the interrupt at the device level.
>
>Tricky indeed.
>
>> As far as the host kernel is concerned the VM is a user level
process.
>We
>> require the ability to forward interrupt handling to such entities.
>The
>> current kernel interrupt handling path doesn't allow deferring
>interrupt
>> handling _and_ acknowledgement.
>
>We don't support this model at all, and it doesn't appear to work
>anyway.
>
>> 0. Adding an IRQ_DEFERRED mechanism to the interrupt handling path.
>ISRs
>> returning IRQ_DEFERRED will keep the interrupt masked until future
>> acknowledge.
>
>Deadlock. If you get an IRQ for a guest and you block the IRQ until the
>guest handles it you may (eg if the IRQ is shared) get priority
>inversion
>with another interrupt source on the same line the guest requires first
>(eg disks and other I/O)

What if we force the specific device to the end of the list? Once
IRQ_NONE was returned by the other devices, we will mask the irq,
forward the irq to the guest, and arm a timer for 1msec. Motivation:
1msec is long enough for the guest to ack the irq + the host to unmask
the irq + cancel the timer. (ping round-trip for a guest is about 100msec)
If the timer pops, it will unmask irqs + run over the device list to check
whether one of them has a pending irq.

This will solve the deadlock possibility at the small price of potential
latency.

...

>> Any ideas ? Thoughts ?
>
>Mask the interrupt in the main kernel, pass an event of some kind to
the
>guest. You can describe most devices from guest to kernel in a safe
form
>as
>
>device, bar, offset, register size, mask, bits to set, bits to clear
>
>(or bits to test when deciding if it is the irq source)
>

The problem is that each device has its own bits, so it cannot be a
general solution.
Besides that, the device driver inside the guest would have to be changed,
because the host already disabled the irq/status for them.


I know the above solution is not neat, but we do want to contribute it.
Any other ideas are welcome,
10x, Dor.



RE: [kvm-devel] [ANNOUNCE] kvm-15 release

2007-02-26 Thread Dor Laor

>Avi Kivity wrote:
>> - new userspace interface (work in progress)
>
>kvmfs in kvm-15 kernel code does not to build with older kernels
(2.6.16
>fails, 2.6.18 works ok), looks like the reason are some changes in
>superblock handling.
>
>Do you intend to fix that?

Did you run 'make sync' under the svn kernel directory?
It uses sed to replace f_path.dentry with the backward-compatible
f_dentry.
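
The substitution amounts to the usual f_path.dentry vs f_dentry compatibility
issue; a hedged sketch of what such a compat define boils down to (the 2.6.20
cut-off is an assumption based on when f_path was introduced, not taken from
the kvm tree):

    /* Illustrative compat shim: new code uses file->f_path.dentry, old
     * kernels only have file->f_dentry. */
    #include <linux/version.h>
    #include <linux/fs.h>

    #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 20)
    #define kvm_file_dentry(file)   ((file)->f_dentry)
    #else
    #define kvm_file_dentry(file)   ((file)->f_path.dentry)
    #endif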

>
>cheers,
>  Gerd
>
>--
>Gerd Hoffmann <[EMAIL PROTECTED]>
>



RE: [kvm-devel] [PATCH 10/13] KVM: Wire up hypercall handlers toa central arch-independent location

2007-02-22 Thread Dor Laor
>
>
>> Somthing else that came up in a conversation with Dor: the need for a
>> clean way to raise a guest interrupt.  The guest may be sleeping in
>> userspace, scheduled out, or running on another cpu (and requiring an
>> ipi to get it out of guest mode).
>
>yeah it'd be nice if I could just call a function for it rather than
>poking into kvm internals ;)
>
>> Right now I'm thinking about using the signal machinery since it
appears
>> to do exactly the right thing.
>
>signals are *expensive* though.
>
>If you design an interrupt interface, it'd rock if you could make it
>such that it is "raise <this> interrupt within <x> milliseconds from
>now", rather than making it mostly synchronous. That way irq mitigation
>becomes part of the interface rather than having to duplicate it all
>over the virtual drivers...

Why do you need to raise an interrupt within a timeout?
I was thinking of just asking for a synchronous, as-fast-as-you-can-get
interrupt. If you need an interrupt that should pop within some
milliseconds, you can set a timer.

>
>
>
>--
>if you want to mail me at work (you don't), use arjan (at)
linux.intel.com
>Test the interaction between Linux and your BIOS via
>http://www.linuxfirmwarekit.org



RE: [kvm-devel] [PATCH 10/13] KVM: Wire up hypercall handlers to a central arch-independent location

2007-02-22 Thread Dor Laor
>
>Pavel Machek wrote:
>> On Mon 2007-02-19 10:30:52, Avi Kivity wrote:
>>
>>> Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
>>>
>>
>> changelog?
>>
>>
>
>Well, I can't think of anything to add beyond $subject.  The patch adds
>calls from the arch-dependent hypercall handlers to a new arch
>independent function.
>
>
>>> +   switch (nr) {
>>> +   default:
>>> +   ;
>>> +   }
>>>
>>
>> Eh?
>>
>>
>
>No hypercalls defined yet.
>

I have Ingo's network PV hypercalls to commit in my pipeline.
Till then we can just add the test hypercall:

	case __NR_hypercall_test:
		printk(KERN_DEBUG "%s __NR_hypercall_test\n", __FUNCTION__);
		ret = 0x5a5a;
		break;
	default:
		BUG();
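
A hedged guest-side counterpart, only to show the calling convention (hypercall
number in RAX, result back in RAX); __NR_hypercall_test and the 0x5a5a magic
come from the snippet above, while the raw vmcall (Intel-only) and everything
else are assumptions for illustration:

    #include <linux/init.h>
    #include <linux/errno.h>

    /* issue the test hypercall from the guest and check the magic value */
    static inline long test_hypercall0(unsigned int nr)
    {
            long ret;

            asm volatile("vmcall" : "=a"(ret) : "a"(nr) : "memory");
            return ret;
    }

    static int __init hypercall_selftest(void)
    {
            return test_hypercall0(__NR_hypercall_test) == 0x5a5a ? 0 : -ENODEV;
    }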

>
>--
>error compiling committee.c: too many arguments to function
>
>



RE: [PATCH 2.6.20] KVM: Use ARRAY_SIZE macro when appropriate

2007-02-06 Thread Dor Laor
>
>Hi all,
>
>A patch to use ARRAY_SIZE macro already defined in kernel.h
>
>Signed-off-by: Ahmed S. Darwish <[EMAIL PROTECTED]>

Applied, 10x
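
For readers unfamiliar with the macro, the change boils down to the following
pattern (the array here is made up for illustration):

    #include <linux/kernel.h>       /* ARRAY_SIZE() */

    static int pending[16];

    static void clear_pending(void)
    {
            int i;

            /* instead of: i < sizeof(pending) / sizeof(pending[0]) */
            for (i = 0; i < ARRAY_SIZE(pending); i++)
                    pending[i] = 0;
    }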



RE: [kvm-devel] kvm & dyntick

2007-01-12 Thread Dor Laor
>* Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>> > dyntick-enabled guest:
>> > - reduce the load on the host when the guest is idling
>> >   (currently an idle guest consumes a few percent cpu)
>>
>> yeah. KVM under -rt already works with dynticks enabled on both the
>> host and the guest. (but it's more optimal to use a dedicated
>> hypercall to set the next guest-interrupt)
>
>using the dynticks code from the -rt kernel makes the overhead of an
>idle guest go down by a factor of 10-15:
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 2556 mingo 15   0  598m 159m 157m R  1.5  8.0   0:26.20 qemu
>
>( for this to work on my system i have added a 'hyper' clocksource
>  hypercall API for KVM guests to use - this is needed instead of the
>  running-to-slowly TSC. )
>
>   Ingo

This is great news for PV guests.

Nevertheless, we still need to improve our fully virtualized guest support.
First we need a mechanism (can we use the timeout_granularity?) to
dynamically change the host timer frequency, so we can support guests
at 100hz that dynamically change their frequency to 1000hz and back.

Afterwards we'll need to compensate for the lost alarm signals to the guests
by using one of (see the sketch after this message):
 - hrtimers to inject the lost interrupts for specific guests. The
   problem is this will increase the overall load.
 - Injecting several virtual irqs into the guests one after another (using
   the interrupt window exit). The question is how the guest will be
   affected by this unfair behavior.

Can dyntick help HVMs? Will the answer be the same for guest-dense
hosts? I understood that the main gain of dyn-tick is for idle time.
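
A minimal sketch of the hrtimer option, assuming a placeholder
kvm_inject_guest_timer_irq() for the actual injection path:

    #include <linux/hrtimer.h>
    #include <linux/ktime.h>

    static struct hrtimer guest_tick_timer;
    static ktime_t guest_tick_period;       /* e.g. 1msec for a 1000hz guest */

    static enum hrtimer_restart guest_tick_fn(struct hrtimer *t)
    {
            kvm_inject_guest_timer_irq();   /* placeholder, not a real API */
            hrtimer_forward_now(t, guest_tick_period);
            return HRTIMER_RESTART;
    }

    static void guest_tick_start(unsigned int hz)
    {
            guest_tick_period = ktime_set(0, NSEC_PER_SEC / hz);
            hrtimer_init(&guest_tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
            guest_tick_timer.function = guest_tick_fn;
            hrtimer_start(&guest_tick_timer, guest_tick_period, HRTIMER_MODE_REL);
    }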



RE: open /dev/kvm: No such file or directory

2006-12-28 Thread Dor Laor


>On linux-2.6.20-rc2, "modprobe kvm-intel" loaded the module
>successfully, but running qemu returns an error ...
>
>/usr/local/kvm/bin/qemu -hda vdisk.img -cdrom cd.iso -boot d -m 128
>open /dev/kvm: No such file or directory
>Could not initialize KVM, will disable KVM support

Are you sure the kvm_intel & kvm modules are loaded?
Maybe your BIOS does not support virtualization.
Please check your dmesg.

>
>/dev/kvm does not exist should I create this before running qemu?
>If so, what's the parameters to "mknod"?

It's a dynamic misc device, you don't need to create it.
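
For reference, this is roughly how a dynamic misc device such as /dev/kvm gets
registered - the minor number is assigned at load time and udev creates the
device node, which is why no mknod is needed (sketch, not a verbatim copy of
kvm_main.c):

    #include <linux/fs.h>
    #include <linux/miscdevice.h>
    #include <linux/module.h>

    static const struct file_operations kvm_chardev_ops = {
            .owner = THIS_MODULE,
            /* .unlocked_ioctl and friends omitted in this sketch */
    };

    static struct miscdevice kvm_dev = {
            .minor = MISC_DYNAMIC_MINOR,    /* dynamic minor: node appears on load */
            .name  = "kvm",
            .fops  = &kvm_chardev_ops,
    };

    static int __init kvm_dev_init(void)
    {
            return misc_register(&kvm_dev);
    }
    module_init(kvm_dev_init);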

>
>
>Thanks,
>Jeff.

