Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Marcelo,

>>> That is, you can't feed distinct instances of guest kernel trace.
>>
>> I'm not clear on "distinct instances". Is this about SMP or multiple
>> guests? Would you explain?
>
> Distinct boot instances. If the guest reboots, the TSC can be written to.

OK, I understand. I'll resend a patch series for this feature.

Thanks!

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
On Fri, Nov 30, 2012 at 10:36:43AM +0900, Yoshihiro YUNOMAE wrote:
> Hi Marcelo,
>
> >>> Can you please write a succinct but complete description of the
> >>> method so it can be verified?
> >>
> >> Sure.
> >>
> >> - Prerequisites
> >> 1. The host TSC is synchronized and stable.
> >> 2. kvm_write_tsc_offset events include the previous and next TSC
> >>    offset values.
> >> 3. Every event's trace_clock is TSC.
> >>
> >> - Assumptions (for the sake of simplicity)
> >> 1. One VCPU.
> >> 2. The guest executes write_tsc at most once.
> >>
> >> - Configuration
> >> 1. On the host, kvm_exit/entry events are recorded in buffer A and
> >>    kvm_write_tsc_offset events are recorded in buffer B.
> >> 2. Boot a guest.
> >>
> >> - Sort method
> >> 1. Check whether any kvm_write_tsc_offset events were recorded other
> >>    than at boot. Note that a VCPU thread writes the TSC offset at
> >>    boot as an initial operation.
> >>
> >> 1-1. If none were recorded, the guest did not execute write_tsc, so
> >>    we convert the guest TSC to the host TSC using the boot-time TSC
> >>    offset. => END
> >>
> >> 1-2. If one was recorded, the guest executed write_tsc, so we use
> >>    the new kvm_write_tsc_offset event's information.
> >>
> >> 2. We convert the host TSC (Th) of the kvm_write_tsc_offset event to
> >>    the guest TSC (Tg) using the previous_tsc_offset value:
> >>        Tg = Th + previous_tsc_offset
> >>
> >> 3. To find the point where the guest executed write_tsc, we search,
> >>    starting from the oldest guest events, for an "n" which satisfies
> >>    Tn < Tg < Tn+1 or Tn+1 < Tn < Tg. The former condition means the
> >>    guest trace data were recorded monotonically; the latter means
> >>    the guest trace data jumped backward.
> >>
> >> 4. We convert the guest TSC of the trace data to the host TSC using
> >>    the previous_tsc_offset value up to "n" and the next_tsc_offset
> >>    value from "n+1" on. => END
> >>
> >> - Note
> >> We assumed one VCPU and at most one write_tsc for the sake of
> >> simplicity. For other conditions, we will use a similar method.
> >>
> >> Thanks,
> >
> > There is no certainty. Consider the following information:
> >
> > guest trace    host trace
> >                100: guest_tsc_write (tsc_offset=-100 => guest_tsc = 0)
> >                104: guest_tsc_write (tsc_offset=-104 => guest_tsc = 0)
> >                108: guest_tsc_write (tsc_offset=-108 => guest_tsc = 0)
> > 1: eventA
> > 2: eventB
> > 3: eventC
> > 1: eventD
> > 2: eventE
> > 3: eventF
> >
> > How can you tell which tsc_offset to use for eventD? It could be
> > either -104 or -108. The notion of "next_tsc_offset" is subject to
> > this issue.
> >
> > I suppose it's fine to restrict the interface as follows: the tool
> > will accept one trace of events with monotonic timestamps, and the
> > user is responsible for correlating that to a host trace.
>
> OK, I'll add the restriction that trace data must have monotonic
> timestamps to use this feature. I think guests will seldom execute
> write_tsc, so in many cases timestamps will be recorded monotonically
> in the trace data.
>
> > That is, you can't feed distinct instances of guest kernel trace.
>
> I'm not clear on "distinct instances". Is this about SMP or multiple
> guests? Would you explain?

Distinct boot instances. If the guest reboots, the TSC can be written to.
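Under the restriction proposed above (one guest trace with monotonic
timestamps, correlated to a host trace by the user), rebasing reduces to
applying a single TSC offset. A minimal sketch in Python, with
hypothetical event-tuple shapes and function name (the real traces come
from ftrace buffers; this is not the actual tooling):

```python
import heapq

def merge_guest_into_host(guest_events, host_events, tsc_offset):
    """Merge one monotonic guest trace into a host trace.

    guest_events: [(guest_tsc, name)], monotonic (the restriction above)
    host_events:  [(host_tsc, name)]
    tsc_offset:   value from the boot-time kvm_write_tsc_offset event

    KVM defines guest_tsc = host_tsc + tsc_offset, so converting a
    guest timestamp back to the host timebase is
    host_tsc = guest_tsc - tsc_offset.
    """
    rebased = [(t - tsc_offset, name) for t, name in guest_events]
    return list(heapq.merge(sorted(host_events), rebased))
```

With the forward-case numbers from later in this thread (offset -2950),
eventX at guest TSC 53 rebases to host TSC 3003, which lands between
kvm_enter at 3001 and kvm_exit at 3060 as expected.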
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Marcelo,

>>> Can you please write a succinct but complete description of the
>>> method so it can be verified?
>>
>> Sure.
>>
>> - Prerequisites
>> 1. The host TSC is synchronized and stable.
>> 2. kvm_write_tsc_offset events include the previous and next TSC
>>    offset values.
>> 3. Every event's trace_clock is TSC.
>>
>> - Assumptions (for the sake of simplicity)
>> 1. One VCPU.
>> 2. The guest executes write_tsc at most once.
>>
>> - Configuration
>> 1. On the host, kvm_exit/entry events are recorded in buffer A and
>>    kvm_write_tsc_offset events are recorded in buffer B.
>> 2. Boot a guest.
>>
>> - Sort method
>> 1. Check whether any kvm_write_tsc_offset events were recorded other
>>    than at boot. Note that a VCPU thread writes the TSC offset at
>>    boot as an initial operation.
>>
>> 1-1. If none were recorded, the guest did not execute write_tsc, so
>>    we convert the guest TSC to the host TSC using the boot-time TSC
>>    offset. => END
>>
>> 1-2. If one was recorded, the guest executed write_tsc, so we use the
>>    new kvm_write_tsc_offset event's information.
>>
>> 2. We convert the host TSC (Th) of the kvm_write_tsc_offset event to
>>    the guest TSC (Tg) using the previous_tsc_offset value:
>>        Tg = Th + previous_tsc_offset
>>
>> 3. To find the point where the guest executed write_tsc, we search,
>>    starting from the oldest guest events, for an "n" which satisfies
>>    Tn < Tg < Tn+1 or Tn+1 < Tn < Tg. The former condition means the
>>    guest trace data were recorded monotonically; the latter means the
>>    guest trace data jumped backward.
>>
>> 4. We convert the guest TSC of the trace data to the host TSC using
>>    the previous_tsc_offset value up to "n" and the next_tsc_offset
>>    value from "n+1" on. => END
>>
>> - Note
>> We assumed one VCPU and at most one write_tsc for the sake of
>> simplicity. For other conditions, we will use a similar method.
>>
>> Thanks,
>
> There is no certainty.
> Consider the following information:
>
> guest trace    host trace
>                100: guest_tsc_write (tsc_offset=-100 => guest_tsc = 0)
>                104: guest_tsc_write (tsc_offset=-104 => guest_tsc = 0)
>                108: guest_tsc_write (tsc_offset=-108 => guest_tsc = 0)
> 1: eventA
> 2: eventB
> 3: eventC
> 1: eventD
> 2: eventE
> 3: eventF
>
> How can you tell which tsc_offset to use for eventD? It could be
> either -104 or -108. The notion of "next_tsc_offset" is subject to
> this issue.
>
> I suppose it's fine to restrict the interface as follows: the tool
> will accept one trace of events with monotonic timestamps, and the
> user is responsible for correlating that to a host trace.

OK, I'll add the restriction that trace data must have monotonic
timestamps to use this feature. I think guests will seldom execute
write_tsc, so in many cases timestamps will be recorded monotonically
in the trace data.

> That is, you can't feed distinct instances of guest kernel trace.

I'm not clear on "distinct instances". Is this about SMP or multiple
guests? Would you explain?

Thanks,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com
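The sort method quoted above (steps 2-4) can be sketched in Python.
This is a hedged illustration rather than the actual tooling: the event
tuples, the offset-event shape, and the function name are all
hypothetical, and it assumes the stated single-VCPU, at-most-one-
write_tsc case:

```python
def rebase_with_one_write_tsc(guest_events, offset_event):
    """Rebase a guest trace onto the host timebase around one write_tsc.

    guest_events: [(guest_tsc, name)] in recorded order
    offset_event: (host_tsc, prev_offset, next_offset) taken from the
                  kvm_write_tsc_offset event in buffer B

    KVM defines guest_tsc = host_tsc + tsc_offset, so rebasing is
    host_tsc = guest_tsc - tsc_offset.
    """
    th, prev_off, next_off = offset_event
    tg = th + prev_off  # step 2: offset-write time seen on the guest clock
    rebased = []
    after_write = False
    prev_t = None
    for t, name in guest_events:
        # step 3: the write_tsc point "n" is where the guest clock
        # passes Tg while still monotonic, or where it jumps backward
        if not after_write and prev_t is not None and \
                (prev_t < tg < t or t < prev_t):
            after_write = True
        # step 4: old offset up to n, new offset from n+1 on
        off = next_off if after_write else prev_off
        rebased.append((t - off, name))
        prev_t = t
    return sorted(rebased)
```

With the forward-jump example from later in the thread (offset changing
from -2950 to -2050 at host TSC 3075), eventX at guest TSC 53 rebases
to host TSC 3003, and event1 at guest TSC 1050 rebases to 3100, after
the kvm_enter at 3080.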
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
On Tue, Nov 27, 2012 at 07:53:47PM +0900, Yoshihiro YUNOMAE wrote:
> Hi Marcelo,
>
> (2012/11/27 8:16), Marcelo Tosatti wrote:
> > On Mon, Nov 26, 2012 at 08:05:10PM +0900, Yoshihiro YUNOMAE wrote:
> > 500h. event tsc_write tsc_offset=-3000
> >
> > Then a guest trace containing events with a TSC timestamp.
> > Which tsc_offset to use?
> >
> > (that is the problem, which unless I am mistaken can only be solved
> > easily if the guest can convert RDTSC -> TSC of host).
>
> There are three cases in which the TSC offset changes:
>  1. The TSC is reset at guest boot time.
>  2. The TSC offset is adjusted due to some problem on the host.
>  3. The guest writes the TSC.
> The scenario you mentioned is case 3, so we'll discuss this case.
> Here, we assume that the guest is allocated a single CPU for the sake
> of simplicity.
>
> If a guest executes write_tsc, the TSC value jumps forward or
> backward. For the forward case, the trace data are as follows:
>
> < guest >                   < host >
> cycles  events              cycles  events
>                             3000    tsc_offset=-2950
>                             3001    kvm_enter
>   53    eventX
>  100    (write_tsc=+900)
>                             3060    kvm_exit
>                             3075    tsc_offset=-2050
>                             3080    kvm_enter
> 1050    event1
> 1055    event2
>  ...
>
> This case is simple. The guest TSC of the first kvm_enter is
> calculated as follows:
>
>   (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51
>
> Similarly, the guest TSC of the second kvm_enter is 130. So the guest
> events between 51 and 130, that is, "53 eventX", are inserted between
> the first pair of kvm_enter and kvm_exit. To insert the guest events
> between 51 and 130, we convert the guest TSC to the host TSC using the
> TSC offset -2950.
>
> For the backward case, the trace data are as follows:
>
> < guest >                   < host >
> cycles  events              cycles  events
>                             3000    tsc_offset=-2950
>                             3001    kvm_enter
>   53    eventX
>  100    (write_tsc=-50)
>                             3060    kvm_exit
>                             3075    tsc_offset=-2050
>                             3080    kvm_enter
>   90    event1
>   95    event2
>  ...
> >>>
> >>>                         3400
> >>>  100    (write_tsc=-50)
> >>>
> >>>   90    event3
> >>>   95    event4
> >>>
> As you say, in this case, the previous method is invalid. When we
> calculate the guest TSC value for the tsc_offset=-3000 event, the
> value is 75 on the guest. This seems to be prior to the write_tsc=-50
> event. So, we need to consider this further.
>
> In this case, what matters is that we can tell where the guest
> executed write_tsc or the host rewrote the TSC offset. write_tsc on
> the guest is wrmsr 0x0010, so this instruction induces a vmexit. This
> implies that the guest does not run while the host changes the TSC
> offset on that CPU; in other words, the guest cannot use the new TSC
> before the host writes the new TSC offset. So, if the timestamps in
> the guest trace are not monotonically increasing, we know the guest
> executed write_tsc. Moreover, from the region where the timestamps
> decrease, we can tell in the guest trace data when the host rewrote
> the TSC offset. Therefore, we can sort the trace data in chronological
> order.
> >>>
> >>> This requires an entire trace of events. That is, to be able to
> >>> reconstruct the timeline you require the entire trace from the
> >>> moment the guest starts, so that you can correlate wrmsr-to-tsc on
> >>> the guest with vmexit-due-to-tsc-write on the host.
> >>>
> >>> Which means that running out of space for the trace buffer equals
> >>> losing the ability to order events.
> >>>
> >>> Is that desirable? It seems cumbersome to me.
> >>
> >> As you say, tracing events can overwrite important events like
> >> kvm_exit/entry or write_tsc_offset. So, Steven's multiple-buffer
> >> work is needed by this feature. Normal events, which hit often,
> >> record to buffer A, and important events, which hit rarely, record
> >> to buffer B. In our case, the important event is write_tsc_offset.
> >>> Also, the need to correlate each write_tsc event in the guest trace
> >>> with a corresponding tsc_offset write in the host trace means that
> >>> it is _necessary_ for the guest and host to enable tracing
> >>> simultaneously. Correct?
> >>>
> >>> Also, there are WRMSR executions in the guest for which there is
> >>> no event in the trace buffer. From SeaBIOS, during boot.
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Marcelo,

(2012/11/27 8:16), Marcelo Tosatti wrote:

On Mon, Nov 26, 2012 at 08:05:10PM +0900, Yoshihiro YUNOMAE wrote:

500h. event tsc_write tsc_offset=-3000

Then a guest trace containing events with a TSC timestamp. Which
tsc_offset to use?

(that is the problem, which unless I am mistaken can only be solved
easily if the guest can convert RDTSC -> TSC of host).

There are three cases in which the TSC offset changes:
 1. The TSC is reset at guest boot time.
 2. The TSC offset is adjusted due to some problem on the host.
 3. The guest writes the TSC.
The scenario you mentioned is case 3, so we'll discuss this case. Here,
we assume that the guest is allocated a single CPU for the sake of
simplicity.

If a guest executes write_tsc, the TSC value jumps forward or backward.
For the forward case, the trace data are as follows:

< guest >                   < host >
cycles  events              cycles  events
                            3000    tsc_offset=-2950
                            3001    kvm_enter
  53    eventX
 100    (write_tsc=+900)
                            3060    kvm_exit
                            3075    tsc_offset=-2050
                            3080    kvm_enter
1050    event1
1055    event2
 ...

This case is simple. The guest TSC of the first kvm_enter is calculated
as follows:

  (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51

Similarly, the guest TSC of the second kvm_enter is 130. So the guest
events between 51 and 130, that is, "53 eventX", are inserted between
the first pair of kvm_enter and kvm_exit. To insert the guest events
between 51 and 130, we convert the guest TSC to the host TSC using the
TSC offset -2950.

For the backward case, the trace data are as follows:

< guest >                   < host >
cycles  events              cycles  events
                            3000    tsc_offset=-2950
                            3001    kvm_enter
  53    eventX
 100    (write_tsc=-50)
                            3060    kvm_exit
                            3075    tsc_offset=-2050
                            3080    kvm_enter
  90    event1
  95    event2
 ...

                            3400
 100    (write_tsc=-50)

  90    event3
  95    event4

As you say, in this case, the previous method is invalid. When we
calculate the guest TSC value for the tsc_offset=-3000 event, the value
is 75 on the guest. This seems to be prior to the write_tsc=-50 event.
So, we need to consider this further.
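The forward-case arithmetic above can be checked directly. KVM's
relation is guest_tsc = host_tsc + tsc_offset; the numbers are the ones
from the example (this is just the worked conversion, not real
tooling):

```python
tsc_offset = -2950  # offset in force before the guest's write_tsc

# Guest-clock time of the host-side kvm_enter events:
first_kvm_enter_guest = 3001 + tsc_offset    # = 51
second_kvm_enter_guest = 3080 + tsc_offset   # = 130

# eventX (guest TSC 53) lies between 51 and 130, so it belongs between
# the first kvm_enter/kvm_exit pair. Rebased to the host clock:
event_x_host = 53 - tsc_offset               # = 3003, between 3001 and 3060
```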
In this case, what matters is that we can tell where the guest executed
write_tsc or the host rewrote the TSC offset. write_tsc on the guest is
wrmsr 0x0010, so this instruction induces a vmexit. This implies that
the guest does not run while the host changes the TSC offset on that
CPU; in other words, the guest cannot use the new TSC before the host
writes the new TSC offset. So, if the timestamps in the guest trace are
not monotonically increasing, we know the guest executed write_tsc.
Moreover, from the region where the timestamps decrease, we can tell in
the guest trace data when the host rewrote the TSC offset. Therefore,
we can sort the trace data in chronological order.

This requires an entire trace of events. That is, to be able to
reconstruct the timeline you require the entire trace from the moment
the guest starts, so that you can correlate wrmsr-to-tsc on the guest
with vmexit-due-to-tsc-write on the host.

Which means that running out of space for the trace buffer equals
losing the ability to order events.

Is that desirable? It seems cumbersome to me.

As you say, tracing events can overwrite important events like
kvm_exit/entry or write_tsc_offset. So, Steven's multiple-buffer work
is needed by this feature. Normal events, which hit often, record to
buffer A, and important events, which hit rarely, record to buffer B.
In our case, the important event is write_tsc_offset.

Also, the need to correlate each write_tsc event in the guest trace
with a corresponding tsc_offset write in the host trace means that it
is _necessary_ for the guest and host to enable tracing simultaneously.
Correct?

Also, there are WRMSR executions in the guest for which there is no
event in the trace buffer. From SeaBIOS, during boot. In that case,
there is no explicit event in the guest trace which you can correlate
with tsc_offset changes on the host side.

I understand what you want to say, but we don't correlate the write_tsc
event and the write_tsc_offset event directly.
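The detection rule stated here - a backward jump in guest timestamps
marks a write_tsc (or a host rewrite of the TSC offset) - is easy to
put in code. A small hypothetical helper; note the forward-jump case
still needs the Tg comparison from the sort method, since a forward
write_tsc leaves the trace monotonic:

```python
def backward_jumps(guest_timestamps):
    """Return the indices where the guest TSC decreases relative to the
    previous event, i.e. the first events recorded after a backward
    write_tsc / TSC-offset rewrite."""
    return [i for i in range(1, len(guest_timestamps))
            if guest_timestamps[i] < guest_timestamps[i - 1]]
```

With the backward-case trace above (guest TSCs 53, 100, 90, 95) this
flags index 2, i.e. event1 is the first event after the TSC write.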
This is because the write_tsc tracepoint (that is, a tracepoint on the
WRMSR instruction) is not provided in the current kernel. So, in the
previous mail (https://lkml.org/lkml/2012/11/22/53), I suggested a
method for which we don't need to prepare a write_tsc tracepoint. In
that method, we enable ftrace before the guest boots, and we need to
keep all write_tsc_offset events in the buffer. If we
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
On Mon, Nov 26, 2012 at 08:05:10PM +0900, Yoshihiro YUNOMAE wrote:
> >>> 500h. event tsc_write tsc_offset=-3000
> >>>
> >>> Then a guest trace containing events with a TSC timestamp.
> >>> Which tsc_offset to use?
> >>>
> >>> (that is the problem, which unless I am mistaken can only be solved
> >>> easily if the guest can convert RDTSC -> TSC of host).
> >>
> >> There are three cases in which the TSC offset changes:
> >>  1. The TSC is reset at guest boot time.
> >>  2. The TSC offset is adjusted due to some problem on the host.
> >>  3. The guest writes the TSC.
> >> The scenario you mentioned is case 3, so we'll discuss this case.
> >> Here, we assume that the guest is allocated a single CPU for the
> >> sake of simplicity.
> >>
> >> If a guest executes write_tsc, the TSC value jumps forward or
> >> backward. For the forward case, the trace data are as follows:
> >>
> >> < guest >                   < host >
> >> cycles  events              cycles  events
> >>                             3000    tsc_offset=-2950
> >>                             3001    kvm_enter
> >>   53    eventX
> >>  100    (write_tsc=+900)
> >>                             3060    kvm_exit
> >>                             3075    tsc_offset=-2050
> >>                             3080    kvm_enter
> >> 1050    event1
> >> 1055    event2
> >>  ...
> >>
> >> This case is simple. The guest TSC of the first kvm_enter is
> >> calculated as follows:
> >>
> >>   (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51
> >>
> >> Similarly, the guest TSC of the second kvm_enter is 130. So the
> >> guest events between 51 and 130, that is, "53 eventX", are inserted
> >> between the first pair of kvm_enter and kvm_exit. To insert the
> >> guest events between 51 and 130, we convert the guest TSC to the
> >> host TSC using the TSC offset -2950.
> >>
> >> For the backward case, the trace data are as follows:
> >>
> >> < guest >                   < host >
> >> cycles  events              cycles  events
> >>                             3000    tsc_offset=-2950
> >>                             3001    kvm_enter
> >>   53    eventX
> >>  100    (write_tsc=-50)
> >>                             3060    kvm_exit
> >>                             3075    tsc_offset=-2050
> >>                             3080    kvm_enter
> >>   90    event1
> >>   95    event2
> >>  ...
> >
> >                             3400
> >  100    (write_tsc=-50)
> >
> >   90    event3
> >   95    event4
> >
> >> As you say, in this case, the previous method is invalid. When we
> >> calculate the guest TSC value for the tsc_offset=-3000 event, the
> >> value is 75 on the guest. This seems to be prior to the
> >> write_tsc=-50 event. So, we need to consider this further.
> >>
> >> In this case, what matters is that we can tell where the guest
> >> executed write_tsc or the host rewrote the TSC offset. write_tsc on
> >> the guest is wrmsr 0x0010, so this instruction induces a vmexit.
> >> This implies that the guest does not run while the host changes the
> >> TSC offset on that CPU; in other words, the guest cannot use the new
> >> TSC before the host writes the new TSC offset. So, if the timestamps
> >> in the guest trace are not monotonically increasing, we know the
> >> guest executed write_tsc. Moreover, from the region where the
> >> timestamps decrease, we can tell in the guest trace data when the
> >> host rewrote the TSC offset. Therefore, we can sort the trace data
> >> in chronological order.
> >
> > This requires an entire trace of events. That is, to be able to
> > reconstruct the timeline you require the entire trace from the moment
> > the guest starts, so that you can correlate wrmsr-to-tsc on the guest
> > with vmexit-due-to-tsc-write on the host.
> >
> > Which means that running out of space for the trace buffer equals
> > losing the ability to order events.
> >
> > Is that desirable? It seems cumbersome to me.
>
> As you say, tracing events can overwrite important events like
> kvm_exit/entry or write_tsc_offset. So, Steven's multiple-buffer work
> is needed by this feature. Normal events, which hit often, record to
> buffer A, and important events, which hit rarely, record to buffer B.
> In our case, the important event is write_tsc_offset.
> > Also the need to correlate each write_tsc event in the guest trace
> > with a corresponding tsc_offset write in the host trace means that it
> > is _necessary_ for the guest and host to enable tracing
> > simultaneously. Correct?
> >
> > Also, there are WRMSR executions in the guest for which there is no
> > event in the trace buffer. From SeaBIOS, during boot.
> > In that case, there is no explicit event in the guest trace which you
> > can correlate with tsc_offset changes on the host side.
>
> I understand what you want to say, but we don't correlate the
> write_tsc event and the write_tsc_offset event directly. This is
> because the write_tsc tracepoint (also the WRMSR instruction) is not
> prepared in the current kernel.
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Marcelo,

(2012/11/24 7:46), Marcelo Tosatti wrote:

On Thu, Nov 22, 2012 at 02:21:20PM +0900, Yoshihiro YUNOMAE wrote:

Hi Marcelo,

(2012/11/21 7:51), Marcelo Tosatti wrote:

On Tue, Nov 20, 2012 at 07:36:33PM +0900, Yoshihiro YUNOMAE wrote:

Hi Marcelo,

Sorry for the late reply.

(2012/11/17 4:15), Marcelo Tosatti wrote:

On Wed, Nov 14, 2012 at 05:26:10PM +0900, Yoshihiro YUNOMAE wrote:

Thank you for commenting on my patch set.

(2012/11/14 11:31), Steven Rostedt wrote:

On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:

On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:

On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:

To merge the data like the previous pattern, we apply this patch set.
Then, we can get the TSC offset of the guest as follows:

$ dmesg | grep kvm
[   57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
                      |                      |                       |
                     PID                TSC offset          HOST TSC value

Using printk to export something like this is IMO a nasty hack.

Can't we create a /sys or /proc file to export the same thing?

Since the value changes over the course of the trace, and seems to be
part of the context of the trace, I think I'd include it as a
tracepoint.

I'm fine with that too.

Using some tracepoint is a nice idea, but there is one problem. Here,
our discussion point is "the event in which the TSC offset is changed
does not occur frequently, but the buffer must keep the event data."

There are two ideas for using a tracepoint. First, we define a new
tracepoint for the changed TSC offset. This is simple and the overhead
will be low. However, this trace event stored in the buffer will be
overwritten by other trace events because this TSC offset event does
not occur frequently. Second, we add TSC offset information to a
tracepoint that occurs frequently. For example, we assume that TSC
offset information is added to the arguments of trace_kvm_exit().

The TSC offset is in the host trace.
So given a host trace with two TSC offset updates, how do you know
which events in the guest trace (containing a number of events) refer
to which TSC offset update?

Unless I am missing something, you can't solve this easily (well,
except by exporting information to the guest that allows it to
transform RDTSC -> host TSC value, which can be done via pvclock).

As you say, TSC offset events are in the host trace, but we don't need
to notify guests of TSC offset updates. The offset event will output
the next TSC offset value and the current TSC value, so we can
calculate the guest TSC (T1) for the event. Guest TSCs since T1 can be
converted to host TSCs using the TSC offset, so we can integrate those
trace data.

Think of this scenario:

host trace
1h.   event tsc_write tsc_offset=-1000
3h.   vmenter
4h.   vmexit
...   (event sequence)
99h.  vmexit
100h. event tsc_write tsc_offset=-2000
101h. vmenter
...   (event sequence)
500h. event tsc_write tsc_offset=-3000

Then a guest trace containing events with a TSC timestamp. Which
tsc_offset to use?

(that is the problem, which unless I am mistaken can only be solved
easily if the guest can convert RDTSC -> TSC of host).

There are three cases in which the TSC offset changes:
 1. The TSC is reset at guest boot time.
 2. The TSC offset is adjusted due to some problem on the host.
 3. The guest writes the TSC.
The scenario you mentioned is case 3, so we'll discuss this case. Here,
we assume that the guest is allocated a single CPU for the sake of
simplicity.

If a guest executes write_tsc, the TSC value jumps forward or backward.
For the forward case, the trace data are as follows:

< guest >                   < host >
cycles  events              cycles  events
                            3000    tsc_offset=-2950
                            3001    kvm_enter
  53    eventX
 100    (write_tsc=+900)
                            3060    kvm_exit
                            3075    tsc_offset=-2050
                            3080    kvm_enter
1050    event1
1055    event2
 ...

This case is simple. The guest TSC of the first kvm_enter is calculated
as follows:

  (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51

Similarly, the guest TSC of the second kvm_enter is 130.
So the guest events between 51 and 130, that is, "53 eventX", are
inserted between the first pair of kvm_enter and kvm_exit. To insert
the guest events between 51 and 130, we convert the guest TSC to the
host TSC using the TSC offset -2950.

For the backward case, the trace data are as follows:

< guest >                   < host >
cycles  events              cycles  events
                            3000    tsc_offset=-2950
                            3001    kvm_enter
  53    eventX
 100    (write_tsc=-50)
                            3060    kvm_exit
                            3075    tsc
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
On Thu, Nov 22, 2012 at 02:21:20PM +0900, Yoshihiro YUNOMAE wrote:
> Hi Marcelo,
>
> (2012/11/21 7:51), Marcelo Tosatti wrote:
> > On Tue, Nov 20, 2012 at 07:36:33PM +0900, Yoshihiro YUNOMAE wrote:
> >> Hi Marcelo,
> >>
> >> Sorry for the late reply.
> >>
> >> (2012/11/17 4:15), Marcelo Tosatti wrote:
> >>> On Wed, Nov 14, 2012 at 05:26:10PM +0900, Yoshihiro YUNOMAE wrote:
> >>>> Thank you for commenting on my patch set.
> >>>>
> >>>> (2012/11/14 11:31), Steven Rostedt wrote:
> >>>>> On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:
> >>>>>> On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:
> >>>>>>> On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:
> >>>>>>>
> >>>>>>>> To merge the data like the previous pattern, we apply this
> >>>>>>>> patch set. Then, we can get the TSC offset of the guest as
> >>>>>>>> follows:
> >>>>>>>>
> >>>>>>>> $ dmesg | grep kvm
> >>>>>>>> [   57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
> >>>>>>>>                       |                      |                       |
> >>>>>>>>                      PID                TSC offset          HOST TSC value
> >>>>>>>
> >>>>>>> Using printk to export something like this is IMO a nasty hack.
> >>>>>>>
> >>>>>>> Can't we create a /sys or /proc file to export the same thing?
> >>>>>>
> >>>>>> Since the value changes over the course of the trace, and seems
> >>>>>> to be part of the context of the trace, I think I'd include it
> >>>>>> as a tracepoint.
> >>>>>
> >>>>> I'm fine with that too.
> >>>>
> >>>> Using some tracepoint is a nice idea, but there is one problem.
> >>>> Here, our discussion point is "the event in which the TSC offset
> >>>> is changed does not occur frequently, but the buffer must keep the
> >>>> event data."
> >>>>
> >>>> There are two ideas for using a tracepoint. First, we define a new
> >>>> tracepoint for the changed TSC offset. This is simple and the
> >>>> overhead will be low. However, this trace event stored in the
> >>>> buffer will be overwritten by other trace events because this TSC
> >>>> offset event does not occur frequently. Second, we add TSC offset
> >>>> information to a tracepoint that occurs frequently. For example,
> >>>> we assume that TSC offset information is added to the arguments of
> >>>> trace_kvm_exit().
> >>>
> >>> The TSC offset is in the host trace. So given a host trace with two
> >>> TSC offset updates, how do you know which events in the guest trace
> >>> (containing a number of events) refer to which TSC offset update?
> >>>
> >>> Unless I am missing something, you can't solve this easily (well,
> >>> except by exporting information to the guest that allows it to
> >>> transform RDTSC -> host TSC value, which can be done via pvclock).
> >>
> >> As you say, TSC offset events are in the host trace, but we don't
> >> need to notify guests of TSC offset updates. The offset event will
> >> output the next TSC offset value and the current TSC value, so we
> >> can calculate the guest TSC (T1) for the event. Guest TSCs since T1
> >> can be converted to host TSCs using the TSC offset, so we can
> >> integrate those trace data.
> >
> > Think of this scenario:
> >
> > host trace
> > 1h.   event tsc_write tsc_offset=-1000
> > 3h.   vmenter
> > 4h.   vmexit
> > ...   (event sequence)
> > 99h.  vmexit
> > 100h. event tsc_write tsc_offset=-2000
> > 101h. vmenter
> > ...   (event sequence)
> > 500h. event tsc_write tsc_offset=-3000
> >
> > Then a guest trace containing events with a TSC timestamp.
> > Which tsc_offset to use?
> >
> > (that is the problem, which unless I am mistaken can only be solved
> > easily if the guest can convert RDTSC -> TSC of host).
>
> There are three cases in which the TSC offset changes:
>  1. The TSC is reset at guest boot time.
>  2. The TSC offset is adjusted due to some problem on the host.
>  3. The guest writes the TSC.
> The scenario you mentioned is case 3, so we'll discuss this case.
> Here, we assume that the guest is allocated a single CPU for the sake
> of simplicity.
>
> If a guest executes write_tsc, the TSC value jumps forward or
> backward. For the forward case, the trace data are as follows:
>
> < guest >                   < host >
> cycles  events              cycles  events
>                             3000    tsc_offset=-2950
>                             3001    kvm_enter
>   53    eventX
>  100    (write_tsc=+900)
>                             3060    kvm_exit
>                             3075    tsc_offset=-2050
>                             3080    kvm_enter
> 1050    event1
> 1055    event2
>  ...
>
> This case is simple. The guest TSC of the first kvm_enter is
> calculated as follows:
>
>   (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51
>
> Similarly, the guest TSC of the second kvm_enter is 130. So the guest
> events between 51 and 130, that is, "53 eventX", are inserted between
> the first pair of kvm_enter and kvm_exit.
Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Marcelo,

(2012/11/21 7:51), Marcelo Tosatti wrote:

On Tue, Nov 20, 2012 at 07:36:33PM +0900, Yoshihiro YUNOMAE wrote:

Hi Marcelo,

Sorry for the late reply.

(2012/11/17 4:15), Marcelo Tosatti wrote:

On Wed, Nov 14, 2012 at 05:26:10PM +0900, Yoshihiro YUNOMAE wrote:

Thank you for commenting on my patch set.

(2012/11/14 11:31), Steven Rostedt wrote:

On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:

On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:

On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:

To merge the data like the previous pattern, we apply this patch set.
Then, we can get the TSC offset of the guest as follows:

$ dmesg | grep kvm
[   57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
                      |                      |                       |
                     PID                TSC offset          HOST TSC value

Using printk to export something like this is IMO a nasty hack.

Can't we create a /sys or /proc file to export the same thing?

Since the value changes over the course of the trace, and seems to be
part of the context of the trace, I think I'd include it as a
tracepoint.

I'm fine with that too.

Using some tracepoint is a nice idea, but there is one problem. Here,
our discussion point is "the event in which the TSC offset is changed
does not occur frequently, but the buffer must keep the event data."

There are two ideas for using a tracepoint. First, we define a new
tracepoint for the changed TSC offset. This is simple and the overhead
will be low. However, this trace event stored in the buffer will be
overwritten by other trace events because this TSC offset event does
not occur frequently. Second, we add TSC offset information to a
tracepoint that occurs frequently. For example, we assume that TSC
offset information is added to the arguments of trace_kvm_exit().

The TSC offset is in the host trace. So given a host trace with two TSC
offset updates, how do you know which events in the guest trace
(containing a number of events) refer to which TSC offset update?
Unless I am missing something, you can't solve this easily (well,
except by exporting information to the guest that allows it to
transform RDTSC -> host TSC value, which can be done via pvclock).

As you say, TSC offset events are in the host trace, but we don't need
to notify guests of TSC offset updates. The offset event will output
the next TSC offset value and the current TSC value, so we can
calculate the guest TSC (T1) for the event. Guest TSCs since T1 can be
converted to host TSCs using the TSC offset, so we can integrate those
trace data.

Think of this scenario:

host trace
1h.   event tsc_write tsc_offset=-1000
3h.   vmenter
4h.   vmexit
...   (event sequence)
99h.  vmexit
100h. event tsc_write tsc_offset=-2000
101h. vmenter
...   (event sequence)
500h. event tsc_write tsc_offset=-3000

Then a guest trace containing events with a TSC timestamp. Which
tsc_offset to use?

(that is the problem, which unless I am mistaken can only be solved
easily if the guest can convert RDTSC -> TSC of host).

There are three cases in which the TSC offset changes:
 1. The TSC is reset at guest boot time.
 2. The TSC offset is adjusted due to some problem on the host.
 3. The guest writes the TSC.
The scenario you mentioned is case 3, so we'll discuss this case. Here,
we assume that the guest is allocated a single CPU for the sake of
simplicity.

If a guest executes write_tsc, the TSC value jumps forward or backward.
For the forward case, the trace data are as follows:

< guest >                   < host >
cycles  events              cycles  events
                            3000    tsc_offset=-2950
                            3001    kvm_enter
  53    eventX
 100    (write_tsc=+900)
                            3060    kvm_exit
                            3075    tsc_offset=-2050
                            3080    kvm_enter
1050    event1
1055    event2
 ...

This case is simple. The guest TSC of the first kvm_enter is calculated
as follows:

  (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51

Similarly, the guest TSC of the second kvm_enter is 130. So the guest
events between 51 and 130, that is, "53 eventX", are inserted between
the first pair of kvm_enter and kvm_exit.
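The T1 calculation described here - the offset event carries the
current host TSC and the next offset, from which the guest-time start
of each offset's validity follows - can be sketched as follows. The
tuple shapes and function name are hypothetical, not the real trace
format:

```python
def offset_validity_in_guest_time(offset_events):
    """For each kvm_write_tsc_offset event, compute T1, the guest TSC
    at which the new offset takes effect (guest_tsc = host_tsc + offset).

    offset_events: [(host_tsc, new_offset)] in host order
    returns:       [(t1_guest, new_offset)]
    """
    return [(host_tsc + off, off) for host_tsc, off in offset_events]
```

Using the forward-case numbers from this thread, an offset write of
-2050 at host TSC 3075 yields T1 = 1025 on the guest clock, so guest
events from 1025 on (event1 at 1050, event2 at 1055) convert with the
new offset.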
To insert the guest events between 51 and 130, we convert the guest TSC to
the host TSC using the TSC offset (adding 2950).

For the backward case, trace data are as follows:

< host >                     < guest >
cycles  events               cycles  events
3000    tsc_offset=-2950
3001    kvm_enter
                               53    eventX
                              100    (write_tsc=-50)
3060    kvm_exit
3075    tsc_offset=-3000
3080    kvm_enter
                               90    event1
                               95    event2
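For the backward case, the write_tsc point can be located by scanning the
guest trace for a timestamp that moves backward (T[n+1] < T[n]), per the
sort method discussed in this thread. A hypothetical sketch, under the
assumption that write_tsc=-50 moves the offset from -2950 to -3000
(-2950 - 50), consistent with the table above:

```python
# Sketch: locate the write_tsc point in a guest trace by finding the
# first backward jump in timestamps, then convert guest timestamps to
# host time using the previous offset before that point and the next
# offset after it (host = guest - tsc_offset).

def find_write_tsc(timestamps):
    """Return the index of the first event after a backward jump,
    or None if the trace is monotonic."""
    for n in range(len(timestamps) - 1):
        if timestamps[n + 1] < timestamps[n]:
            return n + 1
    return None

guest_ts = [53, 100, 90, 95]      # write_tsc=-50 happened after 100
split = find_write_tsc(guest_ts)   # first event recorded after the jump
prev_off, next_off = -2950, -3000
host_ts = [t - (prev_off if i < split else next_off)
           for i, t in enumerate(guest_ts)]
print(split, host_ts)
```

The converted host timestamps come out monotonic, and event1/event2 land
after the kvm_enter at host cycle 3080, as expected.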
Re: Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
On Tue, Nov 20, 2012 at 07:36:33PM +0900, Yoshihiro YUNOMAE wrote:
> Hi Marcelo,
>
> Sorry for the late reply.
>
> (2012/11/17 4:15), Marcelo Tosatti wrote:
> >On Wed, Nov 14, 2012 at 05:26:10PM +0900, Yoshihiro YUNOMAE wrote:
> >>Thank you for commenting on my patch set.
> >>
> >>(2012/11/14 11:31), Steven Rostedt wrote:
> >>>On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:
> >>>>On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:
> >>>>>On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:
> >>>>>
> >>>>>>To merge the data like previous pattern, we apply this patch set.
> >>>>>>Then, we can get TSC offset of the guest as follows:
> >>>>>>
> >>>>>>$ dmesg | grep kvm
> >>>>>>[ 57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
> >>>>>>                    |                   |
> >>>>>>                   PID            TSC offset
> >>>>>>                                                          |
> >>>>>>                                        HOST TSC value --+
> >>>>>
> >>>>>Using printk to export something like this is IMO a nasty hack.
> >>>>>
> >>>>>Can't we create a /sys or /proc file to export the same thing?
> >>>>
> >>>>Since the value changes over the course of the trace, and seems to be
> >>>>part of the context of the trace, I think I'd include it as a
> >>>>tracepoint.
> >>>
> >>>I'm fine with that too.
> >>
> >>Using some tracepoint is a nice idea, but there is one problem. Here,
> >>our discussion point is "the event which TSC offset is changed does not
> >>frequently occur, but the buffer must keep the event data."
> >>
> >>There are two ideas for using tracepoint. First, we define new
> >>tracepoint for changed TSC offset. This is simple and the overhead will
> >>be low. However, this trace event stored in the buffer will be
> >>overwritten by other trace events because this TSC offset event does
> >>not frequently occur. Second, we add TSC offset information to the
> >>tracepoint frequently occured. For example, we assume that TSC offset
> >>information is added to arguments of trace_kvm_exit().
> >
> >The TSC offset is in the host trace.
> >So given a host trace with two TSC offset updates, how do you know
> >which events in the guest trace (containing a number of events) refer
> >to which tsc offset update?
> >
> >Unless i am missing something, you can't solve this easily (well, except
> >exporting information to the guest that allows it to transform RDTSC ->
> >host TSC value, which can be done via pvclock).
>
> As you say, TSC offset events are in the host trace, but we don't need
> to notify guests of updating TSC offset. The offset event will output
> the next TSC offset value and the current TSC value, so we can
> calculate the guest TSC (T1) for the event. Guest TSCs since T1 can be
> converted to host TSC using the TSC offset, so we can integrate those
> trace data.

Think of this scenario:

host trace
1h.   event tsc_write tsc_offset=-1000
3h.   vmenter
4h.   vmexit
...   (event sequence)
99h.  vmexit
100h. event tsc_write tsc_offset=-2000
101h. vmenter
...   (event sequence)
500h. event tsc_write tsc_offset=-3000

Then a guest trace containing events with a TSC timestamp. Which
tsc_offset to use? (that is the problem, which unless i am mistaken can
only be solved easily if the guest can convert RDTSC -> TSC of host).

> >Another issue as mentioned is lack of TSC synchronization in the host.
> >Should you provide such a feature without the possibility of proper
> >chronological order on systems with unsynchronized TSC?
>
> I think we cannot support this sorting feature using TSC on systems
> with unsynchronized TSC. On systems with unsynchronized TSC, it is
> difficult to sort not only trace data of guests and the host but trace
> data of a guest or a host using TSC in chronological order. Actually,
> if we want to output tracing data of ftrace in chronological order with
> unsynchronized TSC, we will use the "global" mode as the timestamp. The
> global mode uses wallclock added TSC correction, so the mode guarantees
> to sort in chronological order for trace data of the guest or of
> the host.
> If we use this mode to sort the trace data of guests and the
> host in chronological order, we need to consider the difference
> between the guest and the host and the timekeeping of guests and the
> host, so it is difficult to solve these issues. At least, I haven't
> come up with a good solution.

I suppose the tradeoff is performance (RDTSC) versus reliability, when
using ftrace. But then, even ftrace on the host suffers from the same
problem, with unsynchronized TSCs.

> We cannot sort the trace data of guests and the host in chronological
> order with unsynchronized TSC, but if we can set the following
> synchronization events for both guests and the host, we will know where
> we should sort.
>
> First, a guest and the host use the global mode as the timestamp of
> ftrace.
Re: Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Steven,

Sorry for the late reply.

(2012/11/17 0:05), Steven Rostedt wrote:
On Wed, 2012-11-14 at 17:26 +0900, Yoshihiro YUNOMAE wrote:

Thank you for commenting on my patch set.

(2012/11/14 11:31), Steven Rostedt wrote:
On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:
On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:
On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:

To merge the data like the previous pattern, we apply this patch set. Then,
we can get the TSC offset of the guest as follows:

$ dmesg | grep kvm
[   57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
                     |                   |
                    PID            TSC offset
                                                                          |
                                                        HOST TSC value --+

Using printk to export something like this is IMO a nasty hack.

Can't we create a /sys or /proc file to export the same thing?

Since the value changes over the course of the trace, and seems to be part
of the context of the trace, I think I'd include it as a tracepoint.

I'm fine with that too.

Using some tracepoint is a nice idea, but there is one problem. Here, our
discussion point is "the event which changes the TSC offset does not occur
frequently, but the buffer must keep the event data."

If you can hold off a bit, for the 3.9 window, I plan on pushing multiple
buffers for ftrace. That is, you can create a separate buffer just for the
TSC offset events:

cd /sys/kernel/debug
echo tsc > instances/new
echo 1 > instances/tsc/events/tsc/offset/enable

Then the buffer will be used only for that event.

That's good. The tracepoint will output as follows:

qemu-kvm-12345 [000] 123456789: kvm_write_tsc_offset: now_tsc=123456789 previous_offset=0 next_offset=123456780

Thanks,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
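A post-processing tool would pick the kvm_write_tsc_offset lines quoted in
the message above out of the ftrace text output. A minimal parsing sketch
(the regex is an assumption about that quoted line format, not an existing
tool):

```python
# Sketch: extract now_tsc / previous_offset / next_offset from the
# proposed kvm_write_tsc_offset tracepoint line format.
import re

LINE = ("qemu-kvm-12345 [000] 123456789: kvm_write_tsc_offset: "
        "now_tsc=123456789 previous_offset=0 next_offset=123456780")

PAT = re.compile(r"kvm_write_tsc_offset: now_tsc=(\d+) "
                 r"previous_offset=(\d+) next_offset=(\d+)")

m = PAT.search(LINE)
now_tsc, prev_off, next_off = map(int, m.groups())
print(now_tsc, prev_off, next_off)
```

Keeping these events in their own buffer (the instances idea above) means
the parser never loses an offset change to ring-buffer overwrite.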
Re: Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Hi Marcelo,

Sorry for the late reply.

(2012/11/17 4:15), Marcelo Tosatti wrote:
On Wed, Nov 14, 2012 at 05:26:10PM +0900, Yoshihiro YUNOMAE wrote:

Thank you for commenting on my patch set.

(2012/11/14 11:31), Steven Rostedt wrote:
On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:
On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:
On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:

To merge the data like the previous pattern, we apply this patch set. Then,
we can get the TSC offset of the guest as follows:

$ dmesg | grep kvm
[   57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
                     |                   |
                    PID            TSC offset
                                                                          |
                                                        HOST TSC value --+

Using printk to export something like this is IMO a nasty hack.

Can't we create a /sys or /proc file to export the same thing?

Since the value changes over the course of the trace, and seems to be part
of the context of the trace, I think I'd include it as a tracepoint.

I'm fine with that too.

Using some tracepoint is a nice idea, but there is one problem. Here, our
discussion point is "the event which changes the TSC offset does not occur
frequently, but the buffer must keep the event data."

There are two ideas for using a tracepoint. First, we define a new
tracepoint for the changed TSC offset. This is simple and the overhead will
be low. However, this trace event stored in the buffer will be overwritten
by other trace events because this TSC offset event does not occur
frequently. Second, we add TSC offset information to a tracepoint that
occurs frequently. For example, we assume that TSC offset information is
added to the arguments of trace_kvm_exit().

The TSC offset is in the host trace. So given a host trace with two TSC
offset updates, how do you know which events in the guest trace (containing
a number of events) refer to which tsc offset update?
Unless i am missing something, you can't solve this easily (well, except
exporting information to the guest that allows it to transform RDTSC ->
host TSC value, which can be done via pvclock).

As you say, TSC offset events are in the host trace, but we don't need to
notify guests of updating the TSC offset. The offset event will output the
next TSC offset value and the current TSC value, so we can calculate the
guest TSC (T1) for the event. Guest TSCs since T1 can be converted to host
TSC using the TSC offset, so we can integrate those trace data.

Another issue as mentioned is lack of TSC synchronization in the host.
Should you provide such a feature without the possibility of proper
chronological order on systems with unsynchronized TSC?

I think we cannot support this sorting feature using TSC on systems with
unsynchronized TSC. On systems with unsynchronized TSC, it is difficult to
sort not only trace data of guests and the host but also trace data of a
guest or a host using TSC in chronological order. Actually, if we want to
output tracing data of ftrace in chronological order with unsynchronized
TSC, we will use the "global" mode as the timestamp. The global mode uses
wall clock time with TSC correction, so the mode guarantees that trace data
of the guest or of the host sort in chronological order. If we use this
mode to sort the trace data of guests and the host in chronological order,
we need to consider the difference between the guest and the host and the
timekeeping of guests and the host, so it is difficult to solve these
issues. At least, I haven't come up with a good solution.

We cannot sort the trace data of guests and the host in chronological
order with unsynchronized TSC, but if we can set the following
synchronization events for both guests and the host, we will know where we
should sort.

First, a guest and the host use the global mode as the timestamp of ftrace.
Next, a user on the guest writes "1" to the synchronization I/F as the ID;
then the synchronization event "1" is recorded in a ring buffer of the
guest. The synchronization operation induces a hypercall, so the host can
handle the event. After the operation moves to the host, the host records
the event "1" in a ring buffer of the host. In the end, the operation
returns to the guest, and the synchronization is finished.

When we integrate tracing data of the guest and the host, we calculate the
difference between the timestamps of the synchronization events with the
same ID. This value is a temporary "offset". We convert the timestamps of
the guest to the timestamps of the host using this offset, up to the next
synchronization event. If the synchronization event cycle is very short,
we will not need to consider the timekeeping. Then, we can sort the trace
data in chronological order.

Would you comment on this, or do you have another idea?

Thanks,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Lab
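The synchronization-event scheme described in that last message can be
sketched as follows (all names and numbers are hypothetical, for
illustration only):

```python
# Sketch: each sync ID is recorded on both sides; the difference between
# the host and guest "global" timestamps of the same ID is a temporary
# offset, applied to guest events until the next sync point.

guest_sync = {1: 1000, 2: 2005}   # sync ID -> guest timestamp
host_sync = {1: 5000, 2: 6010}    # sync ID -> host timestamp

# Temporary offset per sync ID (here 4000 and 4005).
offsets = {i: host_sync[i] - guest_sync[i] for i in guest_sync}

def to_host_time(guest_ts):
    """Convert a guest timestamp using the most recent sync event at or
    before it (falling back to the earliest sync event)."""
    earliest = min(guest_sync, key=guest_sync.get)
    sync_id = max((i for i in guest_sync if guest_sync[i] <= guest_ts),
                  key=guest_sync.get, default=earliest)
    return guest_ts + offsets[sync_id]

print(to_host_time(1500))  # uses the offset of sync event 1
print(to_host_time(2100))  # uses the offset of sync event 2
```

The shorter the sync cycle, the less guest/host clock drift between two
sync points matters, which is the point made in the message above.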