On Wed, Jan 18, 2012 at 10:52 AM, Harsh Bora <ha...@linux.vnet.ibm.com> wrote:
> On 01/18/2012 04:11 PM, Harsh Bora wrote:
>>
>> On 01/18/2012 04:01 PM, Stefan Hajnoczi wrote:
>>>
>>> On Wed, Jan 18, 2012 at 9:14 AM, Harsh Bora<ha...@linux.vnet.ibm.com>
>>> wrote:
>>>>
>>>> On 01/10/2012 10:11 PM, Stefan Hajnoczi wrote:
>>>>>>
>>>>>> +        unused = fwrite(&record, ST_V2_REC_HDR_LEN, 1, trace_fp);
>>>>>>          writeout_idx += num_available;
>>>>>>      }
>>>>>>
>>>>>>      idx = writeout_idx % TRACE_BUF_LEN;
>>>>>> -    while (get_trace_record(idx, &record)) {
>>>>>> -        trace_buf[idx].event = 0; /* clear valid bit */
>>>>>> -        unused = fwrite(&record, sizeof(record), 1, trace_fp);
>>>>>> -        idx = ++writeout_idx % TRACE_BUF_LEN;
>>>>>> +    while (get_trace_record(idx, &recordptr)) {
>>>>>> +        unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
>>>>>> +        writeout_idx += recordptr->length;
>>>>>> +        g_free(recordptr);
>>>>>> +        recordptr = (TraceRecord *)&trace_buf[idx];
>>>>>> +        recordptr->event = 0;
>>>>>> +        idx = writeout_idx % TRACE_BUF_LEN;
>>>>>>      }
>>>>>
>>>>>
>>>>>
>>>>> I'm wondering if it's worth using a different approach here. Writing
>>>>> out individual records has bothered me.
>>>>>
>>>>> If we have a second buffer, as big as trace_buf[], then a function can
>>>>> copy out all records and make them available in trace_buf again:
>>>>>
>>>>> /**
>>>>> * Copy completed trace records out of the ring buffer
>>>>> *
>>>>> * @idx offset into trace_buf[]
>>>>> * @buf destination buffer
>>>>> * @len size of destination buffer
>>>>> * @ret the number of bytes consumed
>>>>> */
>>>>> size_t consume_trace_records(unsigned int idx, void *buf, size_t len);
>>>>>
>>>>> That means consume gobbles up as many records as it can:
>>>>> * Until it reaches an invalid record which has not been written yet
>>>>> * Until it reaches the end of trace_buf[], the caller can call again
>>>>> with idx wrapped to 0
>>>>>
>>>>> After copying into buf[] it clears the valid bit in trace_buf[].
>>>>>
>>>>> Then the loop which calls consume_trace_records() does a single
>>>>> fwrite(3) and increments idx/writeout_idx.
>>>>>
>>>>> The advantage to this is that we write many records in one go and I
>>>>> think it splits up the writeout steps a little nicer than what we've
>>>>> previously done.
>>>>>
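
For illustration only, here is a rough, untested sketch of that batched
writeout idea.  trace_buf[], TRACE_BUF_LEN, TRACE_RECORD_VALID, writeout_idx,
trace_fp and the TraceRecord event/length fields come from this patch series;
writeout_buf[] and writeout_records() are made-up names, and the sketch does
not handle a record straddling the end of trace_buf[]:

static uint8_t writeout_buf[TRACE_BUF_LEN];   /* second buffer, same size */

static size_t consume_trace_records(unsigned int idx, void *buf, size_t len)
{
    size_t used = 0;

    while (idx < TRACE_BUF_LEN) {
        TraceRecord *record = (TraceRecord *)&trace_buf[idx];

        if (!(record->event & TRACE_RECORD_VALID)) {
            break;              /* next record has not been written yet */
        }

        __sync_synchronize();   /* read barrier before accessing record */

        if (used + record->length > len) {
            break;              /* destination buffer is full */
        }

        memcpy((char *)buf + used, record, record->length);
        record->event = 0;      /* clear valid bit, slot can be reused */

        used += record->length;
        idx += record->length;
    }

    return used;
}

/* the writeout loop then does a single fwrite() per batch */
static void writeout_records(void)
{
    size_t unused __attribute__ ((unused));
    unsigned int idx = writeout_idx % TRACE_BUF_LEN;
    size_t count = consume_trace_records(idx, writeout_buf,
                                         sizeof(writeout_buf));

    if (count) {
        unused = fwrite(writeout_buf, count, 1, trace_fp);
        writeout_idx += count;
    }
}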
>>>>
>>>> I think this optimization can be introduced as a separate patch later.
>>>> Let me know if you think otherwise.
>>>
>>>
>>> Yes, that could be done later. However, there is something incorrect
>>> here. Threads will continue to write trace records into the
>>> ringbuffer while the write-out thread is doing I/O. Think about what
>>> happens when threads overtake the write-out index modulo ringbuffer
>>> size. Since records are variable-length the write-out thread's next
>>> index could point into the middle of an overwritten record. And that
>>> means the ->length field is junk - we may crash if we use it.
>>
>>
>> In case of overwritten records, the valid bit of the event id will also
>> be overwritten, and therefore we will not consider that record. Moreover,
>> the writeout thread will also notice that some events were dropped and
>> will restart from the latest trace_idx, right?
>>
>> However, to handle the extremely rare case where an overwritten value
>> happens to have its valid bit set, we can add a check for
>> < NR_TRACE_EVENTS. Or, even better, use a magic byte for events as well?
>>
>> Harsh
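
For illustration only, that extra sanity check in get_trace_record() could
look roughly like this; NR_TRACE_EVENTS comes from the generated event list,
and the assumption that the event id sits in the bits below
TRACE_RECORD_VALID is not part of the patch:

    uint64_t event_id = record->event & ~TRACE_RECORD_VALID;

    if (!(record->event & TRACE_RECORD_VALID) ||
        event_id >= NR_TRACE_EVENTS) {
        return false;   /* not written yet, or overwritten with junk */
    }
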
>
>
> Also, I just realized that we need buffer boundary checks while copying a
> trace record:
>
>
>>> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
>>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>>>  {
>>> -    if (!(trace_buf[idx].event & TRACE_RECORD_VALID)) {
>>> +    TraceRecord *record = (TraceRecord *)&trace_buf[idx];
>>> +    if (!(record->event & TRACE_RECORD_VALID)) {
>>>          return false;
>>>      }
>>>
>>>      __sync_synchronize(); /* read memory barrier before accessing record */
>>>
>>> -    *record = trace_buf[idx];
>>> -    record->event &= ~TRACE_RECORD_VALID;
>>> +    *recordptr = g_malloc(record->length);
>>> +    /* make a copy of record to avoid being overwritten */
>>> +    memcpy(*recordptr, record, record->length);
>
> So, I need to use a wrapper that copies byte by byte and takes care of
> wrapping at the buffer boundary, and that will also save us from crashing
> (as we will never read past the end of the buffer), right?

It won't be enough if length has a junk value like 2^32 - 1.  We don't
want to allocate/copy 4 GB :).
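
For illustration only, a guarded, wrap-aware copy in get_trace_record()
might look roughly like this.  It assumes trace_buf[] is a byte array (as
the byte-granular writeout_idx in this patch suggests) and uses
ST_V2_REC_HDR_LEN and TRACE_BUF_LEN as untested lower/upper bounds on
->length:

    if (record->length < ST_V2_REC_HDR_LEN ||
        record->length > TRACE_BUF_LEN) {
        return false;   /* junk length, the record was overwritten */
    }

    *recordptr = g_malloc(record->length);
    for (size_t i = 0; i < record->length; i++) {
        /* copy byte by byte so the read wraps at the end of trace_buf[] */
        ((uint8_t *)*recordptr)[i] = trace_buf[(idx + i) % TRACE_BUF_LEN];
    }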

Stefan
