On Wed, Jan 18, 2012 at 10:52 AM, Harsh Bora <ha...@linux.vnet.ibm.com> wrote:
> On 01/18/2012 04:11 PM, Harsh Bora wrote:
>> On 01/18/2012 04:01 PM, Stefan Hajnoczi wrote:
>>> On Wed, Jan 18, 2012 at 9:14 AM, Harsh Bora <ha...@linux.vnet.ibm.com> wrote:
>>>> On 01/10/2012 10:11 PM, Stefan Hajnoczi wrote:
>>>>>> +        unused = fwrite(&record, ST_V2_REC_HDR_LEN, 1, trace_fp);
>>>>>>          writeout_idx += num_available;
>>>>>>      }
>>>>>>
>>>>>>      idx = writeout_idx % TRACE_BUF_LEN;
>>>>>> -    while (get_trace_record(idx, &record)) {
>>>>>> -        trace_buf[idx].event = 0; /* clear valid bit */
>>>>>> -        unused = fwrite(&record, sizeof(record), 1, trace_fp);
>>>>>> -        idx = ++writeout_idx % TRACE_BUF_LEN;
>>>>>> +    while (get_trace_record(idx, &recordptr)) {
>>>>>> +        unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
>>>>>> +        writeout_idx += recordptr->length;
>>>>>> +        g_free(recordptr);
>>>>>> +        recordptr = (TraceRecord *) &trace_buf[idx];
>>>>>> +        recordptr->event = 0;
>>>>>> +        idx = writeout_idx % TRACE_BUF_LEN;
>>>>>>      }
>>>>>
>>>>> I'm wondering if it's worth using a different approach here.  Writing
>>>>> out individual records has bothered me.
>>>>>
>>>>> If we have a second buffer, as big as trace_buf[], then a function can
>>>>> copy out all records and make them available in trace_buf again:
>>>>>
>>>>> /**
>>>>>  * Copy completed trace records out of the ring buffer
>>>>>  *
>>>>>  * @idx offset into trace_buf[]
>>>>>  * @buf destination buffer
>>>>>  * @len size of destination buffer
>>>>>  * @ret the number of bytes consumed
>>>>>  */
>>>>> size_t consume_trace_records(unsigned int idx, void *buf, size_t len);
>>>>>
>>>>> That means consume gobbles up as many records as it can:
>>>>>  * Until it reaches an invalid record which has not been written yet
>>>>>  * Until it reaches the end of trace_buf[], the caller can call again
>>>>>    with idx wrapped to 0
>>>>>
>>>>> After copying into buf[] it clears the valid bit in trace_buf[].
>>>>>
>>>>> Then the loop which calls consume_trace_records() does a single
>>>>> fwrite(3) and increments idx/writeout_idx.
>>>>>
>>>>> The advantage to this is that we write many records in one go, and I
>>>>> think it splits up the writeout steps a little more cleanly than what
>>>>> we've previously done.
>>>>
>>>> I think this optimization can be introduced as a separate patch later.
>>>> Let me know if you think otherwise.
>>>
>>> Yes, that could be done later.  However, there is something incorrect
>>> here.  Threads will continue to write trace records into the ring
>>> buffer while the write-out thread is doing I/O.  Think about what
>>> happens when threads overtake the write-out index modulo the ring
>>> buffer size.  Since records are variable-length, the write-out
>>> thread's next index could point into the middle of an overwritten
>>> record.  That means the ->length field is junk - we may crash if we
>>> use it.
>>
>> In the case of overwritten records, the valid bit of the event id will
>> also be overwritten, and therefore we will not consider that record.
>> Moreover, the write-out thread will then learn that some events were
>> dropped and will restart from the latest trace_idx, right?
>>
>> However, to handle the extremely rare case where an overwritten value
>> happens to have its valid bit set, we can add a check for
>> < NR_TRACE_EVENTS.  Or, even better, use a magic byte for events too?
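The batched-consume idea discussed above can be sketched as follows. This is only an illustration of the approach under assumed definitions: the `TraceRecord` layout, `TRACE_BUF_LEN`, and `TRACE_RECORD_VALID` below are stand-ins, not QEMU's actual declarations, and the length checks reflect the overwrite concerns raised in the thread rather than the patch as posted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed record layout and sizes, for illustration only. */
#define TRACE_BUF_LEN 4096
#define TRACE_RECORD_VALID ((uint64_t)1 << 63)

typedef struct {
    uint64_t event;   /* event id; top bit doubles as the valid flag */
    uint32_t length;  /* total record length in bytes */
    /* variable-length event arguments would follow */
} TraceRecord;

static uint8_t trace_buf[TRACE_BUF_LEN];

/*
 * Copy completed trace records out of the ring buffer.
 *
 * Consumes records starting at idx until it reaches an invalid (not yet
 * written) record, a record with an implausible length, the end of
 * trace_buf[] (the caller can call again with idx wrapped to 0), or the
 * end of the destination buffer.  Valid bits of consumed records are
 * cleared so the space can be reused.  Returns the bytes consumed.
 */
static size_t consume_trace_records(unsigned int idx, void *buf, size_t len)
{
    size_t consumed = 0;

    while (idx + sizeof(TraceRecord) <= TRACE_BUF_LEN) {
        TraceRecord *rec = (TraceRecord *)&trace_buf[idx];

        if (!(rec->event & TRACE_RECORD_VALID)) {
            break;  /* record has not been written yet */
        }
        if (rec->length < sizeof(TraceRecord) ||
            idx + rec->length > TRACE_BUF_LEN ||
            consumed + rec->length > len) {
            break;  /* junk length or no room left in buf */
        }
        memcpy((uint8_t *)buf + consumed, rec, rec->length);
        rec->event = 0;  /* clear valid bit */
        consumed += rec->length;
        idx += rec->length;
    }
    return consumed;
}
```

The write-out loop would then issue one fwrite(3) per batch instead of one per record.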
>>
>> Harsh
>
> Also, I just realized that we need to add buffer boundary checks while
> copying a trace record:
>
>>> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
>>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>>>  {
>>> -    if (!(trace_buf[idx].event & TRACE_RECORD_VALID)) {
>>> +    TraceRecord *record = (TraceRecord *) &trace_buf[idx];
>>> +    if (!(record->event & TRACE_RECORD_VALID)) {
>>>          return false;
>>>      }
>>>
>>>      __sync_synchronize(); /* read memory barrier before accessing record */
>>>
>>> -    *record = trace_buf[idx];
>>> -    record->event &= ~TRACE_RECORD_VALID;
>>> +    *recordptr = g_malloc(record->length);
>>> +    /* make a copy of record to avoid being overwritten */
>>> +    memcpy(*recordptr, record, record->length);
>
> So I need to use a wrapper that copies byte by byte, taking care of
> boundary overlaps - and that will also save us from crashing (since we
> will never cross buffer boundaries), right?
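The wrap-aware copy wrapper Harsh describes could look roughly like this. It is a minimal sketch under assumed names (`trace_buf`, `TRACE_BUF_LEN` are stand-ins with a small illustrative size): copying in at most two chunks around the wrap point reassembles a record that straddles the end of the buffer, so no single memcpy() ever crosses the boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative size only; the real buffer is larger. */
#define TRACE_BUF_LEN 64

static uint8_t trace_buf[TRACE_BUF_LEN];

/*
 * Copy len bytes out of the ring buffer starting at idx, wrapping at
 * TRACE_BUF_LEN, so that a record straddling the end of trace_buf[] is
 * reassembled contiguously in dest.  Each memcpy() stops at the wrap
 * point, so we never read past the buffer boundary.
 */
static void read_from_buffer(unsigned int idx, void *dest, size_t len)
{
    uint8_t *d = dest;

    while (len > 0) {
        unsigned int pos = idx % TRACE_BUF_LEN;
        size_t chunk = TRACE_BUF_LEN - pos;  /* bytes until the wrap point */

        if (chunk > len) {
            chunk = len;
        }
        memcpy(d, &trace_buf[pos], chunk);
        d += chunk;
        idx += chunk;
        len -= chunk;
    }
}
```

With such a wrapper, get_trace_record() can first copy out the fixed-size header, validate its length field, and only then copy the rest of the record.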
It won't be enough if length has a junk value like 2^32 - 1.  We don't
want to allocate/copy 4 GB :).

Stefan
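The sanity check Stefan is asking for can be as simple as bounding the length field before it is passed to g_malloc()/memcpy(). This is a sketch under the same assumed definitions as above (the struct and `TRACE_BUF_LEN` are illustrative, not QEMU's actual ones): a record can never be smaller than its header or larger than the ring buffer, so anything outside that range is corruption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants and layout, for illustration only. */
#define TRACE_BUF_LEN 4096

typedef struct {
    uint64_t event;   /* event id; top bit is the valid flag */
    uint32_t length;  /* total record length in bytes */
} TraceRecord;

/*
 * Reject lengths that cannot belong to a real record before using them
 * to allocate or copy.  A corrupt length such as UINT32_MAX would
 * otherwise trigger a ~4 GB allocation and copy.
 */
static bool record_length_ok(uint32_t length)
{
    return length >= sizeof(TraceRecord) && length <= TRACE_BUF_LEN;
}
```

get_trace_record() would call this right after reading the header and treat a failed check like an invalid record, i.e. return false.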