On 08.03.2019 13:46, Jiri Olsa wrote:
> On Thu, Mar 07, 2019 at 06:26:47PM +0300, Alexey Budankov wrote:
>> 
>> On 07.03.2019 15:14, Jiri Olsa wrote:
>>> On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:
>>>> 
>>>> On 05.03.2019 15:25, Jiri Olsa wrote:
>>>>> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
>>>>> 
>>>>> SNIP
>>>>> 
>>>>>> 
>>>>>>  	/*
>>>>>>  	 * Increment md->refcount to guard md->data[idx] buffer
>>>>>> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
>>>>>>  	md->prev = head;
>>>>>>  	perf_mmap__consume(md);
>>>>>>  
>>>>>> -	rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
>>>>>> +	rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>>>>>>  	if (!rc) {
>>>>>>  		*off += size0 + size;
>>>>>>  	} else {
>>>>>> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>>>>>>  }
>>>>>>  
>>>>>>  int perf_mmap__push(struct perf_mmap *md, void *to,
>>>>>> -		    int push(struct perf_mmap *map, void *to, void *buf, size_t size))
>>>>>> +		    int push(struct perf_mmap *map, void *to, void *buf, size_t size),
>>>>>> +		    perf_mmap__compress_fn_t compress, void *comp_data)
>>>>>>  {
>>>>>>  	u64 head = perf_mmap__read_head(md);
>>>>>>  	unsigned char *data = md->base + page_size;
>>>>>>  	unsigned long size;
>>>>>>  	void *buf;
>>>>>>  	int rc = 0;
>>>>>> +	size_t mmap_len = perf_mmap__mmap_len(md);
>>>>>>  
>>>>>>  	rc = perf_mmap__read_init(md);
>>>>>>  	if (rc < 0)
>>>>>> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>>>>>>  		buf = &data[md->start & md->mask];
>>>>>>  		size = md->mask + 1 - (md->start & md->mask);
>>>>>>  		md->start += size;
>>>>>> -
>>>>>> +		if (compress) {
>>>>>> +			size = compress(comp_data, md->data, mmap_len, buf, size);
>>>>>> +			buf = md->data;
>>>>>> +		}
>>>>>>  		if (push(md, to, buf, size) < 0) {
>>>>>>  			rc = -1;
>>>>>>  			goto out;
>>>>> 
>>>>> when we discussed the compress callback should be
>>>>> another layer
>>>>> in perf_mmap__push I was thinking more of the layered/fifo design,
>>>>> like:
>>>>> 
>>>>> normally we call:
>>>>> 
>>>>>   perf_mmap__push(... push = record__pushfn ...)
>>>>>     -> reads mmap data and calls push(data), which translates as:
>>>>> 
>>>>>        record__pushfn(data);
>>>>>        - which stores the data
>>>>> 
>>>>> 
>>>>> for compressed it'd be:
>>>>> 
>>>>>   perf_mmap__push(... push = compressed_push ...)
>>>>>     -> reads mmap data and calls push(data), which translates as:
>>>>> 
>>>>>        compressed_push(data)
>>>>>          -> reads the data, compresses it and calls the next push
>>>>>             callback in line:
>>>>> 
>>>>>             record__pushfn(data)
>>>>>             - which stores the data
>>>>> 
>>>>> 
>>>>> there'd need to be the logic for compressed_push to
>>>>> remember the 'next push' function
>>>> 
>>>> That is suboptimal for AIO. Also compression is an independent operation that
>>>> could be applied at any of the push stages you mention.
>>> 
>>> not sure what you mean by suboptimal, but I think
>>> that it can still happen in a subsequent push callback
>>> 
>>>>> but I think this was the original idea behind
>>>>> perf_mmap__push -> it gets the data and pushes it on for
>>>>> the next processing.. it should stay as simple as that
>>>> 
>>>> Agree on keeping the simplicity and, at the moment, there is no push to the
>>>> next processing stage in the code, so the provided implementation fits both
>>>> serial and AIO while sticking to simplicity as much as possible. If you see
>>>> something that would fit better, please speak up and share.
>>> 
>>> I have to insist that perf_mmap__push stays untouched
>>> and we do the other processing in the push callbacks
>> 
>> What about perf_mmap__aio_push()?
>> 
>> Without compression it does:
>>   memcpy(), memcpy(), aio_push()
>> 
>> With compression it does:
>>   memcpy_with_compression(), memcpy_with_compression(), aio_push()
> 
> so to be on the same page..
> normal processing without compression is:
> 
> perf_mmap__push does:
>   push(mmap buf)
>     record__pushfn
>       record__write
>         write(buf)
> 
> perf_mmap__aio_push does:
>   memcpy(aio buf, mmap buf)
>   push(aio buf)
>     record__aio_pushfn
>       record__aio_write
>         aio_write(aio buf)
> 
> 
> and for compression it would be:
> 
> perf_mmap__push does:
>   push(mmap buf)
>     compress_push
>       memcpy(compress buffer, mmap buf)    EXTRA copy
>       record__pushfn
>         record__write
>           write(buf)
> 
> perf_mmap__aio_push does:
>   memcpy(aio buf, mmap buf)
>   memcpy(compress buffer, mmap buf)        EXTRA copy
>   push(aio buf)
>     record__aio_pushfn
>       record__aio_write
>         aio_write(aio buf)
> 
> 
> side note: that actually makes me think why do we even have perf_mmap__aio_push,
> it looks like we could copy the buf in the callback push function with no harm?
Well, yes, perf_mmap__aio_push() can be avoided and perf_mmap__push() can be
used both for serial and for AIO, moving all the specifics out of mmap.c into
the record code, like this:

Serial

  perf_mmap__push(, record__pushfn)
    push(), possibly two times
      record__pushfn()
        if (-z)
          zstd_compress(map->base => map->data)          <-- compressing memcpy()
        record__write(-z ? map->data : map->base)

AIO

  record__aio_push()
    perf_mmap__push(, record__aio_pushfn())
      push(), possibly two times
        record__aio_pushfn()
          if (-z)
            zstd_compress(map->base => map->aio.data[i]) <-- compressing memcpy()
          else
            memcpy(map->base => map->aio.data[i])        <-- plain memcpy()
    record__aio_write(map->aio.data[i])

So now it looks optimal from the performance and data-loss-reduction
perspective as well as from the design perspective. What do you think?

~Alexey

> 
> so.. there's one extra memcpy for compression, is it right?
> I might miss some part which makes this scheme unusable..
> 
> thanks,
> jirka
> 