On Wed, Nov 01, 2023 at 04:37:12PM +0000, Daniel P. Berrangé wrote:
> It doesn't contain thread number information directly, but it can
> be implicit from the data layout.
> 
> If you want parallel I/O, each thread has to know it is the only
> one reading/writing to a particular region of the file. With the
> fixed RAM layout in this series, the file offset directly maps
> to the memory region. So if a thread has been given a guest page
> to save it knows it will be the only thing writing to the file
> at that offset. There is no relationship at all between the
> number of threads and the file layout.
> 
> If you can't directly map pages to file offsets, then you need
> some other way to lay out data such that each thread can safely
> write. If you split up a file based on fixed size chunks, then
> the number of chunks you end up with in the file is likely to be
> a multiple of the number of threads you had saving data.

What I was thinking is to provision fixed-size chunks in the ramblock
address space, e.g. 64M worth of guest pages for each thread.  Each
thread compresses its chunk into a local buffer first, and only
requests a file offset to write to after the compression has
completed, because we need the compressed size before an offset can
be assigned.
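To make that concrete, here is a rough sketch of the write-side flow
in plain C.  The names (compress_chunk, reserve_file_offset,
save_one_chunk, the headroom factor) are made up for illustration,
not anything from this series; the point is only that the offset is
reserved after compression, sized by the compressed output, so writer
threads can never overlap in the file:

#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical compression helper, standing in for whatever multifd
 * compression backend is in use (zlib/zstd/...): compresses src into
 * dst, returns the compressed length. */
extern size_t compress_chunk(void *dst, size_t dst_cap,
                             const void *src, size_t src_len);

/* Shared cursor over the data section of the image file.  A writer
 * reserves its range here only after compressing, because the length
 * of the range is the compressed size. */
static _Atomic uint64_t next_file_offset;

static uint64_t reserve_file_offset(uint64_t len)
{
    /* Atomically claim [off, off + len) for the calling thread. */
    return atomic_fetch_add(&next_file_offset, len);
}

/* Save one fixed-size ramblock chunk (e.g. 64M).  A single thread
 * owns the chunk end to end, so the pwrite() below can never overlap
 * another thread's reserved range. */
static int save_one_chunk(int fd, const void *chunk, size_t chunk_len)
{
    /* Local buffer with some headroom for incompressible data. */
    size_t cap = chunk_len + chunk_len / 16;
    uint8_t *buf = malloc(cap);
    int ret = -1;

    if (buf) {
        size_t clen = compress_chunk(buf, cap, chunk, chunk_len);
        uint64_t off = reserve_file_offset(clen);

        /* The (chunk index, off, clen) triple would also have to be
         * recorded in the file so the load side can find the chunk. */
        if (pwrite(fd, buf, clen, off) == (ssize_t)clen) {
            ret = 0;
        }
        free(buf);
    }
    return ret;
}

Recording the (chunk index, offset, length) triple per chunk is also
what keeps the read side simple: the compressed sizes are all known
when loading, so any number of recv threads can pread() and
decompress disjoint chunks without further coordination.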
> This means if you restore using a different number of threads,
> you can't evenly assign file chunks to each restore thread.
> 
> There's no info about thread IDs in the file, but the data layout
> reflects how many threads were doing work.
> 
> > Assuming decompress can do the same by assigning different chunks to each
> > decompress thread, no matter how many are there.
> > 
> > Would that work?
> 
> Again you get uneven workloads if the number of restore threads is
> different than the save threads, as some threads will have to process
> more chunks than other threads. If the chunks are small this might
> not matter, if they are big it could matter.

Maybe you meant the case where the chunk size is derived only from the
number of threads, so that chunks can become very large?  If we have
fixed-size ramblock chunks, the thread count becomes mostly
irrelevant: e.g. a 4G guest would contain 4G/64M=64 chunks, and 64
chunks can easily be decompressed concurrently with mostly whatever
number of recv threads.

Parallel IO is not a problem either, afaict, as long as each thread
can request its own file offset to read/write.  The write side is a
bit tricky with what I said above: an offset can only be requested,
and exclusively assigned to the writer thread, after compression has
finished and the thread knows how many bytes it needs for the result.
On the read side we already know the binary size of each chunk, so we
can mark each chunk exclusive to its reader thread up front.

Thanks,

-- 
Peter Xu