On Thu, Feb 27, 2020 at 10:12 PM Stefan Ring <stefan...@gmail.com> wrote:
> Victory! I have a reproducer in the form of a plain C libgfapi client.
>
> However, I have not been able to trigger corruption by just executing
> the simple pattern in an artificial way. Currently, I need to feed my
> reproducer 2 GB of data that I streamed out of the qemu block driver.
> I get two possible end states out of my reproducer: The correct one or
> a corrupted one, where 48 KB are zeroed out. It takes no more than 10
> runs to get each of them at least once. The corrupted end state is
> exactly the same that I got from the real qemu process from where I
> obtained the streamed trace. This gives me a lot of confidence in the
> soundness of my reproducer.
>
> More details will follow.

Ok, so the exact sequence of activity around the corruption is this:

8700 and so on are the sequential request numbers. All of these
requests are writes. Blocks are 512 bytes.

8700
  grows the file to a certain size (2134144 blocks)

<8700 retires, nothing in flight>

8701
  writes 55 blocks inside the currently allocated file range, close to
  the end (7 blocks short)

8702
  writes 54 blocks from the end of 8701, growing the file by 47 blocks

<8702 retires, 8701 remains in flight>

8703
  writes from the end of 8702, growing the file by 81 blocks

<8703 retires, 8701 remains in flight>

8704
  writes 1623 blocks also from the end of 8702, growing the file by
  1542 blocks

<8701 retires>
<8704 retires>

The exact range covered by 8703 ends up zeroed out.

If 8701 retires earlier (before 8702 is issued), everything is fine.
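
To double-check the block arithmetic in the sequence above, here is a small Python sketch that replays the writes in issue order and recomputes how much each one grows the file. It only models offsets and lengths; it does not talk to gluster and does not reproduce the corruption. The helper name replay() and the tuple layout are made up for illustration, and I am assuming that "7 blocks short" means 8701 ends 7 blocks before the end of the file, which is the reading that matches the stated growth of 47 blocks for 8702.

```python
BLOCK_SIZE = 512            # bytes per block, as stated in the post
INITIAL_SIZE = 2134144      # file size in blocks after request 8700


def replay(initial_size, writes):
    """Replay (request, start, length) writes in issue order.

    Returns a list of (request, start, end, growth) in blocks, plus the
    final file size. 'growth' is how far the write extends the file
    beyond its size at the moment the write is issued.
    """
    size = initial_size
    results = []
    for request, start, length in writes:
        end = start + length
        growth = max(0, end - size)
        size = max(size, end)
        results.append((request, start, end, growth))
    return results, size


# Assumption: 8701 ends 7 blocks before the current end of file.
end_8701 = INITIAL_SIZE - 7
start_8701 = end_8701 - 55
end_8702 = end_8701 + 54

writes = [
    (8701, start_8701, 55),    # entirely inside the allocated range
    (8702, end_8701, 54),      # starts at the end of 8701
    (8703, end_8702, 81),      # starts at the end of 8702
    (8704, end_8702, 1623),    # also starts at the end of 8702
]

results, final_size = replay(INITIAL_SIZE, writes)

for request, start, end, growth in results:
    print(f"{request}: blocks [{start}, {end}) grows file by {growth}")

# The range that ends up zeroed is the one covered by 8703.
zeroed_start, zeroed_end = results[2][1], results[2][2]
zeroed_bytes = (zeroed_end - zeroed_start) * BLOCK_SIZE
print(f"zeroed range: blocks [{zeroed_start}, {zeroed_end}), {zeroed_bytes} bytes")
```

Under that assumption the recomputed growth values come out as 0, 47, 81 and 1542 blocks, matching the numbers in the trace. It also makes visible that 8704 starts at the same offset as 8703 and covers its range completely.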