On Sep 29 15:43, Dmitry Fomichev wrote: > > -----Original Message----- > > From: Qemu-block <qemu-block- > > bounces+dmitry.fomichev=wdc....@nongnu.org> On Behalf Of Klaus > > Jensen > > Sent: Tuesday, September 29, 2020 6:47 AM > > To: Damien Le Moal <damien.lem...@wdc.com> > > Cc: Fam Zheng <f...@euphon.net>; Kevin Wolf <kw...@redhat.com>; qemu- > > bl...@nongnu.org; Niklas Cassel <niklas.cas...@wdc.com>; Klaus Jensen > > <k.jen...@samsung.com>; qemu-devel@nongnu.org; Alistair Francis > > <alistair.fran...@wdc.com>; Keith Busch <kbu...@kernel.org>; Philippe > > Mathieu-Daudé <phi...@redhat.com>; Matias Bjorling > > <matias.bjorl...@wdc.com> > > Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types > > and Zoned Namespace Command Set > > > > On Sep 28 22:54, Damien Le Moal wrote: > > > On 2020/09/29 6:25, Keith Busch wrote: > > > > On Mon, Sep 28, 2020 at 08:36:48AM +0200, Klaus Jensen wrote: > > > >> On Sep 28 02:33, Dmitry Fomichev wrote: > > > >>> You are making it sound like the entire WDC series relies on this > > approach. > > > >>> Actually, the persistency is introduced in the second to last patch > > > >>> in the > > > >>> series and it only adds a couple of lines of code in the i/o path to > > > >>> mark > > > >>> zones dirty. This is possible because of using mmap() and I find the > > > >>> way > > > >>> it is done to be quite elegant, not ugly :) > > > >>> > > > >> > > > >> No, I understand that your implementation works fine without > > > >> persistance, but persistance is key. That is why my series adds it in > > > >> the first patch. Without persistence it is just a toy. And the QEMU > > > >> device is not just an "NVMe-version" of null_blk. > > > > > > > > I really think we should be a bit more cautious of commiting to an > > > > on-disk format for the persistent state. Both this and Klaus' persistent > > > > state feels a bit ad-hoc, and with all the other knobs provided, it > > > > looks too easy to have out-of-sync states, or just not being able to > > > > boot at all if a qemu versions have different on-disk formats. > > > > > > > > Is anyone really considering zone emulation for production level stuff > > > > anyway? I can't imagine a real scenario where you'd want put yourself > > > > through that: you are just giving yourself all the downsides of a zoned > > > > block device and none of the benefits. AFAIK, this is provided as a > > > > development vehicle, closer to a "toy". > > > > > > > > I think we should consider trimming this down to a more minimal set that > > > > we *do* agree on and commit for inclusion ASAP. We can iterate all the > > > > bells & whistles and flush out the meta data's data marshalling scheme > > > > for persistence later. > > > > > > +1 on this. Removing the persistence also removes the debate on > > endianess. With > > > that out of the way, it should be straightforward to get agreement on a > > series > > > that can be merged quickly to get developers started with testing ZNS > > software > > > with QEMU. That is the most important goal here. 5.9 is around the corner, > > we > > > need something for people to get started with ZNS quickly. > > > > > > > Wait. What. No. Stop! > > > > It is unmistakably clear that you are invalidating my arguments about > > portability and endianness issues by suggesting that we just remove > > persistent state and deal with it later, but persistence is the killer > > feature that sets the QEMU emulated device apart from other emulation > > options. It is not about using emulation in production (because yeah, > > why would you?), but persistence is what makes it possible to develop > > and test "zoned FTLs" or something that requires recovery at power up. > > This is what allows testing of how your host software deals with opened > > zones being transitioned to FULL on power up and the persistent tracking > > of LBA allocation (in my series) can be used to properly test error > > recovery if you lost state in the app. > > > > Please, work with me on this instead of just removing such an essential > > feature. Since persistence seems to be the only thing we are really > > discussing, we should have plenty of time until the soft-freeze to come > > up with a proper solution on that. > > > > I agree that my version had a format that was pretty ad-hoc and that > > won't fly - it needs magic and version capabilities like in Dmitry's > > series, which incidentially looks a lot like what we did in the > > OpenChannel implementation, so I agree with the strategy. > > Are you insinuating that I somehow took stuff from OCSSD code and try > to claim priority this way? I am not at all that familiar with that code. > And I've already sent you the link to tcmu-runner code that served me > as an inspiration for implementing persistence in WDC patchset. > That code has been around for years, uses mmap, works great and has > nothing to do with you. >
No. I am not insinuating anything. The OpenChannel device also used a blockdev, but, yes, incidentially (and sorry, I should not have used that word), it looked like how we did it there and I noted that I agreed with the strategy. > > > > ZNS-wise, the only thing my implementation stores is the zone > > descriptors (in spec-native little-endian format) and the zone > > descriptor extensions. So there are no endian issues with those. The > > allocation tracking bitmap is always stored in little endian, but > > converted to big-endian if running on a big-endian host. > > > > Let me just conjure something up. > > > > #define NVME_PSTATE_MAGIC ... > > #define NVME_PSTATE_V1 1 > > > > typedef struct NvmePstateHeader { > > uint32_t magic; > > uint32_t version; > > > > uint64_t blk_len; > > > > uint8_t lbads; > > uint8_t iocs; > > > > uint8_t rsvd18[3054]; > > > > struct { > > uint64_t zsze; > > uint8_t zdes; > > } QEMU_PACKED zns; > > > > uint8_t rsvd3089[1007]; > > } QEMU_PACKED NvmePstateHeader; > > > > Why conjure something that already exists in WDC patchset? And that part > has been published in the very first version of our patches, weeks before > your entire ZNS series was posted. Add an rsvd[] here and there and that code > can be as suitable to achieve the stated goals as what you have above. > Yes, I read your code. I know you have a header and I also noted above that "it needs magic and version capabilities like in Dmitry's series". > > series, > > With such a header we have all we need. We can bail out if any > > parameters do not match and similar to nvme data structures it contains > > reserved areas for future use. I'll be posting a v2 with this. If this > > still feels too ad-hoc, we can be inspired by QCOW2 and the "extension" > > feature. > > > > I can agree that we drop other optional features like zone excursions > > and the reset/finish recommended limit simulation, but PLEASE DO NOT > > remove persistence and upstream a half-baked version when we are so > > close and have time to get it right. > > One important thing IMO is to reduce future need for metadata versioning. > This requires a really good effort to design and review the proper metadata > format that would become stable for some time. Think about portability. > To get out something "conjured up" now and then have to move to V2 > metadata in the next release is even worse than no persistence at all. > So maybe is makes sense to go with Keith's suggestion. As I've said, we have time until the soft-freeze to get this right. I "conjured" something up to show a point. The reason we review and iterate is to NOT upstream something thats is conjured up. But we gotta start somewhere, no? So what is so bad about me posting a v2?
signature.asc
Description: PGP signature