Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions
On Mittwoch, 4. September 2019 15:02:30 CEST Christian Schoenebeck wrote: > > > Well, mailman is handling this correctly. It replaces the "From:" field > > > with a placeholder and instead adds my actual email address as > > > "Reply-To:" field. That's the common way to handle this on mailing > > > lists, > > > as also mentioned here: > > > https://en.wikipedia.org/wiki/DMARC#From:_rewriting > > > > > > So IMO patchew should automatically use the value of "Reply-To:" in that > > > case as author of patches instead. > > > > > > Reducing security cannot be the solution. > > > > No, there's no need to reduce security. Just change your local git > > configuration to produce a 'From:' line in the commit body.. > > Got it. :) > > > >> How are you sending patches ? With git send-email ? If so, maybe you > > >> can > > >> pass something like --from='"Christian Schoenebeck" > > >> '. Since this is a different string, git will > > >> assume you're sending someone else's patch : it will automatically add > > >> an > > >> extra From: made out of the commit Author as recorded in the git tree. > > > > I think it is probably as simple as a 'git config' command to tell git > > to always put a 'From:' in the body of self-authored patches when using > > git format-patch; however, as I don't suffer from munged emails, I > > haven't actually tested what that setting would be. Well, I tried that Eric. The expected solution would be enabling this git setting: git config [--global] format.from true https://git-scm.com/docs/git-config#Documentation/git-config.txt-formatfrom But as you can already read from the manual, the overall behaviour of git regarding a separate "From:" line in the email body was intended solely for the use case sender != author. So in practice (at least in my git version) git always makes a raw string comparison between sender (name and email) string and author string and only adds the separate From: line to the body if they differ. Hence also "git format-patch --from=" only works here if you use a different author string (name and email) there, otherwise on a perfect string match it is simply ignored and you end up with only one "From:" in the email header. So eventually I added one extra character in my name for now and removed it manually in the dumped emails subsequently (see today's "[PATCH v7 0/3] 9p: Fix file ID collisions"). Besides that direct string comparison restriction; git also seems to have a bug here. Because even if you have sender != author, then git falsely uses author as sender of the cover letter, whereas the emails of the individual patches are encoded correctly. Best regards, Christian Schoenebeck
[Qemu-devel] [PATCH v7 2/3] 9p: stat_to_qid: implement slow path
From: Christian Schoenebeck stat_to_qid attempts via qid_path_prefixmap to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Updated hash calls to new xxhash API. - Removed unnecessary parantheses in qpf_lookup_func(). - Removed unnecessary g_malloc0() result checks. - Log error message when running out of prefixes in qid_path_fullmap(). - Log warning message about potential degraded performance in qid_path_prefixmap(). - Wrapped qpf_table initialization to dedicated qpf_table_init() function. - Fixed typo in comment. ] Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 74 +++- hw/9pfs/9p.h | 9 +++ 2 files changed, 76 insertions(+), 7 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 8eb89c5c7d..d9be2d45d3 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -579,23 +579,34 @@ static uint32_t qpp_hash(QppEntry e) return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); } +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + static bool qpp_lookup_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } -static void qpp_table_remove(void *p, uint32_t h, void *up) +static bool qpf_lookup_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return e1->dev == e2->dev && e1->ino == e2->ino; +} + +static void qp_table_remove(void *p, uint32_t h, void *up) { g_free(p); } -static void qpp_table_destroy(struct qht *ht) +static void qp_table_destroy(struct qht *ht) { if (!ht || !ht->map) { return; } -qht_iter(ht, qpp_table_remove, NULL); +qht_iter(ht, qp_table_remove, NULL); qht_destroy(ht); } @@ -604,6 +615,50 @@ static void qpp_table_init(struct qht *ht) qht_init(ht, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE); } +static void qpf_table_init(struct qht *ht) +{ +qht_init(ht, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qpf_table_init(&pdu->s->qpf_table); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +error_report_once( +"9p: No more prefixes available for remapping inodes from " +"host to guest." +); +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + /* * stat_to_qid needs to map inode number (64 bits) and device id (32 bits) * to a unique QID path (64 bits). To avoid having to map and keep track @@ -629,9 +684,8 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, if (!val) { if (pdu->s->qp_prefix_next == 0) { /* we ran out of prefixes */ -error_report_once( -"9p: No more prefixes available for remapping inodes from " -"host to guest." +warn_report_once( +"9p: Potential degraded performance of inode remapping" ); return -ENFILE; } @@ -656,6 +710,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { /* map inode+device to qid path (fast path) */ err = qid_path_prefixmap(pdu, stbuf, &qidp->path); +if (err == -ENFILE) { +/* fast path didn't work, fall back to full map */ +err = qid_path_fullmap(pdu, stbuf, &qidp->path); +} if (err) { return err; } @@ -3820,6 +3878,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t, qpp_table_init(&s->qpp_table); s->qp_prefix_next = 1; /* reserve 0 to detect overflow */ +s->qp
[Qemu-devel] [PATCH v7 0/3] 9p: Fix file ID collisions
This is v7 of a proposed patch set for fixing file ID collisions with 9pfs. v6->v7: * Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). * Be pedantic and abort with error on wrong value for new command line argument 'multidevs'. * Adjusted patches to qemu code style guidelines. * Fixed potential crash in qp_table_destroy() on error path. * Use dedicated hash table init functions (qpd_table_init(), qpf_table_init(), qpp_table_init()): https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg00144.html * Use warn_report_once() instead of error_report_once() for issues interpreted merely as being warnings, not errors. * Simplified hash table destruction (due to simplified error path introduced by SHA1 7fc4c49e91). * Dropped capturing root_ino for now: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg00146.html * Don't update proxy docs, since new command line argument 'multidevs' is limited to the local backend for now. * Mention in docs that readdir() is currently not blocked by 'multidevs=forbid'. * Rename qid_path_prefixmap() -> qid_path_suffixmap() in patch 3 (due to the semantic change of that function by that patch). Christian Schoenebeck (3): 9p: Added virtfs option 'multidevs=remap|forbid|warn' 9p: stat_to_qid: implement slow path 9p: Use variable length suffixes for inode remapping fsdev/file-op-9p.h | 5 + fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 17 ++ hw/9pfs/9p.c| 456 ++-- hw/9pfs/9p.h| 59 ++ qemu-options.hx | 26 ++- vl.c| 7 +- 7 files changed, 552 insertions(+), 25 deletions(-) -- 2.20.1
[Qemu-devel] [PATCH v7 3/3] 9p: Use variable length suffixes for inode remapping
From: Christian Schoenebeck Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping. Additionally this solution is more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well, so there is less likely a need for generating and tracking additional suffixes, which might also be beneficial for nested virtualization where each level of virtualization would shift up the inode bits and increase the chance of expensive remapping actions. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a parameter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1101] -> [1011000] (31 bits) 65534 [1110] -> [0111000] (31 bits) 65535 [] -> [000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [01] (6 bits) 2 [10] -> [11] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1101] -> [001110001000] (28 bits) 65534 [1110] -> [101110001000] (28 bits) 65535 [] -> [00001000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 268 +-- hw/9pfs/9p.h | 44 - 2 files changed, 279 insertions(+), 33 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index d9be2d45d3..37abcdb71e 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -27,6 +27,7 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include int open_fd_hw; int total_open_fd; @@ -573,6 +574,116 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) +{ +return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023; +} + +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */ +static inline uint64_t mirror64bit(uint64_t value) +{ +return ((uint64_t)mirror8bit(value & 0xff) << 56) | + ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) | + ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) | + ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) | + ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) | + ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) | + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8) | + ((uint64_t)mirror8bit((value >> 56) & 0xff)); +} + +/** + * @brief Parameter k for the Exponential Golomb algorihm to be used. + * + * The smaller this value, the smaller the minimum bit count for the Exp. + * Golomb generated affixes will be (at lowest index) however for the + * price of having higher maximum bit count of generated affixes (at highest + * index). Likewise increasing this parameter yields in smaller maximum bit + * count for the price of having higher minimum bit count. + * + * In practice that means: a good value for k depends on the expected amount + * of devices to be exposed by one export. For a small amount of devices k + * should be small, for a large amount of devices k might be increased + * instead. The default of k=0 should be fine for most users though. + * + * @b IMPORTANT: In case this ever becomes a runtime parameter; the value of + * k should not change as long as guest is still running! Because that would + * cause completely different inode numbers to be generated on guest. + */ +#define EXP_GOLOMB_K0 + +/** + * @brief Exponential Golomb algorithm for arbitrary k (including k=0). + * + * The Exponential Golomb algorithm generates @b prefixes (@b not suffixes!) + * with growing length and with the ma
[Qemu-devel] [PATCH v7 1/3] 9p: Added virtfs option 'multidevs=remap|forbid|warn'
From: Christian Schoenebeck 'warn' (default): Only log an error message (once) on host if more than one device is shared by same export, except of that just ignore this config error though. This is the default behaviour for not breaking existing installations implying that they really know what they are doing. 'forbid': Like 'warn', but except of just logging an error this also denies access of guest to additional devices. 'remap': Allows to share more than one device per export by remapping inodes from host to guest appropriately. To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 7fc4c49e91). - Added virtfs option 'multidevs', original patch simply did the inode remapping without being asked. - Updated hash calls to new xxhash API. - Updated docs for new option 'multidevs'. - Fixed v9fs_do_readdir() not having remapped inodes. - Log error message when running out of prefixes in qid_path_prefixmap(). - Fixed definition of QPATH_INO_MASK. - Wrapped qpp_table initialization to dedicated qpp_table_init() function. - Dropped unnecessary parantheses in qpp_lookup_func(). - Dropped unnecessary g_malloc0() result checks. ] Signed-off-by: Christian Schoenebeck --- fsdev/file-op-9p.h | 5 ++ fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 17 hw/9pfs/9p.c| 188 +++- hw/9pfs/9p.h| 12 +++ qemu-options.hx | 26 +- vl.c| 7 +- 7 files changed, 237 insertions(+), 25 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index c757c8099f..f2f7772c86 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -59,6 +59,11 @@ typedef struct ExtendedOps { #define V9FS_RDONLY 0x0040 #define V9FS_PROXY_SOCK_FD 0x0080 #define V9FS_PROXY_SOCK_NAME0x0100 +/* + * multidevs option (either one of the two applies exclusively) + */ +#define V9FS_REMAP_INODES 0x0200 +#define V9FS_FORBID_MULTIDEVS 0x0400 #define V9FS_SEC_MASK 0x003C diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c index 7c31af..07a18c6e48 100644 --- a/fsdev/qemu-fsdev-opts.c +++ b/fsdev/qemu-fsdev-opts.c @@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = { }, { .name = "readonly", .type = QEMU_OPT_BOOL, - +}, { +.name = "multidevs", +.type = QEMU_OPT_STRING, }, { .name = "socket", .type = QEMU_OPT_STRING, @@ -75,6 +77,9 @@ static QemuOptsList qemu_virtfs_opts = { }, { .name = "readonly", .type = QEMU_OPT_BOOL, +}, { +.name = "multidevs", +.type = QEMU_OPT_STRING, }, { .name = "socket", .type = QEMU_OPT_STRING, diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 077a8c4e2b..e407716177 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = { "writeout", "fmode", "dmode", +"multidevs", "throttling.bps-total", "throttling.bps-read", "throttling.bps-write", @@ -121,6 +122,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) const char *fsdev_id = qemu_opts_id(opts); const char *fsdriver = qemu_opt_get(opts, "fsdriver"); const char *writeout = qemu_opt_get(opts, "writeout"); +const char *multidevs = qemu_opt_get(opts, "multidevs"); bool ro = qemu_opt_get_bool(opts, "readonly", 0); if (!fsdev_id) { @@ -161,6 +163,21 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) } else { fsle->fse.export_flags &= ~V9FS_RDONLY; } +if (multidevs) { +if (!strcmp(multidevs, "remap")) { +fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS; +fsle->fse.export_flags |= V9FS_REMAP_INODES; +} else if (!strcmp(multidevs, "forbid")) { +fsle->fse.export_flags &= ~V9FS_REMAP_INODES; +fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS; +} else if (!strcmp(multidevs, "warn")) { +fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS; +fsle->fse.export_flags &= ~V9FS_REMAP_INODES; +} else { +error_setg(errp, "fsdev: '%s' is invalid for multidevs", multidevs); +return -1; +} +} if (fsle->fse.ops->parse_opts) { if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) { diff --git a/hw/9
Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions
On Dienstag, 3. September 2019 21:38:15 CEST Eric Blake wrote: > On 9/2/19 5:29 PM, Christian Schoenebeck via Qemu-devel wrote: > >>>>> === OUTPUT BEGIN === > >>>>> 1/4 Checking commit bb69de63f788 (9p: Treat multiple devices on one > >>>>> export > >>>>> as an error) ERROR: Author email address is mangled by the mailing > >>>>> list > >>>>> #2: > >>>>> Author: Christian Schoenebeck via Qemu-devel > >>>> > >>>> This is problematic since it ends up in the Author: field in git. > >>>> Please > >>>> find a way to fix that. > >>> > >>> Like in which way do you imagine that? And where is the actual practical > >>> problem? I mean every patch still has my signed-off-by tag with the > >>> correct > >>> email address ending up in git history. > >> > >> Yes, this only breaks Author: if the patch is applied from the list. > > Except that many maintainers DO apply mail from the list (thanks to 'git > am'). Fixing patchew to unmunge things is an appealing idea, but would > not fix the problem for maintainers not cloning from patchew, so even if > patchew avoids the problem locally, it should still continue to warn > about the problem. > > >>> The cause for this issue is that the domain is configured to require > >>> DKIM > >>> signatures for all outgoing emails. That's why mailman replaces my > >>> address > >>> by "Christian Schoenebeck via Qemu-devel " > >>> placeholder since it could not provide a valid signature. > > And when you know that mailman is going to munge your address, the fix > is to configure git to output 'From: correct name ' > as the first line of the BODY of the message, since 'git am' favors the > unmunged From: from the body over the munged From: from the headers. Ah I see, I will try that with the next 9p patch set round (probably tomorrow). Thanks for the hint! I actually had a quick glimpse on the patchew sources yesterday to see if there was some undocumented alternative header like "X-git-author:" or something like that, but could not find one. > > Well, mailman is handling this correctly. It replaces the "From:" field > > with a placeholder and instead adds my actual email address as > > "Reply-To:" field. That's the common way to handle this on mailing lists, > > as also mentioned here: > > https://en.wikipedia.org/wiki/DMARC#From:_rewriting > > > > So IMO patchew should automatically use the value of "Reply-To:" in that > > case as author of patches instead. > > > > Reducing security cannot be the solution. > > No, there's no need to reduce security. Just change your local git > configuration to produce a 'From:' line in the commit body.. Got it. :) > >> How are you sending patches ? With git send-email ? If so, maybe you can > >> pass something like --from='"Christian Schoenebeck" > >> '. Since this is a different string, git will > >> assume you're sending someone else's patch : it will automatically add an > >> extra From: made out of the commit Author as recorded in the git tree. > > I think it is probably as simple as a 'git config' command to tell git > to always put a 'From:' in the body of self-authored patches when using > git format-patch; however, as I don't suffer from munged emails, I > haven't actually tested what that setting would be. > > > I use "git format-patch ..." to dump the invidiual emails as raw email > > sources and then I'll send those raw emails from the command line. So I > > have even more control of what is exactly sent out and could of course > > also add custom email header fields if required, if that would solve the > > situation somehow, i.e. manually as first test and later in automated > > way. That's not the issue here. > Working around the problem does not require munging email headers, but > adding a line to the email body.
Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions
On Montag, 2. September 2019 17:34:32 CEST Greg Kurz wrote: > On Sun, 01 Sep 2019 21:28:45 +0200 > > Christian Schoenebeck wrote: > > On Donnerstag, 29. August 2019 19:02:34 CEST Greg Kurz wrote: > > > On Thu, 22 Aug 2019 15:18:54 -0700 (PDT) > > > > > > no-re...@patchew.org wrote: > > > > Patchew URL: > > > > https://patchew.org/QEMU/cover.1566503584.git.qemu_...@crudebyte.com/ > > > > > > > > > > > > > > > > Hi, > > > > > > > > This series seems to have some coding style problems. See output below > > > > for > > > > > > more information: > > [snip] > > > > > > === OUTPUT BEGIN === > > > > 1/4 Checking commit bb69de63f788 (9p: Treat multiple devices on one > > > > export > > > > as an error) ERROR: Author email address is mangled by the mailing > > > > list > > > > #2: > > > > Author: Christian Schoenebeck via Qemu-devel > > > > > > This is problematic since it ends up in the Author: field in git. Please > > > find a way to fix that. > > > > Like in which way do you imagine that? And where is the actual practical > > problem? I mean every patch still has my signed-off-by tag with the > > correct > > email address ending up in git history. > > Yes, this only breaks Author: if the patch is applied from the list. > > > The cause for this issue is that the domain is configured to require DKIM > > signatures for all outgoing emails. That's why mailman replaces my address > > by "Christian Schoenebeck via Qemu-devel " > > placeholder since it could not provide a valid signature. > > > > There were good reasons for enabling DKIM and it is a good thing for all > > domains in general, since that ensures that (i.e. foreign) email addresses > > cannot be used as sender address if the actual sender is not authorized > > for > > sending emails with that address. > > Don't know much about DKIM but google seems to confirm what you say. When you view the source of my emails you'll always see a "DKIM-Signature:" field in the email header, which is a signature of the email's body and the specific email header fields listed in the "DKIM-Signature:" block, so the original server can choose by itself which header fields to include ("h=...") for signing, but the standard requires the From: field must always be included. A receiving server then obtains the public key from the domain's DNS records and checks if the DKIM signature of the email was correct: https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail Additionally the receiving server obtains the so called "DMARC" entry from the domain's DNS records. The "DMARC" DNS entry contains the domain's general policies how receiving email servers shall handle verification of emails of this domain. For instance the DMARC entry may list SMTP servers (e.g. IPs, DNS names) eligble to send emails on behalf of the domain at all, and it defines what receiving email servers shall do with emails which were identified of not being from an eligible source (e.g. sender IP not listed in DMARC entry or missing or wrong DKIM signature in email header). For instance if the policy is "quarantine" in the DMARC entry then receiving servers are advised to simply drop invalid emails: https://en.wikipedia.org/wiki/DMARC > So, this means that patchew will complain each time you post if we can't > find a proper way to address that... :-\ Well, mailman is handling this correctly. It replaces the "From:" field with a placeholder and instead adds my actual email address as "Reply-To:" field. That's the common way to handle this on mailing lists, as also mentioned here: https://en.wikipedia.org/wiki/DMARC#From:_rewriting So IMO patchew should automatically use the value of "Reply-To:" in that case as author of patches instead. Reducing security cannot be the solution. > > What I changed in the meantime though is that you should get all my > > patches > > directly to your personal address, not only from the list. Or did you > > receive v6 again just from the list? > > I've received the patches in my mailbox but I prefer to use the patchwork's > pwclient CLI to apply patches... and patchwork captures the patches from > the list, so I end up having to patch the authorship manually anyway. > > How are you sending patches ? With git send-email ? If so, maybe you can > pass something like --from='"Christian Schoenebeck" > '. Since this is a differ
Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'
On Montag, 2. September 2019 13:49:34 CEST Greg Kurz wrote: > On Sun, 01 Sep 2019 20:56:16 +0200 > > Christian Schoenebeck wrote: > > On Freitag, 30. August 2019 14:22:38 CEST Greg Kurz wrote: > > > Some more comments below. > > > > [snip] > > > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > > index 8cc65c2c67..c96ea51116 100644 > > > > --- a/hw/9pfs/9p.c > > > > +++ b/hw/9pfs/9p.c > > > > @@ -25,6 +25,7 @@ > > > > > > > > #include "trace.h" > > > > #include "migration/blocker.h" > > > > #include "sysemu/qtest.h" > > > > [snip] > > > > > > @@ -3672,8 +3807,13 @@ int v9fs_device_realize_common(V9fsState *s, > > > > const > > > > V9fsTransport *t,> > > > > > > > > goto out; > > > > > > > > } > > > > > > > > +s->root_ino = stat.st_ino; > > > > > > This isn't used anywhere. It looks like a leftover of the readdir fix > > > in v5. > > > > Yes, both correct. I intentionally left it though, since I found it a > > natural complement always capturing the root inode along to the root > > device. > Fair enough. The local backend opens an fd to the root directory, to be used > by any access to the 9p share. I think root_dev/root_ino should be obtained > with fstat() on this fd, to be sure they are consistent. Maybe add an extra > struct stat * argument to the init function ? I'd rather see this done as a > preparatory "backend to cache 9p root device/inode during init" patch. Convinced. I'll drop root_ino from this patch set for now.
Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'
On Montag, 2. September 2019 12:16:26 CEST Greg Kurz wrote: > > > > @@ -571,22 +572,109 @@ static void coroutine_fn virtfs_reset(V9fsPDU > > > > *pdu) > > > > > > > > P9_STAT_MODE_NAMED_PIPE | \ > > > > P9_STAT_MODE_SOCKET) > > > > > > > > -/* This is the algorithm from ufs in spfs */ > > > > + > > > > +/* creative abuse of tb_hash_func7, which is based on xxhash */ > > > > +static uint32_t qpp_hash(QppEntry e) > > > > +{ > > > > +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); > > > > +} > > > > + > > > > +static bool qpp_lookup_func(const void *obj, const void *userp) > > > > +{ > > > > +const QppEntry *e1 = obj, *e2 = userp; > > > > +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; > > > > +} > > > > + > > > > +static void qpp_table_remove(void *p, uint32_t h, void *up) > > > > +{ > > > > +g_free(p); > > > > +} > > > > + > > > > +static void qpp_table_destroy(struct qht *ht) > > > > +{ > > > > +qht_iter(ht, qpp_table_remove, NULL); > > > > +qht_destroy(ht); > > > > +} > > > > > > Ok to have a function for this instead of open-coding but I'd > > > like to see qpp_table_init() for consistency. > > > > Well, these are just qht_init() one-liners, but if you really want to have > > dedicated, local init functions for them, okay. > > Yeah, even if it's a one-liner, I prefer consistency. Alternatively, with > an idempotent v9fs_device_unrealize_common() like in [1], you'd have > only one user for qpp_table_destroy() and you can open-code it. This > would address my consistency concern even better :) > > [1] > https://github.com/gkurz/qemu/commit/7fc4c49e910df2e155b36bf0a05de9209bd92d I'll rather add qpp_table_init() then, because grouping the two calls qht_iter() and qht_destroy() together to a dedicated function qpp_table_destroy() still makes sense semantically IMO.
Re: [Qemu-devel] [PATCH v5 3/5] 9p: Added virtfs option 'remap_inodes'
On Freitag, 30. August 2019 13:48:27 CEST Greg Kurz wrote: > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > > index 8cc65c2c67..39c6c2a894 100644 > > > --- a/hw/9pfs/9p.c > > > +++ b/hw/9pfs/9p.c > > > > [snip] > > > > > @@ -1940,6 +2041,19 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU > > > *pdu, V9fsFidState *fidp, int32_t count = 0; > > > > > > off_t saved_dir_pos; > > > struct dirent *dent; > > > > > > +struct stat stbuf; > > > +bool fidIsExportRoot; > > > + > > > +/* > > > + * determine if fidp is the export root, which is required for safe > > > + * handling of ".." below > > > + */ > > > +err = v9fs_co_lstat(pdu, &fidp->path, &stbuf); > > > +if (err < 0) { > > > +return err; > > > +} > > > +fidIsExportRoot = pdu->s->dev_id == stbuf.st_dev && > > > + pdu->s->root_ino == stbuf.st_ino; > > > > > > /* save the directory position */ > > > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > > > > > > @@ -1964,16 +2078,51 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU > > > *pdu, V9fsFidState *fidp, v9fs_string_free(&name); > > > > > > return count; > > > > > > } > > > > > > -/* > > > - * Fill up just the path field of qid because the client uses > > > - * only that. To fill the entire qid structure we will have > > > - * to stat each dirent found, which is expensive > > > - */ > > > -size = MIN(sizeof(dent->d_ino), sizeof(qid.path)); > > > -memcpy(&qid.path, &dent->d_ino, size); > > > -/* Fill the other fields with dummy values */ > > > -qid.type = 0; > > > -qid.version = 0; > > > + > > > +if (fidIsExportRoot && !strcmp("..", dent->d_name)) { > > > +/* > > > + * if "." is export root, then return qid of export root > > > for > > > + * ".." to avoid exposing anything outside the export > > > + */ > > > +err = fid_to_qid(pdu, fidp, &qid); > > > +if (err < 0) { > > > +v9fs_readdir_unlock(&fidp->fs.dir); > > > +v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > > > +v9fs_string_free(&name); > > > +return err; > > > +} > > > > Hmm, I start to wonder whether I should postpone that particular bug fix > > and not make it part of that QID fix patch series (not even as separate > > patch there). Because that fix needs some more adjustments. E.g. I should > > adjust dent->d_type here as well; but more notably it should also > > distinguish between the case where the export root is mounted as / on > > guest or not and that's where this fix could become ugly and grow in > > size. > > > > To make the case clear: calling on guest > > > > readdir(pathOfSome9pExportRootOnGuest); > > > > currently always returns for its ".." result entry the inode number and > > d_type of the export root's parent directory on host, so it exposes > > information of host outside the 9p export. > > > > I don't see that as security issue, since the information revealed is > > limited to the inode number and d_type, but it is definitely incorrect > > behaviour. > Definitely. This should be fixed independently of this series. Maybe follow > the same approach as commit 56f101ecce0e "9pfs: handle walk of ".." in the > root directory", ie. basically make /.. an alias of /. That's actually what the suggested fix already did in v5 here (see diff above). However I was worried whether I thought about all edge cases. So I also need some more testing and hence clearly decided to postpone this fix for now. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions
On Donnerstag, 29. August 2019 19:02:34 CEST Greg Kurz wrote: > On Thu, 22 Aug 2019 15:18:54 -0700 (PDT) > > no-re...@patchew.org wrote: > > Patchew URL: > > https://patchew.org/QEMU/cover.1566503584.git.qemu_...@crudebyte.com/ > > > > > > > > Hi, > > > > This series seems to have some coding style problems. See output below for > > more information: [snip] > > > > === OUTPUT BEGIN === > > 1/4 Checking commit bb69de63f788 (9p: Treat multiple devices on one export > > as an error) ERROR: Author email address is mangled by the mailing list > > #2: > > Author: Christian Schoenebeck via Qemu-devel > > This is problematic since it ends up in the Author: field in git. Please > find a way to fix that. Like in which way do you imagine that? And where is the actual practical problem? I mean every patch still has my signed-off-by tag with the correct email address ending up in git history. The cause for this issue is that the domain is configured to require DKIM signatures for all outgoing emails. That's why mailman replaces my address by "Christian Schoenebeck via Qemu-devel " placeholder since it could not provide a valid signature. There were good reasons for enabling DKIM and it is a good thing for all domains in general, since that ensures that (i.e. foreign) email addresses cannot be used as sender address if the actual sender is not authorized for sending emails with that address. What I changed in the meantime though is that you should get all my patches directly to your personal address, not only from the list. Or did you receive v6 again just from the list? > Other warnings/errors should also be fixed but they look trivial. Yeah, they are trivial. *But* there is one thing ... > > Author: Christian Schoenebeck via Qemu-devel > > > > ERROR: space prohibited after that open parenthesis '(' > > #92: FILE: hw/9pfs/9p.c:586: > > +return ((uint64_t)mirror8bit( value& 0xff) << 56) | > > > > ERROR: space prohibited before that close parenthesis ')' > > #98: FILE: hw/9pfs/9p.c:592: > > + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) | > > > > ERROR: space prohibited before that close parenthesis ')' > > #99: FILE: hw/9pfs/9p.c:593: > > + ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ; ... I would like to ignore this specific bot whining, because that particular function looks much more readable the way it is (in that patch) right now. > > WARNING: Block comments use a leading /* on a separate line > > #102: FILE: hw/9pfs/9p.c:596: > > +/** @brief Parameter k for the Exponential Golomb algorihm to be used. > > > > WARNING: Block comments use a leading /* on a separate line > > #121: FILE: hw/9pfs/9p.c:615: > > +/** @brief Exponential Golomb algorithm for arbitrary k (including k=0). > > > > WARNING: Block comments use a leading /* on a separate line > > #148: FILE: hw/9pfs/9p.c:642: > > +/** @brief Converts a suffix into a prefix, or a prefix into a suffix.
Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'
On Freitag, 30. August 2019 14:22:38 CEST Greg Kurz wrote: > Some more comments below. [snip] > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > index 8cc65c2c67..c96ea51116 100644 > > --- a/hw/9pfs/9p.c > > +++ b/hw/9pfs/9p.c > > @@ -25,6 +25,7 @@ > > > > #include "trace.h" > > #include "migration/blocker.h" > > #include "sysemu/qtest.h" [snip] > > @@ -3672,8 +3807,13 @@ int v9fs_device_realize_common(V9fsState *s, const > > V9fsTransport *t,> > > goto out; > > > > } > > > > +s->root_ino = stat.st_ino; > > This isn't used anywhere. It looks like a leftover of the readdir fix > in v5. Yes, both correct. I intentionally left it though, since I found it a natural complement always capturing the root inode along to the root device. > > s->dev_id = stat.st_dev; > > > > +/* QID path hash table. 1 entry ought to be enough for anybody ;) */ > > +qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE); > > +s->qp_prefix_next = 1; /* reserve 0 to detect overflow */ > > + > > > > s->ctx.fst = &fse->fst; > > fsdev_throttle_init(s->ctx.fst); > > > > @@ -3687,6 +3827,7 @@ out: > > } > > g_free(s->tag); > > g_free(s->ctx.fs_root); > > > > +qpp_table_destroy(&s->qpp_table); > > > > v9fs_path_free(&path); > > > > } > > return rc; > > > > @@ -3699,6 +3840,7 @@ void v9fs_device_unrealize_common(V9fsState *s, > > Error **errp)> > > } > > fsdev_throttle_cleanup(s->ctx.fst); > > g_free(s->tag); > > > > +qpp_table_destroy(&s->qpp_table); > > > > g_free(s->ctx.fs_root); > > > > } > > > > diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h > > index 5e316178d5..a283b0193e 100644 > > --- a/hw/9pfs/9p.h > > +++ b/hw/9pfs/9p.h > > @@ -8,6 +8,7 @@ > > > > #include "fsdev/9p-iov-marshal.h" > > #include "qemu/thread.h" > > #include "qemu/coroutine.h" > > > > +#include "qemu/qht.h" > > > > enum { > > > > P9_TLERROR = 6, > > > > @@ -235,6 +236,15 @@ struct V9fsFidState > > > > V9fsFidState *rclm_lst; > > > > }; > > > > +#define QPATH_INO_MASK((1ULL << 48) - 1) > > + > > +/* QID path prefix entry, see stat_to_qid */ > > +typedef struct { > > +dev_t dev; > > +uint16_t ino_prefix; > > +uint16_t qp_prefix; > > +} QppEntry; > > + > > > > struct V9fsState > > { > > > > QLIST_HEAD(, V9fsPDU) free_list; > > > > @@ -256,7 +266,10 @@ struct V9fsState > > > > Error *migration_blocker; > > V9fsConf fsconf; > > V9fsQID root_qid; > > > > +ino_t root_ino; > > Thinking again, I'm wondering if root_ino and dev_id could be used > instead of root_qid in v9fs_walk()... Would you have a look at that > if you decide to fix the readdir issue ? I keep it in mind when looking at the postponed readdir() issue again. > > dev_t dev_id; > > > > +struct qht qpp_table; > > +uint16_t qp_prefix_next; > > > > }; > > > > /* 9p2000.L open flags */ > >
Re: [Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'
On Donnerstag, 29. August 2019 18:55:28 CEST Greg Kurz wrote: > > diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c > > index 7c31af..07a18c6e48 100644 > > --- a/fsdev/qemu-fsdev-opts.c > > +++ b/fsdev/qemu-fsdev-opts.c > > @@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = { > > > > }, { > > > > .name = "readonly", > > .type = QEMU_OPT_BOOL, > > > > - > > +}, { > > +.name = "multidevs", > > +.type = QEMU_OPT_STRING, > > > > }, { > > > > .name = "socket", > > .type = QEMU_OPT_STRING, > > > > @@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = { > > > > .name = "readonly", > > .type = QEMU_OPT_BOOL, > > > > }, { > > > > +.name = "multidevs", > > +.type = QEMU_OPT_STRING, > > +}, { > > > > .name = "socket", > > .type = QEMU_OPT_STRING, > > > > }, { > > > > diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c > > index 077a8c4e2b..ed03d559a9 100644 > > --- a/fsdev/qemu-fsdev.c > > +++ b/fsdev/qemu-fsdev.c > > @@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = { > > > > "writeout", > > "fmode", > > "dmode", > > > > +"multidevs", > > So we only allow this for the "local" backend. Any reason not to > add this to "proxy" as well ? > > I didn't do it for the "throttling" options because it is a > feature I didn't care to support much, but "multidevs" is more > a fix than a fancy feature. Well, to be honest I haven't cared about the proxy backend at all. Like I mentioned before, I am a bit sceptical that the proxy feature is actually used by people in real life at all (at least in its current implementation). So personally I don't see much sense in investing time for adding & testing new things on this backend ATM. > > +if (multidevs) { > > +if (!strcmp(multidevs, "remap")) { > > +fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS; > > +fsle->fse.export_flags |= V9FS_REMAP_INODES; > > +} else if (!strcmp(multidevs, "forbid")) { > > +fsle->fse.export_flags &= ~V9FS_REMAP_INODES; > > +fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS; > > +} > > And... > > } else if (!strcmp(multidevs, "warn")) { > fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS; > fsle->fse.export_flags &= ~V9FS_REMAP_INODES; > } else { > error_setg(errp, "invalid multidevs property '%s'", multidevs); > return -1; > } > > ... because we're a bit pedantic for command line options :) And I thought I adapted to relaxed handling of CLI arguments. See existing option "writeout". :) > > > +} > > > > if (fsle->fse.ops->parse_opts) { > > > > if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) { > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > index 8cc65c2c67..c96ea51116 100644 > > --- a/hw/9pfs/9p.c > > +++ b/hw/9pfs/9p.c > > @@ -25,6 +25,7 @@ > > > > #include "trace.h" > > #include "migration/blocker.h" > > #include "sysemu/qtest.h" > > > > +#include "qemu/xxhash.h" > > > > int open_fd_hw; > > int total_open_fd; > > > > @@ -571,22 +572,109 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) > > > > P9_STAT_MODE_NAMED_PIPE | \ > > P9_STAT_MODE_SOCKET) > > > > -/* This is the algorithm from ufs in spfs */ > > + > > +/* creative abuse of tb_hash_func7, which is based on xxhash */ > > +static uint32_t qpp_hash(QppEntry e) > > +{ > > +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); > > +} > > + > > +static bool qpp_lookup_func(const void *obj, const void *userp) > > +{ > > +const QppEntry *e1 = obj, *e2 = userp; > > +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; > > +} > > + > > +static void qpp_table_remove(void *p, uint32_t h, void *up) > > +{ > > +g_free(p); > > +} > > + > > +static void qpp_table_destroy(struct qht *ht) > > +{ > > +qht_iter(ht, qpp_table_remove, NULL); > > +qht_destroy(ht); > > +} > > Ok to have a function for this instead of open-coding but I'd > like to see qpp_table_init() for consistency. Well, these are just qht_init() one-liners, but if you really want to have dedicated, local init functions for them, okay. > > static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID > > *qidp) { > > > > +int err; > > > > size_t size; > > > > -if (pdu->s->dev_id != stbuf->st_dev) { > > -error_report_once( > > -"9p: Multiple devices detected in same VirtFS export. " > > -"You must use a separate export for each device." > > -); > > -return -ENODEV; > > +if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { > > +/* map
Re: [Qemu-devel] [PATCH v6 1/4] 9p: Treat multiple devices on one export as an error
On Donnerstag, 29. August 2019 18:27:30 CEST Greg Kurz wrote: > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > index 586a6dccba..8cc65c2c67 100644 > > --- a/hw/9pfs/9p.c > > +++ b/hw/9pfs/9p.c > > @@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) > > > > P9_STAT_MODE_SOCKET) > > > > /* This is the algorithm from ufs in spfs */ > > > > -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) > > +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID > > *qidp)> > > { > > > > size_t size; > > > > +if (pdu->s->dev_id != stbuf->st_dev) { > > +error_report_once( > > +"9p: Multiple devices detected in same VirtFS export. " > > +"You must use a separate export for each device." > > +); > > +return -ENODEV; > > As explained in the v5 review, we don't necessarily want to break existing > cross-device setups that just happen to work. Moreover, the next patch > re-allows them since remap_inodes=ignore is the default. I would thus > only do: > > warn_report_once( > "9p: Multiple devices detected in same VirtFS export, " > "which might lead to file ID collisions and severe " > "misbehaviours on guest! You should use a separate " > "export for each device shared from host." > ); > > So I've just changed that and applied to 9p-next since it is > a valuable improvement. Note that I've kept the signature change > of stat_to_qid() for simplicity even if it isn't needed before > the next patch. I'm fine with your changes. Dropping "return -ENODEV" in this patch was beyond my default level for details since this is really just a very detailed sort of git history tweaking. :) Like you already pointed out, not aborting (as default behaviour) would have been addressed with the subsequent patch anyway.
[Qemu-devel] [PATCH v6 4/4] 9p: Use variable length suffixes for inode remapping
Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping. Additionally this solution is more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well, so there is less likely a need for generating and tracking additional suffixes, which might also be beneficial for nested virtualization where each level of virtualization would shift up the inode bits and increase the chance of expensive remapping actions. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a parameter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1101] -> [1011000] (31 bits) 65534 [1110] -> [0111000] (31 bits) 65535 [] -> [000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [01] (6 bits) 2 [10] -> [11] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1101] -> [001110001000] (28 bits) 65534 [1110] -> [101110001000] (28 bits) 65535 [] -> [00001000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 247 --- hw/9pfs/9p.h | 34 +++- 2 files changed, 251 insertions(+), 30 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 728641fb7f..0359469cfa 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -26,6 +26,7 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include int open_fd_hw; int total_open_fd; @@ -572,6 +573,107 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) +{ +return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023; +} + +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */ +static inline uint64_t mirror64bit(uint64_t value) +{ +return ((uint64_t)mirror8bit( value& 0xff) << 56) | + ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) | + ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) | + ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) | + ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) | + ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) | + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) | + ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ; +} + +/** @brief Parameter k for the Exponential Golomb algorihm to be used. + * + * The smaller this value, the smaller the minimum bit count for the Exp. + * Golomb generated affixes will be (at lowest index) however for the + * price of having higher maximum bit count of generated affixes (at highest + * index). Likewise increasing this parameter yields in smaller maximum bit + * count for the price of having higher minimum bit count. + * + * In practice that means: a good value for k depends on the expected amount + * of devices to be exposed by one export. For a small amount of devices k + * should be small, for a large amount of devices k might be increased + * instead. The default of k=0 should be fine for most users though. + * + * @b IMPORTANT: In case this ever becomes a runtime parameter; the value of + * k should not change as long as guest is still running! Because that would + * cause completely different inode numbers to be generated on guest. + */ +#define EXP_GOLOMB_K0 + +/** @brief Exponential Golomb algorithm for arbitrary k (including k=0). + * + * The Exponential Golomb algorithm generates @b prefixes (@b not suffixes!) + * with growing length and with the mathematical property of b
[Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions
This is v6 of a proposed patch set for fixing file ID collisions with 9pfs. v5->v6: * Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 177fd3b6a8). * Replaced previous boolean option 'remap_inodes' by tertiary option 'multidevs=remap|forbid|warn', where 'warn' is the new/old default behaviour for not breaking existing installations: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg07098.html * Dropped incomplete fix in v9fs_do_readdir() which aimed to prevent exposing info outside export root with '..' entry. Postponed this fix for now for the reasons described: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01862.html Christian Schoenebeck (4): 9p: Treat multiple devices on one export as an error 9p: Added virtfs option 'multidevs=remap|forbid|warn' 9p: stat_to_qid: implement slow path 9p: Use variable length suffixes for inode remapping fsdev/file-op-9p.h | 5 + fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 11 ++ hw/9pfs/9p.c| 488 +--- hw/9pfs/9p.h| 51 + qemu-options.hx | 33 +++- vl.c| 6 +- 7 files changed, 565 insertions(+), 36 deletions(-) -- 2.11.0
[Qemu-devel] [PATCH v6 1/4] 9p: Treat multiple devices on one export as an error
The QID path should uniquely identify a file. However, the inode of a file is currently used as the QID path, which on its own only uniquely identifies files within a device. Here we track the device hosting the 9pfs share, in order to prevent security issues with QID path collisions from other devices. Signed-off-by: Antonios Motakis [CS: - Assign dev_id to export root's device already in v9fs_device_realize_common(), not postponed in stat_to_qid(). - error_report_once() if more than one device was shared by export. - Return -ENODEV instead of -ENOSYS in stat_to_qid(). - Fixed typo in log comment. ] Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 69 hw/9pfs/9p.h | 1 + 2 files changed, 56 insertions(+), 14 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 586a6dccba..8cc65c2c67 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_SOCKET) /* This is the algorithm from ufs in spfs */ -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) { size_t size; +if (pdu->s->dev_id != stbuf->st_dev) { +error_report_once( +"9p: Multiple devices detected in same VirtFS export. " +"You must use a separate export for each device." +); +return -ENODEV; +} + memset(&qidp->path, 0, sizeof(qidp->path)); size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path)); memcpy(&qidp->path, &stbuf->st_ino, size); @@ -587,6 +595,8 @@ static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) if (S_ISLNK(stbuf->st_mode)) { qidp->type |= P9_QID_TYPE_SYMLINK; } + +return 0; } static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, @@ -599,7 +609,10 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, if (err < 0) { return err; } -stat_to_qid(&stbuf, qidp); +err = stat_to_qid(pdu, &stbuf, qidp); +if (err < 0) { +return err; +} return 0; } @@ -830,7 +843,10 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path, memset(v9stat, 0, sizeof(*v9stat)); -stat_to_qid(stbuf, &v9stat->qid); +err = stat_to_qid(pdu, stbuf, &v9stat->qid); +if (err < 0) { +return err; +} v9stat->mode = stat_to_v9mode(stbuf); v9stat->atime = stbuf->st_atime; v9stat->mtime = stbuf->st_mtime; @@ -891,7 +907,7 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path, #define P9_STATS_ALL 0x3fffULL /* Mask for All fields above */ -static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf, +static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf, V9fsStatDotl *v9lstat) { memset(v9lstat, 0, sizeof(*v9lstat)); @@ -913,7 +929,7 @@ static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf, /* Currently we only support BASIC fields in stat */ v9lstat->st_result_mask = P9_STATS_BASIC; -stat_to_qid(stbuf, &v9lstat->qid); +return stat_to_qid(pdu, stbuf, &v9lstat->qid); } static void print_sg(struct iovec *sg, int cnt) @@ -1115,7 +1131,6 @@ static void coroutine_fn v9fs_getattr(void *opaque) uint64_t request_mask; V9fsStatDotl v9stat_dotl; V9fsPDU *pdu = opaque; -V9fsState *s = pdu->s; retval = pdu_unmarshal(pdu, offset, "dq", &fid, &request_mask); if (retval < 0) { @@ -1136,7 +1151,10 @@ static void coroutine_fn v9fs_getattr(void *opaque) if (retval < 0) { goto out; } -stat_to_v9stat_dotl(s, &stbuf, &v9stat_dotl); +retval = stat_to_v9stat_dotl(pdu, &stbuf, &v9stat_dotl); +if (retval < 0) { +goto out; +} /* fill st_gen if requested and supported by underlying fs */ if (request_mask & P9_STATS_GEN) { @@ -1381,7 +1399,10 @@ static void coroutine_fn v9fs_walk(void *opaque) if (err < 0) { goto out; } -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} v9fs_path_copy(&dpath, &path); } memcpy(&qids[name_idx], &qid, sizeof(qid)); @@ -1483,7 +1504,10 @@ static void coroutine_fn v9fs_open(void *opaque) if (err < 0) { goto out; } -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} if (S_ISDIR(stbuf.st_mode)) { err = v9fs_co_opendir(pdu, fidp); if (err < 0) { @@ -1593,7 +1617,10 @@ static void coroutine_fn v9fs_lcreate(void *opaque) fidp->flags |= FID_NON_RECLAIMABLE; } iounit = get_iounit(pdu
[Qemu-devel] [PATCH v6 3/4] 9p: stat_to_qid: implement slow path
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 177fd3b6a8). - Updated hash calls to new xxhash API. - Removed unnecessary parantheses in qpf_lookup_func(). - Removed unnecessary g_malloc0() result checks. - Log error message when running out of prefixes in qid_path_fullmap(). - Log error message about potential degraded performance in qid_path_prefixmap(). - Fixed typo in comment. ] Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 70 ++-- hw/9pfs/9p.h | 9 2 files changed, 72 insertions(+), 7 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index c96ea51116..728641fb7f 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -579,23 +579,73 @@ static uint32_t qpp_hash(QppEntry e) return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); } +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + static bool qpp_lookup_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } -static void qpp_table_remove(void *p, uint32_t h, void *up) +static bool qpf_lookup_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return e1->dev == e2->dev && e1->ino == e2->ino; +} + +static void qp_table_remove(void *p, uint32_t h, void *up) { g_free(p); } -static void qpp_table_destroy(struct qht *ht) +static void qp_table_destroy(struct qht *ht) { -qht_iter(ht, qpp_table_remove, NULL); +qht_iter(ht, qp_table_remove, NULL); qht_destroy(ht); } +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +error_report_once( +"9p: No more prefixes available for remapping inodes from " +"host to guest." +); +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) * to a unique QID path (64 bits). To avoid having to map and keep track * of up to 2^64 objects, we map only the 16 highest bits of the inode plus @@ -621,8 +671,7 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, if (pdu->s->qp_prefix_next == 0) { /* we ran out of prefixes */ error_report_once( -"9p: No more prefixes available for remapping inodes from " -"host to guest." +"9p: Potential degraded performance of inode remapping" ); return -ENFILE; } @@ -647,6 +696,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { /* map inode+device to qid path (fast path) */ err = qid_path_prefixmap(pdu, stbuf, &qidp->path); +if (err == -ENFILE) { +/* fast path didn't work, fall back to full map */ +err = qid_path_fullmap(pdu, stbuf, &qidp->path); +} if (err) { return err; } @@ -3813,6 +3866,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t, /* QID path hash table. 1 entry ought to be enough for anybody ;) */ qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE); s->qp_prefix_next = 1; /* reserve 0 to detect overflow */ +s->qp_fullpath_next = 1; s->ctx.fst = &fse->fst; fsdev_throttle_init(s->ctx.fst); @@ -3827,7 +3881,8 @@ out: } g_free(s->tag); g_free(s->ctx.fs_root); -qpp_table_destroy(&s->qpp_table); +
[Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'
'warn' (default): Only log an error message (once) on host if more than one device is shared by same export, except of that just ignore this config error though. This is the default behaviour for not breaking existing installations implying that they really know what they are doing. 'forbid': Like 'warn', but except of just logging an error this also denies access of guest to additional devices. 'remap': Allows to share more than one device per export by remapping inodes from host to guest appropriately. To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis [CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next (SHA1 177fd3b6a8). - Updated hash calls to new xxhash API. - Added virtfs option 'multidevs', original patch simply did the inode remapping without being asked. - Updated docs for new option 'multidevs'. - Capture root_ino in v9fs_device_realize_common() as well, not just the device id. - Fixed v9fs_do_readdir() not having remapped inodes. - Log error message when running out of prefixes in qid_path_prefixmap(). - Fixed definition of QPATH_INO_MASK. - Dropped unnecessary parantheses in qpp_lookup_func(). - Dropped unnecessary g_malloc0() result checks. ] Signed-off-by: Christian Schoenebeck --- fsdev/file-op-9p.h | 5 ++ fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 11 +++ hw/9pfs/9p.c| 182 ++-- hw/9pfs/9p.h| 13 qemu-options.hx | 33 +++-- vl.c| 6 +- 7 files changed, 229 insertions(+), 28 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index c757c8099f..f2f7772c86 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -59,6 +59,11 @@ typedef struct ExtendedOps { #define V9FS_RDONLY 0x0040 #define V9FS_PROXY_SOCK_FD 0x0080 #define V9FS_PROXY_SOCK_NAME0x0100 +/* + * multidevs option (either one of the two applies exclusively) + */ +#define V9FS_REMAP_INODES 0x0200 +#define V9FS_FORBID_MULTIDEVS 0x0400 #define V9FS_SEC_MASK 0x003C diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c index 7c31af..07a18c6e48 100644 --- a/fsdev/qemu-fsdev-opts.c +++ b/fsdev/qemu-fsdev-opts.c @@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = { }, { .name = "readonly", .type = QEMU_OPT_BOOL, - +}, { +.name = "multidevs", +.type = QEMU_OPT_STRING, }, { .name = "socket", .type = QEMU_OPT_STRING, @@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = { .name = "readonly", .type = QEMU_OPT_BOOL, }, { +.name = "multidevs", +.type = QEMU_OPT_STRING, +}, { .name = "socket", .type = QEMU_OPT_STRING, }, { diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 077a8c4e2b..ed03d559a9 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = { "writeout", "fmode", "dmode", +"multidevs", "throttling.bps-total", "throttling.bps-read", "throttling.bps-write", @@ -121,6 +122,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) const char *fsdev_id = qemu_opts_id(opts); const char *fsdriver = qemu_opt_get(opts, "fsdriver"); const char *writeout = qemu_opt_get(opts, "writeout"); +const char *multidevs = qemu_opt_get(opts, "multidevs"); bool ro = qemu_opt_get_bool(opts, "readonly", 0); if (!fsdev_id) { @@ -161,6 +163,15 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) } else { fsle->fse.export_flags &= ~V9FS_RDONLY; } +if (multidevs) { +if (!strcmp(multidevs, "remap")) { +fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS; +fsle->fse.export_flags |= V9FS_REMAP_INODES; +} else if (!strcmp(multidevs, "forbid")) { +fsle->fse.export_flags &= ~V9FS_REMAP_INODES; +fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS; +} +} if (fsle->fse.ops->parse_opts) { if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) { diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 8cc65c2c67..c96ea51116 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -25,6 +25,7 @@ #include "trace.h" #include "migration/blocker.h" #include "sysemu/qtest.h" +#include "qemu/xxhash.h" int open_fd_hw; int total_open_fd; @@ -571,22 +572,109 @@ static void coroutine_fn
Re: [Qemu-devel] [PATCH v5 3/5] 9p: Added virtfs option 'remap_inodes'
On Mittwoch, 3. Juli 2019 13:13:26 CEST Christian Schoenebeck wrote: > To support multiple devices on the 9p share, and avoid > qid path collisions we take the device id as input [snip] > - Fixed v9fs_do_readdir() having exposed info outside >export root with '..' entry. [snip] > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > index 8cc65c2c67..39c6c2a894 100644 > --- a/hw/9pfs/9p.c > +++ b/hw/9pfs/9p.c [snip] > @@ -1940,6 +2041,19 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU *pdu, > V9fsFidState *fidp, int32_t count = 0; > off_t saved_dir_pos; > struct dirent *dent; > +struct stat stbuf; > +bool fidIsExportRoot; > + > +/* > + * determine if fidp is the export root, which is required for safe > + * handling of ".." below > + */ > +err = v9fs_co_lstat(pdu, &fidp->path, &stbuf); > +if (err < 0) { > +return err; > +} > +fidIsExportRoot = pdu->s->dev_id == stbuf.st_dev && > + pdu->s->root_ino == stbuf.st_ino; > > /* save the directory position */ > saved_dir_pos = v9fs_co_telldir(pdu, fidp); > @@ -1964,16 +2078,51 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU > *pdu, V9fsFidState *fidp, v9fs_string_free(&name); > return count; > } > -/* > - * Fill up just the path field of qid because the client uses > - * only that. To fill the entire qid structure we will have > - * to stat each dirent found, which is expensive > - */ > -size = MIN(sizeof(dent->d_ino), sizeof(qid.path)); > -memcpy(&qid.path, &dent->d_ino, size); > -/* Fill the other fields with dummy values */ > -qid.type = 0; > -qid.version = 0; > + > +if (fidIsExportRoot && !strcmp("..", dent->d_name)) { > +/* > + * if "." is export root, then return qid of export root for > + * ".." to avoid exposing anything outside the export > + */ > +err = fid_to_qid(pdu, fidp, &qid); > +if (err < 0) { > +v9fs_readdir_unlock(&fidp->fs.dir); > +v9fs_co_seekdir(pdu, fidp, saved_dir_pos); > +v9fs_string_free(&name); > +return err; > +} Hmm, I start to wonder whether I should postpone that particular bug fix and not make it part of that QID fix patch series (not even as separate patch there). Because that fix needs some more adjustments. E.g. I should adjust dent->d_type here as well; but more notably it should also distinguish between the case where the export root is mounted as / on guest or not and that's where this fix could become ugly and grow in size. To make the case clear: calling on guest readdir(pathOfSome9pExportRootOnGuest); currently always returns for its ".." result entry the inode number and d_type of the export root's parent directory on host, so it exposes information of host outside the 9p export. I don't see that as security issue, since the information revealed is limited to the inode number and d_type, but it is definitely incorrect behaviour. Best regards, Christian Schoenebeck
[Qemu-devel] [PATCH v5 5/5] 9p: Use variable length suffixes for inode remapping
Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping. Additionally this solution is more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well, so there is less likely a need for generating and tracking additional suffixes, which might also be beneficial for nested virtualization where each level of virtualization would shift up the inode bits and increase the chance of expensive remapping actions. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a parameter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1101] -> [1011000] (31 bits) 65534 [1110] -> [0111000] (31 bits) 65535 [] -> [000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [01] (6 bits) 2 [10] -> [11] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1101] -> [001110001000] (28 bits) 65534 [1110] -> [101110001000] (28 bits) 65535 [] -> [00001000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 247 --- hw/9pfs/9p.h | 34 +++- 2 files changed, 251 insertions(+), 30 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 549e279462..5bbd3e2d14 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -26,6 +26,7 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include int open_fd_hw; int total_open_fd; @@ -572,6 +573,107 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) +{ +return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023; +} + +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */ +static inline uint64_t mirror64bit(uint64_t value) +{ +return ((uint64_t)mirror8bit( value& 0xff) << 56) | + ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) | + ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) | + ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) | + ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) | + ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) | + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) | + ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ; +} + +/** @brief Parameter k for the Exponential Golomb algorihm to be used. + * + * The smaller this value, the smaller the minimum bit count for the Exp. + * Golomb generated affixes will be (at lowest index) however for the + * price of having higher maximum bit count of generated affixes (at highest + * index). Likewise increasing this parameter yields in smaller maximum bit + * count for the price of having higher minimum bit count. + * + * In practice that means: a good value for k depends on the expected amount + * of devices to be exposed by one export. For a small amount of devices k + * should be small, for a large amount of devices k might be increased + * instead. The default of k=0 should be fine for most users though. + * + * @b IMPORTANT: In case this ever becomes a runtime parameter; the value of + * k should not change as long as guest is still running! Because that would + * cause completely different inode numbers to be generated on guest. + */ +#define EXP_GOLOMB_K0 + +/** @brief Exponential Golomb algorithm for arbitrary k (including k=0). + * + * The Exponential Golomb algorithm generates @b prefixes (@b not suffixes!) + * with growing length and with the mathematical property of b
[Qemu-devel] [PATCH v5 2/5] 9p: Treat multiple devices on one export as an error
The QID path should uniquely identify a file. However, the inode of a file is currently used as the QID path, which on its own only uniquely identifies files within a device. Here we track the device hosting the 9pfs share, in order to prevent security issues with QID path collisions from other devices. Signed-off-by: Antonios Motakis [CS: - Assign dev_id to export root's device already in v9fs_device_realize_common(), not postponed in stat_to_qid(). - error_report_once() if more than one device was shared by export. - Return -ENODEV instead of -ENOSYS in stat_to_qid(). - Fixed typo in log comment. ] Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 69 hw/9pfs/9p.h | 1 + 2 files changed, 56 insertions(+), 14 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 586a6dccba..8cc65c2c67 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_SOCKET) /* This is the algorithm from ufs in spfs */ -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) { size_t size; +if (pdu->s->dev_id != stbuf->st_dev) { +error_report_once( +"9p: Multiple devices detected in same VirtFS export. " +"You must use a separate export for each device." +); +return -ENODEV; +} + memset(&qidp->path, 0, sizeof(qidp->path)); size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path)); memcpy(&qidp->path, &stbuf->st_ino, size); @@ -587,6 +595,8 @@ static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) if (S_ISLNK(stbuf->st_mode)) { qidp->type |= P9_QID_TYPE_SYMLINK; } + +return 0; } static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, @@ -599,7 +609,10 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, if (err < 0) { return err; } -stat_to_qid(&stbuf, qidp); +err = stat_to_qid(pdu, &stbuf, qidp); +if (err < 0) { +return err; +} return 0; } @@ -830,7 +843,10 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path, memset(v9stat, 0, sizeof(*v9stat)); -stat_to_qid(stbuf, &v9stat->qid); +err = stat_to_qid(pdu, stbuf, &v9stat->qid); +if (err < 0) { +return err; +} v9stat->mode = stat_to_v9mode(stbuf); v9stat->atime = stbuf->st_atime; v9stat->mtime = stbuf->st_mtime; @@ -891,7 +907,7 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path, #define P9_STATS_ALL 0x3fffULL /* Mask for All fields above */ -static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf, +static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf, V9fsStatDotl *v9lstat) { memset(v9lstat, 0, sizeof(*v9lstat)); @@ -913,7 +929,7 @@ static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf, /* Currently we only support BASIC fields in stat */ v9lstat->st_result_mask = P9_STATS_BASIC; -stat_to_qid(stbuf, &v9lstat->qid); +return stat_to_qid(pdu, stbuf, &v9lstat->qid); } static void print_sg(struct iovec *sg, int cnt) @@ -1115,7 +1131,6 @@ static void coroutine_fn v9fs_getattr(void *opaque) uint64_t request_mask; V9fsStatDotl v9stat_dotl; V9fsPDU *pdu = opaque; -V9fsState *s = pdu->s; retval = pdu_unmarshal(pdu, offset, "dq", &fid, &request_mask); if (retval < 0) { @@ -1136,7 +1151,10 @@ static void coroutine_fn v9fs_getattr(void *opaque) if (retval < 0) { goto out; } -stat_to_v9stat_dotl(s, &stbuf, &v9stat_dotl); +retval = stat_to_v9stat_dotl(pdu, &stbuf, &v9stat_dotl); +if (retval < 0) { +goto out; +} /* fill st_gen if requested and supported by underlying fs */ if (request_mask & P9_STATS_GEN) { @@ -1381,7 +1399,10 @@ static void coroutine_fn v9fs_walk(void *opaque) if (err < 0) { goto out; } -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} v9fs_path_copy(&dpath, &path); } memcpy(&qids[name_idx], &qid, sizeof(qid)); @@ -1483,7 +1504,10 @@ static void coroutine_fn v9fs_open(void *opaque) if (err < 0) { goto out; } -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} if (S_ISDIR(stbuf.st_mode)) { err = v9fs_co_opendir(pdu, fidp); if (err < 0) { @@ -1593,7 +1617,10 @@ static void coroutine_fn v9fs_lcreate(void *opaque) fidp->flags |= FID_NON_RECLAIMABLE; } iounit = get_iounit(pdu
[Qemu-devel] [PATCH v5 3/5] 9p: Added virtfs option 'remap_inodes'
To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis [CS: - Rebased to master head. - Updated hash calls to new xxhash API. - Added virtfs option 'remap_inodes', original patch simply did the inode remapping without being asked. - Updated docs for new option 'remap_inodes'. - Capture root_ino in v9fs_device_realize_common() as well, not just the device id. - Fixed v9fs_do_readdir() having exposed info outside export root with '..' entry. - Fixed v9fs_do_readdir() not having remapped inodes. - Log error message when running out of prefixes in qid_path_prefixmap(). - Fixed definition of QPATH_INO_MASK. - Dropped unnecessary parantheses in qpp_lookup_func(). - Dropped unnecessary g_malloc0() result checks. ] Signed-off-by: Christian Schoenebeck --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 6 ++ hw/9pfs/9p.c| 196 +++- hw/9pfs/9p.h| 13 qemu-options.hx | 25 -- vl.c| 3 + 7 files changed, 224 insertions(+), 27 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index c757c8099f..6c1663c252 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -59,6 +59,7 @@ typedef struct ExtendedOps { #define V9FS_RDONLY 0x0040 #define V9FS_PROXY_SOCK_FD 0x0080 #define V9FS_PROXY_SOCK_NAME0x0100 +#define V9FS_REMAP_INODES 0x0200 #define V9FS_SEC_MASK 0x003C diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c index 7c31af..64a8052266 100644 --- a/fsdev/qemu-fsdev-opts.c +++ b/fsdev/qemu-fsdev-opts.c @@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = { }, { .name = "readonly", .type = QEMU_OPT_BOOL, - +}, { +.name = "remap_inodes", +.type = QEMU_OPT_BOOL, }, { .name = "socket", .type = QEMU_OPT_STRING, @@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = { .name = "readonly", .type = QEMU_OPT_BOOL, }, { +.name = "remap_inodes", +.type = QEMU_OPT_BOOL, +}, { .name = "socket", .type = QEMU_OPT_STRING, }, { diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 077a8c4e2b..b6fa9799be 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -121,6 +121,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) const char *fsdev_id = qemu_opts_id(opts); const char *fsdriver = qemu_opt_get(opts, "fsdriver"); const char *writeout = qemu_opt_get(opts, "writeout"); +bool remap_inodes = qemu_opt_get_bool(opts, "remap_inodes", 0); bool ro = qemu_opt_get_bool(opts, "readonly", 0); if (!fsdev_id) { @@ -161,6 +162,11 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) } else { fsle->fse.export_flags &= ~V9FS_RDONLY; } +if (remap_inodes) { +fsle->fse.export_flags |= V9FS_REMAP_INODES; +} else { +fsle->fse.export_flags &= ~V9FS_REMAP_INODES; +} if (fsle->fse.ops->parse_opts) { if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) { diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 8cc65c2c67..39c6c2a894 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -25,6 +25,7 @@ #include "trace.h" #include "migration/blocker.h" #include "sysemu/qtest.h" +#include "qemu/xxhash.h" int open_fd_hw; int total_open_fd; @@ -571,22 +572,98 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) -/* This is the algorithm from ufs in spfs */ + +/* creative abuse of tb_hash_func7, which is based on xxhash */ +static uint32_t qpp_hash(QppEntry e) +{ +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); +} + +static bool qpp_lookup_func(const void *obj, const void *userp) +{ +const QppEntry *e1 = obj, *e2 = userp; +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; +} + +static void qpp_table_remove(void *p, uint32_t h, void *up) +{ +g_free(p); +} + +static void qpp_table_destroy(struct qht *ht) +{ +qht_iter(ht, qpp_table_remove, NULL); +qht_destroy(ht); +} + +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) + * to a unique QID path (64 bits). To avoid having to map and keep track + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus + * the device id to the 16 highest bits of the QID path. The 48 lowest bits + * of the QID pat
[Qemu-devel] [PATCH v5 1/5] 9p: unsigned type for type, version, path
There is no need for signedness on these QID fields for 9p. Signed-off-by: Antonios Motakis [CS: - Also make QID type unsigned. - Adjust donttouch_stat() to new types. - Adjust trace-events to new types. ] Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 6 +++--- hw/9pfs/9p.c | 6 +++--- hw/9pfs/trace-events | 14 +++--- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index c8823d878f..8f3babb60a 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -9,9 +9,9 @@ typedef struct V9fsString typedef struct V9fsQID { -int8_t type; -int32_t version; -int64_t path; +uint8_t type; +uint32_t version; +uint64_t path; } V9fsQID; typedef struct V9fsStat diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 55821343e5..586a6dccba 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -743,9 +743,9 @@ static int donttouch_stat(V9fsStat *stat) { if (stat->type == -1 && stat->dev == -1 && -stat->qid.type == -1 && -stat->qid.version == -1 && -stat->qid.path == -1 && +stat->qid.type == 0xff && +stat->qid.version == (uint32_t) -1 && +stat->qid.path == (uint64_t) -1 && stat->mode == -1 && stat->atime == -1 && stat->mtime == -1 && diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events index c0a0a4ab5d..10188daf7f 100644 --- a/hw/9pfs/trace-events +++ b/hw/9pfs/trace-events @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d" v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s" -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path) "tag %d id %d type %d version %d path %"PRId64 +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path) "tag %u id %u type %u version %u path %"PRIu64 v9fs_stat(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_stat_return(uint16_t tag, uint8_t id, int32_t mode, int32_t atime, int32_t mtime, int64_t length) "tag %d id %d stat={mode %d atime %d mtime %d length %"PRId64"}" v9fs_getattr(uint16_t tag, uint8_t id, int32_t fid, uint64_t request_mask) "tag %d id %d fid %d request_mask %"PRIu64 @@ -14,9 +14,9 @@ v9fs_getattr_return(uint16_t tag, uint8_t id, uint64_t result_mask, uint32_t mod v9fs_walk(uint16_t tag, uint8_t id, int32_t fid, int32_t newfid, uint16_t nwnames) "tag %d id %d fid %d newfid %d nwnames %d" v9fs_walk_return(uint16_t tag, uint8_t id, uint16_t nwnames, void* qids) "tag %d id %d nwnames %d qids %p" v9fs_open(uint16_t tag, uint8_t id, int32_t fid, int32_t mode) "tag %d id %d fid %d mode %d" -v9fs_open_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_open_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %u id %u qid={type %u version %u path %"PRIu64"} iounit %d" v9fs_lcreate(uint16_t tag, uint8_t id, int32_t dfid, int32_t flags, int32_t mode, uint32_t gid) "tag %d id %d dfid %d flags %d mode %d gid %u" -v9fs_lcreate_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_lcreate_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int32_t iounit) "tag %u id %u qid={type %u version %u path %"PRIu64"} iounit %d" v9fs_fsync(uint16_t tag, uint8_t id, int32_t fid, int datasync) "tag %d id %d fid %d datasync %d" v9fs_clunk(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_read(uint16_t tag, uint8_t id, int32_t fid, uint64_t off, uint32_t max_count) "tag %d id %d fid %d off %"PRIu64" max_count %u" @@ -26,21 +26,21 @@ v9fs_readdir_return(uint16_t tag, uint8_t id, uint32_t count, ssize_t retval) "t v9fs_write(uint16_t tag, uint8_t id, int32_t fid, uint64_t off, uint32_t count, int cnt) "tag %d id %d fid %d off %"PRIu64" count %u cnt %d" v9fs_write_return(uint16_t tag, uint8_t id, int32_t total, ssize_t err) "tag %d id %d total %d err %zd" v9fs_create(uint16_t tag, uint8_t id, int32_t fid, char* name, int32_t perm, int8_t mode) "tag %d id %d fid %d name %s perm %d mode %d" -v9fs_create_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_create_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %u id %u qid={type %u version %u path %"PRIu64"} iounit %d" v9fs_symlink(u
[Qemu-devel] [PATCH v5 4/5] 9p: stat_to_qid: implement slow path
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis [CS: - Rebased to master head. - Updated hash calls to new xxhash API. - Removed unnecessary parantheses in qpf_lookup_func(). - Removed unnecessary g_malloc0() result checks. - Log error message when running out of prefixes in qid_path_fullmap(). - Log error message about potential degraded performance in qid_path_prefixmap(). - Fixed typo in comment. ] Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 70 ++-- hw/9pfs/9p.h | 9 2 files changed, 72 insertions(+), 7 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 39c6c2a894..549e279462 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -579,23 +579,73 @@ static uint32_t qpp_hash(QppEntry e) return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); } +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + static bool qpp_lookup_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } -static void qpp_table_remove(void *p, uint32_t h, void *up) +static bool qpf_lookup_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return e1->dev == e2->dev && e1->ino == e2->ino; +} + +static void qp_table_remove(void *p, uint32_t h, void *up) { g_free(p); } -static void qpp_table_destroy(struct qht *ht) +static void qp_table_destroy(struct qht *ht) { -qht_iter(ht, qpp_table_remove, NULL); +qht_iter(ht, qp_table_remove, NULL); qht_destroy(ht); } +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +error_report_once( +"9p: No more prefixes available for remapping inodes from " +"host to guest." +); +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) * to a unique QID path (64 bits). To avoid having to map and keep track * of up to 2^64 objects, we map only the 16 highest bits of the inode plus @@ -621,8 +671,7 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, if (pdu->s->qp_prefix_next == 0) { /* we ran out of prefixes */ error_report_once( -"9p: No more prefixes available for remapping inodes from " -"host to guest." +"9p: Potential degraded performance of inode remapping" ); return -ENFILE; } @@ -647,6 +696,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { /* map inode+device to qid path (fast path) */ err = qid_path_prefixmap(pdu, stbuf, &qidp->path); +if (err == -ENFILE) { +/* fast path didn't work, fall back to full map */ +err = qid_path_fullmap(pdu, stbuf, &qidp->path); +} if (err) { return err; } @@ -3827,6 +3880,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t, /* QID path hash table. 1 entry ought to be enough for anybody ;) */ qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE); s->qp_prefix_next = 1; /* reserve 0 to detect overflow */ +s->qp_fullpath_next = 1; s->ctx.fst = &fse->fst; fsdev_throttle_init(s->ctx.fst); @@ -3841,7 +3895,8 @@ out: } g_free(s->tag); g_free(s->ctx.fs_root); -qpp_table_destroy(&s->qpp_table); +qp_table_destroy(&s->qpp_table); +qp_table_destro
[Qemu-devel] [PATCH v5 0/5] 9p: Fix file ID collisions
This is v5 of a proposed patch set for fixing file ID collisions with 9pfs. v4->v5: All Patches: * Added details to individual commit logs of what has been changed exactly by me on top of Antonios' original 4 patches. Patch 1: * Fixed format specifiers in hw/9pfs/trace-events. Patch 2: * Fixed typo in commit log. * Assign dev_id to export root's device already in v9fs_device_realize_common(), not postponed in stat_to_qid(). * Return -ENODEV instead of -ENOSYS in stat_to_qid(). Patch 3: * Added missing manual parts for new virtfs option 'remap_inodes' in qemu-options.hx. * Capture root_ino in v9fs_device_realize_common() as well, not just the device id. * Added function dirent_to_qid(). * Fixed v9fs_do_readdir() having exposed info outside export root with '..' entry (no matter if inode remapping was on or not). * Fixed v9fs_do_readdir() not having remapped inodes. * Fixed definition of QPATH_INO_MASK. * Log error message when running out of prefixes in qid_path_prefixmap(). * Adjusted changes in stat_to_qid() to qemu code style guidelines. Patch 4: * Log error message when running out of prefixes in qid_path_fullmap(). * Log error message about potential degraded performance in qid_path_prefixmap() (that is when qid_path_fullmap() will start to kick in next). * Fixed typo in code comment. Patch 5: * Dropped fixed (16 bit) size prefix code and hence removed usage of P9_VARI_LENGTH_INODE_SUFFIXES macro checks all over the place. * Dropped function expGolombEncodeK0(uint64_t n) which was optimized for the expected default value of k=0; instead always use the generalized function expGolombEncode(uint64_t n, int k) instead now. * Adjusted changes in hw/9pfs/9p.c to qemu code style guidelines. * Adjusted functions' API comments in hw/9pfs/9p.c. v3->v4: * Rebased to latest git master head. * Splitted Antonios' patch set to its original 4 individual patches. (was merged previously as only 1 patch). * Addressed discussed issues directly on Antonios' patches (was a separate patch before). * Added virtfs command line option "remap_inodes": Unless this option is not enabled, no inode remapping is performed at all, the user just gets an error message when trying to use more than 1 device per export. * Dropped persistency feature of QIDs beyond reboots. * Dropped disputed "vii" feature. Christian Schoenebeck (5): 9p: unsigned type for type, version, path 9p: Treat multiple devices on one export as an error 9p: Added virtfs option 'remap_inodes' 9p: stat_to_qid: implement slow path 9p: Use variable length suffixes for inode remapping fsdev/9p-marshal.h | 6 +- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 6 + hw/9pfs/9p.c| 508 +--- hw/9pfs/9p.h| 51 + hw/9pfs/trace-events| 14 +- qemu-options.hx | 25 ++- vl.c| 3 + 9 files changed, 573 insertions(+), 48 deletions(-) -- 2.11.0
Re: [Qemu-devel] [PATCH v4 5/5] 9p: Use variable length suffixes for inode remapping
On Freitag, 28. Juni 2019 16:56:15 CEST Christian Schoenebeck via Qemu-devel > > > + */ > > > +#define EXP_GOLOMB_K0 > > > + > > > +# if !EXP_GOLOMB_K > > > + > > > +/** @brief Exponential Golomb algorithm limited to the case k=0. > > > + * > > > > This doesn't really help to have a special implementation for k=0. The > > resulting function is nearly identical to the general case. It is likely > > that the compiler can optimize it and generate the same code. > > I don't think the compiler's optimizer would really drop all the > instructions automatically of the generalized variant of that particular > function. Does it matter in practice? No, I actually just provided that > optimized variant for the special case k=0 because I think k=0 will be the > default value for that parameter and because you were already picky about a > simple > > if (pdu->s->dev_id == 0) > > to be optimized. In practice users won't feel the runtime difference either > one of the two optimization scenarios. I just checked my assmption made here with a simple C test unit: // Use manually optimized function for case k=0. VariLenAffix implK0(uint64_t n) { return expGolombEncodeK0(n); } // Rely on compiler to optimize generalized function for k=0 VariLenAffix implGenK(uint64_t n) { return expGolombEncode(n, 0); } And : gcc -S -O3 -ffast-math k.c Like expected the generated 2 functions are almost identical, except that the manually optimized variant saves the following 3 instructions: movl$0, %eax ... testl %edx, %edx cmovns %edx, %eax But like I said, it is a tiny difference of course. Not really relevant in practice. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [PATCH v4 3/5] 9p: Added virtfs option "remap_inodes"
On Freitag, 28. Juni 2019 16:23:08 CEST Greg Kurz wrote: > > > This feature applies to all backends IIUC. We don't really care for the > > > synth backend since it generates non-colliding inode numbers by design, > > > but the proxy backend has the same issue as local. So... > > > > Yeah, I was not sure about these, because I did not even know what these > > two were for exactly. :) [ lazyness disclaimer end] > > "proxy" is a backend where all I/O accesses are performed by a separate > process running the virtfs-proxy-helper command. It runs with root > privileges, which provides the same level of functionality as "local" > with security_model=passthrough. It also chroot() into the shared > folder for extra security. But it is slower since it all requests > still go through the virtio-9p device in QEMU. This would call > for a vhost-9p implementation, but it's yet another story. > > "synth" is a software pseudo-backend, currently used to test 9pfs > with QTest (see tests/virtio-9p-test.c). Thanks for the clarification! So the proxy backend sounds like an idea that has not been implemented fully to its end. I guess it is not really used in production environments, right? What is the actual motivation for this proxy backend? And now that I look at it, I am a bit surprised that there is this pure Unix pipe socket based proxy variant, but no TCPIP network socket variant. I mean the latter is AFAIK the original idea behind the 9p protocol and IMO might be interesting to physical separate pure storage backends that way. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [PATCH v4 5/5] 9p: Use variable length suffixes for inode remapping
On Freitag, 28. Juni 2019 13:50:15 CEST Greg Kurz wrote: > > And with k=5 they would look like: > > > > Index Dec/Bin -> Generated Suffix Bin > > 1 [1] -> [01] (6 bits) > > 2 [10] -> [11] (6 bits) > > 3 [11] -> [010001] (6 bits) > > 4 [100] -> [110001] (6 bits) > > 5 [101] -> [001001] (6 bits) > > 6 [110] -> [101001] (6 bits) > > 7 [111] -> [011001] (6 bits) > > 8 [1000] -> [111001] (6 bits) > > 9 [1001] -> [000101] (6 bits) > > 10 [1010] -> [100101] (6 bits) > > 11 [1011] -> [010101] (6 bits) > > 12 [1100] -> [110101] (6 bits) > > ... > > 65533 [1101] -> [001110001000] (28 bits) > > 65534 [1110] -> [101110001000] (28 bits) > > 65535 [] -> [00001000] (28 bits) > > Hence minBits=6 maxBits=28 > > IIUC, this k control parameter should be constant for the > lifetime of QIDs. So it all boils down to choose a _good_ > value that would cover most scenarios, right ? That's correct. In the end it's just a matter of how many devices do you expect to be exposed with the same 9p export for choosing an appropriate value for k. That parameter k must be constant at least until guest is rebooted, otherwise you would end up with completely different inode numbers if you would change that parameter k while guest is still running. Like I mentioned before, I can send a short C file if you want to play around with that parameter k to see how the generated suffixes would look like. The console output is actually like shown above. Additionally the C demo also checks and proofs that all generated suffixes really generate unique numbers for the entire possible 64 bit range, so that should take away the scare about what this algorithm does. Since you said before that you find it already exotic to have more than 1 device being exported by 9p, I hence did not even bother to make that parameter "k" a runtime parameter. I *think* k=0 should be fine in practice. > For now, I just have some _cosmetic_ remarks. > > > Signed-off-by: Christian Schoenebeck > > --- > > > > hw/9pfs/9p.c | 267 > > ++- hw/9pfs/9p.h > > | 67 ++- > > 2 files changed, 312 insertions(+), 22 deletions(-) > > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > > index e6e410972f..46c9f11384 100644 > > --- a/hw/9pfs/9p.c > > +++ b/hw/9pfs/9p.c > > @@ -26,6 +26,7 @@ > > > > #include "migration/blocker.h" > > #include "sysemu/qtest.h" > > #include "qemu/xxhash.h" > > > > +#include > > > > int open_fd_hw; > > int total_open_fd; > > > > @@ -572,6 +573,123 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) > > > > P9_STAT_MODE_NAMED_PIPE | \ > > P9_STAT_MODE_SOCKET) > > > > +#if P9_VARI_LENGTH_INODE_SUFFIXES > > The numerous locations guarded by P9_VARI_LENGTH_INODE_SUFFIXES > really obfuscate the code, and don't ease review (for me at least). > And anyway, if we go for variable length suffixes, we won't keep > the fixed length prefix code. I just did that to provide a direct comparison between the fixed size prefix vs. variable size suffix code, since the fixed size prefix code is easier to understand. If you want I can add a 6th "cleanup" patch that would remove the fixed size prefix code and all the #ifs ? > > + > > +/* Mirrors all bits of a byte. So e.g. binary 1010 would become > > 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) { > > From CODING_STYLE: > > 4. Block structure > > [...] > > for reasons of tradition and clarity it comes on a line by itself: > > void a_function(void) > { > do_something(); > } Got it. > > +/* Parameter k for the Exponential Golomb algorihm to be used. > > + * > > + * The smaller this value, the smaller the minimum bit count for the Exp. > > + * Golomb generated affixes will be (at lowest index) however for the > > + * price of having higher maximum bit count of generated affixes (at > > highest + * index). Likewise increasing this parameter yields in smaller > > maximum bit + * count for the price of having higher minimum bit count. > > Forgive my laziness but what are the benefits of a smaller or larger > value, in term of user experience ? Well, the goal in general is too keep the generated suffix numbers (or their bit count actually) as small as possible, because you are cutting away the same amount of bits of the orginal inode number from host. So the smaller the generated suffix number is, the more bits you can simply directly use from the original inode number from host, and hence the cheaper this remap code becomes in practice. Because if you have e.g. a suffix number of just 3 bits, then in practice you will very likely only have exactly 1 entry in that hash table *ever*. So hash lookup will be very cheap, etc. If you had a suffix number of 32 bit instead that would mean you would need to cut away 32
Re: [Qemu-devel] [PATCH v4 3/5] 9p: Added virtfs option "remap_inodes"
On Freitag, 28. Juni 2019 12:09:31 CEST Greg Kurz wrote: > On Wed, 26 Jun 2019 20:42:13 +0200 > > Christian Schoenebeck via Qemu-devel wrote: > > To support multiple devices on the 9p share, and avoid > > qid path collisions we take the device id as input > > to generate a unique QID path. The lowest 48 bits of > > the path will be set equal to the file inode, and the > > top bits will be uniquely assigned based on the top > > 16 bits of the inode and the device id. > > > > Signed-off-by: Antonios Motakis > > Same remark about changes to the original patch. ack_once(); :) > BTW, I had a concern with the way v9fs_do_readdir() open-codes QID > generation without calling stat_to_qid(). > > See discussion here: > > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02724.html > > I guess you should ensure in a preliminary patch that QIDs only > come out of stat_to_qid(). Mja, actually I first omitted your suggestion consciously, because I first thought it was an overkill pure visibility issue lmited to the default case remap_inodes==false, but now that I look at it again, it is actually an issue even when remap_inodes==true since dirent would expose wrong inode numbers on guest as well. I will see what to do about it. However about your other concern here, quote: "Also, if we hit a collision while reading the directory, I'm afraid the remaining entries won't be read at all. I'm not sure this is really what we want." That's however still a concern here that I would consider overkill to address. I mean if a user gets into that situation then because of a configuration error that must be corrected by user; the point of this patch set is to prevent undefined behaviour and to make the user aware about the root cause of the overall issue; the purpose is not to address all possible issues while there is still a configuration error. > > +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, > > +uint64_t *path) > > +{ > > +QppEntry lookup = { > > +.dev = stbuf->st_dev, > > +.ino_prefix = (uint16_t) (stbuf->st_ino >> 48) > > +}, *val; > > +uint32_t hash = qpp_hash(lookup); > > + > > +val = qht_lookup(&pdu->s->qpp_table, &lookup, hash); > > + > > +if (!val) { > > +if (pdu->s->qp_prefix_next == 0) { > > +/* we ran out of prefixes */ > > And we won't ever be able to allocate a new one. Maybe worth > adding an error_report_once() to inform the user ? Yeah, I thought about that as well. Will do. > > static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID > > *qidp) { > > > > -size_t size; > > +int err; > > > > -if (pdu->s->dev_id == 0) { > > -pdu->s->dev_id = stbuf->st_dev; > > -} else if (pdu->s->dev_id != stbuf->st_dev) { > > -error_report_once( > > -"9p: Multiple devices detected in same VirtFS export. " > > -"You must use a separate export for each device." > > -); > > -return -ENOSYS; > > +if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { > > +/* map inode+device to qid path (fast path) */ > > +err = qid_path_prefixmap(pdu, stbuf, &qidp->path); > > +if (err) { > > +return err; > > +} > > +} else { > > +if (pdu->s->dev_id == 0) { > > +pdu->s->dev_id = stbuf->st_dev; > > +} else if (pdu->s->dev_id != stbuf->st_dev) { > > +error_report_once( > > +"9p: Multiple devices detected in same VirtFS export. " > > +"You must either use a separate export for each device " > > +"shared from host or enable virtfs option > > 'remap_inodes'." > > +); > > +return -ENOSYS; > > +} > > +size_t size; > > From CODING_STYLE: > > 5. Declarations > > Mixed declarations (interleaving statements and declarations within > blocks) are generally not allowed; declarations should be at the beginning > of blocks. > > Please do so for "size" and add an extra blank line. Ok. > > +#define QPATH_INO_MASK(((unsigned long)1 << 48) - 1) > > This won't give the expected result on a 32-bit host. Since this > is a mask for 64-bit entities, it should rather be: > > #define QPATH
Re: [Qemu-devel] [PATCH v4 4/5] 9p: stat_to_qid: implement slow path
On Freitag, 28. Juni 2019 12:21:20 CEST Greg Kurz wrote: > > +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, > > +uint64_t *path) > > +{ > > +QpfEntry lookup = { > > +.dev = stbuf->st_dev, > > +.ino = stbuf->st_ino > > +}, *val; > > +uint32_t hash = qpf_hash(lookup); > > + > > +/* most users won't need the fullmap, so init the table lazily */ > > +if (!pdu->s->qpf_table.map) { > > +qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, > > QHT_MODE_AUTO_RESIZE); +} > > + > > +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); > > + > > +if (!val) { > > +if (pdu->s->qp_fullpath_next == 0) { > > +/* no more files can be mapped :'( */ > > This would be the place to put the error_report_once() suggested > in the previous patch actually. I will add the suggested error message to qid_path_prefixmap() in patch 3 and then will move over that error message to qid_path_fullmap() in patch 4. Or if you want I can also leave an error_report_once() in qid_path_prefixmap() in patch 4 about potential degraded performance. > > @@ -3779,7 +3831,8 @@ void v9fs_device_unrealize_common(V9fsState *s, > > Error **errp)> > > } > > fsdev_throttle_cleanup(s->ctx.fst); > > g_free(s->tag); > > > > -qpp_table_destroy(&s->qpp_table); > > +qp_table_destroy(&s->qpp_table); > > +qp_table_destroy(&s->qpf_table); > > I'm starting to think v9fs_device_unrealize_common() should be made > idempotent, so that it can be used to handle rollback of a partially > realized device, and thus avoid the code duplication. But this is > out-of-scope for this series. Well, I can also make that e.g.: if (s->qpf_table.map) qp_table_destroy(&s->qpf_table); if you prefer the occurrence amount to be reduced. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [PATCH v4 2/5] 9p: Treat multiple devices on one export as an error
On Donnerstag, 27. Juni 2019 19:26:22 CEST Greg Kurz wrote: > On Wed, 26 Jun 2019 20:30:41 +0200 > > Christian Schoenebeck via Qemu-devel wrote: > > The QID path should uniquely identify a file. However, the > > inode of a file is currently used as the QID path, which > > on its own only uniquely identifies wiles within a device. > > s/wile/files Ah right. :) > > Here we track the device hosting the 9pfs share, in order > > to prevent security issues with QID path collisions from > > other devices. > > > > Signed-off-by: Antonios Motakis > > You should mention here the changes you made to the original patch. Got it. Will do for the other cases as well of course. > > -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) > > +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID > > *qidp)> > > { > > > > size_t size; > > > > +if (pdu->s->dev_id == 0) { > > +pdu->s->dev_id = stbuf->st_dev; > > st_dev should be captured in v9fs_device_realize_common() since we > lstat() the root there, instead of every request doing the check. Ok. > > +} else if (pdu->s->dev_id != stbuf->st_dev) { > > +error_report_once( > > +"9p: Multiple devices detected in same VirtFS export. " > > +"You must use a separate export for each device." > > +); > > +return -ENOSYS; > > This error is likely to end up as the return value of a > syscall in the guest and -ENOSYS usually means the syscall > isn't implemented, which is obviously not the case. Maybe > return -EPERM instead ? I would rather suggest -ENODEV. The entire device of the requested file/dir is not available on guest. -EPERM IMO rather motivates users looking for file system permission settings on individual files intead, and probably not checking the host's logs for the detailled error message. > > @@ -3633,6 +3674,8 @@ int v9fs_device_realize_common(V9fsState *s, const > > V9fsTransport *t,> > > goto out; > > > > } > > > > +s->dev_id = 0; > > + > > Set it to stat->st_dev after lstat() was called later in this function. I guesst you mean "earlier" not "later". The lstat() call is just before that dev_id initalization line. But in general your suggestion makes sense of course. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [PATCH v4 1/5] 9p: unsigned type for type, version, path
On Donnerstag, 27. Juni 2019 18:12:03 CEST Greg Kurz wrote: > On Wed, 26 Jun 2019 20:25:55 +0200 > Christian Schoenebeck via Qemu-devel wrote: > > There is no need for signedness on these QID fields for 9p. > > > > Signed-off-by: Antonios Motakis > > You should mention here the changes you made on top of Antonios > original patch. Something like: > > [CS: - also convert path > - adapted trace-events and donttouch_stat()] Haven't seen that comment style in the git logs. Any example hash for that? > > diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events > > index c0a0a4ab5d..6964756922 100644 > > --- a/hw/9pfs/trace-events > > +++ b/hw/9pfs/trace-events > > @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id > > %d err %d"> > > v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag > > %d id %d msize %d version %s" v9fs_version_return(uint16_t tag, uint8_t > > id, int32_t msize, char* version) "tag %d id %d msize %d version %s" > > v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* > > uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s"> > > -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t > > version, int64_t path) "tag %d id %d type %d version %d path %"PRId64 > > +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t > > version, uint64_t path) "tag %d id %d type %d version %d path %"PRId64 > I was expecting to see PRIu64 for an uint64_t but I now realize that %d > seems to be used all over the place for unsigned types... :-\ > > At least, please fix the masks of the lines you're changing in this > patch so that unsigned are passed to "u" or PRIu64. The rest of the > mess can be fixed later in a followup. If you don't mind I will restrict it to your latter suggestion for now, that is adjusting it using the short format specifiers e.g. "u", the rest would IMO be out of the scope of this patch series. Too bad that no format specifier warnings are thrown on these. Best regards, Christian Schoenebeck
[Qemu-devel] [PATCH v4 0/5] 9p: Fix file ID collisions
This is v4 of a proposed patch set for fixing file ID collisions with 9pfs. v3->v4: * Rebased to latest git master head. * Splitted Antonios' patch set to its original 4 individual patches. (was merged previously as only 1 patch). * Addressed discussed issues directly on Antonios' patches (was a separate patch before). * Added virtfs command line option "remap_inodes": Unless this option is not enabled, no inode remapping is performed at all, the user just gets an error message when trying to use more than 1 device per export. * Dropped persistency feature of QIDs beyond reboots. * Dropped disputed "vii" feature. Greg, please check if I am doing anything superfluous in patch 3 regarding the new command line parameter "remap_inodes". Daniel, I also have a libvirt patch for this new "remap_inodes" command line parameter, but I guess I wait for this qemu patch set to get through. Christian Schoenebeck (5): 9p: unsigned type for type, version, path 9p: Treat multiple devices on one export as an error 9p: Added virtfs option "remap_inodes" 9p: stat_to_qid: implement slow path 9p: Use variable length suffixes for inode remapping fsdev/9p-marshal.h | 6 +- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev-opts.c | 7 +- fsdev/qemu-fsdev.c | 6 + hw/9pfs/9p.c| 448 +--- hw/9pfs/9p.h| 83 + hw/9pfs/trace-events| 14 +- qemu-options.hx | 17 +- vl.c| 3 + 9 files changed, 550 insertions(+), 35 deletions(-) -- 2.11.0
[Qemu-devel] [PATCH v4 4/5] 9p: stat_to_qid: implement slow path
stat_to_qid attempts via qid_path_prefixmap to map unique files (which are identified by 64 bit inode nr and 32 bit device id) to a 64 QID path value. However this implementation makes some assumptions about inode number generation on the host. If qid_path_prefixmap fails, we still have 48 bits available in the QID path to fall back to a less memory efficient full mapping. Signed-off-by: Antonios Motakis Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 63 +++- hw/9pfs/9p.h | 9 + 2 files changed, 67 insertions(+), 5 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 7ccc68a829..e6e410972f 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -579,23 +579,69 @@ static uint32_t qpp_hash(QppEntry e) return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); } +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + static bool qpp_lookup_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } -static void qpp_table_remove(void *p, uint32_t h, void *up) +static bool qpf_lookup_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return e1->dev == e2->dev && e1->ino == e2->ino; +} + +static void qp_table_remove(void *p, uint32_t h, void *up) { g_free(p); } -static void qpp_table_destroy(struct qht *ht) +static void qp_table_destroy(struct qht *ht) { -qht_iter(ht, qpp_table_remove, NULL); +qht_iter(ht, qp_table_remove, NULL); qht_destroy(ht); } +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) * to a unique QID path (64 bits). To avoid having to map and keep track * of up to 2^64 objects, we map only the 16 highest bits of the inode plus @@ -642,6 +688,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) { /* map inode+device to qid path (fast path) */ err = qid_path_prefixmap(pdu, stbuf, &qidp->path); +if (err == -ENFILE) { +/* fast path didn't work, fall back to full map */ +err = qid_path_fullmap(pdu, stbuf, &qidp->path); +} if (err) { return err; } @@ -3752,6 +3802,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t, /* QID path hash table. 1 entry ought to be enough for anybody ;) */ qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE); s->qp_prefix_next = 1; /* reserve 0 to detect overflow */ +s->qp_fullpath_next = 1; s->ctx.fst = &fse->fst; fsdev_throttle_init(s->ctx.fst); @@ -3766,7 +3817,8 @@ out: } g_free(s->tag); g_free(s->ctx.fs_root); -qpp_table_destroy(&s->qpp_table); +qp_table_destroy(&s->qpp_table); +qp_table_destroy(&s->qpf_table); v9fs_path_free(&path); } return rc; @@ -3779,7 +3831,8 @@ void v9fs_device_unrealize_common(V9fsState *s, Error **errp) } fsdev_throttle_cleanup(s->ctx.fst); g_free(s->tag); -qpp_table_destroy(&s->qpp_table); +qp_table_destroy(&s->qpp_table); +qp_table_destroy(&s->qpf_table); g_free(s->ctx.fs_root); } diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h index 0200e04176..2b74561030 100644 --- a/hw/9pfs/9p.h +++ b/hw/9pfs/9p.h @@ -245,6 +245,13 @@ typedef struct { uint16_t qp_prefix; } QppEntry; +/* QID path full entry, as above */ +typedef struct { +dev_t dev; +ino_t ino; +uint64_t path; +} QpfEntry; + struct V9fsState { QLIST_HEAD(, V9fsPDU) free_list; @@ -268,7 +275,9 @@ struct V9fsState V9fsQID root_qid; dev_t dev_id; struct qht qpp_table; +struct qht qpf_table; uint16_t qp_prefix_next; +uint64_t qp_fullpath_next; }; /* 9p2000.L open flags */ -- 2.11.0
[Qemu-devel] [PATCH v4 3/5] 9p: Added virtfs option "remap_inodes"
To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Signed-off-by: Antonios Motakis Signed-off-by: Christian Schoenebeck --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev-opts.c | 7 +++- fsdev/qemu-fsdev.c | 6 +++ hw/9pfs/9p.c| 105 ++-- hw/9pfs/9p.h| 12 ++ qemu-options.hx | 17 +++- vl.c| 3 ++ 7 files changed, 135 insertions(+), 16 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index c757c8099f..6c1663c252 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -59,6 +59,7 @@ typedef struct ExtendedOps { #define V9FS_RDONLY 0x0040 #define V9FS_PROXY_SOCK_FD 0x0080 #define V9FS_PROXY_SOCK_NAME0x0100 +#define V9FS_REMAP_INODES 0x0200 #define V9FS_SEC_MASK 0x003C diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c index 7c31af..64a8052266 100644 --- a/fsdev/qemu-fsdev-opts.c +++ b/fsdev/qemu-fsdev-opts.c @@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = { }, { .name = "readonly", .type = QEMU_OPT_BOOL, - +}, { +.name = "remap_inodes", +.type = QEMU_OPT_BOOL, }, { .name = "socket", .type = QEMU_OPT_STRING, @@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = { .name = "readonly", .type = QEMU_OPT_BOOL, }, { +.name = "remap_inodes", +.type = QEMU_OPT_BOOL, +}, { .name = "socket", .type = QEMU_OPT_STRING, }, { diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 077a8c4e2b..b6fa9799be 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -121,6 +121,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) const char *fsdev_id = qemu_opts_id(opts); const char *fsdriver = qemu_opt_get(opts, "fsdriver"); const char *writeout = qemu_opt_get(opts, "writeout"); +bool remap_inodes = qemu_opt_get_bool(opts, "remap_inodes", 0); bool ro = qemu_opt_get_bool(opts, "readonly", 0); if (!fsdev_id) { @@ -161,6 +162,11 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) } else { fsle->fse.export_flags &= ~V9FS_RDONLY; } +if (remap_inodes) { +fsle->fse.export_flags |= V9FS_REMAP_INODES; +} else { +fsle->fse.export_flags &= ~V9FS_REMAP_INODES; +} if (fsle->fse.ops->parse_opts) { if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) { diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index cbaa212625..7ccc68a829 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -25,6 +25,7 @@ #include "trace.h" #include "migration/blocker.h" #include "sysemu/qtest.h" +#include "qemu/xxhash.h" int open_fd_hw; int total_open_fd; @@ -571,24 +572,96 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) -/* This is the algorithm from ufs in spfs */ + +/* creative abuse of tb_hash_func7, which is based on xxhash */ +static uint32_t qpp_hash(QppEntry e) +{ +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); +} + +static bool qpp_lookup_func(const void *obj, const void *userp) +{ +const QppEntry *e1 = obj, *e2 = userp; +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; +} + +static void qpp_table_remove(void *p, uint32_t h, void *up) +{ +g_free(p); +} + +static void qpp_table_destroy(struct qht *ht) +{ +qht_iter(ht, qpp_table_remove, NULL); +qht_destroy(ht); +} + +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) + * to a unique QID path (64 bits). To avoid having to map and keep track + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus + * the device id to the 16 highest bits of the QID path. The 48 lowest bits + * of the QID path equal to the lowest bits of the inode number. + * + * This takes advantage of the fact that inode number are usually not + * random but allocated sequentially, so we have fewer items to keep + * track of. + */ +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QppEntry lookup = { +.dev = stbuf->st_dev, +.ino_prefix = (uint16_t) (stbuf->st_ino >> 48) +}, *val; +uint32_t hash = qpp_hash(lookup); + +val = qht_lookup(&pdu->s->qpp_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_prefix_next == 0) { +/* we ran out of prefixes */ +return -ENFILE; +} + +val = g_malloc0(
[Qemu-devel] [PATCH v4 5/5] 9p: Use variable length suffixes for inode remapping
Use variable length suffixes for inode remapping instead of the fixed 16 bit size prefixes before. With this change the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with the previous fixed size inode remapping. Additionally this solution should be more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well. Which might also be beneficial for nested virtualization. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a paramter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1101] -> [1011000] (31 bits) 65534 [1110] -> [0111000] (31 bits) 65535 [] -> [000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [01] (6 bits) 2 [10] -> [11] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1101] -> [001110001000] (28 bits) 65534 [1110] -> [101110001000] (28 bits) 65535 [] -> [00001000] (28 bits) Hence minBits=6 maxBits=28 Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 267 ++- hw/9pfs/9p.h | 67 ++- 2 files changed, 312 insertions(+), 22 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index e6e410972f..46c9f11384 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -26,6 +26,7 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include int open_fd_hw; int total_open_fd; @@ -572,6 +573,123 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) +#if P9_VARI_LENGTH_INODE_SUFFIXES + +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) { +return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023; +} + +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */ +static inline uint64_t mirror64bit(uint64_t value) { +return ((uint64_t)mirror8bit( value& 0xff) << 56) | + ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) | + ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) | + ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) | + ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) | + ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) | + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) | + ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ; +} + +/* Parameter k for the Exponential Golomb algorihm to be used. + * + * The smaller this value, the smaller the minimum bit count for the Exp. + * Golomb generated affixes will be (at lowest index) however for the + * price of having higher maximum bit count of generated affixes (at highest + * index). Likewise increasing this parameter yields in smaller maximum bit + * count for the price of having higher minimum bit count. + */ +#define EXP_GOLOMB_K0 + +# if !EXP_GOLOMB_K + +/** @brief Exponential Golomb algorithm limited to the case k=0. + * + * See expGolombEncode() below for details. + * + * @param n - natural number (or index) of the prefix to be generated + *(1, 2, 3, ...) + */ +static VariLenAffix expGolombEncodeK0(uint64_t n) { +const int bits = (int) log2(n) + 1; +return (VariLenAffix) { +.type = AffixType_Prefix, +.value = n, +.bits = bits + bits - 1 +}; +} + +# else + +/** @brief Exponential Golomb algorithm for arbitrary k (including k=0). + * + * The Exponential Golomb algorithm generates @b prefixes (@b not suffixes!) + * with growing length and with the mathematical property of being + * "prefix-free". The latter means the generated prefixes can be prepended + * in front of arbitrary numbers and the resulting concatenated numbers are + * guaranteed to be always unique. + * + * This is
[Qemu-devel] [PATCH v4 2/5] 9p: Treat multiple devices on one export as an error
The QID path should uniquely identify a file. However, the inode of a file is currently used as the QID path, which on its own only uniquely identifies wiles within a device. Here we track the device hosting the 9pfs share, in order to prevent security issues with QID path collisions from other devices. Signed-off-by: Antonios Motakis Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 71 hw/9pfs/9p.h | 1 + 2 files changed, 58 insertions(+), 14 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 586a6dccba..cbaa212625 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -572,10 +572,20 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_SOCKET) /* This is the algorithm from ufs in spfs */ -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) { size_t size; +if (pdu->s->dev_id == 0) { +pdu->s->dev_id = stbuf->st_dev; +} else if (pdu->s->dev_id != stbuf->st_dev) { +error_report_once( +"9p: Multiple devices detected in same VirtFS export. " +"You must use a separate export for each device." +); +return -ENOSYS; +} + memset(&qidp->path, 0, sizeof(qidp->path)); size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path)); memcpy(&qidp->path, &stbuf->st_ino, size); @@ -587,6 +597,8 @@ static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) if (S_ISLNK(stbuf->st_mode)) { qidp->type |= P9_QID_TYPE_SYMLINK; } + +return 0; } static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, @@ -599,7 +611,10 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp, if (err < 0) { return err; } -stat_to_qid(&stbuf, qidp); +err = stat_to_qid(pdu, &stbuf, qidp); +if (err < 0) { +return err; +} return 0; } @@ -830,7 +845,10 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path, memset(v9stat, 0, sizeof(*v9stat)); -stat_to_qid(stbuf, &v9stat->qid); +err = stat_to_qid(pdu, stbuf, &v9stat->qid); +if (err < 0) { +return err; +} v9stat->mode = stat_to_v9mode(stbuf); v9stat->atime = stbuf->st_atime; v9stat->mtime = stbuf->st_mtime; @@ -891,7 +909,7 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path, #define P9_STATS_ALL 0x3fffULL /* Mask for All fields above */ -static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf, +static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf, V9fsStatDotl *v9lstat) { memset(v9lstat, 0, sizeof(*v9lstat)); @@ -913,7 +931,7 @@ static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf, /* Currently we only support BASIC fields in stat */ v9lstat->st_result_mask = P9_STATS_BASIC; -stat_to_qid(stbuf, &v9lstat->qid); +return stat_to_qid(pdu, stbuf, &v9lstat->qid); } static void print_sg(struct iovec *sg, int cnt) @@ -1115,7 +1133,6 @@ static void coroutine_fn v9fs_getattr(void *opaque) uint64_t request_mask; V9fsStatDotl v9stat_dotl; V9fsPDU *pdu = opaque; -V9fsState *s = pdu->s; retval = pdu_unmarshal(pdu, offset, "dq", &fid, &request_mask); if (retval < 0) { @@ -1136,7 +1153,10 @@ static void coroutine_fn v9fs_getattr(void *opaque) if (retval < 0) { goto out; } -stat_to_v9stat_dotl(s, &stbuf, &v9stat_dotl); +retval = stat_to_v9stat_dotl(pdu, &stbuf, &v9stat_dotl); +if (retval < 0) { +goto out; +} /* fill st_gen if requested and supported by underlying fs */ if (request_mask & P9_STATS_GEN) { @@ -1381,7 +1401,10 @@ static void coroutine_fn v9fs_walk(void *opaque) if (err < 0) { goto out; } -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} v9fs_path_copy(&dpath, &path); } memcpy(&qids[name_idx], &qid, sizeof(qid)); @@ -1483,7 +1506,10 @@ static void coroutine_fn v9fs_open(void *opaque) if (err < 0) { goto out; } -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} if (S_ISDIR(stbuf.st_mode)) { err = v9fs_co_opendir(pdu, fidp); if (err < 0) { @@ -1593,7 +1619,10 @@ static void coroutine_fn v9fs_lcreate(void *opaque) fidp->flags |= FID_NON_RECLAIMABLE; } iounit = get_iounit(pdu, &fidp->path); -stat_to_qid(&stbuf, &qid); +err = stat_to_qid(pdu, &stbuf, &qid); +if (err < 0) { +goto out; +} err = pdu_marshal(pdu, offset, "Qd", &qid, iounit); if (err < 0) { goto
[Qemu-devel] [PATCH v4 1/5] 9p: unsigned type for type, version, path
There is no need for signedness on these QID fields for 9p. Signed-off-by: Antonios Motakis Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 6 +++--- hw/9pfs/9p.c | 6 +++--- hw/9pfs/trace-events | 14 +++--- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index c8823d878f..8f3babb60a 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -9,9 +9,9 @@ typedef struct V9fsString typedef struct V9fsQID { -int8_t type; -int32_t version; -int64_t path; +uint8_t type; +uint32_t version; +uint64_t path; } V9fsQID; typedef struct V9fsStat diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 55821343e5..586a6dccba 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -743,9 +743,9 @@ static int donttouch_stat(V9fsStat *stat) { if (stat->type == -1 && stat->dev == -1 && -stat->qid.type == -1 && -stat->qid.version == -1 && -stat->qid.path == -1 && +stat->qid.type == 0xff && +stat->qid.version == (uint32_t) -1 && +stat->qid.path == (uint64_t) -1 && stat->mode == -1 && stat->atime == -1 && stat->mtime == -1 && diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events index c0a0a4ab5d..6964756922 100644 --- a/hw/9pfs/trace-events +++ b/hw/9pfs/trace-events @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d" v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s" -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path) "tag %d id %d type %d version %d path %"PRId64 +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path) "tag %d id %d type %d version %d path %"PRId64 v9fs_stat(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_stat_return(uint16_t tag, uint8_t id, int32_t mode, int32_t atime, int32_t mtime, int64_t length) "tag %d id %d stat={mode %d atime %d mtime %d length %"PRId64"}" v9fs_getattr(uint16_t tag, uint8_t id, int32_t fid, uint64_t request_mask) "tag %d id %d fid %d request_mask %"PRIu64 @@ -14,9 +14,9 @@ v9fs_getattr_return(uint16_t tag, uint8_t id, uint64_t result_mask, uint32_t mod v9fs_walk(uint16_t tag, uint8_t id, int32_t fid, int32_t newfid, uint16_t nwnames) "tag %d id %d fid %d newfid %d nwnames %d" v9fs_walk_return(uint16_t tag, uint8_t id, uint16_t nwnames, void* qids) "tag %d id %d nwnames %d qids %p" v9fs_open(uint16_t tag, uint8_t id, int32_t fid, int32_t mode) "tag %d id %d fid %d mode %d" -v9fs_open_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_open_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" v9fs_lcreate(uint16_t tag, uint8_t id, int32_t dfid, int32_t flags, int32_t mode, uint32_t gid) "tag %d id %d dfid %d flags %d mode %d gid %u" -v9fs_lcreate_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_lcreate_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" v9fs_fsync(uint16_t tag, uint8_t id, int32_t fid, int datasync) "tag %d id %d fid %d datasync %d" v9fs_clunk(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_read(uint16_t tag, uint8_t id, int32_t fid, uint64_t off, uint32_t max_count) "tag %d id %d fid %d off %"PRIu64" max_count %u" @@ -26,21 +26,21 @@ v9fs_readdir_return(uint16_t tag, uint8_t id, uint32_t count, ssize_t retval) "t v9fs_write(uint16_t tag, uint8_t id, int32_t fid, uint64_t off, uint32_t count, int cnt) "tag %d id %d fid %d off %"PRIu64" count %u cnt %d" v9fs_write_return(uint16_t tag, uint8_t id, int32_t total, ssize_t err) "tag %d id %d total %d err %zd" v9fs_create(uint16_t tag, uint8_t id, int32_t fid, char* name, int32_t perm, int8_t mode) "tag %d id %d fid %d name %s perm %d mode %d" -v9fs_create_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_create_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" v9fs_symlink(uint16_t tag, uint8_t id, int32_t fid, char* name, char* symname, uint32_t gid) "tag %d id %d fid %d name %s symname %s gid
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Montag, 3. Juni 2019 08:57:15 CEST Greg Kurz wrote: > > Ok, I will extend Antonios' patch to log that error on host. I thought > > about limiting that error message to once per session (for not flooding > > the logs), but it is probably not worth it, so if you don't veto then I > > will just log that error simply on every file access. > > Please use error_report_once(). I will. > > > Please repost a series, possibly based on some of Antonios's patches > > > that > > > allows to avoid the QID collision, returns an error to the client > > > instead > > > and possibly printing out some useful messages in the QEMU log. Then, on > > > top of that, you can start introducing hashing and variable prefix > > > length. > > > > So you want that as its own patch series first, or can I continue with my > > suggestion to deliver the hash patch and variable suffix length patch as > > last patches within the same series? > > Same series is okay. Ok. I'm currently busy with other work; I will probably send a new patch set approximately next week. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Montag, 20. Mai 2019 16:05:09 CEST Greg Kurz wrote: > Hi Christian, Hi Greg, > On the other hand, I'm afraid that having a functional solution won't > motivate people to come up with a new spec... Anyway, I agree that the > data corruption/loss issues must be prevented, ie, the 9p server should > at least return an error to the client rather than returning a colliding > QID. Ok, I will extend Antonios' patch to log that error on host. I thought about limiting that error message to once per session (for not flooding the logs), but it is probably not worth it, so if you don't veto then I will just log that error simply on every file access. > A simple way to avoid that is to enforce a single device, ie. patch > 2 in Antonios's series. Of course this may break some setups where > multiple devices are involved, but it is pure luck if they aren't already > broken with the current code base. Yes, the worst thing you can have is this collision silently being ignored, like it is actually right now. Because you end up getting all sorts of different misbehaviours on guests, and these are just symptoms that take a while to debug and realise that the actual root cause is an inode collision. So enforcing a single device is still better than undefined behaviour. > > Background: The concept of FS "data sets" combines the benefits of > > classical partitions (e.g. logical file space separation, independent fs > > configurations like compression on/off/algorithm, data deduplication > > on/off, snapshot isolation, snapshots on/off) without the disadvantages > > of classical real partitions (physical space is dynamically shared, no > > space wasted on fixed boundaries; physical device pool management is > > transparent for all data sets, configuration options can be inherited > > from parent data sets). > > Ok. I must admit my ignorance around ZFS and "data sets"... so IIUC, even > with a single underlying physical device you might end up with lstat() > returning different st_dev values on the associated files, correct ? > > I take your word for the likeliness of the issue to popup in such setups. :) Yes, that is correct, you _always_ get a different stat::st_dev value for each ZFS data set. Furthermore, each ZFS data set has its own inode number sequence generator starting from one. So consider you create two new ZFS data sets, then you create one file on each data set, then both files will have inode number 1. That probably makes it clear why you hit this ID collision bug very easily when using the combination ZFS & 9p. > > also a big difference giving the user the _optional_ possibility to define > > e.g. one path (not device) on guest said to be sensitive regarding high > > inode numbers on guest; and something completely different telling the > > user that he _must_ configure every single device from host that is ever > > supposed to pop up with 9p on guest and forcing the user to update that > > configuration whenever a new device is added or removed on host. The > > "vii" configuration feature does not require any knowledge of how many > > and which kinds of devices are actually ever used on host (nor on any > > higher level host in case of nested > > virtualization), nor does that "vii" config require any changes ever when > > host device setup changes. So 9p's core transparency feature would not be > > touched at all. > > I guess this all boils down to I finding some time to read/understand more > :) Yes, that helps sometimes. :) > As usual, a series with smaller and simpler patches will be easier to > review, and more likely to be merged. Of course. In the next patch series I will completely drop a) the entire QID persistency feature code and b) that disputed "vii" feature. But I will still suggest the variable inode suffix length patch as last patch in that new patch series. That should make the amount of changes manageable small. > > Let me make a different suggestion: how about putting these fixes into a > > separate C unit for now and making the default behaviour (if you really > > want) to not use any of that code by default at all. So the user would > > just get an error message in the qemu log files by default if he tries to > > export several devices with one 9p device, suggesting him either to map > > them as separate 9p devices (like you suggested) and informing the user > > about the alternative of enabling support for the automatic inode > > remapping code (from that separate C unit) instead by adding one > > convenient config option if he/she really wants. > It seems that we may be reaching some consensus here :) > > I like the approach, provided this is configurable from the command line, > off by default and doesn't duplicate core 9p code. Not sure this needs to > sit in its own C unit though. Any preference for a command line argument name and/or libvirt XML config tag/ attribute for switching the inode remapping code on? About that separate C unit: I leave th
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Freitag, 17. Mai 2019 16:47:46 CEST Greg Kurz wrote: > Potentially yes if another approach is satisfying enough, as I wouldn't > want to over-engineer too much around this 9p imposed limitation. The > right thing to do would be to come up with a new version of the protocol > with variable sized QIDs and call it a day. Yes, but that's the long-term solution which will still take a very long while. This patch set is already a functional solution until that happens, and this 9p issue is IMO quite severe (as minor as data corruption, data loss and exists for several years already). > > plus it completely destroys the fundamental idea about 9p, which is about > > transparency of the higher level(s). > > That's a point indeed, even if again I'm not sure if this is a frequent > case to share a directory tree spanning over multiple devices. When the use case is mass storage then it is very likely that you will have several devices. Times have changed. With modern file systems like ZFS it is very common to create a large amount of "data sets" for all kinds of individual purposes and mount points which is cheap to get. Each fs data set is a separate "device" from OS (i.e. Linux) point of view, even if all those FS data sets are in the same FS pool and even on the same physical IO device. Background: The concept of FS "data sets" combines the benefits of classical partitions (e.g. logical file space separation, independent fs configurations like compression on/off/algorithm, data deduplication on/off, snapshot isolation, snapshots on/off) without the disadvantages of classical real partitions (physical space is dynamically shared, no space wasted on fixed boundaries; physical device pool management is transparent for all data sets, configuration options can be inherited from parent data sets). > I don't have that much time to spend on 9p maintainership, for both > reviewing and fixing bugs (CVEs most of the time). So, yes it may > sound like I want to drop the patchset, but it's just I need to be > convinced I won't regret having merged a huge change like this... > when I'll have to support it alone later ;-) Actually I already assumed this to be the actual root cause. I see that you are currently the only maintainer, and my plan was not to just provide a one time shot but eventually hang along helping with maintaining it due my use of 9p and its huge potential I see (as universal and efficient root file system for all guests, not just for exotically sharing a small tree portion between guests). I also have plans for supporting native file forks BTW. But if you are seriously suggesting to simply omit a fundamental issue in 9p, then my plans would obviously no longer make sense. :) I mean I am completely tolerant to all kinds of religious views on bit level, languages, code style, libs, precise implementation details, parameters, source structure, etc.; but saying to simply leave a fundamental bug open for years to come, that's where I have to drop out. > For the moment, I'm not convinced by the "vii" solution. It even > motivated my suggestion of having several devices actually, since > the paths you would put in there are known before starting QEMU. Well, that "vii" configuration and the QID persistency are 2 optional patches on top of the core fixes. It is a huge difference to suggest dropping these 2 patches than saying to completely drop the entire patch set and to leave this bug open. The mandatory core fixes that I see (for the short/mid term) are at least Antonios' patches plus my variable length prefix patch, the latter significantly reduces the inode numbers on guest and significantly increases the inode name space, and hence also significanlty reduces the propability that 9p might ever need to kick in with any kind of expensive remapping actions (or even something worse like stat fatally returning -ENFILE to the user). About the "vii" configuration, even though you don't like the idea: there is also a big difference giving the user the _optional_ possibility to define e.g. one path (not device) on guest said to be sensitive regarding high inode numbers on guest; and something completely different telling the user that he _must_ configure every single device from host that is ever supposed to pop up with 9p on guest and forcing the user to update that configuration whenever a new device is added or removed on host. The "vii" configuration feature does not require any knowledge of how many and which kinds of devices are actually ever used on host (nor on any higher level host in case of nested virtualization), nor does that "vii" config require any changes ever when host device setup changes. So 9p's core transparency feature would not be touched at all. > It might take me some more rounds of discussion to decide. I understand > it is frustrating but bear with me :) Let me make a different suggestion: how about putting these fixes into a separate
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Freitag, 17. Mai 2019 14:30:29 CEST Greg Kurz wrote: > Then, we come to the bulk problem: how to handle the case where we > have multiple devices involved in a directory we want to share ? > Antonios's proposal is a clever way to address the collisions, but > your work proves it isn't enough... With the patch set I have right now, things finally bahave smooth. > Before going forward, I'd like > to consider another approach. > > What about: > > 1) de-compose the shared directory on a per-device basis, >ie. identify all mount points under the shared directory > > 2) expose found mount points separately, each with its onw 9p device > > 3) re-compose the directory tree within the guest using the same topology >as the host > > ie. if you want to share /vm/fs and > > /vm/fs on device A > /vm/fs/shares on device B > /vm/fs/tmp on device C > > you would start QEMU with > > -fsdev local,path=/vm/fs,id=fsdev0... \ > -device virtio-9p,fsdev=fsdev0,mount_tag=tag0 \ > -fsdev local,path=/vm/fs,id=fsdev1... \ > -device virtio-9p,fsdev=fsdev1,mount_tag=tag1 \ > -fsdev local,path=/vm/fs,id=fsdev2... \ > -device virtio-9p,fsdev=fsdev2,mount_tag=tag2 > > and /etc/fstab in the guest: > > tag0/ 9p nofail,trans=virtio,version=9p2000.L 0 0 > tag1/shares 9p nofail,trans=virtio,version=9p2000.L 0 0 > tag2/tmp9p nofail,trans=virtio,version=9p2000.L 0 0 > > This involves some more work for the user but it doesn't require > any changes in QEMU. So your suggestion is actually: don't fix it. "Some" more work for the user is a quantity of how many guests you are running, multiplied by the nested virtualization levels you might have = potentially a lot of work for admins. > Would this approach solve the issues you've been hitting with Samba ? No, because that completely neglects runtime changes on a higher level (host), plus it completely destroys the fundamental idea about 9p, which is about transparency of the higher level(s). May I ask, do you have concrete reasons why you want to abondon the entire patch set? Because that's what it sounds to me. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Dienstag, 7. Mai 2019 18:16:08 CEST Christian Schoenebeck wrote: > Here are the archive links for latest v3 patch set [5(+1) patches total]: > > [PATCH v3 0/5] 9p: Fix file ID collisions: > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01143.html > > [PATCH v3 1/5] 9p: mitigates most QID path collisions: > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01142.html > > [PATCH v3 2/5] 9P: trivial cleanup of QID path collision mitigation: > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01140.html > > [PATCH v3 3/5] 9p: persistency of QID path beyond reboots / suspensions: > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01144.html > > [PATCH v3 4/5] 9p: use variable length suffixes for inode mapping: > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01141.html > > [PATCH v3 5/5] 9p: adds virtfs 'vii' device parameter > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01138.html > > And the optional libvirt patch: > > [libvirt patch] qemu: adds support for virtfs 9p argument 'vii': > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01223.html > > > > Please just have a glimpse on that v3 thread, and before I address the > > > details that you requested (I have reviewed them all already and will > > > address them), I would like you to ask you for a coarse feedback on > > > design/features first. > > > > > Because there are some things where I am unresolved on design level yet: > > > > I'll try but probably not before next week. Hi Greg, you have not forgotten about me, did you? ;-) Or should I go ahead and provide a v4 next week addressing the issues discussed so far? Best regards, Christian Schoenebeck
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Dienstag, 7. Mai 2019 17:42:39 CEST Greg Kurz wrote: > > Sorry that I caused a bit of confusion, You were actually commenting > > mostly on v2 of the patch set, where my email client replaced the message > > IDs and hence screwed threading. > > > > This is v3 that I sent yesterday and which has correct threading: > > https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01143.html > > For a reason yet to be investigated, I haven't received it yet... Here are the archive links for latest v3 patch set [5(+1) patches total]: [PATCH v3 0/5] 9p: Fix file ID collisions: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01143.html [PATCH v3 1/5] 9p: mitigates most QID path collisions: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01142.html [PATCH v3 2/5] 9P: trivial cleanup of QID path collision mitigation: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01140.html [PATCH v3 3/5] 9p: persistency of QID path beyond reboots / suspensions: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01144.html [PATCH v3 4/5] 9p: use variable length suffixes for inode mapping: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01141.html [PATCH v3 5/5] 9p: adds virtfs 'vii' device parameter https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01138.html And the optional libvirt patch: [libvirt patch] qemu: adds support for virtfs 9p argument 'vii': https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01223.html > > Please just have a glimpse on that v3 thread, and before I address the > > details that you requested (I have reviewed them all already and will > > address them), I would like you to ask you for a coarse feedback on > > design/features first. > > Because there are some things where I am unresolved on design level yet: > I'll try but probably not before next week. No problem, take your time! Best regards, Christian Schoenebeck
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Dienstag, 7. Mai 2019 13:57:56 CEST Daniel P. Berrangé wrote: > > ... > > > > > > > > > > > > > > > > > > > > > > > > > > > > Like with the vii qemu virtfs command line argument, the order of the > > "important" tag defines which one gets the highest inode namespace > > (smallest generated suffix) on guest side. > > Do we think anyone is likely to use this feature in the real world ? I don't know if other people need it, that's one of the reasons why I am asking for a coarse high level feedback of the current v3 patch set before getting into the details. The only thing I can say right now is that I must use this feature when running Samba to avoid all kinds of serious problems. And I could imagine inode namespace control to become more of an issue once nested virtualization becomes more popular. > I'm not really a fan of the representation, because this is affecting > guest ABI via a side effect of the ordering which is the kind of thing > that has got us in trouble before. If we need control over the IDs > used for each mount point, then I tend to think we need to represent > it explicitly such that the mgmt app sets the exact ID used. > > > > > this ensures that the IDs are still stable when adding or removing > paths Well, that would lead to the exact opposite of what you asked for. Because allowing admins to configure an exact ID (which I think you mean should be used as exact inode suffix by 9p then) would expose implementation details inside 9pfs to config space, which are subject to change, might collide with implementation details, and requires implementation knowledge and extreme care by admins so they would pick appropriate IDs with "suffix-free" property which are guaranteed to create unique numbers in all cases: https://en.wikipedia.org/wiki/Prefix_code Also keep in mind that one fs device might end up having multiple suffixes. Hence my suggestion was to only expose the bare minimum to config space regarding this issue: Asking (if required at all) admins which ones are the most critical pathes regarding inode namespace for their use cases, and 9p would then automatically generate appropriate suffixes for those mentioned by admin to achieve the highest inode namespace appropriately and in a safe way. Plus for the "important path=" semantics I suggested you don't have have to use mount points BTW. You can use subdirs and even individual files and 9pfs would then automatically resolve the appropriate fs device of the given path. So e.g. when using nested virtualization, an admin inside a lower level guest does not even need to know the exact mount points on a higher level / host. Best regards, Christian Schoenebeck
Re: [Qemu-devel] [PATCH v3 1/5] 9p: mitigates most QID path collisions
On Dienstag, 7. Mai 2019 13:42:47 CEST Daniel P. Berrangé wrote: > > This first patch here is an updated version of Antonios Motakis' > > original 4-patch set (using fixed length 16 bit prefixes), merged to one > > patch: > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html > > > > * Updated to latest git master, specifically to new qht interface. > > > > * Merged the original 4 patches to this single patch. > > Why did you merge them all into one ? The split patches were "best > practice" IMHO. The original patch authorship & S-o-B lines could > be preserved if you kept them split as before too. Seems I misinterpreted an old comment of Greg that he would like to see them to be merged into less patches. I think it was this one: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02590.html No problem, I will restore Antonios' original individual 4 patches appropriately. What about SOB then? Should I just place Antonio's SOB on those 4 patches or does it need his and mine SOB lines? (I mean I need to rebase those 4 patches and address the old issues on them) Best regards, Christian Schoenebeck
Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
On Dienstag, 7. Mai 2019 11:55:56 CEST Greg Kurz wrote: > > support the 'vii' feature of patch 5, which introduces the XML config > > What is patch 5 ?!? What is 'vii' ? I am a bit lost here... Hi Greg, Sorry that I caused a bit of confusion, You were actually commenting mostly on v2 of the patch set, where my email client replaced the message IDs and hence screwed threading. This is v3 that I sent yesterday and which has correct threading: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01143.html Please just have a glimpse on that v3 thread, and before I address the details that you requested (I have reviewed them all already and will address them), I would like you to ask you for a coarse feedback on design/features first. Because there are some things where I am unresolved on design level yet: 1. Should I drop the "persistency" feature of patch 3 (same inode numbers after reboots/suspends) completely from the patch set? It is disabled at compile time by default for now after entire v3 patch set is applied. Or should that persistency feature probably become a qemu command line option instead? 2. If persistency feature should be preserved, shall I probably move out all the inode remapping code into a separate C unit to avoid 9p.c getting bloated too much (the amount of code for saving/loading the qp*_table hash tables is quite large). If yes, any suggestion for an appropriate unit name? 3. Are you fine with the suggested variable length suffixes (patch 4) becoming the default behaviour (instead of the fixed length 16 bit prefix solution by Antonios)? 4. Do you have a better idea for a name instead of the suggested "vii" (patch 5) virtfs qemu command line option? And are you fine with the idea of that "vii" feature anyway? > > This is the counter part patch against latest libvirt git master head to > > Hmm... shouldn't this be Cc'd to libvir-l...@redhat.com as well then ? Well, for now I just provided that libvirt patch to give you an idea about how imagined this "vii" feature to be used. Does it make sense to CC them already even though this suggested "vii" command line option does not exist on qemu side yet? I know I piled up quite a bit of code on this patch set, so to speed up things simply raise questions instead of spending too much time in reviewing everything in detail already. Thanks Greg! Best regards, Christian Schoenebeck
[Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
This is the counter part patch against latest libvirt git master head to support the 'vii' feature of patch 5, which introduces the XML config XML tag "important" on libvirt side. To stick with the previous example mentioned with patch 5, likewise libvirt XML configuration might then look like this: ... ... Like with the vii qemu virtfs command line argument, the order of the "important" tag defines which one gets the highest inode namespace (smallest generated suffix) on guest side. Signed-off-by: Christian Schoenebeck --- docs/schemas/domaincommon.rng | 6 ++ src/conf/domain_conf.c| 30 ++ src/conf/domain_conf.h| 3 +++ src/qemu/qemu_command.c | 10 ++ 4 files changed, 49 insertions(+) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 111b85c36f..c75edfc4d3 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2515,6 +2515,12 @@ + + + + + + diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b4fb6cf981..cc75c6a7dd 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2294,6 +2294,8 @@ virDomainFSDefNew(void) void virDomainFSDefFree(virDomainFSDefPtr def) { +size_t i; + if (!def) return; @@ -2302,6 +2304,13 @@ void virDomainFSDefFree(virDomainFSDefPtr def) virDomainDeviceInfoClear(&def->info); VIR_FREE(def->virtio); +if (def->important) { +for (i = 0; i < def->nimportant; i++) +if (def->important[i]) +VIR_FREE(def->important[i]); +} +VIR_FREE(def->important); + VIR_FREE(def); } @@ -10953,6 +10962,7 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt, VIR_AUTOFREE(char *) usage = NULL; VIR_AUTOFREE(char *) units = NULL; VIR_AUTOFREE(char *) model = NULL; +long n; ctxt->node = node; @@ -11001,6 +11011,12 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt, 1, ULLONG_MAX, false) < 0) goto error; +n = virXMLChildElementCount(node); +if (n > 0) { +if (VIR_ALLOC_N(def->important, n) < 0) +goto error; +} + cur = node->children; while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) { @@ -11039,6 +11055,8 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt, if (virDomainVirtioOptionsParseXML(cur, &def->virtio) < 0) goto error; +} else if (virXMLNodeNameEqual(cur, "important")) { +def->important[def->nimportant++] = virXMLPropString(cur, "path"); } } cur = cur->next; @@ -11107,6 +11125,8 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt, goto error; cleanup: +if (def && def->important && !def->nimportant) +VIR_FREE(def->important); return def; error: @@ -24601,6 +24621,7 @@ virDomainFSDefFormat(virBufferPtr buf, const char *src = def->src->path; VIR_AUTOCLEAN(virBuffer) driverBuf = VIR_BUFFER_INITIALIZER; int ret = -1; +size_t i; if (!type) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -24689,6 +24710,15 @@ virDomainFSDefFormat(virBufferPtr buf, if (def->readonly) virBufferAddLit(buf, "\n"); +if (def->important) { +for (i = 0; i < def->nimportant; ++i) { +if (!def->important[i]) continue; +virBufferAddLit(buf, "important[i]); +virBufferAddLit(buf, "/>\n"); +} +} + if (virDomainDeviceInfoFormat(buf, &def->info, flags) < 0) goto cleanup; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 01c22d8cc3..9bbd66d932 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -818,6 +818,9 @@ struct _virDomainFSDef { unsigned long long space_soft_limit; /* in bytes */ bool symlinksResolved; virDomainVirtioOptionsPtr virtio; + +size_t nimportant; +char **important; }; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 50b4205267..2005ccadf8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -2732,6 +2732,7 @@ qemuBuildFSStr(virDomainFSDefPtr fs) virBuffer opt = VIR_BUFFER_INITIALIZER; const char *driver = qemuDomainFSDriverTypeToString(fs->fsdriver); const char *wrpolicy = virDomainFSWrpolicyTypeToString(fs->wrpolicy); +size_t i; if (fs->type != VIR_DOMAIN_FS_TYPE_MOUNT) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -2775,6 +2776,15 @@ qemuBuildFSStr(virDomainFSDefPtr fs) if (fs->readonly) virBufferAddLit(&opt, ",readonly"); +if (fs->important) { +for (i = 0; i < fs->nimportant; ++i) { +if (i == 0) +
[Qemu-devel] [PATCH v3 0/5] 9p: Fix file ID collisions
This is v3 of a proposed patch set for fixing file ID collisions with 9pfs. That's it from my side for now regarding this overall issue. I am waiting for your comments on this patch set before doing anything else. Patch 1 to 4 are identical to the previous version. New patch 5 adds an optional qemu virtfs device parameter "vii" (very important inode[s]) , which allows host admins to configure which fs device(s) should get the largest inode namespaces on guest. I will also send a (6th) patch against libvirt which allows to configure the "vii" feature of patch 5 via xml config instead of qemu command line argument. I am yet unresolved whether or not to use persistency for file IDs that is introduced with patch 3. After applying the entire patch set, the persistency feature is disabled by default at compile time, but you can enable it with macro QPP_TABLE_PERSISTENCY_LIMIT. Christian Schoenebeck (5): 9p: mitigates most QID path collisions 9P: trivial cleanup of QID path collision mitigation 9p: persistency of QID path beyond reboots / suspensions 9p: use variable length suffixes for inode mapping 9p: adds virtfs 'vii' device parameter fsdev/9p-marshal.h |6 +- fsdev/file-op-9p.h |1 + fsdev/qemu-fsdev-opts.c|6 + fsdev/qemu-fsdev.c |3 + hw/9pfs/9p.c | 1199 +++- hw/9pfs/9p.h | 173 +++ hw/9pfs/trace-events | 14 +- hw/9pfs/virtio-9p-device.c |1 + qemu-options.hx|5 +- vl.c |9 +- 10 files changed, 1378 insertions(+), 39 deletions(-) -- 2.11.0
[Qemu-devel] [PATCH v3 3/5] 9p: persistency of QID path beyond reboots / suspensions
This patch aims to keep QID path identical beyond the scope of reboots and guest suspensions. With the 1st patch alone the QID path of the same files might change after reboots / suspensions, since 9p would restart with empty qpp_table and the resulting QID path depends on the precise sequence of files being accessed on guest. The first patch should already avoid the vast majority of potential QID path collisions. However especially network services running on guest would still be prone to QID path issues when just using the 1st patch. For instance Samba is exporting file IDs to clients in the network and SMB cliens in the network will use those IDs to access and request changes on the file server. If guest is now interrupted in between, like it commonly happens on maintenance, e.g. software updates on host, then SMB clients in the network will continue working with old file IDs, which in turn leads to data corruption and data loss on the file server. Furthermore on SMB client side I also encountered severe misbehaviours in this case, for instance Macs accessing the file server would either start to hang or die with a kernel panic within seconds, since the smbx implementation on macOS heavily relies on file IDs being unique (within the context of a connection that is). So this patch here mitigates the remaining problem described above by storing the qpp_table persistently as extended attribute(s) on the exported root of the file system and automatically tries to restore the qpp_table i.e. after reboots / resumptions. This patch is aimed at real world scenarios, in which qpp_table will only ever get few dozens of entries (and none ever in qpf_table). So it is e.g. intentionally limited to only store qpp_table, not qpf_table; and so far I have not made optimizations, since in practice the qpf_table is really just tiny. Since there is currently no callback in qemu yet that would reliably be called on guest shutdowns, the table is stored on every new insertion for now. Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 315 ++- hw/9pfs/9p.h | 33 +++ 2 files changed, 343 insertions(+), 5 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 2b893e25a1..29c6dfc68a 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -26,6 +26,19 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include "qemu/crc32c.h" +#if defined(__linux__) /* TODO: This should probably go into osdep.h instead */ +# include /* for XATTR_SIZE_MAX */ +#endif + +/* + * How many bytes may we store to fs per extended attribute value? + */ +#ifdef XATTR_SIZE_MAX +# define ATTR_MAX_SIZE XATTR_SIZE_MAX /* Linux only: 64kB limit in kernel */ +#else +# define ATTR_MAX_SIZE 65536 /* Most systems allow a bit more, so we take this as basis. */ +#endif int open_fd_hw; int total_open_fd; @@ -642,6 +655,285 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, return 0; } +static inline bool is_ro_export(FsContext *ctx) +{ +return ctx->export_flags & V9FS_RDONLY; +} + +/* + * Once qpp_table size exceeds this value, we no longer save + * the table persistently. See comment in v9fs_store_qpp_table() + */ +#define QPP_TABLE_PERSISTENCY_LIMIT 32768 + +/* Remove all user.virtfs.system.qidp.* xattrs from export root. */ +static void remove_qidp_xattr(FsContext *ctx) +{ +V9fsString name; +int i; + +/* just for a paranoid endless recursion sanity check */ +const ssize_t max_size = +sizeof(QppSrlzHeader) + +QPP_TABLE_PERSISTENCY_LIMIT * sizeof(QppEntryS); + +v9fs_string_init(&name); +for (i = 0; i * ATTR_MAX_SIZE < max_size; ++i) { +v9fs_string_sprintf(&name, "user.virtfs.system.qidp.%d", i); +if (lremovexattr(ctx->fs_root, name.data) < 0) +break; +} +v9fs_string_free(&name); +} + +/* Used to convert qpp hash table into continuous stream. */ +static void qpp_table_serialize(void *p, uint32_t h, void *up) +{ +const QppEntry *entry = (const QppEntry*) p; +QppSerialize *ser = (QppSerialize*) up; + +if (ser->error) +return; + +/* safety first */ +if (entry->qp_prefix - 1 >= ser->count) { +ser->error = -1; +return; +} + +ser->elements[entry->qp_prefix - 1] = (QppEntryS) { +.dev = entry->dev, +.ino_prefix = entry->ino_prefix +}; +ser->done++; +} + +/* + * Tries to store the current qpp_table as extended attribute(s) on the + * exported file system root with the goal to preserve identical qids + * beyond the scope of reboots. + */ +static void v9fs_store_qpp_table(V9fsState *s) +{ +FsContext *ctx = &s->ctx; +V9fsString name; +int i, res; +size_t size; +QppSrlzStream* stream; +QppSerialize ser; + +if (is_ro_export(ctx)) +return; + +/* + * Whenever we exceeded some certain (arbitrary) high qpp_table size we + * delete the stored ta
[Qemu-devel] [PATCH v3 1/5] 9p: mitigates most QID path collisions
This first patch here is an updated version of Antonios Motakis' original 4-patch set (using fixed length 16 bit prefixes), merged to one patch: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html * Updated to latest git master, specifically to new qht interface. * Merged the original 4 patches to this single patch. Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 4 +- hw/9pfs/9p.c | 200 - hw/9pfs/9p.h | 21 ++ 3 files changed, 204 insertions(+), 21 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index c8823d878f..d1ad3645c4 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -10,8 +10,8 @@ typedef struct V9fsString typedef struct V9fsQID { int8_t type; -int32_t version; -int64_t path; +uint32_t version; +uint64_t path; } V9fsQID; typedef struct V9fsStat diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 55821343e5..b9bbdcbaee 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -25,6 +25,7 @@ #include "trace.h" #include "migration/blocker.h" #include "sysemu/qtest.h" +#include "qemu/xxhash.h" int open_fd_hw; int total_open_fd; @@ -571,14 +572,135 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) -/* This is the algorithm from ufs in spfs */ -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) + +/* creative abuse of qemu_xxhash7, which is based on xxhash */ +static uint32_t qpp_hash(QppEntry e) { -size_t size; +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); +} + +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + +static bool qpp_cmp_func(const void *obj, const void *userp) +{ +const QppEntry *e1 = obj, *e2 = userp; +return (e1->dev == e2->dev) && (e1->ino_prefix == e2->ino_prefix); +} + +static bool qpf_cmp_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return (e1->dev == e2->dev) && (e1->ino == e2->ino); +} + +static void qp_table_remove(void *p, uint32_t h, void *up) +{ +g_free(p); +} + +static void qp_table_destroy(struct qht *ht) +{ +qht_iter(ht, qp_table_remove, NULL); +qht_destroy(ht); +} + +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qht_init(&pdu->s->qpf_table, qpf_cmp_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +if (!val) { +return -ENOMEM; +} +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) + * to a unique QID path (64 bits). To avoid having to map and keep track + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus + * the device id to the 16 highest bits of the QID path. The 48 lowest bits + * of the QID path equal to the lowest bits of the inode number. + * + * This takes advantage of the fact that inode number are usually not + * random but allocated sequentially, so we have fewer items to keep + * track of. + */ +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QppEntry lookup = { +.dev = stbuf->st_dev, +.ino_prefix = (uint16_t) (stbuf->st_ino >> 48) +}, *val; +uint32_t hash = qpp_hash(lookup); + +val = qht_lookup(&pdu->s->qpp_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_prefix_next == 0) { +/* we ran out of prefixes */ +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +if (!val) { +return -ENOMEM; +} +*val = lookup; + +/* new unique inode prefix and device combo */ +val->qp_prefix = pdu->s->qp_prefix_next++; +qht_insert(&pdu->s->qpp_table, val, hash, NULL); +} + +*path = ((uint64_t)val->qp_prefix << 48) | (stbuf->st_ino & QPATH_INO_MASK); +return 0; +} + +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) +{ +int err; + +/* map inode+device to qid path (fast pa
[Qemu-devel] [PATCH v3 4/5] 9p: use variable length suffixes for inode mapping
This patch introduces variable length suffixes for being used for inode mapping instead of the fixed 16 bit size prefixes of patch 1. With this patch the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with patch 1). I preserved both solutions in this patch though, so you can switch between them by adjusting P9_VARI_LENGTH_INODE_SUFFIXES, which probably also makes code comparison between the two more easy for now. Motivation: with patch 1 the inode numbers were so high that some network services had issues. Especially SMB seems to be a problem again due to varying implementations regarding SMB file IDs. I don't bother you with the details. Additionally this solution should be more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well. Which might also be beneficial for nested virtualization. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a paramter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1101] -> [1011000] (31 bits) 65534 [1110] -> [0111000] (31 bits) 65535 [] -> [000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [01] (6 bits) 2 [10] -> [11] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1101] -> [001110001000] (28 bits) 65534 [1110] -> [101110001000] (28 bits) 65535 [] -> [00001000] (28 bits) Hence minBits=6 maxBits=28 If somebody wants to play around with the details of the suffix generation, let me know and I will send you a small C source for you to play with. Additionally this patch disables persistency of file IDs (that I introduced with patch 3) at compile time by default for now. You can still enable it by setting QPP_TABLE_PERSISTENCY_LIMIT to some reasonable high value (e.g. 127). Reason: I am still unresolved what to do with this feature. The original motivation was to avoid file ID collisions with network services if the machine was interrupted for a short period. But what I did not consider was that device IDs might change on host, so after loading the tables from fs the qp tables would grow with irrelevant entries. Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 744 ++- hw/9pfs/9p.h | 121 +- 2 files changed, 805 insertions(+), 60 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 29c6dfc68a..003901a1e0 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -30,6 +30,10 @@ #if defined(__linux__) /* TODO: This should probably go into osdep.h instead */ # include /* for XATTR_SIZE_MAX */ #endif +#include +#include +#include +#include /* * How many bytes may we store to fs per extended attribute value? @@ -585,6 +589,123 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) { +return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023; +} + +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */ +static inline uint64_t mirror64bit(uint64_t value) { +return ((uint64_t)mirror8bit( value& 0xff) << 56) | + ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) | + ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) | + ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) | + ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) | + ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) | + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) | + ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ; +} + +#if P9_VARI_LENGTH_INODE_SUFFIXES + +/* Parameter k for the Exponential Golomb algorihm to be used. + * + * The smaller this value, the smaller
[Qemu-devel] [PATCH v3 2/5] 9P: trivial cleanup of QID path collision mitigation
Addresses trivial changes regarding the previous patch as requested on the mailing list a while ago. * Removed unneccessary parantheses: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02661.html * Removed unneccessary g_malloc() result checks: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02814.html * Unsigned type changes: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02581.html Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 2 +- hw/9pfs/9p.c | 16 +--- hw/9pfs/trace-events | 14 +++--- 3 files changed, 13 insertions(+), 19 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index d1ad3645c4..8f3babb60a 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -9,7 +9,7 @@ typedef struct V9fsString typedef struct V9fsQID { -int8_t type; +uint8_t type; uint32_t version; uint64_t path; } V9fsQID; diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index b9bbdcbaee..2b893e25a1 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -587,13 +587,13 @@ static uint32_t qpf_hash(QpfEntry e) static bool qpp_cmp_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; -return (e1->dev == e2->dev) && (e1->ino_prefix == e2->ino_prefix); +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } static bool qpf_cmp_func(const void *obj, const void *userp) { const QpfEntry *e1 = obj, *e2 = userp; -return (e1->dev == e2->dev) && (e1->ino == e2->ino); +return e1->dev == e2->dev && e1->ino == e2->ino; } static void qp_table_remove(void *p, uint32_t h, void *up) @@ -630,9 +630,6 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, } val = g_malloc0(sizeof(QppEntry)); -if (!val) { -return -ENOMEM; -} *val = lookup; /* new unique inode and device combo */ @@ -673,9 +670,6 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, } val = g_malloc0(sizeof(QppEntry)); -if (!val) { -return -ENOMEM; -} *val = lookup; /* new unique inode prefix and device combo */ @@ -870,9 +864,9 @@ static int donttouch_stat(V9fsStat *stat) { if (stat->type == -1 && stat->dev == -1 && -stat->qid.type == -1 && -stat->qid.version == -1 && -stat->qid.path == -1 && +stat->qid.type == 0xff && +stat->qid.version == (uint32_t) -1 && +stat->qid.path == (uint64_t) -1 && stat->mode == -1 && stat->atime == -1 && stat->mtime == -1 && diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events index c0a0a4ab5d..6964756922 100644 --- a/hw/9pfs/trace-events +++ b/hw/9pfs/trace-events @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d" v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s" -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path) "tag %d id %d type %d version %d path %"PRId64 +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path) "tag %d id %d type %d version %d path %"PRId64 v9fs_stat(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_stat_return(uint16_t tag, uint8_t id, int32_t mode, int32_t atime, int32_t mtime, int64_t length) "tag %d id %d stat={mode %d atime %d mtime %d length %"PRId64"}" v9fs_getattr(uint16_t tag, uint8_t id, int32_t fid, uint64_t request_mask) "tag %d id %d fid %d request_mask %"PRIu64 @@ -14,9 +14,9 @@ v9fs_getattr_return(uint16_t tag, uint8_t id, uint64_t result_mask, uint32_t mod v9fs_walk(uint16_t tag, uint8_t id, int32_t fid, int32_t newfid, uint16_t nwnames) "tag %d id %d fid %d newfid %d nwnames %d" v9fs_walk_return(uint16_t tag, uint8_t id, uint16_t nwnames, void* qids) "tag %d id %d nwnames %d qids %p" v9fs_open(uint16_t tag, uint8_t id, int32_t fid, int32_t mode) "tag %d id %d fid %d mode %d" -v9fs_open_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_open_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" v9fs_lcreate(uint16_t tag, uint8_t id, int32_t dfid, int32_t flags, int32_t mode, uint32_t gid) "tag %d id %d dfid %d flags %d mode %d gid %u" -v9fs_lcreate_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_lcreat
[Qemu-devel] [PATCH v3 5/5] 9p: adds virtfs 'vii' device parameter
This patch adds an optional qemu device parameter "vii" (very important inode[s]), which accepts a colon separated list of pathes relative below the 9p export root, which shall be given the smallest inode suffix (and hence largest possible inode namespace on guest) according to the given order. That way host admins do have a way to fine tune the automatic inode remapping algorithm of 9pfs. Example: consider a file server running on guest with 9p export root /vm/fs on host and the mass storage file system being mounted on host on /vm/fs/var/shares and further consider some very frequently accessed temp file system being mounted as /vm/fs/tmp on host, then host admin could give the mass storage file system the highest inode namespace and the temp file system the 2nd highest inode namespace with qemu virtfs argument: vii=/var/shares:/tmp The root file system (/vm/fs on host, / on guest) would then hence get the 3rd highest inode namespace on guest automatically (since not mentioned in vii list). For completeness I will also send a (6th) patch against libvirt which allows to conigure this feature via libvirt/virsh xml config instead. Additionally this patch addresses: * Removes unnecessary header includes in 9p.c (that I used for debugging). * Fixes a compiler error of previous patch in 9p.c if macro QPP_TABLE_PERSISTENCY_LIMIT was enabled and latest git master head version used. This patch probably needs some cleanup. But I will wait for your coarse feedback before doing anything else. For instance the device property that I added in virtio-9p-device.c is actually not used ATM. My initial assumption was this to be used by the qemu options visitors automatically for command line arguments, then I noticed that these properties are rather intended for passing parameters from guest instead. I am unresolved though whether this parameter shall be configurable from guest or not, so I have not finished that yet. Then the user manual changes need to be finished and obviously you might have a better idea for an argument name than "vii" ;-) Signed-off-by: Christian Schoenebeck --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev-opts.c| 6 ++ fsdev/qemu-fsdev.c | 3 + hw/9pfs/9p.c | 138 +++-- hw/9pfs/9p.h | 6 ++ hw/9pfs/virtio-9p-device.c | 1 + qemu-options.hx| 5 +- vl.c | 9 ++- 8 files changed, 124 insertions(+), 45 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 3fa062b39f..2645faa113 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -77,6 +77,7 @@ typedef struct FsDriverEntry { FsThrottle fst; mode_t fmode; mode_t dmode; +char *vii; } FsDriverEntry; struct FsContext { diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c index 7c31af..5ae02887a5 100644 --- a/fsdev/qemu-fsdev-opts.c +++ b/fsdev/qemu-fsdev-opts.c @@ -44,6 +44,9 @@ static QemuOptsList qemu_fsdev_opts = { }, { .name = "dmode", .type = QEMU_OPT_NUMBER, +}, { +.name = "vii", +.type = QEMU_OPT_STRING, }, THROTTLE_OPTS, @@ -87,6 +90,9 @@ static QemuOptsList qemu_virtfs_opts = { }, { .name = "dmode", .type = QEMU_OPT_NUMBER, +}, { +.name = "vii", +.type = QEMU_OPT_STRING, }, { /*End of list */ } diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 54cb36a212..29133b4a15 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) const char *fsdev_id = qemu_opts_id(opts); const char *fsdriver = qemu_opt_get(opts, "fsdriver"); const char *writeout = qemu_opt_get(opts, "writeout"); +const char *vii = qemu_opt_get(opts, "vii"); bool ro = qemu_opt_get_bool(opts, "readonly", 0); if (!fsdev_id) { @@ -59,6 +60,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) fsle = g_malloc0(sizeof(*fsle)); fsle->fse.fsdev_id = g_strdup(fsdev_id); +fsle->fse.vii = g_strdup(vii); fsle->fse.ops = FsDrivers[i].ops; if (writeout) { if (!strcmp(writeout, "immediate")) { @@ -74,6 +76,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp) if (fsle->fse.ops->parse_opts) { if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) { g_free(fsle->fse.fsdev_id); +g_free(fsle->fse.vii); g_free(fsle); return -1; } diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 003901a1e0..b4c4c66a3b 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -30,10 +30,8 @@ #if defined(__linux__) /* TODO: This should probably go into osdep.h instead */ # include /* for XATTR_SIZE_MAX */ #endif -#include -#include -#include #include +#include "qemu/cutils.h" /* * How many bytes may we store to fs p
[Qemu-devel] [PATCH v2 4/4] 9p: use variable length suffixes for inode mapping
This patch introduces variable length suffixes for being used for inode mapping instead of the fixed 16 bit size prefixes of patch 1. With this patch the inode numbers on guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with patch 1). I preserved both solutions in this patch though, so you can switch between them by adjusting P9_VARI_LENGTH_INODE_SUFFIXES, which probably also makes code comparison between the two more easy for now. Motivation: with patch 1 the inode numbers were so high that some network services had issues. Especially SMB seems to be a problem again due to varying implementations regarding SMB file IDs. I don't bother you with the details. Additionally this solution should be more efficient, since inode numbers in practice can take almost their entire 64 bit range on guest as well. Which might also be beneficial for nested virtualization. The "Exponential Golomb" algorithm is used as basis for generating the variable length suffixes. The algorithm has a paramter k which controls the distribution of bits on increasing indeces (minimum bits at low index vs. maximum bits at high index). With k=0 the generated suffixes look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [1] (1 bits) 2 [10] -> [010] (3 bits) 3 [11] -> [110] (3 bits) 4 [100] -> [00100] (5 bits) 5 [101] -> [10100] (5 bits) 6 [110] -> [01100] (5 bits) 7 [111] -> [11100] (5 bits) 8 [1000] -> [0001000] (7 bits) 9 [1001] -> [1001000] (7 bits) 10 [1010] -> [0101000] (7 bits) 11 [1011] -> [1101000] (7 bits) 12 [1100] -> [0011000] (7 bits) ... 65533 [1101] -> [1011000] (31 bits) 65534 [1110] -> [0111000] (31 bits) 65535 [] -> [000] (31 bits) Hence minBits=1 maxBits=31 And with k=5 they would look like: Index Dec/Bin -> Generated Suffix Bin 1 [1] -> [01] (6 bits) 2 [10] -> [11] (6 bits) 3 [11] -> [010001] (6 bits) 4 [100] -> [110001] (6 bits) 5 [101] -> [001001] (6 bits) 6 [110] -> [101001] (6 bits) 7 [111] -> [011001] (6 bits) 8 [1000] -> [111001] (6 bits) 9 [1001] -> [000101] (6 bits) 10 [1010] -> [100101] (6 bits) 11 [1011] -> [010101] (6 bits) 12 [1100] -> [110101] (6 bits) ... 65533 [1101] -> [001110001000] (28 bits) 65534 [1110] -> [101110001000] (28 bits) 65535 [] -> [00001000] (28 bits) Hence minBits=6 maxBits=28 If somebody wants to play around with the details of the suffix generation, let me know and I will send you a small C source for you to play with. Additionally this patch disables persistency of file IDs (that I introduced with patch 3) at compile time by default for now. You can still enable it by setting QPP_TABLE_PERSISTENCY_LIMIT to some reasonable high value (e.g. 127). Reason: I am still unresolved what to do with this feature. The original motivation was to avoid file ID collisions with network services if the machine was interrupted for a short period. But what I did not consider was that device IDs might change on host, so after loading the tables from fs the qp tables would grow with irrelevant entries. Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 744 ++- hw/9pfs/9p.h | 121 +- 2 files changed, 805 insertions(+), 60 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 29c6dfc68a..003901a1e0 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -30,6 +30,10 @@ #if defined(__linux__) /* TODO: This should probably go into osdep.h instead */ # include /* for XATTR_SIZE_MAX */ #endif +#include +#include +#include +#include /* * How many bytes may we store to fs per extended attribute value? @@ -585,6 +589,123 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) +/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */ +static inline uint8_t mirror8bit(uint8_t byte) { +return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023; +} + +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */ +static inline uint64_t mirror64bit(uint64_t value) { +return ((uint64_t)mirror8bit( value& 0xff) << 56) | + ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) | + ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) | + ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) | + ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) | + ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) | + ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) | + ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ; +} + +#if P9_VARI_LENGTH_INODE_SUFFIXES + +/* Parameter k for the Exponential Golomb algorihm to be used. + * + * The smaller this value, the smaller
[Qemu-devel] [PATCH v2 3/4] 9p: persistency of QID path beyond reboots / suspensions
This patch aims to keep QID path identical beyond the scope of reboots and guest suspensions. With the 1st patch alone the QID path of the same files might change after reboots / suspensions, since 9p would restart with empty qpp_table and the resulting QID path depends on the precise sequence of files being accessed on guest. The first patch should already avoid the vast majority of potential QID path collisions. However especially network services running on guest would still be prone to QID path issues when just using the 1st patch. For instance Samba is exporting file IDs to clients in the network and SMB cliens in the network will use those IDs to access and request changes on the file server. If guest is now interrupted in between, like it commonly happens on maintenance, e.g. software updates on host, then SMB clients in the network will continue working with old file IDs, which in turn leads to data corruption and data loss on the file server. Furthermore on SMB client side I also encountered severe misbehaviours in this case, for instance Macs accessing the file server would either start to hang or die with a kernel panic within seconds, since the smbx implementation on macOS heavily relies on file IDs being unique (within the context of a connection that is). So this patch here mitigates the remaining problem described above by storing the qpp_table persistently as extended attribute(s) on the exported root of the file system and automatically tries to restore the qpp_table i.e. after reboots / resumptions. This patch is aimed at real world scenarios, in which qpp_table will only ever get few dozens of entries (and none ever in qpf_table). So it is e.g. intentionally limited to only store qpp_table, not qpf_table; and so far I have not made optimizations, since in practice the qpf_table is really just tiny. Since there is currently no callback in qemu yet that would reliably be called on guest shutdowns, the table is stored on every new insertion for now. Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 315 ++- hw/9pfs/9p.h | 33 +++ 2 files changed, 343 insertions(+), 5 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 2b893e25a1..29c6dfc68a 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -26,6 +26,19 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include "qemu/crc32c.h" +#if defined(__linux__) /* TODO: This should probably go into osdep.h instead */ +# include /* for XATTR_SIZE_MAX */ +#endif + +/* + * How many bytes may we store to fs per extended attribute value? + */ +#ifdef XATTR_SIZE_MAX +# define ATTR_MAX_SIZE XATTR_SIZE_MAX /* Linux only: 64kB limit in kernel */ +#else +# define ATTR_MAX_SIZE 65536 /* Most systems allow a bit more, so we take this as basis. */ +#endif int open_fd_hw; int total_open_fd; @@ -642,6 +655,285 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, return 0; } +static inline bool is_ro_export(FsContext *ctx) +{ +return ctx->export_flags & V9FS_RDONLY; +} + +/* + * Once qpp_table size exceeds this value, we no longer save + * the table persistently. See comment in v9fs_store_qpp_table() + */ +#define QPP_TABLE_PERSISTENCY_LIMIT 32768 + +/* Remove all user.virtfs.system.qidp.* xattrs from export root. */ +static void remove_qidp_xattr(FsContext *ctx) +{ +V9fsString name; +int i; + +/* just for a paranoid endless recursion sanity check */ +const ssize_t max_size = +sizeof(QppSrlzHeader) + +QPP_TABLE_PERSISTENCY_LIMIT * sizeof(QppEntryS); + +v9fs_string_init(&name); +for (i = 0; i * ATTR_MAX_SIZE < max_size; ++i) { +v9fs_string_sprintf(&name, "user.virtfs.system.qidp.%d", i); +if (lremovexattr(ctx->fs_root, name.data) < 0) +break; +} +v9fs_string_free(&name); +} + +/* Used to convert qpp hash table into continuous stream. */ +static void qpp_table_serialize(void *p, uint32_t h, void *up) +{ +const QppEntry *entry = (const QppEntry*) p; +QppSerialize *ser = (QppSerialize*) up; + +if (ser->error) +return; + +/* safety first */ +if (entry->qp_prefix - 1 >= ser->count) { +ser->error = -1; +return; +} + +ser->elements[entry->qp_prefix - 1] = (QppEntryS) { +.dev = entry->dev, +.ino_prefix = entry->ino_prefix +}; +ser->done++; +} + +/* + * Tries to store the current qpp_table as extended attribute(s) on the + * exported file system root with the goal to preserve identical qids + * beyond the scope of reboots. + */ +static void v9fs_store_qpp_table(V9fsState *s) +{ +FsContext *ctx = &s->ctx; +V9fsString name; +int i, res; +size_t size; +QppSrlzStream* stream; +QppSerialize ser; + +if (is_ro_export(ctx)) +return; + +/* + * Whenever we exceeded some certain (arbitrary) high qpp_table size we + * delete the stored ta
[Qemu-devel] [PATCH v2 2/4] 9P: trivial cleanup of QID path collision mitigation
Addresses trivial changes regarding the previous patch as requested on the mailing list a while ago. * Removed unneccessary parantheses: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02661.html * Removed unneccessary g_malloc() result checks: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02814.html * Unsigned type changes: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02581.html Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 2 +- hw/9pfs/9p.c | 16 +--- hw/9pfs/trace-events | 14 +++--- 3 files changed, 13 insertions(+), 19 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index d1ad3645c4..8f3babb60a 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -9,7 +9,7 @@ typedef struct V9fsString typedef struct V9fsQID { -int8_t type; +uint8_t type; uint32_t version; uint64_t path; } V9fsQID; diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index b9bbdcbaee..2b893e25a1 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -587,13 +587,13 @@ static uint32_t qpf_hash(QpfEntry e) static bool qpp_cmp_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; -return (e1->dev == e2->dev) && (e1->ino_prefix == e2->ino_prefix); +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } static bool qpf_cmp_func(const void *obj, const void *userp) { const QpfEntry *e1 = obj, *e2 = userp; -return (e1->dev == e2->dev) && (e1->ino == e2->ino); +return e1->dev == e2->dev && e1->ino == e2->ino; } static void qp_table_remove(void *p, uint32_t h, void *up) @@ -630,9 +630,6 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, } val = g_malloc0(sizeof(QppEntry)); -if (!val) { -return -ENOMEM; -} *val = lookup; /* new unique inode and device combo */ @@ -673,9 +670,6 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, } val = g_malloc0(sizeof(QppEntry)); -if (!val) { -return -ENOMEM; -} *val = lookup; /* new unique inode prefix and device combo */ @@ -870,9 +864,9 @@ static int donttouch_stat(V9fsStat *stat) { if (stat->type == -1 && stat->dev == -1 && -stat->qid.type == -1 && -stat->qid.version == -1 && -stat->qid.path == -1 && +stat->qid.type == 0xff && +stat->qid.version == (uint32_t) -1 && +stat->qid.path == (uint64_t) -1 && stat->mode == -1 && stat->atime == -1 && stat->mtime == -1 && diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events index c0a0a4ab5d..6964756922 100644 --- a/hw/9pfs/trace-events +++ b/hw/9pfs/trace-events @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d" v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s" -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path) "tag %d id %d type %d version %d path %"PRId64 +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path) "tag %d id %d type %d version %d path %"PRId64 v9fs_stat(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_stat_return(uint16_t tag, uint8_t id, int32_t mode, int32_t atime, int32_t mtime, int64_t length) "tag %d id %d stat={mode %d atime %d mtime %d length %"PRId64"}" v9fs_getattr(uint16_t tag, uint8_t id, int32_t fid, uint64_t request_mask) "tag %d id %d fid %d request_mask %"PRIu64 @@ -14,9 +14,9 @@ v9fs_getattr_return(uint16_t tag, uint8_t id, uint64_t result_mask, uint32_t mod v9fs_walk(uint16_t tag, uint8_t id, int32_t fid, int32_t newfid, uint16_t nwnames) "tag %d id %d fid %d newfid %d nwnames %d" v9fs_walk_return(uint16_t tag, uint8_t id, uint16_t nwnames, void* qids) "tag %d id %d nwnames %d qids %p" v9fs_open(uint16_t tag, uint8_t id, int32_t fid, int32_t mode) "tag %d id %d fid %d mode %d" -v9fs_open_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_open_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" v9fs_lcreate(uint16_t tag, uint8_t id, int32_t dfid, int32_t flags, int32_t mode, uint32_t gid) "tag %d id %d dfid %d flags %d mode %d gid %u" -v9fs_lcreate_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_lcreat
[Qemu-devel] [PATCH v2 0/4] 9p: Fix file ID collisions
Hi! This is v2 of a proposed patch set for fixing file ID collisions with 9pfs. Patch 1 to 3 are identical to the previous version. New in this v2 is patch 4 which introduces variable length suffixes for inode mapping instead of fixed length prefixes. Also: patch 4 disables file ID persistency at compile time by default for now, since I am yet unresolved about details of that persistency. Christian Schoenebeck (4): 9p: mitigates most QID path collisions 9P: trivial cleanup of QID path collision mitigation 9p: persistency of QID path beyond reboots / suspensions 9p: use variable length suffixes for inode mapping fsdev/9p-marshal.h |6 +- hw/9pfs/9p.c | 1145 -- hw/9pfs/9p.h | 167 hw/9pfs/trace-events | 14 +- 4 files changed, 1296 insertions(+), 36 deletions(-) -- 2.11.0
[Qemu-devel] [PATCH v2 1/4] 9p: mitigates most QID path collisions
This first patch here is an updated version of Antonios Motakis' original 4-patch set, merged to one patch: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html * Updated to latest git master, specifically to new qht interface. * Merged the original 4 patches to this single patch. Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 4 +- hw/9pfs/9p.c | 200 - hw/9pfs/9p.h | 21 ++ 3 files changed, 204 insertions(+), 21 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index c8823d878f..d1ad3645c4 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -10,8 +10,8 @@ typedef struct V9fsString typedef struct V9fsQID { int8_t type; -int32_t version; -int64_t path; +uint32_t version; +uint64_t path; } V9fsQID; typedef struct V9fsStat diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 55821343e5..b9bbdcbaee 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -25,6 +25,7 @@ #include "trace.h" #include "migration/blocker.h" #include "sysemu/qtest.h" +#include "qemu/xxhash.h" int open_fd_hw; int total_open_fd; @@ -571,14 +572,135 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) -/* This is the algorithm from ufs in spfs */ -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) + +/* creative abuse of qemu_xxhash7, which is based on xxhash */ +static uint32_t qpp_hash(QppEntry e) { -size_t size; +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); +} + +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + +static bool qpp_cmp_func(const void *obj, const void *userp) +{ +const QppEntry *e1 = obj, *e2 = userp; +return (e1->dev == e2->dev) && (e1->ino_prefix == e2->ino_prefix); +} + +static bool qpf_cmp_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return (e1->dev == e2->dev) && (e1->ino == e2->ino); +} + +static void qp_table_remove(void *p, uint32_t h, void *up) +{ +g_free(p); +} + +static void qp_table_destroy(struct qht *ht) +{ +qht_iter(ht, qp_table_remove, NULL); +qht_destroy(ht); +} + +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qht_init(&pdu->s->qpf_table, qpf_cmp_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +if (!val) { +return -ENOMEM; +} +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) + * to a unique QID path (64 bits). To avoid having to map and keep track + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus + * the device id to the 16 highest bits of the QID path. The 48 lowest bits + * of the QID path equal to the lowest bits of the inode number. + * + * This takes advantage of the fact that inode number are usually not + * random but allocated sequentially, so we have fewer items to keep + * track of. + */ +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QppEntry lookup = { +.dev = stbuf->st_dev, +.ino_prefix = (uint16_t) (stbuf->st_ino >> 48) +}, *val; +uint32_t hash = qpp_hash(lookup); + +val = qht_lookup(&pdu->s->qpp_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_prefix_next == 0) { +/* we ran out of prefixes */ +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +if (!val) { +return -ENOMEM; +} +*val = lookup; + +/* new unique inode prefix and device combo */ +val->qp_prefix = pdu->s->qp_prefix_next++; +qht_insert(&pdu->s->qpp_table, val, hash, NULL); +} + +*path = ((uint64_t)val->qp_prefix << 48) | (stbuf->st_ino & QPATH_INO_MASK); +return 0; +} + +static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp) +{ +int err; + +/* map inode+device to qid path (fast path) */ +err = qid_path_prefixmap(
Re: [Qemu-devel] virtfs/9p duplicate inodes
On Dienstag, 23. April 2019 13:44:42 CEST Greg Kurz wrote: > Hi Christian, > > Sorry for the late response. I'm quite busy on other topics these days... Absolutely no problem. We probably all share the same fate of constant work load on endless battle fields popping up everywhere. :-) > > I intended to extend Antonios' patch set regarding 9p QID collisions with > > the goal to make the ids constant beyond reboots by storing the qpp_table > > as fs xattr. > > Hmm... why would you do that ? Even if some filesystems do have persistant > inode numbers, it isn't mandatory AFAIK. Mja, unfortunately I found that necessary, see my comment in patch 3 for the background / reason: https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg03755.html And sorry that my patch set did not go out as one thread today. I vow emendation. ;-) Best regards, Christian Schoenebeck
[Qemu-devel] [PATCH 3/3] 9p: persistency of QID path beyond reboots / suspensions
This patch aims to keep QID path identical beyond the scope of reboots and guest suspensions. With the 1st patch alone the QID path of the same files might change after reboots / suspensions, since 9p would restart with empty qpp_table and the resulting QID path depends on the precise sequence of files being accessed on guest. The first patch should already avoid the vast majority of potential QID path collisions. However especially network services running on guest would still be prone to QID path issues when just using the 1st patch. For instance Samba is exporting file IDs to clients in the network and SMB cliens in the network will use those IDs to access and request changes on the file server. If guest is now interrupted in between, like it commonly happens on maintenance, e.g. software updates on host, then SMB clients in the network will continue working with old file IDs, which in turn leads to data corruption and data loss on the file server. Furthermore on SMB client side I also encountered severe misbehaviours in this case, for instance Macs accessing the file server would either start to hang or die with a kernel panic within seconds, since the smbx implementation on macOS heavily relies on file IDs being unique (within the context of a connection that is). So this patch here mitigates the remaining problem described above by storing the qpp_table persistently as extended attribute(s) on the exported root of the file system and automatically tries to restore the qpp_table i.e. after reboots / resumptions. This patch is aimed at real world scenarios, in which qpp_table will only ever get few dozens of entries (and none ever in qpf_table). So it is e.g. intentionally limited to only store qpp_table, not qpf_table; and so far I have not made optimizations, since in practice the qpf_table is really just tiny. Since there is currently no callback in qemu yet that would reliably be called on guest shutdowns, the table is stored on every new insertion for now. Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p.c | 315 ++- hw/9pfs/9p.h | 33 +++ 2 files changed, 343 insertions(+), 5 deletions(-) diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 2b893e25a1..29c6dfc68a 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -26,6 +26,19 @@ #include "migration/blocker.h" #include "sysemu/qtest.h" #include "qemu/xxhash.h" +#include "qemu/crc32c.h" +#if defined(__linux__) /* TODO: This should probably go into osdep.h instead */ +# include /* for XATTR_SIZE_MAX */ +#endif + +/* + * How many bytes may we store to fs per extended attribute value? + */ +#ifdef XATTR_SIZE_MAX +# define ATTR_MAX_SIZE XATTR_SIZE_MAX /* Linux only: 64kB limit in kernel */ +#else +# define ATTR_MAX_SIZE 65536 /* Most systems allow a bit more, so we take this as basis. */ +#endif int open_fd_hw; int total_open_fd; @@ -642,6 +655,285 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, return 0; } +static inline bool is_ro_export(FsContext *ctx) +{ +return ctx->export_flags & V9FS_RDONLY; +} + +/* + * Once qpp_table size exceeds this value, we no longer save + * the table persistently. See comment in v9fs_store_qpp_table() + */ +#define QPP_TABLE_PERSISTENCY_LIMIT 32768 + +/* Remove all user.virtfs.system.qidp.* xattrs from export root. */ +static void remove_qidp_xattr(FsContext *ctx) +{ +V9fsString name; +int i; + +/* just for a paranoid endless recursion sanity check */ +const ssize_t max_size = +sizeof(QppSrlzHeader) + +QPP_TABLE_PERSISTENCY_LIMIT * sizeof(QppEntryS); + +v9fs_string_init(&name); +for (i = 0; i * ATTR_MAX_SIZE < max_size; ++i) { +v9fs_string_sprintf(&name, "user.virtfs.system.qidp.%d", i); +if (lremovexattr(ctx->fs_root, name.data) < 0) +break; +} +v9fs_string_free(&name); +} + +/* Used to convert qpp hash table into continuous stream. */ +static void qpp_table_serialize(void *p, uint32_t h, void *up) +{ +const QppEntry *entry = (const QppEntry*) p; +QppSerialize *ser = (QppSerialize*) up; + +if (ser->error) +return; + +/* safety first */ +if (entry->qp_prefix - 1 >= ser->count) { +ser->error = -1; +return; +} + +ser->elements[entry->qp_prefix - 1] = (QppEntryS) { +.dev = entry->dev, +.ino_prefix = entry->ino_prefix +}; +ser->done++; +} + +/* + * Tries to store the current qpp_table as extended attribute(s) on the + * exported file system root with the goal to preserve identical qids + * beyond the scope of reboots. + */ +static void v9fs_store_qpp_table(V9fsState *s) +{ +FsContext *ctx = &s->ctx; +V9fsString name; +int i, res; +size_t size; +QppSrlzStream* stream; +QppSerialize ser; + +if (is_ro_export(ctx)) +return; + +/* + * Whenever we exceeded some certain (arbitrary) high qpp_table size we + * delete the stored ta
[Qemu-devel] [PATCH 2/3] 9P: trivial cleanup of QID path collision mitigation
Addresses trivial changes regarding the previous patch as requested on the mailing list a while ago. * Removed unneccessary parantheses: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02661.html * Removed unneccessary g_malloc() result checks: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02814.html * Unsigned type changes: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02581.html Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 2 +- hw/9pfs/9p.c | 16 +--- hw/9pfs/trace-events | 14 +++--- 3 files changed, 13 insertions(+), 19 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index d1ad3645c4..8f3babb60a 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -9,7 +9,7 @@ typedef struct V9fsString typedef struct V9fsQID { -int8_t type; +uint8_t type; uint32_t version; uint64_t path; } V9fsQID; diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index b9bbdcbaee..2b893e25a1 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -587,13 +587,13 @@ static uint32_t qpf_hash(QpfEntry e) static bool qpp_cmp_func(const void *obj, const void *userp) { const QppEntry *e1 = obj, *e2 = userp; -return (e1->dev == e2->dev) && (e1->ino_prefix == e2->ino_prefix); +return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix; } static bool qpf_cmp_func(const void *obj, const void *userp) { const QpfEntry *e1 = obj, *e2 = userp; -return (e1->dev == e2->dev) && (e1->ino == e2->ino); +return e1->dev == e2->dev && e1->ino == e2->ino; } static void qp_table_remove(void *p, uint32_t h, void *up) @@ -630,9 +630,6 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, } val = g_malloc0(sizeof(QppEntry)); -if (!val) { -return -ENOMEM; -} *val = lookup; /* new unique inode and device combo */ @@ -673,9 +670,6 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, } val = g_malloc0(sizeof(QppEntry)); -if (!val) { -return -ENOMEM; -} *val = lookup; /* new unique inode prefix and device combo */ @@ -870,9 +864,9 @@ static int donttouch_stat(V9fsStat *stat) { if (stat->type == -1 && stat->dev == -1 && -stat->qid.type == -1 && -stat->qid.version == -1 && -stat->qid.path == -1 && +stat->qid.type == 0xff && +stat->qid.version == (uint32_t) -1 && +stat->qid.path == (uint64_t) -1 && stat->mode == -1 && stat->atime == -1 && stat->mtime == -1 && diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events index c0a0a4ab5d..6964756922 100644 --- a/hw/9pfs/trace-events +++ b/hw/9pfs/trace-events @@ -6,7 +6,7 @@ v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d" v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s" v9fs_attach(uint16_t tag, uint8_t id, int32_t fid, int32_t afid, char* uname, char* aname) "tag %u id %u fid %d afid %d uname %s aname %s" -v9fs_attach_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path) "tag %d id %d type %d version %d path %"PRId64 +v9fs_attach_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path) "tag %d id %d type %d version %d path %"PRId64 v9fs_stat(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d" v9fs_stat_return(uint16_t tag, uint8_t id, int32_t mode, int32_t atime, int32_t mtime, int64_t length) "tag %d id %d stat={mode %d atime %d mtime %d length %"PRId64"}" v9fs_getattr(uint16_t tag, uint8_t id, int32_t fid, uint64_t request_mask) "tag %d id %d fid %d request_mask %"PRIu64 @@ -14,9 +14,9 @@ v9fs_getattr_return(uint16_t tag, uint8_t id, uint64_t result_mask, uint32_t mod v9fs_walk(uint16_t tag, uint8_t id, int32_t fid, int32_t newfid, uint16_t nwnames) "tag %d id %d fid %d newfid %d nwnames %d" v9fs_walk_return(uint16_t tag, uint8_t id, uint16_t nwnames, void* qids) "tag %d id %d nwnames %d qids %p" v9fs_open(uint16_t tag, uint8_t id, int32_t fid, int32_t mode) "tag %d id %d fid %d mode %d" -v9fs_open_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_open_return(uint16_t tag, uint8_t id, uint8_t type, uint32_t version, uint64_t path, int iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" v9fs_lcreate(uint16_t tag, uint8_t id, int32_t dfid, int32_t flags, int32_t mode, uint32_t gid) "tag %d id %d dfid %d flags %d mode %d gid %u" -v9fs_lcreate_return(uint16_t tag, uint8_t id, int8_t type, int32_t version, int64_t path, int32_t iounit) "tag %d id %d qid={type %d version %d path %"PRId64"} iounit %d" +v9fs_lcreat
[Qemu-devel] [PATCH 1/3] 9p: mitigates most QID path collisions
I am attempting to revive Antonios Motakis' effort to fix the current file ID collisions with 9p. My patch set does not fix all the concerns you had with his original patch set, especially if there is still some (very rare) case of QID path collision the affected files are still visible on guest. However I hope we can bring this overall issue forward, because with the current 9p implementation it is almost inevitable to end up with QID path collisions, which in turn causes very severe misbehaviours like data corruption and data loss on guest. On the other hand the yet unresolved issues that you noted are IMO rather theoretical issues. Because in practice you usually will have around not more than one or two dozens of entries in qpp_table (and not a single entry in qpf_table). So this first patch here is an updated version of Antonios Motakis' original 4-patch set, merged to one patch: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html * Updated to latest git master, specifically to new qht interface. * Merged the original 4 patches to this single patch. Signed-off-by: Christian Schoenebeck --- fsdev/9p-marshal.h | 4 +- hw/9pfs/9p.c | 200 - hw/9pfs/9p.h | 21 ++ 3 files changed, 204 insertions(+), 21 deletions(-) diff --git a/fsdev/9p-marshal.h b/fsdev/9p-marshal.h index c8823d878f..d1ad3645c4 100644 --- a/fsdev/9p-marshal.h +++ b/fsdev/9p-marshal.h @@ -10,8 +10,8 @@ typedef struct V9fsString typedef struct V9fsQID { int8_t type; -int32_t version; -int64_t path; +uint32_t version; +uint64_t path; } V9fsQID; typedef struct V9fsStat diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c index 55821343e5..b9bbdcbaee 100644 --- a/hw/9pfs/9p.c +++ b/hw/9pfs/9p.c @@ -25,6 +25,7 @@ #include "trace.h" #include "migration/blocker.h" #include "sysemu/qtest.h" +#include "qemu/xxhash.h" int open_fd_hw; int total_open_fd; @@ -571,14 +572,135 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu) P9_STAT_MODE_NAMED_PIPE | \ P9_STAT_MODE_SOCKET) -/* This is the algorithm from ufs in spfs */ -static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp) + +/* creative abuse of qemu_xxhash7, which is based on xxhash */ +static uint32_t qpp_hash(QppEntry e) { -size_t size; +return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0); +} + +static uint32_t qpf_hash(QpfEntry e) +{ +return qemu_xxhash7(e.ino, e.dev, 0, 0, 0); +} + +static bool qpp_cmp_func(const void *obj, const void *userp) +{ +const QppEntry *e1 = obj, *e2 = userp; +return (e1->dev == e2->dev) && (e1->ino_prefix == e2->ino_prefix); +} + +static bool qpf_cmp_func(const void *obj, const void *userp) +{ +const QpfEntry *e1 = obj, *e2 = userp; +return (e1->dev == e2->dev) && (e1->ino == e2->ino); +} + +static void qp_table_remove(void *p, uint32_t h, void *up) +{ +g_free(p); +} + +static void qp_table_destroy(struct qht *ht) +{ +qht_iter(ht, qp_table_remove, NULL); +qht_destroy(ht); +} + +static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QpfEntry lookup = { +.dev = stbuf->st_dev, +.ino = stbuf->st_ino +}, *val; +uint32_t hash = qpf_hash(lookup); + +/* most users won't need the fullmap, so init the table lazily */ +if (!pdu->s->qpf_table.map) { +qht_init(&pdu->s->qpf_table, qpf_cmp_func, 1 << 16, QHT_MODE_AUTO_RESIZE); +} + +val = qht_lookup(&pdu->s->qpf_table, &lookup, hash); + +if (!val) { +if (pdu->s->qp_fullpath_next == 0) { +/* no more files can be mapped :'( */ +return -ENFILE; +} + +val = g_malloc0(sizeof(QppEntry)); +if (!val) { +return -ENOMEM; +} +*val = lookup; + +/* new unique inode and device combo */ +val->path = pdu->s->qp_fullpath_next++; +pdu->s->qp_fullpath_next &= QPATH_INO_MASK; +qht_insert(&pdu->s->qpf_table, val, hash, NULL); +} + +*path = val->path; +return 0; +} + +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits) + * to a unique QID path (64 bits). To avoid having to map and keep track + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus + * the device id to the 16 highest bits of the QID path. The 48 lowest bits + * of the QID path equal to the lowest bits of the inode number. + * + * This takes advantage of the fact that inode number are usually not + * random but allocated sequentially, so we have fewer items to keep + * track of. + */ +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf, +uint64_t *path) +{ +QppEntry lookup = { +.dev = stbuf->st_dev, +.ino_prefix = (uint16_t) (stbuf->st_ino >> 48) +}, *val; +uint32_t hash = qpp_hash(lookup); + +
Re: [Qemu-devel] virtfs/9p duplicate inodes
On Samstag, 30. März 2019 21:01:28 CEST Christian Schoenebeck wrote: > On Samstag, 30. März 2019 17:47:51 CET Greg Kurz wrote: > > Maybe have a look at this tentative to fix QID collisions: > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html [snip] > Question: so far I just had a look at that patch set, but haven't tried it > yet. Am I correct that the inode numbers (of the same file) would actually > change on guest side with every reboot (i.e. depending on the precise > sequence individual files would be accessed by guest after each reboot)? I intended to extend Antonios' patch set regarding 9p QID collisions with the goal to make the ids constant beyond reboots by storing the qpp_table as fs xattr. My plan was to load the qpp_table in v9fs_device_realize_common() and save the table only once in v9fs_device_unrealize_common(), instead of storing the table on every new insertion. The problem though is that none of the 9p unrealize functions is called on guest shutdowns. Is there any callback that is guaranteed to be called on guest shutdowns? Best regards, Christian Schoenebeck
Re: [Qemu-devel] virtfs/9p duplicate inodes
On Samstag, 30. März 2019 17:47:51 CET Greg Kurz wrote: > Maybe have a look at this tentative to fix QID collisions: > > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html Interesting! My idea would have been different, based on my presumption that the amount of devices on a 9p exported tree should be very small, whereas the amount of total files/dirs could be very high, I would have probably encoded as many bits as possible for the inode part and would have consecutively generated short device IDs as prefixes instead. But I see that you already have a consensus to go into the direction suggested by Antonios. Question: so far I just had a look at that patch set, but haven't tried it yet. Am I correct that the inode numbers (of the same file) would actually change on guest side with every reboot (i.e. depending on the precise sequence individual files would be accessed by guest after each reboot)? Is there still anything substantial missing in that patch set? Best regards, Christian Schoenebeck
[Qemu-devel] virtfs/9p duplicate inodes
Hi list, currently the 9p implementation in qemu causes inode number collisions on guest OS level if the directory exported by 9p consists on host level of more than one file system being mounted inside that exported directory tree; which leads to severe misbehaviours on guest side with all kinds of applications. The problem here is that when stat-ing a file/dir on guest side, 9p always returns the same device id for every file/dir of the exported directory (the virtual 9p device itself), while at the same time it simply provides to the guest the inode number of the requested file/dir from host side (+2). Since the combination of (device-id, inode-nr) must be unique per file on any system, this needs to be fixed. Any suggestions? As far as I can see it, either a) different device IDs should be returned to the guest for actually different file systems on host level, or b) 9p should remap the inode numbers appropriately such that the inode number range on guest side would always be isolated between the individual file systems accessed on host side. Best regards, Christian Schoenebeck