On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:
30.03.2021 12:49, Max Reitz wrote:
On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:
ping. Do we want it for 6.0?
I’d rather wait. I think the conclusion was that guests shouldn’t hit
this because they serialize discards?
I think, that we never had bugs, so we of course can wait.
There’s also something Kevin wrote on IRC a couple of weeks ago, for
which I had hoped he’d sent an email but I don’t think he did, so I’ll
try to remember and paraphrase as well as I can...
He basically asked whether it wouldn’t be conceptually simpler to take
a reference to some cluster in get_cluster_offset() and later release
it with a to-be-added put_cluster_offset().
He also noted that reading is problematic, too, because if you read a
discarded and reused cluster, this might result in an information leak
(some guest application might be able to read data it isn’t allowed to
read); that’s why making get_cluster_offset() the point of locking
clusters against discarding would be better.
Yes, I thought about read too, (RFCed in cover letter of [PATCH v5 0/6]
qcow2: fix parallel rewrite and discard (lockless))
This would probably work with both of your solutions. For the
in-memory solutions, you’d take a refcount to an actual cluster; in
the CoRwLock solution, you’d take that lock.
What do you think?
Hmm. What do you mean? Just rename my qcow2_inflight_writes_inc() and
qcow2_inflight_writes_dec() to
get_cluster_offset()/put_cluster_offset(), to make it more native to use
for read operations as well?
Hm. Our discussion wasn’t so detailed.
I interpreted it to mean all qcow2 functions that find an offset to a
qcow2 cluster, namely qcow2_get_host_offset(),
qcow2_alloc_host_offset(), and qcow2_alloc_compressed_cluster_offset().
When those functions return an offset (in)to some cluster, that cluster
(or the image as a whole) should be locked against discards. Every
offset received this way would require an accompanying
qcow2_put_host_offset().
Or to update any kind of "getting cluster offset" in the whole qcow2
driver to take a kind of "dynamic reference count" by
get_cluster_offset() and then call corresponding put() somewhere? In
this case I'm afraid it's a lot more work..
Hm, really? I would have assumed we need to do some locking in all
functions that get a cluster offset this way, so it should be less work
to take the lock in the functions they invoke to get the offset.
It would be also the problem
that a lot of paths in qcow2 are not in coroutine and don't even take
s->lock when they actually should.
I’m not sure what you mean here, because all functions that invoke any
of the three functions I listed above are coroutine_fns (or, well, I
didn’t look it up, but they all have *_co_* in their name).
This will also mean that we do same
job as normal qcow2 refcounts already do: no sense in keeping additional
"dynamic refcount" for L2 table cluster while reading it, as we already
have non-zero qcow2 normal refcount for it..
I’m afraid I don’t understand how normal refcounts relate to this. For
example, qcow2_get_host_offset() doesn’t touch refcounts at all.
Max