Vladimir, >>> 4. Q: Should not get_{allocated,dirty} be separate commands? >>> cons: Two commands with almost same semantic and similar means? >>> pros: However here is a good point of separating clearly defined and >>> native >>> for block devices GET_BLOCK_STATUS from user-driven and actually >>> undefined data, called 'dirtyness'. >> I'm suggesting one generic 'read bitmap' command like you. > > To support get_block_status in this general read_bitmap, we will need to > define something like 'multibitmap', which allows several bits per chunk, as > allocation data has two: zero and allocated.
I think you are saying that for arbitrary 'bitmap' there might be more than one state. For instance, one might (in an allocation 'bitmap') have a hole, a non-hole-zero, or a non-hole-non-zero. In the spec I'd suggest, for one 'bitmap', we represent the output as extents. Each extent has a status. For the bitmap to be useful, at least two status need to be possible, but the above would have three. This could be internally implemented by the server as (a) a bitmap (with two bits per entry), (b) two bitmaps (possibly with different granularity), (c) something else (e.g. reading file extents, then if the data is allocated manually comparing it against zero). I should have put 'bitmap' in quotes in what I wrote because returning extents (as you suggested) is a good idea, and there need not be an actual bitmap. >>> 5. Number of status descriptors, sent by server, should be restricted >>> variants: >>> 1: just allow server to restrict this as it wants (which was done in v3) >>> 2: (not excluding 1). Client specifies somehow the maximum for number >>> of descriptors. >>> 2.1: add command flag, which will request only one descriptor >>> (otherwise, no restrictions from the client) >>> 2.2: again, introduce extended nbd requests, and add field to >>> specify this maximum >> I think some form of extended request is the way to go, but out of >> interest, what's the issue with as many descriptors being sent as it >> takes to encode the reply? The client can just consume the remainder >> (without buffering) and reissue the request at a later point for >> the areas it discarded. > > the issue is: too many descriptors possible. So, (1) solves it. (2) is > optional, just to simplify/optimize client side. I think I'd prefer the server to return what it was asked for, and the client to deal with it. So either the client should be able to specify a maximum number of extents (and if we are extending the command structure, that's possible) or we deal with the client consuming and retrying unwanted extents. The reason for this is that it's unlikely the server can know a priori the number of extents which is the appropriate maximum for the client. >>> + The list of block status descriptors within the >>> + `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions >>> + of the file starting from specified *offset*, and the sum of the >>> + *length* fields of each descriptor MUST not be greater than the >>> + overall *length* of the request. This means that the server MAY >>> + return less data than required. However the server MUST return at >>> + least one status descriptor >> I'm not sure I understand why that's useful. What should the client >> infer from the server refusing to provide information? We don't >> permit short reads etc. > > if the bitmap is 010101010101 we will have too many descriptors. For example, > 16tb disk, 64k granularity -> 2G of descriptors payload. Yep. And the cost of consuming and retrying is quite high. One option would be for the client to realise this is a possibility, and not request the entire extent map for a 16TB disk, as it might be very large! Even if the client worked at e.g. a 64MB level (where they'd get a maximum of 1024 extents per reply), this isn't going to noticeably increase the round trip timing. One issue here is that to determine a 'reasonable' size, the client needs to know the minimum length of any extent. I think the answer is probably a 'maximum number of extents' in the request packet. Of course with statuses in extent, the final extent could be represented as 'I don't know, break this bit into a separate request' status. -- Alex Bligh