On Sun 28 Jul 2024 01:01:07 AM +03, Nir Soffer wrote:
>> +def bitmap_set(bitmap, idx):
>> +    bitmap[int(idx / 8)] |= 1 << (idx % 8)
>
> Should use floor division operator (//):
>
>     bitmap[idx // 8] |= 1 << (idx % 8)
>
> Same for bitmap_test().

Right, also for all other cases of int(foo / bar). I'll update them.

>> +        ret = subprocess.run(
>> +            [
>> +                QEMU_STORAGE_DAEMON,
>> +                "--daemonize",
>
> Any reason to daemonize? managing the child process is easier if you
> don't daemonize.

--daemonize guarantees that when subprocess.run() returns the exported
raw_file is ready to use.

>> +            if len(cluster) < cluster_size:
>> +                cluster += bytes(cluster_size - len(cluster))
>
> This should be done only for the last cluster.

But that's what that condition is for, we'll only read less than
cluster_size bytes at the end of the file.

>> +                if not bitmap_test(l1_bitmap, l1_idx):
>> +                    bitmap_set(l1_bitmap, l1_idx)
>> +                    allocated_l2_tables += 1
>
> This could be much more efficient using SEEK_DATA/SEEK_HOLE, avoiding
> reading the entire image twice. Or using "qemu-img map --output json"
> to simplify.

I prefer not to have external dependencies(*) so I would rather not use
qemu-img, but I certainly can use SEEK_DATA/SEEK_HOLE to avoid reading
data that is known to be zero in the first pass.

(*) there is of course qemu-storage-daemon but that one is optional and
    I anyway cannot implement its functionality in this script.

>> +                sys.stdout.buffer.write(cluster)
>> +            else:
>> +                skip += 1
>
> If would be easier to work with if you add a function iterating over
> the l2_entries, yielding the he cluster index to copy:
>
>   def iter_l2_entries(bitmap, clusters):
>     for idx in range(clusters):
>       if bitmap_test(bitmap, idx):
>         yield idx
>
> The copy loop can read using os.pread():
>
>     for idx in iter_l2_entries(l2_bitmap, total_data_clusters):
>         cluster = os.pread(fd, cluster_size, cluster_size * idx)
>         sys.stdout.buffer.write(cluster)
>
> I'm not sure the offset is right in my example, it is hard to
> understand the semantics of skip in your code.

That part reads the input file sequentially from start to end, but
instead of reading empty clusters we use seek() to skip them. The 'skip'
variable keeps a counter of empty clusters since the last read.

Your proposal requires an additional function but I see that it can make
the code more readable, so I'll give it a try.

>> +if __name__ == "__main__":
>
> Usually code is easier to work with when __main__  calls main().

Good idea.

Thanks for the detailed review!

Berto

Reply via email to