Re: [PATCH 0/4] pci: Compare function number and ARI next function number

2023-07-01 Thread Michael S. Tsirkin
On Sat, Jul 01, 2023 at 04:01:18PM +0900, Akihiko Odaki wrote:
> The function number must be lower than the next function number
> advertised with ARI. Add a check to enforce this.
> 
> I suggested this change at:
> https://lore.kernel.org/qemu-devel/bf351f8b-1c8a-8a7a-7f44-17c9ba18f...@daynix.com/
> 
> Implementing this change, I found the devices implementing ARI do not set the
> correct next function numbers, which is also fixed in this series.

This isn't going to be merged with more in the way of motivation.

Analysis of at least linux guest behavious, documentation about testing,
addressing migration concerns are all not there either.



> Akihiko Odaki (4):
>   docs: Fix next function numbers in SR/IOV documentation
>   hw/nvme: Fix ARI next function numbers
>   igb: Fix ARI next function numbers
>   pci: Compare function number and ARI next function number
> 
>  docs/pcie_sriov.txt |  5 +++--
>  hw/net/igb_core.h   |  3 +++
>  hw/net/igb.c|  4 +---
>  hw/net/igbvf.c  |  5 -
>  hw/nvme/ctrl.c  |  7 ++-
>  hw/pci/pci.c| 15 +++
>  6 files changed, 32 insertions(+), 7 deletions(-)
> 
> -- 
> 2.41.0




Re: [PATCH 3/4] igb: Fix ARI next function numbers

2023-07-01 Thread Michael S. Tsirkin
On Sat, Jul 01, 2023 at 04:01:21PM +0900, Akihiko Odaki wrote:
> The next function numbers are expected to form a linked list ending with
> 0.
> 
> Fixes: 3a977deebe ("Intrdocue igb device emulation")
> Signed-off-by: Akihiko Odaki 
> ---
>  hw/net/igb_core.h | 3 +++
>  hw/net/igb.c  | 4 +---
>  hw/net/igbvf.c| 5 -
>  3 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/net/igb_core.h b/hw/net/igb_core.h
> index 9cbbfd516b..e1dab76995 100644
> --- a/hw/net/igb_core.h
> +++ b/hw/net/igb_core.h
> @@ -49,6 +49,9 @@
>  #define IGB_NUM_QUEUES  (16)
>  #define IGB_NUM_VM_POOLS(8)
>  
> +#define IGB_VF_OFFSET   (0x80)
> +#define IGB_VF_STRIDE   (2)
> +
>  typedef struct IGBCore IGBCore;
>  
>  enum { PHY_R = BIT(0),
> diff --git a/hw/net/igb.c b/hw/net/igb.c
> index 1c989d7677..543ca0114a 100644
> --- a/hw/net/igb.c
> +++ b/hw/net/igb.c
> @@ -81,8 +81,6 @@ struct IGBState {
>  };
>  
>  #define IGB_CAP_SRIOV_OFFSET(0x160)
> -#define IGB_VF_OFFSET   (0x80)
> -#define IGB_VF_STRIDE   (2)
>  
>  #define E1000E_MMIO_IDX 0
>  #define E1000E_FLASH_IDX1
> @@ -431,7 +429,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
> **errp)
>  hw_error("Failed to initialize AER capability");
>  }
>  
> -pcie_ari_init(pci_dev, 0x150, 1);
> +pcie_ari_init(pci_dev, 0x150, IGB_VF_OFFSET);
>  
>  pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
>  IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,


I think this change would break migrations from 8.0. No?


More importantly your commit log says linked list should end
with 0, but you make it point at a VF instead.


> diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
> index 284ea61184..bf2f237ab5 100644
> --- a/hw/net/igbvf.c
> +++ b/hw/net/igbvf.c
> @@ -240,6 +240,9 @@ static const MemoryRegionOps mmio_ops = {
>  static void igbvf_pci_realize(PCIDevice *dev, Error **errp)
>  {
>  IgbVfState *s = IGBVF(dev);
> +uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
> +uint16_t nextfn = nextvfn < IGB_MAX_VF_FUNCTIONS ?
> +  IGB_VF_OFFSET + nextvfn * IGB_VF_STRIDE : 0;
>  int ret;
>  int i;
>  
> @@ -270,7 +273,7 @@ static void igbvf_pci_realize(PCIDevice *dev, Error 
> **errp)
>  hw_error("Failed to initialize AER capability");
>  }
>  
> -pcie_ari_init(dev, 0x150, 1);
> +pcie_ari_init(dev, 0x150, nextfn);



For this one I don't see why it matters at all:

The presence of Shadow Functions does not affect this field.
For VFs, this field is undefined since VFs are located using First VF Offset 
(see § Section 9.3.3.9 ) and VF
Stride (see § Section 9.3.3.10 ).




>  }
>  
>  static void igbvf_pci_uninit(PCIDevice *dev)
> -- 
> 2.41.0




Re: [PATCH 4/4] pci: Compare function number and ARI next function number

2023-07-01 Thread Michael S. Tsirkin
On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
> The function number must be lower than the next function number
> advertised with ARI.
> 
> Signed-off-by: Akihiko Odaki 

I don't get this logic at all - where is the limitation coming from?

All I see in the spec is:
Next Function Number - With non-VFs, this field indicates the Function 
Number of the next higher
numbered Function in the Device, or 00h if there are no higher numbered 
Functions. Function 0 starts
this linked list of Functions.
The presence of Shadow Functions does not affect this field.
For VFs, this field is undefined since VFs are located using First VF 
Offset (see § Section 9.3.3.9 ) and VF
Stride (see § Section 9.3.3.10 ).

and

 To improve the enumeration performance and create a more deterministic 
solution, software can
enumerate Functions through a linked list of Function Numbers. The next 
linked list element is
communicated through each Function’s ARI Capability Register.
i. Function 0 acts as the head of a linked list of Function Numbers. 
Software detects a
non-Zero Next Function Number field within the ARI Capability Register 
as the next
Function within the linked list. Software issues a configuration probe 
using the Bus Number
captured by the Device and the Function Number derived from the ARI 
Capability Register
to locate the next associated Function’s configuration space.
ii. Function Numbers may be sparse and non-sequential in their 
consumption by an ARI
Device.





> ---
>  hw/pci/pci.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e2eb4c3b4a..568665ee42 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
> **errp)
>  Error *local_err = NULL;
>  bool is_default_rom;
>  uint16_t class_id;
> +uint16_t ari;
> +uint16_t nextfn;
>  
>  /*
>   * capped by systemd (see: udev-builtin-net_id.c)
> @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
> **errp)
>  }
>  }
>  
> +if (pci_is_express(pci_dev)) {
> +ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> +if (ari) {
> +nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 
> 8) & 0xff;
> +if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> +error_setg(errp, "PCI: function number %u is not lower than 
> ARI next function number %u",
> +   pci_dev->devfn & 0xff, nextfn);
> +pci_qdev_unrealize(DEVICE(pci_dev));
> +return;
> +}
> +}
> +}
> +
>  if (pci_dev->failover_pair_id) {
>  if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
>  error_setg(errp, "failover primary device must be on "
> -- 
> 2.41.0




Re: [PATCH 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-01 Thread Akihiko Odaki

On 2023/07/01 23:31, Ani Sinha wrote:




On 01-Jul-2023, at 12:31 PM, Akihiko Odaki  wrote:

The next function numbers are expected to form a linked list ending with
0.

Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
docs/pcie_sriov.txt")
Signed-off-by: Akihiko Odaki 
---
docs/pcie_sriov.txt | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 7eff7f2703..cc4232e49a 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -48,7 +48,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x70);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, fun_offset);
   ...

   /* Add and initialize the SR/IOV capability */
@@ -76,9 +76,10 @@ setting up a BAR for a VF.
pci_your_vf_dev_realize( ... )
{
   ...
+  uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
   int ret = pcie_endpoint_cap_init(d, 0x60);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, nextvfn < total_vfs ? fun_offset + nextvfn * 
stride : 0);


 
I think this will be fun_offset and not just 0
Same with the other patches ..


It is intended to point to the PF. fun_offset points to the first VF.




   ...
   memory_region_init(mr, ... )
   pcie_sriov_vf_register_bar(d, bar_nr, mr);
--
2.41.0







[PATCH v2] hw/ide/piix: properly initialize the BMIBA register

2023-07-01 Thread Olaf Hering
According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA—BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled.

The initial change wrote just the lower 8 bit, leaving parts of the "Bus
Master Interface Base Address" address at bit 15:4 unchanged.

This bug went unnoticed until commit ee358e919e38 ("hw/ide/piix: Convert
reset handler to DeviceReset"). After this change, piix_ide_reset is
exercised after the "unplug" command from a Xen HVM domU, which was not
the case prior that commit. This function resets the command register.
As a result the ata_piix driver inside the domU will see a disabled PCI
device. The generic PCI code will reenable the PCI device. On the qemu
side, this runs pci_default_write_config/pci_update_mappings. Here a
changed address is returned by pci_bar_address, this is the address
which was truncated in piix_ide_reset. In case of a Xen HVM domU, the
address changes from 0xc120 to 0xc100.

While the unplug is supposed to hide the IDE disks, the changed BMIBA
address breaks the UHCI device. In case the domU has an USB tablet
configured, to recive absolute pointer coordinates for the GUI, it will
cause a hang during device discovery of the partly discovered USB hid
device. Reading the USBSTS word size register will fail. The access ends
up in the QEMU piix-bmdma device, instead of the expected uhci device.
Here a byte size request is expected, and a value of ~0 is returned. As
a result the UCHI driver sees an error state in the register, and turns
off the UHCI controller.

Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 
82371SB (Function 1)")

Signed-off-by: Olaf Hering 
---
 hw/ide/piix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..1e346b1b1d 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -118,7 +118,7 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_word(pci_conf + PCI_COMMAND, 0x);
 pci_set_word(pci_conf + PCI_STATUS,
  PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
+pci_set_long(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
 }
 
 static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)

base-commit: d145c0da22cde391d8c6672d33146ce306e8bf75



Re: [PATCH v1] hw/ide/piix: properly initialize the BMIBA register

2023-07-01 Thread Olaf Hering
Sat, 1 Jul 2023 15:34:40 +0200 (CEST) BALATON Zoltan :

> If all 32 bits should be writtern does this need pci_set_long instead of 
> pci_set_word?

Thanks for spotting. After a number of experiments I used the wrong variant.


Olaf


pgpqRNZ3DILTu.pgp
Description: Digitale Signatur von OpenPGP


Re: [PATCH 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-01 Thread Ani Sinha



> On 01-Jul-2023, at 12:31 PM, Akihiko Odaki  wrote:
> 
> The next function numbers are expected to form a linked list ending with
> 0.
> 
> Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
> docs/pcie_sriov.txt")
> Signed-off-by: Akihiko Odaki 
> ---
> docs/pcie_sriov.txt | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> index 7eff7f2703..cc4232e49a 100644
> --- a/docs/pcie_sriov.txt
> +++ b/docs/pcie_sriov.txt
> @@ -48,7 +48,7 @@ setting up a BAR for a VF.
>   ...
>   int ret = pcie_endpoint_cap_init(d, 0x70);
>   ...
> -  pcie_ari_init(d, 0x100, 1);
> +  pcie_ari_init(d, 0x100, fun_offset);
>   ...
> 
>   /* Add and initialize the SR/IOV capability */
> @@ -76,9 +76,10 @@ setting up a BAR for a VF.
>pci_your_vf_dev_realize( ... )
>{
>   ...
> +  uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
>   int ret = pcie_endpoint_cap_init(d, 0x60);
>   ...
> -  pcie_ari_init(d, 0x100, 1);
> +  pcie_ari_init(d, 0x100, nextvfn < total_vfs ? fun_offset + nextvfn * 
> stride : 0);


I think this will be fun_offset and not just 0
Same with the other patches ..

>   ...
>   memory_region_init(mr, ... )
>   pcie_sriov_vf_register_bar(d, bar_nr, mr);
> -- 
> 2.41.0
> 




Re: [PATCH v1] hw/ide/piix: properly initialize the BMIBA register

2023-07-01 Thread BALATON Zoltan

On Sat, 1 Jul 2023, Olaf Hering wrote:

According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA—BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #1 "Resource Type Indicator (RTE)" needs to be enabled.

The initial change wrote just the lower 8 bit, leaving parts of the "Bus
Master Interface Base Address" address at bit 15:4 unchanged.

This bug went unnoticed until commit ee358e919e38 ("hw/ide/piix: Convert
reset handler to DeviceReset"). After this change, piix_ide_reset is
exercised after the "unplug" command from a Xen HVM domU, which was not
the case prior that commit. This function resets the command register.
As a result the ata_piix driver inside the domU will see a disabled PCI
device. The generic PCI code will reenable the PCI device. On the qemu
side, this runs pci_default_write_config/pci_update_mappings. Here a
changed address is returned by pci_bar_address, this is the address
which was truncated in piix_ide_reset. In case of a Xen HVM domU, the
address changes from 0xc120 to 0xc100.

While the unplug is supposed to hide the IDE disks, the changed BMIBA
address breaks the UHCI device. In case the domU has an USB tablet
configured, to recive absolute pointer coordinates for the GUI, it will
cause a hang during device discovery of the partly discovered USB hid
device. Reading the USBSTS word size register will fail. The access ends
up in the QEMU piix-bmdma device, instead of the expected uhci device.
Here a byte size request is expected, and a value of ~0 is returned. As
a result the UCHI driver sees an error state in the register, and turns
off the UHCI controller.

Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 
82371SB (Function 1)")

Signed-off-by: Olaf Hering 
---
hw/ide/piix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..6449ba8b6b 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -118,7 +118,7 @@ static void piix_ide_reset(DeviceState *dev)
pci_set_word(pci_conf + PCI_COMMAND, 0x);
pci_set_word(pci_conf + PCI_STATUS,
 PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
+pci_set_word(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */


Commit message is way longer than the patch itself and very detailed but I 
may have lost in the details. If all 32 bits should be writtern does this 
need pci_set_long instead of pci_set_word?


Regards,
BALATON Zoltan


}

static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)

base-commit: d145c0da22cde391d8c6672d33146ce306e8bf75



[PATCH v1] hw/ide/piix: properly initialize the BMIBA register

2023-07-01 Thread Olaf Hering


According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA—BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #1 "Resource Type Indicator (RTE)" needs to be enabled.

The initial change wrote just the lower 8 bit, leaving parts of the "Bus
Master Interface Base Address" address at bit 15:4 unchanged.

This bug went unnoticed until commit ee358e919e38 ("hw/ide/piix: Convert
reset handler to DeviceReset"). After this change, piix_ide_reset is
exercised after the "unplug" command from a Xen HVM domU, which was not
the case prior that commit. This function resets the command register.
As a result the ata_piix driver inside the domU will see a disabled PCI
device. The generic PCI code will reenable the PCI device. On the qemu
side, this runs pci_default_write_config/pci_update_mappings. Here a
changed address is returned by pci_bar_address, this is the address
which was truncated in piix_ide_reset. In case of a Xen HVM domU, the
address changes from 0xc120 to 0xc100.

While the unplug is supposed to hide the IDE disks, the changed BMIBA
address breaks the UHCI device. In case the domU has an USB tablet
configured, to recive absolute pointer coordinates for the GUI, it will
cause a hang during device discovery of the partly discovered USB hid
device. Reading the USBSTS word size register will fail. The access ends
up in the QEMU piix-bmdma device, instead of the expected uhci device.
Here a byte size request is expected, and a value of ~0 is returned. As
a result the UCHI driver sees an error state in the register, and turns
off the UHCI controller.

Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 
82371SB (Function 1)")

Signed-off-by: Olaf Hering 
---
 hw/ide/piix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..6449ba8b6b 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -118,7 +118,7 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_word(pci_conf + PCI_COMMAND, 0x);
 pci_set_word(pci_conf + PCI_STATUS,
  PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
+pci_set_word(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
 }
 
 static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)

base-commit: d145c0da22cde391d8c6672d33146ce306e8bf75



[PATCH v2 7/7] iotests: Add test for data_off check

2023-07-01 Thread Alexander Ivanov
Write a pattern to the first cluster. Corrupt the data_off field and check
if the field was repaired on image opening and the pattern has not changed.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 15 +++
 tests/qemu-iotests/tests/parallels-checks.out | 10 ++
 2 files changed, 25 insertions(+)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index 8a63c3daf4..a7a1b357b5 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -44,6 +44,7 @@ _supported_os Linux
 SIZE=$((4 * 1024 * 1024))
 IMGFMT=parallels
 CLUSTER_SIZE_OFFSET=28
+DATA_OFF_OFFSET=48
 BAT_OFFSET=64
 
 _make_test_img $SIZE
@@ -124,6 +125,20 @@ printf "content: 0x%02x\n" `peek_file_le $TEST_IMG 
$(($CLUSTER_SIZE)) 1`
 echo "== check second cluster on host =="
 printf "content: 0x%02x\n" `peek_file_le $TEST_IMG $(($CLUSTER_SIZE)) 1`
 
+# Clear image
+_make_test_img $SIZE
+
+echo "== TEST DATA_OFF CHECK =="
+
+echo "== write pattern to first cluster =="
+{ $QEMU_IO -c "write -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== spoil data_off field =="
+poke_file "$TEST_IMG" "$DATA_OFF_OFFSET" "\xff\xff\xff\xff"
+
+echo "== check first cluster =="
+{ $QEMU_IO -c "read -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
index b747bba1f3..98a3a7f55e 100644
--- a/tests/qemu-iotests/tests/parallels-checks.out
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -62,4 +62,14 @@ read 1048576/1048576 bytes at offset 1048576
 content: 0x11
 == check second cluster on host ==
 content: 0x11
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+== TEST DATA_OFF CHECK ==
+== write pattern to first cluster ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== spoil data_off field ==
+== check first cluster ==
+Repairing data_off field has incorrect value
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.34.1




[PATCH v2 6/7] iotests: Fix test 131 after repair was added to parallels_open()

2023-07-01 Thread Alexander Ivanov
Images repairing in parallels_open() was added, thus parallels tests fail.
Access to an image leads to repairing the image. Further image check don't
detect any corruption. Remove reads after image creation in test 131.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/131 |  6 ++
 tests/qemu-iotests/131.out | 15 ++-
 2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index 72f6535581..304bbb3f61 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -69,11 +69,9 @@ echo == check that there is no trash after written ==
 echo == check that there is no trash before written ==
 { $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
-echo "== Corrupt image =="
+echo "== corrupt image =="
 poke_file "$TEST_IMG" "$inuse_offset" "\x59\x6e\x6f\x74"
-{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
-_check_test_img
-_check_test_img -r all
+echo "== read corrupted image with repairing =="
 { $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
 echo "== allocate with backing =="
diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out
index 98017a067e..d2904578df 100644
--- a/tests/qemu-iotests/131.out
+++ b/tests/qemu-iotests/131.out
@@ -21,20 +21,9 @@ read 524288/524288 bytes at offset 2621440
 == check that there is no trash before written ==
 read 524288/524288 bytes at offset 0
 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-== Corrupt image ==
-qemu-io: can't open device TEST_DIR/t.parallels: parallels: Image was not 
closed correctly; cannot be opened read/write
-ERROR image was not closed correctly
-
-1 errors were found on the image.
-Data may be corrupted, or further writes to the image may corrupt it.
+== corrupt image ==
+== read corrupted image with repairing ==
 Repairing image was not closed correctly
-The following inconsistencies were found and repaired:
-
-0 leaked clusters
-1 corruptions
-
-Double checking the fixed image now...
-No errors were found on the image.
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == allocate with backing ==
-- 
2.34.1




[PATCH v2 2/7] iotests: Add leak check test for parallels format

2023-07-01 Thread Alexander Ivanov
Write a pattern to the last cluster, extend the image by 1 claster, repair
and check that the last cluster still has the same pattern.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 27 +++
 tests/qemu-iotests/tests/parallels-checks.out | 22 +++
 2 files changed, 49 insertions(+)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index 055ce34766..8be282fabe 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -65,6 +65,33 @@ poke_file "$TEST_IMG" "$BAT_OFFSET" "\x$cluster\x00\x00\x00"
 echo "== read corrupted image with repairing =="
 { $QEMU_IO -c "read -P 0x00 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
+# Clear image
+_make_test_img $SIZE
+
+echo "== TEST LEAK CHECK =="
+
+echo "== write pattern to last cluster =="
+echo "write -P 0x11 $LAST_CLUSTER_OFF $CLUSTER_SIZE"
+{ $QEMU_IO -c "write -P 0x11 $LAST_CLUSTER_OFF $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
+file_size=`stat --printf="%s" "$TEST_IMG"`
+echo "file size: $file_size"
+
+echo "== extend image by 1 cluster =="
+fallocate -xl $((file_size + CLUSTER_SIZE)) "$TEST_IMG"
+
+file_size=`stat --printf="%s" "$TEST_IMG"`
+echo "file size: $file_size"
+
+echo "== repair image =="
+_check_test_img -r all
+
+file_size=`stat --printf="%s" "$TEST_IMG"`
+echo "file size: $file_size"
+
+echo "== check last cluster =="
+{ $QEMU_IO -c "read -P 0x11 $LAST_CLUSTER_OFF $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
index ea4dcef0a6..f2cb6dde85 100644
--- a/tests/qemu-iotests/tests/parallels-checks.out
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -9,4 +9,26 @@ wrote 4194304/4194304 bytes at offset 0
 Repairing cluster 0 is outside image
 read 1048576/1048576 bytes at offset 0
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+== TEST LEAK CHECK ==
+== write pattern to last cluster ==
+write -P 0x11 3145728 1048576
+wrote 1048576/1048576 bytes at offset 3145728
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+file size: 2097152
+== extend image by 1 cluster ==
+file size: 3145728
+== repair image ==
+Repairing space leaked at the end of the image 1048576
+The following inconsistencies were found and repaired:
+
+1 leaked clusters
+0 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+file size: 2097152
+== check last cluster ==
+read 1048576/1048576 bytes at offset 3145728
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.34.1




[PATCH v2 5/7] iotests: Fix cluster size in parallels images tests (131)

2023-07-01 Thread Alexander Ivanov
In this test cluster size is 64k, but modern tools generate images with
cluster size 1M. Calculate cluster size using track field from image header.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/131 |  5 -
 tests/qemu-iotests/131.out | 44 +++---
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index 601546c84c..72f6535581 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -44,10 +44,13 @@ _supported_os Linux
 inuse_offset=$((0x2c))
 
 size=$((64 * 1024 * 1024))
-CLUSTER_SIZE=$((64 * 1024))
 IMGFMT=parallels
 _make_test_img $size
 
+# get cluster size in sectors from "tracks" header field
+CLUSTER_SIZE_OFFSET=28
+CLUSTER_SIZE=$(peek_file_le $TEST_IMG $CLUSTER_SIZE_OFFSET 4)
+CLUSTER_SIZE=$((CLUSTER_SIZE * 512))
 CLUSTER_HALF_SIZE=$((CLUSTER_SIZE / 2))
 CLUSTER_DBL_SIZE=$((CLUSTER_SIZE * 2))
 
diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out
index de5ef7a8f5..98017a067e 100644
--- a/tests/qemu-iotests/131.out
+++ b/tests/qemu-iotests/131.out
@@ -1,26 +1,26 @@
 QA output created by 131
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 == read empty image ==
-read 65536/65536 bytes at offset 32768
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 524288
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == write more than 1 block in a row ==
-wrote 131072/131072 bytes at offset 32768
-128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 524288
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == read less than block ==
-read 32768/32768 bytes at offset 32768
-32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 524288
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == read exactly 1 block ==
-read 65536/65536 bytes at offset 65536
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == read more than 1 block ==
-read 131072/131072 bytes at offset 32768
-128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2097152/2097152 bytes at offset 524288
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == check that there is no trash after written ==
-read 32768/32768 bytes at offset 163840
-32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 2621440
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == check that there is no trash before written ==
-read 32768/32768 bytes at offset 0
-32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == Corrupt image ==
 qemu-io: can't open device TEST_DIR/t.parallels: parallels: Image was not 
closed correctly; cannot be opened read/write
 ERROR image was not closed correctly
@@ -35,19 +35,19 @@ The following inconsistencies were found and repaired:
 
 Double checking the fixed image now...
 No errors were found on the image.
-read 65536/65536 bytes at offset 65536
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == allocate with backing ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
-wrote 65536/65536 bytes at offset 0
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 64/64 bytes at offset 0
 64 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 64/64 bytes at offset 0
 64 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 65472/65472 bytes at offset 64
-63.938 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 67043328/67043328 bytes at offset 65536
-63.938 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048512/1048512 bytes at offset 64
+1023.938 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 66060288/66060288 bytes at offset 1048576
+63 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.34.1




[PATCH v2 0/7] iotests/parallels: Add new tests and fix old

2023-07-01 Thread Alexander Ivanov
This patchset should be applied on top of [PATCH v7 0/8] parallels: Add
duplication check, repair at open, fix bugs

Add out-of-image, leak and BAT entries duplication checks tests.

Old parallels images check test (131):  Refactor, fix cluster size and fix
after repairing was added to parallels_open().

v2:
5: Fix a typo.
7: Add a test for data_off check.

Alexander Ivanov (7):
  iotests: Add out-of-image check test for parallels format
  iotests: Add leak check test for parallels format
  iotests: Add test for BAT entries duplication check
  iotests: Refactor tests of parallels images checks (131)
  iotests: Fix cluster size in parallels images tests (131)
  iotests: Fix test 131 after repair was added to parallels_open()
  iotests: Add test for data_off check

 tests/qemu-iotests/131|  36 +++--
 tests/qemu-iotests/131.out|  59 +++
 tests/qemu-iotests/tests/parallels-checks | 145 ++
 tests/qemu-iotests/tests/parallels-checks.out |  75 +
 4 files changed, 264 insertions(+), 51 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/parallels-checks
 create mode 100644 tests/qemu-iotests/tests/parallels-checks.out

-- 
2.34.1




[PATCH v2 1/7] iotests: Add out-of-image check test for parallels format

2023-07-01 Thread Alexander Ivanov
Fill the image with a pattern to generate entries in the BAT, set the first
BAT entry outside the image, try to read the corrupted image. At the image
opening it should be repaired, check for zeroes in the first cluster.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 71 +++
 tests/qemu-iotests/tests/parallels-checks.out | 12 
 2 files changed, 83 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/parallels-checks
 create mode 100644 tests/qemu-iotests/tests/parallels-checks.out

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
new file mode 100755
index 00..055ce34766
--- /dev/null
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+# group: rw quick
+#
+# Test qemu-img check for parallels format
+#
+# Copyright (C) 2022 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=alexander.iva...@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+
+_supported_fmt parallels
+_supported_proto file
+_supported_os Linux
+
+SIZE=$((4 * 1024 * 1024))
+IMGFMT=parallels
+CLUSTER_SIZE_OFFSET=28
+BAT_OFFSET=64
+
+_make_test_img $SIZE
+
+CLUSTER_SIZE=$(peek_file_le $TEST_IMG $CLUSTER_SIZE_OFFSET 4)
+CLUSTER_SIZE=$((CLUSTER_SIZE * 512))
+LAST_CLUSTER_OFF=$((SIZE - CLUSTER_SIZE))
+LAST_CLUSTER=$((LAST_CLUSTER_OFF/CLUSTER_SIZE))
+
+echo "== TEST OUT OF IMAGE CHECK =="
+
+echo "== write pattern =="
+{ $QEMU_IO -c "write -P 0x11 0 $SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
+echo "== corrupt image =="
+cluster=$(($LAST_CLUSTER + 2))
+poke_file "$TEST_IMG" "$BAT_OFFSET" "\x$cluster\x00\x00\x00"
+
+echo "== read corrupted image with repairing =="
+{ $QEMU_IO -c "read -P 0x00 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
new file mode 100644
index 00..ea4dcef0a6
--- /dev/null
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -0,0 +1,12 @@
+QA output created by parallels-checks
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+== TEST OUT OF IMAGE CHECK ==
+== write pattern ==
+wrote 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== corrupt image ==
+== read corrupted image with repairing ==
+Repairing cluster 0 is outside image
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
-- 
2.34.1




[PATCH v2 3/7] iotests: Add test for BAT entries duplication check

2023-07-01 Thread Alexander Ivanov
Fill a parallels image with a pattern and write another pattern to the
second cluster. Corrupt the image and check if the pattern changes. Repair
the image and check the patterns on guest and host sides.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 32 +++
 tests/qemu-iotests/tests/parallels-checks.out | 31 ++
 2 files changed, 63 insertions(+)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index 8be282fabe..8a63c3daf4 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -92,6 +92,38 @@ echo "file size: $file_size"
 echo "== check last cluster =="
 { $QEMU_IO -c "read -P 0x11 $LAST_CLUSTER_OFF $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
 
+# Clear image
+_make_test_img $SIZE
+
+echo "== TEST DUPLICATION CHECK =="
+
+echo "== write pattern to whole image =="
+{ $QEMU_IO -c "write -P 0x11 0 $SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
+echo "== write another pattern to second cluster =="
+{ $QEMU_IO -c "write -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 
| _filter_qemu_io | _filter_testdir
+
+echo "== check second cluster =="
+{ $QEMU_IO -c "read -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== corrupt image =="
+poke_file "$TEST_IMG" "$(($BAT_OFFSET + 4))" "\x01\x00\x00\x00"
+
+echo "== check second cluster =="
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== repair image =="
+_check_test_img -r all
+
+echo "== check second cluster =="
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check first cluster on host =="
+printf "content: 0x%02x\n" `peek_file_le $TEST_IMG $(($CLUSTER_SIZE)) 1`
+
+echo "== check second cluster on host =="
+printf "content: 0x%02x\n" `peek_file_le $TEST_IMG $(($CLUSTER_SIZE)) 1`
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
index f2cb6dde85..b747bba1f3 100644
--- a/tests/qemu-iotests/tests/parallels-checks.out
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -31,4 +31,35 @@ file size: 2097152
 == check last cluster ==
 read 1048576/1048576 bytes at offset 3145728
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+== TEST DUPLICATION CHECK ==
+== write pattern to whole image ==
+wrote 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== write another pattern to second cluster ==
+wrote 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check second cluster ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== corrupt image ==
+== check second cluster ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== repair image ==
+Repairing duplicate offset in BAT entry 1
+The following inconsistencies were found and repaired:
+
+0 leaked clusters
+1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+== check second cluster ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check first cluster on host ==
+content: 0x11
+== check second cluster on host ==
+content: 0x11
 *** done
-- 
2.34.1




[PATCH v2 4/7] iotests: Refactor tests of parallels images checks (131)

2023-07-01 Thread Alexander Ivanov
Replace hardcoded numbers by variables.

Signed-off-by: Alexander Ivanov 
---
 tests/qemu-iotests/131 | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index a847692b4c..601546c84c 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -44,31 +44,34 @@ _supported_os Linux
 inuse_offset=$((0x2c))
 
 size=$((64 * 1024 * 1024))
-CLUSTER_SIZE=64k
+CLUSTER_SIZE=$((64 * 1024))
 IMGFMT=parallels
 _make_test_img $size
 
+CLUSTER_HALF_SIZE=$((CLUSTER_SIZE / 2))
+CLUSTER_DBL_SIZE=$((CLUSTER_SIZE * 2))
+
 echo == read empty image ==
-{ $QEMU_IO -c "read -P 0 32k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0 $CLUSTER_HALF_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 
| _filter_qemu_io | _filter_testdir
 echo == write more than 1 block in a row ==
-{ $QEMU_IO -c "write -P 0x11 32k 128k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "write -P 0x11 $CLUSTER_HALF_SIZE $CLUSTER_DBL_SIZE" 
"$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 echo == read less than block ==
-{ $QEMU_IO -c "read -P 0x11 32k 32k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_HALF_SIZE $CLUSTER_HALF_SIZE" 
"$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 echo == read exactly 1 block ==
-{ $QEMU_IO -c "read -P 0x11 64k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 echo == read more than 1 block ==
-{ $QEMU_IO -c "read -P 0x11 32k 128k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_HALF_SIZE $CLUSTER_DBL_SIZE" "$TEST_IMG"; 
} 2>&1 | _filter_qemu_io | _filter_testdir
 echo == check that there is no trash after written ==
-{ $QEMU_IO -c "read -P 0 160k 32k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0 $((CLUSTER_HALF_SIZE + CLUSTER_DBL_SIZE)) 
$CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 echo == check that there is no trash before written ==
-{ $QEMU_IO -c "read -P 0 0 32k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
 echo "== Corrupt image =="
 poke_file "$TEST_IMG" "$inuse_offset" "\x59\x6e\x6f\x74"
-{ $QEMU_IO -c "read -P 0x11 64k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 _check_test_img
 _check_test_img -r all
-{ $QEMU_IO -c "read -P 0x11 64k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
 echo "== allocate with backing =="
 # Verify that allocating clusters works fine even when there is a backing 
image.
@@ -83,7 +86,7 @@ TEST_IMG="$TEST_IMG.base" _make_test_img $size
 
 # Write some data to the base image (which would trigger an assertion failure 
if
 # interpreted as a QEMUIOVector)
-$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -P 42 0 $CLUSTER_SIZE" "$TEST_IMG.base" | _filter_qemu_io
 
 # Parallels does not seem to support storing a backing filename in the image
 # itself, so we need to build our backing chain on the command line
@@ -99,8 +102,8 @@ QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \
 QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \
 $QEMU_IO --image-opts "$imgopts" \
 -c 'read -P 1 0 64' \
--c "read -P 42 64 $((64 * 1024 - 64))" \
--c "read -P 0 64k $((size - 64 * 1024))" \
+-c "read -P 42 64 $((CLUSTER_SIZE - 64))" \
+-c "read -P 0 $CLUSTER_SIZE $((size - CLUSTER_SIZE))" \
 | _filter_qemu_io
 
 # success, all done
-- 
2.34.1




[PATCH v7 0/8] parallels: Add duplication check, repair at open, fix bugs

2023-07-01 Thread Alexander Ivanov
Fix incorrect data end calculation in parallels_open().

Check if data_end greater than the file size.

Add change_info argument to parallels_check_leak().

Add checking and repairing duplicate offsets in BAT

Image repairing in parallels_open().

v7:
3: Renamed "change_info" argument to "explicit", move info printing under
   explicit condition.
4: Different patch. Add data_start field to BDRVParallelsState for future
   host_cluster_index() function.
5: Prevously was 4th. Used s->data_start instead of s->header->data_off.
   Add bitmap size increasing if file length is not cluster aligned. Revert
   a couple conditions for better readability. Changed sectors calculation
   for better code transparency. Replaced sector variable by host_sector and
   guest_sector. Renamed off to host_off. Moved out_repare_bat: below return
   to get jumps on error path only.
6: Prevously was 5th.
7: New patch. Replace bdrv_getlength() by bdrv_co_getlength() in
   parallels_check_outside_image() because it is used in coroutine context.
8: New patch. Add data_off check.

v6:
2: Different patch. Refused to split image leak handling. Instead there is a
   patch with a data_end check.
3: Different patch. There is a patch with change_info argument.
4: Removed changing fprintf by qemu_log from this patchset. Previously 3rd
   patch became 4th. Replaced qemu_memalign() by qemu_blockalign(). Got
   rid of iovecs, replaced bdrv_co_pwritev() by bdrv_co_pwrite(). Added
   assert(cluster_index < bitmap_size). Now BAT changes are reverted if
   there was an error in the cluster copying process. Simplified a sector
   calculation.
5: Moved header magic check to the appropriate place. Added a
   migrate_del_blocker() call and s->bat_dirty_bmap freeing on error.

v5:
3: Fixed a byteorder bug, fixed zero-length image handling and fixed uint32
   truncation.

v4:
2,5: Rebased.

v3:
2: Added (size >= res->image_end_offset) assert and changed the comment in
   parallels_get_leak_size(). Changed error printing and leaks fixing order.
3: Removed highest_offset() helper, instead image_end_offset field is used.
5: Moved highest_offset() code to parallels_open() - now it is used only in
   this function. Fixed data_end update condition. Fixed a leak of
   s->migration_blocker.

v2:
2: Moved outsude parallels_check_leak() 2 helpers:
   parallels_get_leak_size() and parallels_fix_leak().
   
3: Used highest_offset() helper in parallels_check_leak(). Fixed a typo.
   Added comments. Replaced g_malloc() call by qemu_memalign(). Replaced
   bdrv_pread() call by bdrv_co_pread(). Got rid of keeping bytes and
   sectors in the same variable. Added setting the bitmap of the used
   clusters for a new allocated cluster if it isn't out of the bitmap.
   Moved the leak fix to the end of all the checks. Removed a dependence
   on image format for the duplicate check.
   
4 (old): Merged this patch to the previous.
4 (former 5): Fixed formatting.
5 (former 6): Fixed comments. Added O_INACTIVE check in the condition.
  Replaced inuse detection by header_unclean checking.
  Replaced playing with corutines by bdrv_check() usage.


Alexander Ivanov (8):
  parallels: Incorrect data end calculation in parallels_open()
  parallels: Check if data_end greater than the file size
  parallels: Add "explicit" argument to parallels_check_leak()
  parallels: Add data_start field to BDRVParallelsState
  parallels: Add checking and repairing duplicate offsets in BAT
  parallels: Image repairing in parallels_open()
  parallels: Use bdrv_co_getlength() in parallels_check_outside_image()
  parallels: Add data_off check

 block/parallels.c | 317 +++---
 block/parallels.h |   1 +
 2 files changed, 274 insertions(+), 44 deletions(-)

-- 
2.34.1




[PATCH v7 7/8] parallels: Use bdrv_co_getlength() in parallels_check_outside_image()

2023-07-01 Thread Alexander Ivanov
bdrv_co_getlength() should be used in coroutine context. Replace
bdrv_getlength() by bdrv_co_getlength() in parallels_check_outside_image().

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index 51fd8ddf5a..456a13bd28 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -496,7 +496,7 @@ parallels_check_leak(BlockDriverState *bs, BdrvCheckResult 
*res,
 int64_t size;
 int ret;
 
-size = bdrv_getlength(bs->file->bs);
+size = bdrv_co_getlength(bs->file->bs);
 if (size < 0) {
 res->check_errors++;
 return size;
-- 
2.34.1




[PATCH v7 6/8] parallels: Image repairing in parallels_open()

2023-07-01 Thread Alexander Ivanov
Repair an image at opening if the image is unclean or out-of-image
corruption was detected.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 70 +--
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 0f207c4b32..51fd8ddf5a 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -947,7 +947,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
-int64_t file_nb_sectors;
+int64_t file_nb_sectors, sector;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -1018,11 +1018,6 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
and actual data. We can't avoid read-modify-write... */
 s->header_size = size;
 }
-if (s->data_end > file_nb_sectors) {
-error_setg(errp, "Invalid image: incorrect data_off field");
-ret = -EINVAL;
-goto fail;
-}
 
 ret = bdrv_pread(bs->file, 0, s->header_size, s->header, 0);
 if (ret < 0) {
@@ -1030,33 +1025,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 s->bat_bitmap = (uint32_t *)(s->header + 1);
 
-for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i);
-if (off >= file_nb_sectors) {
-if (flags & BDRV_O_CHECK) {
-continue;
-}
-error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
-   "is larger than file size (%" PRIi64 ")",
-   off << BDRV_SECTOR_BITS, i,
-   file_nb_sectors << BDRV_SECTOR_BITS);
-ret = -EINVAL;
-goto fail;
-}
-if (off >= s->data_end) {
-s->data_end = off + s->tracks;
-}
-}
-
 if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
-/* Image was not closed correctly. The check is mandatory */
 s->header_unclean = true;
-if ((flags & BDRV_O_RDWR) && !(flags & BDRV_O_CHECK)) {
-error_setg(errp, "parallels: Image was not closed correctly; "
-   "cannot be opened read/write");
-ret = -EACCES;
-goto fail;
-}
 }
 
 opts = qemu_opts_create(_runtime_opts, NULL, 0, errp);
@@ -1117,10 +1087,40 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
bdrv_get_device_or_node_name(bs));
 ret = migrate_add_blocker(s->migration_blocker, errp);
 if (ret < 0) {
-error_free(s->migration_blocker);
+error_setg(errp, "Migration blocker error");
 goto fail;
 }
 qemu_co_mutex_init(>lock);
+
+for (i = 0; i < s->bat_size; i++) {
+sector = bat2sect(s, i);
+if (sector + s->tracks > s->data_end) {
+s->data_end = sector + s->tracks;
+}
+}
+
+/*
+ * We don't repair the image here if it's opened for checks. Also we don't
+ * want to change inactive images and can't change readonly images.
+ */
+if ((flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) || !(flags & BDRV_O_RDWR)) {
+return 0;
+}
+
+/*
+ * Repair the image if it's dirty or
+ * out-of-image corruption was detected.
+ */
+if (s->data_end > file_nb_sectors || s->header_unclean) {
+BdrvCheckResult res;
+ret = bdrv_check(bs, , BDRV_FIX_ERRORS | BDRV_FIX_LEAKS);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not repair corrupted image");
+migrate_del_blocker(s->migration_blocker);
+goto fail;
+}
+}
+
 return 0;
 
 fail_format:
@@ -1128,6 +1128,12 @@ fail_format:
 fail_options:
 ret = -EINVAL;
 fail:
+/*
+ * "s" object was allocated by g_malloc0 so we can safely
+ * try to free its fields even they were not allocated.
+ */
+error_free(s->migration_blocker);
+g_free(s->bat_dirty_bmap);
 qemu_vfree(s->header);
 return ret;
 }
-- 
2.34.1




[PATCH v7 8/8] parallels: Add data_off check

2023-07-01 Thread Alexander Ivanov
data_off field of the parallels image header can be corrupted. Check if
this field greater than the header + BAT size and less than file size.
Change checking code in parallels_open() accordingly.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 98 +++
 1 file changed, 83 insertions(+), 15 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 456a13bd28..51e79056df 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -446,6 +446,65 @@ static void parallels_check_unclean(BlockDriverState *bs,
 }
 }
 
+/*
+ * Returns 0 if data_off is correct or returns correct offset.
+ */
+static uint32_t parallels_test_data_off(BDRVParallelsState *s,
+int64_t file_nb_sectors)
+{
+uint32_t data_off, min_off;
+bool old_magic;
+
+old_magic = !memcmp(s->header->magic, HEADER_MAGIC, 16);
+
+data_off = le32_to_cpu(s->header->data_off);
+if (data_off == 0 && old_magic) {
+return 0;
+}
+
+min_off = DIV_ROUND_UP(bat_entry_off(s->bat_size), BDRV_SECTOR_SIZE);
+if (!old_magic) {
+min_off = ROUND_UP(min_off, s->cluster_size / BDRV_SECTOR_SIZE);
+}
+
+if (data_off >= min_off && data_off <= file_nb_sectors) {
+return 0;
+}
+
+return min_off;
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+parallels_check_data_off(BlockDriverState *bs, BdrvCheckResult *res,
+ BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t file_size;
+uint32_t data_off;
+
+file_size = bdrv_co_nb_sectors(bs->file->bs);
+if (file_size < 0) {
+res->check_errors++;
+return file_size;
+}
+
+data_off = parallels_test_data_off(s, file_size);
+if (data_off == 0) {
+return 0;
+}
+
+res->corruptions++;
+if (fix & BDRV_FIX_ERRORS) {
+s->header->data_off = cpu_to_le32(data_off);
+res->corruptions_fixed++;
+}
+
+fprintf(stderr, "%s data_off field has incorrect value\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
+
+return 0;
+}
+
 static int coroutine_fn GRAPH_RDLOCK
 parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res,
   BdrvCheckMode fix)
@@ -709,6 +768,11 @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult 
*res,
 WITH_QEMU_LOCK_GUARD(>lock) {
 parallels_check_unclean(bs, res, fix);
 
+ret = parallels_check_data_off(bs, res, fix);
+if (ret < 0) {
+return ret;
+}
+
 ret = parallels_check_outside_image(bs, res, fix);
 if (ret < 0) {
 return ret;
@@ -947,7 +1011,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
-int64_t file_nb_sectors, sector;
+int64_t file_nb_sectors, sector, new_data_start;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -1008,9 +1072,22 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -ENOMEM;
 goto fail;
 }
-s->data_start = le32_to_cpu(ph.data_off);
-if (s->data_start == 0) {
-s->data_start = DIV_ROUND_UP(size, BDRV_SECTOR_SIZE);
+
+ret = bdrv_pread(bs->file, 0, s->header_size, s->header, 0);
+if (ret < 0) {
+goto fail;
+}
+s->bat_bitmap = (uint32_t *)(s->header + 1);
+
+if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
+s->header_unclean = true;
+}
+
+new_data_start = parallels_test_data_off(s, file_nb_sectors);
+if (new_data_start == 0) {
+s->data_start = le32_to_cpu(ph.data_off);
+} else {
+s->data_start = new_data_start;
 }
 s->data_end = s->data_start;
 if (s->data_end < (s->header_size >> BDRV_SECTOR_BITS)) {
@@ -1019,16 +1096,6 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->header_size = size;
 }
 
-ret = bdrv_pread(bs->file, 0, s->header_size, s->header, 0);
-if (ret < 0) {
-goto fail;
-}
-s->bat_bitmap = (uint32_t *)(s->header + 1);
-
-if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
-s->header_unclean = true;
-}
-
 opts = qemu_opts_create(_runtime_opts, NULL, 0, errp);
 if (!opts) {
 goto fail_options;
@@ -,7 +1178,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  * Repair the image if it's dirty or
  * out-of-image corruption was detected.
  */
-if (s->data_end > file_nb_sectors || s->header_unclean) {
+if (s->data_end > file_nb_sectors || s->header_unclean
+|| new_data_start != 0) {
 BdrvCheckResult res;
 ret = bdrv_check(bs, , BDRV_FIX_ERRORS | BDRV_FIX_LEAKS);
 if (ret < 0) {
-- 
2.34.1




[PATCH v7 3/8] parallels: Add "explicit" argument to parallels_check_leak()

2023-07-01 Thread Alexander Ivanov
In the on of the next patches we need to repair leaks without changing
leaks and leaks_fixed info in res. Also we don't want to print any warning
about leaks. Add "explicit" argument to skip info changing if the argument
is false.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 40a26908db..3cff25e3a4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -484,7 +484,7 @@ parallels_check_outside_image(BlockDriverState *bs, 
BdrvCheckResult *res,
 
 static int coroutine_fn GRAPH_RDLOCK
 parallels_check_leak(BlockDriverState *bs, BdrvCheckResult *res,
- BdrvCheckMode fix)
+ BdrvCheckMode fix, bool explicit)
 {
 BDRVParallelsState *s = bs->opaque;
 int64_t size;
@@ -499,10 +499,13 @@ parallels_check_leak(BlockDriverState *bs, 
BdrvCheckResult *res,
 if (size > res->image_end_offset) {
 int64_t count;
 count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
-fprintf(stderr, "%s space leaked at the end of the image %" PRId64 
"\n",
-fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
-size - res->image_end_offset);
-res->leaks += count;
+if (explicit) {
+fprintf(stderr,
+"%s space leaked at the end of the image %" PRId64 "\n",
+fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
+size - res->image_end_offset);
+res->leaks += count;
+}
 if (fix & BDRV_FIX_LEAKS) {
 Error *local_err = NULL;
 
@@ -517,7 +520,9 @@ parallels_check_leak(BlockDriverState *bs, BdrvCheckResult 
*res,
 res->check_errors++;
 return ret;
 }
-res->leaks_fixed += count;
+if (explicit) {
+res->leaks_fixed += count;
+}
 }
 }
 
@@ -570,7 +575,7 @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult 
*res,
 return ret;
 }
 
-ret = parallels_check_leak(bs, res, fix);
+ret = parallels_check_leak(bs, res, fix, true);
 if (ret < 0) {
 return ret;
 }
-- 
2.34.1




[PATCH v7 2/8] parallels: Check if data_end greater than the file size

2023-07-01 Thread Alexander Ivanov
Initially data_end is set to the data_off image header field and must not be
greater than the file size.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 86bc3bfcb8..40a26908db 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -868,6 +868,11 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
and actual data. We can't avoid read-modify-write... */
 s->header_size = size;
 }
+if (s->data_end > file_nb_sectors) {
+error_setg(errp, "Invalid image: incorrect data_off field");
+ret = -EINVAL;
+goto fail;
+}
 
 ret = bdrv_pread(bs->file, 0, s->header_size, s->header, 0);
 if (ret < 0) {
-- 
2.34.1




[PATCH v7 4/8] parallels: Add data_start field to BDRVParallelsState

2023-07-01 Thread Alexander Ivanov
In the next patch we will need the offset of the data area for host cluster
index calculation. Add this field and setting up code.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 7 ---
 block/parallels.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 3cff25e3a4..374c9d17eb 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -864,10 +864,11 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -ENOMEM;
 goto fail;
 }
-s->data_end = le32_to_cpu(ph.data_off);
-if (s->data_end == 0) {
-s->data_end = DIV_ROUND_UP(size, BDRV_SECTOR_SIZE);
+s->data_start = le32_to_cpu(ph.data_off);
+if (s->data_start == 0) {
+s->data_start = DIV_ROUND_UP(size, BDRV_SECTOR_SIZE);
 }
+s->data_end = s->data_start;
 if (s->data_end < (s->header_size >> BDRV_SECTOR_BITS)) {
 /* there is not enough unused space to fit to block align between BAT
and actual data. We can't avoid read-modify-write... */
diff --git a/block/parallels.h b/block/parallels.h
index f22f43f988..4e53e9572d 100644
--- a/block/parallels.h
+++ b/block/parallels.h
@@ -75,6 +75,7 @@ typedef struct BDRVParallelsState {
 uint32_t *bat_bitmap;
 unsigned int bat_size;
 
+int64_t  data_start;
 int64_t  data_end;
 uint64_t prealloc_size;
 ParallelsPreallocMode prealloc_mode;
-- 
2.34.1




[PATCH v7 1/8] parallels: Incorrect data end calculation in parallels_open()

2023-07-01 Thread Alexander Ivanov
The BDRVParallelsState structure contains data_end field that is measured
in sectors. In parallels_open() initially this field is set by data_off
field from parallels image header.

According to the parallels format documentation, data_off field contains
an offset, in sectors, from the start of the file to the start of the
data area. For "WithoutFreeSpace" images: if data_off is zero, the offset
is calculated as the end of the BAT table plus some padding to ensure
sector size alignment.

The parallels_open() function has code for handling zero value in
data_off, but in the result data_end contains the offset in bytes.

Replace the alignment to sector size by division by sector size and fix
the comparision with s->header_size.

Signed-off-by: Alexander Ivanov 
Reviewed-by: Denis V. Lunev 
Reviewed-by: Hanna Czenczek 
---
 block/parallels.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 18e34aef28..86bc3bfcb8 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -861,9 +861,9 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 s->data_end = le32_to_cpu(ph.data_off);
 if (s->data_end == 0) {
-s->data_end = ROUND_UP(bat_entry_off(s->bat_size), BDRV_SECTOR_SIZE);
+s->data_end = DIV_ROUND_UP(size, BDRV_SECTOR_SIZE);
 }
-if (s->data_end < s->header_size) {
+if (s->data_end < (s->header_size >> BDRV_SECTOR_BITS)) {
 /* there is not enough unused space to fit to block align between BAT
and actual data. We can't avoid read-modify-write... */
 s->header_size = size;
-- 
2.34.1




[PATCH v7 5/8] parallels: Add checking and repairing duplicate offsets in BAT

2023-07-01 Thread Alexander Ivanov
Cluster offsets must be unique among all the BAT entries. Find duplicate
offsets in the BAT and fix it by copying the content of the relevant
cluster to a newly allocated cluster and set the new cluster offset to the
duplicated entry.

Add host_cluster_index() helper to deduplicate the code.

When new clusters are allocated, the file size increases by 128 Mb. Call
parallels_check_leak() to fix this leak.

Signed-off-by: Alexander Ivanov 
---
 block/parallels.c | 144 ++
 1 file changed, 144 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 374c9d17eb..0f207c4b32 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -136,6 +136,12 @@ static int cluster_remainder(BDRVParallelsState *s, 
int64_t sector_num,
 return MIN(nb_sectors, ret);
 }
 
+static uint32_t host_cluster_index(BDRVParallelsState *s, int64_t off)
+{
+off -= s->data_start << BDRV_SECTOR_BITS;
+return off / s->cluster_size;
+}
+
 static int64_t block_status(BDRVParallelsState *s, int64_t sector_num,
 int nb_sectors, int *pnum)
 {
@@ -529,6 +535,139 @@ parallels_check_leak(BlockDriverState *bs, 
BdrvCheckResult *res,
 return 0;
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+parallels_check_duplicate(BlockDriverState *bs, BdrvCheckResult *res,
+  BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t host_off, host_sector, guest_sector;
+unsigned long *bitmap;
+uint32_t i, bitmap_size, cluster_index, bat_entry;
+int n, ret = 0;
+uint64_t *buf = NULL;
+bool fixed = false;
+
+/*
+ * Create a bitmap of used clusters.
+ * If a bit is set, there is a BAT entry pointing to this cluster.
+ * Loop through the BAT entries, check bits relevant to an entry offset.
+ * If bit is set, this entry is duplicated. Otherwise set the bit.
+ *
+ * We shouldn't worry about newly allocated clusters outside the image
+ * because they are created higher then any existing cluster pointed by
+ * a BAT entry.
+ */
+bitmap_size = host_cluster_index(s, res->image_end_offset);
+if (bitmap_size == 0) {
+return 0;
+}
+if (res->image_end_offset % s->cluster_size) {
+/* A not aligned image end leads to a bitmap shorter by 1 */
+bitmap_size++;
+}
+
+bitmap = bitmap_new(bitmap_size);
+
+buf = qemu_blockalign(bs, s->cluster_size);
+
+for (i = 0; i < s->bat_size; i++) {
+host_off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (host_off == 0) {
+continue;
+}
+
+cluster_index = host_cluster_index(s, host_off);
+assert(cluster_index < bitmap_size);
+if (!test_bit(cluster_index, bitmap)) {
+bitmap_set(bitmap, cluster_index, 1);
+continue;
+}
+
+/* this cluster duplicates another one */
+fprintf(stderr, "%s duplicate offset in BAT entry %u\n",
+fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+
+res->corruptions++;
+
+if (!(fix & BDRV_FIX_ERRORS)) {
+continue;
+}
+
+/*
+ * Reset the entry and allocate a new cluster
+ * for the relevant guest offset. In this way we let
+ * the lower layer to place the new cluster properly.
+ * Copy the original cluster to the allocated one.
+ * But before save the old offset value for repairing
+ * if we have an error.
+ */
+bat_entry = s->bat_bitmap[i];
+parallels_set_bat_entry(s, i, 0);
+
+ret = bdrv_co_pread(bs->file, host_off, s->cluster_size, buf, 0);
+if (ret < 0) {
+res->check_errors++;
+goto out_repare_bat;
+}
+
+guest_sector = (i * (int64_t)s->cluster_size) >> BDRV_SECTOR_BITS;
+host_sector = allocate_clusters(bs, guest_sector, s->tracks, );
+if (host_sector < 0) {
+res->check_errors++;
+goto out_repare_bat;
+}
+host_off = host_sector << BDRV_SECTOR_BITS;
+
+ret = bdrv_co_pwrite(bs->file, host_off, s->cluster_size, buf, 0);
+if (ret < 0) {
+res->check_errors++;
+goto out_repare_bat;
+}
+
+if (host_off + s->cluster_size > res->image_end_offset) {
+res->image_end_offset = host_off + s->cluster_size;
+}
+
+/*
+ * In the future allocate_cluster() will reuse holed offsets
+ * inside the image. Keep the used clusters bitmap content
+ * consistent for the new allocated clusters too.
+ *
+ * Note, clusters allocated outside the current image are not
+ * considered, and the bitmap size doesn't change.
+ */
+cluster_index = host_cluster_index(s, host_off);
+if (cluster_index < bitmap_size) {
+bitmap_set(bitmap, cluster_index, 1);
+}
+
+fixed = true;
+

[PATCH 2/4] hw/nvme: Fix ARI next function numbers

2023-07-01 Thread Akihiko Odaki
The next function numbers are expected to form a linked list ending with
0.

Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Signed-off-by: Akihiko Odaki 
---
 hw/nvme/ctrl.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index fd917fcda1..12500dc80b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8088,7 +8088,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, 0x80);
 pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
-pcie_ari_init(pci_dev, 0x100, 1);
+uint16_t nextvfn = pci_is_vf(pci_dev) ?
+   pcie_sriov_vf_number(pci_dev) + 1 : 0;
+uint16_t nextfn = nextvfn < n->params.sriov_max_vfs ?
+  NVME_VF_OFFSET + nextvfn * NVME_VF_STRIDE : 0;
+
+pcie_ari_init(pci_dev, 0x100, nextfn);
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
-- 
2.41.0




[PATCH 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-01 Thread Akihiko Odaki
The next function numbers are expected to form a linked list ending with
0.

Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
docs/pcie_sriov.txt")
Signed-off-by: Akihiko Odaki 
---
 docs/pcie_sriov.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 7eff7f2703..cc4232e49a 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -48,7 +48,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x70);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, fun_offset);
   ...
 
   /* Add and initialize the SR/IOV capability */
@@ -76,9 +76,10 @@ setting up a BAR for a VF.
pci_your_vf_dev_realize( ... )
{
   ...
+  uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
   int ret = pcie_endpoint_cap_init(d, 0x60);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, nextvfn < total_vfs ? fun_offset + nextvfn * 
stride : 0);
   ...
   memory_region_init(mr, ... )
   pcie_sriov_vf_register_bar(d, bar_nr, mr);
-- 
2.41.0




[PATCH 4/4] pci: Compare function number and ARI next function number

2023-07-01 Thread Akihiko Odaki
The function number must be lower than the next function number
advertised with ARI.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..568665ee42 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 Error *local_err = NULL;
 bool is_default_rom;
 uint16_t class_id;
+uint16_t ari;
+uint16_t nextfn;
 
 /*
  * capped by systemd (see: udev-builtin-net_id.c)
@@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }
 
+if (pci_is_express(pci_dev)) {
+ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
+if (ari) {
+nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) 
& 0xff;
+if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
+error_setg(errp, "PCI: function number %u is not lower than 
ARI next function number %u",
+   pci_dev->devfn & 0xff, nextfn);
+pci_qdev_unrealize(DEVICE(pci_dev));
+return;
+}
+}
+}
+
 if (pci_dev->failover_pair_id) {
 if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
 error_setg(errp, "failover primary device must be on "
-- 
2.41.0




[PATCH 3/4] igb: Fix ARI next function numbers

2023-07-01 Thread Akihiko Odaki
The next function numbers are expected to form a linked list ending with
0.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb_core.h | 3 +++
 hw/net/igb.c  | 4 +---
 hw/net/igbvf.c| 5 -
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/net/igb_core.h b/hw/net/igb_core.h
index 9cbbfd516b..e1dab76995 100644
--- a/hw/net/igb_core.h
+++ b/hw/net/igb_core.h
@@ -49,6 +49,9 @@
 #define IGB_NUM_QUEUES  (16)
 #define IGB_NUM_VM_POOLS(8)
 
+#define IGB_VF_OFFSET   (0x80)
+#define IGB_VF_STRIDE   (2)
+
 typedef struct IGBCore IGBCore;
 
 enum { PHY_R = BIT(0),
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1c989d7677..543ca0114a 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -81,8 +81,6 @@ struct IGBState {
 };
 
 #define IGB_CAP_SRIOV_OFFSET(0x160)
-#define IGB_VF_OFFSET   (0x80)
-#define IGB_VF_STRIDE   (2)
 
 #define E1000E_MMIO_IDX 0
 #define E1000E_FLASH_IDX1
@@ -431,7 +429,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 hw_error("Failed to initialize AER capability");
 }
 
-pcie_ari_init(pci_dev, 0x150, 1);
+pcie_ari_init(pci_dev, 0x150, IGB_VF_OFFSET);
 
 pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
 IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
index 284ea61184..bf2f237ab5 100644
--- a/hw/net/igbvf.c
+++ b/hw/net/igbvf.c
@@ -240,6 +240,9 @@ static const MemoryRegionOps mmio_ops = {
 static void igbvf_pci_realize(PCIDevice *dev, Error **errp)
 {
 IgbVfState *s = IGBVF(dev);
+uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
+uint16_t nextfn = nextvfn < IGB_MAX_VF_FUNCTIONS ?
+  IGB_VF_OFFSET + nextvfn * IGB_VF_STRIDE : 0;
 int ret;
 int i;
 
@@ -270,7 +273,7 @@ static void igbvf_pci_realize(PCIDevice *dev, Error **errp)
 hw_error("Failed to initialize AER capability");
 }
 
-pcie_ari_init(dev, 0x150, 1);
+pcie_ari_init(dev, 0x150, nextfn);
 }
 
 static void igbvf_pci_uninit(PCIDevice *dev)
-- 
2.41.0




[PATCH 0/4] pci: Compare function number and ARI next function number

2023-07-01 Thread Akihiko Odaki
The function number must be lower than the next function number
advertised with ARI. Add a check to enforce this.

I suggested this change at:
https://lore.kernel.org/qemu-devel/bf351f8b-1c8a-8a7a-7f44-17c9ba18f...@daynix.com/

Implementing this change, I found the devices implementing ARI do not set the
correct next function numbers, which is also fixed in this series.

Akihiko Odaki (4):
  docs: Fix next function numbers in SR/IOV documentation
  hw/nvme: Fix ARI next function numbers
  igb: Fix ARI next function numbers
  pci: Compare function number and ARI next function number

 docs/pcie_sriov.txt |  5 +++--
 hw/net/igb_core.h   |  3 +++
 hw/net/igb.c|  4 +---
 hw/net/igbvf.c  |  5 -
 hw/nvme/ctrl.c  |  7 ++-
 hw/pci/pci.c| 15 +++
 6 files changed, 32 insertions(+), 7 deletions(-)

-- 
2.41.0