Re: [PATCH 25/42] drivers/scsi/ibmvscsi: Convert snprintf to sysfs_emit

2024-01-16 Thread Tyrel Datwyler
On 1/15/24 20:51, Li Zhijian wrote:
> Per filesystems/sysfs.rst, show() should only use sysfs_emit()
> or sysfs_emit_at() when formatting the value to be returned to user space.
> 
> coccinelle complains that there are still a couple of functions that use
> snprintf(). Convert them to sysfs_emit().
> 
>> ./drivers/scsi/ibmvscsi/ibmvfc.c:3483:8-16: WARNING: please use sysfs_emit
>> ./drivers/scsi/ibmvscsi/ibmvfc.c:3493:8-16: WARNING: please use sysfs_emit
>> ./drivers/scsi/ibmvscsi/ibmvfc.c:3503:8-16: WARNING: please use sysfs_emit
>> ./drivers/scsi/ibmvscsi/ibmvfc.c:3513:8-16: WARNING: please use sysfs_emit
>> ./drivers/scsi/ibmvscsi/ibmvfc.c:3522:8-16: WARNING: please use sysfs_emit
>> ./drivers/scsi/ibmvscsi/ibmvfc.c:3530:8-16: WARNING: please use sysfs_emit
> 
> No functional change intended
> 
> CC: Tyrel Datwyler 
> CC: Michael Ellerman 
> CC: Nicholas Piggin 
> CC: Christophe Leroy 
> CC: "Aneesh Kumar K.V" 
> CC: "Naveen N. Rao" 
> CC: "James E.J. Bottomley" 
> CC: "Martin K. Petersen" 
> CC: linux-s...@vger.kernel.org
> CC: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Li Zhijian 
> ---

Acked-by: Tyrel Datwyler 



Re: [PATCH 1/1] powerpc: fix a memory leak

2023-09-14 Thread Tyrel Datwyler
On 9/14/23 02:46, Yuanjun Gong wrote:
> When one of the methods xive_native_alloc_irq_on_chip, irq_create_mapping
> or irq_get_handler_data fails, the function will directly return without
> disposing vinst->name and vinst. Fix it.
> 
> Fixes: c20e1e299d93 ("powerpc/vas: Alloc and setup IRQ and trigger port 
> address")
> Signed-off-by: Yuanjun Gong 
> ---
>  arch/powerpc/platforms/powernv/vas.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/vas.c 
> b/arch/powerpc/platforms/powernv/vas.c
> index b65256a63e87..780740b478f0 100644
> --- a/arch/powerpc/platforms/powernv/vas.c
> +++ b/arch/powerpc/platforms/powernv/vas.c
> @@ -54,7 +54,7 @@ static int init_vas_instance(struct platform_device *pdev)
>   struct xive_irq_data *xd;
>   uint32_t chipid, hwirq;
>   struct resource *res;
> - int rc, cpu, vasid;
> + int rc, cpu, vasid, ret;

You can you reuse rc for the return value in the error path instead of
introducing a new ret variable.

-Tyrel

> 
>   rc = of_property_read_u32(dn, "ibm,vas-id", &vasid);
>   if (rc) {
> @@ -102,6 +102,7 @@ static int init_vas_instance(struct platform_device *pdev)
>   res = &pdev->resource[3];
>   if (res->end > 62) {
>   pr_err("Bad 'paste_win_id_shift' in DT, %llx\n", res->end);
> + ret = -ENODEV
>   goto free_vinst;
>   }
> 
> @@ -111,21 +112,24 @@ static int init_vas_instance(struct platform_device 
> *pdev)
>   if (!hwirq) {
>   pr_err("Inst%d: Unable to allocate global irq for chip %d\n",
>   vinst->vas_id, chipid);
> - return -ENOENT;
> + ret = -ENOENT;
> + goto free_vinst;
>   }
> 
>   vinst->virq = irq_create_mapping(NULL, hwirq);
>   if (!vinst->virq) {
>   pr_err("Inst%d: Unable to map global irq %d\n",
>   vinst->vas_id, hwirq);
> - return -EINVAL;
> + ret = -EINVAL;
> + goto free_vinst;
>   }
> 
>   xd = irq_get_handler_data(vinst->virq);
>   if (!xd) {
>   pr_err("Inst%d: Invalid virq %d\n",
>   vinst->vas_id, vinst->virq);
> - return -EINVAL;
> + ret = -EINVAL;
> + goto free_vinst;
>   }
> 
>   vinst->irq_port = xd->trig_page;
> @@ -168,7 +172,7 @@ static int init_vas_instance(struct platform_device *pdev)
>  free_vinst:
>   kfree(vinst->name);
>   kfree(vinst);
> - return -ENODEV;
> + return ret;
> 
>  }
> 



Re: [PATCH] scsi: ibmvscsi: Replace all non-returning strlcpy with strscpy

2023-05-17 Thread Tyrel Datwyler
On 5/17/23 07:34, Azeem Shaikh wrote:
> strlcpy() reads the entire source buffer first.
> This read may exceed the destination size limit.
> This is both inefficient and can lead to linear read
> overflows if a source string is not NUL-terminated [1].
> In an effort to remove strlcpy() completely [2], replace
> strlcpy() here with strscpy().
> No return values were used, so direct replacement is safe.
> 
> [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
> [2] https://github.com/KSPP/linux/issues/89
> 
> Signed-off-by: Azeem Shaikh 

Acked-by: Tyrel Datwyler 



Re: [PATCH] powerpc: kernel: pci_dn: Add missing of_node_put() for of_get_xx API

2022-07-01 Thread Tyrel Datwyler
On 7/1/22 12:47, Tyrel Datwyler wrote:
> On 7/1/22 06:17, Liang He wrote:
>> In pci_add_device_node_info(), we should use of_node_put() for the
>> reference 'parent' returned by of_get_parent() to keep refcount
>> balance.
>>
>> Fixes: cca87d303c85 ("powerpc/pci: Refactor pci_dn")
>> Co-authored-by: Miaoqian Lin 
>> Signed-off-by: Liang He 
>> ---
>>  arch/powerpc/kernel/pci_dn.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>> index 938ab8838ab5..aa221958007e 100644
>> --- a/arch/powerpc/kernel/pci_dn.c
>> +++ b/arch/powerpc/kernel/pci_dn.c
>> @@ -330,6 +330,7 @@ struct pci_dn *pci_add_device_node_info(struct 
>> pci_controller *hose,
>>  INIT_LIST_HEAD(&pdn->list);
>>  parent = of_get_parent(dn);
>>  pdn->parent = parent ? PCI_DN(parent) : NULL;
> NACK
> 
> pdn->parent is now a long term reference so we should not do a put on parent
> until we pdn->parent is no longer valid.

I withdraw my NACK. On closer inspection pdn->parent is a reference to the
parent struct pci_dn and not a reference to a parent struct device_node.

Reviewed-by: Tyrel Datwyler 

> 
> -Tyrel
> 
>> +of_node_put(parent);
>>  if (pdn->parent)
>>  list_add_tail(&pdn->list, &pdn->parent->child_list);
>>
> 



Re: [PATCH] powerpc: kernel: pci_dn: Add missing of_node_put() for of_get_xx API

2022-07-01 Thread Tyrel Datwyler
On 7/1/22 06:17, Liang He wrote:
> In pci_add_device_node_info(), we should use of_node_put() for the
> reference 'parent' returned by of_get_parent() to keep refcount
> balance.
> 
> Fixes: cca87d303c85 ("powerpc/pci: Refactor pci_dn")
> Co-authored-by: Miaoqian Lin 
> Signed-off-by: Liang He 
> ---
>  arch/powerpc/kernel/pci_dn.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 938ab8838ab5..aa221958007e 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -330,6 +330,7 @@ struct pci_dn *pci_add_device_node_info(struct 
> pci_controller *hose,
>   INIT_LIST_HEAD(&pdn->list);
>   parent = of_get_parent(dn);
>   pdn->parent = parent ? PCI_DN(parent) : NULL;
NACK

pdn->parent is now a long term reference so we should not do a put on parent
until we pdn->parent is no longer valid.

-Tyrel

> + of_node_put(parent);
>   if (pdn->parent)
>   list_add_tail(&pdn->list, &pdn->parent->child_list);
> 



[PATCH] ibmvfc: alloc/free queue resource only during probe/remove

2022-06-16 Thread Tyrel Datwyler
Currently, the sub-queues and event pool resources are alloc/free'd
for every CRQ connection event such as reset and LPM. This exposes the driver to
a couple issues. First the inefficiency of freeing and reallocating memory that
can simply be resued after being sanitized. Further, a system under memory
pressue runs the risk of allocation failures that could result in a cripled
driver. Finally, there is a race window where command submission/compeletion can
try to pull/return elements from/to an event pool that is being deleted or
already has been deleted due to the lack of host state around freeing/allocating
resources. The following is an example of list corruption following a live
partition migration (LPM):

Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: vfat fat isofs cdrom ext4 mbcache jbd2 nft_counter 
nft_compat nf_tables nfnetlink rpadlpar_io rpaphp xsk_diag nfsv3 nfs_acl nfs 
lockd grace fscache netfs rfkill bonding tls sunrpc pseries_rng drm 
drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg 
ibmvfc scsi_transport_fc ibmveth vmx_crypto dm_multipath dm_mirror 
dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
CPU: 0 PID: 2108 Comm: ibmvfc_0 Kdump: loaded Not tainted 
5.14.0-70.9.1.el9_0.ppc64le #1
NIP: c07c4bb0 LR: c07c4bac CTR: 005b9a10
REGS: c0025c10b760 TRAP: 0700  Not tainted (5.14.0-70.9.1.el9_0.ppc64le)
MSR: 8282b033  CR: 2800028f XER: 
000f
CFAR: c01f55bc IRQMASK: 0
GPR00: c07c4bac c0025c10ba00 c2a47c00 
004e
GPR04: c031e3006f88 c031e308bd00 c0025c10b768 
0027
GPR08:  c031e3009dc0 0031e0eb 

GPR12: c031e2a8 c2dd c0187108 
c0020fcee2c0
GPR16:    

GPR20:    
c00802f81300
GPR24: 5deadbeef100 5deadbeef122 c00263ba6910 
c0024cc88000
GPR28: 003c c002430a c002430ac300 
c300
NIP [c07c4bb0] __list_del_entry_valid+0x90/0x100
LR [c07c4bac] __list_del_entry_valid+0x8c/0x100
Call Trace:
[c0025c10ba00] [c07c4bac] __list_del_entry_valid+0x8c/0x100 
(unreliable)
[c0025c10ba60] [c00802f42284] ibmvfc_free_queue+0xec/0x210 [ibmvfc]
[c0025c10bb10] [c00802f4246c] ibmvfc_deregister_scsi_channel+0xc4/0x160 
[ibmvfc]
[c0025c10bba0] [c00802f42580] ibmvfc_release_sub_crqs+0x78/0x130 
[ibmvfc]
[c0025c10bc20] [c00802f4f6cc] ibmvfc_do_work+0x5c4/0xc70 [ibmvfc]
[c0025c10bce0] [c00802f4fdec] ibmvfc_work+0x74/0x1e8 [ibmvfc]
[c0025c10bda0] [c01872b8] kthread+0x1b8/0x1c0
[c0025c10be10] [c000cd64] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
40820034 3861 38210060 4e800020 7c0802a6 7c641b78 3c62fe7a 7d254b78
3863b590 f8010070 4ba309cd 6000 <0fe0> 7c0802a6 3c62fe7a 3863b640
---[ end trace 11a2b65a92f8b66c ]---
ibmvfc 3003: Send warning. Receive queue closed, will retry.

Add registration/deregistartion helpers that are called instead during
connection resets to sanitize and reconfigure the queues.

Fixes: 3034ebe26389 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 79 ++
 1 file changed, 62 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 9fc0ffda6ae8..00684e11976b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -160,8 +160,8 @@ static void ibmvfc_npiv_logout(struct ibmvfc_host *);
 static void ibmvfc_tgt_implicit_logout_and_del(struct ibmvfc_target *);
 static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
 
-static void ibmvfc_release_sub_crqs(struct ibmvfc_host *);
-static void ibmvfc_init_sub_crqs(struct ibmvfc_host *);
+static void ibmvfc_dereg_sub_crqs(struct ibmvfc_host *);
+static void ibmvfc_reg_sub_crqs(struct ibmvfc_host *);
 
 static const char *unknown_error = "unknown error";
 
@@ -917,7 +917,7 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
struct vio_dev *vdev = to_vio_dev(vhost->dev);
unsigned long flags;
 
-   ibmvfc_release_sub_crqs(vhost);
+   ibmvfc_dereg_sub_crqs(vhost);
 
/* Re-enable the CRQ */
do {
@@ -936,7 +936,7 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
 
-   ibmvfc_init_sub_crqs(vhost);
+   ibmvfc_reg_sub_crqs(vhost);
 
return rc;
 }
@@ -955,7 +

[PATCH] ibmvfc: store vhost pointer during subcrq allocation

2022-06-16 Thread Tyrel Datwyler
Currently the back pointer from a queue to the vhost adapter isn't set
until after subcrq interrupt registration. The value is available when a queue
is first allocated and can/should be also set for primary and async queues as
well as subcrq's.

This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS
interrupt controller where a pending subcrq interrupt from the previous kernel
can be replayed immediately upon IRQ registration resulting in dereference of a
garbage backpointer in ibmvfc_interrupt_scsi.

Kernel attempted to read user page (58) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x0058
Faulting instruction address: 0xc00803216a08
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c00803216a08] ibmvfc_interrupt_scsi+0x40/0xb0 [ibmvfc]
LR [c82079e8] __handle_irq_event_percpu+0x98/0x270
Call Trace:
[c00047fa3d80] [c000123e6180] 0xc000123e6180 (unreliable)
[c00047fa3df0] [c82079e8] __handle_irq_event_percpu+0x98/0x270
[c00047fa3ea0] [c8207d18] handle_irq_event+0x98/0x188
[c00047fa3ef0] [c820f564] handle_fasteoi_irq+0xc4/0x310
[c00047fa3f40] [c8205c60] generic_handle_irq+0x50/0x80
[c00047fa3f60] [c8015c40] __do_irq+0x70/0x1a0
[c00047fa3f90] [c8016d7c] __do_IRQ+0x9c/0x130
[c00014622f60] [2000] 0x2000
[c00014622ff0] [c8016e50] do_IRQ+0x40/0xa0
[c00014623020] [c8017044] replay_soft_interrupts+0x194/0x2f0
[c00014623210] [c80172a8] arch_local_irq_restore+0x108/0x170
[c00014623240] [c8eb1008] _raw_spin_unlock_irqrestore+0x58/0xb0
[c00014623270] [c820b12c] __setup_irq+0x49c/0x9f0
[c00014623310] [c820b7c0] request_threaded_irq+0x140/0x230
[c00014623380] [c00803212a50] ibmvfc_register_scsi_channel+0x1e8/0x2f0 
[ibmvfc]
[c00014623450] [c00803213d1c] ibmvfc_init_sub_crqs+0xc4/0x1f0 [ibmvfc]
[c000146234d0] [c008032145a8] ibmvfc_reset_crq+0x150/0x210 [ibmvfc]
[c00014623550] [c008032147c8] ibmvfc_init_crq+0x160/0x280 [ibmvfc]
[c000146235f0] [c0080321a9cc] ibmvfc_probe+0x2a4/0x530 [ibmvfc]

Fixes: 3034ebe263897 ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI 
Sub-CRQ Channels")
Cc: sta...@vger.kernel.org
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 3 ++-
 drivers/scsi/ibmvscsi/ibmvfc.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d0eab5700dc5..9fc0ffda6ae8 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5682,6 +5682,8 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
queue->cur = 0;
queue->fmt = fmt;
queue->size = PAGE_SIZE / fmt_size;
+
+   queue->vhost = vhost;
return 0;
 }
 
@@ -5790,7 +5792,6 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
}
 
scrq->hwq_id = index;
-   scrq->vhost = vhost;
 
LEAVE;
return 0;
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 3718406e0988..c39a245f43d0 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -789,6 +789,7 @@ struct ibmvfc_queue {
spinlock_t _lock;
spinlock_t *q_lock;
 
+   struct ibmvfc_host *vhost;
struct ibmvfc_event_pool evt_pool;
struct list_head sent;
struct list_head free;
@@ -797,7 +798,6 @@ struct ibmvfc_queue {
union ibmvfc_iu cancel_rsp;
 
/* Sub-CRQ fields */
-   struct ibmvfc_host *vhost;
unsigned long cookie;
unsigned long vios_cookie;
unsigned long hw_irq;
-- 
2.35.3



[PATCH] ibmvfc: multiqueue bug fixes

2022-06-16 Thread Tyrel Datwyler
Fixes for a couple observed crashes of the ibmvfc driver when in MQ mode.

Tyrel Datwyler (2):
  ibmvfc: store vhost pointer during subcrq allocation
  ibmvfc: alloc/free queue resource only during probe/remove

 drivers/scsi/ibmvscsi/ibmvfc.c | 82 ++
 drivers/scsi/ibmvscsi/ibmvfc.h |  2 +-
 2 files changed, 65 insertions(+), 19 deletions(-)

-- 
2.35.3



Re: [PATCH v2 0/4] pseries-wdt: initial support for H_WATCHDOG-based watchdog timers

2022-06-16 Thread Tyrel Datwyler
On 6/15/22 18:43, Daniel Henrique Barboza wrote:
> Hi,
> 
> I tried this series out with mainline QEMU built with Alexey's patch [1]
> and I wasn't able to get it to work. I'm using a simple QEMU command line
> booting a fedora36 guest in a Power9 boston host:

I would assume the H_WATCHDOG hypercall is not implemented by the qemu pseries
machine type. It is a very new hypercall.

-Tyrel

> 
> sudo  ./qemu-system-ppc64 \
> -M
> pseries,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off,ic-mode=dual
> \
> -m 4G -accel kvm -cpu POWER9 -smp 1,maxcpus=1,threads=1,cores=1,sockets=1 \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
> -drive
> file=/home/danielhb/fedora36.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
> \
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
> \
> -device qemu-xhci,id=usb,bus=pci.0,addr=0x4 -nographic -display none
> 
> 
> Guest is running v5.19-rc2 with this series applied. Kernel config consists of
> 'pseries_le_defconfig' plus the following 'watchdog' related changes:
> 
> [root@fedora ~]# cat linux/.config | grep PSERIES_WDT
> CONFIG_PSERIES_WDT=y
> 
> [root@fedora ~]# cat linux/.config | grep -i watchdog
> CONFIG_PPC_WATCHDOG=y
> CONFIG_HAVE_NMI_WATCHDOG=y
> CONFIG_WATCHDOG=y
> CONFIG_WATCHDOG_CORE=y
> CONFIG_WATCHDOG_NOWAYOUT=y
> CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
> CONFIG_WATCHDOG_OPEN_TIMEOUT=0
> # CONFIG_WATCHDOG_SYSFS is not set
> # CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT is not set
> # Watchdog Pretimeout Governors
> # CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set
> # Watchdog Device Drivers
> # CONFIG_SOFT_WATCHDOG is not set
> # CONFIG_XILINX_WATCHDOG is not set
> # CONFIG_ZIIRAVE_WATCHDOG is not set
> # CONFIG_CADENCE_WATCHDOG is not set
> # CONFIG_DW_WATCHDOG is not set
> # CONFIG_MAX63XX_WATCHDOG is not set
> CONFIG_WATCHDOG_RTAS=y
> # PCI-based Watchdog Cards
> # CONFIG_PCIPCWATCHDOG is not set
> # USB-based Watchdog Cards
> # CONFIG_USBPCWATCHDOG is not set
> # CONFIG_WQ_WATCHDOG is not set
> [root@fedora ~]#
> 
> 
> 
> Kernel command line:
> 
> [root@fedora ~]# cat /proc/cmdline
> BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-5.19.0-rc2-00054-g12ede8ffb103 \
> root=/dev/mapper/fedora_fedora-root ro rd.lvm.lv=fedora_fedora/root \
> pseries-wdt.timeout=60 pseries-wdt.nowayout=1 pseries-wdt.action=2
> 
> 
> With all that, executing
> 
> echo V > /dev/watchdog0
> 
> Does nothing. dmesg is clean and the guest doesn't reboot after the 60 sec
> timeout.  I also tried with PSERIES_WDT being compiled as a module instead
> of built-in. Same results.
> 
> 
> What am I missing?
> 
> 
> [1]
> https://patchwork.ozlabs.org/project/qemu-ppc/patch/20220608030153.1862335-1-...@ozlabs.ru/
> 
> 
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> 
> 
> On 6/2/22 14:53, Scott Cheloha wrote:
>> PAPR v2.12 defines a new hypercall, H_WATCHDOG.  This patch series
>> adds support for this hypercall to powerpc/pseries kernels and
>> introduces a new watchdog driver, "pseries-wdt", for the virtual
>> timers exposed by the hypercall.
>>
>> This series is preceded by the following:
>>
>> RFC v1:
>> https://lore.kernel.org/linux-watchdog/20220413165104.179144-1-chel...@linux.ibm.com/
>>
>> RFC v2:
>> https://lore.kernel.org/linux-watchdog/20220509174357.5448-1-chel...@linux.ibm.com/
>>
>> PATCH v1:
>> https://lore.kernel.org/linux-watchdog/20220520183552.33426-1-chel...@linux.ibm.com/
>>
>>
>> Changes of note from PATCH v1:
>>
>> - Trim down the large comment documenting the H_WATCHDOG hypercall.
>>    The comment is likely to rot, so remove anything we aren't using
>>    and anything overly obvious.
>>
>> - Remove any preprocessor definitions not actually used in the module
>>    right now.  If we want to use other features offered by the hypercall
>>    we can add them in later.  They're just clutter until then.
>>
>> - Simplify the "action" module parameter.  The value is now an index
>>    into an array of possible timeoutAction values.  This design removes
>>    the need for the custom get/set methods used in PATCH v1.
>>
>>    Now we merely need to check that the "action" value is a valid
>>    index during pseries_wdt_probe().  Easy.
>>
>> - Make the timeoutAction a member of pseries_wdt, "action".  This
>>    eliminates the use of a global variable during pseries_wdt_start().
>>
>> - Use watchdog_init_timeout() idiomatically.  Check its return value
>>    and error out of pseries_wdt_probe() if it fails.
>>
>>



Re: [PATCH] powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address

2022-06-14 Thread Tyrel Datwyler
On 6/14/22 06:49, Andrew Donnellan wrote:
> Add a special case to block_rtas_call() to allow the ibm,platform-dump RTAS
> call through the RTAS filter if the buffer address is 0.
> 
> According to PAPR, ibm,platform-dump is called with a null buffer address
> to notify the platform firmware that processing of a particular dump is
> finished.
> 
> Without this, on a pseries machine with CONFIG_PPC_RTAS_FILTER enabled, an
> application such as rtas_errd that is attempting to retrieve a dump will
> encounter an error at the end of the retrieval process.
> 
> Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
> Cc: sta...@vger.kernel.org
> Reported-by: Sathvika Vasireddy 
> Signed-off-by: Andrew Donnellan 

Similar to what is done for ibm,configure-connector with idx_buf2 and a NULL
address.

Reviewed-by: Tyrel Datwyler 

> ---
>  arch/powerpc/kernel/rtas.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index a6fce3106e02..693133972294 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1071,7 +1071,7 @@ static struct rtas_filter rtas_filters[] 
> __ro_after_init = {
>   { "get-time-of-day", -1, -1, -1, -1, -1 },
>   { "ibm,get-vpd", -1, 0, -1, 1, 2 },
>   { "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
> - { "ibm,platform-dump", -1, 4, 5, -1, -1 },
> + { "ibm,platform-dump", -1, 4, 5, -1, -1 },  /* Special 
> cased */
>   { "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
>   { "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
>   { "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
> @@ -1120,6 +1120,15 @@ static bool block_rtas_call(int token, int nargs,
>   size = 1;
> 
>   end = base + size - 1;
> +
> + /*
> +  * Special case for ibm,platform-dump - NULL buffer
> +  * address is used to indicate end of dump processing
> +  */
> + if (!strcmp(f->name, "ibm,platform-dump") &&
> + base == 0)
> + return false;
> +
>   if (!in_rmo_buf(base, end))
>   goto err;
>   }



Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain

2022-06-09 Thread Tyrel Datwyler
On 6/9/22 13:21, Guilherme G. Piccoli wrote:
> First of all, thanks for looping me Bjorn! Much appreciated.
> I'm also CCing Ben and Gavin, that know a lot of PPC PCI stuff.
> 
> 
> On 09/06/2022 16:34, Bjorn Helgaas wrote:
>> [...]
>>> Upgrading powerpc kernels from LTS 4.4 version (which does not
>>> contain mentioned commit) to new LTS versions brings a
>>> regression in domain assignment.
>>
>> Can you elaborate why it is a regression ?
>> 63a72284b159 That commit says 'no functionnal changes', I'm
>> having hard time understanding how a nochange can be a
>> regression.
>
> It is not 'no functional change'. That commit completely changed
> PCI domain assignment in a way that is incompatible with other
> architectures and also incompatible with the way how it was done
> prior that commit.

 I agree that the "no functional change" statement is incorrect.
 However, for most powerpc platforms it ended up being simply a
 cosmetic behavior change. As far as I can tell there is nothing
 requiring domain ids to increase montonically from zero or that
 each architecture is required to use the same domain numbering
 scheme.
> 
> Strongly agree here in both points: first, this was not a "no functional
> change" thing, and I apologize for adding this in the commit message.
> What I meant is that: despite changing the numbering, (as Tyrel said)
> nothing should require increasing monotonic mutable PCI domains. At
> least, I'm not aware of such requirement in any spec or even in the
> kernel and adjacent tooling.
> 
> 
>>> [...]
 We could properly limit it to powernv and pseries by using
 ibm,fw-phb-id instead of reg property in the look up that follows
 a failed ibm,opal-phbid lookup. I think this is acceptable as long
 as no other powerpc platforms have started using this behavior for
 persistent naming.
>>>
>>> And what about setting that new config option to enabled by default
>>> for those series?
> 
> I don't remember all the details about PPC dt, but it should already be
> restricted to pseries/powernv, right? At least, the commit has a comment:
> 
> /* If not pseries nor powernv, or if fixed PHB numbering tried to add
>  * the same PHB number twice, then fallback to dynamic PHB numbering.*/
> 
> If this is *not* restricted to these 2 platforms, I agree with Pali's
> approach, although I'd consider the correct is to keep the persistent
> domain scheme for both pnv and pseries, as it's working like this for 5
> years and counting, and this *does* prevent a lot of issues with PCI
> hotplugging in PPC servers.

I mentioned this in a previous post, but it is clear the Author's intent was for
this only to apply to powernv and pseries platforms. However, it only really
checks for powernv, and if that fails it does a read the reg property for the
domain which works for and PPC platform. If we really only want this on powernv
and pseries and revert all other PPC platforms back we can fix this with a
pseries check instead of a config property. Using ibm,fw-phb-id instead of reg
property if ibm,opal-phbid lookup fails does the trick.

-Tyrel

> 
> 
>>> [...]
 I forgot to ask before about the actual regression here.  The commit
 log says domain numbers are different, but I don't know the connection
 from there to something failing.  I assume there's some script or
 config file that depends on specific domain numbers?  And that
 dependency is (hopefully) powerpc-specific?
>>>
>>> You assume correct. For example this is the way how OpenWRT handles PCI
>>> devices (but not only OpenWRT). This OpenWRT case is not
>>> powerpc-specific but generic to all architectures. This is just one
>>> example.
>>
>> So basically everybody uses D/b/d/f for persistent names.  That's ...
>> well, somewhat stable for things soldered down or in a motherboard
>> slot, but a terrible idea for things that can be hot-plugged.
>>
>> Even for more core things, it's possible for firmware to change bus
>> numbering between boots.  For example, if a complicated hierarchy is
>> cold-plugged into one slot, firmware is likely to assign different bus
>> numbers on the next boot to make room for it.  Obviously this can also
>> happen as a hot-add, and Linux needs the flexibility to do similar
>> renumbering then, although we don't support it yet.
>>
>> It looks like 63a72284b159 was intended to make domain numbers *more*
>> consistent, so it's ironic that this actually broke something by
>> changing domain numbers.  Maybe there's a way to limit the scope of
>> 63a72284b159 so it avoids the breakage.  I don't know enough about the
>> powerpc landscape to even guess at how.
> 
> I don't considereit breaks the userspace since this is definitely no
> stable ABI (or if it is, I'm not aware and my apologies). If scripts
> rely on 

Re: [PATCH v2 2/4] of: dynamic: add of_property_alloc() and of_property_free()

2022-06-02 Thread Tyrel Datwyler
On 6/1/22 23:58, Clément Léger wrote:
> Le Wed, 1 Jun 2022 15:32:29 -0700,
> Tyrel Datwyler  a écrit :
> 
>>>  /**
>>> - * __of_prop_dup - Copy a property dynamically.
>>> - * @prop:  Property to copy
>>> + * of_property_free - Free a property allocated dynamically.
>>> + * @prop:  Property to be freed
>>> + */
>>> +void of_property_free(const struct property *prop)
>>> +{
>>> +   if (!of_property_check_flag(prop, OF_DYNAMIC))
>>> +   return;
>>> +  
>>
>> This looks wrong to me. From what I understand the value data is allocated as
>> trailing memory that is part of the property allocation itself. (ie. prop =
>> kzalloc(sizeof(*prop) + len, allocflags)). So, kfree(prop) should also take 
>> care
>> of the trailing value data. Calling kfree(prop->value) is bogus since
>> prop->value wasn't dynamically allocated on its own.
> 
> kfree(prop->value) is only called if the value is not the trailing data
> of the property so I don't see what is wrong there. In that case, only
> kfree(prop) is called.

Right, Rob clarified for me in the v1 patch.

> 
>>
>> Also, this condition will always fail. You explicitly set prop->value = prop 
>> + 1
>> in alloc.
> 
> The user that did allocated the property might want to provide its own
> "value". In that case, prop->value would be overwritten by the user
> allocated value and thus the check would be true, hence calling
> kfree(prop->value).

So, that was the part I was missing. I think a comment would be helpful so its
clear value can be either trailing or user assigned.

-Tyrel

> 
>>
>> Maybe I need to go back and look at v1 again.
>>
>> -Tyrel
>>
>>> +   if (prop->value != prop + 1)
>>> +   kfree(prop->value);
>>> +
>>> +   kfree(prop->name);
>>> +   kfree(prop);
>>> +}
>>> +EXPORT_SYMBOL(of_property_free);
>>> +
> 
> 



Re: [PATCH 1/3] of: dynamic: add of_property_alloc() and of_property_free()

2022-06-02 Thread Tyrel Datwyler
On 6/2/22 07:06, Rob Herring wrote:
> On Wed, Jun 1, 2022 at 5:31 PM Tyrel Datwyler  wrote:
>>
>> On 5/5/22 12:37, Rob Herring wrote:
>>> On Wed, May 04, 2022 at 05:40:31PM +0200, Clément Léger wrote:
>>>> Add function which allows to dynamically allocate and free properties.
>>>> Use this function internally for all code that used the same logic
>>>> (mainly __of_prop_dup()).
>>>>
>>>> Signed-off-by: Clément Léger 
>>>> ---
>>>>  drivers/of/dynamic.c | 101 ++-
>>>>  include/linux/of.h   |  16 +++
>>>>  2 files changed, 88 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>>>> index cd3821a6444f..e8700e509d2e 100644
>>>> --- a/drivers/of/dynamic.c
>>>> +++ b/drivers/of/dynamic.c
>>>> @@ -313,9 +313,7 @@ static void property_list_free(struct property 
>>>> *prop_list)
>>>>
>>>>  for (prop = prop_list; prop != NULL; prop = next) {
>>>>  next = prop->next;
>>>> -kfree(prop->name);
>>>> -kfree(prop->value);
>>>> -kfree(prop);
>>>> +of_property_free(prop);
>>>>  }
>>>>  }
>>>>
>>>> @@ -367,48 +365,95 @@ void of_node_release(struct kobject *kobj)
>>>>  }
>>>>
>>>>  /**
>>>> - * __of_prop_dup - Copy a property dynamically.
>>>> - * @prop:   Property to copy
>>>> + * of_property_free - Free a property allocated dynamically.
>>>> + * @prop:   Property to be freed
>>>> + */
>>>> +void of_property_free(const struct property *prop)
>>>> +{
>>>> +kfree(prop->value);
>>>> +kfree(prop->name);
>>>> +kfree(prop);
>>>> +}
>>>> +EXPORT_SYMBOL(of_property_free);
>>>> +
>>>> +/**
>>>> + * of_property_alloc - Allocate a property dynamically.
>>>> + * @name:   Name of the new property
>>>> + * @value:  Value that will be copied into the new property value
>>>> + * @value_len:  length of @value to be copied into the new property 
>>>> value
>>>> + * @len:Length of new property value, must be greater than @value_len
>>>
>>> What's the usecase for the lengths being different? That doesn't seem
>>> like a common case, so perhaps handle it with a NULL value and
>>> non-zero length. Then the caller has to deal with populating
>>> prop->value.
>>>
>>>>   * @allocflags: Allocation flags (typically pass GFP_KERNEL)
>>>>   *
>>>> - * Copy a property by dynamically allocating the memory of both the
>>>> + * Create a property by dynamically allocating the memory of both the
>>>>   * property structure and the property name & contents. The property's
>>>>   * flags have the OF_DYNAMIC bit set so that we can differentiate between
>>>>   * dynamically allocated properties and not.
>>>>   *
>>>>   * Return: The newly allocated property or NULL on out of memory error.
>>>>   */
>>>> -struct property *__of_prop_dup(const struct property *prop, gfp_t 
>>>> allocflags)
>>>> +struct property *of_property_alloc(const char *name, const void *value,
>>>> +   int value_len, int len, gfp_t allocflags)
>>>>  {
>>>> -struct property *new;
>>>> +int alloc_len = len;
>>>> +struct property *prop;
>>>> +
>>>> +if (len < value_len)
>>>> +return NULL;
>>>>
>>>> -new = kzalloc(sizeof(*new), allocflags);
>>>> -if (!new)
>>>> +prop = kzalloc(sizeof(*prop), allocflags);
>>>> +if (!prop)
>>>>  return NULL;
>>>>
>>>> +prop->name = kstrdup(name, allocflags);
>>>> +if (!prop->name)
>>>> +goto out_err;
>>>> +
>>>>  /*
>>>> - * NOTE: There is no check for zero length value.
>>>> - * In case of a boolean property, this will allocate a value
>>>> - * of zero bytes. We do this to work around the use
>>>> - * of of_get_property() calls on boolean values.
>>>> + * Even if the property has no value, it must be set to a
>>>> + * non-null value since of_get_property() is used to check
>>>> + * some values that might or not have a values (ranges for
>>>> + * instance). Moreover, when the node is released, prop->value
>>>> + * is kfreed so the memory must come from kmalloc.
>>>
>>> Allowing for NULL value didn't turn out well...
>>>
>>> We know that we can do the kfree because OF_DYNAMIC is set IIRC...
>>>
>>> If we do 1 allocation for prop and value, then we can test
>>> for "prop->value == prop + 1" to determine if we need to free or not.
>>
>> If its a single allocation do we even need a test? Doesn't kfree(prop) take 
>> care
>> of the property and the trailing memory allocated for the value?
> 
> Yes, it does when it's a single alloc, but it's testing for when
> prop->value is not a single allocation because we could have either.
> 

Ok, that is the part I was missing. Thanks for the clarification.

-Tyrel



Re: [PATCH v2 2/4] of: dynamic: add of_property_alloc() and of_property_free()

2022-06-01 Thread Tyrel Datwyler
On 6/1/22 01:17, Clément Léger wrote:
> Add function which allows to dynamically allocate and free properties.
> Use this function internally for all code that used the same logic
> (mainly __of_prop_dup()).
> 
> Signed-off-by: Clément Léger 
> ---
>  drivers/of/dynamic.c| 82 -
>  drivers/of/of_private.h | 21 ++-
>  include/linux/of.h  | 14 +++
>  3 files changed, 82 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index cd3821a6444f..c0dcbea31d28 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -313,9 +313,7 @@ static void property_list_free(struct property *prop_list)
> 
>   for (prop = prop_list; prop != NULL; prop = next) {
>   next = prop->next;
> - kfree(prop->name);
> - kfree(prop->value);
> - kfree(prop);
> + of_property_free(prop);
>   }
>  }
> 
> @@ -367,48 +365,66 @@ void of_node_release(struct kobject *kobj)
>  }
> 
>  /**
> - * __of_prop_dup - Copy a property dynamically.
> - * @prop:Property to copy
> + * of_property_free - Free a property allocated dynamically.
> + * @prop:Property to be freed
> + */
> +void of_property_free(const struct property *prop)
> +{
> + if (!of_property_check_flag(prop, OF_DYNAMIC))
> + return;
> +

This looks wrong to me. From what I understand the value data is allocated as
trailing memory that is part of the property allocation itself. (ie. prop =
kzalloc(sizeof(*prop) + len, allocflags)). So, kfree(prop) should also take care
of the trailing value data. Calling kfree(prop->value) is bogus since
prop->value wasn't dynamically allocated on its own.

Also, this condition will always fail. You explicitly set prop->value = prop + 1
in alloc.

Maybe I need to go back and look at v1 again.

-Tyrel

> + if (prop->value != prop + 1)
> + kfree(prop->value);
> +
> + kfree(prop->name);
> + kfree(prop);
> +}
> +EXPORT_SYMBOL(of_property_free);
> +
> +/**
> + * of_property_alloc - Allocate a property dynamically.
> + * @name:Name of the new property
> + * @value:   Value that will be copied into the new property value or NULL
> + *   if only @len allocation is needed.
> + * @len: Length of new property value and if @value is provided, the
> + *   length of the value to be copied
>   * @allocflags:  Allocation flags (typically pass GFP_KERNEL)
>   *
> - * Copy a property by dynamically allocating the memory of both the
> + * Create a property by dynamically allocating the memory of both the
>   * property structure and the property name & contents. The property's
>   * flags have the OF_DYNAMIC bit set so that we can differentiate between
>   * dynamically allocated properties and not.
>   *
>   * Return: The newly allocated property or NULL on out of memory error.
>   */
> -struct property *__of_prop_dup(const struct property *prop, gfp_t allocflags)
> +struct property *of_property_alloc(const char *name, const void *value,
> +size_t len, gfp_t allocflags)
>  {
> - struct property *new;
> + struct property *prop;
> 
> - new = kzalloc(sizeof(*new), allocflags);
> - if (!new)
> + prop = kzalloc(sizeof(*prop) + len, allocflags);
> + if (!prop)
>   return NULL;
> 
> - /*
> -  * NOTE: There is no check for zero length value.
> -  * In case of a boolean property, this will allocate a value
> -  * of zero bytes. We do this to work around the use
> -  * of of_get_property() calls on boolean values.
> -  */
> - new->name = kstrdup(prop->name, allocflags);
> - new->value = kmemdup(prop->value, prop->length, allocflags);
> - new->length = prop->length;
> - if (!new->name || !new->value)
> - goto err_free;
> -
> - /* mark the property as dynamic */
> - of_property_set_flag(new, OF_DYNAMIC);
> -
> - return new;
> -
> - err_free:
> - kfree(new->name);
> - kfree(new->value);
> - kfree(new);
> + prop->name = kstrdup(name, allocflags);
> + if (!prop->name)
> + goto out_err;
> +
> + prop->value = prop + 1;
> + if (value)
> + memcpy(prop->value, value, len);
> +
> + prop->length = len;
> + of_property_set_flag(prop, OF_DYNAMIC);
> +
> + return prop;
> +
> +out_err:
> + of_property_free(prop);
> +
>   return NULL;
>  }
> +EXPORT_SYMBOL(of_property_alloc);
> 
>  /**
>   * __of_node_dup() - Duplicate or create an empty device node dynamically.
> @@ -447,9 +463,7 @@ struct device_node *__of_node_dup(const struct 
> device_node *np,
>   if (!new_pp)
>   goto err_prop;
>   if (__of_add_property(node, new_pp)) {
> - kfree(new_pp->name);
> - kfree(new_pp->value);
> - kfree(new_pp);
> +  

Re: [PATCH 1/3] of: dynamic: add of_property_alloc() and of_property_free()

2022-06-01 Thread Tyrel Datwyler
On 5/5/22 12:37, Rob Herring wrote:
> On Wed, May 04, 2022 at 05:40:31PM +0200, Clément Léger wrote:
>> Add function which allows to dynamically allocate and free properties.
>> Use this function internally for all code that used the same logic
>> (mainly __of_prop_dup()).
>>
>> Signed-off-by: Clément Léger 
>> ---
>>  drivers/of/dynamic.c | 101 ++-
>>  include/linux/of.h   |  16 +++
>>  2 files changed, 88 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index cd3821a6444f..e8700e509d2e 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -313,9 +313,7 @@ static void property_list_free(struct property 
>> *prop_list)
>>  
>>  for (prop = prop_list; prop != NULL; prop = next) {
>>  next = prop->next;
>> -kfree(prop->name);
>> -kfree(prop->value);
>> -kfree(prop);
>> +of_property_free(prop);
>>  }
>>  }
>>  
>> @@ -367,48 +365,95 @@ void of_node_release(struct kobject *kobj)
>>  }
>>  
>>  /**
>> - * __of_prop_dup - Copy a property dynamically.
>> - * @prop:   Property to copy
>> + * of_property_free - Free a property allocated dynamically.
>> + * @prop:   Property to be freed
>> + */
>> +void of_property_free(const struct property *prop)
>> +{
>> +kfree(prop->value);
>> +kfree(prop->name);
>> +kfree(prop);
>> +}
>> +EXPORT_SYMBOL(of_property_free);
>> +
>> +/**
>> + * of_property_alloc - Allocate a property dynamically.
>> + * @name:   Name of the new property
>> + * @value:  Value that will be copied into the new property value
>> + * @value_len:  length of @value to be copied into the new property 
>> value
>> + * @len:Length of new property value, must be greater than @value_len
> 
> What's the usecase for the lengths being different? That doesn't seem 
> like a common case, so perhaps handle it with a NULL value and 
> non-zero length. Then the caller has to deal with populating 
> prop->value.
> 
>>   * @allocflags: Allocation flags (typically pass GFP_KERNEL)
>>   *
>> - * Copy a property by dynamically allocating the memory of both the
>> + * Create a property by dynamically allocating the memory of both the
>>   * property structure and the property name & contents. The property's
>>   * flags have the OF_DYNAMIC bit set so that we can differentiate between
>>   * dynamically allocated properties and not.
>>   *
>>   * Return: The newly allocated property or NULL on out of memory error.
>>   */
>> -struct property *__of_prop_dup(const struct property *prop, gfp_t 
>> allocflags)
>> +struct property *of_property_alloc(const char *name, const void *value,
>> +   int value_len, int len, gfp_t allocflags)
>>  {
>> -struct property *new;
>> +int alloc_len = len;
>> +struct property *prop;
>> +
>> +if (len < value_len)
>> +return NULL;
>>  
>> -new = kzalloc(sizeof(*new), allocflags);
>> -if (!new)
>> +prop = kzalloc(sizeof(*prop), allocflags);
>> +if (!prop)
>>  return NULL;
>>  
>> +prop->name = kstrdup(name, allocflags);
>> +if (!prop->name)
>> +goto out_err;
>> +
>>  /*
>> - * NOTE: There is no check for zero length value.
>> - * In case of a boolean property, this will allocate a value
>> - * of zero bytes. We do this to work around the use
>> - * of of_get_property() calls on boolean values.
>> + * Even if the property has no value, it must be set to a
>> + * non-null value since of_get_property() is used to check
>> + * some values that might or not have a values (ranges for
>> + * instance). Moreover, when the node is released, prop->value
>> + * is kfreed so the memory must come from kmalloc.
> 
> Allowing for NULL value didn't turn out well...
> 
> We know that we can do the kfree because OF_DYNAMIC is set IIRC...
> 
> If we do 1 allocation for prop and value, then we can test 
> for "prop->value == prop + 1" to determine if we need to free or not.

If its a single allocation do we even need a test? Doesn't kfree(prop) take care
of the property and the trailing memory allocated for the value?

-Tyrel

> 
>>   */
>> -new->name = kstrdup(prop->name, allocflags);
>> -new->value = kmemdup(prop->value, prop->length, allocflags);
>> -new->length = prop->length;
>> -if (!new->name || !new->value)
>> -goto err_free;
>> +if (!alloc_len)
>> +alloc_len = 1;
>>  
>> -/* mark the property as dynamic */
>> -of_property_set_flag(new, OF_DYNAMIC);
>> +prop->value = kzalloc(alloc_len, allocflags);
>> +if (!prop->value)
>> +goto out_err;
>>  
>> -return new;
>> +if (value)
>> +memcpy(prop->value, value, value_len);
>> +
>> +prop->length = len;
>> +of_property_set_flag(prop, OF_DYNAMIC);
>> +
>> +return prop;
>> +
>> +out_err:
>> +of_property_free(prop);
>>  
>>

Re: [PATCH] powerpc/eeh: Drop redundant spinlock initialization

2022-05-10 Thread Tyrel Datwyler
On 5/10/22 02:53, Haowen Bai wrote:
> slot_errbuf_lock has declared and initialized by DEFINE_SPINLOCK,
> so we don't need to spin_lock_init again, drop it.
> 
> Signed-off-by: Haowen Bai 
> ---
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
> b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index f9af879c0222..77b476093e06 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -848,7 +848,6 @@ static int __init eeh_pseries_init(void)
>   }
> 
>   /* Initialize error log lock and size */

Update the comment, or just drop it entirely?

-Tyrel

> - spin_lock_init(&slot_errbuf_lock);
>   eeh_error_buf_size = rtas_token("rtas-error-log-max");
>   if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
>   pr_info("%s: unknown EEH error log size\n",



Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain

2022-05-05 Thread Tyrel Datwyler
On 5/5/22 02:31, Pali Rohár wrote:
> Hello!
> 
> On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote:
>> Le 04/05/2022 à 19:57, Pali Rohár a écrit :
>>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on
>>> device-tree properties"), powerpc kernel always fallback to PCI domain
>>> assignment from OF / Device Tree 'reg' property of the PCI controller.
>>>
>>> PCI code for other Linux architectures use increasing assignment of the PCI
>>> domain for individual controllers (assign the first free number), like it
>>> was also for powerpc prior mentioned commit.
>>>
>>> Upgrading powerpc kernels from LTS 4.4 version (which does not contain
>>> mentioned commit) to new LTS versions brings a regression in domain
>>> assignment.
>>
>> Can you elaborate why it is a regression ?
>>63a72284b159
>> That commit says 'no functionnal changes', I'm having hard time 
>> understanding how a nochange can be a regression.
> 
> It is not 'no functional change'. That commit completely changed PCI
> domain assignment in a way that is incompatible with other architectures
> and also incompatible with the way how it was done prior that commit.

I agree that the "no functional change" statement is incorrect. However, for
most powerpc platforms it ended up being simply a cosmetic behavior change. As
far as I can tell there is nothing requiring domain ids to increase montonically
from zero or that each architecture is required to use the same domain numbering
scheme. Its hard to call this a true regression unless it actually broke
something. The commit in question has been in the kernel since 4.8 which was
released over 5 1/2 years ago.

With all that said looking closer at the code in question I think it is fair to
assume that the author only intended this change for powernv and pseries
platforms and not every powerpc platform. That change was done to make
persistent naming easier to manage in userspace. Your change defaults back to
the old behavior which will now break both powernv and pseries platforms with
regard to hotplugging and persistent naming.

We could properly limit it to powernv and pseries by using ibm,fw-phb-id instead
of reg property in the look up that follows a failed ibm,opal-phbid lookup. I
think this is acceptable as long as no other powerpc platforms have started
using this behavior for persistent naming.

-Tyrel

> For example, prior that commit on P2020 RDB board were PCI domains 0, 1 and 2.
> 
> $ lspci
> :00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> :01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 
> xHCI Host Controller (rev 02)
> 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network 
> Adapter (rev 01)
> 0002:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 0002:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac 
> Wireless Network Adapter
> 
> After that commit on P2020 RDB board are PCI domains 0x8000, 0x9000 and 
> 0xa000.
> 
> $ lspci
> 8000:00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 8000:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 
> xHCI Host Controller (rev 02)
> 9000:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> 9000:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network 
> Adapter (rev 01)
> a000:04:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21)
> a000:05:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac 
> Wireless Network Adapter
> 
> It is somehow strange that PCI domains are not indexed one by one and
> also that there is no domain 0
> 
> With my patch when CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG is not set, then
> previous behavior used and PCI domains are again 0, 1 and 2.
> 
>> Usually we don't commit regressions to mainline ...
>>
>>
>>>
>>> Fix this issue by introducing a new option CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG
>>> When this options is disabled then powerpc kernel would assign PCI domains
>>> in the similar way like it is doing kernel for other architectures and also
>>> how it was done prior that commit.
>>
>> You don't define CONFIG_PPC_PCI_DOMAIN_FROM_OF_REG on by default, it 
>> means this commit will change the behaviour. Is that expected ?
>>
>> Is that really worth a user selectable option ? Is the user able to 
>> decide what he needs ?
> 
> Well, I hope that maintainers of that code answer to these questions.
> 
> In any case, I think that it could be a user selectable option as in
> that commit is explained that in some situation is makes sense to do
> PCI domain numbering based on DT reg.
> 
> But as I pointed above, upgrading from 4.4 TLS kernel to some new TLS
> kernel brings above regression, so I think that there should be a way to
> disable this behavior.
> 
> In my opinion, for people who are upgrading from 4.4 TLS kernel, this
> option should be turned off by def

Re: [PATCH] powerpc/pci: Remove useless null check before call of_node_put()

2022-04-25 Thread Tyrel Datwyler
On 4/23/22 07:32, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> On 4/20/22 19:52, Haowen Bai wrote:
>>> No need to add null check before call of_node_put(), since the
>>> implementation of of_node_put() has done it.
>>>
>>> Signed-off-by: Haowen Bai 
>>> ---
>>>  arch/powerpc/kernel/pci_dn.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>> index 61571ae23953..ba3bbc9bec2d 100644
>>> --- a/arch/powerpc/kernel/pci_dn.c
>>> +++ b/arch/powerpc/kernel/pci_dn.c
>>> @@ -357,8 +357,8 @@ void pci_remove_device_node_info(struct device_node *dn)
>>>
>>> /* Drop the parent pci_dn's ref to our backing dt node */
>>> parent = of_get_parent(dn);
>>> -   if (parent)
>>> -   of_node_put(parent);
>>> +
>>> +   of_node_put(parent);
>>
>> This whole block of code looks useless, or suspect. Examining the rest of the
>> code for this function this is the only place that parent is referenced. The
>> of_get_parent() call returns the parent with its refcount incremented, and 
>> then
>> we turn around and call of_node_put() which drops that reference we just 
>> took.
>> The comment doesn't do what it says it does. If we really need to drop a
>> previous reference to the parent device node this code block would need to 
>> call
>> of_node_put() twice on parent to accomplish that.
> 
> Yeah good analysis.
> 
> It used to use pdn->parent, which didn't grab  an extra reference, see
> commit 14db3d52d3a2 ("powerpc/eeh: Reduce use of pci_dn::node").
> 
> The old code was:
> 
> if (pdn->parent)
> of_node_put(pdn->parent->node);
> 
>> A closer examination is required to determine if what the comment says we 
>> need
>> to do is required. If it is then the code as it exists today is leaking that
>> reference AFAICS.
> 
> Yeah. This function is only called from pnv_php.c, ie. powernv PCI
> hotplug, which I think gets less testing than pseries hotplug. So
> possibly we are leaking references and haven't noticed, or maybe the
> comment is out of date.

Looks like we leak it. From pci_add_device_node_info() we clearly take a
reference we don't free:

/* Attach to parent node */
INIT_LIST_HEAD(&pdn->child_list);
INIT_LIST_HEAD(&pdn->list);
parent = of_get_parent(dn);
pdn->parent = parent ? PCI_DN(parent) : NULL;
if (pdn->parent)
list_add_tail(&pdn->list, &pdn->parent->child_list);

return pdn;

The question becomes whats the right fix. Doing a double put in the remove path
seems wrong, and looks gross. We no longer store a reference to the parent
device node in pci_dn::parent but instead a reference to the an actual pci_dn
struct. Seems to suggest we can drop the reference taken in
pci_add_device_node_info().

-Tyrel

> 
> cheers



Re: [PATCH] powerpc/pci: Remove useless null check before call of_node_put()

2022-04-22 Thread Tyrel Datwyler
On 4/20/22 19:52, Haowen Bai wrote:
> No need to add null check before call of_node_put(), since the
> implementation of of_node_put() has done it.
> 
> Signed-off-by: Haowen Bai 
> ---
>  arch/powerpc/kernel/pci_dn.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 61571ae23953..ba3bbc9bec2d 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -357,8 +357,8 @@ void pci_remove_device_node_info(struct device_node *dn)
> 
>   /* Drop the parent pci_dn's ref to our backing dt node */
>   parent = of_get_parent(dn);
> - if (parent)
> - of_node_put(parent);
> +
> + of_node_put(parent);

This whole block of code looks useless, or suspect. Examining the rest of the
code for this function this is the only place that parent is referenced. The
of_get_parent() call returns the parent with its refcount incremented, and then
we turn around and call of_node_put() which drops that reference we just took.
The comment doesn't do what it says it does. If we really need to drop a
previous reference to the parent device node this code block would need to call
of_node_put() twice on parent to accomplish that.

A closer examination is required to determine if what the comment says we need
to do is required. If it is then the code as it exists today is leaking that
reference AFAICS.

-Tyrel

> 
>   /*
>* At this point we *might* still have a pci_dev that was



[PATCH] ibmvscsis: increase INITIAL_SRP_LIMIT to 1024

2022-03-22 Thread Tyrel Datwyler
The adapter request_limit is hardcoded to be INITIAL_SRP_LIMIT which is
currently an arbitrary value of 800. Increase this value to 1024 which
better matches the characteristics of the typical IBMi Initiator that
supports 32 LUNs and a queue depth of 32.

This change also has the secondary benefit of being a power of two as
required by the kfifo API. Since, Commit ab9bb6318b09 ("Partially revert
"kfifo: fix kfifo_alloc() and kfifo_init()"") the size of IU pool for
each target has been rounded down to 512 when attempting to kfifo_init()
those pools with the current request_limit size of 800.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c 
b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
index 61f06f6885a5..89b9fbce7488 100644
--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -36,7 +36,7 @@
 
 #define IBMVSCSIS_VERSION  "v0.2"
 
-#defineINITIAL_SRP_LIMIT   800
+#defineINITIAL_SRP_LIMIT   1024
 #defineDEFAULT_MAX_SECTORS 256
 #define MAX_TXU1024 * 1024
 
-- 
2.35.1



Re: [PATCH] powerpc: kernel: fix a refcount leak in format_show()

2022-03-01 Thread Tyrel Datwyler
On 3/1/22 04:55, Michael Ellerman wrote:
> Hangyu Hua  writes:
>> node needs to be dropped when of_property_read_string fails. So an earlier 
>> call
>> to of_node_put is required here.
> 
> That's true but ...
> 
>> diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
>> b/arch/powerpc/kernel/secvar-sysfs.c
>> index a0a78aba2083..cd0fa7028d86 100644
>> --- a/arch/powerpc/kernel/secvar-sysfs.c
>> +++ b/arch/powerpc/kernel/secvar-sysfs.c
>> @@ -30,13 +30,12 @@ static ssize_t format_show(struct kobject *kobj, struct 
>> kobj_attribute *attr,
>>  return -ENODEV;
> 
> There's also a reference leak there ^
> 
> So if you're going to touch this code I'd like you to fix both reference
> leaks in a single patch please.
> 
> Having the error cases set rc and then goto "out" which does the
> of_node_put() is the obvious solution I think.

update_kobj_size() in the same source file provides a good example of the
suggested solution.

-Tyrel

> 
> cheers
> 
>>  rc = of_property_read_string(node, "format", &format);
>> +of_node_put(node);
>>  if (rc)
>>  return rc;
>>  
>>  rc = sprintf(buf, "%s\n", format);
>>  
>> -of_node_put(node);
>> -
>>  return rc;
>>  }
>>  
>> -- 
>> 2.25.1



Re: [PATCH] powerpc/pseries: make pseries_devicetree_update() static

2022-02-08 Thread Tyrel Datwyler
On 2/7/22 2:12 PM, Nathan Lynch wrote:
> pseries_devicetree_update() has only one call site, in the same file in
> which it is defined. Make it static.
> 
> Signed-off-by: Nathan Lynch 
> ---

Reviewed-by: Tyrel Datwyler 


Re: [PATCH v4] powerpc/pseries: read the lpar name from the firmware

2022-01-05 Thread Tyrel Datwyler
On 1/5/22 5:36 PM, Nathan Lynch wrote:
> Tyrel Datwyler  writes:
>> On 1/5/22 3:19 PM, Nathan Lynch wrote:
>>>
>>
>> Is there benefit of adding a partition_name field/value pair to lparcfg? The
>> lparstat utility can just as easily make the get_sysparm call via librtas.
>> Further, rtas_filters allows this particular RTAS call from userspace.
> 
> The RTAS syscall is root-only, but we want the partition name (whether
> supplied by RTAS or the device tree) to be available to unprivileged
> programs.
> 

Ah, right. I recall this discussion now from previous iterations.

-Tyrel


Re: [PATCH v4] powerpc/pseries: read the lpar name from the firmware

2022-01-05 Thread Tyrel Datwyler
On 1/5/22 3:19 PM, Nathan Lynch wrote:
> Laurent Dufour  writes:
>> On 07/12/2021, 18:11:09, Laurent Dufour wrote:
>>> The LPAR name may be changed after the LPAR has been started in the HMC.
>>> In that case lparstat command is not reporting the updated value because it
>>> reads it from the device tree which is read at boot time.
>>>
>>> However this value could be read from RTAS.
>>>
>>> Adding this value in the /proc/powerpc/lparcfg output allows to read the
>>> updated value.
>>
>> Do you consider taking that patch soon?
> 
> This version prints an error on non-PowerVM guests the first time
> lparcfg is read.

I assume because QEMU doesn't implement the LPAR_NAME token for get_sysparm.

> 
> And I still contend that having this function fall back to reporting the
> partition name in the DT would provide a beneficial consistency in the
> user-facing API, allowing programs to avoid hypervisor-specific branches
> in their code. 

Agreed, if the get_sysparm fails just report the lpar-name from the device tree.

I don't understand the resistance I've encountered here.
> The fallback I'm suggesting (a root node property lookup) is certainly
> not more complex than the RTAS call sequence you've already implemented.
> 

Is there benefit of adding a partition_name field/value pair to lparcfg? The
lparstat utility can just as easily make the get_sysparm call via librtas.
Further, rtas_filters allows this particular RTAS call from userspace.

-Tyrel



Re: [PATCH] ethernet: ibmveth: use default_groups in kobj_type

2022-01-05 Thread Tyrel Datwyler
On 1/5/22 10:41 AM, Greg Kroah-Hartman wrote:
> There are currently 2 ways to create a set of sysfs files for a
> kobj_type, through the default_attrs field, and the default_groups
> field.  Move the ibmveth sysfs code to use default_groups
> field which has been the preferred way since aa30f47cf666 ("kobject: Add
> support for default attribute groups to kobj_type") so that we can soon
> get rid of the obsolete default_attrs field.
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Cristobal Forno 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman 
> ---

Reviewed-by: Tyrel Datwyler 



Re: [PATCH] powerpc/cacheinfo: use default_groups in kobj_type

2022-01-04 Thread Tyrel Datwyler
On 1/4/22 7:54 AM, Greg Kroah-Hartman wrote:
> There are currently 2 ways to create a set of sysfs files for a
> kobj_type, through the default_attrs field, and the default_groups
> field.  Move the powerpc cacheinfo sysfs code to use default_groups
> field which has been the preferred way since aa30f47cf666 ("kobject: Add
> support for default attribute groups to kobj_type") so that we can soon
> get rid of the obsolete default_attrs field.
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: "Gautham R. Shenoy" 
> Cc: Parth Shah 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Greg Kroah-Hartman 
> ---

Reviewed-by: Tyrel Datwyler 


Re: [PATCH] tpm: Fix kexec crash due to access to ops NULL pointer (powerpc)

2021-12-20 Thread Tyrel Datwyler
On 12/20/21 5:31 PM, Stefan Berger wrote:
> 
> On 12/20/21 20:13, Jason Gunthorpe wrote:
>> On Mon, Dec 20, 2021 at 08:05:58PM -0500, Stefan Berger wrote:
>>
>>> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
>>> index ddaeceb7e109..4cb908349b31 100644
>>> +++ b/drivers/char/tpm/tpm-chip.c
>>> @@ -473,15 +473,8 @@ static void tpm_del_char_device(struct tpm_chip *chip)
>>>  mutex_unlock(&idr_lock);
>>>
>>>  /* Make the driver uncallable. */
>>> -   down_write(&chip->ops_sem);
>>> -   if (chip->flags & TPM_CHIP_FLAG_TPM2) {
>>> -   if (!tpm_chip_start(chip)) {
>>> -   tpm2_shutdown(chip, TPM2_SU_CLEAR);
>>> -   tpm_chip_stop(chip);
>>> -   }
>>> -   }
>>> -   chip->ops = NULL;
>>> -   up_write(&chip->ops_sem);
>>> +   if (chip->ops)
>> ops cannot be read without locking
> 
> This here could be an alternative:

I still think code de-duplication is a good thing. Maybe combine the two
approaches. Call tpm_class_shutdown from tpm_del_char_device and add a chip->ops
check in tpm_class_shutdown to ensure we only do the shutdown when chip->ops is
valid.

-Tyrel

> 
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb7e109..7772b475ebc0 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -474,7 +474,7 @@ static void tpm_del_char_device(struct tpm_chip *chip)
> 
>     /* Make the driver uncallable. */
>     down_write(&chip->ops_sem);
> -   if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> +   if (chip->ops && chip->flags & TPM_CHIP_FLAG_TPM2) {
>     if (!tpm_chip_start(chip)) {
>     tpm2_shutdown(chip, TPM2_SU_CLEAR);
>     tpm_chip_stop(chip);
> 
>    Stefan
> 
> 
>>
>> Jason



Re: [PATCH] tpm: Fix kexec crash due to access to ops NULL pointer (powerpc)

2021-12-20 Thread Tyrel Datwyler
On 12/20/21 5:05 PM, Stefan Berger wrote:
> 
> On 12/20/21 19:39, Tyrel Datwyler wrote:
>> On 12/11/21 5:28 PM, Stefan Berger wrote:
>>> Fix the following crash on kexec by checking chip->ops for a NULL pointer
>>> in tpm_chip_start() and returning an error code if this is the case.
>>>
>>> BUG: Kernel NULL pointer dereference on read at 0x0060
>>> Faulting instruction address: 0xc099a06c
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> ...
>>> NIP [c099a06c] tpm_chip_start+0x2c/0x140
>>>   LR [c099a808] tpm_chip_unregister+0x108/0x170
>>> Call Trace:
>>> [c000188bfa00] [c2b03930] fw_devlink_strict+0x0/0x8 (unreliable)
>>> [c000188bfa30] [c099a808] tpm_chip_unregister+0x108/0x170
>>> [c000188bfa70] [c09a3874] tpm_ibmvtpm_remove+0x34/0x130
>>> [c000188bfae0] [c0110dbc] vio_bus_remove+0x5c/0xb0
>>> [c000188bfb20] [c09bc154] device_shutdown+0x1d4/0x3a8
>>> [c000188bfbc0] [c0196e14] kernel_restart_prepare+0x54/0x70
>>>
>>> The referenced patch below introduced a function to shut down the VIO bus.
>>> The bus shutdown now calls tpm_del_char_device (via tpm_chip_unregister)
>>> after a call to tpm_class_shutdown, which already set chip->ops to NULL.
>>> The crash occurrs when tpm_del_char_device calls tpm_chip_start with the
>>> chip->ops NULL pointer.
>> It was unclear to me at first, but IIUC the problem is the ibmvtpm device
>> receives two shutdown calls, the first is a class shutdown call for TPM 
>> devices,
>> followed by a bus shutdown call for VIO devices.
>>
>> It appears that the class shutdown routines are meant to be a pre-shutdown
>> routine as they are defined as class->shutdown_pre(), and it is clearly 
>> allowed
>> to call class->shutdown_pre() followed by one of but not both of the 
>> following
>> bus->shutdown() or driver->shutdown(). Even tpm_class_shutdown() mentions in 
>> the
>> function comment that bus/device shutdown to follow.
> 
> I suppose you are referring to this here:
> 
> /**
>  * tpm_class_shutdown() - prepare the TPM device for loss of power.
>  * @dev: device to which the chip is associated.
>  *
>  * Issues a TPM2_Shutdown command prior to loss of power, as required by the
>  * TPM 2.0 spec. Then, calls bus- and device- specific shutdown code.
>  *
>  * Return: always 0 (i.e. success)
>  */
> 
> I think this comment still refers to the ancient code here:
> 
> https://elixir.bootlin.com/linux/v4.4.295/source/drivers/char/tpm/tpm-chip.c#L161
> 
>     if (dev->bus && dev->bus->shutdown)
>     dev->bus->shutdown(dev);
>     else if (dev->driver && dev->driver->shutdown)
>     dev->driver->shutdown(dev);
> 

Right, but that was because device_shutdown didn't use to call bus or driver
shutdown routines if class->shutdown was defined. TPM is the class of devices
that implements class->shutdown and that code above was removed and
device_shutdown was made to call class and either bus/driver shutdown as well.

See: Commit 7521621e600ae ("Do not disable driver and bus shutdown hook when
class shutdown hook is set")
> 
> 
>>> Fixes: 39d0099f9439 ("powerpc/pseries: Add shutdown() to vio_driver and
>>> vio_bus")
>> This patch left implementing each vio driver shutdown routine as an exercise 
>> for
>> the respective maintainers, and instead just big hammers anything that 
>> doesn't
>> have a shutdown routine by calling the driver->remove().
>>
>> If tpm_class_shutdown() quiecses ibmvtpm enough implementing a no-op
>> ibmvtpm->shutdown() with a comment saying so suffices.
>>
>> However, the generic TPM code is still introducing a bug that an attempt to 
>> call
>> tpm_chip_unregister() after tpm_class_shutdown() will crash as mentioned 
>> above.
> 
> 
> An alternative solution may be this here:

Right, this is exactly what I was proposing in my last comment previously.

> 
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb7e109..4cb908349b31 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -473,15 +473,8 @@ static void tpm_del_char_device(struct tpm_chip *chip)
>     mutex_unlock(&idr_lock);
> 
>     /* Make the driver uncallable. */
> -   down_write(&chip->ops_sem);
> -   if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> -   if (!tpm_chip_start(chip)) {
> -   

Re: [PATCH] tpm: Fix kexec crash due to access to ops NULL pointer (powerpc)

2021-12-20 Thread Tyrel Datwyler
On 12/11/21 5:28 PM, Stefan Berger wrote:
> Fix the following crash on kexec by checking chip->ops for a NULL pointer
> in tpm_chip_start() and returning an error code if this is the case.
> 
> BUG: Kernel NULL pointer dereference on read at 0x0060
> Faulting instruction address: 0xc099a06c
> Oops: Kernel access of bad area, sig: 11 [#1]
> ...
> NIP [c099a06c] tpm_chip_start+0x2c/0x140
>  LR [c099a808] tpm_chip_unregister+0x108/0x170
> Call Trace:
> [c000188bfa00] [c2b03930] fw_devlink_strict+0x0/0x8 (unreliable)
> [c000188bfa30] [c099a808] tpm_chip_unregister+0x108/0x170
> [c000188bfa70] [c09a3874] tpm_ibmvtpm_remove+0x34/0x130
> [c000188bfae0] [c0110dbc] vio_bus_remove+0x5c/0xb0
> [c000188bfb20] [c09bc154] device_shutdown+0x1d4/0x3a8
> [c000188bfbc0] [c0196e14] kernel_restart_prepare+0x54/0x70
> 
> The referenced patch below introduced a function to shut down the VIO bus.
> The bus shutdown now calls tpm_del_char_device (via tpm_chip_unregister)
> after a call to tpm_class_shutdown, which already set chip->ops to NULL.
> The crash occurrs when tpm_del_char_device calls tpm_chip_start with the
> chip->ops NULL pointer.

It was unclear to me at first, but IIUC the problem is the ibmvtpm device
receives two shutdown calls, the first is a class shutdown call for TPM devices,
followed by a bus shutdown call for VIO devices.

It appears that the class shutdown routines are meant to be a pre-shutdown
routine as they are defined as class->shutdown_pre(), and it is clearly allowed
to call class->shutdown_pre() followed by one of but not both of the following
bus->shutdown() or driver->shutdown(). Even tpm_class_shutdown() mentions in the
function comment that bus/device shutdown to follow.

> 
> Fixes: 39d0099f9439 ("powerpc/pseries: Add shutdown() to vio_driver and 
> vio_bus")

This patch left implementing each vio driver shutdown routine as an exercise for
the respective maintainers, and instead just big hammers anything that doesn't
have a shutdown routine by calling the driver->remove().

If tpm_class_shutdown() quiecses ibmvtpm enough implementing a no-op
ibmvtpm->shutdown() with a comment saying so suffices.

However, the generic TPM code is still introducing a bug that an attempt to call
tpm_chip_unregister() after tpm_class_shutdown() will crash as mentioned above.

> Signed-off-by: Stefan Berger 
> ---
>  drivers/char/tpm/tpm-chip.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb7e109..cca1bde296ee 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -101,6 +101,9 @@ int tpm_chip_start(struct tpm_chip *chip)
>  {
>   int ret;
> 
> + if (!chip->ops)
> + return -EINVAL;
> +

I wonder if a better fix would to have tpm_del_char_device() check for valid
chip->ops and call tpm_class_shutdown() when the ops are still valid. Calling
tpm_class_shutdown() allows for some code deduplication in 
tpm_del_char_device().

-Tyrel

>   tpm_clk_enable(chip);
> 
>   if (chip->locality == -1) {
> 



Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.

2021-11-29 Thread Tyrel Datwyler
On 11/29/21 5:06 PM, Nathan Lynch wrote:
> Tyrel Datwyler  writes:
>> On 11/29/21 12:58 AM, Mahesh Salgaonkar wrote:
>>> -int rtas_get_sensor_fast(int sensor, int index, int *state)
>>> +static int
>>> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
>>>  {
>>> int token = rtas_token("get-sensor-state");
>>> int rc;
>>> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int 
>>> *state)
>>> return -ENOENT;
>>>
>>> rc = rtas_call(token, 2, 2, state, sensor, index);
>>> -   WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
>>> -   rc <= RTAS_EXTENDED_DELAY_MAX));
>>> +   WARN_ON(warn_on &&
>>> +   (rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
>>> +   rc <= RTAS_EXTENDED_DELAY_MAX)));
>>
>> The whole point of rtas_get_sensor_fast() is that on busy we will just let it
>> error out because we don't want to wait. I'm not sure I see the point of the
>> spurious WARN_ONs anytime we hit a BUSY or DELAY return code. Maybe 
>> converting
>> that to a pr_debug() might be better and save expanding the API with a _fast 
>> and
>> _nonblocking variant that do the same thing minus one surpressing a
>> WARN_ON splat.
> 
> There is a subset of sensors that are specified to not ever return busy
> or delay statuses. rtas_get_sensor_fast() is meant to be used with
> those, and it would be an error to use it on a sensor not in that set.
> So the WARN_ON() is appropriate IMO; if it triggers it indicates either
> a misuse of the API or a firmware bug. See commit 1c2cb594441d
> "powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers"
> 

Fair enough. Seems I misremembered the nature of the original problem and should
have looked back at the commit to completely jog my memory.


Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.

2021-11-29 Thread Tyrel Datwyler
On 11/29/21 12:58 AM, Mahesh Salgaonkar wrote:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the pci error it uses
> get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
> recovery is skipped entirely.
> 
> However on certain PHB failures, the rtas call get-sesnor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The rtas call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations. Hence with running I/O traffic, during this 6
> seconds, the network driver continues its operation and hits a timeout
> (netdev watchdog). On timeouts, network driver go into ffdc capture mode
> and reset path assuming the PCI device is in fatal condition. This
> sometimes causes EEH recovery to fail. This impacts the ssh connection and
> leads to the system being inaccessible.
> 
> 
> [52732.244731] DEBUG: ibm_read_slot_reset_state2()
> [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
> [52732.244798] DEBUG: in eeh_slot_presence_check
> [52732.244804] DEBUG: error state check
> [52732.244807] DEBUG: Is slot hotpluggable
> [52732.244810] DEBUG: hotpluggable ops ?
> [52732.244953] DEBUG: Calling ops->get_adapter_status
> [52732.244958] DEBUG: calling rpaphp_get_sensor_state
> [52736.564262] [ cut here ]
> [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
> [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
> [...]
> [52736.564505] NIP [c0c32368] dev_watchdog+0x438/0x440
> [52736.564513] LR [c0c32364] dev_watchdog+0x434/0x440
> 
> 
> Fix this issue by introducing a new rtas_get_sensor_nonblocking() that does
> not get blocked on BUSY condition and returns immediately with error. Use
> this function in pseries pci hotplug driver which can return an error if
> slot presence state can not be detected immediately. Please note that only
> in certain PHB failures, the slot presence check returns BUSY condition. In
> normal cases it returns immediately with a correct presence state value.
> Hence this change has no impact on normal pci dlpar operations.
> 
> We could use rtas_get_sensor_fast() variant, but it thorws WARN_ON on BUSY
> condition. The rtas_get_sensor_nonblocking() suppresses WARN_ON.
> 
> Signed-off-by: Mahesh Salgaonkar 
> ---
> 
> This is an alternate approach to fix the EEH issue instead of delaying slot
> presence check proposed at
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/236956.html
> 
> Also refer:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237027.html
> ---
>  arch/powerpc/include/asm/rtas.h  |1 +
>  arch/powerpc/kernel/rtas.c   |   19 ---
>  drivers/pci/hotplug/rpaphp_pci.c |8 
>  3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 9dc97d2f9d27e..d8e8befb1c193 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -250,6 +250,7 @@ extern void rtas_os_term(char *str);
>  void rtas_activate_firmware(void);
>  extern int rtas_get_sensor(int sensor, int index, int *state);
>  extern int rtas_get_sensor_fast(int sensor, int index, int *state);
> +int rtas_get_sensor_nonblocking(int sensor, int index, int *state);
>  extern int rtas_get_power_level(int powerdomain, int *level);
>  extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
>  extern bool rtas_indicator_present(int token, int *maxindex);
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index ac61e226c9af6..fd5aa3bbd46c5 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -609,7 +609,8 @@ int rtas_get_sensor(int sensor, int index, int *state)
>  }
>  EXPORT_SYMBOL(rtas_get_sensor);
> 
> -int rtas_get_sensor_fast(int sensor, int index, int *state)
> +static int
> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
>  {
>   int token = rtas_token("get-sensor-state");
>   int rc;
> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int 
> *state)
>   return -ENOENT;
> 
>   rc = rtas_call(token, 2, 2, state, sensor, index);
> - WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> - 

Re: [PATCH] powerpc/pseries/vas: Don't print an error when VAS is unavailable

2021-11-29 Thread Tyrel Datwyler
On 11/26/21 2:31 AM, Nicholas Piggin wrote:
> Excerpts from Cédric Le Goater's message of November 26, 2021 5:13 pm:
>> On 11/26/21 06:21, Nicholas Piggin wrote:
>>> KVM does not support VAS so guests always print a useless error on boot
>>>
>>>  vas: HCALL(398) error -2, query_type 0, result buffer 0x57f2000
>>>
>>> Change this to only print the message if the error is not H_FUNCTION.
>>
>>
>> Just being curious, why is it even called since "ibm,compression" should
>> not be exposed in the DT ?
> 
> It looks like vas does not test for it. I guess in theory there can be 
> other functions than compression implemented as an accelerator. Maybe
> that's why?
> 
> Thanks,
> Nick
> 
Looks like pseries_vas_init() simply calls h_query_vas_capabilities() to test
for VAS coprocessor support. I would assume KVM doesn't expose hcall-vas or
hcall-nx in /rtas/ibm,hypertas-functions? Doesn't look like hcall-vas or
hcall-nx have been added to the hypertas_fw_feature matching, but maybe they
should and we can gate VAS initialization on those, or at the minimum
FW_FEATURE_VAS?

-Tyrel


Re: [PATCH] soc: fsl: guts: Fix a resource leak in the error handling path of 'fsl_guts_probe()'

2021-10-22 Thread Tyrel Datwyler
On 10/21/21 5:26 PM, Li Yang wrote:
> On Wed, Aug 18, 2021 at 4:23 PM Christophe JAILLET
>  wrote:
>>
>> If an error occurs after 'of_find_node_by_path()', the reference taken for
>> 'root' will never be released and some memory will leak.
> 
> Thanks for finding this.  This truly is a problem.
> 
>>
>> Instead of adding an error handling path and modifying all the
>> 'return -SOMETHING' into 'goto errorpath', use 'devm_add_action_or_reset()'
>> to release the reference when needed.
>>
>> Simplify the remove function accordingly.
>>
>> As an extra benefit, the 'root' global variable can now be removed as well.
>>
>> Fixes: 3c0d64e867ed ("soc: fsl: guts: reuse machine name from device tree")
>> Signed-off-by: Christophe JAILLET 
>> ---
>> Compile tested only
>> ---
>>  drivers/soc/fsl/guts.c | 16 ++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
>> index d5e9a5f2c087..4d9476c7b87c 100644
>> --- a/drivers/soc/fsl/guts.c
>> +++ b/drivers/soc/fsl/guts.c
>> @@ -28,7 +28,6 @@ struct fsl_soc_die_attr {
>>  static struct guts *guts;
>>  static struct soc_device_attribute soc_dev_attr;
>>  static struct soc_device *soc_dev;
>> -static struct device_node *root;
>>
>>
>>  /* SoC die attribute definition for QorIQ platform */
>> @@ -136,14 +135,23 @@ static u32 fsl_guts_get_svr(void)
>> return svr;
>>  }
>>
>> +static void fsl_guts_put_root(void *data)
>> +{
>> +   struct device_node *root = data;
>> +
>> +   of_node_put(root);
>> +}
>> +
>>  static int fsl_guts_probe(struct platform_device *pdev)
>>  {
>> struct device_node *np = pdev->dev.of_node;
>> struct device *dev = &pdev->dev;
>> +   struct device_node *root;
>> struct resource *res;
>> const struct fsl_soc_die_attr *soc_die;
>> const char *machine;
>> u32 svr;
>> +   int ret;
>>
>> /* Initialize guts */
>> guts = devm_kzalloc(dev, sizeof(*guts), GFP_KERNEL);
>> @@ -159,6 +167,10 @@ static int fsl_guts_probe(struct platform_device *pdev)
>>
>> /* Register soc device */
>> root = of_find_node_by_path("/");
>> +   ret = devm_add_action_or_reset(dev, fsl_guts_put_root, root);
>> +   if (ret)
>> +   return ret;
> 
> We probably only need to hold the reference when we do get "machine"
> from the device tree, otherwise we can put it directly.

To be pedantic since you are using the a properties string value of the root
node directly you need to hold the reference of that node for the lifetime of
the device. Realistically, its not like the root node and its model/compatible
properties are going to need to get modified at runtime so you could also argue
that holding the reference is unnecessary.

> 
> Or maybe we just maintain a local copy of string machine which means
> we can release the reference right away?

Looks like that is the original behavior that commit 3c0d64e867ed changed.
Although it looks like that behavior neglected to handle a memory allocation
failure during the string copy. How much memory is really being saved by not
keeping a local copy? Maybe revert 3c0d64e867ed and add the memory allocation 
check?

-Tyrel

> 
>> +
>> if (of_property_read_string(root, "model", &machine))
>> of_property_read_string_index(root, "compatible", 0, 
>> &machine);
>> if (machine)
>> @@ -197,7 +209,7 @@ static int fsl_guts_probe(struct platform_device *pdev)
>>  static int fsl_guts_remove(struct platform_device *dev)
>>  {
>> soc_device_unregister(soc_dev);
>> -   of_node_put(root);
>> +
>> return 0;
>>  }
>>
>> --
>> 2.30.2
>>



Re: [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates

2021-10-20 Thread Tyrel Datwyler
On 10/20/21 8:54 AM, Nathan Lynch wrote:
> Tyrel Datwyler  writes:
>> On 10/19/21 2:36 PM, Nathan Lynch wrote:
>>> Tyrel Datwyler  writes:
>>>> On 10/18/21 9:34 AM, Nathan Lynch wrote:
>>>>> On VMs with NX encryption, compression, and/or RNG offload, these
>>>>> capabilities are described by nodes in the ibm,platform-facilities device
>>>>> tree hierarchy:
>>>>>
>>>>>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>>>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>>>   ├── ibm,compression-v1
>>>>>   ├── ibm,random-v1
>>>>>   └── ibm,sym-encryption-v1
>>>>>
>>>>>   3 directories
>>>>>
>>>>> The acceleration functions that these nodes describe are not disrupted by
>>>>> live migration, not even temporarily.
>>>>>
>>>>> But the post-migration ibm,update-nodes sequence firmware always sends
>>>>> "delete" messages for this hierarchy, followed by an "add" directive to
>>>>> reconstruct it via ibm,configure-connector (log with debugging statements
>>>>> enabled in mobility.c):
>>>>>
>>>>>   mobility: removing node 
>>>>> /ibm,platform-facilities/ibm,random-v1:4294967285
>>>>>   mobility: removing node 
>>>>> /ibm,platform-facilities/ibm,compression-v1:4294967284
>>>>>   mobility: removing node 
>>>>> /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>>>>>   mobility: removing node /ibm,platform-facilities:4294967286
>>>>>   ...
>>>>>   mobility: added node /ibm,platform-facilities:4294967286
>>>>
>>>> It always bothered me that the update from firmware here was an delete/add 
>>>> at
>>>> the entire '/ibm,platform-facilities' node level instead of update 
>>>> properties
>>>> calls. When I asked about it years ago the claim was it was just easier to 
>>>> pass
>>>> an entire sub-tree as a single node add.
>>>
>>> Yes, I believe this is for ease of implementation on their part.
>>
>> I'd still argue that a full node removal or addition is a platform
>> reconfiguration on par with a hotplug operation.
> 
> Well... I think we might be better served by a slightly more flexible
> view of things :-) These are just a series of messages from firmware
> that should be considered as a whole, and they don't dictate exactly
> what the OS must do to its own data structures. Nothing mandates that
> the OS immediately and literally apply the changes as they are
> individually reported by the update-nodes sequence.

Agree to disagree. Not saying we can do much about it here, but I think being
flexible leads to scope creep down the road that may eventually complicate
things worse. A node removal doesn't guarantee an immediate node addition. So we
are just papering over the issue with a plan to paper over it some more in a
more complicated fashion.

> 
> Anyway, whether the firmware behavior is reasonable is sort of beside
> the point since we're stuck with it.

I'm aware. Moaning for the sake of moaning about it.

> 
> 
>>>>> This is because add_dt_node() -> dlpar_attach_node() attaches only the
>>>>> parent node returned from configure-connector, ignoring any children. This
>>>>> should be corrected for the general case, but fixing that won't help with
>>>>> the stale OF node references, which is the more urgent problem.
>>>>
>>>> I don't follow. If the code path you mention is truly broken in the way 
>>>> you say
>>>> then DLPAR operations involving nodes with child nodes should also be
>>>> broken.
>>>
>>> Hmm I don't know of any kernel-based DLPAR workflow that attaches
>>> sub-trees i.e. parent + children. Processor addition attaches CPU nodes
>>> and possibly cache nodes, but they have sibling relationships. If you're
>>> thinking of I/O adapters, the DT updates are (still!) driven from user
>>> space. As undesirable as that is, that use case actually works correctly
>>> AFAIK.
>>>
>>> What happens for the platfac LPM scenario is that
>>> dlpar_configure_connector() returns the node pointer for the root
>>> ibm,platform-facilities node, with children attached. That node pointer
>>> is passed from add_dt_node() -> dlpar_attach_node() -> of_attach_node

Re: [PATCH] PCI/hotplug: Remove unneeded of_node_put() in pnv_php

2021-10-20 Thread Tyrel Datwyler
On 10/20/21 4:39 AM, Nathan Lynch wrote:
> Wan Jiabing  writes:
>> Fix following coccicheck warning:
>> ./drivers/pci/hotplug/pnv_php.c:161:2-13: ERROR: probable double put.
>>
>> Device node iterators put the previous value of the index variable, so
>> an explicit put causes a double put.
> 
> I suppose Coccinelle doesn't take into account that this code is
> detaching and freeing the nodes.
> 
> 
>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>> index f4c2e6e01be0..f3da4f95d73f 100644
>> --- a/drivers/pci/hotplug/pnv_php.c
>> +++ b/drivers/pci/hotplug/pnv_php.c
>> @@ -158,7 +158,6 @@ static void pnv_php_detach_device_nodes(struct 
>> device_node *parent)
>>  for_each_child_of_node(parent, dn) {
>>  pnv_php_detach_device_nodes(dn);
>>  
>> -of_node_put(dn);
>>  of_detach_node(dn);
>>  }
> 
> The code might be improved by comments explaining how the bare
> of_node_put() corresponds to a "get" somewhere else in the driver, and
> how it doesn't render the ongoing traversal unsafe. It looks suspicious
> on first review, but I believe it's intentional and probably correct as
> written.
> 

This is a common usage pattern which if we put a comment about the pattern here
we need to do it every where. I suppose a better solution is to wrap this put in
a more descriptive function name like of_node_long_put() or something of the
sort the makes it obvious we are dropping a long held global scope reference.

-Tyrel


Re: [PATCH] PCI/hotplug: Remove unneeded of_node_put() in pnv_php

2021-10-20 Thread Tyrel Datwyler
On 10/20/21 2:46 AM, Wan Jiabing wrote:
> Fix following coccicheck warning:
> ./drivers/pci/hotplug/pnv_php.c:161:2-13: ERROR: probable double put.
> 
> Device node iterators put the previous value of the index variable, so
> an explicit put causes a double put.
> 
> Signed-off-by: Wan Jiabing 

NACK

This is a false positive from coccicheck. This is a case were a node is being
dynamically removed and the long reference needs to be dropped. Otherwise, the
reference count doesn't go to zero and trigger cleanup. This would result in us
ending up in a leaked device node.

-Tyrel

> ---
>  drivers/pci/hotplug/pnv_php.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index f4c2e6e01be0..f3da4f95d73f 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -158,7 +158,6 @@ static void pnv_php_detach_device_nodes(struct 
> device_node *parent)
>   for_each_child_of_node(parent, dn) {
>   pnv_php_detach_device_nodes(dn);
> 
> - of_node_put(dn);
>   of_detach_node(dn);
>   }
>  }
> 



Re: [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates

2021-10-19 Thread Tyrel Datwyler
On 10/19/21 2:36 PM, Nathan Lynch wrote:
> Hi Tyrel, thanks for the detailed review.
> 
> Tyrel Datwyler  writes:
>> On 10/18/21 9:34 AM, Nathan Lynch wrote:
>>> On VMs with NX encryption, compression, and/or RNG offload, these
>>> capabilities are described by nodes in the ibm,platform-facilities device
>>> tree hierarchy:
>>>
>>>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>>>   ├── ibm,compression-v1
>>>   ├── ibm,random-v1
>>>   └── ibm,sym-encryption-v1
>>>
>>>   3 directories
>>>
>>> The acceleration functions that these nodes describe are not disrupted by
>>> live migration, not even temporarily.
>>>
>>> But the post-migration ibm,update-nodes sequence firmware always sends
>>> "delete" messages for this hierarchy, followed by an "add" directive to
>>> reconstruct it via ibm,configure-connector (log with debugging statements
>>> enabled in mobility.c):
>>>
>>>   mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>>>   mobility: removing node 
>>> /ibm,platform-facilities/ibm,compression-v1:4294967284
>>>   mobility: removing node 
>>> /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>>>   mobility: removing node /ibm,platform-facilities:4294967286
>>>   ...
>>>   mobility: added node /ibm,platform-facilities:4294967286
>>
>> It always bothered me that the update from firmware here was an delete/add at
>> the entire '/ibm,platform-facilities' node level instead of update properties
>> calls. When I asked about it years ago the claim was it was just easier to 
>> pass
>> an entire sub-tree as a single node add.
> 
> Yes, I believe this is for ease of implementation on their part.

I'd still argue that a full node removal or addition is a platform
reconfiguration on par with a hotplug operation.

> 
>>> Note we receive a single "add" message for the entire hierarchy, and what
>>> we receive from the ibm,configure-connector sequence is the top-level
>>> platform-facilities node along with its three children. The debug message
>>> simply reports the parent node and not the whole subtree.
>>>
>>> Also, significantly, the nodes added are almost completely equivalent to
>>> the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
>>> the leaf nodes is the only property I've observed to differ, and Linux does
>>> not use that. So in practice, the sum of update messages Linux receives for
>>> this hierarchy is equivalent to minor property updates.
>>
>> The two props I would think maybe we would need to be most be concerned about
>> ensuring don't change are "ibm,resource-id" which gets used in the vio bus 
>> code
>> when configuring platform facilities nodes, and 'ibm,max-sync-cop' used by 
>> the
>> pseries-nx driver.
> 
> The proposed change does not introduce any regressions with respect to
> those properties. At least, that's the intention.
> 
> ibm,resource-id: the vio code lacks any way of reacting to this changing
> at runtime. (I could be wrong, but I suspect it does not change in
> practice even though it is technically allowed by the spec.)
> 
> ibm,max-sync-cop: the nx code has a notifier callback to handle changes
> to this and a couple other properties, but only if they are propagated
> through an "update property" event. Node remove+add doesn't trigger the
> callback. I think this one is more likely than resource-id to change
> between different models or configurations, but I haven't observed this
> to happen in my test environment.
> 
> Without this change, the child nodes of the ibm,platform-facilities
> container in Linux's device tree are completely discarded (see more
> below) after a partition migration. This makes driver re-initialization
> impossible without a reboot. With the change, the nodes are retained,
> albeit with properties that may not correctly reflect the destination
> system. To be clear, I think this is bad, and I am working on a better
> solution. But I consider the change here a net improvement for the
> common case (no meaningful property changes), and it's a wash for
> migrations where the properties could change (we weren't handling that
> anyway).

This didn't use to be the case. After some digging I see the problem, and it is
clearly a long standing regression (see comment further down). Part of that i

Re: [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates

2021-10-18 Thread Tyrel Datwyler
On 10/18/21 3:37 PM, Tyrel Datwyler wrote:
> On 10/18/21 9:34 AM, Nathan Lynch wrote:

<>

>>
>> One way to address that would be to make the drivers respond to node
>> removal notifications, so that node references can be dropped
>> appropriately. But this would likely force the drivers to disrupt active
>> clients for no useful purpose: equivalent nodes are immediately re-added.
>> And recall that the acceleration capabilities described by the nodes remain
>> available throughout the whole process.
> 
> See my comments above about its the vio bus more at fault here then the 
> drivers
> themselves. I'm inclined to agree though that disrupting active operations 
> with
> a driver unbind/rebind is a little extreme.
> 
> This also brings me back to firmware removing and re-adding the whole
> '/ibm,platform-facilities' node instead of simply updating changed properties
> could avoid this whole fiasco.
> 

Thinking more on this and trying to recall my discussion so very long ago with
firmware I now recall that I had complained that the idea of a node remove/add
is akin to a DLPAR operation which we have no notion of for platform facilities.
They know better than to do this with other virtual devices so I'm still not
sure why they insist on doing it here.

-Tyrel


Re: [PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates

2021-10-18 Thread Tyrel Datwyler
On 10/18/21 9:34 AM, Nathan Lynch wrote:
> On VMs with NX encryption, compression, and/or RNG offload, these
> capabilities are described by nodes in the ibm,platform-facilities device
> tree hierarchy:
> 
>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>   ├── ibm,compression-v1
>   ├── ibm,random-v1
>   └── ibm,sym-encryption-v1
> 
>   3 directories
> 
> The acceleration functions that these nodes describe are not disrupted by
> live migration, not even temporarily.
> 
> But the post-migration ibm,update-nodes sequence firmware always sends
> "delete" messages for this hierarchy, followed by an "add" directive to
> reconstruct it via ibm,configure-connector (log with debugging statements
> enabled in mobility.c):
> 
>   mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>   mobility: removing node 
> /ibm,platform-facilities/ibm,compression-v1:4294967284
>   mobility: removing node 
> /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>   mobility: removing node /ibm,platform-facilities:4294967286
>   ...
>   mobility: added node /ibm,platform-facilities:4294967286

It always bothered me that the update from firmware here was an delete/add at
the entire '/ibm,platform-facilities' node level instead of update properties
calls. When I asked about it years ago the claim was it was just easier to pass
an entire sub-tree as a single node add.

> 
> Note we receive a single "add" message for the entire hierarchy, and what
> we receive from the ibm,configure-connector sequence is the top-level
> platform-facilities node along with its three children. The debug message
> simply reports the parent node and not the whole subtree.
> 
> Also, significantly, the nodes added are almost completely equivalent to
> the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
> the leaf nodes is the only property I've observed to differ, and Linux does
> not use that. So in practice, the sum of update messages Linux receives for
> this hierarchy is equivalent to minor property updates.

The two props I would think maybe we would need to be most be concerned about
ensuring don't change are "ibm,resource-id" which gets used in the vio bus code
when configuring platform facilities nodes, and 'ibm,max-sync-cop' used by the
pseries-nx driver.

> 
> We succeed in removing the original hierarchy from the device tree. But the
> drivers bound to the leaf nodes are ignorant of this, and do not relinquish
> their references. The leaf nodes, still reachable through sysfs, of course
> still refer to the now-freed ibm,platform-facilities parent node, which
> makes use-after-free possible:

It is actually more subtle then the drivers themselves being ignorant. They
could register node update notifiers, but the real problem here is that the vio
bus device itself takes a reference to each child node registered to the bus.
I'm not sure we really want to unbind/rebind drivers as a result of LPM, but it
would be generic to the vio bus instead of updating each driver to ensure its
handling it device node references properly.

> 
>   refcount_t: addition on 0; use-after-free.
>   WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 
> refcount_warn_saturate+0x164/0x1f0
>   refcount_warn_saturate+0x160/0x1f0 (unreliable)
>   kobject_get+0xf0/0x100
>   of_node_get+0x30/0x50
>   of_get_parent+0x50/0xb0
>   of_fwnode_get_parent+0x54/0x90
>   fwnode_count_parents+0x50/0x150
>   fwnode_full_name_string+0x30/0x110
>   device_node_string+0x49c/0x790
>   vsnprintf+0x1c0/0x4c0
>   sprintf+0x44/0x60
>   devspec_show+0x34/0x50
>   dev_attr_show+0x40/0xa0
>   sysfs_kf_seq_show+0xbc/0x200
>   kernfs_seq_show+0x44/0x60
>   seq_read_iter+0x2a4/0x740
>   kernfs_fop_read_iter+0x254/0x2e0
>   new_sync_read+0x120/0x190
>   vfs_read+0x1d0/0x240
> 
> Moreover, the "new" replacement subtree is not correctly added to the
> device tree, resulting in ibm,platform-facilities parent node without the
> appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
> 
>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>   /sys/firmware/devicetree/base/ibm,platform-facilities/
> 
>   0 directories
> 
>   $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
>   ./ibm,sym-encryption-v1/of_node: broken symbolic link to
> 
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
>   ./ibm,random-v1/of_node: broken symbolic link to
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
>   ./ibm,compression-v1/of_node:broken symbolic link to
> 
> ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
> 
> This is because add_dt_node() -> dlpar_attach_node() attaches only the
> parent node returned from configure-connector, ignoring any children. This
> should be corrected for the general case, but fixing that won't help with
> the stale OF n

Re: [PATCH][next] powerpc/vas: Fix potential NULL pointer dereference

2021-10-18 Thread Tyrel Datwyler
On 10/14/21 10:03 PM, Gustavo A. R. Silva wrote:
> (!ptr && !ptr->foo) strikes again. :)
> 
> The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
> it leads to a NULL pointer dereference: ptr->foo.
> 
> Fix this by converting && to ||
> 
> This issue was detected with the help of Coccinelle, and audited and
> fixed manually.
> 
> Fixes: 1a0d0d5ed5e3 ("powerpc/vas: Add platform specific user window 
> operations")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 
Looking at the usage pattern it is obvious that if we determine !ptr attempting
to also confirm !ptr->ops is going to blow up.

LGTM.

Reviewed-by: Tyrel Datwyler 

> ---
>  arch/powerpc/platforms/book3s/vas-api.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/book3s/vas-api.c 
> b/arch/powerpc/platforms/book3s/vas-api.c
> index 30172e52e16b..4d82c92ddd52 100644
> --- a/arch/powerpc/platforms/book3s/vas-api.c
> +++ b/arch/powerpc/platforms/book3s/vas-api.c
> @@ -303,7 +303,7 @@ static int coproc_ioc_tx_win_open(struct file *fp, 
> unsigned long arg)
>   return -EINVAL;
>   }
> 
> - if (!cp_inst->coproc->vops && !cp_inst->coproc->vops->open_win) {
> + if (!cp_inst->coproc->vops || !cp_inst->coproc->vops->open_win) {
>   pr_err("VAS API is not registered\n");
>   return -EACCES;
>   }
> @@ -373,7 +373,7 @@ static int coproc_mmap(struct file *fp, struct 
> vm_area_struct *vma)
>   return -EINVAL;
>   }
> 
> - if (!cp_inst->coproc->vops && !cp_inst->coproc->vops->paste_addr) {
> + if (!cp_inst->coproc->vops || !cp_inst->coproc->vops->paste_addr) {
>   pr_err("%s(): VAS API is not registered\n", __func__);
>   return -EACCES;
>   }
> 



Re: [PATCH net-next 11/12] ethernet: ibmveth: use ether_addr_to_u64()

2021-10-15 Thread Tyrel Datwyler
On 10/15/21 3:16 PM, Jakub Kicinski wrote:
> We'll want to make netdev->dev_addr const, remove the local
> helper which is missing a const qualifier on the argument
> and use ether_addr_to_u64().

LGTM. ibmveth_encode_mac_addr() is clearly code duplication of
ether_addr_to_u64() minus the const qualifier.

Reviewed-by: Tyrel Datwyler 

> 
> Similar story to mlx4.
> 
> Signed-off-by: Jakub Kicinski 
> ---
> CC: cforn...@linux.ibm.com
> CC: m...@ellerman.id.au
> CC: b...@kernel.crashing.org
> CC: pau...@samba.org
> CC: linuxppc-dev@lists.ozlabs.org
> ---
>  drivers/net/ethernet/ibm/ibmveth.c | 17 +++--
>  1 file changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
> b/drivers/net/ethernet/ibm/ibmveth.c
> index 836617fb3f40..45ba40cf4d07 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -483,17 +483,6 @@ static int ibmveth_register_logical_lan(struct 
> ibmveth_adapter *adapter,
>   return rc;
>  }
> 
> -static u64 ibmveth_encode_mac_addr(u8 *mac)
> -{
> - int i;
> - u64 encoded = 0;
> -
> - for (i = 0; i < ETH_ALEN; i++)
> - encoded = (encoded << 8) | mac[i];
> -
> - return encoded;
> -}
> -
>  static int ibmveth_open(struct net_device *netdev)
>  {
>   struct ibmveth_adapter *adapter = netdev_priv(netdev);
> @@ -553,7 +542,7 @@ static int ibmveth_open(struct net_device *netdev)
>   adapter->rx_queue.num_slots = rxq_entries;
>   adapter->rx_queue.toggle = 1;
> 
> - mac_address = ibmveth_encode_mac_addr(netdev->dev_addr);
> + mac_address = ether_addr_to_u64(netdev->dev_addr);
> 
>   rxq_desc.fields.flags_len = IBMVETH_BUF_VALID |
>   adapter->rx_queue.queue_len;
> @@ -1476,7 +1465,7 @@ static void ibmveth_set_multicast_list(struct 
> net_device *netdev)
>   netdev_for_each_mc_addr(ha, netdev) {
>   /* add the multicast address to the filter table */
>   u64 mcast_addr;
> - mcast_addr = ibmveth_encode_mac_addr(ha->addr);
> + mcast_addr = ether_addr_to_u64(ha->addr);
>   lpar_rc = h_multicast_ctrl(adapter->vdev->unit_address,
>  IbmVethMcastAddFilter,
>  mcast_addr);
> @@ -1606,7 +1595,7 @@ static int ibmveth_set_mac_addr(struct net_device *dev, 
> void *p)
>   if (!is_valid_ether_addr(addr->sa_data))
>   return -EADDRNOTAVAIL;
> 
> - mac_address = ibmveth_encode_mac_addr(addr->sa_data);
> + mac_address = ether_addr_to_u64(addr->sa_data);
>   rc = h_change_logical_lan_mac(adapter->vdev->unit_address, mac_address);
>   if (rc) {
>   netdev_err(adapter->netdev, "h_change_logical_lan_mac failed 
> with rc=%d\n", rc);
> 



Re: [PATCH] ibmvscsi: use GFP_KERNEL with dma_alloc_coherent in initialize_event_pool

2021-10-15 Thread Tyrel Datwyler
On 10/14/21 9:36 PM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> Just stumbled upon this trivial little patch that looks to have gotten lost 
>> in
>> the shuffle. Seems it even got a reviewed-by from Brian [1].
>>
>> So, uh I guess after almost 3 years...ping?
> 
> It's marked "Changes Requested" here:
> 
>   
> https://patchwork.kernel.org/project/linux-scsi/patch/1547089149-20577-1-git-send-email-tyr...@linux.vnet.ibm.com/

Interesting

> 
> 
> If it isn't picked up in a few days you should probably do a resend so
> that it reappears in patchwork.

Fair enough

-Tyrel

> 
> cheers
> 



Re: [PATCH] ibmvscsi: use GFP_KERNEL with dma_alloc_coherent in initialize_event_pool

2021-10-14 Thread Tyrel Datwyler
Just stumbled upon this trivial little patch that looks to have gotten lost in
the shuffle. Seems it even got a reviewed-by from Brian [1].

So, uh I guess after almost 3 years...ping?

-Tyrel

[1]
https://yhbt.net/lore/all/fd33df0e-012b-e437-c6e9-29cd08838...@linux.vnet.ibm.com/

On 01/09/2019 08:59 PM, Tyrel Datwyler wrote:
> During driver probe we allocate a dma region for our event pool.
> Currently, zero is passed for the gfp_flags parameter. Driver probe
> callbacks run in process context and we hold no locks so we can sleep
> here if necessary.
>
> Fix by passing GFP_KERNEL explicitly to dma_alloc_coherent().
>
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index cb8535e..10d5e77 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -465,7 +465,7 @@ static int initialize_event_pool(struct event_pool *pool,
>   pool->iu_storage =
>   dma_alloc_coherent(hostdata->dev,
>  pool->size * sizeof(*pool->iu_storage),
> -&pool->iu_token, 0);
> +&pool->iu_token, GFP_KERNEL);
>   if (!pool->iu_storage) {
>   kfree(pool->events);
>   return -ENOMEM;
>


Re: [PATCH v2] scsi: ibmvscsi: Use dma_alloc_noncoherent() instead of get_zeroed_page/dma_map_single()

2021-10-14 Thread Tyrel Datwyler
On 10/11/21 8:23 PM, Cai Huoqing wrote:
> Replacing get_zeroed_page/free_page/dma_map_single/dma_unmap_single()
> with dma_alloc_noncoherent/dma_free_noncoherent() helps to reduce
> code size, and simplify the code, and the hardware can keeep DMA
> coherent itsel
Not sure why the switch from coherent in v1 to noncoherent in v2. I think that
was unnecessary and I believe requires explicit synchronization via
dma_sync_single_{for_device|for_cpu} calls.

Further, as both kernel-bot and Nathan have already pointed out this doesn't
even compile.

-Tyrel

> 
> Signed-off-by: Cai Huoqing 
> ---
> v1->v2:
>   *Change to dma_alloc/free_noncoherent from dma_alloc/free_coherent.
>   *Update changelog.
> 
>  drivers/scsi/ibmvscsi/ibmvfc.c   | 16 
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 29 +
>  2 files changed, 13 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 1f1586ad48fe..6e95fd02fd25 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -869,8 +869,8 @@ static void ibmvfc_free_queue(struct ibmvfc_host *vhost,
>  {
>   struct device *dev = vhost->dev;
>  
> - dma_unmap_single(dev, queue->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> - free_page((unsigned long)queue->msgs.handle);
> + dma_free_noncoherent(dev, PAGE_SIZE, queue->msgs.handle,
> +  queue->msg_token, DMA_BIDIRECTIONAL);
>   queue->msgs.handle = NULL;
>  
>   ibmvfc_free_event_pool(vhost, queue);
> @@ -5663,19 +5663,11 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host 
> *vhost,
>   return -ENOMEM;
>   }
>  
> - queue->msgs.handle = (void *)get_zeroed_page(GFP_KERNEL);
> + queue->msgs.handle = dma_alloc_noncoherent(dev, PAGE_SIZE, 
> &queue->msg_token,
> +DMA_BIDIRECTIONAL, 
> GFP_KERNEL);
>   if (!queue->msgs.handle)
>   return -ENOMEM;
>  
> - queue->msg_token = dma_map_single(dev, queue->msgs.handle, PAGE_SIZE,
> -   DMA_BIDIRECTIONAL);
> -
> - if (dma_mapping_error(dev, queue->msg_token)) {
> - free_page((unsigned long)queue->msgs.handle);
> - queue->msgs.handle = NULL;
> - return -ENOMEM;
> - }
> -
>   queue->cur = 0;
>   queue->fmt = fmt;
>   queue->size = PAGE_SIZE / fmt_size;
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index ea8e01f49cba..68409c298c74 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -151,10 +151,8 @@ static void ibmvscsi_release_crq_queue(struct crq_queue 
> *queue,
>   msleep(100);
>   rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
>   } while ((rc == H_BUSY) || (H_IS_LONG_BUSY(rc)));
> - dma_unmap_single(hostdata->dev,
> -  queue->msg_token,
> -  queue->size * sizeof(*queue->msgs), DMA_BIDIRECTIONAL);
> - free_page((unsigned long)queue->msgs);
> + dma_free_noncoherent(hostdata->dev, PAGE_SIZE,
> +  queue->msgs, queue->msg_token, DMA_BIDIRECTIONAL);
>  }
>  
>  /**
> @@ -331,18 +329,12 @@ static int ibmvscsi_init_crq_queue(struct crq_queue 
> *queue,
>   int retrc;
>   struct vio_dev *vdev = to_vio_dev(hostdata->dev);
>  
> - queue->msgs = (struct viosrp_crq *)get_zeroed_page(GFP_KERNEL);
> -
> - if (!queue->msgs)
> - goto malloc_failed;
>   queue->size = PAGE_SIZE / sizeof(*queue->msgs);
> -
> - queue->msg_token = dma_map_single(hostdata->dev, queue->msgs,
> -   queue->size * sizeof(*queue->msgs),
> -   DMA_BIDIRECTIONAL);
> -
> - if (dma_mapping_error(hostdata->dev, queue->msg_token))
> - goto map_failed;
> + queue->msgs = dma_alloc_noncoherent(hostdata->dev,
> + PAGE_SIZE, &queue->msg_token,
> + DMA_BIDIRECTIONAL, GFP_KERNEL);
> + if (!queue->msg)
> + goto malloc_failed;
>  
>   gather_partition_info();
>   set_adapter_info(hostdata);
> @@ -395,11 +387,8 @@ static int ibmvscsi_init_crq_queue(struct crq_queue 
> *queue,
>   rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
>   } while ((rc == H_BUSY) || (H_IS_LONG_BUSY(rc)));
>reg_crq_failed:
> - dma_unmap_single(hostdata->dev,
> -  queue->msg_token,
> -  queue->size * sizeof(*queue->msgs), DMA_BIDIRECTIONAL);
> -  map_failed:
> - free_page((unsigned long)queue->msgs);
> + dma_free_noncoherent(hostdata->dev, PAGE_SIZE, queue->msg,
> +  queue->msg_token, DMA_BIDIRECTIONAL);
>malloc_failed:
>   return -1;
>  }
> 



Re: [PATCH] powerpc: fix unbalanced node refcount in check_kvm_guest()

2021-09-28 Thread Tyrel Datwyler
On 9/28/21 5:45 AM, Nathan Lynch wrote:
> When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
> returns without performing a matching put for the lookup, leaving the
> node's reference count elevated.
> 
> Add the necessary call to of_node_put(), rearranging the code slightly to
> avoid repetition or goto.
> 
> Signed-off-by: Nathan Lynch 
> Fixes: 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions")

Reviewed-by: Tyrel Datwyler 


Re: [PATCH] pci/hotplug/pnv-php: Remove probable double put

2021-09-07 Thread Tyrel Datwyler
On 9/7/21 1:59 AM, Xu Wang wrote:
> Device node iterators put the previous value of the index variable,
> so an explicit put causes a double put.
> 
> Signed-off-by: Xu Wang 
> ---
>  drivers/pci/hotplug/pnv_php.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index 04565162a449..ed4d1a2c3f22 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -158,7 +158,6 @@ static void pnv_php_detach_device_nodes(struct 
> device_node *parent)
>   for_each_child_of_node(parent, dn) {
>   pnv_php_detach_device_nodes(dn);
> 
> - of_node_put(dn);
>   of_detach_node(dn);

Are you sure this is a double put? This looks to me like its meant to drive tear
down of the device by putting a long term reference and not the short term get
that is part of the iterator.

-Tyrel

>   }
>  }
> 



[PATCH] ibmvfc: fix command state accounting and stale response detection

2021-07-16 Thread Tyrel Datwyler
Prior to commit 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside
the host/queue lock") responses to commands were completed sequentially
with the host lock held such that a command had a basic binary state of
active or free. It was therefore a simple affair of ensuring the
assocaiated ibmvfc_event to a VIOS response was valid by testing that it
was not already free. The lock relexation work to complete commands
outside the lock inadverdently made it a trinary command state such that
a command is either in flight, received and being completed, or
completed and now free. This breaks the stale command detection logic as
a command may be still marked active and been placed on the delayed
completion list when a second stale response for the same command
arrives. This can lead to double completions and list corruption. This
issue was exposed by a recent VIOS regression were a missing memory
barrier could occasionally result in the ibmvfc client receiveing a
duplicate response for the same command.

Fix the issue by introducing the atomic ibmvfc_event.active to track the
trinary state of a command. The state is explicitly set to 1 when a
command is successfully sent. The CRQ response handlers use
atomic_dec_if_positive() to test for stale responses and correctly
transition to the completion state when a active command is received.
Finally, atomic_dec_and_test() is used to sanity check transistions
when commands are freed as a result of a completion, or moved to the
purge list as a result of error handling or adapter reset.

Cc: sta...@vger.kernel.org
Fixes: 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the host/queue 
lock")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 19 +--
 drivers/scsi/ibmvscsi/ibmvfc.h |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index bee1bec49c09..935b01ee44b7 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -807,6 +807,13 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host 
*vhost,
for (i = 0; i < size; ++i) {
struct ibmvfc_event *evt = &pool->events[i];
 
+   /*
+* evt->active states
+*  1 = in flight
+*  0 = being completed
+* -1 = free/freed
+*/
+   atomic_set(&evt->active, -1);
atomic_set(&evt->free, 1);
evt->crq.valid = 0x80;
evt->crq.ioba = cpu_to_be64(pool->iu_token + 
(sizeof(*evt->xfer_iu) * i));
@@ -1017,6 +1024,7 @@ static void ibmvfc_free_event(struct ibmvfc_event *evt)
 
BUG_ON(!ibmvfc_valid_event(pool, evt));
BUG_ON(atomic_inc_return(&evt->free) != 1);
+   BUG_ON(atomic_dec_and_test(&evt->active));
 
spin_lock_irqsave(&evt->queue->l_lock, flags);
list_add_tail(&evt->queue_list, &evt->queue->free);
@@ -1072,6 +1080,12 @@ static void ibmvfc_complete_purge(struct list_head 
*purge_list)
  **/
 static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
 {
+   /*
+* Anything we are failing should still be active. Otherwise, it
+* implies we already got a response for the command and are doing
+* something bad like double completing it.
+*/
+   BUG_ON(!atomic_dec_and_test(&evt->active));
if (evt->cmnd) {
evt->cmnd->result = (error_code << 16);
evt->done = ibmvfc_scsi_eh_done;
@@ -1723,6 +1737,7 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
 
evt->done(evt);
} else {
+   atomic_set(&evt->active, 1);
spin_unlock_irqrestore(&evt->queue->l_lock, flags);
ibmvfc_trc_start(evt);
}
@@ -3251,7 +3266,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, 
struct ibmvfc_host *vhost,
return;
}
 
-   if (unlikely(atomic_read(&evt->free))) {
+   if (unlikely(atomic_dec_if_positive(&evt->active))) {
dev_err(vhost->dev, "Received duplicate correlation_token 
0x%08llx!\n",
crq->ioba);
return;
@@ -3778,7 +3793,7 @@ static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, 
struct ibmvfc_host *vhost
return;
}
 
-   if (unlikely(atomic_read(&evt->free))) {
+   if (unlikely(atomic_dec_if_positive(&evt->active))) {
dev_err(vhost->dev, "Received duplicate correlation_token 
0x%08llx!\n",
crq->ioba);
return;
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 4f0f3baefae4..92fb889d7eb0 100644
--- a/drivers/scsi/ibmvs

Re: [PATCH 1/2] powerpc/prom_init: Convert prom_strcpy() into prom_strscpy_pad()

2021-06-22 Thread Tyrel Datwyler
On 6/21/21 9:11 PM, Michael Ellerman wrote:
> Daniel Axtens  writes:
>> Hi
>>
>>> -static char __init *prom_strcpy(char *dest, const char *src)
>>> +static ssize_t __init prom_strscpy_pad(char *dest, const char *src, size_t 
>>> n)
>>>  {
>>> -   char *tmp = dest;
>>> +   ssize_t rc;
>>> +   size_t i;
>>>  
>>> -   while ((*dest++ = *src++) != '\0')
>>> -   /* nothing */;
>>> -   return tmp;
>>> +   if (n == 0 || n > INT_MAX)
>>> +   return -E2BIG;
>>> +
>>> +   // Copy up to n bytes
>>> +   for (i = 0; i < n && src[i] != '\0'; i++)
>>> +   dest[i] = src[i];
>>> +
>>> +   rc = i;
>>> +
>>> +   // If we copied all n then we have run out of space for the nul
>>> +   if (rc == n) {
>>> +   // Rewind by one character to ensure nul termination
>>> +   i--;
>>> +   rc = -E2BIG;
>>> +   }
>>> +
>>> +   for (; i < n; i++)
>>> +   dest[i] = '\0';
>>> +
>>> +   return rc;
>>>  }
>>>  
>>
>> This implementation seems good to me.
>>
>> I copied it into a new C file and added the following:
>>
>> int main() {
>>  char longstr[255]="abcdefghijklmnopqrstuvwxyz";
>>  char shortstr[5];
>>  assert(prom_strscpy_pad(longstr, "", 0) == -E2BIG);
>>  assert(prom_strscpy_pad(longstr, "hello", 255) == 5);
>>  assert(prom_strscpy_pad(shortstr, "hello", 5) == -E2BIG);
>>  assert(memcmp(shortstr, "hell", 5) == 0);
>>  assert(memcmp(longstr, "hello\0\0\0\0\0\0\0\0\0", 6) == 0);
>>  return 0;
>> }
>>
>> All the assertions pass. I believe this covers all the conditions from
>> the strscpy_pad docstring.
>>
>> Reviewed-by: Daniel Axtens 
> 
> Thanks.
> 
> I'll also drop the explicit nul termination in patch 2, which is a
> leftover from when I was using strncpy().

I guess you can ignore my other email questioning this.

-Tyrel

> 
> cheers
> 



Re: [PATCH 2/2] powerpc/prom_init: Pass linux_banner to firmware via option vector 7

2021-06-22 Thread Tyrel Datwyler
On 6/20/21 11:49 PM, Michael Ellerman wrote:
> Pass the value of linux_banner to firmware via option vector 7.
> 
> Option vector 7 is described in "LoPAR" Linux on Power Architecture
> Reference v2.9, in table B.7 on page 824:
> 
>   An ASCII character formatted null terminated string that describes
>   the client operating system. The string shall be human readable and
>   may be displayed on the console.
> 
> The string can be up to 256 bytes total, including the nul terminator.
> 
> linux_banner contains lots of information, and should make it possible
> to identify the exact kernel version that is running:
> 
>   const char linux_banner[] =
>   "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
> 
> For example:
>   Linux version 4.15.0-144-generic (buildd@bos02-ppc64el-018) (gcc
>   version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #148-Ubuntu SMP Sat May 8
>   02:32:13 UTC 2021 (Ubuntu 4.15.0-144.148-generic 4.15.18)
> 
> It's also printed at boot to the console/dmesg, which should make it
> possible to correlate what firmware receives with the console/dmesg on
> the machine.
> 
> Signed-off-by: Michael Ellerman 
> ---
> 
> NB. linux_banner is already allowed by prom_init_check.sh
> 
> LoPAR: 
> https://openpowerfoundation.org/?resource_lib=linux-on-power-architecture-reference-a-papr-linux-subset-review-draft
> ---
>  arch/powerpc/kernel/prom_init.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index c18d55f8b951..7343076b261c 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -944,6 +945,10 @@ struct option_vector6 {
>   u8 os_name;
>  } __packed;
> 
> +struct option_vector7 {
> + u8 os_id[256];
> +} __packed;
> +
>  struct ibm_arch_vec {
>   struct { u32 mask, val; } pvrs[14];
> 
> @@ -966,6 +971,9 @@ struct ibm_arch_vec {
> 
>   u8 vec6_len;
>   struct option_vector6 vec6;
> +
> + u8 vec7_len;
> + struct option_vector7 vec7;
>  } __packed;
> 
>  static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = 
> {
> @@ -1112,6 +1120,9 @@ static const struct ibm_arch_vec 
> ibm_architecture_vec_template __initconst = {
>   .secondary_pteg = 0,
>   .os_name = OV6_LINUX,
>   },
> +
> + /* option vector 7: OS Identification */
> + .vec7_len = VECTOR_LENGTH(sizeof(struct option_vector7)),
>  };
> 
>  static struct ibm_arch_vec __prombss ibm_architecture_vec  
> cacheline_aligned;
> @@ -1340,6 +1351,10 @@ static void __init prom_check_platform_support(void)
>   memcpy(&ibm_architecture_vec, &ibm_architecture_vec_template,
>  sizeof(ibm_architecture_vec));
> 
> + prom_strscpy_pad(ibm_architecture_vec.vec7.os_id, linux_banner, 256);
> + // Ensure nul termination
> + ibm_architecture_vec.vec7.os_id[255] = '\0';
> +

Doesn't the implementation of prom_strscpy_pad() in patch 1 ensure nul 
termination?

-Tyrel

>   if (prop_len > 1) {
>   int i;
>   u8 vec[8];
> 



Re: [PATCH v3] pseries/drmem: update LMBs after LPM

2021-05-03 Thread Tyrel Datwyler
On 5/3/21 10:28 AM, Laurent Dufour wrote:
> Le 01/05/2021 à 01:58, Tyrel Datwyler a écrit :
>> On 4/30/21 9:13 AM, Laurent Dufour wrote:
>>> Le 29/04/2021 à 21:12, Tyrel Datwyler a écrit :
>>>> On 4/29/21 3:27 AM, Aneesh Kumar K.V wrote:
>>>>> Laurent Dufour  writes:
>>>>>

Snip

>>
>> As of today I don't have a problem with your patch. This was more of me 
>> pointing
>> out things that I think are currently wrong with our memory hotplug
>> implementation, and that we need to take a long hard look at it down the 
>> road.
> 
> I do agree, there is a lot of odd things there to address in this area.
> If you're ok with that patch, do you mind to add a reviewed-by?
> 

Can you send a v4 with the fix for the duplicate update included?

-Tyrel


Re: [RFC] powerpc/pseries: delete scanlog

2021-05-03 Thread Tyrel Datwyler
On 5/3/21 10:18 AM, Nathan Lynch wrote:
> A commit from 2008 says this driver was relevant only for "older
> systems", and currently supported hardware doesn't have this
> facility. Get rid of it.

The only references I could find to scan log dump support are several Power 4+
systems, in particular the IntelliStation POWER 9114 and pSeries 615, which were
released in 2003 at the same time this code was originally introduced.

Historical Linux commit form February 2003:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb

IntelliStation and pSeries docs:
http://ps-2.retropc.se/basil.holloway/ALL%20PDF/380635.pdf
http://ps-2.kev009.com/rs6000/manuals/p/p615-6C3-6E3/6C3_and_6E3_Users_Guide_SA38-0629.pdf

Current firmware RTAS implementations have no reference to ibm,scan-log-dump,
and a long standing developer for that code has no recollection of its 
existence.

This appears to be a straggler from RPA and Power 4 days. Based on my
understanding that we dropped support Power 4 in mainline this looks pretty
orphaned to me and a solid candidate for removal barring and insight from
someone else that knows better.

+1

Feel free to add my RB tag to any non-RFC followup.

Reviewed-by: Tyrel Datwyler 

> 
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/configs/ppc64_defconfig |   1 -
>  arch/powerpc/configs/pseries_defconfig   |   1 -
>  arch/powerpc/platforms/pseries/Kconfig   |   4 -
>  arch/powerpc/platforms/pseries/Makefile  |   1 -
>  arch/powerpc/platforms/pseries/scanlog.c | 195 ---
>  5 files changed, 202 deletions(-)
>  delete mode 100644 arch/powerpc/platforms/pseries/scanlog.c
> 
> diff --git a/arch/powerpc/configs/ppc64_defconfig 
> b/arch/powerpc/configs/ppc64_defconfig
> index 701811c91a6f..acf13b4917c4 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -26,7 +26,6 @@ CONFIG_PPC64=y
>  CONFIG_NR_CPUS=2048
>  CONFIG_PPC_SPLPAR=y
>  CONFIG_DTL=y
> -CONFIG_SCANLOG=m
>  CONFIG_PPC_SMLPAR=y
>  CONFIG_IBMEBUS=y
>  CONFIG_PPC_SVM=y
> diff --git a/arch/powerpc/configs/pseries_defconfig 
> b/arch/powerpc/configs/pseries_defconfig
> index 50168dde4ea5..d120321e4eea 100644
> --- a/arch/powerpc/configs/pseries_defconfig
> +++ b/arch/powerpc/configs/pseries_defconfig
> @@ -38,7 +38,6 @@ CONFIG_MODULE_SRCVERSION_ALL=y
>  CONFIG_PARTITION_ADVANCED=y
>  CONFIG_PPC_SPLPAR=y
>  CONFIG_DTL=y
> -CONFIG_SCANLOG=m
>  CONFIG_PPC_SMLPAR=y
>  CONFIG_IBMEBUS=y
>  CONFIG_PAPR_SCM=m
> diff --git a/arch/powerpc/platforms/pseries/Kconfig 
> b/arch/powerpc/platforms/pseries/Kconfig
> index 5e037df2a3a1..bf9b612a929b 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -61,10 +61,6 @@ config PSERIES_ENERGY
> Provides: /sys/devices/system/cpu/pseries_(de)activation_hint_list
> and /sys/devices/system/cpu/cpuN/pseries_(de)activation_hint
> 
> -config SCANLOG
> - tristate "Scanlog dump interface"
> - depends on RTAS_PROC && PPC_PSERIES
> -
>  config IO_EVENT_IRQ
>   bool "IO Event Interrupt support"
>   depends on PPC_PSERIES
> diff --git a/arch/powerpc/platforms/pseries/Makefile 
> b/arch/powerpc/platforms/pseries/Makefile
> index c8a2b0b05ac0..754d1102de08 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -8,7 +8,6 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
>  firmware.o power.o dlpar.o mobility.o rng.o \
>  pci.o pci_dlpar.o eeh_pseries.o msi.o
>  obj-$(CONFIG_SMP)+= smp.o
> -obj-$(CONFIG_SCANLOG)+= scanlog.o
>  obj-$(CONFIG_KEXEC_CORE) += kexec.o
>  obj-$(CONFIG_PSERIES_ENERGY) += pseries_energy.o
> 
> diff --git a/arch/powerpc/platforms/pseries/scanlog.c 
> b/arch/powerpc/platforms/pseries/scanlog.c
> deleted file mode 100644
> index 2879c4f0ceb7..
> --- a/arch/powerpc/platforms/pseries/scanlog.c
> +++ /dev/null
> @@ -1,195 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-or-later
> -/*
> - *  c 2001 PPC 64 Team, IBM Corp
> - *
> - * scan-log-data driver for PPC64  Todd Inglett 
> - *
> - * When ppc64 hardware fails the service processor dumps internal state
> - * of the system.  After a reboot the operating system can access a dump
> - * of this data using this driver.  A dump exists if the device-tree
> - * /chosen/ibm,scan-log-data property exists.
> - *
> - * This driver exports /proc/powerpc/scan-log-dump which can be read.
> - * The driver supports only sequential reads.
> - *
> - * The driver looks at a write to the driv

Re: [PATCH v3] pseries/drmem: update LMBs after LPM

2021-04-30 Thread Tyrel Datwyler
On 4/30/21 9:13 AM, Laurent Dufour wrote:
> Le 29/04/2021 à 21:12, Tyrel Datwyler a écrit :
>> On 4/29/21 3:27 AM, Aneesh Kumar K.V wrote:
>>> Laurent Dufour  writes:
>>>
>>>> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
>>>> updated by the hypervisor in the case the NUMA topology of the LPAR's
>>>> memory is updated.
>>>>
>>>> This is caught by the kernel, but the memory's node is updated because
>>>> there is no way to move a memory block between nodes.
>>>>
>>>> If later a memory block is added or removed, drmem_update_dt() is called
>>>> and it is overwriting the DT node to match the added or removed LMB. But
>>>> the LMB's associativity node has not been updated after the DT node update
>>>> and thus the node is overwritten by the Linux's topology instead of the
>>>> hypervisor one.
>>>>
>>>> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
>>>> updated to force an update of the LMB's associativity.
>>>>
>>>> Cc: Tyrel Datwyler 
>>>> Signed-off-by: Laurent Dufour 
>>>> ---
>>>>
>>>> V3:
>>>>   - Check rd->dn->name instead of rd->dn->full_name
>>>> V2:
>>>>   - Take Tyrel's idea to rely on OF_RECONFIG_UPDATE_PROPERTY instead of
>>>>   introducing a new hook mechanism.
>>>> ---
>>>>   arch/powerpc/include/asm/drmem.h  |  1 +
>>>>   arch/powerpc/mm/drmem.c   | 35 +++
>>>>   .../platforms/pseries/hotplug-memory.c    |  4 +++
>>>>   3 files changed, 40 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/drmem.h
>>>> b/arch/powerpc/include/asm/drmem.h
>>>> index bf2402fed3e0..4265d5e95c2c 100644
>>>> --- a/arch/powerpc/include/asm/drmem.h
>>>> +++ b/arch/powerpc/include/asm/drmem.h
>>>> @@ -111,6 +111,7 @@ int drmem_update_dt(void);
>>>>   int __init
>>>>   walk_drmem_lmbs_early(unsigned long node, void *data,
>>>>     int (*func)(struct drmem_lmb *, const __be32 **, void *));
>>>> +void drmem_update_lmbs(struct property *prop);
>>>>   #endif
>>>>     static inline void invalidate_lmb_associativity_index(struct drmem_lmb
>>>> *lmb)
>>>> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
>>>> index 9af3832c9d8d..f0a6633132af 100644
>>>> --- a/arch/powerpc/mm/drmem.c
>>>> +++ b/arch/powerpc/mm/drmem.c
>>>> @@ -307,6 +307,41 @@ int __init walk_drmem_lmbs_early(unsigned long node,
>>>> void *data,
>>>>   return ret;
>>>>   }
>>>>   +/*
>>>> + * Update the LMB associativity index.
>>>> + */
>>>> +static int update_lmb(struct drmem_lmb *updated_lmb,
>>>> +  __maybe_unused const __be32 **usm,
>>>> +  __maybe_unused void *data)
>>>> +{
>>>> +    struct drmem_lmb *lmb;
>>>> +
>>>> +    /*
>>>> + * Brut force there may be better way to fetch the LMB
>>>> + */
>>>> +    for_each_drmem_lmb(lmb) {
>>>> +    if (lmb->drc_index != updated_lmb->drc_index)
>>>> +    continue;
>>>> +
>>>> +    lmb->aa_index = updated_lmb->aa_index;
>>>> +    break;
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Update the LMB associativity index.
>>>> + *
>>>> + * This needs to be called when the hypervisor is updating the
>>>> + * dynamic-reconfiguration-memory node property.
>>>> + */
>>>> +void drmem_update_lmbs(struct property *prop)
>>>> +{
>>>> +    if (!strcmp(prop->name, "ibm,dynamic-memory"))
>>>> +    __walk_drmem_v1_lmbs(prop->value, NULL, NULL, update_lmb);
>>>> +    else if (!strcmp(prop->name, "ibm,dynamic-memory-v2"))
>>>> +    __walk_drmem_v2_lmbs(prop->value, NULL, NULL, update_lmb);
>>>> +}
>>>>   #endif
>>>>     static int init_drmem_lmb_size(struct device_node *dn)
>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> index 8377f1f7c

Re: [PATCH] ppc64/numa: consider the max numa node for migratable LPAR

2021-04-29 Thread Tyrel Datwyler
On 4/29/21 11:19 AM, Laurent Dufour wrote:
> When a LPAR is migratable, we should consider the maximum possible NUMA
> node instead the number of NUMA node from the actual system.
> 
> The DT property 'ibm,current-associativity-domains' is defining the maximum
> number of nodes the LPAR can see when running on that box. But if the LPAR
> is being migrated on another box, it may seen up to the nodes defined by
> 'ibm,max-associativity-domains'. So if a LPAR is migratable, that value
> should be used.
> 
> Unfortunately, there is no easy way to know if a LPAR is migratable or
> not. The hypervisor is exporting the property 'ibm,migratable-partition' in
> the case it set to migrate partition, but that would not mean that the
> current partition is migratable.

Wording is a little hard to follow for me here. From PAPR the
'ibm,migratable-partition' property presence indicates that the platform
supports the potential migration of the partition. I guess maybe the point is
that all migratable partitions define 'ibm,migratable-partition', but all
partitions that define 'ibm,migratable-partition' are not necessarily 
migratable.

-Tyrel

> 
> Without that patch, when a LPAR is started on a 2 nodes box and then
> migrated to a 3 nodes box, the hypervisor may spread the LPAR's CPUs on the
> 3rd node. In that case if a CPU from that 3rd node is added to the LPAR, it
> will be wrongly assigned to the node because the kernel has been set to use
> up to 2 nodes (the configuration of the departure node). With that patch
> applies, the CPU is correctly added to the 3rd node.
> 
> Cc: Srikar Dronamraju 
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/mm/numa.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index f2bf98bdcea2..673fa6e47850 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -893,7 +893,7 @@ static void __init setup_node_data(int nid, u64 
> start_pfn, u64 end_pfn)
>  static void __init find_possible_nodes(void)
>  {
>   struct device_node *rtas;
> - const __be32 *domains;
> + const __be32 *domains = NULL;
>   int prop_length, max_nodes;
>   u32 i;
> 
> @@ -909,9 +909,14 @@ static void __init find_possible_nodes(void)
>* it doesn't exist, then fallback on ibm,max-associativity-domains.
>* Current denotes what the platform can support compared to max
>* which denotes what the Hypervisor can support.
> +  *
> +  * If the LPAR is migratable, new nodes might be activated after a LPM,
> +  * so we should consider the max number in that case.
>*/
> - domains = of_get_property(rtas, "ibm,current-associativity-domains",
> - &prop_length);
> + if (!of_get_property(of_root, "ibm,migratable-partition", NULL))
> + domains = of_get_property(rtas,
> +   "ibm,current-associativity-domains",
> +   &prop_length);
>   if (!domains) {
>   domains = of_get_property(rtas, "ibm,max-associativity-domains",
>   &prop_length);
> @@ -920,6 +925,9 @@ static void __init find_possible_nodes(void)
>   }
> 
>   max_nodes = of_read_number(&domains[min_common_depth], 1);
> + printk(KERN_INFO "Partition configured for %d NUMA nodes.\n",
> +max_nodes);
> +
>   for (i = 0; i < max_nodes; i++) {
>   if (!node_possible(i))
>   node_set(i, node_possible_map);
> 



Re: [PATCH v3] pseries/drmem: update LMBs after LPM

2021-04-29 Thread Tyrel Datwyler
On 4/29/21 3:27 AM, Aneesh Kumar K.V wrote:
> Laurent Dufour  writes:
> 
>> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
>> updated by the hypervisor in the case the NUMA topology of the LPAR's
>> memory is updated.
>>
>> This is caught by the kernel, but the memory's node is updated because
>> there is no way to move a memory block between nodes.
>>
>> If later a memory block is added or removed, drmem_update_dt() is called
>> and it is overwriting the DT node to match the added or removed LMB. But
>> the LMB's associativity node has not been updated after the DT node update
>> and thus the node is overwritten by the Linux's topology instead of the
>> hypervisor one.
>>
>> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
>> updated to force an update of the LMB's associativity.
>>
>> Cc: Tyrel Datwyler 
>> Signed-off-by: Laurent Dufour 
>> ---
>>
>> V3:
>>  - Check rd->dn->name instead of rd->dn->full_name
>> V2:
>>  - Take Tyrel's idea to rely on OF_RECONFIG_UPDATE_PROPERTY instead of
>>  introducing a new hook mechanism.
>> ---
>>  arch/powerpc/include/asm/drmem.h  |  1 +
>>  arch/powerpc/mm/drmem.c   | 35 +++
>>  .../platforms/pseries/hotplug-memory.c|  4 +++
>>  3 files changed, 40 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/drmem.h 
>> b/arch/powerpc/include/asm/drmem.h
>> index bf2402fed3e0..4265d5e95c2c 100644
>> --- a/arch/powerpc/include/asm/drmem.h
>> +++ b/arch/powerpc/include/asm/drmem.h
>> @@ -111,6 +111,7 @@ int drmem_update_dt(void);
>>  int __init
>>  walk_drmem_lmbs_early(unsigned long node, void *data,
>>int (*func)(struct drmem_lmb *, const __be32 **, void *));
>> +void drmem_update_lmbs(struct property *prop);
>>  #endif
>>  
>>  static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
>> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
>> index 9af3832c9d8d..f0a6633132af 100644
>> --- a/arch/powerpc/mm/drmem.c
>> +++ b/arch/powerpc/mm/drmem.c
>> @@ -307,6 +307,41 @@ int __init walk_drmem_lmbs_early(unsigned long node, 
>> void *data,
>>  return ret;
>>  }
>>  
>> +/*
>> + * Update the LMB associativity index.
>> + */
>> +static int update_lmb(struct drmem_lmb *updated_lmb,
>> +  __maybe_unused const __be32 **usm,
>> +  __maybe_unused void *data)
>> +{
>> +struct drmem_lmb *lmb;
>> +
>> +/*
>> + * Brut force there may be better way to fetch the LMB
>> + */
>> +for_each_drmem_lmb(lmb) {
>> +if (lmb->drc_index != updated_lmb->drc_index)
>> +continue;
>> +
>> +lmb->aa_index = updated_lmb->aa_index;
>> +break;
>> +}
>> +return 0;
>> +}
>> +
>> +/*
>> + * Update the LMB associativity index.
>> + *
>> + * This needs to be called when the hypervisor is updating the
>> + * dynamic-reconfiguration-memory node property.
>> + */
>> +void drmem_update_lmbs(struct property *prop)
>> +{
>> +if (!strcmp(prop->name, "ibm,dynamic-memory"))
>> +__walk_drmem_v1_lmbs(prop->value, NULL, NULL, update_lmb);
>> +else if (!strcmp(prop->name, "ibm,dynamic-memory-v2"))
>> +__walk_drmem_v2_lmbs(prop->value, NULL, NULL, update_lmb);
>> +}
>>  #endif
>>  
>>  static int init_drmem_lmb_size(struct device_node *dn)
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 8377f1f7c78e..672ffbee2e78 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -949,6 +949,10 @@ static int pseries_memory_notifier(struct 
>> notifier_block *nb,
>>  case OF_RECONFIG_DETACH_NODE:
>>  err = pseries_remove_mem_node(rd->dn);
>>  break;
>> +case OF_RECONFIG_UPDATE_PROPERTY:
>> +if (!strcmp(rd->dn->name,
>> +"ibm,dynamic-reconfiguration-memory"))
>> +drmem_update_lmbs(rd->prop);
>>  }
>>  return notifier_from_errno(err);
> 
> How will this interact with DLPAR memory? When we dlpar memory,
> ibm,configure-connector is use

Re: [PATCH v6] powerpc/kexec_file: use current CPU info while setting up FDT

2021-04-27 Thread Tyrel Datwyler
On 4/26/21 9:51 PM, Sourabh Jain wrote:
> kexec_file_load uses initial_boot_params in setting up the device-tree
> for the kernel to be loaded. Though initial_boot_params holds info
> about CPUs at the time of boot, it doesn't account for hot added CPUs.
> 
> So, kexec'ing with kexec_file_load syscall would leave the kexec'ed
> kernel with inaccurate CPU info. Also, if kdump kernel is loaded with
> kexec_file_load syscall and the system crashes on a hot added CPU,
> capture kernel hangs failing to identify the boot CPU.
> 
>  Kernel panic - not syncing: sysrq triggered crash
>  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
>  Call Trace:
>  [c000e590fac0] [c07b2400] dump_stack+0xc4/0x114 (unreliable)
>  [c000e590fb00] [c0145290] panic+0x16c/0x41c
>  [c000e590fba0] [c08892e0] sysrq_handle_crash+0x30/0x40
>  [c000e590fc00] [c0889cdc] __handle_sysrq+0xcc/0x1f0
>  [c000e590fca0] [c088a538] write_sysrq_trigger+0xd8/0x178
>  [c000e590fce0] [c05e9b7c] proc_reg_write+0x10c/0x1b0
>  [c000e590fd10] [c04f26d0] vfs_write+0xf0/0x330
>  [c000e590fd60] [c04f2aec] ksys_write+0x7c/0x140
>  [c000e590fdb0] [c0031ee0] system_call_exception+0x150/0x290
>  [c000e590fe10] [c000ca5c] system_call_common+0xec/0x278
>  --- interrupt: c00 at 0x7fff905b9664
>  NIP:  7fff905b9664 LR: 7fff905320c4 CTR: 
>  REGS: c000e590fe80 TRAP: 0c00   Not tainted  (5.12.0-rc5upstream)
>  MSR:  8280f033   CR: 28000242
>XER: 
>  IRQMASK: 0
>  GPR00: 0004 75fedf30 7fff906a7300 0001
>  GPR04: 01002a7355b0 0002 0001 75fef616
>  GPR08: 0001   
>  GPR12:  7fff9073a160  
>  GPR16:    
>  GPR20:  7fff906a4ee0 0002 0001
>  GPR24: 7fff906a0898  0002 01002a7355b0
>  GPR28: 0002 7fff906a1790 01002a7355b0 0002
>  NIP [7fff905b9664] 0x7fff905b9664
>  LR [7fff905320c4] 0x7fff905320c4
>  --- interrupt: c00
> 
> To avoid this from happening, extract current CPU info from of_root
> device node and use it for setting up the fdt in kexec_file_load case.
> 
> Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory 
> reserve map")
> 
> Signed-off-by: Sourabh Jain 
> Reviewed-by: Hari Bathini 
> Cc: 
> ---
>  arch/powerpc/kexec/file_load_64.c | 88 +++
>  1 file changed, 88 insertions(+)
> 
>  ---
> Changelog:
> 
> v1 -> v5
>   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-April/227950.html
> 
> v5 -> v6
>   - use exiting macro (for_each_property_of_node) to loop through all
> properties of a node.
>   - removed devtree_lock while accessing the node properties.
>   - function name update, add_node_prop to add_node_props.
>  ---
> 
> diff --git a/arch/powerpc/kexec/file_load_64.c 
> b/arch/powerpc/kexec/file_load_64.c
> index 02b9e4d0dc40..4f7d4c10f939 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -960,6 +960,89 @@ unsigned int kexec_fdt_totalsize_ppc64(struct kimage 
> *image)
>   return fdt_size;
>  }
> 
> +/**
> + * add_node_props - Reads node properties from device node structure and add
> + *  them to fdt.
> + * @fdt:Flattened device tree of the kernel
> + * @node_offset:offset of the node to add a property at
> + * @dn: device node pointer
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int add_node_props(void *fdt, int node_offset, const struct 
> device_node *dn)
> +{
> + int ret = 0;
> + struct property *pp;
> +
> + if (!dn)
> + return -EINVAL;
> +
> + for_each_property_of_node(dn, pp) {
> + ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, 
> pp->length);
> + if (ret < 0) {
> + pr_err("Unable to add %s property: %s\n", pp->name, 
> fdt_strerror(ret));
> + return ret;
> + }
> + }
> + return ret;
> +}
> +
> +/**
> + * update_cpus_node - Update cpus node of flattened device tree using of_root
> + *device node.
> + * @fdt:  Flattened device tree of the kernel.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int update_cpus_node(void *fdt)
> +{
> + struct device_node *cpus_node, *dn;
> + int cpus_offset, cpus_subnode_offset, ret = 0;
> +
> + cpus_offset = fdt_path_offset(fdt, "/cpus");
> + if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
> + pr_err("Malformed device tree: error reading /cpus node: %s\n",
> +

Re: [PATCH] pseries/drmem: update LMBs after LPM

2021-04-27 Thread Tyrel Datwyler
On 4/27/21 11:13 AM, Laurent Dufour wrote:
> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
> updated by the hypervisor in the case the NUMA topology of the LPAR's
> memory is updated.
> 
> This is caught by the kernel, but the memory's node is updated because
> there is no way to move a memory block between nodes.
> 
> If later a memory block is added or removed, drmem_update_dt() is called
> and it is overwriting the DT node to match the added or removed LMB. But
> the LMB's associativity node has not been updated after the DT node update
> and thus the node is overwritten by the Linux's topology instead of the
> hypervisor one.
> 
> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
> updated to force an update of the LMB's associativity.
> 
> Cc: Tyrel Datwyler 
> Signed-off-by: Laurent Dufour 
> 
> Change since V1:
>  - Take Tyrel's idea to rely on OF_RECONFIG_UPDATE_PROPERTY instead of
>  introducing a new hook mechanism.
> ---
>  arch/powerpc/include/asm/drmem.h  |  1 +
>  arch/powerpc/mm/drmem.c   | 35 +++
>  .../platforms/pseries/hotplug-memory.c|  4 +++
>  3 files changed, 40 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h 
> b/arch/powerpc/include/asm/drmem.h
> index bf2402fed3e0..4265d5e95c2c 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -111,6 +111,7 @@ int drmem_update_dt(void);
>  int __init
>  walk_drmem_lmbs_early(unsigned long node, void *data,
> int (*func)(struct drmem_lmb *, const __be32 **, void *));
> +void drmem_update_lmbs(struct property *prop);
>  #endif
>  
>  static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 9af3832c9d8d..f0a6633132af 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -307,6 +307,41 @@ int __init walk_drmem_lmbs_early(unsigned long node, 
> void *data,
>   return ret;
>  }
>  
> +/*
> + * Update the LMB associativity index.
> + */
> +static int update_lmb(struct drmem_lmb *updated_lmb,
> +   __maybe_unused const __be32 **usm,
> +   __maybe_unused void *data)
> +{
> + struct drmem_lmb *lmb;
> +
> + /*
> +  * Brut force there may be better way to fetch the LMB
> +  */
> + for_each_drmem_lmb(lmb) {
> + if (lmb->drc_index != updated_lmb->drc_index)
> + continue;
> +
> + lmb->aa_index = updated_lmb->aa_index;
> + break;
> + }
> + return 0;
> +}
> +
> +/*
> + * Update the LMB associativity index.
> + *
> + * This needs to be called when the hypervisor is updating the
> + * dynamic-reconfiguration-memory node property.
> + */
> +void drmem_update_lmbs(struct property *prop)
> +{
> + if (!strcmp(prop->name, "ibm,dynamic-memory"))
> + __walk_drmem_v1_lmbs(prop->value, NULL, NULL, update_lmb);
> + else if (!strcmp(prop->name, "ibm,dynamic-memory-v2"))
> + __walk_drmem_v2_lmbs(prop->value, NULL, NULL, update_lmb);
> +}
>  #endif
>  
>  static int init_drmem_lmb_size(struct device_node *dn)
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 8377f1f7c78e..8aabaafc484b 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -949,6 +949,10 @@ static int pseries_memory_notifier(struct notifier_block 
> *nb,
>   case OF_RECONFIG_DETACH_NODE:
>   err = pseries_remove_mem_node(rd->dn);
>   break;
> + case OF_RECONFIG_UPDATE_PROPERTY:
> + if (!strcmp(rd->dn->full_name,

Pretty much a self nit on myself since I just copied the device node name field
from your initial patch into my suggested code block.

It used to be that dn->full_name was intended to store the full device-tree path
name of the device node ane dn->name simply the base name. These days the values
of both name fields are simply the basename for pseries. Regardless,
rd->dn->name is technically correct and shorter.

-Tyrel

> + "ibm,dynamic-reconfiguration-memory"))
> + drmem_update_lmbs(rd->prop);
>   }
>   return notifier_from_errno(err);
>  }
> 



Re: [PATCH] pseries/drmem: update LMBs after LPM

2021-04-27 Thread Tyrel Datwyler
On 4/27/21 8:01 AM, Laurent Dufour wrote:
> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
> updated by the hypervisor in the case the NUMA topology of the LPAR's
> memory is updated.
> 
> This is caught by the kernel, but the memory's node is updated because
> there is no way to move a memory block between nodes.
> 
> If later a memory block is added or removed, drmem_update_dt() is called
> and it is overwriting the DT node to match the added or removed LMB. But
> the LMB's associativity node has not been updated after the DT node update
> and thus the node is overwritten by the Linux's topology instead of the
> hypervisor one.
> 
> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
> updated to force an update of the LMB's associativity.
> 
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/include/asm/drmem.h  |  1 +
>  arch/powerpc/mm/drmem.c   | 48 +++
>  arch/powerpc/platforms/pseries/mobility.c |  9 +
>  3 files changed, 58 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h 
> b/arch/powerpc/include/asm/drmem.h
> index bf2402fed3e0..55c2c25085b0 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -111,6 +111,7 @@ int drmem_update_dt(void);
>  int __init
>  walk_drmem_lmbs_early(unsigned long node, void *data,
> int (*func)(struct drmem_lmb *, const __be32 **, void *));
> +void drmem_update_lmbs(void);
>  #endif
> 
>  static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 9af3832c9d8d..46074bdfdb3c 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -307,6 +307,54 @@ int __init walk_drmem_lmbs_early(unsigned long node, 
> void *data,
>   return ret;
>  }
> 
> +/*
> + * Update the LMB associativity index.
> + */
> +static int update_lmb(struct drmem_lmb *updated_lmb,
> +   __maybe_unused const __be32 **usm,
> +   __maybe_unused void *data)
> +{
> + struct drmem_lmb *lmb;
> +
> + /*
> +  * Brut force there may be better way to fetch the LMB
> +  */
> + for_each_drmem_lmb(lmb) {
> + if (lmb->drc_index != updated_lmb->drc_index)
> + continue;
> +
> + lmb->aa_index = updated_lmb->aa_index;
> + break;
> + }
> + return 0;
> +}
> +
> +/*
> + * Update the LMB associativity index.
> + *
> + * This needs to be called when the hypervisor is updating the
> + * dynamic-reconfiguration-memory node property.
> + */
> +void drmem_update_lmbs(void)
> +{
> + struct device_node *node;
> + const __be32 *prop;
> +
> + node = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> + if (!node)
> + return;
> +
> + prop = of_get_property(node, "ibm,dynamic-memory", NULL);
> + if (prop) {
> + __walk_drmem_v1_lmbs(prop, NULL, NULL, update_lmb);
> + } else {
> + prop = of_get_property(node, "ibm,dynamic-memory-v2", NULL);
> + if (prop)
> + __walk_drmem_v2_lmbs(prop, NULL, NULL, update_lmb);
> + }
> +
> + of_node_put(node);
> +}
>  #endif
> 
>  static int init_drmem_lmb_size(struct device_node *dn)
> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
> b/arch/powerpc/platforms/pseries/mobility.c
> index ea4d6a660e0d..c68eccc6e8df 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -25,6 +25,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include "pseries.h"
>  #include "../../kernel/cacheinfo.h"
> 
> @@ -237,6 +238,7 @@ int pseries_devicetree_update(s32 scope)
>   __be32 *data;
>   int update_nodes_token;
>   int rc;
> + bool drmem_updated = false;
> 
>   update_nodes_token = rtas_token("ibm,update-nodes");
>   if (update_nodes_token == RTAS_UNKNOWN_SERVICE)
> @@ -271,6 +273,10 @@ int pseries_devicetree_update(s32 scope)
>   continue;
>   }
> 
> + if (!strcmp(np->full_name,
> + 
> "ibm,dynamic-reconfiguration-memory"))
> + drmem_updated = true;

Is there a reason that we can't use the existing pseries_memory_notifier()
callback in pseries/hotplug-memory.c to trigger the drmem_update_lmbs() when
either the ibm,dynamic-memory or ibm,dynamic-memory-v2 properties are updated?

Something like:

static int pseries_memory_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
{
struct of_reconfig_data *rd = data;
int err = 0;

switch (action) {
case OF_RECONFIG_ATTACH_NODE:
err = pseries_add_mem_node(rd->dn);
break;
case OF_RECONFIG_DETAC

Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-19 Thread Tyrel Datwyler
On 4/17/21 5:30 AM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
>>> Currently, neither the vio_bus or vio_driver structures provide support
>>> for a shutdown() routine.
>>>
>>> Add support for shutdown() by allowing drivers to provide a
>>> implementation via function pointer in their vio_driver struct and
>>> provide a proper implementation in the driver template for the vio_bus
>>> that calls a vio drivers shutdown() if defined.
>>>
>>> In the case that no shutdown() is defined by a vio driver and a kexec is
>>> in progress we implement a big hammer that calls remove() to ensure no
>>> further DMA for the devices is possible.
>>>
>>> Signed-off-by: Tyrel Datwyler 
>>> ---
>>
>> Ping... any comments, problems with this approach?
> 
> The kexec part seems like a bit of a hack.
> 
> It also doesn't help for kdump, when none of the shutdown code is run.

If I understand correctly for kdump we have a reserved memory space where the
kdump kernel is loaded, but for kexec the memory region isn't reserved ahead of
time meaning we can try and load the kernel over potential memory used for DMA
by the current kernel. Please correct me if I've got that wrong.

> 
> How many drivers do we have? Can we just implement a proper shutdown for
> them?

Well that is the end goal. I just don't currently have the bandwidth to do each
driver myself with a proper shutdown sequence, and thought this was a launching
off point to at least introduce the shutdown callback to the VIO bus.

Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, vmc,
pseries_rng, nx, nx842, hvcs, hvc_vio.

I can drop the kexec_in_progress hammer and just have each driver call remove()
themselves in their shutdown function. Leave it to each maintainer to decide if
remove() is enough or if there is a more lightweight quiesce sequence they
choose to implement.

-Tyrel

> 
> cheers
> 



Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal

2021-04-16 Thread Tyrel Datwyler
On 4/16/21 12:15 AM, Daniel Axtens wrote:
> Hi Tyrel,
> 
>> The pci_bus->bridge reference may no longer be valid after
>> pci_bus_remove() resulting in passing a bad value to device_unregister()
>> for the associated bridge device.
>>
>> Store the host_bridge reference in a separate variable prior to
>> pci_bus_remove().
>>
> The patch certainly seems to do what you say. I'm not really up on the
> innards of PCI, so I'm struggling to figure out by what code path
> pci_bus_remove() might invalidate pci_bus->bridge? A quick look at
> pci_remove_bus was not very illuminating but I didn't chase down every
> call it made.

remove_phb_dynamic()
|--> pci_remove_bus(bus)
 |--> device_unregister(&bus->dev)
  |--> put_device(dev)
   |--> device_release(kobj)
|--> dev->class->dev_release(dev) == release_pci_bus(dev)
 |--> kfree(bus)

We have the above call chain that takes place in the when put_device() triggers
the kobject ref count to go zero. The kobject_release function in this case is
device_release() which in turn calls dev->class->dev_release(dev). For a pci_bus
the class is appropriately pcibus_class whose dev_release() callback points to
release_pci_bus(). This in turn calls kfree() on the bus. Which means we can no
longer safely dereference any fields of the pci_bus struct.

-Tyrel

> 
> Kind regards,
> Daniel
> 
>> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration 
>> during PHB removal")
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c 
>> b/arch/powerpc/platforms/pseries/pci_dlpar.c
>> index f9ae17e8a0f4..a8f9140a24fa 100644
>> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
>> @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic);
>>  int remove_phb_dynamic(struct pci_controller *phb)
>>  {
>>  struct pci_bus *b = phb->bus;
>> +struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge);
>>  struct resource *res;
>>  int rc, i;
>>  
>> @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
>>  /* Remove the PCI bus and unregister the bridge device from sysfs */
>>  phb->bus = NULL;
>>  pci_remove_bus(b);
>> -device_unregister(b->bridge);
>> +host_bridge->bus = NULL;
>> +device_unregister(&host_bridge->dev);
>>  
>>  /* Now release the IO resource */
>>  if (res->flags & IORESOURCE_IO)
>> -- 
>> 2.27.0



Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-16 Thread Tyrel Datwyler
On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
> Currently, neither the vio_bus or vio_driver structures provide support
> for a shutdown() routine.
> 
> Add support for shutdown() by allowing drivers to provide a
> implementation via function pointer in their vio_driver struct and
> provide a proper implementation in the driver template for the vio_bus
> that calls a vio drivers shutdown() if defined.
> 
> In the case that no shutdown() is defined by a vio driver and a kexec is
> in progress we implement a big hammer that calls remove() to ensure no
> further DMA for the devices is possible.
> 
> Signed-off-by: Tyrel Datwyler 
> ---

Ping... any comments, problems with this approach?

-Tyrel


Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal

2021-04-13 Thread Tyrel Datwyler
On 2/11/21 10:24 AM, Tyrel Datwyler wrote:
> The pci_bus->bridge reference may no longer be valid after
> pci_bus_remove() resulting in passing a bad value to device_unregister()
> for the associated bridge device.
> 
> Store the host_bridge reference in a separate variable prior to
> pci_bus_remove().
> 
> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration 
> during PHB removal")
> Signed-off-by: Tyrel Datwyler 

Ping?


[PATCH] ibmvfc: Fix invalid state machine BUG_ON

2021-04-12 Thread Tyrel Datwyler
From: Brian King 

This fixes an issue hitting the BUG_ON in ibmvfc_do_work. When
going through a host action of IBMVFC_HOST_ACTION_RESET,
we change the action to IBMVFC_HOST_ACTION_TGT_DEL,
then drop the host lock, and reset the CRQ, which changes
the host state to IBMVFC_NO_CRQ. If, prior to setting the
host state to IBMVFC_NO_CRQ, ibmvfc_init_host is called,
it can then end up changing the host action to IBMVFC_HOST_ACTION_INIT.
If we then change the host state to IBMVFC_NO_CRQ, we will then
hit the BUG_ON. This patch makes a couple of changes to avoid this.
It leaves the host action to be IBMVFC_HOST_ACTION_RESET
or IBMVFC_HOST_ACTION_REENABLE until after we drop the host
lock and reset or reenable the CRQ. It also hardens the
host state machine to ensure we cannot leave the reset / reenable
state until we've finished processing the reset or reenable.

Fixes: 73ee5d867287 ("[SCSI] ibmvfc: Fix soft lockup on resume")
Signed-off-by: Brian King 
[tyreld: added fixes tag]
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 53 ++
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 61831f2fdb30..f813608d74cc 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -603,8 +603,17 @@ static void ibmvfc_set_host_action(struct ibmvfc_host 
*vhost,
if (vhost->action == IBMVFC_HOST_ACTION_ALLOC_TGTS)
vhost->action = action;
break;
+   case IBMVFC_HOST_ACTION_REENABLE:
+   case IBMVFC_HOST_ACTION_RESET:
+   vhost->action = action;
+   break;
case IBMVFC_HOST_ACTION_INIT:
case IBMVFC_HOST_ACTION_TGT_DEL:
+   case IBMVFC_HOST_ACTION_LOGO:
+   case IBMVFC_HOST_ACTION_QUERY_TGTS:
+   case IBMVFC_HOST_ACTION_TGT_DEL_FAILED:
+   case IBMVFC_HOST_ACTION_NONE:
+   default:
switch (vhost->action) {
case IBMVFC_HOST_ACTION_RESET:
case IBMVFC_HOST_ACTION_REENABLE:
@@ -614,15 +623,6 @@ static void ibmvfc_set_host_action(struct ibmvfc_host 
*vhost,
break;
}
break;
-   case IBMVFC_HOST_ACTION_LOGO:
-   case IBMVFC_HOST_ACTION_QUERY_TGTS:
-   case IBMVFC_HOST_ACTION_TGT_DEL_FAILED:
-   case IBMVFC_HOST_ACTION_NONE:
-   case IBMVFC_HOST_ACTION_RESET:
-   case IBMVFC_HOST_ACTION_REENABLE:
-   default:
-   vhost->action = action;
-   break;
}
 }
 
@@ -5373,30 +5373,45 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
case IBMVFC_HOST_ACTION_INIT_WAIT:
break;
case IBMVFC_HOST_ACTION_RESET:
-   vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
list_splice_init(&vhost->purge, &purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
ibmvfc_complete_purge(&purge);
rc = ibmvfc_reset_crq(vhost);
+
spin_lock_irqsave(vhost->host->host_lock, flags);
-   if (rc == H_CLOSED)
+   if (!rc || rc == H_CLOSED)
vio_enable_interrupts(to_vio_dev(vhost->dev));
-   if (rc || (rc = ibmvfc_send_crq_init(vhost)) ||
-   (rc = vio_enable_interrupts(to_vio_dev(vhost->dev {
-   ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
-   dev_err(vhost->dev, "Error after reset (rc=%d)\n", rc);
+   if (vhost->action == IBMVFC_HOST_ACTION_RESET) {
+   /* The only action we could have changed to would have 
been reenable,
+in which case, we skip the rest of this path and wait 
until
+we've done the re-enable before sending the crq init */
+
+   vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
+
+   if (rc || (rc = ibmvfc_send_crq_init(vhost)) ||
+   (rc = 
vio_enable_interrupts(to_vio_dev(vhost->dev {
+   ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
+   dev_err(vhost->dev, "Error after reset 
(rc=%d)\n", rc);
+   }
}
break;
case IBMVFC_HOST_ACTION_REENABLE:
-   vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
list_splice_init(&vhost->purge, &purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
ibmvfc_complete_purge(&purge);
rc = ibmvfc_reenable_crq_queue(vhost);
+
spin_lock_irqsave(vhost->host->host_lock, flags);
-   if (rc || (rc = ibmvfc_send_crq_init(vhost))) {
-   ibmvfc

[PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-01 Thread Tyrel Datwyler
Currently, neither the vio_bus or vio_driver structures provide support
for a shutdown() routine.

Add support for shutdown() by allowing drivers to provide a
implementation via function pointer in their vio_driver struct and
provide a proper implementation in the driver template for the vio_bus
that calls a vio drivers shutdown() if defined.

In the case that no shutdown() is defined by a vio driver and a kexec is
in progress we implement a big hammer that calls remove() to ensure no
further DMA for the devices is possible.

Signed-off-by: Tyrel Datwyler 
---
 arch/powerpc/include/asm/vio.h   |  1 +
 arch/powerpc/platforms/pseries/vio.c | 16 
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/vio.h b/arch/powerpc/include/asm/vio.h
index 721c0d6715ac..e7479a4abf96 100644
--- a/arch/powerpc/include/asm/vio.h
+++ b/arch/powerpc/include/asm/vio.h
@@ -114,6 +114,7 @@ struct vio_driver {
const struct vio_device_id *id_table;
int (*probe)(struct vio_dev *dev, const struct vio_device_id *id);
void (*remove)(struct vio_dev *dev);
+   void (*shutdown)(struct vio_dev *dev);
/* A driver must have a get_desired_dma() function to
 * be loaded in a CMO environment if it uses DMA.
 */
diff --git a/arch/powerpc/platforms/pseries/vio.c 
b/arch/powerpc/platforms/pseries/vio.c
index 9cb4fc839fd5..d122b8644319 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1278,6 +1279,20 @@ static int vio_bus_remove(struct device *dev)
return 0;
 }
 
+static void vio_bus_shutdown(struct device *dev)
+{
+   struct vio_dev *viodev = to_vio_dev(dev);
+   struct vio_driver *viodrv;
+
+   if (dev->driver) {
+   viodrv = to_vio_driver(dev->driver);
+   if (viodrv->shutdown)
+   viodrv->shutdown(viodev);
+   else if (kexec_in_progress)
+   vio_bus_remove(dev);
+   }
+}
+
 /**
  * vio_register_driver: - Register a new vio driver
  * @viodrv:The vio_driver structure to be registered.
@@ -1613,6 +1628,7 @@ struct bus_type vio_bus_type = {
.match = vio_bus_match,
.probe = vio_bus_probe,
.remove = vio_bus_remove,
+   .shutdown = vio_bus_shutdown,
 };
 
 /**
-- 
2.27.0



Re: [PATCH] powerpc/pseries: Only register vio drivers if vio bus exists

2021-03-30 Thread Tyrel Datwyler
On 3/15/21 6:09 PM, Michael Ellerman wrote:
> The vio bus is a fake bus, which we use on pseries LPARs (guests) to
> discover devices provided by the hypervisor. There's no need or sense
> in creating the vio bus on bare metal systems.
> 
> Which is why commit 4336b9337824 ("powerpc/pseries: Make vio and
> ibmebus initcalls pseries specific") made the initialisation of the
> vio bus only happen in LPARs.
> 
> However as a result of that commit we now see errors at boot on bare
> metal systems:
> 
>   Driver 'hvc_console' was unable to register with bus_type 'vio' because the 
> bus was not initialized.
>   Driver 'tpm_ibmvtpm' was unable to register with bus_type 'vio' because the 
> bus was not initialized.
> 
> This happens because those drivers are built-in, and are calling
> vio_register_driver(). It in turn calls driver_register() with a
> reference to vio_bus_type, but we haven't registered vio_bus_type with
> the driver core.
> 
> Fix it by also guarding vio_register_driver() with a check to see if
> we are on pseries.
> 
> Fixes: 4336b9337824 ("powerpc/pseries: Make vio and ibmebus initcalls pseries 
> specific")
> Reported-by: Paul Menzel 
> Signed-off-by: Michael Ellerman 
> ---

Reviewed-by: Tyrel Datwyler 


[PATCH 1/2] ibmvfc: fix potential race in ibmvfc_wait_for_ops

2021-03-19 Thread Tyrel Datwyler
For various EH activities the ibmvfc driver uses ibmvfc_wait_for_ops()
to wait for the completion of commands that match a given criteria be it
cancel key, or specific LUN. With recent changes commands are completed
outside the lock in bulk by removing them from the sent list and adding
them to a private completion list. This introduces a potential race in
ibmvfc_wait_for_ops() since the criteria for a command to be oustsanding
is no longer simply being on the sent list, but instead not being on the
free list.

Avoid this race by scanning the entire command event pool and checking
that any matching command that ibmvfc needs to wait on is not already on
the free list.

Fixes: 1f4a4a19508d ("scsi: ibmvfc: Complete commands outside the host/queue 
lock")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 42 ++
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f140eafc0d6a..5414b465a92f 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2371,6 +2371,24 @@ static int ibmvfc_match_lun(struct ibmvfc_event *evt, 
void *device)
return 0;
 }
 
+/**
+ * ibmvfc_event_is_free - Check if event is free or not
+ * @evt:   ibmvfc event struct
+ *
+ * Returns:
+ * true / false
+ **/
+static bool ibmvfc_event_is_free(struct ibmvfc_event *evt)
+{
+   struct ibmvfc_event *loop_evt;
+
+   list_for_each_entry(loop_evt, &evt->queue->free, queue_list)
+   if (loop_evt == evt)
+   return true;
+
+   return false;
+}
+
 /**
  * ibmvfc_wait_for_ops - Wait for ops to complete
  * @vhost: ibmvfc host struct
@@ -2385,7 +2403,7 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, 
void *device,
 {
struct ibmvfc_event *evt;
DECLARE_COMPLETION_ONSTACK(comp);
-   int wait;
+   int wait, i;
unsigned long flags;
signed long timeout = IBMVFC_ABORT_WAIT_TIMEOUT * HZ;
 
@@ -2393,10 +2411,13 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host 
*vhost, void *device,
do {
wait = 0;
spin_lock_irqsave(&vhost->crq.l_lock, flags);
-   list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
-   if (match(evt, device)) {
-   evt->eh_comp = ∁
-   wait++;
+   for (i = 0; i < vhost->crq.evt_pool.size; i++) {
+   evt = &vhost->crq.evt_pool.events[i];
+   if (!ibmvfc_event_is_free(evt)) {
+   if (match(evt, device)) {
+   evt->eh_comp = ∁
+   wait++;
+   }
}
}
spin_unlock_irqrestore(&vhost->crq.l_lock, flags);
@@ -2407,10 +2428,13 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host 
*vhost, void *device,
if (!timeout) {
wait = 0;
spin_lock_irqsave(&vhost->crq.l_lock, flags);
-   list_for_each_entry(evt, &vhost->crq.sent, 
queue_list) {
-   if (match(evt, device)) {
-   evt->eh_comp = NULL;
-   wait++;
+   for (i = 0; i < vhost->crq.evt_pool.size; i++) {
+   evt = &vhost->crq.evt_pool.events[i];
+   if (!ibmvfc_event_is_free(evt)) {
+   if (match(evt, device)) {
+   evt->eh_comp = NULL;
+   wait++;
+   }
}
}
spin_unlock_irqrestore(&vhost->crq.l_lock, 
flags);
-- 
2.27.0



[PATCH 2/2] ibmvfc: make ibmvfc_wait_for_ops MQ aware

2021-03-19 Thread Tyrel Datwyler
During MQ enablement of the ibmvfc driver ibmvfc_wait_for_ops() was
missed. This function is responsible for waiting on commands to complete
that match a certain criteria such as LUN or cancel key. The
implementation as is only scans the CRQ for events ignoring any
sub-queues and as a result will exit successfully without doing
anything when operating in MQ channelized mode.

Check the mq and channel use flags to determine which queues are
applicable, and scan each queue accordingly. Note in MQ mode scsi
commands are only issued down sub-queues and the CRQ is only used for
driver specific management commands. As such the CRQ events are ignored
when operating in MQ mode with channels.

Fixes: 9000cb998bcf ("scsi: ibmvfc: Enable MQ and set reasonable defaults")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 51 ++
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 5414b465a92f..61831f2fdb30 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2403,41 +2403,58 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host 
*vhost, void *device,
 {
struct ibmvfc_event *evt;
DECLARE_COMPLETION_ONSTACK(comp);
-   int wait, i;
+   int wait, i, q_index, q_size;
unsigned long flags;
signed long timeout = IBMVFC_ABORT_WAIT_TIMEOUT * HZ;
+   struct ibmvfc_queue *queues;
 
ENTER;
+   if (vhost->mq_enabled && vhost->using_channels) {
+   queues = vhost->scsi_scrqs.scrqs;
+   q_size = vhost->scsi_scrqs.active_queues;
+   } else {
+   queues = &vhost->crq;
+   q_size = 1;
+   }
+
do {
wait = 0;
-   spin_lock_irqsave(&vhost->crq.l_lock, flags);
-   for (i = 0; i < vhost->crq.evt_pool.size; i++) {
-   evt = &vhost->crq.evt_pool.events[i];
-   if (!ibmvfc_event_is_free(evt)) {
-   if (match(evt, device)) {
-   evt->eh_comp = ∁
-   wait++;
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+   for (q_index = 0; q_index < q_size; q_index++) {
+   spin_lock(&queues[q_index].l_lock);
+   for (i = 0; i < queues[q_index].evt_pool.size; i++) {
+   evt = &queues[q_index].evt_pool.events[i];
+   if (!ibmvfc_event_is_free(evt)) {
+   if (match(evt, device)) {
+   evt->eh_comp = ∁
+   wait++;
+   }
}
}
+   spin_unlock(&queues[q_index].l_lock);
}
-   spin_unlock_irqrestore(&vhost->crq.l_lock, flags);
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
 
if (wait) {
timeout = wait_for_completion_timeout(&comp, timeout);
 
if (!timeout) {
wait = 0;
-   spin_lock_irqsave(&vhost->crq.l_lock, flags);
-   for (i = 0; i < vhost->crq.evt_pool.size; i++) {
-   evt = &vhost->crq.evt_pool.events[i];
-   if (!ibmvfc_event_is_free(evt)) {
-   if (match(evt, device)) {
-   evt->eh_comp = NULL;
-   wait++;
+   spin_lock_irqsave(vhost->host->host_lock, 
flags);
+   for (q_index = 0; q_index < q_size; q_index++) {
+   spin_lock(&queues[q_index].l_lock);
+   for (i = 0; i < 
queues[q_index].evt_pool.size; i++) {
+   evt = 
&queues[q_index].evt_pool.events[i];
+   if (!ibmvfc_event_is_free(evt)) 
{
+   if (match(evt, device)) 
{
+   evt->eh_comp = 
NULL;
+   wait++;
+   }
}
}
+   spin_unlock(&queues[q_index].l_lock);
  

[PATCH 0/2] Fix EH race and MQ support

2021-03-19 Thread Tyrel Datwyler
Changes to the locking pattern protecting the event lists and handling of scsi
command completion introduced a race where an ouststanding command that EH is
waiting ifor to complete is no longer identifiable by being on the sent list, 
but
instead as a command that is not on the free list. This is a result of moving
commands to be completed off the sent list to a private list to be completed
outside the list lock.

Second, during MQ enablement the ibmvfc_wait_for_ops helper used during EH to
ensure commands were properely completed failed to be converted to check for
commands on the sub-queues isntead of the primary CRQ.

Tyrel Datwyler (2):
  ibmvfc: fix potential race in ibmvfc_wait_for_ops
  ibmvfc: make ibmvfc_wait_for_ops MQ aware

 drivers/scsi/ibmvscsi/ibmvfc.c | 67 +++---
 1 file changed, 54 insertions(+), 13 deletions(-)

-- 
2.27.0



[PATCH v2] rpadlpar: fix potential drc_name corruption in store functions

2021-03-15 Thread Tyrel Datwyler
Both add_slot_store() and remove_slot_store() try to fix up the drc_name
copied from the store buffer by placing a NULL terminator at nbyte + 1
or in place of a '\n' if present. However, the static buffer that we
copy the drc_name data into is not zeored and can contain anything past
the n-th byte. This is problematic if a '\n' byte appears in that buffer
after nbytes and the string copied into the store buffer was not NULL
terminated to start with as the strchr() search for a '\n' byte will mark
this incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte. The following
debugging shows an example of the drmgr utility writing "PHB 4543" to
the add_slot sysfs attribute, but add_slot_store logging a corrupted
string value.

[135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
[135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by using strscpy() instead of memcpy() to ensure the string is
NULL terminated when copied into the static drc_name buffer. Further,
since the string is now NULL terminated the code only needs to change
'\n' to '\0' when present.

Signed-off-by: Tyrel Datwyler 
---

Changes in v2:
* use strscpy instead of memcpy (suggested by mpe)

 drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
b/drivers/pci/hotplug/rpadlpar_sysfs.c
index cdbfa5df3a51..dbfa0b55d31a 100644
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct 
kobj_attribute *attr,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   strscpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)
-- 
2.27.0



Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-15 Thread Tyrel Datwyler
On 3/14/21 7:52 PM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> On 3/13/21 1:17 AM, Michal Suchánek wrote:
>>> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote:
>>>> Both add_slot_store() and remove_slot_store() try to fix up the drc_name
>>>> copied from the store buffer by placing a NULL terminator at nbyte + 1
>>>> or in place of a '\n' if present. However, the static buffer that we
>>>> copy the drc_name data into is not zeored and can contain anything past
>>>> the n-th byte. This is problematic if a '\n' byte appears in that buffer
>>>> after nbytes and the string copied into the store buffer was not NULL
>>>> terminated to start with as the strchr() search for a '\n' byte will mark
>>>> this incorrectly as the end of the drc_name string resulting in a drc_name
>>>> string that contains garbage data after the n-th byte. The following
>>>> debugging shows an example of the drmgr utility writing "PHB 4543" to
>>>> the add_slot sysfs attribute, but add_slot_store logging a corrupted
>>>> string value.
>>>>
>>>> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
>>>> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
>>>>
>>>> Fix this by NULL terminating the string when we copy it into our static
>>>> buffer by coping nbytes + 1 of data from the store buffer. The code has
>>> Why is it OK to copy nbytes + 1 and why is it expected that the buffer
>>> contains a nul after the content?
>>
>> It is my understanding that the store function buffer is allocated as a
>> zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user 
>> data
>> into. Anything after nbytes would therefore be zeroed.
> 
> I think that's true, but it would be nice if we didn't have to rely on
> that obscure detail in order for this code to be correct & understandable.

I think its a security guarantee, but I guess barring a comment that explicitly
outlines the correctness it probably isn't obvious.

> 
>>> Isn't it much saner to just nul terminate the string after copying?
>>
>> At the cost of an extra line of code, sure.
> 
> Is there a reason we can't use strscpy()? That should deal with all the
> corner cases around the string copy, and then all you have to do is look
> for a newline and turn it into nul.

Fine with me. I'll spin v2 with strscpy().

-Tyrel

> 
> cheers
> 



Re: [PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-14 Thread Tyrel Datwyler
On 3/13/21 1:17 AM, Michal Suchánek wrote:
> On Wed, Mar 10, 2021 at 04:30:21PM -0600, Tyrel Datwyler wrote:
>> Both add_slot_store() and remove_slot_store() try to fix up the drc_name
>> copied from the store buffer by placing a NULL terminator at nbyte + 1
>> or in place of a '\n' if present. However, the static buffer that we
>> copy the drc_name data into is not zeored and can contain anything past
>> the n-th byte. This is problematic if a '\n' byte appears in that buffer
>> after nbytes and the string copied into the store buffer was not NULL
>> terminated to start with as the strchr() search for a '\n' byte will mark
>> this incorrectly as the end of the drc_name string resulting in a drc_name
>> string that contains garbage data after the n-th byte. The following
>> debugging shows an example of the drmgr utility writing "PHB 4543" to
>> the add_slot sysfs attribute, but add_slot_store logging a corrupted
>> string value.
>>
>> [135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
>> [135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19
>>
>> Fix this by NULL terminating the string when we copy it into our static
>> buffer by coping nbytes + 1 of data from the store buffer. The code has
> Why is it OK to copy nbytes + 1 and why is it expected that the buffer
> contains a nul after the content?

It is my understanding that the store function buffer is allocated as a
zeroed-page which the kernel copies up to at most (PAGE_SIZE - 1) of user data
into. Anything after nbytes would therefore be zeroed.

> 
> Isn't it much saner to just nul terminate the string after copying?

At the cost of an extra line of code, sure.

-Tyrel

> 
> diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
> b/drivers/pci/hotplug/rpadlpar_sysfs.c
> index cdbfa5df3a51..cfbad67447da 100644
> --- a/drivers/pci/hotplug/rpadlpar_sysfs.c
> +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
> @@ -35,11 +35,11 @@ static ssize_t add_slot_store(struct kobject *kobj, 
> struct kobj_attribute *attr,
>   return 0;
>  
>   memcpy(drc_name, buf, nbytes);
> + &drc_name[nbytes] = '\0';
>  
>   end = strchr(drc_name, '\n');
> - if (!end)
> - end = &drc_name[nbytes];
> - *end = '\0';
> + if (end)
> + *end = '\0';
>  
>   rc = dlpar_add_slot(drc_name);
>   if (rc)
> @@ -66,11 +66,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
>   return 0;
>  
>   memcpy(drc_name, buf, nbytes);
> + &drc_name[nbytes] = '\0';
>  
>   end = strchr(drc_name, '\n');
> - if (!end)
> - end = &drc_name[nbytes];
> - *end = '\0';
> + if (end)
> + *end = '\0';
>  
>   rc = dlpar_remove_slot(drc_name);
>   if (rc)
> 
> Thanks
> 
> Michal
> 
>> already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store
>> buffer is guaranteed to be zeroed beyond the nth-byte of data copied
>> from the user. Further, since the string is now NULL terminated the code
>> only needs to change '\n' to '\0' when present.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
>>  1 file changed, 6 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
>> b/drivers/pci/hotplug/rpadlpar_sysfs.c
>> index cdbfa5df3a51..375087921284 100644
>> --- a/drivers/pci/hotplug/rpadlpar_sysfs.c
>> +++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
>> @@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, 
>> struct kobj_attribute *attr,
>>  if (nbytes >= MAX_DRC_NAME_LEN)
>>  return 0;
>>  
>> -memcpy(drc_name, buf, nbytes);
>> +memcpy(drc_name, buf, nbytes + 1);
>>  
>>  end = strchr(drc_name, '\n');
>> -if (!end)
>> -end = &drc_name[nbytes];
>> -*end = '\0';
>> +if (end)
>> +*end = '\0';
>>  
>>  rc = dlpar_add_slot(drc_name);
>>  if (rc)
>> @@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
>>  if (nbytes >= MAX_DRC_NAME_LEN)
>>  return 0;
>>  
>> -memcpy(drc_name, buf, nbytes);
>> +memcpy(drc_name, buf, nbytes + 1);
>>  
>>  end = strchr(drc_name, '\n');
>> -if (!end)
>> -end = &drc_name[nbytes];
>> -*end = '\0';
>> +if (end)
>> +*end = '\0';
>>  
>>  rc = dlpar_remove_slot(drc_name);
>>  if (rc)
>> -- 
>> 2.27.0
>>



[PATCH] ibmvfc: free channel_setup_buf during device tear down

2021-03-10 Thread Tyrel Datwyler
The buffer for negotiating channel setup is DMA allocated at device
probe time. However, the remove path fails to free this allocation which
will prevent the hypervisor from releasing the virtual device in the
case of a hotplug remove.

Fix this issue by freeing the buffer allocation in ibmvfc_free_mem().

Fixes: e95eef3fc0bc ("scsi: ibmvfc: Implement channel enquiry and setup 
commands")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index e663085a8944..76531eec49de 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5770,6 +5770,8 @@ static void ibmvfc_free_mem(struct ibmvfc_host *vhost)
  vhost->disc_buf_dma);
dma_free_coherent(vhost->dev, sizeof(*vhost->login_buf),
  vhost->login_buf, vhost->login_buf_dma);
+   dma_free_coherent(vhost->dev, sizeof(*vhost->channel_setup_buf),
+ vhost->channel_setup_buf, vhost->channel_setup_dma);
dma_pool_destroy(vhost->sg_pool);
ibmvfc_free_queue(vhost, async_q);
LEAVE;
-- 
2.27.0



[PATCH] rpadlpar: fix potential drc_name corruption in store functions

2021-03-10 Thread Tyrel Datwyler
Both add_slot_store() and remove_slot_store() try to fix up the drc_name
copied from the store buffer by placing a NULL terminator at nbyte + 1
or in place of a '\n' if present. However, the static buffer that we
copy the drc_name data into is not zeored and can contain anything past
the n-th byte. This is problematic if a '\n' byte appears in that buffer
after nbytes and the string copied into the store buffer was not NULL
terminated to start with as the strchr() search for a '\n' byte will mark
this incorrectly as the end of the drc_name string resulting in a drc_name
string that contains garbage data after the n-th byte. The following
debugging shows an example of the drmgr utility writing "PHB 4543" to
the add_slot sysfs attribute, but add_slot_store logging a corrupted
string value.

[135823.702864] drmgr: drmgr: -c phb -a -s PHB 4543 -d 1
[135823.702879] add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19

Fix this by NULL terminating the string when we copy it into our static
buffer by coping nbytes + 1 of data from the store buffer. The code has
already made sure that nbytes is not >= MAX_DRC_NAME_LEN and the store
buffer is guaranteed to be zeroed beyond the nth-byte of data copied
from the user. Further, since the string is now NULL terminated the code
only needs to change '\n' to '\0' when present.

Signed-off-by: Tyrel Datwyler 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/hotplug/rpadlpar_sysfs.c 
b/drivers/pci/hotplug/rpadlpar_sysfs.c
index cdbfa5df3a51..375087921284 100644
--- a/drivers/pci/hotplug/rpadlpar_sysfs.c
+++ b/drivers/pci/hotplug/rpadlpar_sysfs.c
@@ -34,12 +34,11 @@ static ssize_t add_slot_store(struct kobject *kobj, struct 
kobj_attribute *attr,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   memcpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_add_slot(drc_name);
if (rc)
@@ -65,12 +64,11 @@ static ssize_t remove_slot_store(struct kobject *kobj,
if (nbytes >= MAX_DRC_NAME_LEN)
return 0;
 
-   memcpy(drc_name, buf, nbytes);
+   memcpy(drc_name, buf, nbytes + 1);
 
end = strchr(drc_name, '\n');
-   if (!end)
-   end = &drc_name[nbytes];
-   *end = '\0';
+   if (end)
+   *end = '\0';
 
rc = dlpar_remove_slot(drc_name);
if (rc)
-- 
2.27.0



Re: [PATCH] scsi: ibmvfc: Switch to using the new API kobj_to_dev()

2021-03-04 Thread Tyrel Datwyler
On 3/4/21 1:28 AM, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/scsi/ibmvscsi/ibmvfc.c:3483:60-61: WARNING opportunity for
> kobj_to_dev().
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 

Acked-by: Tyrel Datwyler 

> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 755313b..e5f1ca7 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3480,7 +3480,7 @@ static ssize_t ibmvfc_read_trace(struct file *filp, 
> struct kobject *kobj,
>struct bin_attribute *bin_attr,
>char *buf, loff_t off, size_t count)
>  {
> - struct device *dev = container_of(kobj, struct device, kobj);
> + struct device *dev = kobj_to_dev(kobj);
>   struct Scsi_Host *shost = class_to_shost(dev);
>   struct ibmvfc_host *vhost = shost_priv(shost);
>   unsigned long flags = 0;
> 



[PATCH v5 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-03-02 Thread Tyrel Datwyler
A live partition migration (LPM) results in a CRQ disconnect similar to
a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
transport such that it simply needs to be reenabled. However, the
capabilities may have changed such as fewer channels, or no channels at
all. Further, its possible that there may be sub-CRQ support, but no
channel support. The CRQ reenable path currently doesn't take any of
this into consideration.

For simpilicty release and reinitialize sub-CRQs during reenable, and
set do_enquiry and using_channels with the appropriate values to trigger
channel renegotiation.

fixes: 3034ebe26389 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ef03fa559433..1e2ea21713ad 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
 {
int rc = 0;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
+   unsigned long flags;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Re-enable the CRQ */
do {
@@ -914,6 +917,15 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
if (rc)
dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
 
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+   spin_lock(vhost->crq.q_lock);
+   vhost->do_enquiry = 1;
+   vhost->using_channels = 0;
+   spin_unlock(vhost->crq.q_lock);
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+
+   ibmvfc_init_sub_crqs(vhost);
+
return rc;
 }
 
-- 
2.27.0



[PATCH v5 4/5] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-03-02 Thread Tyrel Datwyler
The H_FREE_SUB_CRQ hypercall can return a retry delay return code that
indicates the call needs to be retried after a specific amount of time
delay. The error path to free a sub-CRQ in case of a failure during
channel registration fails to capture the return code of H_FREE_SUB_CRQ
which will result in the delay loop being skipped in the case of a retry
delay return code.

Store the return code result of the H_FREE_SUB_CRQ call such that the
return code check in the delay loop evaluates a meaningful value. Also,
use the rtas_busy_delay() to check the rc value and delay for the
appropriate amount of time.

Fixes: 39e461fddff0 ("ibmvfc: map/request irq and register Sub-CRQ interrupt 
handler")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 1d9f961715ca..ef03fa559433 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -5670,8 +5671,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
 
 irq_failed:
do {
-   plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
-   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+   rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
+   } while (rtas_busy_delay(rc));
 reg_failed:
ibmvfc_free_queue(vhost, scrq);
LEAVE;
-- 
2.27.0



[PATCH v5 1/5] ibmvfc: simplify handling of sub-CRQ initialization

2021-03-02 Thread Tyrel Datwyler
If ibmvfc_init_sub_crqs() fails ibmvfc_probe() simply parrots
registration failure reported elsewhere, and futher
vhost->scsi_scrq.scrq == NULL is indication enough to the driver that it
has no sub-CRQs available. The mq_enabled check can also be moved into
ibmvfc_init_sub_crqs() such that each caller doesn't have to gate the
call with a mq_enabled check. Finally, in the case of sub-CRQ setup
failure setting do_enquiry can be turned off to putting the driver into
single queue fallback mode.

The aforementioned changes also simplify the next patch in the series
that fixes a hard reset issue, by tying a sub-CRQ setup failure and
do_enquiry logic into ibmvfc_init_sub_crqs().

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 7097028d4cb6..384960036f8b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5705,17 +5705,21 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
LEAVE;
 }
 
-static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
 {
int i, j;
 
ENTER;
+   if (!vhost->mq_enabled)
+   return;
 
vhost->scsi_scrqs.scrqs = kcalloc(nr_scsi_hw_queues,
  sizeof(*vhost->scsi_scrqs.scrqs),
  GFP_KERNEL);
-   if (!vhost->scsi_scrqs.scrqs)
-   return -1;
+   if (!vhost->scsi_scrqs.scrqs) {
+   vhost->do_enquiry = 0;
+   return;
+   }
 
for (i = 0; i < nr_scsi_hw_queues; i++) {
if (ibmvfc_register_scsi_channel(vhost, i)) {
@@ -5724,13 +5728,12 @@ static int ibmvfc_init_sub_crqs(struct ibmvfc_host 
*vhost)
kfree(vhost->scsi_scrqs.scrqs);
vhost->scsi_scrqs.scrqs = NULL;
vhost->scsi_scrqs.active_queues = 0;
-   LEAVE;
-   return -1;
+   vhost->do_enquiry = 0;
+   break;
}
}
 
LEAVE;
-   return 0;
 }
 
 static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
@@ -5997,11 +6000,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
goto remove_shost;
}
 
-   if (vhost->mq_enabled) {
-   rc = ibmvfc_init_sub_crqs(vhost);
-   if (rc)
-   dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", 
rc);
-   }
+   ibmvfc_init_sub_crqs(vhost);
 
if (shost_to_fc_host(shost)->rqst_q)
blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
-- 
2.27.0



[PATCH v5 0/5] ibmvfc: hard reset fixes

2021-03-02 Thread Tyrel Datwyler
This series contains a minor simplification of ibmvfc_init_sub_crqs() followed
by a couple fixes for sub-CRQ handling which effect hard reset of the
client/host adapter CRQ pair.

changes in v5:
Patches 2-5: Corrected upstream commit ids for Fixes: tags

changes in v4:
Patch 2: dropped Reviewed-by tag and moved sub-crq init to after locked region
Patch 5: moved sub-crq init to after locked region

changes in v3:
* Patch 1 & 5: moved ibmvfc_init_sub_crqs out of locked patch

changes in v2:
* added Reviewed-by tags for patches 1-3
* Patch 4: use rtas_busy_delay to test rc and delay correct amount of time
* Patch 5: (new) similar fix for LPM case where CRQ pair needs re-enablement

Tyrel Datwyler (5):
  powerpc/pseries: extract host bridge from pci_bus prior to bus removal
  ibmvfc: simplify handling of sub-CRQ initialization
  ibmvfc: fix invalid sub-CRQ handles after hard reset
  ibmvfc: treat H_CLOSED as success during sub-CRQ registration
  ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

 arch/powerpc/platforms/pseries/pci_dlpar.c |  4 +-
 drivers/scsi/ibmvscsi/ibmvfc.c | 49 ++
 2 files changed, 26 insertions(+), 27 deletions(-)

-- 
2.27.0



[PATCH v5 2/5] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-03-02 Thread Tyrel Datwyler
A hard reset results in a complete transport disconnect such that the
CRQ connection with the partner VIOS is broken. This has the side effect
of also invalidating the associated sub-CRQs. The current code assumes
that the sub-CRQs are perserved resulting in a protocol violation after
trying to reconnect them with the VIOS. This introduces an infinite loop
such that the VIOS forces a disconnect after each subsequent attempt to
re-register with invalid handles.

Avoid the aforementioned issue by releasing the sub-CRQs prior to CRQ
disconnect, and driving a reinitialization of the sub-CRQs once a new
CRQ is registered with the hypervisor.

fixes: 3034ebe26389 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 384960036f8b..d34e1a4f74d9 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -158,6 +158,9 @@ static void ibmvfc_npiv_logout(struct ibmvfc_host *);
 static void ibmvfc_tgt_implicit_logout_and_del(struct ibmvfc_target *);
 static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
 
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *);
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *);
+
 static const char *unknown_error = "unknown error";
 
 static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
@@ -926,8 +929,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
struct ibmvfc_queue *crq = &vhost->crq;
-   struct ibmvfc_queue *scrq;
-   int i;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Close the CRQ */
do {
@@ -947,16 +950,6 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
memset(crq->msgs.crq, 0, PAGE_SIZE);
crq->cur = 0;
 
-   if (vhost->scsi_scrqs.scrqs) {
-   for (i = 0; i < nr_scsi_hw_queues; i++) {
-   scrq = &vhost->scsi_scrqs.scrqs[i];
-   spin_lock(scrq->q_lock);
-   memset(scrq->msgs.scrq, 0, PAGE_SIZE);
-   scrq->cur = 0;
-   spin_unlock(scrq->q_lock);
-   }
-   }
-
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -966,9 +959,12 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
dev_warn(vhost->dev, "Partner adapter not ready\n");
else if (rc != 0)
dev_warn(vhost->dev, "Couldn't register crq (rc=%d)\n", rc);
+
spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
 
+   ibmvfc_init_sub_crqs(vhost);
+
return rc;
 }
 
@@ -5692,6 +5688,7 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
 
free_irq(scrq->irq, scrq);
irq_dispose_mapping(scrq->irq);
+   scrq->irq = 0;
 
do {
rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
-- 
2.27.0



[PATCH v5 3/5] ibmvfc: treat H_CLOSED as success during sub-CRQ registration

2021-03-02 Thread Tyrel Datwyler
A non-zero return code for H_REG_SUB_CRQ is currently treated as a
failure resulting in failing sub-CRQ setup. The case of H_CLOSED should
not be treated as a failure. This return code translates to a successful
sub-CRQ registration by the hypervisor, and is meant to communicate back
that there is currently no partner VIOS CRQ connection established as of
yet. This is a common occurrence during a disconnect where the client
adapter can possibly come back up prior to the partner adapter.

For non-zero return code from H_REG_SUB_CRQ treat a H_CLOSED as success
so that sub-CRQs are successfully setup.

Fixes: 3034ebe26389 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d34e1a4f74d9..1d9f961715ca 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5636,7 +5636,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
   &scrq->cookie, &scrq->hw_irq);
 
-   if (rc) {
+   /* H_CLOSED indicates successful register, but no CRQ partner */
+   if (rc && rc != H_CLOSED) {
dev_warn(dev, "Error registering sub-crq: %d\n", rc);
if (rc == H_PARAMETER)
dev_warn_once(dev, "Firmware may not support MQ\n");
-- 
2.27.0



Re: [PATCH 40/44] tty: hvc, drop unneeded forward declarations

2021-03-02 Thread Tyrel Datwyler
On 3/1/21 10:22 PM, Jiri Slaby wrote:
> Forward declarations make the code larger and rewrites harder. Harder as
> they are often omitted from global changes. Remove forward declarations
> which are not really needed, i.e. the definition of the function is
> before its first use.
> 
> Signed-off-by: Jiri Slaby 
> Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Tyrel Datwyler 

> ---
>  drivers/tty/hvc/hvcs.c | 25 -
>  1 file changed, 25 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
> index c90848919644..0b89d878a108 100644
> --- a/drivers/tty/hvc/hvcs.c
> +++ b/drivers/tty/hvc/hvcs.c
> @@ -290,36 +290,11 @@ static LIST_HEAD(hvcs_structs);
>  static DEFINE_SPINLOCK(hvcs_structs_lock);
>  static DEFINE_MUTEX(hvcs_init_mutex);
> 
> -static void hvcs_unthrottle(struct tty_struct *tty);
> -static void hvcs_throttle(struct tty_struct *tty);
> -static irqreturn_t hvcs_handle_interrupt(int irq, void *dev_instance);
> -
> -static int hvcs_write(struct tty_struct *tty,
> - const unsigned char *buf, int count);
> -static int hvcs_write_room(struct tty_struct *tty);
> -static int hvcs_chars_in_buffer(struct tty_struct *tty);
> -
> -static int hvcs_has_pi(struct hvcs_struct *hvcsd);
> -static void hvcs_set_pi(struct hvcs_partner_info *pi,
> - struct hvcs_struct *hvcsd);
>  static int hvcs_get_pi(struct hvcs_struct *hvcsd);
>  static int hvcs_rescan_devices_list(void);
> 
> -static int hvcs_partner_connect(struct hvcs_struct *hvcsd);
>  static void hvcs_partner_free(struct hvcs_struct *hvcsd);
> 
> -static int hvcs_enable_device(struct hvcs_struct *hvcsd,
> - uint32_t unit_address, unsigned int irq, struct vio_dev *dev);
> -
> -static int hvcs_open(struct tty_struct *tty, struct file *filp);
> -static void hvcs_close(struct tty_struct *tty, struct file *filp);
> -static void hvcs_hangup(struct tty_struct * tty);
> -
> -static int hvcs_probe(struct vio_dev *dev,
> - const struct vio_device_id *id);
> -static int hvcs_remove(struct vio_dev *dev);
> -static int __init hvcs_module_init(void);
> -static void __exit hvcs_module_exit(void);
>  static int hvcs_initialize(void);
> 
>  #define HVCS_SCHED_READ  0x0001
> 



Re: [PATCH v3 2/5] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-02-25 Thread Tyrel Datwyler
On 2/25/21 1:42 PM, Tyrel Datwyler wrote:
> A hard reset results in a complete transport disconnect such that the
> CRQ connection with the partner VIOS is broken. This has the side effect
> of also invalidating the associated sub-CRQs. The current code assumes
> that the sub-CRQs are perserved resulting in a protocol violation after
> trying to reconnect them with the VIOS. This introduces an infinite loop
> such that the VIOS forces a disconnect after each subsequent attempt to
> re-register with invalid handles.
> 
> Avoid the aforementioned issue by releasing the sub-CRQs prior to CRQ
> disconnect, and driving a reinitialization of the sub-CRQs once a new
> CRQ is registered with the hypervisor.
> 
> fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
> Channels")
> Signed-off-by: Tyrel Datwyler 
> Reviewed-by: Brian King 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 384960036f8b..2cca55f2e464 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -158,6 +158,9 @@ static void ibmvfc_npiv_logout(struct ibmvfc_host *);
>  static void ibmvfc_tgt_implicit_logout_and_del(struct ibmvfc_target *);
>  static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
>  
> +static void ibmvfc_release_sub_crqs(struct ibmvfc_host *);
> +static void ibmvfc_init_sub_crqs(struct ibmvfc_host *);
> +
>  static const char *unknown_error = "unknown error";
>  
>  static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
> @@ -926,8 +929,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
>   unsigned long flags;
>   struct vio_dev *vdev = to_vio_dev(vhost->dev);
>   struct ibmvfc_queue *crq = &vhost->crq;
> - struct ibmvfc_queue *scrq;
> - int i;
> +
> + ibmvfc_release_sub_crqs(vhost);
>  
>   /* Close the CRQ */
>   do {
> @@ -936,6 +939,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
>   rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
>   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
>  
> + ibmvfc_init_sub_crqs(vhost);

This has the same issue as patch 5 in that if fail to set up sub-crqs do_enquiry
will be set to zero, but the locked code region below will then flip it back to
one which we don't want.

-T

> +
>   spin_lock_irqsave(vhost->host->host_lock, flags);
>   spin_lock(vhost->crq.q_lock);
>   vhost->state = IBMVFC_NO_CRQ;
> @@ -947,16 +952,6 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
>   memset(crq->msgs.crq, 0, PAGE_SIZE);
>   crq->cur = 0;
>  
> - if (vhost->scsi_scrqs.scrqs) {
> - for (i = 0; i < nr_scsi_hw_queues; i++) {
> - scrq = &vhost->scsi_scrqs.scrqs[i];
> - spin_lock(scrq->q_lock);
> - memset(scrq->msgs.scrq, 0, PAGE_SIZE);
> - scrq->cur = 0;
> - spin_unlock(scrq->q_lock);
> - }
> - }
> -
>   /* And re-open it again */
>   rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
>   crq->msg_token, PAGE_SIZE);
> @@ -966,6 +961,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
>   dev_warn(vhost->dev, "Partner adapter not ready\n");
>   else if (rc != 0)
>   dev_warn(vhost->dev, "Couldn't register crq (rc=%d)\n", rc);
> +
>   spin_unlock(vhost->crq.q_lock);
>   spin_unlock_irqrestore(vhost->host->host_lock, flags);
>  
> @@ -5692,6 +5688,7 @@ static void ibmvfc_deregister_scsi_channel(struct 
> ibmvfc_host *vhost, int index)
>  
>   free_irq(scrq->irq, scrq);
>   irq_dispose_mapping(scrq->irq);
> + scrq->irq = 0;
>  
>   do {
>   rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
> 



Re: [PATCH v3 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-25 Thread Tyrel Datwyler
On 2/25/21 1:42 PM, Tyrel Datwyler wrote:
> A live partition migration (LPM) results in a CRQ disconnect similar to
> a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
> transport such that it simply needs to be reenabled. However, the
> capabilities may have changed such as fewer channels, or no channels at
> all. Further, its possible that there may be sub-CRQ support, but no
> channel support. The CRQ reenable path currently doesn't take any of
> this into consideration.
> 
> For simpilicty release and reinitialize sub-CRQs during reenable, and
> set do_enquiry and using_channels with the appropriate values to trigger
> channel renegotiation.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 1bb08e5f3674..6bbc2697ad5a 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
> *vhost)
>  {
>   int rc = 0;
>   struct vio_dev *vdev = to_vio_dev(vhost->dev);
> + unsigned long flags;
> +
> + ibmvfc_release_sub_crqs(vhost);
>  
>   /* Re-enable the CRQ */
>   do {
> @@ -914,6 +917,15 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
> *vhost)
>   if (rc)
>   dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
>  
> + ibmvfc_init_sub_crqs(vhost);

Realized that if this fails it set the do_enquiry flag to zero which the locked
region below will then flip back to one. Need to move sub-crq init to after
locked region.

-T

> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + spin_lock(vhost->crq.q_lock);
> + vhost->do_enquiry = 1;
> + vhost->using_channels = 0;
> + spin_unlock(vhost->crq.q_lock);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
>   return rc;
>  }
>  
> 



[PATCH v4 2/5] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-02-25 Thread Tyrel Datwyler
A hard reset results in a complete transport disconnect such that the
CRQ connection with the partner VIOS is broken. This has the side effect
of also invalidating the associated sub-CRQs. The current code assumes
that the sub-CRQs are perserved resulting in a protocol violation after
trying to reconnect them with the VIOS. This introduces an infinite loop
such that the VIOS forces a disconnect after each subsequent attempt to
re-register with invalid handles.

Avoid the aforementioned issue by releasing the sub-CRQs prior to CRQ
disconnect, and driving a reinitialization of the sub-CRQs once a new
CRQ is registered with the hypervisor.

fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 384960036f8b..d34e1a4f74d9 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -158,6 +158,9 @@ static void ibmvfc_npiv_logout(struct ibmvfc_host *);
 static void ibmvfc_tgt_implicit_logout_and_del(struct ibmvfc_target *);
 static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
 
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *);
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *);
+
 static const char *unknown_error = "unknown error";
 
 static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
@@ -926,8 +929,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
struct ibmvfc_queue *crq = &vhost->crq;
-   struct ibmvfc_queue *scrq;
-   int i;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Close the CRQ */
do {
@@ -947,16 +950,6 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
memset(crq->msgs.crq, 0, PAGE_SIZE);
crq->cur = 0;
 
-   if (vhost->scsi_scrqs.scrqs) {
-   for (i = 0; i < nr_scsi_hw_queues; i++) {
-   scrq = &vhost->scsi_scrqs.scrqs[i];
-   spin_lock(scrq->q_lock);
-   memset(scrq->msgs.scrq, 0, PAGE_SIZE);
-   scrq->cur = 0;
-   spin_unlock(scrq->q_lock);
-   }
-   }
-
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -966,9 +959,12 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
dev_warn(vhost->dev, "Partner adapter not ready\n");
else if (rc != 0)
dev_warn(vhost->dev, "Couldn't register crq (rc=%d)\n", rc);
+
spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
 
+   ibmvfc_init_sub_crqs(vhost);
+
return rc;
 }
 
@@ -5692,6 +5688,7 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
 
free_irq(scrq->irq, scrq);
irq_dispose_mapping(scrq->irq);
+   scrq->irq = 0;
 
do {
rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
-- 
2.27.0



[PATCH v4 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-25 Thread Tyrel Datwyler
A live partition migration (LPM) results in a CRQ disconnect similar to
a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
transport such that it simply needs to be reenabled. However, the
capabilities may have changed such as fewer channels, or no channels at
all. Further, its possible that there may be sub-CRQ support, but no
channel support. The CRQ reenable path currently doesn't take any of
this into consideration.

For simpilicty release and reinitialize sub-CRQs during reenable, and
set do_enquiry and using_channels with the appropriate values to trigger
channel renegotiation.

fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ef03fa559433..1e2ea21713ad 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
 {
int rc = 0;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
+   unsigned long flags;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Re-enable the CRQ */
do {
@@ -914,6 +917,15 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
if (rc)
dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
 
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+   spin_lock(vhost->crq.q_lock);
+   vhost->do_enquiry = 1;
+   vhost->using_channels = 0;
+   spin_unlock(vhost->crq.q_lock);
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+
+   ibmvfc_init_sub_crqs(vhost);
+
return rc;
 }
 
-- 
2.27.0



[PATCH v4 0/5] ibmvfc: hard reset fixes

2021-02-25 Thread Tyrel Datwyler
This series contains a minor simplification of ibmvfc_init_sub_crqs() followed
by a couple fixes for sub-CRQ handling which effect hard reset of the
client/host adapter CRQ pair.

changes in v4:
Patch 2: dropped Reviewed-by tag and moved sub-crq init to after locked region
Patch 5: moved sub-crq init to after locked region

changes in v3:
* Patch 1 & 5: moved ibmvfc_init_sub_crqs out of locked patch

changes in v2:
* added Reviewed-by tags for patches 1-3
* Patch 4: use rtas_busy_delay to test rc and delay correct amount of time
* Patch 5: (new) similar fix for LPM case where CRQ pair needs re-enablement

Tyrel Datwyler (5):
  powerpc/pseries: extract host bridge from pci_bus prior to bus removal
  ibmvfc: simplify handling of sub-CRQ initialization
  ibmvfc: fix invalid sub-CRQ handles after hard reset
  ibmvfc: treat H_CLOSED as success during sub-CRQ registration
  ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

 arch/powerpc/platforms/pseries/pci_dlpar.c |  4 +-
 drivers/scsi/ibmvscsi/ibmvfc.c | 49 ++
 2 files changed, 26 insertions(+), 27 deletions(-)

-- 
2.27.0



[PATCH v4 3/5] ibmvfc: treat H_CLOSED as success during sub-CRQ registration

2021-02-25 Thread Tyrel Datwyler
A non-zero return code for H_REG_SUB_CRQ is currently treated as a
failure resulting in failing sub-CRQ setup. The case of H_CLOSED should
not be treated as a failure. This return code translates to a successful
sub-CRQ registration by the hypervisor, and is meant to communicate back
that there is currently no partner VIOS CRQ connection established as of
yet. This is a common occurrence during a disconnect where the client
adapter can possibly come back up prior to the partner adapter.

For non-zero return code from H_REG_SUB_CRQ treat a H_CLOSED as success
so that sub-CRQs are successfully setup.

Fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d34e1a4f74d9..1d9f961715ca 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5636,7 +5636,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
   &scrq->cookie, &scrq->hw_irq);
 
-   if (rc) {
+   /* H_CLOSED indicates successful register, but no CRQ partner */
+   if (rc && rc != H_CLOSED) {
dev_warn(dev, "Error registering sub-crq: %d\n", rc);
if (rc == H_PARAMETER)
dev_warn_once(dev, "Firmware may not support MQ\n");
-- 
2.27.0



[PATCH v4 4/5] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-25 Thread Tyrel Datwyler
The H_FREE_SUB_CRQ hypercall can return a retry delay return code that
indicates the call needs to be retried after a specific amount of time
delay. The error path to free a sub-CRQ in case of a failure during
channel registration fails to capture the return code of H_FREE_SUB_CRQ
which will result in the delay loop being skipped in the case of a retry
delay return code.

Store the return code result of the H_FREE_SUB_CRQ call such that the
return code check in the delay loop evaluates a meaningful value. Also,
use the rtas_busy_delay() to check the rc value and delay for the
appropriate amount of time.

Fixes: 9288d35d70b5 ("ibmvfc: map/request irq and register Sub-CRQ interrupt 
handler")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 1d9f961715ca..ef03fa559433 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -5670,8 +5671,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
 
 irq_failed:
do {
-   plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
-   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+   rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
+   } while (rtas_busy_delay(rc));
 reg_failed:
ibmvfc_free_queue(vhost, scrq);
LEAVE;
-- 
2.27.0



[PATCH v4 1/5] ibmvfc: simplify handling of sub-CRQ initialization

2021-02-25 Thread Tyrel Datwyler
If ibmvfc_init_sub_crqs() fails ibmvfc_probe() simply parrots
registration failure reported elsewhere, and futher
vhost->scsi_scrq.scrq == NULL is indication enough to the driver that it
has no sub-CRQs available. The mq_enabled check can also be moved into
ibmvfc_init_sub_crqs() such that each caller doesn't have to gate the
call with a mq_enabled check. Finally, in the case of sub-CRQ setup
failure setting do_enquiry can be turned off to putting the driver into
single queue fallback mode.

The aforementioned changes also simplify the next patch in the series
that fixes a hard reset issue, by tying a sub-CRQ setup failure and
do_enquiry logic into ibmvfc_init_sub_crqs().

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 7097028d4cb6..384960036f8b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5705,17 +5705,21 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
LEAVE;
 }
 
-static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
 {
int i, j;
 
ENTER;
+   if (!vhost->mq_enabled)
+   return;
 
vhost->scsi_scrqs.scrqs = kcalloc(nr_scsi_hw_queues,
  sizeof(*vhost->scsi_scrqs.scrqs),
  GFP_KERNEL);
-   if (!vhost->scsi_scrqs.scrqs)
-   return -1;
+   if (!vhost->scsi_scrqs.scrqs) {
+   vhost->do_enquiry = 0;
+   return;
+   }
 
for (i = 0; i < nr_scsi_hw_queues; i++) {
if (ibmvfc_register_scsi_channel(vhost, i)) {
@@ -5724,13 +5728,12 @@ static int ibmvfc_init_sub_crqs(struct ibmvfc_host 
*vhost)
kfree(vhost->scsi_scrqs.scrqs);
vhost->scsi_scrqs.scrqs = NULL;
vhost->scsi_scrqs.active_queues = 0;
-   LEAVE;
-   return -1;
+   vhost->do_enquiry = 0;
+   break;
}
}
 
LEAVE;
-   return 0;
 }
 
 static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
@@ -5997,11 +6000,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
goto remove_shost;
}
 
-   if (vhost->mq_enabled) {
-   rc = ibmvfc_init_sub_crqs(vhost);
-   if (rc)
-   dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", 
rc);
-   }
+   ibmvfc_init_sub_crqs(vhost);
 
if (shost_to_fc_host(shost)->rqst_q)
blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
-- 
2.27.0



[PATCH v3 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-25 Thread Tyrel Datwyler
A live partition migration (LPM) results in a CRQ disconnect similar to
a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
transport such that it simply needs to be reenabled. However, the
capabilities may have changed such as fewer channels, or no channels at
all. Further, its possible that there may be sub-CRQ support, but no
channel support. The CRQ reenable path currently doesn't take any of
this into consideration.

For simpilicty release and reinitialize sub-CRQs during reenable, and
set do_enquiry and using_channels with the appropriate values to trigger
channel renegotiation.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 1bb08e5f3674..6bbc2697ad5a 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
 {
int rc = 0;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
+   unsigned long flags;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Re-enable the CRQ */
do {
@@ -914,6 +917,15 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
if (rc)
dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
 
+   ibmvfc_init_sub_crqs(vhost);
+
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+   spin_lock(vhost->crq.q_lock);
+   vhost->do_enquiry = 1;
+   vhost->using_channels = 0;
+   spin_unlock(vhost->crq.q_lock);
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+
return rc;
 }
 
-- 
2.27.0



[PATCH v3 4/5] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-25 Thread Tyrel Datwyler
The H_FREE_SUB_CRQ hypercall can return a retry delay return code that
indicates the call needs to be retried after a specific amount of time
delay. The error path to free a sub-CRQ in case of a failure during
channel registration fails to capture the return code of H_FREE_SUB_CRQ
which will result in the delay loop being skipped in the case of a retry
delay return code.

Store the return code result of the H_FREE_SUB_CRQ call such that the
return code check in the delay loop evaluates a meaningful value. Also,
use the rtas_busy_delay() to check the rc value and delay for the
appropriate amount of time.

Fixes: 9288d35d70b5 ("ibmvfc: map/request irq and register Sub-CRQ interrupt 
handler")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 274c5a1fac9c..1bb08e5f3674 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -5670,8 +5671,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
 
 irq_failed:
do {
-   plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
-   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+   rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
+   } while (rtas_busy_delay(rc));
 reg_failed:
ibmvfc_free_queue(vhost, scrq);
LEAVE;
-- 
2.27.0



[PATCH v3 3/5] ibmvfc: treat H_CLOSED as success during sub-CRQ registration

2021-02-25 Thread Tyrel Datwyler
A non-zero return code for H_REG_SUB_CRQ is currently treated as a
failure resulting in failing sub-CRQ setup. The case of H_CLOSED should
not be treated as a failure. This return code translates to a successful
sub-CRQ registration by the hypervisor, and is meant to communicate back
that there is currently no partner VIOS CRQ connection established as of
yet. This is a common occurrence during a disconnect where the client
adapter can possibly come back up prior to the partner adapter.

For non-zero return code from H_REG_SUB_CRQ treat a H_CLOSED as success
so that sub-CRQs are successfully setup.

Fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 2cca55f2e464..274c5a1fac9c 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5636,7 +5636,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
   &scrq->cookie, &scrq->hw_irq);
 
-   if (rc) {
+   /* H_CLOSED indicates successful register, but no CRQ partner */
+   if (rc && rc != H_CLOSED) {
dev_warn(dev, "Error registering sub-crq: %d\n", rc);
if (rc == H_PARAMETER)
dev_warn_once(dev, "Firmware may not support MQ\n");
-- 
2.27.0



[PATCH v3 0/5] ibmvfc: hard reset fixes

2021-02-25 Thread Tyrel Datwyler
This series contains a minor simplification of ibmvfc_init_sub_crqs() followed
by a couple fixes for sub-CRQ handling which effect hard reset of the
client/host adapter CRQ pair.

changes in v3:
* Patch 1 & 5: moved ibmvfc_init_sub_crqs out of locked patch

changes in v2:
* added Reviewed-by tags for patches 1-3
* Patch 4: use rtas_busy_delay to test rc and delay correct amount of time
* Patch 5: (new) similar fix for LPM case where CRQ pair needs re-enablement

Tyrel Datwyler (5):
  powerpc/pseries: extract host bridge from pci_bus prior to bus removal
  ibmvfc: simplify handling of sub-CRQ initialization
  ibmvfc: fix invalid sub-CRQ handles after hard reset
  ibmvfc: treat H_CLOSED as success during sub-CRQ registration
  ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

 arch/powerpc/platforms/pseries/pci_dlpar.c |  4 +-
 drivers/scsi/ibmvscsi/ibmvfc.c | 49 ++
 2 files changed, 26 insertions(+), 27 deletions(-)

-- 
2.27.0



[PATCH v3 1/5] ibmvfc: simplify handling of sub-CRQ initialization

2021-02-25 Thread Tyrel Datwyler
If ibmvfc_init_sub_crqs() fails ibmvfc_probe() simply parrots
registration failure reported elsewhere, and futher
vhost->scsi_scrq.scrq == NULL is indication enough to the driver that it
has no sub-CRQs available. The mq_enabled check can also be moved into
ibmvfc_init_sub_crqs() such that each caller doesn't have to gate the
call with a mq_enabled check. Finally, in the case of sub-CRQ setup
failure setting do_enquiry can be turned off to putting the driver into
single queue fallback mode.

The aforementioned changes also simplify the next patch in the series
that fixes a hard reset issue, by tying a sub-CRQ setup failure and
do_enquiry logic into ibmvfc_init_sub_crqs().

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 7097028d4cb6..384960036f8b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5705,17 +5705,21 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
LEAVE;
 }
 
-static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
 {
int i, j;
 
ENTER;
+   if (!vhost->mq_enabled)
+   return;
 
vhost->scsi_scrqs.scrqs = kcalloc(nr_scsi_hw_queues,
  sizeof(*vhost->scsi_scrqs.scrqs),
  GFP_KERNEL);
-   if (!vhost->scsi_scrqs.scrqs)
-   return -1;
+   if (!vhost->scsi_scrqs.scrqs) {
+   vhost->do_enquiry = 0;
+   return;
+   }
 
for (i = 0; i < nr_scsi_hw_queues; i++) {
if (ibmvfc_register_scsi_channel(vhost, i)) {
@@ -5724,13 +5728,12 @@ static int ibmvfc_init_sub_crqs(struct ibmvfc_host 
*vhost)
kfree(vhost->scsi_scrqs.scrqs);
vhost->scsi_scrqs.scrqs = NULL;
vhost->scsi_scrqs.active_queues = 0;
-   LEAVE;
-   return -1;
+   vhost->do_enquiry = 0;
+   break;
}
}
 
LEAVE;
-   return 0;
 }
 
 static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
@@ -5997,11 +6000,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
goto remove_shost;
}
 
-   if (vhost->mq_enabled) {
-   rc = ibmvfc_init_sub_crqs(vhost);
-   if (rc)
-   dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", 
rc);
-   }
+   ibmvfc_init_sub_crqs(vhost);
 
if (shost_to_fc_host(shost)->rqst_q)
blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
-- 
2.27.0



[PATCH v3 2/5] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-02-25 Thread Tyrel Datwyler
A hard reset results in a complete transport disconnect such that the
CRQ connection with the partner VIOS is broken. This has the side effect
of also invalidating the associated sub-CRQs. The current code assumes
that the sub-CRQs are perserved resulting in a protocol violation after
trying to reconnect them with the VIOS. This introduces an infinite loop
such that the VIOS forces a disconnect after each subsequent attempt to
re-register with invalid handles.

Avoid the aforementioned issue by releasing the sub-CRQs prior to CRQ
disconnect, and driving a reinitialization of the sub-CRQs once a new
CRQ is registered with the hypervisor.

fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 384960036f8b..2cca55f2e464 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -158,6 +158,9 @@ static void ibmvfc_npiv_logout(struct ibmvfc_host *);
 static void ibmvfc_tgt_implicit_logout_and_del(struct ibmvfc_target *);
 static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
 
+static void ibmvfc_release_sub_crqs(struct ibmvfc_host *);
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *);
+
 static const char *unknown_error = "unknown error";
 
 static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
@@ -926,8 +929,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
struct ibmvfc_queue *crq = &vhost->crq;
-   struct ibmvfc_queue *scrq;
-   int i;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Close the CRQ */
do {
@@ -936,6 +939,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
 
+   ibmvfc_init_sub_crqs(vhost);
+
spin_lock_irqsave(vhost->host->host_lock, flags);
spin_lock(vhost->crq.q_lock);
vhost->state = IBMVFC_NO_CRQ;
@@ -947,16 +952,6 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
memset(crq->msgs.crq, 0, PAGE_SIZE);
crq->cur = 0;
 
-   if (vhost->scsi_scrqs.scrqs) {
-   for (i = 0; i < nr_scsi_hw_queues; i++) {
-   scrq = &vhost->scsi_scrqs.scrqs[i];
-   spin_lock(scrq->q_lock);
-   memset(scrq->msgs.scrq, 0, PAGE_SIZE);
-   scrq->cur = 0;
-   spin_unlock(scrq->q_lock);
-   }
-   }
-
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -966,6 +961,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
dev_warn(vhost->dev, "Partner adapter not ready\n");
else if (rc != 0)
dev_warn(vhost->dev, "Couldn't register crq (rc=%d)\n", rc);
+
spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
 
@@ -5692,6 +5688,7 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
 
free_irq(scrq->irq, scrq);
irq_dispose_mapping(scrq->irq);
+   scrq->irq = 0;
 
do {
rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
-- 
2.27.0



Re: [PATCH v2 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-25 Thread Tyrel Datwyler
On 2/25/21 12:48 PM, Tyrel Datwyler wrote:
> A live partition migration (LPM) results in a CRQ disconnect similar to
> a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
> transport such that it simply needs to be reenabled. However, the
> capabilities may have changed such as fewer channels, or no channels at
> all. Further, its possible that there may be sub-CRQ support, but no
> channel support. The CRQ reenable path currently doesn't take any of
> this into consideration.
> 
> For simpilicty release and reinitialize sub-CRQs during reenable, and
> set do_enquiry and using_channels with the appropriate values to trigger
> channel renegotiation.
> 
This should of had the same fixes tag as patch 2.

fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")

> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 4ac2c442e1e2..9ae6be56e375 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
> *vhost)
>  {
>   int rc = 0;
>   struct vio_dev *vdev = to_vio_dev(vhost->dev);
> + unsigned long flags;
> +
> + ibmvfc_release_sub_crqs(vhost);
>  
>   /* Re-enable the CRQ */
>   do {
> @@ -914,6 +917,16 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
> *vhost)
>   if (rc)
>   dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
>  
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + spin_lock(vhost->crq.q_lock);
> + vhost->do_enquiry = 1;
> + vhost->using_channels = 0;
> +
> + ibmvfc_init_sub_crqs(vhost);
> +
> + spin_unlock(vhost->crq.q_lock);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
>   return rc;
>  }
>  
> 



[PATCH v2 4/5] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-25 Thread Tyrel Datwyler
The H_FREE_SUB_CRQ hypercall can return a retry delay return code that
indicates the call needs to be retried after a specific amount of time
delay. The error path to free a sub-CRQ in case of a failure during
channel registration fails to capture the return code of H_FREE_SUB_CRQ
which will result in the delay loop being skipped in the case of a retry
delay return code.

Store the return code result of the H_FREE_SUB_CRQ call such that the
return code check in the delay loop evaluates a meaningful value. Also,
use the rtas_busy_delay() to check the rc value and delay for the
appropriate amount of time.

Fixes: 9288d35d70b5 ("ibmvfc: map/request irq and register Sub-CRQ interrupt 
handler")
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ba6fcf9cbc57..4ac2c442e1e2 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -5670,8 +5671,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
 
 irq_failed:
do {
-   plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
-   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
+   rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
scrq->cookie);
+   } while (rtas_busy_delay(rc));
 reg_failed:
ibmvfc_free_queue(vhost, scrq);
LEAVE;
-- 
2.27.0



[PATCH v2 0/5] ibmvfc: hard reset fixes

2021-02-25 Thread Tyrel Datwyler
This series contains a minor simplification of ibmvfc_init_sub_crqs() followed
by a couple fixes for sub-CRQ handling which effect hard reset of the
client/host adapter CRQ pair.

changes in v2:
* added Reviewed-by tags for patches 1-3
* Patch 4: use rtas_busy_delay to test rc and delay correct amount of time
* Patch 5: (new) similar fix for LPM case where CRQ pair needs re-enablement

Tyrel Datwyler (5):
  powerpc/pseries: extract host bridge from pci_bus prior to bus removal
  ibmvfc: simplify handling of sub-CRQ initialization
  ibmvfc: fix invalid sub-CRQ handles after hard reset
  ibmvfc: treat H_CLOSED as success during sub-CRQ registration
  ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

 arch/powerpc/platforms/pseries/pci_dlpar.c |  4 +-
 drivers/scsi/ibmvscsi/ibmvfc.c | 49 ++
 2 files changed, 26 insertions(+), 27 deletions(-)

-- 
2.27.0



[PATCH v2 1/5] ibmvfc: simplify handling of sub-CRQ initialization

2021-02-25 Thread Tyrel Datwyler
If ibmvfc_init_sub_crqs() fails ibmvfc_probe() simply parrots
registration failure reported elsewhere, and futher
vhost->scsi_scrq.scrq == NULL is indication enough to the driver that it
has no sub-CRQs available. The mq_enabled check can also be moved into
ibmvfc_init_sub_crqs() such that each caller doesn't have to gate the
call with a mq_enabled check. Finally, in the case of sub-CRQ setup
failure setting do_enquiry can be turned off to putting the driver into
single queue fallback mode.

The aforementioned changes also simplify the next patch in the series
that fixes a hard reset issue, by tying a sub-CRQ setup failure and
do_enquiry logic into ibmvfc_init_sub_crqs().

Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 7097028d4cb6..384960036f8b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5705,17 +5705,21 @@ static void ibmvfc_deregister_scsi_channel(struct 
ibmvfc_host *vhost, int index)
LEAVE;
 }
 
-static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
+static void ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
 {
int i, j;
 
ENTER;
+   if (!vhost->mq_enabled)
+   return;
 
vhost->scsi_scrqs.scrqs = kcalloc(nr_scsi_hw_queues,
  sizeof(*vhost->scsi_scrqs.scrqs),
  GFP_KERNEL);
-   if (!vhost->scsi_scrqs.scrqs)
-   return -1;
+   if (!vhost->scsi_scrqs.scrqs) {
+   vhost->do_enquiry = 0;
+   return;
+   }
 
for (i = 0; i < nr_scsi_hw_queues; i++) {
if (ibmvfc_register_scsi_channel(vhost, i)) {
@@ -5724,13 +5728,12 @@ static int ibmvfc_init_sub_crqs(struct ibmvfc_host 
*vhost)
kfree(vhost->scsi_scrqs.scrqs);
vhost->scsi_scrqs.scrqs = NULL;
vhost->scsi_scrqs.active_queues = 0;
-   LEAVE;
-   return -1;
+   vhost->do_enquiry = 0;
+   break;
}
}
 
LEAVE;
-   return 0;
 }
 
 static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
@@ -5997,11 +6000,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
goto remove_shost;
}
 
-   if (vhost->mq_enabled) {
-   rc = ibmvfc_init_sub_crqs(vhost);
-   if (rc)
-   dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", 
rc);
-   }
+   ibmvfc_init_sub_crqs(vhost);
 
if (shost_to_fc_host(shost)->rqst_q)
blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1);
-- 
2.27.0



[PATCH v2 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-25 Thread Tyrel Datwyler
A live partition migration (LPM) results in a CRQ disconnect similar to
a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
transport such that it simply needs to be reenabled. However, the
capabilities may have changed such as fewer channels, or no channels at
all. Further, its possible that there may be sub-CRQ support, but no
channel support. The CRQ reenable path currently doesn't take any of
this into consideration.

For simpilicty release and reinitialize sub-CRQs during reenable, and
set do_enquiry and using_channels with the appropriate values to trigger
channel renegotiation.

Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 4ac2c442e1e2..9ae6be56e375 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
 {
int rc = 0;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
+   unsigned long flags;
+
+   ibmvfc_release_sub_crqs(vhost);
 
/* Re-enable the CRQ */
do {
@@ -914,6 +917,16 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
*vhost)
if (rc)
dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
 
+   spin_lock_irqsave(vhost->host->host_lock, flags);
+   spin_lock(vhost->crq.q_lock);
+   vhost->do_enquiry = 1;
+   vhost->using_channels = 0;
+
+   ibmvfc_init_sub_crqs(vhost);
+
+   spin_unlock(vhost->crq.q_lock);
+   spin_unlock_irqrestore(vhost->host->host_lock, flags);
+
return rc;
 }
 
-- 
2.27.0



[PATCH v2 3/5] ibmvfc: treat H_CLOSED as success during sub-CRQ registration

2021-02-25 Thread Tyrel Datwyler
A non-zero return code for H_REG_SUB_CRQ is currently treated as a
failure resulting in failing sub-CRQ setup. The case of H_CLOSED should
not be treated as a failure. This return code translates to a successful
sub-CRQ registration by the hypervisor, and is meant to communicate back
that there is currently no partner VIOS CRQ connection established as of
yet. This is a common occurrence during a disconnect where the client
adapter can possibly come back up prior to the partner adapter.

For non-zero return code from H_REG_SUB_CRQ treat a H_CLOSED as success
so that sub-CRQs are successfully setup.

Fixes: faacf8c5f1d5 ("ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ 
Channels")
Signed-off-by: Tyrel Datwyler 
Reviewed-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 3dd20f383453..ba6fcf9cbc57 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5636,7 +5636,8 @@ static int ibmvfc_register_scsi_channel(struct 
ibmvfc_host *vhost,
rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
   &scrq->cookie, &scrq->hw_irq);
 
-   if (rc) {
+   /* H_CLOSED indicates successful register, but no CRQ partner */
+   if (rc && rc != H_CLOSED) {
dev_warn(dev, "Error registering sub-crq: %d\n", rc);
if (rc == H_PARAMETER)
dev_warn_once(dev, "Firmware may not support MQ\n");
-- 
2.27.0



  1   2   3   4   5   6   7   >