Re: [PATCH 13/13] accel/habanalabs: modify pci health check
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Ofir Bitton Today we read PCI VENDOR-ID in order to make sure PCI link is healthy. Apparently the VENDOR-ID might be stored on host and hence, when we read it we might not access the PCI bus. In order to make sure PCI health check is reliable, we will start checking the DEVICE-ID instead. What's keeping some system from caching that as well? Since this is checking for PCIe link health, it will be 0xFF when bad. Checking some part of Config Space that is writable would be more reliable. -Carl V.
Re: [PATCH 12/13] accel/habanalabs: keep explicit size of reserved memory for FW
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Tomer Tayar The reserved memory for FW is currently saved in an ASIC property in units of MB, just like the value that comes from FW. Except the fact that it is not clear from the property's name, it means also that a calculation to actual size is required everywhere that it is used. Modify the property to hold the size in bytes. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 11/13] accel/habanalabs: handle reserved memory request when working with full FW
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Tomer Tayar Currently the reserved memory request from FW is handled when running with preboot only, but this request is relevant also when running with full FW. Modify to always handle this reservation request. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 10/13] accel/habanalabs/hwmon: rate limit errors user can generate
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Ofir Bitton Fetching sensor data can fail due to various reasons. In order not to pollute the kernel log, those error prints must be rate limited. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 09/13] accel/habanalabs/gaudi2: drain event lacks rd/wr indication
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Ofir Bitton Due to a H/W issue, AXI drain event does not include a read/write indication, hence we remove this print. Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 08/13] accel/habanalabs: fix error print
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Dani Liberman The unmasking is for event and it can be other event than RAZWI. Signed-off-by: Dani Liberman Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 07/13] accel/habanalabs: initialize maybe-uninitialized variables
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Tal Risin Prevent static analysis warning. Signed-off-by: Tal Risin Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 06/13] accel/habanalabs: fix debugfs files permissions
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Avri Kehat debugfs files are created with permissions that don't align with the access requirements. Signed-off-by: Avri Kehat Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 05/13] accel/habanalabs: fix glbl error cause handling
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Tomer Tayar The glbl error cause handling has a wrong assumption that all error bits are consecutive. Fix the handling to check all relevant error bits per ASIC. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 04/13] accel/habanalabs/gaudi2: check extended errors according to PCIe addr_dec interrupt info
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Tomer Tayar The FW interrupt info for a PCIe addr_dec event is set correctly, so check for either global errors or razwi according to the indications there. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 03/13] accel/habanalabs: modify print for skip loading linux FW to debug log
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Tomer Tayar Skip loading a linux FW image into the device with the current supported ASICs is done for test purposes only. Moreover, for future supported ASICs it is possible that there won't be a need to load such an image. The print in such a case is therefore not needed in most cases, so replace the used dev_info() with dev_dbg(). Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 02/13] accel/habanalabs: remove hop size from asic properties
On 2/20/2024 8:01 AM, Oded Gabbay wrote: From: Farah Kassabri The hop size related properties is a MMU properties and not asic properties. As for PMMU and HMMU we could have different sizes. Signed-off-by: Farah Kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay Reviewed-by: Carl Vanderlip
Re: [PATCH 01/13] accel/habanalabs/gaudi2: use single function to compare FW versions
On 2/20/2024 8:01 AM, Oded Gabbay wrote:> From: Ohad Sharabi > > Currently, the code contains 2 types of FW version comparison functions: > - hl_is_fw_sw_ver_[below/equal_or_greater]() > - gaudi2 specific function of the type >gaudi2_is_fw_ver_[below/above]x_y_z() > > Moreover, some functions use the inner FW version which should be only > stage during development but not version dependencies. > > Finally, some tests are done to deprecated FW version to which LKD > should hold no compatibility. > > This commit aligns all APIs to a single function that just compares the > version and return an integers indicator (similar in some way to > strcmp()). > > In addition, this generic function now considers also the sub-minor FW > version and also remove dead code resulting in deprecated FW versions > compatibility. > > Signed-off-by: Ohad Sharabi > Reviewed-by: Oded Gabbay > Signed-off-by: Oded Gabbay > --- > drivers/accel/habanalabs/common/firmware_if.c | 25 > drivers/accel/habanalabs/common/habanalabs.h | 20 +-- > drivers/accel/habanalabs/gaudi2/gaudi2.c | 57 +++ > 3 files changed, 34 insertions(+), 68 deletions(-) > ... > diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c > index 1f061209ae21..4a0917aa4dd7 100644 > --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c > +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c > @@ -2601,6 +2601,8 @@ static int gaudi2_set_fixed_properties(struct hl_device *hdev) > >prop->hbw_flush_reg = mmPCIE_WRAP_SPECIAL_GLBL_SPARE_0; > > + prop->supports_advanced_cpucp_rc = true; > + >return 0; > > free_qprops: > @@ -3308,8 +3310,6 @@ static int gaudi2_late_init(struct hl_device *hdev) >struct gaudi2_device *gaudi2 = hdev->asic_specific; >int rc; > > - hdev->asic_prop.supports_advanced_cpucp_rc = true; > - >rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, >gaudi2->virt_msix_db_dma_addr); >if (rc) { Is this change in support of the others in this patch? Feels like this should be more than one patch (adding new version_cmp, removing old checks). -Carl V.
Re: [PATCH v3 3/4] drm/msm: add a kernel param to select between MDP5 and DPU drivers
On 1/8/2024 11:07 AM, Dmitry Baryshkov wrote: On Mon, 8 Jan 2024 at 19:57, Carl Vanderlip wrote: On 1/5/2024 4:38 PM, Dmitry Baryshkov wrote: On Sat, 6 Jan 2024 at 02:04, Carl Vanderlip wrote: On 1/5/2024 3:34 PM, Dmitry Baryshkov wrote: diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 50b65ffc24b1..ef57586fbeca 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -969,6 +969,37 @@ static int add_components_mdp(struct device *master_dev, return 0; } +#if !IS_REACHABLE(CONFIG_DRM_MSM_MDP5) || !IS_REACHABLE(CONFIG_DRM_MSM_DPU) +bool msm_disp_drv_should_bind(struct device *dev, bool mdp5_driver) +{ + /* If just a single driver is enabled, use it no matter what */ + return true; +} This will cause both MDP/DPU probes to return -ENODEV, rather than select the enabled one. No. The code (e.g. for DPU) is: if (!msm_disp_drv_should_bind(>dev, true)) return -ENODEV; So the driver returns -ENODEV if msm_disp_drv_should_bind() returns false. Which is logical from the function name point of view. but msm_disp_drv_should_bind() is returning true in the #if !REACHABLE() case? at minimum the comment is incorrect since returning true causes the driver to NOT be used. No. Returning _false_ causes the driver to not be used. Apologies... you are correct. Reviewed-by: Carl Vanderlip -Carl V.
Re: [PATCH v3 3/4] drm/msm: add a kernel param to select between MDP5 and DPU drivers
On 1/5/2024 4:38 PM, Dmitry Baryshkov wrote: On Sat, 6 Jan 2024 at 02:04, Carl Vanderlip wrote: On 1/5/2024 3:34 PM, Dmitry Baryshkov wrote: diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 50b65ffc24b1..ef57586fbeca 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -969,6 +969,37 @@ static int add_components_mdp(struct device *master_dev, return 0; } +#if !IS_REACHABLE(CONFIG_DRM_MSM_MDP5) || !IS_REACHABLE(CONFIG_DRM_MSM_DPU) +bool msm_disp_drv_should_bind(struct device *dev, bool mdp5_driver) +{ + /* If just a single driver is enabled, use it no matter what */ + return true; +} This will cause both MDP/DPU probes to return -ENODEV, rather than select the enabled one. No. The code (e.g. for DPU) is: if (!msm_disp_drv_should_bind(>dev, true)) return -ENODEV; So the driver returns -ENODEV if msm_disp_drv_should_bind() returns false. Which is logical from the function name point of view. but msm_disp_drv_should_bind() is returning true in the #if !REACHABLE() case? at minimum the comment is incorrect since returning true causes the driver to NOT be used. -Carl V.
Re: [PATCH v3 3/4] drm/msm: add a kernel param to select between MDP5 and DPU drivers
On 1/5/2024 3:34 PM, Dmitry Baryshkov wrote: For some of the platforms (e.g. SDM660, SDM630, MSM8996, etc.) it is possible to support this platform via the DPU driver (e.g. to provide support for DP, multirect, etc). Add a modparam to be able to switch between these two drivers. All platforms supported by both drivers are by default handled by the MDP5 driver. To let them be handled by the DPU driver pass the `msm.prefer_mdp5=false` kernel param. Reviewed-by: Stephen Boyd Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 3 +++ drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 3 +++ drivers/gpu/drm/msm/msm_drv.c| 31 +++ drivers/gpu/drm/msm/msm_drv.h| 1 + 4 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c index aa9e0ad33ebb..8f11a98491a1 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c @@ -1276,6 +1276,9 @@ static int dpu_dev_probe(struct platform_device *pdev) int irq; int ret = 0; + if (!msm_disp_drv_should_bind(>dev, true)) + return -ENODEV; + dpu_kms = devm_kzalloc(dev, sizeof(*dpu_kms), GFP_KERNEL); if (!dpu_kms) return -ENOMEM; diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index 0827634664ae..43d05851c54d 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -866,6 +866,9 @@ static int mdp5_dev_probe(struct platform_device *pdev) DBG(""); + if (!msm_disp_drv_should_bind(>dev, false)) + return -ENODEV; + mdp5_kms = devm_kzalloc(>dev, sizeof(*mdp5_kms), GFP_KERNEL); if (!mdp5_kms) return -ENOMEM; diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 50b65ffc24b1..ef57586fbeca 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -969,6 +969,37 @@ static int add_components_mdp(struct device *master_dev, return 0; } +#if !IS_REACHABLE(CONFIG_DRM_MSM_MDP5) || !IS_REACHABLE(CONFIG_DRM_MSM_DPU) +bool msm_disp_drv_should_bind(struct device *dev, bool mdp5_driver) s/mdp5_driver/dpu_driver/ +{ + /* If just a single driver is enabled, use it no matter what */ + return true; +} This will cause both MDP/DPU probes to return -ENODEV, rather than select the enabled one. +#else + +static bool prefer_mdp5 = true; +MODULE_PARM_DESC(prefer_mdp5, "Select whether MDP5 or DPU driver should be preferred"); +module_param(prefer_mdp5, bool, 0444); + +/* list all platforms supported by both mdp5 and dpu drivers */ +static const char *const msm_mdp5_dpu_migration[] = { + NULL, +}; + +bool msm_disp_drv_should_bind(struct device *dev, bool dpu_driver) +{ + /* If it is not an MDP5 device, do not try MDP5 driver */ + if (!of_device_is_compatible(dev->of_node, "qcom,mdp5")) + return dpu_driver; + + /* If it is not in the migration list, use MDP5 */ + if (!of_device_compatible_match(dev->of_node, msm_mdp5_dpu_migration)) + return !dpu_driver; + + return prefer_mdp5 ? !dpu_driver : dpu_driver; +} +#endif + /* * We don't know what's the best binding to link the gpu with the drm device. * Fow now, we just hunt for all the possible gpus that we support, and add them diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index 01e783130054..762e13e2df74 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -563,5 +563,6 @@ int msm_drv_probe(struct device *dev, struct msm_kms *kms); void msm_kms_shutdown(struct platform_device *pdev); +bool msm_disp_drv_should_bind(struct device *dev, bool dpu_driver); #endif /* __MSM_DRV_H__ */
Re: [PATCH 08/10] accel/ivpu: Disable buffer sharing among VPU contexts
On 1/5/2024 3:22 AM, Jacek Lawrynowicz wrote: Imported buffer from another VPU context will now have just reference increased and there will be a single sgt fixing above problem but buffers still can't be shared among VPU contexts because each context have its own MMU mapping and ivpu_bo supports only single MMU mapping. This paragraph/sentence needs rewrite. Here's my take... " Buffers imported from another VPU context will now just increase reference count, leaving only a single sgt, fixing the problem above. Buffers still can't be shared among VPU contexts because each has its own MMU mapping and ivpu_bo only supports single MMU mappings. " -Carl V.
Re: [PATCH] accel/qaic: Clean up integer overflow checking in map_user_pages()
On 8/10/2023 5:23 AM, Dan Carpenter wrote: The encode_dma() function has some validation on in_trans->size but it would be more clear to move those checks to find_and_map_user_pages(). The encode_dma() had two checks: if (in_trans->addr + in_trans->size < in_trans->addr || !in_trans->size) return -EINVAL; The in_trans->addr variable is the starting address. The in_trans->size variable is the total size of the transfer. The transfer can occur in parts and the resources->xferred_dma_size tracks how many bytes we have already transferred. This patch introduces a new variable "remaining" which represents the amount we want to transfer (in_trans->size) minus the amount we have already transferred (resources->xferred_dma_size). I have modified the check for if in_trans->size is zero to instead check if in_trans->size is less than resources->xferred_dma_size. If we have already transferred more bytes than in_trans->size then there are negative bytes remaining which doesn't make sense. If there are zero bytes remaining to be copied, just return success. The check in encode_dma() checked that "addr + size" could not overflow and barring a driver bug that should work, but it's easier to check if we do this in parts. First check that "in_trans->addr + resources->xferred_dma_size" is safe. Then check that "xfer_start_addr + remaining" is safe. My final concern was that we are dealing with u64 values but on 32bit systems the kmalloc() function will truncate the sizes to 32 bits. So I calculated "total = in_trans->size + offset_in_page(xfer_start_addr);" and returned -EINVAL if it were >= SIZE_MAX. This will not affect 64bit systems. Fixes: 129776ac2e38 ("accel/qaic: Add control path") Signed-off-by: Dan Carpenter Looks good to me. Reviewed-by: Carl Vanderlip
Re: [PATCH][next] accel/qaic: remove redundant assignment to pointer pexec
On 7/25/2023 4:40 AM, Colin Ian King wrote: Pointer pexec is being assigned a value however it is never read. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King Reviewed-by: Carl Vanderlip
Re: [PATCH] MAINTAINERS: Add Carl/Pranjal as QAIC reviewers
On 5/23/2023 9:14 AM, Jeffrey Hugo wrote: Carl and Pranjal have been reviewing the QAIC patches. List them as reviewers so that they are copied on all developments which will make it easier for them to continue reviewing QAIC patches. Signed-off-by: Jeffrey Hugo --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7e0b87d5aa2e..a0ec9ee090a4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17381,6 +17381,8 @@ F: include/dt-bindings/clock/qcom,* QUALCOMM CLOUD AI (QAIC) DRIVER M:Jeffrey Hugo +R: Carl Vanderlip +R: Pranjal Ramajor Asha Kanojiya L:linux-arm-...@vger.kernel.org L:dri-devel@lists.freedesktop.org S:Supported ACK
Re: [PATCH v2] accel/qaic: initialize ret variable to 0
On 5/17/2023 9:56 AM, Jeffrey Hugo wrote: From: Tom Rix clang static analysis reports drivers/accel/qaic/qaic_data.c:610:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return ret; ^~ From a code analysis of the function, the ret variable is only set some of the time but is always returned. This suggests ret can return uninitialized garbage. However BO allocation will ensure ret is always set in reality. Initialize ret to 0 to silence the warning. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Tom Rix [jhugo: Reword commit text] Signed-off-by: Jeffrey Hugo --- drivers/accel/qaic/qaic_data.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c index 8ab26e64b231..e42c1f98 100644 --- a/drivers/accel/qaic/qaic_data.c +++ b/drivers/accel/qaic/qaic_data.c @@ -591,7 +591,7 @@ static int qaic_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struc struct qaic_bo *bo = to_qaic_bo(obj); unsigned long offset = 0; struct scatterlist *sg; - int ret; + int ret = 0; if (obj->import_attach) return -EINVAL; LGTM Reviewed-by: Carl Vanderlip
Re: [PATCH] accel/qaic: silence some uninitialized variable warnings
On 5/3/2023 3:41 AM, Dan Carpenter wrote: Smatch complains that these are not initialized if get_cntl_version() fails but we still print them in the debug message. Not the end of the world, but true enough. Let's just initialize them to a dummy value to make the checker happy. Signed-off-by: Dan Carpenter LGTM Reviewed-by: Carl Vanderlip