Re: [PATCH 13/13] accel/habanalabs: modify pci health check

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Ofir Bitton 

Today we read PCI VENDOR-ID in order to make sure PCI link is
healthy. Apparently the VENDOR-ID might be stored on host and
hence, when we read it we might not access the PCI bus.
In order to make sure PCI health check is reliable, we will start
checking the DEVICE-ID instead.


What's keeping some system from caching that as well?

Since this is checking for PCIe link health, it will be 0xFF when bad. 
Checking some part of Config Space that is writable would be more reliable.


-Carl V.


Re: [PATCH 12/13] accel/habanalabs: keep explicit size of reserved memory for FW

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Tomer Tayar 

The reserved memory for FW is currently saved in an ASIC property in
units of MB, just like the value that comes from FW.
Except the fact that it is not clear from the property's name, it means
also that a calculation to actual size is required everywhere that it is
used.
Modify the property to hold the size in bytes.

Signed-off-by: Tomer Tayar 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 11/13] accel/habanalabs: handle reserved memory request when working with full FW

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Tomer Tayar 

Currently the reserved memory request from FW is handled when running
with preboot only, but this request is relevant also when running with
full FW.
Modify to always handle this reservation request.

Signed-off-by: Tomer Tayar 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 10/13] accel/habanalabs/hwmon: rate limit errors user can generate

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Ofir Bitton 

Fetching sensor data can fail due to various reasons. In order
not to pollute the kernel log, those error prints must be
rate limited.

Signed-off-by: Ofir Bitton 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 09/13] accel/habanalabs/gaudi2: drain event lacks rd/wr indication

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Ofir Bitton 

Due to a H/W issue, AXI drain event does not include a read/write
indication, hence we remove this print.

Signed-off-by: Ofir Bitton 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 08/13] accel/habanalabs: fix error print

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Dani Liberman 

The unmasking is for event and it can be other event than RAZWI.

Signed-off-by: Dani Liberman 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 07/13] accel/habanalabs: initialize maybe-uninitialized variables

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Tal Risin 

Prevent static analysis warning.

Signed-off-by: Tal Risin 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 06/13] accel/habanalabs: fix debugfs files permissions

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Avri Kehat 

debugfs files are created with permissions that don't align
with the access requirements.

Signed-off-by: Avri Kehat 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 05/13] accel/habanalabs: fix glbl error cause handling

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Tomer Tayar 

The glbl error cause handling has a wrong assumption that all error
bits are consecutive.
Fix the handling to check all relevant error bits per ASIC.

Signed-off-by: Tomer Tayar 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 04/13] accel/habanalabs/gaudi2: check extended errors according to PCIe addr_dec interrupt info

2024-02-23 Thread Carl Vanderlip

On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Tomer Tayar 

The FW interrupt info for a PCIe addr_dec event is set correctly, so
check for either global errors or razwi according to the indications
there.

Signed-off-by: Tomer Tayar 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 03/13] accel/habanalabs: modify print for skip loading linux FW to debug log

2024-02-23 Thread Carl Vanderlip



On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Tomer Tayar 

Skip loading a linux FW image into the device with the current supported
ASICs is done for test purposes only.
Moreover, for future supported ASICs it is possible that there won't be
a need to load such an image.
The print in such a case is therefore not needed in most cases, so
replace the used dev_info() with dev_dbg().

Signed-off-by: Tomer Tayar 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 02/13] accel/habanalabs: remove hop size from asic properties

2024-02-23 Thread Carl Vanderlip



On 2/20/2024 8:01 AM, Oded Gabbay wrote:

From: Farah Kassabri 

The hop size related properties is a MMU properties and not
asic properties.
As for PMMU and HMMU we could have different sizes.

Signed-off-by: Farah Kassabri 
Reviewed-by: Oded Gabbay 
Signed-off-by: Oded Gabbay 


Reviewed-by: Carl Vanderlip 


Re: [PATCH 01/13] accel/habanalabs/gaudi2: use single function to compare FW versions

2024-02-23 Thread Carl Vanderlip
On 2/20/2024 8:01 AM, Oded Gabbay wrote:> From: Ohad Sharabi 


>
> Currently, the code contains 2 types of FW version comparison functions:
> - hl_is_fw_sw_ver_[below/equal_or_greater]()
> - gaudi2 specific function of the type
>gaudi2_is_fw_ver_[below/above]x_y_z()
>
> Moreover, some functions use the inner FW version which should be only
> stage during development but not version dependencies.
>
> Finally, some tests are done to deprecated FW version to which LKD
> should hold no compatibility.
>
> This commit aligns all APIs to a single function that just compares the
> version and return an integers indicator (similar in some way to
> strcmp()).
>
> In addition, this generic function now considers also the sub-minor FW
> version and also remove dead code resulting in deprecated FW versions
> compatibility.
>
> Signed-off-by: Ohad Sharabi 
> Reviewed-by: Oded Gabbay 
> Signed-off-by: Oded Gabbay 
> ---
>   drivers/accel/habanalabs/common/firmware_if.c | 25 
>   drivers/accel/habanalabs/common/habanalabs.h  | 20 +--
>   drivers/accel/habanalabs/gaudi2/gaudi2.c  | 57 +++
>   3 files changed, 34 insertions(+), 68 deletions(-)
>
...
> diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c 
b/drivers/accel/habanalabs/gaudi2/gaudi2.c

> index 1f061209ae21..4a0917aa4dd7 100644
> --- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
> +++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
> @@ -2601,6 +2601,8 @@ static int gaudi2_set_fixed_properties(struct 
hl_device *hdev)

>
>prop->hbw_flush_reg = mmPCIE_WRAP_SPECIAL_GLBL_SPARE_0;
>
> +  prop->supports_advanced_cpucp_rc = true;
> +
>return 0;
>
>   free_qprops:
> @@ -3308,8 +3310,6 @@ static int gaudi2_late_init(struct hl_device *hdev)
>struct gaudi2_device *gaudi2 = hdev->asic_specific;
>int rc;
>
> -  hdev->asic_prop.supports_advanced_cpucp_rc = true;
> -
>rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS,
>gaudi2->virt_msix_db_dma_addr);
>if (rc) {

Is this change in support of the others in this patch? Feels like this 
should be more than one patch (adding new version_cmp, removing old checks).


-Carl V.


Re: [PATCH v3 3/4] drm/msm: add a kernel param to select between MDP5 and DPU drivers

2024-01-09 Thread Carl Vanderlip

On 1/8/2024 11:07 AM, Dmitry Baryshkov wrote:

On Mon, 8 Jan 2024 at 19:57, Carl Vanderlip  wrote:

On 1/5/2024 4:38 PM, Dmitry Baryshkov wrote:

On Sat, 6 Jan 2024 at 02:04, Carl Vanderlip  wrote:

On 1/5/2024 3:34 PM, Dmitry Baryshkov wrote:

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 50b65ffc24b1..ef57586fbeca 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -969,6 +969,37 @@ static int add_components_mdp(struct device *master_dev,
return 0;
}

+#if !IS_REACHABLE(CONFIG_DRM_MSM_MDP5) || !IS_REACHABLE(CONFIG_DRM_MSM_DPU)
+bool msm_disp_drv_should_bind(struct device *dev, bool mdp5_driver)
+{
+ /* If just a single driver is enabled, use it no matter what */
+ return true;
+}


This will cause both MDP/DPU probes to return -ENODEV, rather than
select the enabled one.


No. The code (e.g. for DPU) is:

 if (!msm_disp_drv_should_bind(>dev, true))
  return -ENODEV;

So the driver returns -ENODEV if msm_disp_drv_should_bind() returns
false. Which is logical from the function name point of view.



but msm_disp_drv_should_bind() is returning true in the #if !REACHABLE()
case?

at minimum the comment is incorrect since returning true causes the
driver to NOT be used.


No. Returning _false_ causes the driver to not be used.



Apologies... you are correct.

Reviewed-by: Carl Vanderlip 

-Carl V.


Re: [PATCH v3 3/4] drm/msm: add a kernel param to select between MDP5 and DPU drivers

2024-01-08 Thread Carl Vanderlip




On 1/5/2024 4:38 PM, Dmitry Baryshkov wrote:

On Sat, 6 Jan 2024 at 02:04, Carl Vanderlip  wrote:



On 1/5/2024 3:34 PM, Dmitry Baryshkov wrote:

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 50b65ffc24b1..ef57586fbeca 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -969,6 +969,37 @@ static int add_components_mdp(struct device *master_dev,
   return 0;
   }

+#if !IS_REACHABLE(CONFIG_DRM_MSM_MDP5) || !IS_REACHABLE(CONFIG_DRM_MSM_DPU)
+bool msm_disp_drv_should_bind(struct device *dev, bool mdp5_driver)
+{
+ /* If just a single driver is enabled, use it no matter what */
+ return true;
+}


This will cause both MDP/DPU probes to return -ENODEV, rather than
select the enabled one.


No. The code (e.g. for DPU) is:

if (!msm_disp_drv_should_bind(>dev, true))
 return -ENODEV;

So the driver returns -ENODEV if msm_disp_drv_should_bind() returns
false. Which is logical from the function name point of view.



but msm_disp_drv_should_bind() is returning true in the #if !REACHABLE() 
case?


at minimum the comment is incorrect since returning true causes the
driver to NOT be used.

-Carl V.


Re: [PATCH v3 3/4] drm/msm: add a kernel param to select between MDP5 and DPU drivers

2024-01-05 Thread Carl Vanderlip



On 1/5/2024 3:34 PM, Dmitry Baryshkov wrote:

For some of the platforms (e.g. SDM660, SDM630, MSM8996, etc.) it is
possible to support this platform via the DPU driver (e.g. to provide
support for DP, multirect, etc). Add a modparam to be able to switch
between these two drivers.

All platforms supported by both drivers are by default handled by the
MDP5 driver. To let them be handled by the DPU driver pass the
`msm.prefer_mdp5=false` kernel param.

Reviewed-by: Stephen Boyd 
Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  3 +++
  drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  3 +++
  drivers/gpu/drm/msm/msm_drv.c| 31 +++
  drivers/gpu/drm/msm/msm_drv.h|  1 +
  4 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index aa9e0ad33ebb..8f11a98491a1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -1276,6 +1276,9 @@ static int dpu_dev_probe(struct platform_device *pdev)
int irq;
int ret = 0;
  
+	if (!msm_disp_drv_should_bind(>dev, true))

+   return -ENODEV;
+
dpu_kms = devm_kzalloc(dev, sizeof(*dpu_kms), GFP_KERNEL);
if (!dpu_kms)
return -ENOMEM;
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index 0827634664ae..43d05851c54d 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -866,6 +866,9 @@ static int mdp5_dev_probe(struct platform_device *pdev)
  
  	DBG("");
  
+	if (!msm_disp_drv_should_bind(>dev, false))

+   return -ENODEV;
+
mdp5_kms = devm_kzalloc(>dev, sizeof(*mdp5_kms), GFP_KERNEL);
if (!mdp5_kms)
return -ENOMEM;
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 50b65ffc24b1..ef57586fbeca 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -969,6 +969,37 @@ static int add_components_mdp(struct device *master_dev,
return 0;
  }
  
+#if !IS_REACHABLE(CONFIG_DRM_MSM_MDP5) || !IS_REACHABLE(CONFIG_DRM_MSM_DPU)

+bool msm_disp_drv_should_bind(struct device *dev, bool mdp5_driver)


s/mdp5_driver/dpu_driver/


+{
+   /* If just a single driver is enabled, use it no matter what */
+   return true;
+}


This will cause both MDP/DPU probes to return -ENODEV, rather than
select the enabled one.


+#else
+
+static bool prefer_mdp5 = true;
+MODULE_PARM_DESC(prefer_mdp5, "Select whether MDP5 or DPU driver should be 
preferred");
+module_param(prefer_mdp5, bool, 0444);
+
+/* list all platforms supported by both mdp5 and dpu drivers */
+static const char *const msm_mdp5_dpu_migration[] = {
+   NULL,
+};
+
+bool msm_disp_drv_should_bind(struct device *dev, bool dpu_driver)
+{
+   /* If it is not an MDP5 device, do not try MDP5 driver */
+   if (!of_device_is_compatible(dev->of_node, "qcom,mdp5"))
+   return dpu_driver;
+
+   /* If it is not in the migration list, use MDP5 */
+   if (!of_device_compatible_match(dev->of_node, msm_mdp5_dpu_migration))
+   return !dpu_driver;
+
+   return prefer_mdp5 ? !dpu_driver : dpu_driver;
+}
+#endif
+
  /*
   * We don't know what's the best binding to link the gpu with the drm device.
   * Fow now, we just hunt for all the possible gpus that we support, and add 
them
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 01e783130054..762e13e2df74 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -563,5 +563,6 @@ int msm_drv_probe(struct device *dev,
struct msm_kms *kms);
  void msm_kms_shutdown(struct platform_device *pdev);
  
+bool msm_disp_drv_should_bind(struct device *dev, bool dpu_driver);
  
  #endif /* __MSM_DRV_H__ */




Re: [PATCH 08/10] accel/ivpu: Disable buffer sharing among VPU contexts

2024-01-05 Thread Carl Vanderlip



On 1/5/2024 3:22 AM, Jacek Lawrynowicz wrote:

Imported buffer from another VPU context will now have just reference
increased and there will be a single sgt fixing above problem but
buffers still can't be shared among VPU contexts because each context
have its own MMU mapping and ivpu_bo supports only single MMU mapping.


This paragraph/sentence needs rewrite. Here's my take...

"
Buffers imported from another VPU context will now just increase 
reference count, leaving only a single sgt, fixing the problem above.
Buffers still can't be shared among VPU contexts because each has its 
own MMU mapping and ivpu_bo only supports single MMU mappings.

"

-Carl V.


Re: [PATCH] accel/qaic: Clean up integer overflow checking in map_user_pages()

2023-08-11 Thread Carl Vanderlip

On 8/10/2023 5:23 AM, Dan Carpenter wrote:

The encode_dma() function has some validation on in_trans->size but it
would be more clear to move those checks to find_and_map_user_pages().

The encode_dma() had two checks:

if (in_trans->addr + in_trans->size < in_trans->addr || !in_trans->size)
return -EINVAL;

The in_trans->addr variable is the starting address.  The in_trans->size
variable is the total size of the transfer.  The transfer can occur in
parts and the resources->xferred_dma_size tracks how many bytes we have
already transferred.

This patch introduces a new variable "remaining" which represents the
amount we want to transfer (in_trans->size) minus the amount we have
already transferred (resources->xferred_dma_size).

I have modified the check for if in_trans->size is zero to instead check
if in_trans->size is less than resources->xferred_dma_size.  If we have
already transferred more bytes than in_trans->size then there are negative
bytes remaining which doesn't make sense.  If there are zero bytes
remaining to be copied, just return success.

The check in encode_dma() checked that "addr + size" could not overflow
and barring a driver bug that should work, but it's easier to check if
we do this in parts.  First check that "in_trans->addr +
resources->xferred_dma_size" is safe.  Then check that "xfer_start_addr +
remaining" is safe.

My final concern was that we are dealing with u64 values but on 32bit
systems the kmalloc() function will truncate the sizes to 32 bits.  So
I calculated "total = in_trans->size + offset_in_page(xfer_start_addr);"
and returned -EINVAL if it were >= SIZE_MAX.  This will not affect 64bit
systems.

Fixes: 129776ac2e38 ("accel/qaic: Add control path")
Signed-off-by: Dan Carpenter 


Looks good to me.

Reviewed-by: Carl Vanderlip 


Re: [PATCH][next] accel/qaic: remove redundant assignment to pointer pexec

2023-07-26 Thread Carl Vanderlip



On 7/25/2023 4:40 AM, Colin Ian King wrote:

Pointer pexec is being assigned a value however it is never read. The
assignment is redundant and can be removed.

Signed-off-by: Colin Ian King 


Reviewed-by: Carl Vanderlip 


Re: [PATCH] MAINTAINERS: Add Carl/Pranjal as QAIC reviewers

2023-05-24 Thread Carl Vanderlip



On 5/23/2023 9:14 AM, Jeffrey Hugo wrote:

Carl and Pranjal have been reviewing the QAIC patches.  List them as
reviewers so that they are copied on all developments which will make
it easier for them to continue reviewing QAIC patches.

Signed-off-by: Jeffrey Hugo 
---
  MAINTAINERS | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7e0b87d5aa2e..a0ec9ee090a4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17381,6 +17381,8 @@ F:  include/dt-bindings/clock/qcom,*
  
  QUALCOMM CLOUD AI (QAIC) DRIVER

  M:Jeffrey Hugo 
+R: Carl Vanderlip 
+R: Pranjal Ramajor Asha Kanojiya 
  L:linux-arm-...@vger.kernel.org
  L:dri-devel@lists.freedesktop.org
  S:Supported


ACK


Re: [PATCH v2] accel/qaic: initialize ret variable to 0

2023-05-18 Thread Carl Vanderlip

On 5/17/2023 9:56 AM, Jeffrey Hugo wrote:

From: Tom Rix 

clang static analysis reports
drivers/accel/qaic/qaic_data.c:610:2: warning: Undefined or garbage
   value returned to caller [core.uninitialized.UndefReturn]
 return ret;
 ^~


From a code analysis of the function, the ret variable is only set some

of the time but is always returned.  This suggests ret can return
uninitialized garbage. However BO allocation will ensure ret is always
set in reality.

Initialize ret to 0 to silence the warning.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Tom Rix 
[jhugo: Reword commit text]
Signed-off-by: Jeffrey Hugo 
---
  drivers/accel/qaic/qaic_data.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
index 8ab26e64b231..e42c1f98 100644
--- a/drivers/accel/qaic/qaic_data.c
+++ b/drivers/accel/qaic/qaic_data.c
@@ -591,7 +591,7 @@ static int qaic_gem_object_mmap(struct drm_gem_object *obj, 
struct vm_area_struc
struct qaic_bo *bo = to_qaic_bo(obj);
unsigned long offset = 0;
struct scatterlist *sg;
-   int ret;
+   int ret = 0;
  
  	if (obj->import_attach)

return -EINVAL;



LGTM

Reviewed-by: Carl Vanderlip 


Re: [PATCH] accel/qaic: silence some uninitialized variable warnings

2023-05-12 Thread Carl Vanderlip

On 5/3/2023 3:41 AM, Dan Carpenter wrote:

Smatch complains that these are not initialized if get_cntl_version()
fails but we still print them in the debug message.  Not the end of
the world, but true enough.  Let's just initialize them to a dummy value
to make the checker happy.

Signed-off-by: Dan Carpenter 


LGTM

Reviewed-by: Carl Vanderlip