Re: [PATCH v5 15/16] drm/mediatek: modify mediatek-drm for mt8195 multi mmsys support
Hi Chun-Kuang, Thanks for the review. On Wed, 2021-09-29 at 08:03 +0800, Chun-Kuang Hu wrote: > > Hi, Nancy: > > Nancy.Lin 於 2021年9月6日 週一 下午3:15寫道: > > > > MT8195 have two mmsys. Modify drm for MT8195 multi-mmsys support. > > The two mmsys (vdosys0 and vdosys1) will bring up two drm drivers, > > only one drm driver register as the drm device. > > Each drm driver binds its own component. The first bind drm driver > > will allocate the drm device, and the last bind drm driver > > registers > > the drm device to drm core. Each crtc path is created with the > > corresponding drm driver data. > > > > Signed-off-by: Nancy.Lin > > --- > > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 25 +- > > drivers/gpu/drm/mediatek/mtk_drm_crtc.h | 3 +- > > drivers/gpu/drm/mediatek/mtk_drm_drv.c | 360 +++- > > > > drivers/gpu/drm/mediatek/mtk_drm_drv.h | 8 +- > > 4 files changed, 320 insertions(+), 76 deletions(-) > > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > index 474efb844249..68cb15c38c3f 100644 > > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > @@ -737,21 +737,28 @@ static int > > mtk_drm_crtc_init_comp_planes(struct drm_device *drm_dev, > > } > > > > int mtk_drm_crtc_create(struct drm_device *drm_dev, > > - const enum mtk_ddp_comp_id *path, unsigned > > int path_len) > > + const enum mtk_ddp_comp_id *path, unsigned > > int path_len, > > + int priv_data_index) > > { > > struct mtk_drm_private *priv = drm_dev->dev_private; > > struct device *dev = drm_dev->dev; > > struct mtk_drm_crtc *mtk_crtc; > > unsigned int num_comp_planes = 0; > > - int pipe = priv->num_pipes; > > int ret; > > int i; > > bool has_ctm = false; > > uint gamma_lut_size = 0; > > + struct drm_crtc *tmp; > > + int crtc_i = 0; > > > > if (!path) > > return 0; > > > > + priv = priv->all_drm_private[priv_data_index]; > > + > > + drm_for_each_crtc(tmp, drm_dev) > > + crtc_i++; > > + > > for (i = 0; i < path_len; i++) { > > enum mtk_ddp_comp_id comp_id = path[i]; > > struct device_node *node; > > @@ -760,7 +767,7 @@ int mtk_drm_crtc_create(struct drm_device > > *drm_dev, > > if (!node) { > > dev_info(dev, > > "Not creating crtc %d because > > component %d is disabled or missing\n", > > -pipe, comp_id); > > +crtc_i, comp_id); > > return 0; > > } > > } > > @@ -816,25 +823,25 @@ int mtk_drm_crtc_create(struct drm_device > > *drm_dev, > > > > for (i = 0; i < mtk_crtc->ddp_comp_nr; i++) { > > ret = mtk_drm_crtc_init_comp_planes(drm_dev, > > mtk_crtc, i, > > - pipe); > > + crtc_i); > > if (ret) > > return ret; > > } > > > > - ret = mtk_drm_crtc_init(drm_dev, mtk_crtc, pipe); > > + ret = mtk_drm_crtc_init(drm_dev, mtk_crtc, crtc_i); > > if (ret < 0) > > return ret; > > > > if (gamma_lut_size) > > drm_mode_crtc_set_gamma_size(_crtc->base, > > gamma_lut_size); > > drm_crtc_enable_color_mgmt(_crtc->base, 0, has_ctm, > > gamma_lut_size); > > - priv->num_pipes++; > > mutex_init(_crtc->hw_lock); > > > > #if IS_REACHABLE(CONFIG_MTK_CMDQ) > > + i = (priv->data->mmsys_dev_num > 1) ? 0 : > > drm_crtc_index(_crtc->base); > > + > > mtk_crtc->cmdq_client = > > - cmdq_mbox_create(mtk_crtc->mmsys_dev, > > -drm_crtc_index(_crtc- > > >base)); > > + cmdq_mbox_create(mtk_crtc->mmsys_dev, i); > > if (IS_ERR(mtk_crtc->cmdq_client)) { > > dev_dbg(dev, "mtk_crtc %d failed to create mailbox > > client, writing register by CPU now\n", > > drm_crtc_index(_crtc->base)); > > @@ -844,7 +851,7 @@ int mtk_drm_crtc_create(struct drm_device > > *drm_dev, > > if (mtk_crtc->cmdq_client) { > > ret = of_property_read_u32_index(priv->mutex_node, > > "mediatek,gce- > > events", > > -drm_crtc_index( > > k_crtc->base), > > +i, > > _crtc- > > >cmdq_event); > > if (ret) { > > dev_dbg(dev, "mtk_crtc %d failed to get > > mediatek,gce-events property\n", > > diff --git
Re: Questions over DSI within DRM.
Hi, Thanks Laurent for reviving the thread, I have missed it entirely. On 03.10.2021 16:16, Laurent Pinchart wrote: Hello, Reviving a bit of an old thread. On Thu, Jul 15, 2021 at 11:50:22AM +0200, Maxime Ripard wrote: On Tue, Jul 06, 2021 at 05:44:58PM +0100, Dave Stevenson wrote: On Tue, 6 Jul 2021 at 16:13, Maxime Ripard wrote: On a similar theme, some devices want the clock lane in HS mode early so they can use it in place of an external oscillator, but the data lanes still in LP-11. There appears to be no way for the display/bridge to signal this requirement or it be achieved. You're right. A lng time ago, the omapdrm driver had an internal infrastructure that didn't use drm_bridge or drm_panel and instead required omapdrm-specific drivers for those components. It used to model the display pipeline in a different way than drm_bridge, with the sync explicitly setting the source state. A DSI sink could thus control its enable sequence, interleaving programming of the sink with control of the source. Migrating omapdrm to the drm_bridge model took a really large effort, which makes me believe that transitioning the whole subsystem to sink-controlled sources would be close to impossible. We could add DSI-specific operations, or add another enable bridge operation (post_pre_enable ? :-D). Neither would scale, but it may be enough. I haven't thought it through for all generic cases, but I suspect it's more a pre_pre_enable that is needed to initialise the PHY etc, probably from source to sink. I believe it could be implemented as a pre-pre-enable indeed. It feels like a bit of a hack, as the next time we need more fine-grained control over the startup sequence, we'll have to add a pre-pre-pre-enable. Given that the startup sequence requirements come from the sink device, it would make sense to let it explicitly control the initialization, instead of driving it from the source. I don't think we'll be able to rework the bridge API in that direction though, so I'm fine with a hack. As I remember I have suggested in similar discussion [1] adding to mipi_dsi_host_ops requested operations: power_on - power on DSI bus (do we really need it?) init - enter LP11 (or HS-stop state if I remember correctly) and then call them from the right place in DSI device, probably pre_enable callback. This way we could avoid polluting drm_bridge_ops, with DSI specific stuff. [1]: https://lore.kernel.org/dri-devel/6700c90f-d0e0-5cbf-1616-0c1d15844...@samsung.com/#t Sorry for addressing only this issue, but I need to read whole thread, to re-read whole thread, and today it is too late for me :) Regards Andrzej
572994bf18ff prevents system boot
Hi- After updating one of my test systems to v5.15-rc, I found that it becomes unresponsive during the later part of the boot process. A power-on reset is necessary to recover. I bisected to this commit: 572994bf18ff ("drm/ast: Zero is missing in detect function") Checking out v5.15-rc3 and reverting this commit enables the system to boot again. 0b:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30) (prog-if 00 [VGA controller]) DeviceName: ASPEED Video AST2400 Subsystem: Super Micro Computer Inc X10SRL-F Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
Re: Questions over DSI within DRM.
Hello, Reviving a bit of an old thread. On Thu, Jul 15, 2021 at 11:50:22AM +0200, Maxime Ripard wrote: > On Tue, Jul 06, 2021 at 05:44:58PM +0100, Dave Stevenson wrote: > > On Tue, 6 Jul 2021 at 16:13, Maxime Ripard wrote: > > > > > > On a similar theme, some devices want the clock lane in HS mode > > > > > > early > > > > > > so they can use it in place of an external oscillator, but the data > > > > > > lanes still in LP-11. There appears to be no way for the > > > > > > display/bridge to signal this requirement or it be achieved. > > > > > > > > > > You're right. A lng time ago, the omapdrm driver had an internal > > > > > infrastructure that didn't use drm_bridge or drm_panel and instead > > > > > required omapdrm-specific drivers for those components. It used to > > > > > model > > > > > the display pipeline in a different way than drm_bridge, with the sync > > > > > explicitly setting the source state. A DSI sink could thus control its > > > > > enable sequence, interleaving programming of the sink with control of > > > > > the source. > > > > > > > > > > Migrating omapdrm to the drm_bridge model took a really large effort, > > > > > which makes me believe that transitioning the whole subsystem to > > > > > sink-controlled sources would be close to impossible. We could add > > > > > DSI-specific operations, or add another enable bridge operation > > > > > (post_pre_enable ? :-D). Neither would scale, but it may be enough. > > > > > > > > I haven't thought it through for all generic cases, but I suspect it's > > > > more a pre_pre_enable that is needed to initialise the PHY etc, > > > > probably from source to sink. I believe it could be implemented as a pre-pre-enable indeed. It feels like a bit of a hack, as the next time we need more fine-grained control over the startup sequence, we'll have to add a pre-pre-pre-enable. Given that the startup sequence requirements come from the sink device, it would make sense to let it explicitly control the initialization, instead of driving it from the source. I don't think we'll be able to rework the bridge API in that direction though, so I'm fine with a hack. > > > > If the panel/bridge can set a flag that can be checked at this point > > > > for whether an early clock is required or not, I think that allows us > > > > to comply with the requirements for a large number of panels/bridges > > > > (LP-11 vs HS config for clock and or data lanes before pre_enable is > > > > called). > > > > > > > > pre_enable retains the current behaviour (initialise the chain from > > > > sink to source). > > > > enable then actually starts sending video and enabling outputs. > > > > > > Flags indeed seem like a more contained option. Another one could be to > > > have a mipi_dsi_host to (for example) power up the clock lane that would > > > be called by default before the bridge's enable, or at the downstream > > > bridge driver discretion before that. > > > > Which driver will that call? > > The parent DSI Host > > > An extreme example perhaps, but Toshiba make the TC358860 eDP to DSI > > bridge chip[1]. So the encoder will be eDP, but the DSI config needs > > to go to that bridge. Does that happen automatically within the > > framework? I guess so as the bridge will have called > > mipi_dsi_host_register for the DSI sink to attach to. > > In that case, whatever sink would be connected to the bridge would call > the bridge clock enable hook if it needs it in its pre_enable, or it > would be called automatically before enable if it doesn't > > Would that help? Sounds good to me, in theory at least (let's see what issues we'll run into in practice :-)). Has anyone given it a try, or is planning to ? > > Perhaps a new mipi_dsi_host function to configure the PHY is the > > easier solution. If it can allow the sink to request whatever > > combination of states from clock and data lanes that it fancies, then > > it can be as explicit as required for the initialisation sequence, and > > the host driver does its best to comply with the requests. > > I don't know, I'm not really fond in general of solutions that try to > cover any single case if we don't need it and / or have today an issue > with this. I'd rather have something that works for the particular > bridge we were discussing, see if it applies to other bridges and modify > it if it doesn't until it works for all our cases. Trying to reason in > all possible cases tend to lead to solutions that are difficult to > maintain and usually over-engineered. A DSI host clock enable operation or a DSI host PHY configuration operation both fit in the same place in the grand scheme of things, so I don't mind either. We should be able to easily move from a more specific operation to a more generic one if the need arises. > > I'd have a slight query over when and how the host would drop to ULPS > > or power off. It probably shouldn't be in post_disable as the sink > > hasn't had a chance to finalise everything in its
Re: [PATCH] drm: rcar-du: Don't create encoder for unconnected LVDS outputs
Hi Geert, On Tue, Sep 28, 2021 at 10:55:57AM +0200, Geert Uytterhoeven wrote: > On Mon, Aug 23, 2021 at 4:54 PM Laurent Pinchart wrote: > > On Mon, Aug 23, 2021 at 02:25:32PM +0200, Geert Uytterhoeven wrote: > > > On Sun, Aug 22, 2021 at 2:36 AM Laurent Pinchart wrote: > > > > On R-Car D3 and E3, the LVDS encoders provide the pixel clock to the DU, > > > > even when LVDS outputs are not used. For this reason, the rcar-lvds > > > > driver probes successfully on those platforms even if no further bridge > > > > or panel is connected to the LVDS output, in order to provide the > > > > rcar_lvds_clk_enable() and rcar_lvds_clk_disable() functions to the DU > > > > driver. > > > > > > > > If an LVDS output isn't connected, trying to create a DRM connector for > > > > the output will fail. Fix this by skipping connector creation in that > > > > case, and also skip creation of the DRM encoder as there's no point in > > > > an encoder without a connector. > > > > > > > > Fixes: e9e056949c92 ("drm: rcar-du: lvds: Convert to DRM panel bridge > > > > helper") > > > > Reported-by: Geert Uytterhoeven > > > > > > Can you please change that to > > > Reported-by: Geert Uytterhoeven > > > ? > > > > Sure thing. > > Thanks! > > > > > Signed-off-by: Laurent Pinchart > > > > > > > > > > Thanks, the scary warning on Ebisu-4D is gone, so > > > Tested-by: Geert Uytterhoeven > > > > > > Disclaimer: there are no displays connected to my Ebisu-4D. > > > > That's the best way to ensure the absence of display issues. It works > > great for camera testing too, if you also remove networking and storage > > :-) > > Any chance this fix can make it upstream? > The fix was created before the issue entered upstream in v5.15-rc1. Pull request sent. -- Regards, Laurent Pinchart
[GIT FIXES FOR v5.15] R-Car DU fix
Hi Dave and Daniel, The following changes since commit 78ea81417944fe03f48648eddeb8e8a8e513c4ad: Merge tag 'exynos-drm-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes (2021-10-01 18:14:39 +0200) are available in the Git repository at: git://linuxtv.org/pinchartl/media.git tags/du-fixes-20211003 for you to fetch changes up to 734d0b3b6afd44510b4752ea730685f89e107767: drm: rcar-du: Don't create encoder for unconnected LVDS outputs (2021-10-03 16:05:36 +0300) - R-Car DU fix for LVDS outputs on D3 and E3 SoCs Laurent Pinchart (1): drm: rcar-du: Don't create encoder for unconnected LVDS outputs drivers/gpu/drm/rcar-du/rcar_du_encoder.c | 16 drivers/gpu/drm/rcar-du/rcar_lvds.c | 11 +++ drivers/gpu/drm/rcar-du/rcar_lvds.h | 5 + 3 files changed, 28 insertions(+), 4 deletions(-) -- Regards, Laurent Pinchart
Re: [PATCH v4 00/24] drm/bridge: Make panel and bridge probe order consistent
Hi Maxime, On Thu, Sep 30, 2021 at 12:56:02AM +0300, Laurent Pinchart wrote: > On Fri, Sep 10, 2021 at 12:11:54PM +0200, Maxime Ripard wrote: > > Hi, > > > > We've encountered an issue with the RaspberryPi DSI panel that prevented the > > whole display driver from probing. > > > > The issue is described in detail in the commit 7213246a803f ("drm/vc4: dsi: > > Only register our component once a DSI device is attached"), but the basic > > idea > > is that since the panel is probed through i2c, there's no synchronization > > between its probe and the registration of the MIPI-DSI host it's attached > > to. > > > > We initially moved the component framework registration to the MIPI-DSI Host > > attach hook to make sure we register our component only when we have a DSI > > device attached to our MIPI-DSI host, and then use lookup our DSI device in > > our > > bind hook. > > > > However, all the DSI bridges controlled through i2c are only registering > > their > > associated DSI device in their bridge attach hook, meaning with our change > > above, we never got that far, and therefore ended up in the same situation > > than > > the one we were trying to fix for panels. > > > > The best practice to avoid those issues is to register its functions only > > after > > all its dependencies are live. We also shouldn't wait any longer than we > > should > > to play nice with the other components that are waiting for us, so in our > > case > > that would mean moving the DSI device registration to the bridge probe. > > > > I also had a look at all the DSI hosts, and it seems that exynos, kirin and > > msm > > would be affected by this and wouldn't probe anymore after those changes. > > Exynos and kirin seems to be simple enough for a mechanical change (that > > still > > requires to be tested), but the changes in msm seemed to be far more > > important > > and I wasn't confortable doing them. > > > > Let me know what you think, > > I've tested this series on my RPi CM4-based board, and there's a clear > improvement: the sn65dsi83 now probes successfully ! > > The downside is that I can now look at a panel that desperately refuses > to display anything. That's a separate issue, but it prevents me from > telling whether this series introduces regressions :-S I'll try to debug > that separately. I managed to (partly) fix that issue with a few backports from the RPi kernel, making me confident enough to say Tested-by: Laurent Pinchart for drivers/gpu/drm/bridge/ti-sn65dsi83.c drivers/gpu/drm/drm_bridge.c drivers/gpu/drm/drm_mipi_dsi.c include/drm/drm_mipi_dsi.h > Also, Kieran, would you be able to test this with the SN65DSI86 ? > > > --- > > > > Changes from v3: > > - Converted exynos and kirin > > - Converted all the affected bridge drivers > > - Reworded the documentation a bit > > > > Changes from v2: > > - Changed the approach as suggested by Andrzej, and aligned the bridge on > > the > > panel this time. > > - Fixed some typos > > > > Changes from v1: > > - Change the name of drm_of_get_next function to drm_of_get_bridge > > - Mention the revert of 87154ff86bf6 and squash the two patches that were > > reverting that commit > > - Add some documentation > > - Make drm_panel_attach and _detach succeed when no callback is there > > > > Maxime Ripard (24): > > drm/bridge: Add documentation sections > > drm/bridge: Document the probe issue with MIPI-DSI bridges > > drm/mipi-dsi: Create devm device registration > > drm/mipi-dsi: Create devm device attachment > > drm/bridge: adv7533: Switch to devm MIPI-DSI helpers > > drm/bridge: adv7511: Register and attach our DSI device at probe > > drm/bridge: anx7625: Switch to devm MIPI-DSI helpers > > drm/bridge: anx7625: Register and attach our DSI device at probe > > drm/bridge: lt8912b: Switch to devm MIPI-DSI helpers > > drm/bridge: lt8912b: Register and attach our DSI device at probe > > drm/bridge: lt9611: Switch to devm MIPI-DSI helpers > > drm/bridge: lt9611: Register and attach our DSI device at probe > > drm/bridge: lt9611uxc: Switch to devm MIPI-DSI helpers > > drm/bridge: lt9611uxc: Register and attach our DSI device at probe > > drm/bridge: ps8640: Switch to devm MIPI-DSI helpers > > drm/bridge: ps8640: Register and attach our DSI device at probe > > drm/bridge: sn65dsi83: Switch to devm MIPI-DSI helpers > > drm/bridge: sn65dsi83: Register and attach our DSI device at probe > > drm/bridge: sn65dsi86: Switch to devm MIPI-DSI helpers > > drm/bridge: sn65dsi86: Register and attach our DSI device at probe > > drm/bridge: tc358775: Switch to devm MIPI-DSI helpers > > drm/bridge: tc358775: Register and attach our DSI device at probe > > drm/kirin: dsi: Adjust probe order > > drm/exynos: dsi: Adjust probe order > > > > Documentation/gpu/drm-kms-helpers.rst| 12 +++ > > drivers/gpu/drm/bridge/adv7511/adv7511.h | 1 - > >
[PATCH] drm/i915: Prefer struct_size over open coded arithmetic
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. In this case these are not actually dynamic sizes: all the operands involved in the calculation are constant values. However it is better to refactor them anyway, just to keep the open-coded math idiom out of code. So, add at the end of the struct i915_syncmap a union with two flexible array members (these arrays share the same memory layout). This is possible using the new DECLARE_FLEX_ARRAY macro. And then, use the struct_size() helper to do the arithmetic instead of the argument "size + count * size" in the kmalloc and kzalloc() functions. Also, take the opportunity to refactor the __sync_seqno and __sync_child making them more readable. This code was detected with the help of Coccinelle and audited and fixed manually. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker --- drivers/gpu/drm/i915/i915_syncmap.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_syncmap.c b/drivers/gpu/drm/i915/i915_syncmap.c index 60404dbb2e9f..a8d35491d05a 100644 --- a/drivers/gpu/drm/i915/i915_syncmap.c +++ b/drivers/gpu/drm/i915/i915_syncmap.c @@ -82,6 +82,10 @@ struct i915_syncmap { * struct i915_syncmap *child[KSYNCMAP]; * }; */ + union { + DECLARE_FLEX_ARRAY(u32, seqno); + DECLARE_FLEX_ARRAY(struct i915_syncmap *, child); + }; }; /** @@ -99,13 +103,13 @@ void i915_syncmap_init(struct i915_syncmap **root) static inline u32 *__sync_seqno(struct i915_syncmap *p) { GEM_BUG_ON(p->height); - return (u32 *)(p + 1); + return p->seqno; } static inline struct i915_syncmap **__sync_child(struct i915_syncmap *p) { GEM_BUG_ON(!p->height); - return (struct i915_syncmap **)(p + 1); + return p->child; } static inline unsigned int @@ -200,7 +204,7 @@ __sync_alloc_leaf(struct i915_syncmap *parent, u64 id) { struct i915_syncmap *p; - p = kmalloc(sizeof(*p) + KSYNCMAP * sizeof(u32), GFP_KERNEL); + p = kmalloc(struct_size(p, seqno, KSYNCMAP), GFP_KERNEL); if (unlikely(!p)) return NULL; @@ -282,7 +286,7 @@ static noinline int __sync_set(struct i915_syncmap **root, u64 id, u32 seqno) unsigned int above; /* Insert a join above the current layer */ - next = kzalloc(sizeof(*next) + KSYNCMAP * sizeof(next), + next = kzalloc(struct_size(next, child, KSYNCMAP), GFP_KERNEL); if (unlikely(!next)) return -ENOMEM; -- 2.25.1
[PATCH v7 2/2] habanalabs: add support for dma-buf exporter
From: Tomer Tayar Implement the calls to the dma-buf kernel api to create a dma-buf object backed by FD. We block the option to mmap the DMA-BUF object because we don't support DIRECT_IO and implicit P2P. We only implement support for explicit P2P through importing the FD of the DMA-BUF. In the export phase, we provide to the DMA-BUF object an array of pages that represent the device's memory area. During the map callback, we convert the array of pages into an SGT. We split/merge the pages according to the dma max segment size of the importer. To get the DMA address of the PCI bar, we use the dma_map_resources() kernel API, because our device memory is not backed by page struct and this API doesn't need page struct to map the physical address to a DMA address. We set the orig_nents member of the SGT to be 0, to indicate to other drivers that we don't support CPU mappings. Note that in Habanalabs's ASICs, the device memory is pinned and immutable. Therefore, there is no need for dynamic mappings and pinning callbacks. Also note that in GAUDI we don't have an MMU towards the device memory and the user works on physical addresses. Therefore, the user doesn't pass through the kernel driver to allocate memory there. As a result, only for GAUDI we receive from the user a device memory physical address (instead of a handle) and a size. We check the p2p distance using pci_p2pdma_distance_many() and refusing to map dmabuf in case the distance doesn't allow p2p. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Reviewed-by: Gal Pressman Reviewed-by: Greg Kroah-Hartman Acked-by: Daniel Vetter Signed-off-by: Oded Gabbay --- Changes in v7: - rename structure hl_dmabuf_wrapper to hl_dmabuf_priv - remove change that wasn't relevant to this patch - set orig_nents to 0 at end of alloc_sgt_from_device_pages() to avoid undoing that set in case of error - remove silent ignore of upper 32 bits of handle received from user in export_dmabuf_from_handle - replace all dev_err that triggered by user actions to dev_dbg - change alloc_sgt_from_device_pages() to return pointer to struct sg_table or ERR_PTR - add comment why we use DMA_ATTR_SKIP_CPU_SYNC in dmabuf unmap drivers/misc/habanalabs/Kconfig | 1 + drivers/misc/habanalabs/common/habanalabs.h | 22 + drivers/misc/habanalabs/common/memory.c | 514 +++- drivers/misc/habanalabs/gaudi/gaudi.c | 1 + drivers/misc/habanalabs/goya/goya.c | 1 + 5 files changed, 536 insertions(+), 3 deletions(-) diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig index 293d79811372..c82d2e7b2035 100644 --- a/drivers/misc/habanalabs/Kconfig +++ b/drivers/misc/habanalabs/Kconfig @@ -8,6 +8,7 @@ config HABANA_AI depends on PCI && HAS_IOMEM select GENERIC_ALLOCATOR select HWMON + select DMA_SHARED_BUFFER help Enables PCIe card driver for Habana's AI Processors (AIP) that are designed to accelerate Deep Learning inference and training workloads. diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h index f8e23ca18a57..ba5a424a4f3e 100644 --- a/drivers/misc/habanalabs/common/habanalabs.h +++ b/drivers/misc/habanalabs/common/habanalabs.h @@ -26,6 +26,7 @@ #include #include #include +#include #define HL_NAME"habanalabs" @@ -1352,6 +1353,23 @@ struct hl_cs_counters_atomic { atomic64_t validation_drop_cnt; }; +/** + * struct hl_dmabuf_priv - a dma-buf private object. + * @dmabuf: pointer to dma-buf object. + * @ctx: pointer to the dma-buf owner's context. + * @phys_pg_pack: pointer to physical page pack if the dma-buf was exported for + *memory allocation handle. + * @device_address: physical address of the device's memory. Relevant only + * if phys_pg_pack is NULL (dma-buf was exported from address). + * The total size can be taken from the dmabuf object. + */ +struct hl_dmabuf_priv { + struct dma_buf *dmabuf; + struct hl_ctx *ctx; + struct hl_vm_phys_pg_pack *phys_pg_pack; + uint64_tdevice_address; +}; + /** * struct hl_ctx - user/kernel context. * @mem_hash: holds mapping from virtual address to virtual memory area @@ -1662,6 +1680,7 @@ struct hl_vm_hw_block_list_node { * @npages: num physical pages in the pack. * @total_size: total size of all the pages in this list. * @mapping_cnt: number of shared mappings. + * @exporting_cnt: number of dma-buf exporting. * @asid: the context related to this list. * @page_size: size of each page in the pack. * @flags: HL_MEM_* flags related to this list. @@ -1676,6 +1695,7 @@ struct hl_vm_phys_pg_pack { u64 npages; u64 total_size; atomic_tmapping_cnt; + u32
[PATCH v7 1/2] habanalabs: define uAPI to export FD for DMA-BUF
User process might want to share the device memory with another driver/device, and to allow it to access it over PCIe (P2P). To enable this, we utilize the dma-buf mechanism and add a dma-buf exporter support, so the other driver can import the device memory and access it. The device memory is allocated using our existing allocation uAPI, where the user will get a handle that represents the allocation. The user will then need to call the new uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter. The driver will return a FD that represents the DMA-BUF object that was created to match that allocation. Signed-off-by: Oded Gabbay Reviewed-by: Tomer Tayar Reviewed-by: Greg Kroah-Hartman Acked-by: Daniel Vetter --- Changes in v7: - Change the type of the fd variable returned from IOCTL to be __s32 include/uapi/misc/habanalabs.h | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h index ccfcb4d188fc..f980fbfb1a0e 100644 --- a/include/uapi/misc/habanalabs.h +++ b/include/uapi/misc/habanalabs.h @@ -959,6 +959,10 @@ union hl_wait_cs_args { #define HL_MEM_OP_UNMAP3 /* Opcode to map a hw block */ #define HL_MEM_OP_MAP_BLOCK4 +/* Opcode to create DMA-BUF object for an existing device memory allocation + * and to export an FD of that DMA-BUF back to the caller + */ +#define HL_MEM_OP_EXPORT_DMABUF_FD 5 /* Memory flags */ #define HL_MEM_CONTIGUOUS 0x1 @@ -1030,11 +1034,26 @@ struct hl_mem_in { /* Virtual address returned from HL_MEM_OP_MAP */ __u64 device_virt_addr; } unmap; + + /* HL_MEM_OP_EXPORT_DMABUF_FD */ + struct { + /* Handle returned from HL_MEM_OP_ALLOC. In Gaudi, +* where we don't have MMU for the device memory, the +* driver expects a physical address (instead of +* a handle) in the device memory space. +*/ + __u64 handle; + /* Size of memory allocation. Relevant only for GAUDI */ + __u64 mem_size; + } export_dmabuf_fd; }; /* HL_MEM_OP_* */ __u32 op; - /* HL_MEM_* flags */ + /* HL_MEM_* flags. +* For the HL_MEM_OP_EXPORT_DMABUF_FD opcode, this field holds the +* DMA-BUF file/FD flags. +*/ __u32 flags; /* Context ID - Currently not in use */ __u32 ctx_id; @@ -1071,6 +1090,13 @@ struct hl_mem_out { __u32 pad; }; + + /* Returned in HL_MEM_OP_EXPORT_DMABUF_FD. Represents the +* DMA-BUF object that was created to describe a memory +* allocation on the device's memory space. The FD should be +* passed to the importer driver +*/ + __s32 fd; }; }; -- 2.17.1
[PATCH v7 0/2] Add p2p via dmabuf to habanalabs
Hi, I'm sending v7 after the latest review from Jason. All the changes are detailed in the commit messages. Dave, I'll appreciate if you can also a-b this patchset. Thanks, Oded Oded Gabbay (1): habanalabs: define uAPI to export FD for DMA-BUF Tomer Tayar (1): habanalabs: add support for dma-buf exporter drivers/misc/habanalabs/Kconfig | 1 + drivers/misc/habanalabs/common/habanalabs.h | 22 + drivers/misc/habanalabs/common/memory.c | 514 +++- drivers/misc/habanalabs/gaudi/gaudi.c | 1 + drivers/misc/habanalabs/goya/goya.c | 1 + include/uapi/misc/habanalabs.h | 28 +- 6 files changed, 563 insertions(+), 4 deletions(-) -- 2.17.1