Re: [Bug] VCHIQ functional test broken
On 15/05/2017 15:54, Stefan Wahren wrote: > Am 15.05.2017 um 16:29 schrieb Phil Elwell: >> On 13/05/2017 10:30, Russell King - ARM Linux wrote: >>> On Sat, May 13, 2017 at 11:07:28AM +0200, Stefan Wahren wrote: In the meantime this issue has been fixed by Phil [1]. >>> Right - definitely a driver bug. Mapping more memory for DMA than is >>> actually going to be DMA'd to and expecting data to be preserved is >>> really horrid. >> That feature was added during the upstreaming process, and as Stefan says >> there is an outstanding patch for it. >> Unfortunately i found another issue. If i enable CONFIG_HIGHMEM in the kernel config, the data during functional test gets corrupted. Phil said it's caused by the usage of get_user_pages() [2]. >>> Without knowing who "Phil" is in that thread, but... >>> >>>HIGHMEM is a problem because you can't use get_user_pages on pages in >>>HIGHMEM. >>> >>> is an interesting statement, and without any reasoning or evidence. >>> >>> I also believe it to be incorrect. get_user_pages() returns an array >>> of struct page pointers for the user memory, calling flush_dcache_page() >>> and flush_anon_page() on them to ensure that any kernel mapping is >>> coherent with what is in userspace. >>> >>> As far as returning the array of page pointers, get_user_pages() doesn't >>> care whether they're lowmem or highmem. >>> >>> flush_dcache_page() doesn't care either - if it wants to flush the page >>> and the page is a highmem page, it will temporarily map it before >>> flushing it. >>> >>> flush_anon_page() is a no-op for all non-aliasing caches. >>> >>> get_user_pages() works fine for whatever memory on other platforms and >>> drivers such as etnaviv, so I think this comes down to the vchip driver >>> doing things in ways that the kernel interfaces its using don't expect - >>> exactly like the "lets pass full pages to the DMA API" broken-ness. >> See previous comment. >> >>> I would like to hear the justification for that statement, but without >>> any justification, I assert that the statement is false. >> I am the Phil in question, and the off-the-cuff comment was the result of >> a hazy memory of issues encountered with VCHIQ bulk transfers as a Broadcom >> employee (which would have been on a 2.6 kernel). I suspect there may have >> been some use of kernel virtual addresses as an intermediate representation, >> but I no longer have access to that code. >> >> If get_user_pages is HIGHMEM-safe (and I can see why it would be), then the >> cause of the corruption Stefan saw is probably the special handling of >> unaligned reads, specifically: >> >> memcpy((char *)page_address(pages[0]) + >> pagelist->offset, >> fragments, >> head_bytes); > > > Btw shouldn't we use copy_from_user() at this place? I'm sure you mean copy_to_user(), and the answer is "it's complicated". Depending on the relative timing, this code can also be called from a kernel thread, so I doubt that would work. Phil ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
Am 15.05.2017 um 16:29 schrieb Phil Elwell: > On 13/05/2017 10:30, Russell King - ARM Linux wrote: >> On Sat, May 13, 2017 at 11:07:28AM +0200, Stefan Wahren wrote: >>> In the meantime this issue has been fixed by Phil [1]. >> Right - definitely a driver bug. Mapping more memory for DMA than is >> actually going to be DMA'd to and expecting data to be preserved is >> really horrid. > That feature was added during the upstreaming process, and as Stefan says > there is an outstanding patch for it. > >>> Unfortunately i found another issue. If i enable CONFIG_HIGHMEM in >>> the kernel config, the data during functional test gets corrupted. >>> Phil said it's caused by the usage of get_user_pages() [2]. >> Without knowing who "Phil" is in that thread, but... >> >>HIGHMEM is a problem because you can't use get_user_pages on pages in >>HIGHMEM. >> >> is an interesting statement, and without any reasoning or evidence. >> >> I also believe it to be incorrect. get_user_pages() returns an array >> of struct page pointers for the user memory, calling flush_dcache_page() >> and flush_anon_page() on them to ensure that any kernel mapping is >> coherent with what is in userspace. >> >> As far as returning the array of page pointers, get_user_pages() doesn't >> care whether they're lowmem or highmem. >> >> flush_dcache_page() doesn't care either - if it wants to flush the page >> and the page is a highmem page, it will temporarily map it before >> flushing it. >> >> flush_anon_page() is a no-op for all non-aliasing caches. >> >> get_user_pages() works fine for whatever memory on other platforms and >> drivers such as etnaviv, so I think this comes down to the vchip driver >> doing things in ways that the kernel interfaces its using don't expect - >> exactly like the "lets pass full pages to the DMA API" broken-ness. > See previous comment. > >> I would like to hear the justification for that statement, but without >> any justification, I assert that the statement is false. > I am the Phil in question, and the off-the-cuff comment was the result of > a hazy memory of issues encountered with VCHIQ bulk transfers as a Broadcom > employee (which would have been on a 2.6 kernel). I suspect there may have > been some use of kernel virtual addresses as an intermediate representation, > but I no longer have access to that code. > > If get_user_pages is HIGHMEM-safe (and I can see why it would be), then the > cause of the corruption Stefan saw is probably the special handling of > unaligned reads, specifically: > > memcpy((char *)page_address(pages[0]) + > pagelist->offset, > fragments, > head_bytes); Btw shouldn't we use copy_from_user() at this place? ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On 13/05/2017 10:30, Russell King - ARM Linux wrote: > On Sat, May 13, 2017 at 11:07:28AM +0200, Stefan Wahren wrote: >> In the meantime this issue has been fixed by Phil [1]. > > Right - definitely a driver bug. Mapping more memory for DMA than is > actually going to be DMA'd to and expecting data to be preserved is > really horrid. That feature was added during the upstreaming process, and as Stefan says there is an outstanding patch for it. >> Unfortunately i found another issue. If i enable CONFIG_HIGHMEM in >> the kernel config, the data during functional test gets corrupted. >> Phil said it's caused by the usage of get_user_pages() [2]. > > Without knowing who "Phil" is in that thread, but... > >HIGHMEM is a problem because you can't use get_user_pages on pages in >HIGHMEM. > > is an interesting statement, and without any reasoning or evidence. > > I also believe it to be incorrect. get_user_pages() returns an array > of struct page pointers for the user memory, calling flush_dcache_page() > and flush_anon_page() on them to ensure that any kernel mapping is > coherent with what is in userspace. > > As far as returning the array of page pointers, get_user_pages() doesn't > care whether they're lowmem or highmem. > > flush_dcache_page() doesn't care either - if it wants to flush the page > and the page is a highmem page, it will temporarily map it before > flushing it. > > flush_anon_page() is a no-op for all non-aliasing caches. > > get_user_pages() works fine for whatever memory on other platforms and > drivers such as etnaviv, so I think this comes down to the vchip driver > doing things in ways that the kernel interfaces its using don't expect - > exactly like the "lets pass full pages to the DMA API" broken-ness. See previous comment. > I would like to hear the justification for that statement, but without > any justification, I assert that the statement is false. I am the Phil in question, and the off-the-cuff comment was the result of a hazy memory of issues encountered with VCHIQ bulk transfers as a Broadcom employee (which would have been on a 2.6 kernel). I suspect there may have been some use of kernel virtual addresses as an intermediate representation, but I no longer have access to that code. If get_user_pages is HIGHMEM-safe (and I can see why it would be), then the cause of the corruption Stefan saw is probably the special handling of unaligned reads, specifically: memcpy((char *)page_address(pages[0]) + pagelist->offset, fragments, head_bytes); Phil ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On Sat, May 13, 2017 at 11:07:28AM +0200, Stefan Wahren wrote: > In the meantime this issue has been fixed by Phil [1]. Right - definitely a driver bug. Mapping more memory for DMA than is actually going to be DMA'd to and expecting data to be preserved is really horrid. > Unfortunately i found another issue. If i enable CONFIG_HIGHMEM in > the kernel config, the data during functional test gets corrupted. > Phil said it's caused by the usage of get_user_pages() [2]. Without knowing who "Phil" is in that thread, but... HIGHMEM is a problem because you can't use get_user_pages on pages in HIGHMEM. is an interesting statement, and without any reasoning or evidence. I also believe it to be incorrect. get_user_pages() returns an array of struct page pointers for the user memory, calling flush_dcache_page() and flush_anon_page() on them to ensure that any kernel mapping is coherent with what is in userspace. As far as returning the array of page pointers, get_user_pages() doesn't care whether they're lowmem or highmem. flush_dcache_page() doesn't care either - if it wants to flush the page and the page is a highmem page, it will temporarily map it before flushing it. flush_anon_page() is a no-op for all non-aliasing caches. get_user_pages() works fine for whatever memory on other platforms and drivers such as etnaviv, so I think this comes down to the vchip driver doing things in ways that the kernel interfaces its using don't expect - exactly like the "lets pass full pages to the DMA API" broken-ness. I would like to hear the justification for that statement, but without any justification, I assert that the statement is false. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
> Stefan Wahrenhat am 24. April 2017 um 21:35 > geschrieben: > > > > > Russell King - ARM Linux hat am 24. April 2017 um > > 20:59 geschrieben: > > > > > > On Mon, Apr 24, 2017 at 07:42:27PM +0200, Stefan Wahren wrote: > > > > What I can't see is how changing flush_dcache_page() has possibly broken > > > > this: it seems to imply that you're getting flush_dcache_page() somehow > > > > called inbetween mapping it for DMA, using it for DMA and CPU accesses > > > > occuring. However, the driver doesn't call flush_dcache_page() other > > > > than through get_user_pages() - and since dma_map_sg() is used in that > > > > path, it should be fine. > > > > > > > > So, I don't have much to offer. > > > > > > > > However, flush_dcache_page() is probably still a tad heavy - it > > > > currently > > > > flushes to the point of coherency, but it's really about making sure > > > > that > > > > the user vs kernel mappings are coherent, not about device coherency. > > > > That's the role of the DMA API. > > > > > > Currently i don't care too much about performance. More important is to > > > fix this regression, because i'm not able to verify the function of this > > > driver efficiently. > > > > This is a driver in staging, and the reason its in staging is because it > > has problems. This is just another example of another problem it has... > > Yes, the regression is regrettable, but I don't think it's something to > > get hung up on. > > > > > I'm thinking about 2 options: > > > > > > 1) add a force parameter to flush_dcache_page() which make it possible > > >to override the new logic > > > 2) create a second version of flush_dcache_page() with the old behavior > > > > Neither, really, because, as I've already explained, flush_dcache_page() > > has nothing what so ever to do with DMA coherence - and if a driver > > breaks because we change its semantic slightly (but still in a compliant > > way) then it's uncovered a latent bug in _that_ driver. > > > > Rather than trying to "fix" flush_dcache_page() to work how this driver > > wants it to (which is a totally crazy thing to propose - we can't go > > shoe-horning driver specific behaviour into these generic flushing > > functions), it would be much better to work out why the driver has > > broken. > > I totally agree. I had the hope we could make a workaround within the driver > until we found the real issue. > > > > > I see the kernel driver has its own logging (using vchiq_log_info() etc?) > > maybe a starting point would be to compare the output from a working > > kernel with a non-working kernel to get more of an idea where the problem > > actually is? > > I will try ... > > > > > What I'm willing to do is to temporarily drop the offending patch for > > the next merge window to give more time for this driver to be debugged, > > but I will want to re-apply it after that merge window closes. > > Since this is already in 4.11, please keep it. At least this makes it easier > to reproduce. No need for more confusion and effort. > In the meantime this issue has been fixed by Phil [1]. Unfortunately i found another issue. If i enable CONFIG_HIGHMEM in the kernel config, the data during functional test gets corrupted. Phil said it's caused by the usage of get_user_pages() [2]. That makes me wonder, because there are a lot of gup users and it's hard to believe that they are also affected. Would that be hard to fix? In that case i tend to fix at least the dependency: diff --git a/drivers/staging/vc04_services/Kconfig b/drivers/staging/vc04_servic index 9e27636..6572903 100644 --- a/drivers/staging/vc04_services/Kconfig +++ b/drivers/staging/vc04_services/Kconfig @@ -2,7 +2,7 @@ menuconfig BCM_VIDEOCORE tristate "Broadcom VideoCore support" depends on HAS_DMA depends on OF - depends on RASPBERRYPI_FIRMWARE || (COMPILE_TEST && !RASPBERRYPI_FIRMWAR + depends on (RASPBERRYPI_FIRMWARE && !HIGHMEM) || (COMPILE_TEST && !RASPB default y help Support for Broadcom VideoCore services including Thanks Stefan [1] - http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2017-May/105443.html [2] - https://github.com/raspberrypi/linux/issues/1996 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
> Russell King - ARM Linuxhat am 24. April 2017 um > 20:59 geschrieben: > > > On Mon, Apr 24, 2017 at 07:42:27PM +0200, Stefan Wahren wrote: > > > What I can't see is how changing flush_dcache_page() has possibly broken > > > this: it seems to imply that you're getting flush_dcache_page() somehow > > > called inbetween mapping it for DMA, using it for DMA and CPU accesses > > > occuring. However, the driver doesn't call flush_dcache_page() other > > > than through get_user_pages() - and since dma_map_sg() is used in that > > > path, it should be fine. > > > > > > So, I don't have much to offer. > > > > > > However, flush_dcache_page() is probably still a tad heavy - it currently > > > flushes to the point of coherency, but it's really about making sure that > > > the user vs kernel mappings are coherent, not about device coherency. > > > That's the role of the DMA API. > > > > Currently i don't care too much about performance. More important is to > > fix this regression, because i'm not able to verify the function of this > > driver efficiently. > > This is a driver in staging, and the reason its in staging is because it > has problems. This is just another example of another problem it has... > Yes, the regression is regrettable, but I don't think it's something to > get hung up on. > > > I'm thinking about 2 options: > > > > 1) add a force parameter to flush_dcache_page() which make it possible > >to override the new logic > > 2) create a second version of flush_dcache_page() with the old behavior > > Neither, really, because, as I've already explained, flush_dcache_page() > has nothing what so ever to do with DMA coherence - and if a driver > breaks because we change its semantic slightly (but still in a compliant > way) then it's uncovered a latent bug in _that_ driver. > > Rather than trying to "fix" flush_dcache_page() to work how this driver > wants it to (which is a totally crazy thing to propose - we can't go > shoe-horning driver specific behaviour into these generic flushing > functions), it would be much better to work out why the driver has > broken. I totally agree. I had the hope we could make a workaround within the driver until we found the real issue. > > I see the kernel driver has its own logging (using vchiq_log_info() etc?) > maybe a starting point would be to compare the output from a working > kernel with a non-working kernel to get more of an idea where the problem > actually is? I will try ... > > What I'm willing to do is to temporarily drop the offending patch for > the next merge window to give more time for this driver to be debugged, > but I will want to re-apply it after that merge window closes. Since this is already in 4.11, please keep it. At least this makes it easier to reproduce. No need for more confusion and effort. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On Mon, Apr 24, 2017 at 07:42:27PM +0200, Stefan Wahren wrote: > > What I can't see is how changing flush_dcache_page() has possibly broken > > this: it seems to imply that you're getting flush_dcache_page() somehow > > called inbetween mapping it for DMA, using it for DMA and CPU accesses > > occuring. However, the driver doesn't call flush_dcache_page() other > > than through get_user_pages() - and since dma_map_sg() is used in that > > path, it should be fine. > > > > So, I don't have much to offer. > > > > However, flush_dcache_page() is probably still a tad heavy - it currently > > flushes to the point of coherency, but it's really about making sure that > > the user vs kernel mappings are coherent, not about device coherency. > > That's the role of the DMA API. > > Currently i don't care too much about performance. More important is to > fix this regression, because i'm not able to verify the function of this > driver efficiently. This is a driver in staging, and the reason its in staging is because it has problems. This is just another example of another problem it has... Yes, the regression is regrettable, but I don't think it's something to get hung up on. > I'm thinking about 2 options: > > 1) add a force parameter to flush_dcache_page() which make it possible >to override the new logic > 2) create a second version of flush_dcache_page() with the old behavior Neither, really, because, as I've already explained, flush_dcache_page() has nothing what so ever to do with DMA coherence - and if a driver breaks because we change its semantic slightly (but still in a compliant way) then it's uncovered a latent bug in _that_ driver. Rather than trying to "fix" flush_dcache_page() to work how this driver wants it to (which is a totally crazy thing to propose - we can't go shoe-horning driver specific behaviour into these generic flushing functions), it would be much better to work out why the driver has broken. I see the kernel driver has its own logging (using vchiq_log_info() etc?) maybe a starting point would be to compare the output from a working kernel with a non-working kernel to get more of an idea where the problem actually is? What I'm willing to do is to temporarily drop the offending patch for the next merge window to give more time for this driver to be debugged, but I will want to re-apply it after that merge window closes. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
> Russell King - ARM Linuxhat am 24. April 2017 um > 18:40 geschrieben: > > > On Mon, Apr 24, 2017 at 06:12:09PM +0200, Stefan Wahren wrote: > > Am 20.04.2017 um 21:58 schrieb Rabin Vincent: > > > On Thu, Apr 20, 2017 at 11:27:38AM -0700, Eric Anholt wrote: > > >> I'm confused by what you're saying here. The driver has already been > > >> converted to not use dmac_map_area (commit > > >> cf9caf1929882b66922aee698e99e6c8f357bee5), and uses dma_map_sg instead, > > >> matching the radeon driver you give as a model as far as I can see. > > >> That commit is in v4.11-rc6 from Stefan's regression report. > > > Right. Sorry. I must have had an old tag checked out when I looked at > > > the driver earlier. The DMA API usage in the driver in v4.11-rc6 and > > > current master looks fine, except for one thing: > > > > > > The flush in flush_dcache_page() (from get_user_pages()) was done with a > > > v6_flush_kern_dcache_page() which always did a clean+invalidate while > > > the DMA API only does what is required by the direction, which is only a > > > invalidate for DMA_FROM_DEVICE. Since the driver calls dma_from_sg() on > > > the entire page, even if userspace sent in an offset into the page, > > > unrelated data in userspace may be thrown away. > > > > > > Does changing the dma API calls to always use DMA_BIDIRECTIONAL make the > > > test pass? > > > > Unfortunately not (at least not that simple). > > > > Do we need special DMA mapping attributes here ? Or a dma_sync_sg_for_* ? > > I had a look at the driver when you first reported the problem, but I > don't see an obvious problem. > > In drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c, I > see it using get_user_pages(), generating a scatterlist, which it then > uses dma_map_sg() with. It then goes on to populate the DMA coherent > buffer that it allocated with the DMA addresses of these buffers. > > The tear-down looks sane too - free_pagelist() looks like it correctly > unmaps the scatterlist, marks the pages dirty if necessary, puts the > pages and frees the DMA coherent memory. > > Since you're running on a PIPT cache, the only possible issue is whether > there's a lifetime mismatch between what happens here, and what you're > doing with the pages in userspace. All the rules wrt the DMA API apply > to these userspace pages, just as these same rules apply in kernel space. > Once you have called dma_map_sg() on them, any CPU access (whether > explicit or speculative) will cause cache lines to be populated, and > possibly marked dirty, which can have the effect of corrupting the data > unless it is unmapped prior to the accesses you care about. Just for the records, the link to the userspace program: https://github.com/raspberrypi/userland/blob/master/interface/vchiq_arm/vchiq_test.c Maybe there is an issue in the ioctl > > What I can't see is how changing flush_dcache_page() has possibly broken > this: it seems to imply that you're getting flush_dcache_page() somehow > called inbetween mapping it for DMA, using it for DMA and CPU accesses > occuring. However, the driver doesn't call flush_dcache_page() other > than through get_user_pages() - and since dma_map_sg() is used in that > path, it should be fine. > > So, I don't have much to offer. > > However, flush_dcache_page() is probably still a tad heavy - it currently > flushes to the point of coherency, but it's really about making sure that > the user vs kernel mappings are coherent, not about device coherency. > That's the role of the DMA API. Currently i don't care too much about performance. More important is to fix this regression, because i'm not able to verify the function of this driver efficiently. I'm thinking about 2 options: 1) add a force parameter to flush_dcache_page() which make it possible to override the new logic 2) create a second version of flush_dcache_page() with the old behavior But first of all i need to figure out which parts of flush_dcache_page() are really necessary. Many thanks anyway > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On Mon, Apr 24, 2017 at 06:12:09PM +0200, Stefan Wahren wrote: > Am 20.04.2017 um 21:58 schrieb Rabin Vincent: > > On Thu, Apr 20, 2017 at 11:27:38AM -0700, Eric Anholt wrote: > >> I'm confused by what you're saying here. The driver has already been > >> converted to not use dmac_map_area (commit > >> cf9caf1929882b66922aee698e99e6c8f357bee5), and uses dma_map_sg instead, > >> matching the radeon driver you give as a model as far as I can see. > >> That commit is in v4.11-rc6 from Stefan's regression report. > > Right. Sorry. I must have had an old tag checked out when I looked at > > the driver earlier. The DMA API usage in the driver in v4.11-rc6 and > > current master looks fine, except for one thing: > > > > The flush in flush_dcache_page() (from get_user_pages()) was done with a > > v6_flush_kern_dcache_page() which always did a clean+invalidate while > > the DMA API only does what is required by the direction, which is only a > > invalidate for DMA_FROM_DEVICE. Since the driver calls dma_from_sg() on > > the entire page, even if userspace sent in an offset into the page, > > unrelated data in userspace may be thrown away. > > > > Does changing the dma API calls to always use DMA_BIDIRECTIONAL make the > > test pass? > > Unfortunately not (at least not that simple). > > Do we need special DMA mapping attributes here ? Or a dma_sync_sg_for_* ? I had a look at the driver when you first reported the problem, but I don't see an obvious problem. In drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c, I see it using get_user_pages(), generating a scatterlist, which it then uses dma_map_sg() with. It then goes on to populate the DMA coherent buffer that it allocated with the DMA addresses of these buffers. The tear-down looks sane too - free_pagelist() looks like it correctly unmaps the scatterlist, marks the pages dirty if necessary, puts the pages and frees the DMA coherent memory. Since you're running on a PIPT cache, the only possible issue is whether there's a lifetime mismatch between what happens here, and what you're doing with the pages in userspace. All the rules wrt the DMA API apply to these userspace pages, just as these same rules apply in kernel space. Once you have called dma_map_sg() on them, any CPU access (whether explicit or speculative) will cause cache lines to be populated, and possibly marked dirty, which can have the effect of corrupting the data unless it is unmapped prior to the accesses you care about. What I can't see is how changing flush_dcache_page() has possibly broken this: it seems to imply that you're getting flush_dcache_page() somehow called inbetween mapping it for DMA, using it for DMA and CPU accesses occuring. However, the driver doesn't call flush_dcache_page() other than through get_user_pages() - and since dma_map_sg() is used in that path, it should be fine. So, I don't have much to offer. However, flush_dcache_page() is probably still a tad heavy - it currently flushes to the point of coherency, but it's really about making sure that the user vs kernel mappings are coherent, not about device coherency. That's the role of the DMA API. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
Am 20.04.2017 um 21:58 schrieb Rabin Vincent: > On Thu, Apr 20, 2017 at 11:27:38AM -0700, Eric Anholt wrote: >> I'm confused by what you're saying here. The driver has already been >> converted to not use dmac_map_area (commit >> cf9caf1929882b66922aee698e99e6c8f357bee5), and uses dma_map_sg instead, >> matching the radeon driver you give as a model as far as I can see. >> That commit is in v4.11-rc6 from Stefan's regression report. > Right. Sorry. I must have had an old tag checked out when I looked at > the driver earlier. The DMA API usage in the driver in v4.11-rc6 and > current master looks fine, except for one thing: > > The flush in flush_dcache_page() (from get_user_pages()) was done with a > v6_flush_kern_dcache_page() which always did a clean+invalidate while > the DMA API only does what is required by the direction, which is only a > invalidate for DMA_FROM_DEVICE. Since the driver calls dma_from_sg() on > the entire page, even if userspace sent in an offset into the page, > unrelated data in userspace may be thrown away. > > Does changing the dma API calls to always use DMA_BIDIRECTIONAL make the > test pass? Unfortunately not (at least not that simple). Do we need special DMA mapping attributes here ? Or a dma_sync_sg_for_* ? ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
Rabin Vincentwrites: > On Thu, Apr 20, 2017 at 11:27:38AM -0700, Eric Anholt wrote: >> I'm confused by what you're saying here. The driver has already been >> converted to not use dmac_map_area (commit >> cf9caf1929882b66922aee698e99e6c8f357bee5), and uses dma_map_sg instead, >> matching the radeon driver you give as a model as far as I can see. >> That commit is in v4.11-rc6 from Stefan's regression report. > > Right. Sorry. I must have had an old tag checked out when I looked at > the driver earlier. The DMA API usage in the driver in v4.11-rc6 and > current master looks fine, except for one thing: > > The flush in flush_dcache_page() (from get_user_pages()) was done with a > v6_flush_kern_dcache_page() which always did a clean+invalidate while > the DMA API only does what is required by the direction, which is only a > invalidate for DMA_FROM_DEVICE. Since the driver calls dma_from_sg() on > the entire page, even if userspace sent in an offset into the page, > unrelated data in userspace may be thrown away. > > Does changing the dma API calls to always use DMA_BIDIRECTIONAL make the > test pass? Oh, that's a neat explanation for what might be wrong, and there seem to be tests trying to poke at that within the functional test code. Hopefully Stefan can try that out. signature.asc Description: PGP signature ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On Thu, Apr 20, 2017 at 11:27:38AM -0700, Eric Anholt wrote: > I'm confused by what you're saying here. The driver has already been > converted to not use dmac_map_area (commit > cf9caf1929882b66922aee698e99e6c8f357bee5), and uses dma_map_sg instead, > matching the radeon driver you give as a model as far as I can see. > That commit is in v4.11-rc6 from Stefan's regression report. Right. Sorry. I must have had an old tag checked out when I looked at the driver earlier. The DMA API usage in the driver in v4.11-rc6 and current master looks fine, except for one thing: The flush in flush_dcache_page() (from get_user_pages()) was done with a v6_flush_kern_dcache_page() which always did a clean+invalidate while the DMA API only does what is required by the direction, which is only a invalidate for DMA_FROM_DEVICE. Since the driver calls dma_from_sg() on the entire page, even if userspace sent in an offset into the page, unrelated data in userspace may be thrown away. Does changing the dma API calls to always use DMA_BIDIRECTIONAL make the test pass? ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
Rabin Vincentwrites: > On Thu, Apr 13, 2017 at 11:29:15PM +0100, Russell King - ARM Linux wrote: >> > 00a19f3e25c0c40e0ec77f52d4841d23ad269169 is the first bad commit >> > commit 00a19f3e25c0c40e0ec77f52d4841d23ad269169 >> > Author: Rabin Vincent >> > Date: Tue Nov 8 09:21:19 2016 +0100 >> > >> > ARM: 8627/1: avoid cache flushing in flush_dcache_page() >> > >> > When the data cache is PIPT or VIPT non-aliasing, and cache operations >> > are broadcast by the hardware, we can always postpone the flush in >> > flush_dcache_page(). A similar change was done for ARM64 in commit >> > b5b6c9e9149d ("arm64: Avoid cache flushing in flush_dcache_page()"). >> > >> > Reviewed-by: Catalin Marinas >> > Signed-off-by: Rabin Vincent >> > Signed-off-by: Russell King >> > >> > It seems that staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm >> > relies on the behavior of flush_dcache_page before this patch get >> > applied. Any advices to fix this issues are appreciated. >> >> Any ideas why this causes a problem for this driver? From what I can see, >> it doesn't make use of flush_dcache_page(). > > The driver's create_pagelist() uses get_free_pages(), and > get_free_pages() calls flush_dcache_page(). > > The problem is that the driver fails to flush the pages which it > acquires via get_free_pages(). It's clear that the driver needs to do > it, since there is a flush in the is_vmalloc_addr() path in the same > function. The driver probably worked earlier because of the unecessary > flush in flush_dcache_page() which existed before this patch, but the > purpose of that flush was not DMA coherency and it was never guaranteed > that it would flush all the way to the point that devices could see the > data. > > See radeon_ttm_tt_pin_userptr() in drivers/gpu/drm/radeon/radeon_ttm.c > for an example of how a driver can ensure cache coherency using the DMA > mapping API if it intends to DMA from/to pages acquired by > get_free_pages(). > > The rest of the driver should also be converted to the DMA mapping API > instead of abusing the API's private functions (dmac_map_area etc.) I'm confused by what you're saying here. The driver has already been converted to not use dmac_map_area (commit cf9caf1929882b66922aee698e99e6c8f357bee5), and uses dma_map_sg instead, matching the radeon driver you give as a model as far as I can see. That commit is in v4.11-rc6 from Stefan's regression report. signature.asc Description: PGP signature ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
Hi Rabin, > Rabin Vincenthat am 14. April 2017 um 09:41 geschrieben: > > > On Thu, Apr 13, 2017 at 11:29:15PM +0100, Russell King - ARM Linux wrote: > > > 00a19f3e25c0c40e0ec77f52d4841d23ad269169 is the first bad commit > > > commit 00a19f3e25c0c40e0ec77f52d4841d23ad269169 > > > Author: Rabin Vincent > > > Date: Tue Nov 8 09:21:19 2016 +0100 > > > > > > ARM: 8627/1: avoid cache flushing in flush_dcache_page() > > > > > > When the data cache is PIPT or VIPT non-aliasing, and cache operations > > > are broadcast by the hardware, we can always postpone the flush in > > > flush_dcache_page(). A similar change was done for ARM64 in commit > > > b5b6c9e9149d ("arm64: Avoid cache flushing in flush_dcache_page()"). > > > > > > Reviewed-by: Catalin Marinas > > > Signed-off-by: Rabin Vincent > > > Signed-off-by: Russell King > > > > > > It seems that staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm > > > relies on the behavior of flush_dcache_page before this patch get > > > applied. Any advices to fix this issues are appreciated. > > > > Any ideas why this causes a problem for this driver? From what I can see, > > it doesn't make use of flush_dcache_page(). > > The driver's create_pagelist() uses get_free_pages(), and > get_free_pages() calls flush_dcache_page(). i think you mean get_user_pages(). > > The problem is that the driver fails to flush the pages which it > acquires via get_free_pages(). It's clear that the driver needs to do > it, since there is a flush in the is_vmalloc_addr() path in the same > function. The driver probably worked earlier because of the unecessary > flush in flush_dcache_page() which existed before this patch, but the > purpose of that flush was not DMA coherency and it was never guaranteed > that it would flush all the way to the point that devices could see the > data. > > See radeon_ttm_tt_pin_userptr() in drivers/gpu/drm/radeon/radeon_ttm.c > for an example of how a driver can ensure cache coherency using the DMA > mapping API if it intends to DMA from/to pages acquired by > get_free_pages(). Thanks for your explanation and the example, but i can't see the relevant difference. So i think i'm not the right person to fix this issue, but i will test any patches regarding to this issue. > > The rest of the driver should also be converted to the DMA mapping API > instead of abusing the API's private functions (dmac_map_area etc.) I will add this to the TODO file of the driver. Regards Stefan ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On Thu, Apr 13, 2017 at 11:29:15PM +0100, Russell King - ARM Linux wrote: > > 00a19f3e25c0c40e0ec77f52d4841d23ad269169 is the first bad commit > > commit 00a19f3e25c0c40e0ec77f52d4841d23ad269169 > > Author: Rabin Vincent> > Date: Tue Nov 8 09:21:19 2016 +0100 > > > > ARM: 8627/1: avoid cache flushing in flush_dcache_page() > > > > When the data cache is PIPT or VIPT non-aliasing, and cache operations > > are broadcast by the hardware, we can always postpone the flush in > > flush_dcache_page(). A similar change was done for ARM64 in commit > > b5b6c9e9149d ("arm64: Avoid cache flushing in flush_dcache_page()"). > > > > Reviewed-by: Catalin Marinas > > Signed-off-by: Rabin Vincent > > Signed-off-by: Russell King > > > > It seems that staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm > > relies on the behavior of flush_dcache_page before this patch get > > applied. Any advices to fix this issues are appreciated. > > Any ideas why this causes a problem for this driver? From what I can see, > it doesn't make use of flush_dcache_page(). The driver's create_pagelist() uses get_free_pages(), and get_free_pages() calls flush_dcache_page(). The problem is that the driver fails to flush the pages which it acquires via get_free_pages(). It's clear that the driver needs to do it, since there is a flush in the is_vmalloc_addr() path in the same function. The driver probably worked earlier because of the unecessary flush in flush_dcache_page() which existed before this patch, but the purpose of that flush was not DMA coherency and it was never guaranteed that it would flush all the way to the point that devices could see the data. See radeon_ttm_tt_pin_userptr() in drivers/gpu/drm/radeon/radeon_ttm.c for an example of how a driver can ensure cache coherency using the DMA mapping API if it intends to DMA from/to pages acquired by get_free_pages(). The rest of the driver should also be converted to the DMA mapping API instead of abusing the API's private functions (dmac_map_area etc.) ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
On Thu, Apr 13, 2017 at 07:41:48PM +0200, Stefan Wahren wrote: > > Stefan Wahrenhat am 11. April 2017 um 20:10 > > geschrieben: > > > > > > Hi, > > > > recently i found that vchiq_test -f doesn't work anymore with current > > mainline (4.11-rc6) and linux-next (20170404) on my Raspberry Pi Zero. The > > issue is always reproducible, but the error behavior isn't deterministic. > > Sometimes vchiq_test hangs and sometimes i get an error message from > > vchiq_test like this (but never errors from the kernel): > > > > $ vchiq_test -f 10 > > Functional test - iters:10 > > iteration 1 > > vchiq_test: 1502: expected callback reason VCHIQ_MESSAGE_AVAILABLE, got 1 > > vchiq_test: 1524: expected callback reason VCHIQ_BULK_TRANSMIT_DONE, got 5 > > vchiq_test: 863: func_error != 0 > > > > I didn't had the time to bisect, but at least 4.10 is safe. > > > > Okay, i've bisected this regression to this commit: > > 00a19f3e25c0c40e0ec77f52d4841d23ad269169 is the first bad commit > commit 00a19f3e25c0c40e0ec77f52d4841d23ad269169 > Author: Rabin Vincent > Date: Tue Nov 8 09:21:19 2016 +0100 > > ARM: 8627/1: avoid cache flushing in flush_dcache_page() > > When the data cache is PIPT or VIPT non-aliasing, and cache operations > are broadcast by the hardware, we can always postpone the flush in > flush_dcache_page(). A similar change was done for ARM64 in commit > b5b6c9e9149d ("arm64: Avoid cache flushing in flush_dcache_page()"). > > Reviewed-by: Catalin Marinas > Signed-off-by: Rabin Vincent > Signed-off-by: Russell King > > It seems that staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm > relies on the behavior of flush_dcache_page before this patch get > applied. Any advices to fix this issues are appreciated. Any ideas why this causes a problem for this driver? From what I can see, it doesn't make use of flush_dcache_page(). -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [Bug] VCHIQ functional test broken
> Stefan Wahrenhat am 11. April 2017 um 20:10 > geschrieben: > > > Hi, > > recently i found that vchiq_test -f doesn't work anymore with current > mainline (4.11-rc6) and linux-next (20170404) on my Raspberry Pi Zero. The > issue is always reproducible, but the error behavior isn't deterministic. > Sometimes vchiq_test hangs and sometimes i get an error message from > vchiq_test like this (but never errors from the kernel): > > $ vchiq_test -f 10 > Functional test - iters:10 > iteration 1 > vchiq_test: 1502: expected callback reason VCHIQ_MESSAGE_AVAILABLE, got 1 > vchiq_test: 1524: expected callback reason VCHIQ_BULK_TRANSMIT_DONE, got 5 > vchiq_test: 863: func_error != 0 > > I didn't had the time to bisect, but at least 4.10 is safe. > Okay, i've bisected this regression to this commit: 00a19f3e25c0c40e0ec77f52d4841d23ad269169 is the first bad commit commit 00a19f3e25c0c40e0ec77f52d4841d23ad269169 Author: Rabin Vincent Date: Tue Nov 8 09:21:19 2016 +0100 ARM: 8627/1: avoid cache flushing in flush_dcache_page() When the data cache is PIPT or VIPT non-aliasing, and cache operations are broadcast by the hardware, we can always postpone the flush in flush_dcache_page(). A similar change was done for ARM64 in commit b5b6c9e9149d ("arm64: Avoid cache flushing in flush_dcache_page()"). Reviewed-by: Catalin Marinas Signed-off-by: Rabin Vincent Signed-off-by: Russell King It seems that staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm relies on the behavior of flush_dcache_page before this patch get applied. Any advices to fix this issues are appreciated. Regards Stefan ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel