Re: [PATCH] vt_buffer: drop console buffer copying optimisations
On Thu, Jan 29, 2015 at 03:40:33PM -0800, Linus Torvalds wrote:
> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie wrote:
> >
> > Linus, this came up a while back I finally got some confirmation
> > that it fixes those servers.
>
> I'm certainly ok with this. Which way should it go in? The users are:
>
>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
>
>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
>
> and it might make sense to have *some* indication of how much worse
> this makes fbcon performance in particular..
>
> Greg/Tomi - the patch is removing this:
>
>   #define scr_memcpyw(d, s, c) memcpy(d, s, c)
>   #define scr_memmovew(d, s, c) memmove(d, s, c)
>   #define VT_BUF_HAVE_MEMCPYW
>   #define VT_BUF_HAVE_MEMMOVEW
>
> from , because some stupid graphics cards
> apparently cannot handle 64-bit accesses of regular memcpy/memmove.
>
> And on other setups, this will be the reverse: 8-bit accesses due to
> using "rep movsb", which is the fast way to move/clear memory on
> modern Intel CPUs, but is really wrong for MMIO where it will be slow
> as hell.
>
> So just getting rid of the memcpy/memmove is likely the right thing in
> general, since the fallbacks go the traditional 16-bit-at-a-time
> way. And getting rid of the memcpy _may_ speed things up.
>
> But if it slows things down, we might have to try something else. Like
> saying "all cards we've ever seen have been ok with aligned 32-bit
> accesses", and extend the open-coded scr_memcpy/memmove functions to
> do that.
>
> Hmm?

I can take this through the tty tree, but can I put it in linux-next and
wait for the 3.20 merge window to give people who might notice a
slow-down a chance to object?

thanks,

greg k-h

--
Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more.
Take a look and join the conversation now. http://goparallel.sourceforge.net/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH] vt_buffer: drop console buffer copying optimisations
On 01/28/2015 11:11 PM, Dave Airlie wrote:
> These two copy to/from VGA memory, however on the Silicon
> Motion SMI750 VGA card on a 64-bit system cause console corruption.
>
> This is due to the hw being buggy and not handling a 64-bit transaction
> correctly.
>
> We could try and create a 32-bit version of these routines,
> but I'm not sure the optimisation is worth much today.
>
> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826
> Tested-by: Huawei engineering.
> Signed-off-by: Dave Airlie
> ---
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.
>
>  include/linux/vt_buffer.h | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
> index 057db7d..f38c10b 100644
> --- a/include/linux/vt_buffer.h
> +++ b/include/linux/vt_buffer.h
> @@ -21,10 +21,6 @@
>  #ifndef VT_BUF_HAVE_RW
>  #define scr_writew(val, addr) (*(addr) = (val))
>  #define scr_readw(addr) (*(addr))
> -#define scr_memcpyw(d, s, c) memcpy(d, s, c)
> -#define scr_memmovew(d, s, c) memmove(d, s, c)
> -#define VT_BUF_HAVE_MEMCPYW
> -#define VT_BUF_HAVE_MEMMOVEW
>  #endif
>
>  #ifndef VT_BUF_HAVE_MEMSETW
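[Editor's note] With VT_BUF_HAVE_MEMCPYW and VT_BUF_HAVE_MEMMOVEW no longer defined, copies are expected to go through the open-coded fallbacks that the thread describes as working 16 bits at a time. The following is only a userspace sketch of what such fallbacks can look like, not the text of the kernel header: the `_sketch` names, the `volatile` qualifiers and the use of `size_t` are assumptions made for illustration, and `count` is taken to be a byte count.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the scr_readw/scr_writew macros kept by the diff above. */
#define scr_writew(val, addr) (*(addr) = (val))
#define scr_readw(addr)       (*(addr))

/* Forward copy, one 16-bit cell per access. `count` is in bytes (assumed). */
static inline void scr_memcpyw_sketch(volatile uint16_t *d,
                                      const volatile uint16_t *s,
                                      size_t count)
{
	count /= 2;
	while (count--)
		scr_writew(scr_readw(s++), d++);
}

/* Overlap-safe copy: forward when dst is below src, backward otherwise. */
static inline void scr_memmovew_sketch(volatile uint16_t *d,
                                       const volatile uint16_t *s,
                                       size_t count)
{
	if (d < s) {
		scr_memcpyw_sketch(d, s, count);
	} else {
		count /= 2;
		d += count;
		s += count;
		while (count--)
			scr_writew(scr_readw(--s), --d);
	}
}
```

The point of the loop form is that every access to the buffer is exactly 16 bits wide, so neither the 64-bit transactions nor the byte-wide "rep movsb" pattern discussed in the thread can occur.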
Re: [PATCH] vt_buffer: drop console buffer copying optimisations
On 30 January 2015 at 10:03, Linus Torvalds wrote:
> On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman wrote:
>>
>> I can take this through the tty tree, but can I put it in linux-next and
>> wait for the 3.20 merge window to give people who might notice a
>> slow-down a chance to object?
>
> Yes. The problem only affects one (or a couple of) truly outrageously
> bad graphics cards that are only used in servers (because they are
> such crap that they wouldn't be acceptable anywhere else anyway), and
> they have afaik never worked with 64-bit kernels, so it's not even a
> regression.
>
> So it's worth fixing because it's a real - albeit very rare - problem
> (especially since the enhanced rep instruction model of memcpy could
> easily be *worse* than the 16-bit-at-a-time manual version), but I
> wouldn't consider it anywhere near high priority.
>

Totally not a priority, it just finally got tested for RHEL so I wanted
to make sure I posted it upstream before I forgot about it for months.

I also filed:
https://bugzilla.kernel.org/show_bug.cgi?id=92311

since the RH bug is private and full of crap. That bug contains a
screenshot of the remote console to see what sort of crap it produces.

Dave.
Re: [PATCH] vt_buffer: drop console buffer copying optimisations
On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman wrote:
>
> I can take this through the tty tree, but can I put it in linux-next and
> wait for the 3.20 merge window to give people who might notice a
> slow-down a chance to object?

Yes. The problem only affects one (or a couple of) truly outrageously
bad graphics cards that are only used in servers (because they are
such crap that they wouldn't be acceptable anywhere else anyway), and
they have afaik never worked with 64-bit kernels, so it's not even a
regression.

So it's worth fixing because it's a real - albeit very rare - problem
(especially since the enhanced rep instruction model of memcpy could
easily be *worse* than the 16-bit-at-a-time manual version), but I
wouldn't consider it anywhere near high priority.

Linus
Re: [PATCH] vt_buffer: drop console buffer copying optimisations
On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie wrote:
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.

I'm certainly ok with this. Which way should it go in? The users are:

 - drivers/tty/vt/vt.c (Greg KH, "tty layer")

 - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)

and it might make sense to have *some* indication of how much worse
this makes fbcon performance in particular..

Greg/Tomi - the patch is removing this:

  #define scr_memcpyw(d, s, c) memcpy(d, s, c)
  #define scr_memmovew(d, s, c) memmove(d, s, c)
  #define VT_BUF_HAVE_MEMCPYW
  #define VT_BUF_HAVE_MEMMOVEW

from , because some stupid graphics cards
apparently cannot handle 64-bit accesses of regular memcpy/memmove.

And on other setups, this will be the reverse: 8-bit accesses due to
using "rep movsb", which is the fast way to move/clear memory on
modern Intel CPUs, but is really wrong for MMIO where it will be slow
as hell.

So just getting rid of the memcpy/memmove is likely the right thing in
general, since the fallbacks go the traditional 16-bit-at-a-time
way. And getting rid of the memcpy _may_ speed things up.

But if it slows things down, we might have to try something else. Like
saying "all cards we've ever seen have been ok with aligned 32-bit
accesses", and extend the open-coded scr_memcpy/memmove functions to
do that.

Hmm?

Linus
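[Editor's note] The alternative floated at the end of this message, extending the open-coded copy to use aligned 32-bit accesses, was not posted as code in the thread. The sketch below is one hypothetical way it could look in userspace C, under the stated assumption that aligned 32-bit accesses are acceptable to the hardware. The name `scr_memcpyw32_sketch`, the `u32_alias` typedef (which relies on the GCC/Clang `may_alias` attribute) and the byte-count convention are all illustrative assumptions, and only the forward-copy case is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: lets a 32-bit access alias the 16-bit cells. */
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* Hypothetical forward copy; `count` is in bytes (assumed). */
static inline void scr_memcpyw32_sketch(volatile uint16_t *d,
                                        const volatile uint16_t *s,
                                        size_t count)
{
	size_t cells = count / 2;

	/* 32-bit accesses can be aligned on both sides only if the two
	 * pointers share the same offset modulo 4. */
	if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
		/* Copy one leading cell to reach 4-byte alignment. */
		if (cells && ((uintptr_t)d & 2)) {
			*d++ = *s++;
			cells--;
		}
		/* Two cells per aligned 32-bit access. */
		while (cells >= 2) {
			*(volatile u32_alias *)d = *(const volatile u32_alias *)s;
			d += 2;
			s += 2;
			cells -= 2;
		}
	}

	/* Trailing cell, or the whole buffer when alignments differ. */
	while (cells--)
		*d++ = *s++;
}
```

When source and destination disagree on alignment, this sketch simply stays with 16-bit accesses rather than issuing any unaligned 32-bit access, which keeps it within the "aligned 32-bit" assumption quoted above.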