Re: [PATCH] vt_buffer: drop console buffer copying optimisations

2015-01-29 Thread Greg Kroah-Hartman
On Thu, Jan 29, 2015 at 03:40:33PM -0800, Linus Torvalds wrote:
> On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie  wrote:
> >
> > Linus, this came up a while back I finally got some confirmation
> > that it fixes those servers.
> 
> I'm certainly ok with this. which way should it go in? The users are:
> 
>  - drivers/tty/vt/vt.c (Greg KH, "tty layer")
> 
>  - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)
> 
> and it might make sense to have *some* indication of how much worse
> this makes fbcon performance in particular..
> 
> Greg/Tomi - the patch is removing this:
> 
>   #define scr_memcpyw(d, s, c) memcpy(d, s, c)
>   #define scr_memmovew(d, s, c) memmove(d, s, c)
>   #define VT_BUF_HAVE_MEMCPYW
>   #define VT_BUF_HAVE_MEMMOVEW
> 
> from , because some stupid graphics cards
> apparently cannot handle 64-bit accesses of regular memcpy/memmove.
> 
> And on other setups, this will be the reverse: 8-bit accesses due to
> using "rep movsb", which is the fast way to move/clear memory on
> modern Intel CPU's, but is really wrong for MMIO where it will be slow
> as hell.
> 
> So just getting rid of the memcpy/memmove is likely the right thing in
> general, since the fallbacks go this the traditional 16-bit-at-a-time
> way. And getting rid of the memcpy _may_ speed things up.
> 
> But if it slows things down, we might have to try something else. Like
> saying "all cards we've ever seen have been ok with aligned 32-bit
> accesses", and extend the open-coded scr_memcpy/memmove functions to
> do that.
> 
> Hmm?

I can take this through the tty tree, but can I put it in linux-next and
wait for the 3.20 merge window to give people who might notice a
slow-down a chance to object?

thanks,

greg k-h

--
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] vt_buffer: drop console buffer copying optimisations

2015-01-29 Thread Peter Hurley
On 01/28/2015 11:11 PM, Dave Airlie wrote:
> These two copy to/from VGA memory, however on the Silicon
> Motion SMI750 VGA card on a 64-bit system cause console corruption.
> 
> This is due to the hw being buggy and not handling a 64-bit transaction
> correctly.
> 
> We could try and create a 32-bit version of these routines,
> but I'm not sure the optimisation is worth much today.
> 
> Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1132826

Restricted link.


> Tested-by: Huawei engineering.
> Signed-off-by: Dave Airlie 
> ---
> 
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.
> 
>  include/linux/vt_buffer.h | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/include/linux/vt_buffer.h b/include/linux/vt_buffer.h
> index 057db7d..f38c10b 100644
> --- a/include/linux/vt_buffer.h
> +++ b/include/linux/vt_buffer.h
> @@ -21,10 +21,6 @@
>  #ifndef VT_BUF_HAVE_RW
>  #define scr_writew(val, addr) (*(addr) = (val))
>  #define scr_readw(addr) (*(addr))
> -#define scr_memcpyw(d, s, c) memcpy(d, s, c)
> -#define scr_memmovew(d, s, c) memmove(d, s, c)
> -#define VT_BUF_HAVE_MEMCPYW
> -#define VT_BUF_HAVE_MEMMOVEW
>  #endif
>  
>  #ifndef VT_BUF_HAVE_MEMSETW
> 


--
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] vt_buffer: drop console buffer copying optimisations

2015-01-29 Thread Dave Airlie
On 30 January 2015 at 10:03, Linus Torvalds
 wrote:
> On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman
>  wrote:
>>
>> I can take this through the tty tree, but can I put it in linux-next and
>> wait for the 3.20 merge window to give people who might notice a
>> slow-down a chance to object?
>
> Yes. The problem only affects one (or a couple of) truly outrageously
> bad graphics cards that are only used in servers (because they are
> such crap that they wouldn't be acceptable anywhere else anyway), and
> they have afaik never worked with 64-bit kernels, so it's not even a
> regression.
>
> So it's worth fixing because it's a real - albeit very rare - problem
> (especially since the enhanched rep instruction model of memcpy could
> easily be *worse* than the 16-bit-at-a-time manual version), but I
> wouldn't consider it anywhere near high priority.
>
Totally not a priority, it just finally got tested for RHEL so I wanted to
make sure I posted it upstream before I forgot about it for months,

I also filed:
https://bugzilla.kernel.org/show_bug.cgi?id=92311

since the RH bug is private and full of crap, that bug contains
a screenshot of the remote console to see what sort of crap it produces.

Dave.

--
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] vt_buffer: drop console buffer copying optimisations

2015-01-29 Thread Linus Torvalds
On Thu, Jan 29, 2015 at 3:57 PM, Greg Kroah-Hartman
 wrote:
>
> I can take this through the tty tree, but can I put it in linux-next and
> wait for the 3.20 merge window to give people who might notice a
> slow-down a chance to object?

Yes. The problem only affects one (or a couple of) truly outrageously
bad graphics cards that are only used in servers (because they are
such crap that they wouldn't be acceptable anywhere else anyway), and
they have afaik never worked with 64-bit kernels, so it's not even a
regression.

So it's worth fixing because it's a real - albeit very rare - problem
(especially since the enhanched rep instruction model of memcpy could
easily be *worse* than the 16-bit-at-a-time manual version), but I
wouldn't consider it anywhere near high priority.

 Linus

--
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] vt_buffer: drop console buffer copying optimisations

2015-01-29 Thread Linus Torvalds
On Wed, Jan 28, 2015 at 8:11 PM, Dave Airlie  wrote:
>
> Linus, this came up a while back I finally got some confirmation
> that it fixes those servers.

I'm certainly ok with this. which way should it go in? The users are:

 - drivers/tty/vt/vt.c (Greg KH, "tty layer")

 - drivers/video/console/* (fbcon people: Tomi Valkeinen and friends)

and it might make sense to have *some* indication of how much worse
this makes fbcon performance in particular..

Greg/Tomi - the patch is removing this:

  #define scr_memcpyw(d, s, c) memcpy(d, s, c)
  #define scr_memmovew(d, s, c) memmove(d, s, c)
  #define VT_BUF_HAVE_MEMCPYW
  #define VT_BUF_HAVE_MEMMOVEW

from , because some stupid graphics cards
apparently cannot handle 64-bit accesses of regular memcpy/memmove.

And on other setups, this will be the reverse: 8-bit accesses due to
using "rep movsb", which is the fast way to move/clear memory on
modern Intel CPU's, but is really wrong for MMIO where it will be slow
as hell.

So just getting rid of the memcpy/memmove is likely the right thing in
general, since the fallbacks go this the traditional 16-bit-at-a-time
way. And getting rid of the memcpy _may_ speed things up.

But if it slows things down, we might have to try something else. Like
saying "all cards we've ever seen have been ok with aligned 32-bit
accesses", and extend the open-coded scr_memcpy/memmove functions to
do that.

Hmm?

   Linus

--
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel