[PATCH v2 3/3] tty: hvc: hvc_write() fix break condition
Commit 550ddadcc758 ("tty: hvc: hvc_write() may sleep") broke the termination condition in case the driver stops accepting characters. This can result in unnecessary polling of the busy driver. Restore it by testing the hvc_push return code. Tested-by: Matteo Croce Tested-by: Jason Gunthorpe Tested-by: Leon Romanovsky Signed-off-by: Nicholas Piggin --- drivers/tty/hvc/hvc_console.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index bacf9b73ec98..27284a2dcd2b 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -522,6 +522,8 @@ static int hvc_write(struct tty_struct *tty, const unsigned char *buf, int count return -EIO; while (count > 0) { + int ret = 0; + spin_lock_irqsave(>lock, flags); rsize = hp->outbuf_size - hp->n_outbuf; @@ -537,10 +539,13 @@ static int hvc_write(struct tty_struct *tty, const unsigned char *buf, int count } if (hp->n_outbuf > 0) - hvc_push(hp); + ret = hvc_push(hp); spin_unlock_irqrestore(>lock, flags); + if (!ret) + break; + if (count) { if (hp->n_outbuf > 0) hvc_flush(hp); -- 2.18.0
[PATCH v2 2/3] tty: hvc: hvc_poll() fix read loop batching
Commit ec97eaad1383 ("tty: hvc: hvc_poll() break hv read loop") removes get_chars batching entirely, which slows down large console operations like paste -- virtio console "feels worse than a 9600 baud serial line," reports Matteo. This adds back batching in a more latency friendly way. If the caller can sleep then we try to fill the entire flip buffer, releasing the lock and scheduling between each iteration. If it can not sleep, then batches are limited to 128 bytes. Matteo confirms this fixes the performance problem. Latency testing the powerpc OPAL console with OpenBMC UART with a large paste shows about 0.25ms latency, which seems reasonable. 10ms latencies were typical for this case before the latency breaking work, so we still see most of the benefit. kopald-12040d.h.5us : hvc_poll <-hvc_handle_interrupt kopald-12040d.h.5us : __hvc_poll <-hvc_handle_interrupt kopald-12040d.h.5us : _raw_spin_lock_irqsave <-__hvc_poll kopald-12040d.h.5us : tty_port_tty_get <-__hvc_poll kopald-12040d.h.6us : _raw_spin_lock_irqsave <-tty_port_tty_get kopald-12040d.h.6us : _raw_spin_unlock_irqrestore <-tty_port_tty_get kopald-12040d.h.6us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h.7us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h.7us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 36us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 36us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 36us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 65us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 65us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 66us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 94us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 95us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 95us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 124us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 124us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 125us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 154us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 154us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 154us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 183us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 184us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 184us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 213us : tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 213us : __tty_buffer_request_room <-__hvc_poll kopald-12040d.h. 213us+: opal_get_chars <-__hvc_poll kopald-12040d.h. 242us : _raw_spin_unlock_irqrestore <-__hvc_poll kopald-12040d.h. 242us : tty_flip_buffer_push <-__hvc_poll kopald-12040d.h. 243us : queue_work_on <-tty_flip_buffer_push kopald-12040d.h. 243us : tty_kref_put <-__hvc_poll kopald-12040d.h. 243us : hvc_kick <-hvc_handle_interrupt kopald-12040d.h. 243us : wake_up_process <-hvc_kick kopald-12040d.h. 244us : try_to_wake_up <-hvc_kick kopald-12040d.h. 244us : _raw_spin_lock_irqsave <-try_to_wake_up kopald-12040d.h. 244us : _raw_spin_unlock_irqrestore <-try_to_wake_up Reported-by: Matteo Croce Tested-by: Matteo Croce Tested-by: Jason Gunthorpe Tested-by: Leon Romanovsky Signed-off-by: Nicholas Piggin --- drivers/tty/hvc/hvc_console.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index c917749708d2..bacf9b73ec98 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -623,6 +623,15 @@ static int hvc_chars_in_buffer(struct tty_struct *tty) #define MAX_TIMEOUT(2000) static u32 timeout = MIN_TIMEOUT; +/* + * Maximum number of bytes to get from the console driver if hvc_poll is + * called from driver (and can't sleep). Any more than this and we break + * and start polling with khvcd. This value was derived from from an OpenBMC + * console with the OPAL driver that results in about 0.25ms interrupts off + * latency. + */ +#define HVC_ATOMIC_READ_MAX128 + #define HVC_POLL_READ 0x0001 #define HVC_POLL_WRITE 0x0002 @@ -669,8 +678,8 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) if (!hp->irq_requested) poll_mask |= HVC_POLL_READ; + read_again: /* Read data if any */ - count = tty_buffer_request_room(>port, N_INBUF); /* If flip is full, just reschedule a later read */ @@ -717,7 +726,18 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) #endif /* CONFIG_MAGIC_SYSRQ */ tty_insert_flip_char(>port, buf[i], 0); } - read_total = n; + read_total += n; + + if (may_sleep) { +
[PATCH v2 1/3] tty: hvc: hvc_poll() fix read loop hang
Commit ec97eaad1383 ("tty: hvc: hvc_poll() break hv read loop") causes the virtio console to hang at times (e.g., if you paste a bunch of characters to it. The reason is that get_chars must return 0 before we can be sure the driver will kick or poll input again, but this change only scheduled a poll if get_chars had returned a full count. Change this to poll on any > 0 count. Reported-by: Matteo Croce Reported-by: Jason Gunthorpe Tested-by: Matteo Croce Tested-by: Jason Gunthorpe Tested-by: Leon Romanovsky Signed-off-by: Nicholas Piggin --- drivers/tty/hvc/hvc_console.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index 5414c4a87bea..c917749708d2 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -717,10 +717,13 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) #endif /* CONFIG_MAGIC_SYSRQ */ tty_insert_flip_char(>port, buf[i], 0); } - if (n == count) - poll_mask |= HVC_POLL_READ; read_total = n; + /* +* Latency break, schedule another poll immediately. +*/ + poll_mask |= HVC_POLL_READ; + out: /* Wakeup write queue if necessary */ if (hp->do_wakeup) { -- 2.18.0
[PATCH v2 0/3] tty: hvc: latency break regression fixes
Re-sending this one with the used-uinitialized warning in patch 3 fixed. Greg these patches are needed to fix regressions in this merge window, please consider them for your tty tree. Thanks, Nick Nicholas Piggin (3): tty: hvc: hvc_poll() fix read loop hang tty: hvc: hvc_poll() fix read loop batching tty: hvc: hvc_write() fix break condition drivers/tty/hvc/hvc_console.c | 38 ++- 1 file changed, 33 insertions(+), 5 deletions(-) -- 2.18.0
Re: [PATCH net] powerpc: use big endian to hash len and proto in csum_ipv6_magic
Xin Long a écrit : The function csum_ipv6_magic doesn't convert len and proto to big endian before doing ipv6 csum hash, which is not consistent with RFC and other arches. Jianlin found it when ICMPv6 packets from other hosts were dropped in the powerpc64 system. This patch is to fix it by using instruction 'lwbrx' to do this conversion in powerpc32/64 csum_ipv6_magic. Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly") Reported-by: Jianlin Shi Signed-off-by: Xin Long --- arch/powerpc/lib/checksum_32.S | 4 arch/powerpc/lib/checksum_64.S | 4 2 files changed, 8 insertions(+) diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S index aa22406..7d3446e 100644 --- a/arch/powerpc/lib/checksum_32.S +++ b/arch/powerpc/lib/checksum_32.S @@ -325,6 +325,10 @@ _GLOBAL(csum_ipv6_magic) adder0, r0, r9 lwz r11, 12(r4) adder0, r0, r10 + STWX_BE r5, 0, r1 + lwz r5, 0(r1) + STWX_BE r6, 0, r1 + lwz r6, 0(r1) PPC32 doesn't support little endian, so nothing to do here. add r5, r5, r6 /* assumption: len + proto doesn't carry */ adder0, r0, r11 adder0, r0, r5 diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S index 886ed94..302e732 100644 --- a/arch/powerpc/lib/checksum_64.S +++ b/arch/powerpc/lib/checksum_64.S @@ -439,6 +439,10 @@ EXPORT_SYMBOL(csum_partial_copy_generic) _GLOBAL(csum_ipv6_magic) ld r8, 0(r3) ld r9, 8(r3) + STWX_BE r5, 0, r1 + lwz r5, 0(r1) + STWX_BE r6, 0, r1 + lwz r6, 0(r1) add r5, r5, r6 This is overkill. For LE it should be enough to rotate r5 by 8 bits after the sum. Best place to do it would be after ld r11 I think. Christophe addcr0, r8, r9 ld r10, 0(r4) -- 2.1.0
[PATCH net] powerpc: use big endian to hash len and proto in csum_ipv6_magic
The function csum_ipv6_magic doesn't convert len and proto to big endian before doing ipv6 csum hash, which is not consistent with RFC and other arches. Jianlin found it when ICMPv6 packets from other hosts were dropped in the powerpc64 system. This patch is to fix it by using instruction 'lwbrx' to do this conversion in powerpc32/64 csum_ipv6_magic. Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly") Reported-by: Jianlin Shi Signed-off-by: Xin Long --- arch/powerpc/lib/checksum_32.S | 4 arch/powerpc/lib/checksum_64.S | 4 2 files changed, 8 insertions(+) diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S index aa22406..7d3446e 100644 --- a/arch/powerpc/lib/checksum_32.S +++ b/arch/powerpc/lib/checksum_32.S @@ -325,6 +325,10 @@ _GLOBAL(csum_ipv6_magic) adder0, r0, r9 lwz r11, 12(r4) adder0, r0, r10 + STWX_BE r5, 0, r1 + lwz r5, 0(r1) + STWX_BE r6, 0, r1 + lwz r6, 0(r1) add r5, r5, r6 /* assumption: len + proto doesn't carry */ adder0, r0, r11 adder0, r0, r5 diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S index 886ed94..302e732 100644 --- a/arch/powerpc/lib/checksum_64.S +++ b/arch/powerpc/lib/checksum_64.S @@ -439,6 +439,10 @@ EXPORT_SYMBOL(csum_partial_copy_generic) _GLOBAL(csum_ipv6_magic) ld r8, 0(r3) ld r9, 8(r3) + STWX_BE r5, 0, r1 + lwz r5, 0(r1) + STWX_BE r6, 0, r1 + lwz r6, 0(r1) add r5, r5, r6 addcr0, r8, r9 ld r10, 0(r4) -- 2.1.0