[PATCH v2] net: make getname() functions return length rather than use int* parameter
Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. textdata bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko CC: David S. Miller CC: linux-ker...@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-blueto...@vger.kernel.org CC: linux-decnet-u...@lists.sourceforge.net CC: linux-wirel...@vger.kernel.org CC: linux-r...@vger.kernel.org CC: linux-s...@vger.kernel.org CC: linux-...@vger.kernel.org CC: linux-...@vger.kernel.org --- drivers/infiniband/hw/usnic/usnic_transport.c | 5 ++-- drivers/isdn/mISDN/socket.c | 5 ++-- drivers/net/ppp/pppoe.c | 6 ++--- drivers/net/ppp/pptp.c| 6 ++--- drivers/scsi/iscsi_tcp.c | 14 +-- drivers/soc/qcom/qmi_interface.c | 3 +-- drivers/staging/ipx/af_ipx.c | 6 ++--- drivers/staging/irda/net/af_irda.c| 8 +++--- drivers/staging/lustre/lnet/lnet/lib-socket.c | 7 +++--- drivers/target/iscsi/iscsi_target_login.c | 18 +++--- drivers/vhost/net.c | 7 +++--- fs/dlm/lowcomms.c | 7 +++--- fs/ocfs2/cluster/tcp.c| 6 ++--- include/linux/net.h | 8 +++--- include/net/inet_common.h | 2 +- include/net/ipv6.h| 2 +- include/net/sock.h| 2 +- net/appletalk/ddp.c | 5 ++-- net/atm/pvc.c | 5 ++-- net/atm/svc.c | 5 ++-- net/ax25/af_ax25.c| 4 +-- net/bluetooth/hci_sock.c | 4 +-- net/bluetooth/l2cap_sock.c| 5 ++-- net/bluetooth/rfcomm/sock.c | 5 ++-- net/bluetooth/sco.c | 5 ++-- net/can/raw.c | 6 ++--- net/core/sock.c | 5 ++-- net/decnet/af_decnet.c| 6 ++--- net/ipv4/af_inet.c| 5 ++-- net/ipv6/af_inet6.c | 5 ++-- net/iucv/af_iucv.c| 5 ++-- net/l2tp/l2tp_ip.c| 5 ++-- net/l2tp/l2tp_ip6.c | 5 ++-- net/l2tp/l2tp_ppp.c | 5 ++-- net/llc/af_llc.c | 5 ++-- net/netlink/af_netlink.c | 5 ++-- net/netrom/af_netrom.c| 9 --- net/nfc/llcp_sock.c | 5 ++-- net/packet/af_packet.c| 10 +++- net/phonet/socket.c | 5 ++-- net/qrtr/qrtr.c | 5 ++-- net/rds/af_rds.c | 5 ++-- net/rds/tcp.c | 7 ++ net/rose/af_rose.c| 5 ++-- net/sctp/ipv6.c | 8 +++--- net/smc/af_smc.c | 11 - net/socket.c | 35 +-- net/sunrpc/clnt.c | 6 ++--- net/sunrpc/svcsock.c | 13 ++ net/sunrpc/xprtsock.c | 3 +-- net/tipc/socket.c | 5 ++-- net/unix/af_unix.c| 10 net/vmw_vsock/af_vsock.c | 4 +-- net/x25/af_x25.c | 4 +-- security/tomoyo/network.c | 5 ++-- 55 files changed, 159 insertions(+), 203 deletio
Re: [PATCH] net: make getname() functions return length rather than use int* parameter
On 02/12/2018 06:47 PM, David Miller wrote: From: Denys Vlasenko Date: Mon, 12 Feb 2018 15:15:18 +0100 Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. textdata bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko Please do an allmodconfig build, there are still some conversions you missed: security/tomoyo/network.c: In function ‘tomoyo_socket_listen_permission’: security/tomoyo/network.c:658:19: warning: passing argument 3 of ‘sock->ops->getname’ makes integer from pointer without a cast [-Wint-conversion] &addr, &addr_len, 0); ^ security/tomoyo/network.c:658:19: note: expected ‘int’ but argument is of type ‘int *’ security/tomoyo/network.c:657:21: error: too many arguments to function ‘sock->ops->getname’ const int error = sock->ops->getname(sock, (struct sockaddr *) ^~~~ fs/dlm/lowcomms.c: In function ‘lowcomms_error_report’: fs/dlm/lowcomms.c:495:6: error: too many arguments to function ‘kernel_getpeername’ kernel_getpeername(con->sock, (struct sockaddr *)&saddr, &buflen)) { ^~ fs/dlm/lowcomms.c: In function ‘tcp_accept_from_sock’: fs/dlm/lowcomms.c:761:7: warning: passing argument 3 of ‘newsock->ops->getname’ makes integer from pointer without a cast [-Wint-conversion] &len, 2)) { ^ fs/dlm/lowcomms.c:761:7: note: expected ‘int’ but argument is of type ‘int *’ fs/dlm/lowcomms.c:760:6: error: too many arguments to function ‘newsock->ops->getname’ if (newsock->ops->getname(newsock, (struct sockaddr *)&peeraddr, ^~~ Sorry. Will send updated patch.
[PATCH] net: make getname() functions return length rather than use int* parameter
Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. textdata bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko CC: David S. Miller CC: linux-ker...@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-blueto...@vger.kernel.org CC: linux-decnet-u...@lists.sourceforge.net CC: linux-wirel...@vger.kernel.org CC: linux-r...@vger.kernel.org CC: linux-s...@vger.kernel.org CC: linux-...@vger.kernel.org CC: linux-...@vger.kernel.org --- drivers/isdn/mISDN/socket.c| 5 ++--- drivers/net/ppp/pppoe.c| 6 ++ drivers/net/ppp/pptp.c | 6 ++ drivers/scsi/iscsi_tcp.c | 14 +++--- drivers/soc/qcom/qmi_interface.c | 3 +-- drivers/staging/ipx/af_ipx.c | 6 ++ drivers/staging/irda/net/af_irda.c | 8 +++- include/linux/net.h| 8 +++- include/net/inet_common.h | 2 +- include/net/ipv6.h | 2 +- include/net/sock.h | 2 +- net/appletalk/ddp.c| 5 ++--- net/atm/pvc.c | 5 ++--- net/atm/svc.c | 5 ++--- net/ax25/af_ax25.c | 4 ++-- net/bluetooth/hci_sock.c | 4 ++-- net/bluetooth/l2cap_sock.c | 5 ++--- net/bluetooth/rfcomm/sock.c| 5 ++--- net/bluetooth/sco.c| 5 ++--- net/can/raw.c | 6 ++ net/core/sock.c| 5 +++-- net/decnet/af_decnet.c | 6 ++ net/ipv4/af_inet.c | 5 ++--- net/ipv6/af_inet6.c| 5 ++--- net/iucv/af_iucv.c | 5 ++--- net/l2tp/l2tp_ip.c | 5 ++--- net/l2tp/l2tp_ip6.c| 5 ++--- net/l2tp/l2tp_ppp.c| 5 ++--- net/llc/af_llc.c | 5 ++--- net/netlink/af_netlink.c | 5 ++--- net/netrom/af_netrom.c | 9 + net/nfc/llcp_sock.c| 5 ++--- net/packet/af_packet.c | 10 -- net/phonet/socket.c| 5 ++--- net/qrtr/qrtr.c| 5 ++--- net/rds/af_rds.c | 5 ++--- net/rds/tcp.c | 7 ++- net/rose/af_rose.c | 5 ++--- net/sctp/ipv6.c| 8 net/smc/af_smc.c | 7 +++ net/socket.c | 35 +-- net/sunrpc/clnt.c | 6 +++--- net/sunrpc/svcsock.c | 13 - net/sunrpc/xprtsock.c | 3 +-- net/tipc/socket.c | 5 ++--- net/unix/af_unix.c | 10 +- net/vmw_vsock/af_vsock.c | 4 ++-- net/x25/af_x25.c | 4 ++-- 48 files changed, 132 insertions(+), 171 deletions(-) diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c index c5603d1a07d6..1f8f489b4167 100644 --- a/drivers/isdn/mISDN/socket.c +++ b/drivers/isdn/mISDN/socket.c @@ -560,7 +560,7 @@ data_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_len) static int data_sock_getname(struct socket *sock, struct sockaddr *addr, - int *addr_len, int peer) + int peer) { struct sockaddr_mISDN *maddr = (struct sockaddr_mISDN *) addr; struct sock *sk = sock->sk; @@ -570,14 +570,13 @@ data_sock_getname(struct socket *sock, struct sockaddr *addr, lock_sock(sk); - *addr_len = sizeof(*maddr); maddr->family = AF_ISDN; maddr->dev = _pms(sk)->dev->id; maddr->channel = _pms(sk)->ch.nr; maddr->sapi = _pms(sk)->ch.addr & 0xff; maddr->tei = (_pms(sk)->ch.addr >> 8) & 0xff; release_sock(sk); - return 0; + return sizeof(*maddr); } static const struct proto_ops data_sock_ops = { diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c index 5aa59f41bf8c
bug in "PCI: Support INTx masking on ConnectX-4 with firmware x.14.1100+"?
+ /* Reading from resource space should be 32b aligned */ + fw_maj_min = ioread32be(fw_ver); + fw_sub_min = ioread32be(fw_ver + 1); + fw_major = fw_maj_min & 0x; + fw_minor = fw_maj_min >> 16; + fw_subminor = fw_sub_min & 0x; Maybe second read should be ioread32be(fw_ver + 4)?
Re: [PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes
On 06/21/2017 09:24 PM, Marcelo Ricardo Leitner wrote: On Wed, Jun 21, 2017 at 07:03:27PM +0200, Denys Vlasenko wrote: This function compiles to 147 bytes of machine code. 13 callsites. I'm no expert on SCTP events, but quick reading of SCTP docs tells me that SCTP events are not happening on every packet. They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such. Does not look performance critical. Signed-off-by: Denys Vlasenko CC: Vlad Yasevich CC: Neil Horman CC: David Miller CC: linux-s...@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org Acked-by: Marcelo Ricardo Leitner Just wondering, are you conducting further research on this topic? Because we probably could use such review on SCTP stack. Here is the list of sctp inlines which has any (however small) benefit when deinlined. filename:lineno:inline_name:lines_of_source_code:saved_bytes_by_deinlining:size_of_code_of_deinlined_function include/net/sctp/command.h:228:sctp_add_cmd_sf:7:8306:38 net/sctp/ulpevent.c:91:sctp_ulpevent_set_owner:13:1616:147 include/net/sctp/sctp.h:414:sctp_skb_set_owner_r:10:934:75 net/sctp/input.c:840:sctp_hash_key:13:896:359 net/sctp/input.c:823:sctp_hash_obj:13:704:409 include/net/sctp/checksum.h:60:sctp_compute_cksum:13:595:85 net/sctp/input.c:800:sctp_hash_cmp:18:320:124 net/sctp/socket.c:117:sctp_wspace:19:256:76 include/net/sctp/sctp.h:272:sctp_max_rto:7:204:68 net/sctp/socket.c:173:sctp_verify_addr:15:192:72 include/net/sctp/checksum.h:46:sctp_csum_update:4:147:21 include/net/sctp/sctp.h:519:param_type2af:8:134:35 include/net/sctp/sctp.h:399:sctp_list_dequeue:7:123:59 include/net/sctp/sctp.h:596:sctp_transport_dst_check:4:120:60 include/net/sctp/sctp.h:435:sctp_frag_point:12:65:65 net/sctp/outqueue.c:82:sctp_outq_dequeue_data:10:64:87 include/net/sctp/command.h:243:sctp_next_cmd:4:64:37 include/net/sctp/sm.h:347:sctp_data_size:6:19:19 For .c files, the patches are trivial and I have them auto-generated, I'll send them to you privately to save you the mechanical work. Unfortunately, for inlines in .h files this does not work (a human is needed to decide where to more the function). Of course, not every deinlining makes sense.
[PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes
This function compiles to 147 bytes of machine code. 13 callsites. I'm no expert on SCTP events, but quick reading of SCTP docs tells me that SCTP events are not happening on every packet. They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such. Does not look performance critical. Signed-off-by: Denys Vlasenko CC: Vlad Yasevich CC: Neil Horman CC: David Miller CC: linux-s...@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- Changed since v1: * reindented function argument list net/sctp/ulpevent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c index ec2b3e0..bc3f495 100644 --- a/net/sctp/ulpevent.c +++ b/net/sctp/ulpevent.c @@ -88,8 +88,8 @@ int sctp_ulpevent_is_notification(const struct sctp_ulpevent *event) /* Hold the association in case the msg_name needs read out of * the association. */ -static inline void sctp_ulpevent_set_owner(struct sctp_ulpevent *event, - const struct sctp_association *asoc) +static void sctp_ulpevent_set_owner(struct sctp_ulpevent *event, + const struct sctp_association *asoc) { struct sctp_chunk *chunk = event->chunk; struct sk_buff *skb; -- 1.8.3.1
[PATCH] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes
This function compiles to 147 bytes of machine code. 13 callsites. I'm no expert on SCTP events, but quick reading of SCTP docs tells me that SCTP events are not happening on every packet. They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such. Does not look performance critical. Signed-off-by: Denys Vlasenko CC: Vlad Yasevich CC: Neil Horman CC: David Miller CC: linux-s...@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- net/sctp/ulpevent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c index ec2b3e0..049aa5a 100644 --- a/net/sctp/ulpevent.c +++ b/net/sctp/ulpevent.c @@ -88,7 +88,7 @@ int sctp_ulpevent_is_notification(const struct sctp_ulpevent *event) /* Hold the association in case the msg_name needs read out of * the association. */ -static inline void sctp_ulpevent_set_owner(struct sctp_ulpevent *event, +static void sctp_ulpevent_set_owner(struct sctp_ulpevent *event, const struct sctp_association *asoc) { struct sctp_chunk *chunk = event->chunk; -- 1.8.3.1
[PATCH] liquidio: stop using huge static buffer, save 4096k in .data
Only compile-tested - I don't have the hardware. >From code inspection, octeon_pci_write_core_mem() appears to be safe wrt unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned anyway. Signed-off-by: Denys Vlasenko CC: Felix Manlunas CC: Prasad Kanneganti CC: Derek Chickles CC: David Miller CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- drivers/net/ethernet/cavium/liquidio/octeon_console.c | 6 +- drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c | 4 ++-- drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h | 2 +- 3 files changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c b/drivers/net/ethernet/cavium/liquidio/octeon_console.c index 53f38d0..e08f760 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c +++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c @@ -724,13 +724,11 @@ static int octeon_console_read(struct octeon_device *oct, u32 console_num, } #define FBUF_SIZE (4 * 1024 * 1024) -u8 fbuf[FBUF_SIZE]; int octeon_download_firmware(struct octeon_device *oct, const u8 *data, size_t size) { int ret = 0; - u8 *p = fbuf; u32 crc32_result; u64 load_addr; u32 image_len; @@ -805,10 +803,8 @@ int octeon_download_firmware(struct octeon_device *oct, const u8 *data, else size = FBUF_SIZE; - memcpy(p, data, size); - /* download the image */ - octeon_pci_write_core_mem(oct, load_addr, p, (u32)size); + octeon_pci_write_core_mem(oct, load_addr, data, (u32)size); data += size; rem -= (u32)size; diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c index 5cd96e7..4c85ae6 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c @@ -167,10 +167,10 @@ octeon_pci_read_core_mem(struct octeon_device *oct, void octeon_pci_write_core_mem(struct octeon_device *oct, u64 coreaddr, - u8 *buf, + const u8 *buf, u32 len) { - __octeon_pci_rw_core_mem(oct, coreaddr, buf, len, 0); + __octeon_pci_rw_core_mem(oct, coreaddr, (u8 *)buf, len, 0); } u64 octeon_read_device_mem64(struct octeon_device *oct, u64 coreaddr) diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h index bae2fdd..47a3ff5 100644 --- a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h +++ b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h @@ -66,7 +66,7 @@ octeon_pci_read_core_mem(struct octeon_device *oct, void octeon_pci_write_core_mem(struct octeon_device *oct, u64 coreaddr, - u8 *buf, + const u8 *buf, u32 len); #endif -- 2.9.2
Huge static buffer in liquidio
Hello, I would like to discuss this commit: commit d3d7e6c65f75de18ced10a98595a847f9f95f0ce Author: Raghu Vatsavayi Date: Tue Jun 21 22:53:07 2016 -0700 liquidio: Firmware image download This patch has firmware image related changes for: firmware release upon failure, support latest firmware version and firmware download in 4MB chunks. Here is a part of it: +u8 fbuf[4 * 1024 * 1024]; ^^^ what?? [also, why it is not static?] + int octeon_download_firmware(struct octeon_device *oct, const u8 *data, size_t size) { int ret = 0; - u8 *p; - u8 *buffer; + u8 *p = fbuf; Don't you think that using 4 megabytes of static buffer *just for the firmware upload* is not a good practice? Further down the patch it's obvious that the buffer is not even needed, because the firmware is already in memory, the "data" and "size" parameters of this function point to it. The meat of the change of the FW download is this (deleted some debug messages code): - buffer = kmemdup(data, size, GFP_KERNEL); - if (!buffer) - return -ENOMEM; - - p = buffer + sizeof(struct octeon_firmware_file_header); + data += sizeof(struct octeon_firmware_file_header); /* load all images */ for (i = 0; i < be32_to_cpu(h->num_images); i++) { load_addr = be64_to_cpu(h->desc[i].addr); image_len = be32_to_cpu(h->desc[i].len); - /* download the image */ - octeon_pci_write_core_mem(oct, load_addr, p, image_len); + /* Write in 4MB chunks*/ + rem = image_len; - p += image_len; + while (rem) { + if (rem < (4 * 1024 * 1024)) + size = rem; + else + size = 4 * 1024 * 1024; + + memcpy(p, data, size); + + /* download the image */ + octeon_pci_write_core_mem(oct, load_addr, p, (u32)size); + + data += size; + rem -= (u32)size; + load_addr += size; + } } + dev_info(&oct->pci_dev->dev, "Writing boot command: %s\n", +h->bootcmd); /* Invoke the bootcmd */ ret = octeon_console_send_cmd(oct, h->bootcmd, 50); -done_downloading: - kfree(buffer); octeon_pci_write_core_mem() takes spinlock around copy op, I take this was the reason for this change: reduce IRQ-disabled time. Do you actually need an intermediate fbuf[] buffer here? (IOW: can't you just write data to PCI from memory area pointed by "data" ptr?) If there is indeed a reason for intermediate buffer, why did you drop the approach of having a temporary kmalloc'ed buffer and switches to a static (and *huge*) buffer? In my opinion, 4 Mbyte buffer is an overkill in any case. A buffer of ~4..16 Kbyte one would work just fine - it's not like you need to squeeze last 0.5% of speed when you upload firmware.
[PATCH] e1000e: prevent division by zero if TIMINCA is zero
Users report that under VMWare, er32(TIMINCA) returns zero. This causes division by zero at init time as follows: ==>incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK; for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) { /* latch SYSTIMH on read of SYSTIML */ systim_next = (cycle_t)er32(SYSTIML); systim_next |= (cycle_t)er32(SYSTIMH) << 32; time_delta = systim_next - systim; temp = time_delta; > rem = do_div(temp, incvalue); This change makes kernel survive this, and users report that NIC does work after this change. Since on real hardware incvalue is never zero, this should not affect real hardware use case. Signed-off-by: Denys Vlasenko CC: Jeff Kirsher CC: "Ruinskiy, Dima" CC: intel-wired-...@lists.osuosl.org CC: netdev@vger.kernel.org CC: LKML --- drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 269087c..0626935 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4315,7 +4315,8 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) time_delta = systim_next - systim; temp = time_delta; - rem = do_div(temp, incvalue); + /* VMWare users have seen incvalue of zero, don't div / 0 */ + rem = incvalue ? do_div(temp, incvalue) : (time_delta != 0); systim = systim_next; -- 1.8.1.4
Re: [PATCH 1/3] e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64
On 04/19/2016 10:57 PM, Jeff Kirsher wrote: > On Tue, 2016-04-19 at 14:34 +0200, Denys Vlasenko wrote: >> "incvalue" variable holds a result of "er32(TIMINCA) & >> E1000_TIMINCA_INCVALUE_MASK" >> and used in "do_div(temp, incvalue)" as a divisor. >> >> Thus, "u64 incvalue" declaration is probably a mistake. >> Even though it seems to be a harmless one, let's fix it. >> >> Signed-off-by: Denys Vlasenko >> CC: Jeff Kirsher >> CC: Jesse Brandeburg >> CC: Shannon Nelson >> CC: Carolyn Wyborny >> CC: Don Skidmore >> CC: Bruce Allan >> CC: John Ronciak >> CC: Mitch Williams >> CC: David S. Miller >> CC: LKML >> CC: netdev@vger.kernel.org >> --- >> drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) > > First of all, trimmed down the recipient list since almost all of the > reviewers you added have nothing to do with e1000e. I took the list here, MAINTAINERS: INTEL ETHERNET DRIVERS M: Jeff Kirsher R: Jesse Brandeburg R: Shannon Nelson R: Carolyn Wyborny R: Don Skidmore R: Bruce Allan R: John Ronciak R: Mitch Williams No more specific information who's e1000e reviewer. Sorry.
[PATCH 2/3] e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check
If two consecutive reads of the counter are the same, it is also not an overflow. "systimel_1 < systimel_2" should be "systimel_1 <= systimel_2". Before the patch, we could perform an *erroneous* correction: Let's say that systimel_1 == systimel_2 == 0x. "systimel_1 < systimel_2" is false, we think it's an overflow, we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented, and use "(systimeh << 32) + systimel_2" value which is 2^32 too large. Signed-off-by: Denys Vlasenko CC: Jeff Kirsher CC: Jesse Brandeburg CC: Shannon Nelson CC: Carolyn Wyborny CC: Don Skidmore CC: Bruce Allan CC: John Ronciak CC: Mitch Williams CC: David S. Miller CC: LKML CC: netdev@vger.kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 967311b..99d0e6e 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4287,7 +4287,7 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) systimeh = er32(SYSTIMH); systimel_2 = er32(SYSTIML); /* Check for overflow. If there was no overflow, use the values */ - if (systimel_1 < systimel_2) { + if (systimel_1 <= systimel_2) { systim = (cycle_t)systimel_1; systim |= (cycle_t)systimeh << 32; } else { -- 1.8.1.4
[PATCH 1/3] e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64
"incvalue" variable holds a result of "er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK" and used in "do_div(temp, incvalue)" as a divisor. Thus, "u64 incvalue" declaration is probably a mistake. Even though it seems to be a harmless one, let's fix it. Signed-off-by: Denys Vlasenko CC: Jeff Kirsher CC: Jesse Brandeburg CC: Shannon Nelson CC: Carolyn Wyborny CC: Don Skidmore CC: Bruce Allan CC: John Ronciak CC: Mitch Williams CC: David S. Miller CC: LKML CC: netdev@vger.kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 9b4ec13..967311b 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4300,7 +4300,8 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) } if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) { - u64 incvalue, time_delta, rem, temp; + u64 time_delta, rem, temp; + u32 incvalue; int i; /* errata for 82574/82583 possible bad bits read from SYSTIMH/L -- 1.8.1.4
[PATCH 3/3] e1000e: e1000e_cyclecounter_read(): do overflow check only if needed
SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0] er32(SYSTIML) are probably moderately expensive (they are pci bus reads). Can we avoid one of them? Yes, we can. If the SYSTIML value we see is smaller than 0xff00, the overflow into SYSTIMH would require at least two increments. We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order. Even if one increment happens between them, the overflow into SYSTIMH is impossible, and we can avoid doing another er32(SYSTIML) read and overflow check. Signed-off-by: Denys Vlasenko CC: Jeff Kirsher CC: Jesse Brandeburg CC: Shannon Nelson CC: Carolyn Wyborny CC: Don Skidmore CC: Bruce Allan CC: John Ronciak CC: Mitch Williams CC: David S. Miller CC: LKML CC: netdev@vger.kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 99d0e6e..6f17f89 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4275,7 +4275,7 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter, cc); struct e1000_hw *hw = &adapter->hw; - u32 systimel_1, systimel_2, systimeh; + u32 systimel, systimeh; cycle_t systim, systim_next; /* SYSTIMH latching upon SYSTIML read does not work well. * This means that if SYSTIML overflows after we read it but before @@ -4283,21 +4283,21 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) * will experience a huge non linear increment in the systime value * to fix that we test for overflow and if true, we re-read systime. */ - systimel_1 = er32(SYSTIML); + systimel = er32(SYSTIML); systimeh = er32(SYSTIMH); - systimel_2 = er32(SYSTIML); - /* Check for overflow. If there was no overflow, use the values */ - if (systimel_1 <= systimel_2) { - systim = (cycle_t)systimel_1; - systim |= (cycle_t)systimeh << 32; - } else { - /* There was an overflow, read again SYSTIMH, and use -* systimel_2 -*/ - systimeh = er32(SYSTIMH); - systim = (cycle_t)systimel_2; - systim |= (cycle_t)systimeh << 32; + /* Is systimel is so large that overflow is possible? */ + if (systimel >= (u32)0x - E1000_TIMINCA_INCVALUE_MASK) { + u32 systimel_2 = er32(SYSTIML); + if (systimel > systimel_2) { + /* There was an overflow, read again SYSTIMH, and use +* systimel_2 +*/ + systimeh = er32(SYSTIMH); + systimel = systimel_2; + } } + systim = (cycle_t)systimel; + systim |= (cycle_t)systimeh << 32; if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) { u64 time_delta, rem, temp; -- 1.8.1.4
[PATCH] drivers/net/ethernet/jme.c: Deinline jme_reset_mac_processor, save 2816 bytes
This function compiles to 895 bytes of machine code. Clearly, this isn't a time-critical function. For one, it has a number of udelay(1) calls. Signed-off-by: Denys Vlasenko CC: David S. Miller CC: linux-ker...@vger.kernel.org CC: netdev@vger.kernel.org --- drivers/net/ethernet/jme.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c index 3ddf657..711cb19 100644 --- a/drivers/net/ethernet/jme.c +++ b/drivers/net/ethernet/jme.c @@ -222,7 +222,7 @@ jme_clear_ghc_reset(struct jme_adapter *jme) jwrite32f(jme, JME_GHC, jme->reg_ghc); } -static inline void +static void jme_reset_mac_processor(struct jme_adapter *jme) { static const u32 mask[WAKEUP_FRAME_MASK_DWNR] = {0, 0, 0, 0}; -- 2.1.0
[PATCH] net: force inlining of netif_tx_start/stop_queue, sock_hold, __sock_put
Sometimes gcc mysteriously doesn't inline very small functions we expect to be inlined. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Arguably, gcc should do better, but gcc people aren't willing to invest time into it, asking to use __always_inline instead. With this .config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os, the following functions get deinlined many times. netif_tx_stop_queue: 207 copies, 590 calls: 55 push %rbp 48 89 e5mov%rsp,%rbp f0 80 8f e0 01 00 00 01 lock orb $0x1,0x1e0(%rdi) 5d pop%rbp c3 retq netif_tx_start_queue: 47 copies, 111 calls 55 push %rbp 48 89 e5mov%rsp,%rbp f0 80 a7 e0 01 00 00 fe lock andb $0xfe,0x1e0(%rdi) 5d pop%rbp c3 retq sock_hold: 39 copies, 124 calls 55 push %rbp 48 89 e5mov%rsp,%rbp f0 ff 87 80 00 00 00lock incl 0x80(%rdi) 5d pop%rbp c3 retq __sock_put: 6 copies, 13 calls 55 push %rbp 48 89 e5mov%rsp,%rbp f0 ff 8f 80 00 00 00lock decl 0x80(%rdi) 5d pop%rbp c3 retq This patch fixes this via s/inline/__always_inline/. Code size decrease after the patch is ~2.5k: text data bss dec hex filename 56719876 56364551 36196352 149280779 8e5d80b vmlinux_before 56717440 56364551 36196352 149278343 8e5ce87 vmlinux Signed-off-by: Denys Vlasenko CC: David S. Miller CC: linux-ker...@vger.kernel.org CC: netdev@vger.kernel.org CC: netfilter-de...@vger.kernel.org --- include/linux/netdevice.h | 4 ++-- include/net/sock.h| 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cb0d5d0..f924ddc 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2801,7 +2801,7 @@ static inline void netif_tx_schedule_all(struct net_device *dev) netif_schedule_queue(netdev_get_tx_queue(dev, i)); } -static inline void netif_tx_start_queue(struct netdev_queue *dev_queue) +static __always_inline void netif_tx_start_queue(struct netdev_queue *dev_queue) { clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state); } @@ -2851,7 +2851,7 @@ static inline void netif_tx_wake_all_queues(struct net_device *dev) } } -static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue) +static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue) { set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state); } diff --git a/include/net/sock.h b/include/net/sock.h index 255d3e0..fd15eb1 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -564,7 +564,7 @@ static inline bool __sk_del_node_init(struct sock *sk) modifications. */ -static inline void sock_hold(struct sock *sk) +static __always_inline void sock_hold(struct sock *sk) { atomic_inc(&sk->sk_refcnt); } @@ -572,7 +572,7 @@ static inline void sock_hold(struct sock *sk) /* Ungrab socket in the context, which assumes that socket refcnt cannot hit zero, f.e. it is true in context of any socketcall. */ -static inline void __sock_put(struct sock *sk) +static __always_inline void __sock_put(struct sock *sk) { atomic_dec(&sk->sk_refcnt); } -- 1.8.1.4
Re: [PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements
On 09/28/2015 05:32 PM, David Laight wrote: > From: Eric Dumazet >> Sent: 28 September 2015 15:27 >> On Mon, 2015-09-28 at 14:12 +, David Laight wrote: >>> From: Neil Horman >>>> Sent: 28 September 2015 14:51 >>>> On Mon, Sep 28, 2015 at 02:34:04PM +0200, Denys Vlasenko wrote: >>>>> Seemingly innocuous sctp_trans_state_to_prio_map[] array >>>>> is way bigger than it looks, since >>>>> "[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" ! >>>>> >>>>> This patch replaces it with switch() statement. >>> >>> What about just adding 1 (and masking) before indexing the array? >>> That might require a static inline function with a local static array. >>> >>> Or define the array as (say) [16] and just mask the state before using >>> it as an index? >> >> Just let the compiler do its job, instead of obfuscating source. >> >> Compilers can transform a switch into an (optimal) table if it is really >> a gain. > > The compiler can choose between a jump table and nested ifs for a switch > statement. I've never seen it convert one into a data array index. I don't know why people are fixated on a lookup table here. For just four possible values, the amount of generated code is less than one Icache cacheline. Instruction cachelines are efficiently prefetched and branches are predicted on all modern CPUs. Possible data access for lookup table can not be prefetched as efficiently. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements
Seemingly innocuous sctp_trans_state_to_prio_map[] array is way bigger than it looks, since "[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" ! This patch replaces it with switch() statement. Signed-off-by: Denys Vlasenko CC: Vlad Yasevich CC: Neil Horman CC: Marcelo Ricardo Leitner CC: linux-s...@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- Changes since v1: tweaked comments net/sctp/associola.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 197c3f5..dae51ac 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1208,20 +1208,22 @@ void sctp_assoc_update(struct sctp_association *asoc, * within this document. * * Our basic strategy is to round-robin transports in priorities - * according to sctp_state_prio_map[] e.g., if no such + * according to sctp_trans_score() e.g., if no such * transport with state SCTP_ACTIVE exists, round-robin through * SCTP_UNKNOWN, etc. You get the picture. */ -static const u8 sctp_trans_state_to_prio_map[] = { - [SCTP_ACTIVE] = 3,/* best case */ - [SCTP_UNKNOWN] = 2, - [SCTP_PF] = 1, - [SCTP_INACTIVE] = 0,/* worst case */ -}; - static u8 sctp_trans_score(const struct sctp_transport *trans) { - return sctp_trans_state_to_prio_map[trans->state]; + switch (trans->state) { + case SCTP_ACTIVE: + return 3; /* best case */ + case SCTP_UNKNOWN: + return 2; + case SCTP_PF: + return 1; + default: /* case SCTP_INACTIVE */ + return 0; /* worst case */ + } } static struct sctp_transport *sctp_trans_elect_tie(struct sctp_transport *trans1, -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] jhash: Deinline jhash and jhash2
This patch deinlines jhash and jhash2. It also removes rhashtable_jhash2(key, length, seed) because it was merely calling jhash2(key, length, seed). With this .config: http://busybox.net/~vda/kernel_config, after deinlining these functions have sizes and callsite counts as follows: jhash: 297 bytes, 111 calls jhash2: 205 bytes, 136 calls Total size decrease is about 33,000 bytes: text data bss dec hex filename 90663567 17221960 36659200 144544727 89d93d7 vmlinux5 90630370 17221864 36659200 144511434 89d11ca vmlinux.after Signed-off-by: Denys Vlasenko CC: Thomas Graf CC: David Miller CC: Tom Herbert CC: Alexander Duyck CC: Jozsef Kadlecsik CC: Herbert Xu CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- Changes in v2: created a new source file, jhash.c Changes in v3: do not deinline __jhash_nwords; use EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL include/linux/jhash.h | 89 +-- lib/Makefile | 2 +- lib/jhash.c | 113 ++ lib/rhashtable.c | 13 +++--- 4 files changed, 123 insertions(+), 94 deletions(-) create mode 100644 lib/jhash.c diff --git a/include/linux/jhash.h b/include/linux/jhash.h index 348c6f4..847c1aa 100644 --- a/include/linux/jhash.h +++ b/include/linux/jhash.h @@ -57,93 +57,8 @@ /* An arbitrary initial parameter */ #define JHASH_INITVAL 0xdeadbeef -/* jhash - hash an arbitrary key - * @k: sequence of bytes as key - * @length: the length of the key - * @initval: the previous hash, or an arbitray value - * - * The generic version, hashes an arbitrary sequence of bytes. - * No alignment or length assumptions are made about the input key. - * - * Returns the hash value of the key. The result depends on endianness. - */ -static inline u32 jhash(const void *key, u32 length, u32 initval) -{ - u32 a, b, c; - const u8 *k = key; - - /* Set up the internal state */ - a = b = c = JHASH_INITVAL + length + initval; - - /* All but the last block: affect some 32 bits of (a,b,c) */ - while (length > 12) { - a += __get_unaligned_cpu32(k); - b += __get_unaligned_cpu32(k + 4); - c += __get_unaligned_cpu32(k + 8); - __jhash_mix(a, b, c); - length -= 12; - k += 12; - } - /* Last block: affect all 32 bits of (c) */ - /* All the case statements fall through */ - switch (length) { - case 12: c += (u32)k[11]<<24; - case 11: c += (u32)k[10]<<16; - case 10: c += (u32)k[9]<<8; - case 9: c += k[8]; - case 8: b += (u32)k[7]<<24; - case 7: b += (u32)k[6]<<16; - case 6: b += (u32)k[5]<<8; - case 5: b += k[4]; - case 4: a += (u32)k[3]<<24; - case 3: a += (u32)k[2]<<16; - case 2: a += (u32)k[1]<<8; - case 1: a += k[0]; -__jhash_final(a, b, c); - case 0: /* Nothing left to add */ - break; - } - - return c; -} - -/* jhash2 - hash an array of u32's - * @k: the key which must be an array of u32's - * @length: the number of u32's in the key - * @initval: the previous hash, or an arbitray value - * - * Returns the hash value of the key. - */ -static inline u32 jhash2(const u32 *k, u32 length, u32 initval) -{ - u32 a, b, c; - - /* Set up the internal state */ - a = b = c = JHASH_INITVAL + (length<<2) + initval; - - /* Handle most of the key */ - while (length > 3) { - a += k[0]; - b += k[1]; - c += k[2]; - __jhash_mix(a, b, c); - length -= 3; - k += 3; - } - - /* Handle the last 3 u32's: all the case statements fall through */ - switch (length) { - case 3: c += k[2]; - case 2: b += k[1]; - case 1: a += k[0]; - __jhash_final(a, b, c); - case 0: /* Nothing left to add */ - break; - } - - return c; -} - +u32 jhash(const void *key, u32 length, u32 initval); +u32 jhash2(const u32 *k, u32 length, u32 initval); /* __jhash_nwords - hash exactly 3, 2 or 1 word(s) */ static inline u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval) diff --git a/lib/Makefile b/lib/Makefile index 6897b52..978be53 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -26,7 +26,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \ bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \ gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \ bsearch.o find_bit.o llist.o memweight.o kfifo.o \ -percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o +percpu-refcount.o percpu_ida.o jhash.o rhashtable.o reciprocal_div.o obj-y += string_helpers.o obj-$(CONFIG_TEST_S
Re: [PATCH v2] jhash: Deinline jhash, jhash2 and __jhash_nwords
On 07/16/2015 08:17 PM, David Miller wrote: > From: Tom Herbert > Date: Thu, 16 Jul 2015 08:43:25 -0700 > >> On Thu, Jul 16, 2015 at 5:40 AM, Denys Vlasenko wrote: >>> This patch deinlines jhash, jhash2 and __jhash_nwords. >>> >>> It also removes rhashtable_jhash2(key, length, seed) >>> because it was merely calling jhash2(key, length, seed). >>> >>> With this .config: http://busybox.net/~vda/kernel_config, >>> after deinlining these functions have sizes and callsite counts >>> as follows: >>> >>> __jhash_nwords: 72 bytes, 75 calls >>> jhash: 297 bytes, 111 calls >>> jhash2: 205 bytes, 136 calls >>> >> jhash is used in several places in the critical data path. Does the >> decrease in text size justify performance impact of not inlining it? > > Tom took the words right out of my mouth. > > Denys, you keep making deinlining changes like this all the time, like > a robot. But I never see you make any effort to look into the performance > nor code generation ramifications of your changes. The performance impact of the call/ret pair on modern x86 CPUs is only 5 cycles. To make a real difference in execution speed, the function has to be less than 100 bytes of code. jhash and jhash2, at more than 200 bytes of code, are definitely far too large for inlining. Here is the smaller of two, jhash2: : 8d 84 b2 ef be ad delea-0x21524111(%rdx,%rsi,4),%eax 55 push %rbp 89 c1 mov%eax,%ecx 48 89 e5mov%rsp,%rbp 89 c2 mov%eax,%edx eb 62 jmp81a0d204 03 47 08add0x8(%rdi),%eax 03 17 add(%rdi),%edx 83 ee 03sub$0x3,%esi 03 4f 04add0x4(%rdi),%ecx 48 83 c7 0c add$0xc,%rdi 41 89 c0mov%eax,%r8d 29 c2 sub%eax,%edx 41 c1 c8 1c ror$0x1c,%r8d 01 c8 add%ecx,%eax 44 31 c2xor%r8d,%edx 41 89 d0mov%edx,%r8d 29 d1 sub%edx,%ecx 01 c2 add%eax,%edx 41 c1 c8 1a ror$0x1a,%r8d 41 31 c8xor%ecx,%r8d 44 89 c1mov%r8d,%ecx 44 29 c0sub%r8d,%eax 41 01 d0add%edx,%r8d c1 c9 18ror$0x18,%ecx 31 c1 xor%eax,%ecx 89 c8 mov%ecx,%eax 29 ca sub%ecx,%edx 46 8d 0c 01 lea(%rcx,%r8,1),%r9d c1 c8 10ror$0x10,%eax 31 d0 xor%edx,%eax 89 c1 mov%eax,%ecx 41 29 c0sub%eax,%r8d 42 8d 14 08 lea(%rax,%r9,1),%edx c1 c9 0dror$0xd,%ecx 44 31 c1xor%r8d,%ecx 89 c8 mov%ecx,%eax 41 29 c9sub%ecx,%r9d 01 d1 add%edx,%ecx c1 c8 1cror$0x1c,%eax 44 31 c8xor%r9d,%eax 83 fe 03cmp$0x3,%esi 77 99 ja 81a0d1a2 83 fe 02cmp$0x2,%esi 74 0e je 81a0d21c 83 fe 03cmp$0x3,%esi 74 06 je 81a0d219 ff ce dec%esi 75 45 jne81a0d25c eb 06 jmp81a0d21f 03 47 08add0x8(%rdi),%eax 03 4f 04add0x4(%rdi),%ecx 89 ce mov%ecx,%esi 03 17 add(%rdi),%edx 31 c8 xor%ecx,%eax c1 ce 12ror$0x12,%esi 29 f0 sub%esi,%eax 89 c6 mov%eax,%esi 31 c2 xor%eax,%edx c1 ce 15ror$0x15,%esi 29 f2 sub%esi,%edx 89 d6 mov%edx,%esi 31 d1 xor%edx,%ecx c1 ce 07ror$0x7,%esi 29 f1 sub%esi,%ecx 89 ce mov%ecx,%esi 31 c8 xor%ecx,%eax c1 ce 10ror$0x10,%esi 29 f0 sub%esi,%eax 89 c6 mov%eax,%esi 31 c2 xor%eax,%ed
[PATCH v2] jhash: Deinline jhash, jhash2 and __jhash_nwords
This patch deinlines jhash, jhash2 and __jhash_nwords. It also removes rhashtable_jhash2(key, length, seed) because it was merely calling jhash2(key, length, seed). With this .config: http://busybox.net/~vda/kernel_config, after deinlining these functions have sizes and callsite counts as follows: __jhash_nwords: 72 bytes, 75 calls jhash: 297 bytes, 111 calls jhash2: 205 bytes, 136 calls Total size decrease is about 38,000 bytes: text data bss dec hex filename 90663567 17221960 36659200 144544727 89d93d7 vmlinux5 90625577 17221864 36659200 144506641 89cff11 vmlinux.after Signed-off-by: Denys Vlasenko CC: Thomas Graf CC: Alexander Duyck CC: Jozsef Kadlecsik CC: Herbert Xu CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- Changes in v2: created a new source file, jhash.c include/linux/jhash.h | 123 + lib/Makefile | 2 +- lib/jhash.c | 149 ++ lib/rhashtable.c | 13 +++-- 4 files changed, 160 insertions(+), 127 deletions(-) create mode 100644 lib/jhash.c diff --git a/include/linux/jhash.h b/include/linux/jhash.h index 348c6f4..0b3f55d 100644 --- a/include/linux/jhash.h +++ b/include/linux/jhash.h @@ -31,131 +31,14 @@ /* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */ #define jhash_mask(n) (jhash_size(n)-1) -/* __jhash_mix -- mix 3 32-bit values reversibly. */ -#define __jhash_mix(a, b, c) \ -{ \ - a -= c; a ^= rol32(c, 4); c += b; \ - b -= a; b ^= rol32(a, 6); a += c; \ - c -= b; c ^= rol32(b, 8); b += a; \ - a -= c; a ^= rol32(c, 16); c += b; \ - b -= a; b ^= rol32(a, 19); a += c; \ - c -= b; c ^= rol32(b, 4); b += a; \ -} - -/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ -#define __jhash_final(a, b, c) \ -{ \ - c ^= b; c -= rol32(b, 14); \ - a ^= c; a -= rol32(c, 11); \ - b ^= a; b -= rol32(a, 25); \ - c ^= b; c -= rol32(b, 16); \ - a ^= c; a -= rol32(c, 4); \ - b ^= a; b -= rol32(a, 14); \ - c ^= b; c -= rol32(b, 24); \ -} - /* An arbitrary initial parameter */ #define JHASH_INITVAL 0xdeadbeef -/* jhash - hash an arbitrary key - * @k: sequence of bytes as key - * @length: the length of the key - * @initval: the previous hash, or an arbitray value - * - * The generic version, hashes an arbitrary sequence of bytes. - * No alignment or length assumptions are made about the input key. - * - * Returns the hash value of the key. The result depends on endianness. - */ -static inline u32 jhash(const void *key, u32 length, u32 initval) -{ - u32 a, b, c; - const u8 *k = key; - - /* Set up the internal state */ - a = b = c = JHASH_INITVAL + length + initval; - - /* All but the last block: affect some 32 bits of (a,b,c) */ - while (length > 12) { - a += __get_unaligned_cpu32(k); - b += __get_unaligned_cpu32(k + 4); - c += __get_unaligned_cpu32(k + 8); - __jhash_mix(a, b, c); - length -= 12; - k += 12; - } - /* Last block: affect all 32 bits of (c) */ - /* All the case statements fall through */ - switch (length) { - case 12: c += (u32)k[11]<<24; - case 11: c += (u32)k[10]<<16; - case 10: c += (u32)k[9]<<8; - case 9: c += k[8]; - case 8: b += (u32)k[7]<<24; - case 7: b += (u32)k[6]<<16; - case 6: b += (u32)k[5]<<8; - case 5: b += k[4]; - case 4: a += (u32)k[3]<<24; - case 3: a += (u32)k[2]<<16; - case 2: a += (u32)k[1]<<8; - case 1: a += k[0]; -__jhash_final(a, b, c); - case 0: /* Nothing left to add */ - break; - } - - return c; -} - -/* jhash2 - hash an array of u32's - * @k: the key which must be an array of u32's - * @length: the number of u32's in the key - * @initval: the previous hash, or an arbitray value - * - * Returns the hash value of the key. - */ -static inline u32 jhash2(const u32 *k, u32 length, u32 initval) -{ - u32 a, b, c; - - /* Set up the internal state */ - a = b = c = JHASH_INITVAL + (length<<2) + initval; - - /* Handle most of the key */ - while (length > 3) { - a += k[0]; - b += k[1]; - c += k[2]; - __jhash_mix(a, b, c); - length -= 3; - k += 3; - } - - /* Handle the last 3 u32's: all the case statements fall through */ - switch
[PATCH] mac80211: Deinline rate_control_rate_init, rate_control_rate_update
With this .config: http://busybox.net/~vda/kernel_config, after deinlining these functions have sizes and callsite counts as follows: rate_control_rate_init: 554 bytes, 8 calls rate_control_rate_update: 1596 bytes, 5 calls Total size reduction: about 11 kbytes. Signed-off-by: Denys Vlasenko CC: John Linville CC: Michal Kazior CC: Johannes Berg Cc: linux-wirel...@vger.kernel.org Cc: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- net/mac80211/rate.c | 59 net/mac80211/rate.h | 60 +++-- 2 files changed, 62 insertions(+), 57 deletions(-) diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c index fda33f9..03687d2 100644 --- a/net/mac80211/rate.c +++ b/net/mac80211/rate.c @@ -29,6 +29,65 @@ module_param(ieee80211_default_rc_algo, charp, 0644); MODULE_PARM_DESC(ieee80211_default_rc_algo, "Default rate control algorithm for mac80211 to use"); +void rate_control_rate_init(struct sta_info *sta) +{ + struct ieee80211_local *local = sta->sdata->local; + struct rate_control_ref *ref = sta->rate_ctrl; + struct ieee80211_sta *ista = &sta->sta; + void *priv_sta = sta->rate_ctrl_priv; + struct ieee80211_supported_band *sband; + struct ieee80211_chanctx_conf *chanctx_conf; + + ieee80211_sta_set_rx_nss(sta); + + if (!ref) + return; + + rcu_read_lock(); + + chanctx_conf = rcu_dereference(sta->sdata->vif.chanctx_conf); + if (WARN_ON(!chanctx_conf)) { + rcu_read_unlock(); + return; + } + + sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band]; + + spin_lock_bh(&sta->rate_ctrl_lock); + ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista, + priv_sta); + spin_unlock_bh(&sta->rate_ctrl_lock); + rcu_read_unlock(); + set_sta_flag(sta, WLAN_STA_RATE_CONTROL); +} + +void rate_control_rate_update(struct ieee80211_local *local, + struct ieee80211_supported_band *sband, + struct sta_info *sta, u32 changed) +{ + struct rate_control_ref *ref = local->rate_ctrl; + struct ieee80211_sta *ista = &sta->sta; + void *priv_sta = sta->rate_ctrl_priv; + struct ieee80211_chanctx_conf *chanctx_conf; + + if (ref && ref->ops->rate_update) { + rcu_read_lock(); + + chanctx_conf = rcu_dereference(sta->sdata->vif.chanctx_conf); + if (WARN_ON(!chanctx_conf)) { + rcu_read_unlock(); + return; + } + + spin_lock_bh(&sta->rate_ctrl_lock); + ref->ops->rate_update(ref->priv, sband, &chanctx_conf->def, + ista, priv_sta, changed); + spin_unlock_bh(&sta->rate_ctrl_lock); + rcu_read_unlock(); + } + drv_sta_rc_update(local, sta->sdata, &sta->sta, changed); +} + int ieee80211_rate_control_register(const struct rate_control_ops *ops) { struct rate_control_alg *alg; diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h index 25c9be5..624fe5b 100644 --- a/net/mac80211/rate.h +++ b/net/mac80211/rate.h @@ -71,64 +71,10 @@ rate_control_tx_status_noskb(struct ieee80211_local *local, spin_unlock_bh(&sta->rate_ctrl_lock); } -static inline void rate_control_rate_init(struct sta_info *sta) -{ - struct ieee80211_local *local = sta->sdata->local; - struct rate_control_ref *ref = sta->rate_ctrl; - struct ieee80211_sta *ista = &sta->sta; - void *priv_sta = sta->rate_ctrl_priv; - struct ieee80211_supported_band *sband; - struct ieee80211_chanctx_conf *chanctx_conf; - - ieee80211_sta_set_rx_nss(sta); - - if (!ref) - return; - - rcu_read_lock(); - - chanctx_conf = rcu_dereference(sta->sdata->vif.chanctx_conf); - if (WARN_ON(!chanctx_conf)) { - rcu_read_unlock(); - return; - } - - sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band]; - - spin_lock_bh(&sta->rate_ctrl_lock); - ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista, - priv_sta); - spin_unlock_bh(&sta->rate_ctrl_lock); - rcu_read_unlock(); - set_sta_flag(sta, WLAN_STA_RATE_CONTROL); -} - -static inline void rate_control_rate_update(struct ieee80211_local *local, +void rate_control_rate_init(struct sta_info *sta); +void rate_control_rate_update(struct ieee80211_local *local, struct ieee80211_supported_band *sband, -
[PATCH] mac80211: Deinline drv_sta_state
With this .config: http://busybox.net/~vda/kernel_config, after deinlining the function size is 3132 bytes and there are 7 callsites. Total size reduction: about 20 kbytes. Signed-off-by: Denys Vlasenko CC: John Linville CC: Michal Kazior Cc: Johannes Berg Cc: linux-wirel...@vger.kernel.org Cc: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- net/mac80211/Makefile | 1 + net/mac80211/driver-ops.c | 41 + net/mac80211/driver-ops.h | 29 ++--- 3 files changed, 44 insertions(+), 27 deletions(-) create mode 100644 net/mac80211/driver-ops.c diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile index 3275f01..783e891 100644 --- a/net/mac80211/Makefile +++ b/net/mac80211/Makefile @@ -3,6 +3,7 @@ obj-$(CONFIG_MAC80211) += mac80211.o # mac80211 objects mac80211-y := \ main.o status.o \ + driver-ops.o \ sta_info.o \ wep.o \ wpa.o \ diff --git a/net/mac80211/driver-ops.c b/net/mac80211/driver-ops.c new file mode 100644 index 000..267c3b1 --- /dev/null +++ b/net/mac80211/driver-ops.c @@ -0,0 +1,41 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +#include +#include "ieee80211_i.h" +#include "trace.h" +#include "driver-ops.h" + +__must_check +int drv_sta_state(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + struct sta_info *sta, + enum ieee80211_sta_state old_state, + enum ieee80211_sta_state new_state) +{ + int ret = 0; + + might_sleep(); + + sdata = get_bss_sdata(sdata); + if (!check_sdata_in_driver(sdata)) + return -EIO; + + trace_drv_sta_state(local, sdata, &sta->sta, old_state, new_state); + if (local->ops->sta_state) { + ret = local->ops->sta_state(&local->hw, &sdata->vif, &sta->sta, + old_state, new_state); + } else if (old_state == IEEE80211_STA_AUTH && + new_state == IEEE80211_STA_ASSOC) { + ret = drv_sta_add(local, sdata, &sta->sta); + if (ret == 0) + sta->uploaded = true; + } else if (old_state == IEEE80211_STA_ASSOC && + new_state == IEEE80211_STA_AUTH) { + drv_sta_remove(local, sdata, &sta->sta); + } + trace_drv_return_int(local, ret); + return ret; +} diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h index 32a2e70..02d9133 100644 --- a/net/mac80211/driver-ops.h +++ b/net/mac80211/driver-ops.h @@ -573,37 +573,12 @@ static inline void drv_sta_pre_rcu_remove(struct ieee80211_local *local, trace_drv_return_void(local); } -static inline __must_check +__must_check int drv_sta_state(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, struct sta_info *sta, enum ieee80211_sta_state old_state, - enum ieee80211_sta_state new_state) -{ - int ret = 0; - - might_sleep(); - - sdata = get_bss_sdata(sdata); - if (!check_sdata_in_driver(sdata)) - return -EIO; - - trace_drv_sta_state(local, sdata, &sta->sta, old_state, new_state); - if (local->ops->sta_state) { - ret = local->ops->sta_state(&local->hw, &sdata->vif, &sta->sta, - old_state, new_state); - } else if (old_state == IEEE80211_STA_AUTH && - new_state == IEEE80211_STA_ASSOC) { - ret = drv_sta_add(local, sdata, &sta->sta); - if (ret == 0) - sta->uploaded = true; - } else if (old_state == IEEE80211_STA_ASSOC && - new_state == IEEE80211_STA_AUTH) { - drv_sta_remove(local, sdata, &sta->sta); - } - trace_drv_return_int(local, ret); - return ret; -} + enum ieee80211_sta_state new_state); static inline void drv_sta_rc_update(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] iwlwifi: Deinline iwl_{read,write}{8,32}
On Tue, Jul 14, 2015 at 2:38 PM, Sergei Shtylyov wrote: >> +#define IWL_READ_WRITE(static_inline) \ >> +static_inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) \ >> +{ \ >> + trace_iwlwifi_dev_iowrite8(trans->dev, ofs, val); \ >> + iwl_trans_write8(trans, ofs, val); \ >> +} \ > [...] >> >> +#if !defined(CONFIG_IWLWIFI_DEVICE_TRACING) >> +IWL_READ_WRITE(static inline) > >Not static_inline? Yes. Here we are putting two words, "static inline", in front of every function definition. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] iwlwifi: Deinline iwl_{read,write}{8,32}
With CONFIG_IWLWIFI_DEVICE_TRACING=y, these functions are rather large, too big for inlining. With this .config: http://busybox.net/~vda/kernel_config, after uninlining these functions have sizes and callsite counts as follows: iwl_read32 475 bytes, 51 callsites iwl_write32 477 bytes, 90 callsites iwl_write8 493 bytes, 3 callsites Reduction in size is about 74,000 bytes: text data bss dec hex filename 90758147 17226024 36659200 144643371 89f152b vmlinux0 90687995 17221928 36659200 144569123 89df323 vmlinux.after Signed-off-by: Denys Vlasenko CC: Johannes Berg CC: Emmanuel Grumbach Cc: "John W. Linville" Cc: Intel Linux Wireless Cc: linux-wirel...@vger.kernel.org Cc: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org --- drivers/net/wireless/iwlwifi/iwl-io.c | 7 +++ drivers/net/wireless/iwlwifi/iwl-io.h | 39 +-- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/drivers/net/wireless/iwlwifi/iwl-io.c b/drivers/net/wireless/iwlwifi/iwl-io.c index 27c66e4..aed121e 100644 --- a/drivers/net/wireless/iwlwifi/iwl-io.c +++ b/drivers/net/wireless/iwlwifi/iwl-io.c @@ -38,6 +38,13 @@ #define IWL_POLL_INTERVAL 10 /* microseconds */ +#if defined(CONFIG_IWLWIFI_DEVICE_TRACING) +IWL_READ_WRITE( /*not inlined*/ ) +IWL_EXPORT_SYMBOL(iwl_write8); +IWL_EXPORT_SYMBOL(iwl_write32); +IWL_EXPORT_SYMBOL(iwl_read32); +#endif + int iwl_poll_bit(struct iwl_trans *trans, u32 addr, u32 bits, u32 mask, int timeout) { diff --git a/drivers/net/wireless/iwlwifi/iwl-io.h b/drivers/net/wireless/iwlwifi/iwl-io.h index 705d12c..3c9d2a8 100644 --- a/drivers/net/wireless/iwlwifi/iwl-io.h +++ b/drivers/net/wireless/iwlwifi/iwl-io.h @@ -32,24 +32,31 @@ #include "iwl-devtrace.h" #include "iwl-trans.h" -static inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) -{ - trace_iwlwifi_dev_iowrite8(trans->dev, ofs, val); - iwl_trans_write8(trans, ofs, val); -} - -static inline void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val) -{ - trace_iwlwifi_dev_iowrite32(trans->dev, ofs, val); - iwl_trans_write32(trans, ofs, val); +#define IWL_READ_WRITE(static_inline) \ +static_inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) \ +{ \ + trace_iwlwifi_dev_iowrite8(trans->dev, ofs, val); \ + iwl_trans_write8(trans, ofs, val); \ +} \ +static_inline void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val) \ +{ \ + trace_iwlwifi_dev_iowrite32(trans->dev, ofs, val); \ + iwl_trans_write32(trans, ofs, val); \ +} \ +static_inline u32 iwl_read32(struct iwl_trans *trans, u32 ofs) \ +{ \ + u32 val = iwl_trans_read32(trans, ofs); \ + trace_iwlwifi_dev_ioread32(trans->dev, ofs, val); \ + return val; \ } -static inline u32 iwl_read32(struct iwl_trans *trans, u32 ofs) -{ - u32 val = iwl_trans_read32(trans, ofs); - trace_iwlwifi_dev_ioread32(trans->dev, ofs, val); - return val; -} +#if !defined(CONFIG_IWLWIFI_DEVICE_TRACING) +IWL_READ_WRITE(static inline) +#else +void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val); +void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val); +u32 iwl_read32(struct iwl_trans *trans, u32 ofs); +#endif static inline void iwl_set_bit(struct iwl_trans *trans, u32 reg, u32 mask) { -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] 3c59x: Fix shared IRQ handling
As its first order of business, boomerang_interrupt() checks whether the device really has any pending interrupts. If it does not, it does nothing and returns, but it still returns IRQ_HANDLED. This is wrong: interrupt was not handled, IRQ handlers of other devices sharing this IRQ line need to be called. vortex_interrupt() has it right: it returns IRQ_NONE in this case via IRQ_RETVAL(0). Do the same in boomerang_interrupt(). Signed-off-by: Denys Vlasenko CC: David S. Miller CC: linux-ker...@vger.kernel.org CC: netdev@vger.kernel.org --- drivers/net/ethernet/3com/3c59x.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c index 41095eb..c11d6fc 100644 --- a/drivers/net/ethernet/3com/3c59x.c +++ b/drivers/net/ethernet/3com/3c59x.c @@ -2382,6 +2384,7 @@ boomerang_interrupt(int irq, void *dev_id) void __iomem *ioaddr; int status; int work_done = max_interrupt_work; + int handled = 0; ioaddr = vp->ioaddr; @@ -2400,6 +2403,7 @@ boomerang_interrupt(int irq, void *dev_id) if ((status & IntLatch) == 0) goto handler_exit; /* No interrupt: shared IRQs can cause this */ + handled = 1; if (status == 0x) { /* h/w no longer present (hotplug)? */ if (vortex_debug > 1) @@ -2501,7 +2505,7 @@ boomerang_interrupt(int irq, void *dev_id) handler_exit: vp->handling_irq = 0; spin_unlock(&vp->lock); - return IRQ_HANDLED; + return IRQ_RETVAL(handled); } static int vortex_rx(struct net_device *dev) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT] Networking
On Wed, Apr 1, 2015 at 9:48 PM, David Miller wrote: > D.S. Ljungmark (1): > ipv6: Don't reduce hop limit for an interface https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6fd99094de2b83d1d4c8457f2c83483b2828e75a I was testing this change and apparently it doesn't close the hole. The python script I use to send RAs: #!/usr/bin/env python import sys import time import scapy.all from scapy.layers.inet6 import * ip = IPv6() # ip.dst = 'ff02::1' ip.dst = sys.argv[1] icmp = ICMPv6ND_RA() icmp.chlim = 1 for x in range(10): send(ip/icmp) time.sleep(1) # ./ipv6-hop-limit.py fe80::21e:37ff:fed0:5006 . Sent 1 packets. ...<10 times>... Sent 1 packets. After I do this, on the targeted machine I check hop_limits: # for f in /proc/sys/net/ipv6/conf/*/hop_limit; do echo -n $f:; cat $f; done /proc/sys/net/ipv6/conf/all/hop_limit:64 /proc/sys/net/ipv6/conf/default/hop_limit:64 /proc/sys/net/ipv6/conf/enp0s25/hop_limit:1 <=== THIS /proc/sys/net/ipv6/conf/lo/hop_limit:64 /proc/sys/net/ipv6/conf/wlp3s0/hop_limit:64 As you see, the interface which received RAs still lowered its hop_limit to 1. I take it means that the bug is still present (right? I'm not a network guy...). I triple-checked that I do run the kernel with the fix. Further investigation shows that the code touched by the fix is not even reached, hop_limit is changed elsewhere. I'm willing to test additional patches. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netns: deinline net_generic()
On 04/17/2015 07:42 PM, Eric Dumazet wrote: > On Fri, 2015-04-17 at 19:05 +0200, Denys Vlasenko wrote: >> How do you expect one to find excessively large inlines, >> if not on allyesconfig build? > > Tuning kernel sources based on allyesconfig build _size_ only is > terrible. We could build an interpreter based kernel and maybe reduce > its size by 50% who knows... > > You are saying that all inline should be removed, since it is obvious > kernel size _will_ be smaller. I am not saying that. That would be stupid. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] netns: remove BUG_ONs from net_generic()
This inline has ~500 callsites. On 04/14/2015 08:37 PM, David Miller wrote: > That BUG_ON() was added 7 years ago, and I don't remember it ever > triggering or helping us diagnose something, so just remove it and > keep the function inlined. On x86 allyesconfig build: text data bss dec hex filename 82447071 22255384 20627456 125329911 77861f7 vmlinux4 82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime Signed-off-by: Denys Vlasenko CC: Eric W. Biederman CC: David S. Miller CC: Jan Engelhardt CC: Jiri Pirko CC: linux-ker...@vger.kernel.org CC: netdev@vger.kernel.org --- include/net/netns/generic.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h index 0931618..70e1585 100644 --- a/include/net/netns/generic.h +++ b/include/net/netns/generic.h @@ -38,11 +38,9 @@ static inline void *net_generic(const struct net *net, int id) rcu_read_lock(); ng = rcu_dereference(net->gen); - BUG_ON(id == 0 || id > ng->len); ptr = ng->ptr[id - 1]; rcu_read_unlock(); - BUG_ON(!ptr); return ptr; } #endif -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netns: deinline net_generic()
On 04/16/2015 02:38 PM, Eric Dumazet wrote: > On Thu, 2015-04-16 at 13:14 +0200, Denys Vlasenko wrote: > >> However, without BUG_ONs, function is still a bit big >> on PREEMPT configs. > > Only on allyesconfig builds, that nobody use but to prove some points > about code size. How do you expect one to find excessively large inlines, if not on allyesconfig build? Only by using allyesconfig, I can measure how many calls are there in the kernel. (grepping source is utterly unreliable due to nested inlines and macros). For the record: I am not using the _full_ allyesconfig, I do disable some debugging options which clearly aren't ever enabled on production systems. E.g. in my config: # CONFIG_DEBUG_KMEMLEAK_TEST is not set # CONFIG_KASAN is not set etc. > If you look at net_generic(), it is mostly used from code that is > normally in 3 modules. How many people really load them ? > > net/tipc : 91 call sites > net/sunrpc : 57 > fs/nfsd & fs/lockd : 183 > > Then few remaining uses in tunnels. Grepping is far from reliable. The above missed more than half of all calls. I disassembed vmlinux after deinlining, there are nearly 500 calls of net_generic(). > As we suggested, please just remove the BUG_ON(). Going to send the patch in a minute. -- vda -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netns: deinline net_generic()
Hi David, Eric, As you may have surmised, this patch wasn't a result of me looking at networking code; rather, it is a result of semi-automated search for huge inlines. The last step of this process would be the submission of patches. I am expecting a range of responses from maintainers: enthusiastic, "no reply", "go away, I don't care about code size", and everything in between. I can't invest a large amount of effort trying to push every deinlining patch through. If someone, for example, would flat out refuse to fix a huge inline in his driver, so be it. I will be happy to run at least a few iterations over a patch if maintainers do see a merit in deinlining, but have some reservations wrt performance. Your response falls into the latter case (you aren't refusing the patch outright, but want it to be changed in some way). It would help if you tell me how I should change the patches. (I also hope to have a semi-generic way of addressing performance concerns for future deinlining patch submissions.) On 04/14/2015 08:37 PM, David Miller wrote: > From: Denys Vlasenko > Date: Tue, 14 Apr 2015 14:25:11 +0200 > >> On x86 allyesconfig build: >> The function compiles to 130 bytes of machine code. >> It has 493 callsites. >> Total reduction of vmlinux size: 27906 bytes. >> >>text data bss dec hex filename >> 82447071 22255384 20627456 125329911 77861f7 vmlinux4 >> 82419165 22255384 20627456 125302005 777f4f5 vmlinux5 >> >> Signed-off-by: Denys Vlasenko > > That BUG_ON() was added 7 years ago, and I don't remember it ever > triggering or helping us diagnose something, so just remove it and > keep the function inlined. There are two BUG_ONs. I'll remove both of them in v2. Let me know if you want something else. However, without BUG_ONs, function is still a bit big on PREEMPT configs. Among almost 500 users, many probably aren't hot paths. How about having an inlined version, say, "fast_net_generic()", and a deinlined one, and using one or another as appropriate. Is this ok with you? On 04/14/2015 03:19 PM, Eric Dumazet wrote: > On Tue, 2015-04-14 at 14:25 +0200, Denys Vlasenko wrote: >> On x86 allyesconfig build: >> The function compiles to 130 bytes of machine code. >> It has 493 callsites. >> Total reduction of vmlinux size: 27906 bytes. >> >>text data bss dec hex filename >> 82447071 22255384 20627456 125329911 77861f7 vmlinux4 >> 82419165 22255384 20627456 125302005 777f4f5 vmlinux5 > > This sounds a big hammer to me. > > These savings probably comes from the BUG_ON() that could simply be > removed. > The second one for sure has no purpose. First one looks defensive. > > For a typical (non allyesconfig) kernel, net_generic() would translate > to : > > return net->gen[id - 1] > > Tunnels need this in fast path, so I presume we could introduce > net_generic_rcu() to keep this stuff inlined where it matters. > > static inline void *net_generic_rcu(const struct net *net, int id) > { > struct net_generic *ng = rcu_dereference(net->gen); > > return ng->ptr[id - 1]; > } I looked at the code and I'm not feeling confident enough to find places where replacing net_generic() with net_generic_rcu() is okay. Would it be okay if I add net_generic_rcu() as you suggest, but leave it to you (network people) to switch to it where appropriate? -- vda -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip[6]_tables.c: remove some inlines
On Monday 31 December 2007 17:00, Patrick McHardy wrote: > Denys Vlasenko wrote: > > ip[6]_tables.c seem to abuse inline. > > > > This patch removes inlines except those which are used > > by packet matching code and thus are performance-critical. > > > > Some people also consider the ruleset replacement path performance > critical, but overall I agree with your patch. I'm travelling currently > though so it will take a few days until I'll apply it. > > Do you have some numbers that show the actual difference these > changes make? Before: $ size */*/*/ip*tables*.o textdata bss dec hex filename 6402 500 1669181b06 net/ipv4/netfilter/ip_tables.o 7130 500 1676461dde net/ipv6/netfilter/ip6_tables.o After: $ size */*/*/ip*tables*.o textdata bss dec hex filename 6307 500 1668231aa7 net/ipv4/netfilter/ip_tables.o 7010 500 1675261d66 net/ipv6/netfilter/ip6_tables.o -- vda -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ip[6]_tables.c: remove some inlines
On Friday 28 December 2007 15:30, Patrick McHardy wrote: > >> This clashes with my pending patches, which I'll push upstream > >> today. I also spent some time resyncing ip_tables and ip6_tables > >> so a diff of both (with some sed'ing) shows only the actual > >> differences, so please update ip6_tables as well. > > > > I would be happy to rediff the patch. > > > > Which tree should I sync against in order to not collide > > with your changes? > > The net-2.6.25.git tree contains all my changes. ip[6]_tables.c seem to abuse inline. This patch removes inlines except those which are used by packet matching code and thus are performance-critical. Please take this patch into netfilter queue. Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]> -- vda diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index f5b66ec..982b7f9 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -75,6 +75,7 @@ do {\ Hence the start of any table is given by get_table() below. */ /* Returns whether matches rule or not. */ +/* Performance critical - called for every packet */ static inline bool ip_packet_match(const struct iphdr *ip, const char *indev, @@ -153,7 +154,7 @@ ip_packet_match(const struct iphdr *ip, return true; } -static inline bool +static bool ip_checkentry(const struct ipt_ip *ip) { if (ip->flags & ~IPT_F_MASK) { @@ -183,8 +184,9 @@ ipt_error(struct sk_buff *skb, return NF_DROP; } -static inline -bool do_match(struct ipt_entry_match *m, +/* Performance critical - called for every packet */ +static inline bool +do_match(struct ipt_entry_match *m, const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -199,6 +201,7 @@ bool do_match(struct ipt_entry_match *m, return false; } +/* Performance critical */ static inline struct ipt_entry * get_entry(void *base, unsigned int offset) { @@ -206,6 +209,7 @@ get_entry(void *base, unsigned int offset) } /* All zeroes == unconditional rule. */ +/* Mildly perf critical (only if packet tracing is on) */ static inline int unconditional(const struct ipt_ip *ip) { @@ -221,7 +225,7 @@ unconditional(const struct ipt_ip *ip) #if defined(CONFIG_NETFILTER_XT_TARGET_TRACE) || \ defined(CONFIG_NETFILTER_XT_TARGET_TRACE_MODULE) -static const char *hooknames[] = { +static const char *const hooknames[] = { [NF_INET_PRE_ROUTING] = "PREROUTING", [NF_INET_LOCAL_IN] = "INPUT", [NF_INET_FORWARD] = "FORWARD", @@ -235,7 +239,7 @@ enum nf_ip_trace_comments { NF_IP_TRACE_COMMENT_POLICY, }; -static const char *comments[] = { +static const char *const comments[] = { [NF_IP_TRACE_COMMENT_RULE] = "rule", [NF_IP_TRACE_COMMENT_RETURN] = "return", [NF_IP_TRACE_COMMENT_POLICY] = "policy", @@ -251,6 +255,7 @@ static struct nf_loginfo trace_loginfo = { }, }; +/* Mildly perf critical (only if packet tracing is on) */ static inline int get_chainname_rulenum(struct ipt_entry *s, struct ipt_entry *e, char *hookname, char **chainname, @@ -567,7 +572,7 @@ mark_source_chains(struct xt_table_info *newinfo, return 1; } -static inline int +static int cleanup_match(struct ipt_entry_match *m, unsigned int *i) { if (i && (*i)-- == 0) @@ -579,7 +584,7 @@ cleanup_match(struct ipt_entry_match *m, unsigned int *i) return 0; } -static inline int +static int check_entry(struct ipt_entry *e, const char *name) { struct ipt_entry_target *t; @@ -600,7 +605,8 @@ check_entry(struct ipt_entry *e, const char *name) return 0; } -static inline int check_match(struct ipt_entry_match *m, const char *name, +static int +check_match(struct ipt_entry_match *m, const char *name, const struct ipt_ip *ip, unsigned int hookmask, unsigned int *i) { @@ -623,7 +629,7 @@ static inline int check_match(struct ipt_entry_match *m, const char *name, return ret; } -static inline int +static int find_check_match(struct ipt_entry_match *m, const char *name, const struct ipt_ip *ip, @@ -652,7 +658,7 @@ err: return ret; } -static inline int check_target(struct ipt_entry *e, const char *name) +static int check_target(struct ipt_entry *e, const char *name) { struct ipt_entry_target *t; struct xt_target *target; @@ -673,7 +679,7 @@ static inline int check_target(struct ipt_entry *e, const char *name) return ret; } -static inline int +static int find_check_entry(struct ipt_entry *e, const char *name, unsigned int size, unsigned int *i) { @@ -717,7 +723,7 @@ find_check_entry(struct ipt_entry *e, const char *name, unsigned int size, return ret; } -static inline int +static int check_entry_size_and_hooks(struct ipt_entry *e, struct xt_table_info *newinfo, unsigned char *base, @@ -760,7 +766,7
Re: [PATCH] net/ipv4/netfilter/ip_tables.c: remove some inlines
On Monday 17 December 2007 14:47, Patrick McHardy wrote: > Please CC netfilter-devel on netfilter patches. > > Denys Vlasenko wrote: > > Hi Patrick, Harald, > > > > I was working on unrelated problem and noticed that ip_tables.c > > seem to abuse inline. I prepared a patch which removes inlines > > except those which are used by packet matching code > > (and thus are really performance-critical). > > I added comments explaining that remaining inlines are > > performance critical. > > > > Result as reported by size: > > > >textdata bss dec hex filename > > - 6451 380 8869191b07 ip_tables.o > > + 6339 348 7267591a67 ip_tables.o > > > > Please take this patch into netfilter queue. > > This clashes with my pending patches, which I'll push upstream > today. I also spent some time resyncing ip_tables and ip6_tables > so a diff of both (with some sed'ing) shows only the actual > differences, so please update ip6_tables as well. I would be happy to rediff the patch. Which tree should I sync against in order to not collide with your changes? -- vda -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net/ipv4/netfilter/ip_tables.c: remove some inlines
Hi Patrick, Harald, I was working on unrelated problem and noticed that ip_tables.c seem to abuse inline. I prepared a patch which removes inlines except those which are used by packet matching code (and thus are really performance-critical). I added comments explaining that remaining inlines are performance critical. Result as reported by size: textdata bss dec hex filename - 6451 380 8869191b07 ip_tables.o + 6339 348 7267591a67 ip_tables.o Please take this patch into netfilter queue. Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]> -- vda diff -urpN linux-2.6.org/net/ipv4/netfilter/ip_tables.c linux-2.6.ipt/net/ipv4/netfilter/ip_tables.c --- linux-2.6.org/net/ipv4/netfilter/ip_tables.c 2007-12-14 10:46:37.0 -0800 +++ linux-2.6.ipt/net/ipv4/netfilter/ip_tables.c 2007-12-16 12:37:46.0 -0800 @@ -74,6 +74,7 @@ do {\ Hence the start of any table is given by get_table() below. */ /* Returns whether matches rule or not. */ +/* Performance critical - called for every packet */ static inline int ip_packet_match(const struct iphdr *ip, const char *indev, @@ -152,7 +153,7 @@ ip_packet_match(const struct iphdr *ip, return 1; } -static inline bool +static bool ip_checkentry(const struct ipt_ip *ip) { if (ip->flags & ~IPT_F_MASK) { @@ -182,6 +183,7 @@ ipt_error(struct sk_buff *skb, return NF_DROP; } +/* Performance critical - called for every packet */ static inline bool do_match(struct ipt_entry_match *m, const struct sk_buff *skb, @@ -198,6 +200,7 @@ bool do_match(struct ipt_entry_match *m, return false; } +/* Performance critical */ static inline struct ipt_entry * get_entry(void *base, unsigned int offset) { @@ -205,6 +208,7 @@ get_entry(void *base, unsigned int offse } /* All zeroes == unconditional rule. */ +/* Mildly perf critical (only if packet tracing is on) */ static inline int unconditional(const struct ipt_ip *ip) { @@ -219,7 +223,7 @@ unconditional(const struct ipt_ip *ip) #if defined(CONFIG_NETFILTER_XT_TARGET_TRACE) || \ defined(CONFIG_NETFILTER_XT_TARGET_TRACE_MODULE) -static const char *hooknames[] = { +static const char *const hooknames[] = { [NF_IP_PRE_ROUTING] = "PREROUTING", [NF_IP_LOCAL_IN] = "INPUT", [NF_IP_FORWARD] = "FORWARD", @@ -233,7 +237,7 @@ enum nf_ip_trace_comments { NF_IP_TRACE_COMMENT_POLICY, }; -static const char *comments[] = { +static const char *const comments[] = { [NF_IP_TRACE_COMMENT_RULE] = "rule", [NF_IP_TRACE_COMMENT_RETURN] = "return", [NF_IP_TRACE_COMMENT_POLICY] = "policy", @@ -249,6 +253,7 @@ static struct nf_loginfo trace_loginfo = }, }; +/* Mildly perf critical (only if packet tracing is on) */ static inline int get_chainname_rulenum(struct ipt_entry *s, struct ipt_entry *e, char *hookname, char **chainname, @@ -567,7 +572,7 @@ mark_source_chains(struct xt_table_info return 1; } -static inline int +static int cleanup_match(struct ipt_entry_match *m, unsigned int *i) { if (i && (*i)-- == 0) @@ -579,7 +584,7 @@ cleanup_match(struct ipt_entry_match *m, return 0; } -static inline int +static int check_entry(struct ipt_entry *e, const char *name) { struct ipt_entry_target *t; @@ -599,7 +604,7 @@ check_entry(struct ipt_entry *e, const c return 0; } -static inline int check_match(struct ipt_entry_match *m, const char *name, +static int check_match(struct ipt_entry_match *m, const char *name, const struct ipt_ip *ip, unsigned int hookmask, unsigned int *i) { @@ -622,7 +627,7 @@ static inline int check_match(struct ipt return ret; } -static inline int +static int find_check_match(struct ipt_entry_match *m, const char *name, const struct ipt_ip *ip, @@ -651,7 +656,7 @@ err: return ret; } -static inline int check_target(struct ipt_entry *e, const char *name) +static int check_target(struct ipt_entry *e, const char *name) { struct ipt_entry_target *t; struct xt_target *target; @@ -672,7 +677,7 @@ static inline int check_target(struct ip return ret; } -static inline int +static int find_check_entry(struct ipt_entry *e, const char *name, unsigned int size, unsigned int *i) { @@ -716,7 +721,7 @@ find_check_entry(struct ipt_entry *e, co return ret; } -static inline int +static int check_entry_size_and_hooks(struct ipt_entry *e, struct xt_table_info *newinfo, unsigned char *base, @@ -759,7 +764,7 @@ check_entry_size_and_hooks(struct ipt_en return 0; } -static inline int +static int cleanup_entry(struct ipt_entry *e, unsigned int *i) { struct ipt_entry_target *t; @@ -1293,7 +1298,7 @@ __do_replace(const char *name, unsigned get_counters(oldinfo, counters); /* Decrease module usage counts and free resource */ loc_cpu_old_entry = oldinfo->entries[raw_smp_processor_id()]; - IPT_ENTRY_ITERATE(loc_cpu_old_entry, oldi
Re: [BUG] New Kernel Bugs
On Wednesday 14 November 2007 00:27, Adrian Bunk wrote: > You missed the following in my email: > "we slowly scare them away due to the many bug reports without any > reaction." > > The problem is that bug reports take time. If you go away from easy > things like compile errors then even things like describing what does > no longer work, ideally producing a scenario where you can reproduce it > and verifying whether it was present in previous kernels can easily take > many hours that are spent before the initial bug report. > > If the bug report then gets ignored we discourage the person who sent > the bug report to do any work related to the kernel again. Cannot agree more. I am in a similar position right now. My patch to aic7xxx driver was ubmitted four times with not much reaction from scsi guys. Finally they replied and asked to rediff it against their git tree. I did that and sent patches back. No reply since then. And mind you, the patch is not trying to do anything complex, it mostly moves code around, removes 'inline', adds 'const'. What should I think about it? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] New Kernel Bugs
On Tuesday 13 November 2007 11:57, Gabriel C wrote: > > The main problem is finding experienced developers who spend time on > > looking into bug reports. > > There are already. IMO the problem is the development model. > > There are tons new features in each new kernel release and 'tons new bugs' > which are not fixed during the release cycle nor in the .XX stable kernels. > > Maybe after XX kernel releases there should be one just with bug-fixes > _without_ any new features , eg: cleaning bugs from bugzilla , know > regressions , cleaning up code , removing broken drivers and the like. Won't work. You cannot force people to work on things they don't find interesting, long-term. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] New Kernel Bugs
On Tuesday 13 November 2007 10:56, Adrian Bunk wrote: > On Tue, Nov 13, 2007 at 12:13:56PM -0500, Theodore Tso wrote: > > On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote: > > > Btw, I used to test every -mm kernel. But since I've switched distros > > > (gentoo->ubuntu) > > > and I have less time, I feel it's harder to test -rc or -mm kernels (I > > > know this isn't a lkml problem > > > but more a distro problem, but I would love having an ubuntu blessed > > > repo with current dev kernel > > > for the latest stable ubuntu release). > > > > There are two parts to this. One is a Ubuntu development kernel which > > we can give to large numbers of people to expand our testing pool. > > But if we don't do a better job of responding to bug reports that > > would be generated by expanded testing this won't necessarily help us. > >... > > The main problem aren't missing testers [1] - we already have relatively > experienced people testing kernels and/or reporting bugs, and we slowly > scare them away due to the many bug reports without any reaction. > > The main problem is finding experienced developers who spend time on > looking into bug reports. > > Getting many relatively unexperienced users (who need more guidance for > debugging issues) as additional testers is therefore IMHO not > necessarily a good idea. And where experienced developrs are coming from? They are not born with Linux kernel skills. They grow up from within user base. Bigger user base -> more developers (eventually) -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] New Kernel Bugs
On Tuesday 13 November 2007 07:08, Mark Lord wrote: > Ingo Molnar wrote: > .. > > > This is all QA-101 that _cannot be argued against on a rational basis_, > > it's just that these sorts of things have been largely ignored for > > years, in favor of the all-too-easy "open source means many eyeballs and > > that is our QA" answer, which is a _good_ answer but by far not the most > > intelligent answer! Today "many eyeballs" is simply not good enough and > > nature (and other OS projects) will route us around if we dont change. > > .. > > QA-101 and "many eyeballs" are not at all in opposition. > The latter is how we find out about bugs on uncommon hardware, > and the former is what we need to track them and overall quality. > > A HUGE problem I have with current "efforts", is that once someone > reports a bug, the onus seems to be 99% on the *reporter* to find > the exact line of code or commit. Ghad what a repressive method. This is the only method that scales. Developer has only 24 hours in each day, and sometimes he needs to eat, sleep, and maybe even pay attention to e.g. his kids. But bug reporters are much more numerous and they have more hours in one day combined. BUT - it means that developers should try to increase user base, not scare users away. > And if the "developer" who broke the damn thing, or who at least > "claims" to be supporting that code, cannot "reproduce" the bug, > they drop it completely. Developer should let reporter know that reporter needs to help a bit here. Sometimes a bit of hand holding is needed, but it pays off because you breed more qualified testers/bug reporters. > Contrast that flawed approach with how Linus does things.. > he thinks through the symptoms, matches them to the code, > and figures out what the few possibilities might be, > and feeds back some trial balloon patches for the bug reporter to try. > > MUCH better. > > And remember, *I'm* an old-time Linux kernel developer.. just think about > the people reporting bugs who haven't been around here since 1992.. Yes. Developers should not grow more and more unhelpful and arrogant towards their users just because inexperienced users send incomplete/poorly written bug reports. They need to provide help, not humiliate/ignore. I think we agree here. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: net-2.6.24 breaks powerpc mysteriously
On Friday 12 October 2007 05:09, Paul Mackerras wrote: > > I supposed a hacky fix is to add __KERNEL__ ifdef protection around > > zlib_inflate_blob() and those troublesome includes. A nicer fix is > > That would do, but I don't see why zlib_inflate_blob had to be added > to inflate.c rather than being in a new file under lib/zlib_inflate. Done. > If nothing else, that bloats configs that currently use inflate.c (in > the kernel) but don't need zlib_inflate_blob() - which would be all > the existing uses. :) I did it. However, in general I prefer to have a better linker, one which is capable of eliminate unused fuctions/data on the per-object basis, not per .o file. Patches to implement that are sent to Sam Ravnborg. Hopefully they will make it to mainline. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: net-2.6.24 breaks powerpc mysteriously
On Friday 12 October 2007 03:45, David Miller wrote: > From: Andrew Morton <[EMAIL PROTECTED]> > Date: Thu, 11 Oct 2007 19:22:33 -0700 > > > With net-2.6.24 (pulled yesterday) applied: > > > > g5:/usr/src/25> ml arch/powerpc/boot/inflate.o > > Using ARCH=powerpc CROSS_COMPILE= > > CHK include/linux/version.h > > CHK include/linux/utsrelease.h > > CALLscripts/checksyscalls.sh > > BOOTCC arch/powerpc/boot/inflate.o > > arch/powerpc/boot/inflate.c:920:19: errno.h: No such file or directory > > arch/powerpc/boot/inflate.c:921:18: slab.h: No such file or directory > > arch/powerpc/boot/inflate.c:922:21: vmalloc.h: No such file or directory > > arch/powerpc/boot/inflate.c: In function `zlib_inflate_blob': > > arch/powerpc/boot/inflate.c:928: error: syntax error before '*' token > > The only thing we touched in zlib is in the patch below. > > I suspect the lib/zlib_inflate/inflate.c changes, I had no idea that > some pieces of code try to use this into userspace. > > I supposed a hacky fix is to add __KERNEL__ ifdef protection around > zlib_inflate_blob() and those troublesome includes. A nicer fix is > probably to change the zlib_inflate_blob() interface to pass in > pointers to alloc() and free() routines instead of calling kernel ones > directly. > > Denys? > > commit 8336793baf962163c9fab5a3f39614295fdbab27 > Author: Denys Vlasenko <[EMAIL PROTECTED]> > Date: Sun Sep 30 17:56:49 2007 -0700 > > [ZLIB]: Move bnx2 driver gzip unpacker into zlib. > > Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]> > Acked-by: Michael Chan <[EMAIL PROTECTED]> > Signed-off-by: David S. Miller <[EMAIL PROTECTED]> Please find updated patch in attachment. Compile-tested. -- vda --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -2761,48 +2761,6 @@ bnx2_set_rx_mode(struct net_device *dev) spin_unlock_bh(&bp->phy_lock); } -/* To be moved to generic lib/ */ -static int -bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len) -{ - struct z_stream_s *strm; - int rc; - - /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename) - * is stripped */ - - rc = -ENOMEM; - strm = kmalloc(sizeof(*strm), GFP_KERNEL); - if (strm == NULL) - goto gunzip_nomem2; - strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); - if (strm->workspace == NULL) - goto gunzip_nomem3; - - strm->next_in = zbuf; - strm->avail_in = len; - strm->next_out = gunzip_buf; - strm->avail_out = sz; - - rc = zlib_inflateInit2(strm, -MAX_WBITS); - if (rc == Z_OK) { - rc = zlib_inflate(strm, Z_FINISH); - /* after Z_FINISH, only Z_STREAM_END is "we unpacked it all" */ - if (rc == Z_STREAM_END) - rc = sz - strm->avail_out; - else - rc = -EINVAL; - zlib_inflateEnd(strm); - } else - rc = -EINVAL; - - kfree(strm->workspace); -gunzip_nomem3: - kfree(strm); -gunzip_nomem2: - return rc; -} - static void load_rv2p_fw(struct bnx2 *bp, u32 *rv2p_code, u32 rv2p_code_len, u32 rv2p_proc) @@ -2858,7 +2816,7 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw) text = vmalloc(FW_BUF_SIZE); if (!text) return -ENOMEM; - rc = bnx2_gunzip(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len); + rc = zlib_inflate_blob(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len); if (rc < 0) { vfree(text); return rc; @@ -2935,14 +2893,14 @@ bnx2_init_cpus(struct bnx2 *bp) text = vmalloc(FW_BUF_SIZE); if (!text) return -ENOMEM; - rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1)); + rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1)); if (rc < 0) { vfree(text); goto init_cpu_err; } load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1); - rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2)); + rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2)); if (rc < 0) { vfree(text); goto init_cpu_err; --- a/include/linux/zlib.h +++ b/include/linux/zlib.h @@ -82,7 +82,7 @@ struct internal_state; typedef struct z_stream_s { -Byte*next_in; /* next input byte */ +const Byte *next_in; /* next input byte */ uInt avail_in; /* number of bytes available at next_in */ uLongtotal_in; /* total nb of input bytes read so far */ @@ -699,4 +699,8 @@ extern int zlib_inflateInit2(z_streamp strm, int windowBits); struct internal_state {int dummy;}; /* hack for buggy compilers */ #endif +/* Utility function: initialize zlib, unpack binary blob, clean up zlib, + * return len or negative error code. */ +extern int zlib_inflate_blob(void *dst, unsigned dst_sz, const void *src, unsigned src_sz); + #endif /* _ZLIB_H */ --- a/lib/zlib_inflate/i
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote: > > On Fri, 21 Sep 2007, Denys Vlasenko wrote: > > > On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote: > >> On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said: > >> > >>> I plan to use gzip compression on following drivers' firmware, > >>> if patches will be accepted: > >>> > >>>textdata bss dec hex filename > >>> 17653 109968 240 127861 1f375 drivers/net/acenic.o > >>>6628 120448 4 127080 1f068 drivers/net/dgrs.o > >>> ^^ > >> > >> Should this be redone to use the existing firmware loading framework to > >> load the firmware instead? > > > > Not in every case. > > > > For example, bnx2 maintainer says that driver and > > firmware are closely tied for his driver. IOW: you upgrade kernel > > and your NIC is not working anymore. > > Firmware may come with a kernel. We have a "install modules", we can also > add "install firmware". Install where? I boot my machine over NFS, and it has no hard drive. > > Another argument is to make kernel be able to bring up NICs > > without needing firmware images in initramfs/initrd/hard drive. > > It is not possible to bring up things like FC or WiFi without firmware, > what special is in classic NICs? Nothing. It is just not (yet?) decreed from The Very Top that all and every firmware image should be loaded using request_firmware(). Also people may want to gzip something else than firmware. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 21:13, Andi Kleen wrote: > Denys Vlasenko <[EMAIL PROTECTED]> writes: > > > > I plan to use gzip compression on following drivers' firmware, > > if patches will be accepted: > > > >textdata bss dec hex filename > > 17653 109968 240 127861 1f375 drivers/net/acenic.o > >6628 120448 4 127080 1f068 drivers/net/dgrs.o > > ^^ > > Just change the makefiles to always install gzip'ed modules > modutils knows how to unzip them on the fly. But I compile net/* into bzImage. I like netbooting :) -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote: > On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said: > > > I plan to use gzip compression on following drivers' firmware, > > if patches will be accepted: > > > >textdata bss dec hex filename > > 17653 109968 240 127861 1f375 drivers/net/acenic.o > >6628 120448 4 127080 1f068 drivers/net/dgrs.o > > ^^ > > Should this be redone to use the existing firmware loading framework to > load the firmware instead? Not in every case. For example, bnx2 maintainer says that driver and firmware are closely tied for his driver. IOW: you upgrade kernel and your NIC is not working anymore. Another argument is to make kernel be able to bring up NICs without needing firmware images in initramfs/initrd/hard drive. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 18:49, David Miller wrote: > From: Denys Vlasenko <[EMAIL PROTECTED]> > Date: Fri, 21 Sep 2007 18:03:55 +0100 > > > Do patches look ok to you? > > I'm travelling so I haven't looked closely yet :-) > > Michael can take a look and I'll try to do so as well > tonight. Good. I plan to use gzip compression on following drivers' firmware, if patches will be accepted: textdata bss dec hex filename 17653 109968 240 127861 1f375 drivers/net/acenic.o 6628 120448 4 127080 1f068 drivers/net/dgrs.o ^^ -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] bnx2: factor out gzip unpacker
On Friday 21 September 2007 17:14, David Miller wrote: > From: Denys Vlasenko <[EMAIL PROTECTED]> > Date: Fri, 21 Sep 2007 12:01:24 +0100 > > > Hi Jeff, > > BNX2 and TG3 patches goes through Michael Chan and myself, > and I usually merge them in instead of Jeff. Didn't know that, sorry. Do patches look ok to you? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] bnx2: move gzip unpacker to zlib
On Friday 21 September 2007 12:01, Denys Vlasenko wrote: > I will move this code > out of the driver and into zlib in follow-on patch. No, I won't. I accidentally attached both patches to first email, you can find it there. Sorry. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] bnx2: factor out gzip unpacker
Hi Jeff, This patch modifies gzip unpacking code in bnx2 driver so that it does not depend on bnx2 internals. I will move this code out of the driver and into zlib in follow-on patch. It can be useful in other drivers which need to store firmwares or any other relatively big binary blobs - fonts, cursor bitmaps, whatever. Patch is run tested by Michael Chan (driver author). Michael, can you add your ACK? Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]> -- vda diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c --- linux-2.6.23-rc6/drivers/net/bnx2.c 2007-09-14 00:08:11.0 +0100 +++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c 2007-09-21 11:44:03.0 +0100 @@ -52,6 +52,8 @@ #include "bnx2_fw.h" #include "bnx2_fw2.h" +#define FW_BUF_SIZE 0x8000 + #define DRV_MODULE_NAME "bnx2" #define PFX DRV_MODULE_NAME ": " #define DRV_MODULE_VERSION "1.6.4" @@ -2767,89 +2769,45 @@ bnx2_set_rx_mode(struct net_device *dev) spin_unlock_bh(&bp->phy_lock); } -#define FW_BUF_SIZE 0x8000 - +/* To be moved to generic lib/ */ static int -bnx2_gunzip_init(struct bnx2 *bp) +bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len) { - if ((bp->gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL) - goto gunzip_nomem1; + struct z_stream_s *strm; + int rc; - if ((bp->strm = kmalloc(sizeof(*bp->strm), GFP_KERNEL)) == NULL) - goto gunzip_nomem2; + /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename) + * is stripped */ - bp->strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); - if (bp->strm->workspace == NULL) + rc = -ENOMEM; + strm = kmalloc(sizeof(*strm), GFP_KERNEL); + if (strm == NULL) + goto gunzip_nomem2; + strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); + if (strm->workspace == NULL) goto gunzip_nomem3; - return 0; + strm->next_in = zbuf; + strm->avail_in = len; + strm->next_out = gunzip_buf; + strm->avail_out = sz; + + rc = zlib_inflateInit2(strm, -MAX_WBITS); + if (rc == Z_OK) { + rc = zlib_inflate(strm, Z_FINISH); + /* after Z_FINISH, only Z_STREAM_END is "we unpacked it all" */ + if (rc == Z_STREAM_END) + rc = sz - strm->avail_out; + else + rc = -EINVAL; + zlib_inflateEnd(strm); + } else + rc = -EINVAL; + kfree(strm->workspace); gunzip_nomem3: - kfree(bp->strm); - bp->strm = NULL; - + kfree(strm); gunzip_nomem2: - vfree(bp->gunzip_buf); - bp->gunzip_buf = NULL; - -gunzip_nomem1: - printk(KERN_ERR PFX "%s: Cannot allocate firmware buffer for " - "uncompression.\n", bp->dev->name); - return -ENOMEM; -} - -static void -bnx2_gunzip_end(struct bnx2 *bp) -{ - kfree(bp->strm->workspace); - - kfree(bp->strm); - bp->strm = NULL; - - if (bp->gunzip_buf) { - vfree(bp->gunzip_buf); - bp->gunzip_buf = NULL; - } -} - -static int -bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen) -{ - int n, rc; - - /* check gzip header */ - if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED)) - return -EINVAL; - - n = 10; - -#define FNAME 0x8 - if (zbuf[3] & FNAME) - while ((zbuf[n++] != 0) && (n < len)); - - bp->strm->next_in = zbuf + n; - bp->strm->avail_in = len - n; - bp->strm->next_out = bp->gunzip_buf; - bp->strm->avail_out = FW_BUF_SIZE; - - rc = zlib_inflateInit2(bp->strm, -MAX_WBITS); - if (rc != Z_OK) - return rc; - - rc = zlib_inflate(bp->strm, Z_FINISH); - - *outlen = FW_BUF_SIZE - bp->strm->avail_out; - *outbuf = bp->gunzip_buf; - - if ((rc != Z_OK) && (rc != Z_STREAM_END)) - printk(KERN_ERR PFX "%s: Firmware decompression error: %s\n", - bp->dev->name, bp->strm->msg); - - zlib_inflateEnd(bp->strm); - - if (rc == Z_STREAM_END) - return 0; - return rc; } @@ -2902,22 +2860,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_ /* Load the Text area. */ offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base); if (fw->gz_text) { - u32 text_len; - void *text; - - rc = bnx2_gunzip(bp, fw->gz_text, fw->gz_text_len, &text, - &text_len); - if (rc) - return rc; - - fw->text = text; - } - if (fw->gz_text) { + u32 *text; int j; + text = vmalloc(FW_BUF_SIZE); + if (!text) + return -ENOMEM; + rc = bnx2_gunzip(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len); + if (rc < 0) { + vfree(text); + return rc; + } for (j = 0; j < (fw->text_len / 4); j++, offset += 4) { - REG_WR_IND(bp, offset, cpu_to_le32(fw->text[j])); + REG_WR_IND(bp, offset, cpu_to_le32(text[j])); } + vfree(text); } /* Load the Data area. */ @@ -2979,27 +2936,27 @@ bnx2_init_cpus(struct bnx2 *bp) { struct cpu_reg cpu_reg; struct fw_info *fw; - int rc = 0; + int rc; void *text; - u32 text_len; - -
Re: bnx2 dirver's firmware images
On Wednesday 19 September 2007 22:43, Michael Chan wrote: > On Wed, 2007-09-19 at 21:29 +0100, Denys Vlasenko wrote: > > > Are you saying that you successfully run-tested it? > > I've only reviewed the code. Let's resolve these issues first before > testing the code. Please test these two patches. I updated them according to your comments. -- vda diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c --- linux-2.6.23-rc6/drivers/net/bnx2.c 2007-09-14 00:08:11.0 +0100 +++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c 2007-09-20 15:47:06.0 +0100 @@ -52,6 +52,8 @@ #include "bnx2_fw.h" #include "bnx2_fw2.h" +#define FW_BUF_SIZE 0x8000 + #define DRV_MODULE_NAME "bnx2" #define PFX DRV_MODULE_NAME ": " #define DRV_MODULE_VERSION "1.6.4" @@ -2767,89 +2769,44 @@ bnx2_set_rx_mode(struct net_device *dev) spin_unlock_bh(&bp->phy_lock); } -#define FW_BUF_SIZE 0x8000 - +/* To be moved to generic lib/ */ static int -bnx2_gunzip_init(struct bnx2 *bp) +bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len, void **outbuf) { - if ((bp->gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL) - goto gunzip_nomem1; + struct z_stream_s *strm; + int rc; - if ((bp->strm = kmalloc(sizeof(*bp->strm), GFP_KERNEL)) == NULL) - goto gunzip_nomem2; + /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename) + * is stripped */ - bp->strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); - if (bp->strm->workspace == NULL) + rc = -ENOMEM; + strm = kmalloc(sizeof(*strm), GFP_KERNEL); + if (strm == NULL) + goto gunzip_nomem2; + strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); + if (strm->workspace == NULL) goto gunzip_nomem3; - return 0; + strm->next_in = zbuf; + strm->avail_in = len; + strm->next_out = gunzip_buf; + strm->avail_out = sz; + + rc = zlib_inflateInit2(strm, -MAX_WBITS); + if (rc == Z_OK) { + rc = zlib_inflate(strm, Z_FINISH); + if (rc == Z_OK) + rc = sz - strm->avail_out; + else + rc = -EINVAL; + zlib_inflateEnd(strm); + } else + rc = -EINVAL; + kfree(strm->workspace); gunzip_nomem3: - kfree(bp->strm); - bp->strm = NULL; - + kfree(strm); gunzip_nomem2: - vfree(bp->gunzip_buf); - bp->gunzip_buf = NULL; - -gunzip_nomem1: - printk(KERN_ERR PFX "%s: Cannot allocate firmware buffer for " - "uncompression.\n", bp->dev->name); - return -ENOMEM; -} - -static void -bnx2_gunzip_end(struct bnx2 *bp) -{ - kfree(bp->strm->workspace); - - kfree(bp->strm); - bp->strm = NULL; - - if (bp->gunzip_buf) { - vfree(bp->gunzip_buf); - bp->gunzip_buf = NULL; - } -} - -static int -bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen) -{ - int n, rc; - - /* check gzip header */ - if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED)) - return -EINVAL; - - n = 10; - -#define FNAME 0x8 - if (zbuf[3] & FNAME) - while ((zbuf[n++] != 0) && (n < len)); - - bp->strm->next_in = zbuf + n; - bp->strm->avail_in = len - n; - bp->strm->next_out = bp->gunzip_buf; - bp->strm->avail_out = FW_BUF_SIZE; - - rc = zlib_inflateInit2(bp->strm, -MAX_WBITS); - if (rc != Z_OK) - return rc; - - rc = zlib_inflate(bp->strm, Z_FINISH); - - *outlen = FW_BUF_SIZE - bp->strm->avail_out; - *outbuf = bp->gunzip_buf; - - if ((rc != Z_OK) && (rc != Z_STREAM_END)) - printk(KERN_ERR PFX "%s: Firmware decompression error: %s\n", - bp->dev->name, bp->strm->msg); - - zlib_inflateEnd(bp->strm); - - if (rc == Z_STREAM_END) - return 0; - return rc; } @@ -2902,22 +2859,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_ /* Load the Text area. */ offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base); if (fw->gz_text) { - u32 text_len; - void *text; - - rc = bnx2_gunzip(bp, fw->gz_text, fw->gz_text_len, &text, - &text_len); - if (rc) - return rc; - - fw->text = text; - } - if (fw->gz_text) { + u32 *text; int j; + text = vmalloc(FW_BUF_SIZE); + if (!text) + return -ENOMEM; + rc = bnx2_gunzip(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len); + if (rc < 0) { + vfree(text); + return rc; + } for (j = 0; j < (fw->text_len / 4); j++, offset += 4) { - REG_WR_IND(bp, offset, cpu_to_le32(fw->text[j])); + REG_WR_IND(bp, offset, cpu_to_le32(text[j])); } + vfree(text); } /* Load the Data area. */ @@ -2979,27 +2935,27 @@ bnx2_init_cpus(struct bnx2 *bp) { struct cpu_reg cpu_reg; struct fw_info *fw; - int rc = 0; + int rc; void *text; - u32 text_len; - - if ((rc = bnx2_gunzip_init(bp)) != 0) - return rc; /* Initialize the RV2P processor. */ - rc = bnx2_gunzip(bp, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1), &t
Re: bnx2 dirver's firmware images
On Wednesday 19 September 2007 22:00, Michael Chan wrote: > On Wed, 2007-09-19 at 09:30 +0100, Denys Vlasenko wrote: > + /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename) > +* is stripped, 32-bit unpacked size (LE) is prepended instead */ > + sz = *zbuf++; > + sz = (sz << 8) + *zbuf++; > + sz = (sz << 8) + *zbuf++; > + sz = (sz << 8) + *zbuf++; > > I don't have a problem with removing the gzip header. It doesn't > contain very useful information other than a valid header for sanity > check. But I don't think we need to arbitrarily add the unpacked size > in front of the gzipped data. The driver knows the size (e.g. the size > of RAM on the chip) and should pass it to the call. The driver should > also allocate the memory for the unpacked data instead of allocating the > memory inside the call and freeing it by the caller. For example, the > driver may need to use pci_alloc_consistent() if the firmware is to be > DMA'ed to the chip. > > Other than that, everything else looks fine. Thanks. Are you saying that you successfully run-tested it? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bnx2 dirver's firmware images
On Tuesday 18 September 2007 21:05, Michael Chan wrote: > The bnx2 firmware changes quite frequently. A new driver quite often > requires new firmware to work correctly. Splitting them up makes things > difficult for the user. sounds reasonable. I see that bnx2 has support for unpacking gzipped binary blobs, and it it the only such net driver (maybe the only such driver in the entire tree, I didn't check). This can be very useful for all other firmware images in other drivers. Last night I prepared a patch which basically separates unpacking function from bnx2-related code. Can you run-test attached patch? Meanwhile I will prepare follow-on patch which actually moves this function out of the driver and into lib/*. Size difference: # size */bn*.o textdata bss dec hex filename 54884 816896424 142997 22e95 net/bnx2.o 55276 818236424 143523 230a3 net.org/bnx2.o Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]> -- vda --- linux-2.6.23-rc6.org/drivers/net/bnx2.c Tue Sep 11 22:33:54 2007 +++ linux-2.6.23-rc6.gunzip/drivers/net/bnx2.c Wed Sep 19 00:01:19 2007 @@ -2767,93 +2767,61 @@ spin_unlock_bh(&bp->phy_lock); } -#define FW_BUF_SIZE 0x8000 - +/* To be moved to generic lib/ */ static int -bnx2_gunzip_init(struct bnx2 *bp) +bnx2_gunzip(u8 *zbuf, int len, void **outbuf) { - if ((bp->gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL) - goto gunzip_nomem1; + struct z_stream_s *strm; + void *gunzip_buf; + int rc; + int sz; - if ((bp->strm = kmalloc(sizeof(*bp->strm), GFP_KERNEL)) == NULL) - goto gunzip_nomem2; + /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename) + * is stripped, 32-bit unpacked size (LE) is prepended instead */ + sz = *zbuf++; + sz = (sz << 8) + *zbuf++; + sz = (sz << 8) + *zbuf++; + sz = (sz << 8) + *zbuf++; - bp->strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); - if (bp->strm->workspace == NULL) + rc = -ENOMEM; + gunzip_buf = vmalloc(sz); + if (gunzip_buf == NULL) + goto gunzip_nomem1; + strm = kmalloc(sizeof(*strm), GFP_KERNEL); + if (strm == NULL) + goto gunzip_nomem2; + strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL); + if (strm->workspace == NULL) goto gunzip_nomem3; - return 0; + strm->next_in = zbuf; + strm->avail_in = len; + strm->next_out = gunzip_buf; + strm->avail_out = sz; -gunzip_nomem3: - kfree(bp->strm); - bp->strm = NULL; + rc = zlib_inflateInit2(strm, -MAX_WBITS); + if (rc == Z_OK) { + rc = zlib_inflate(strm, Z_FINISH); + if (rc == Z_OK) { + rc = sz - strm->avail_out; + *outbuf = gunzip_buf; + } else + rc = -EINVAL; + zlib_inflateEnd(strm); + } else + rc = -EINVAL; + kfree(strm->workspace); +gunzip_nomem3: + kfree(strm); gunzip_nomem2: - vfree(bp->gunzip_buf); - bp->gunzip_buf = NULL; - + if (rc != Z_OK) + vfree(gunzip_buf); gunzip_nomem1: - printk(KERN_ERR PFX "%s: Cannot allocate firmware buffer for " - "uncompression.\n", bp->dev->name); - return -ENOMEM; + return rc; /* returns Z_OK (0) if successful */ } static void -bnx2_gunzip_end(struct bnx2 *bp) -{ - kfree(bp->strm->workspace); - - kfree(bp->strm); - bp->strm = NULL; - - if (bp->gunzip_buf) { - vfree(bp->gunzip_buf); - bp->gunzip_buf = NULL; - } -} - -static int -bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen) -{ - int n, rc; - - /* check gzip header */ - if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED)) - return -EINVAL; - - n = 10; - -#define FNAME 0x8 - if (zbuf[3] & FNAME) - while ((zbuf[n++] != 0) && (n < len)); - - bp->strm->next_in = zbuf + n; - bp->strm->avail_in = len - n; - bp->strm->next_out = bp->gunzip_buf; - bp->strm->avail_out = FW_BUF_SIZE; - - rc = zlib_inflateInit2(bp->strm, -MAX_WBITS); - if (rc != Z_OK) - return rc; - - rc = zlib_inflate(bp->strm, Z_FINISH); - - *outlen = FW_BUF_SIZE - bp->strm->avail_out; - *outbuf = bp->gunzip_buf; - - if ((rc != Z_OK) && (rc != Z_STREAM_END)) - printk(KERN_ERR PFX "%s: Firmware decompression error: %s\n", - bp->dev->name, bp->strm->msg); - - zlib_inflateEnd(bp->strm); - - if (rc == Z_STREAM_END) - return 0; - - return rc; -} - -static void load_rv2p_fw(struct bnx2 *bp, u32 *rv2p_code, u32 rv2p_code_len, u32 rv2p_proc) { @@ -2902,22 +2870,16 @@ /* Load the Text area. */ offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base); if (fw->gz_text) { - u32 text_len; - void *text; - - rc = bnx2_gunzip(bp, fw->gz_text, fw->gz_text_len, &text, - &text_len); - if (rc) - return rc; - - fw->text = text; - } - if (fw->gz_text) { + u32 *text; int j; + rc = bnx2_gunzip(fw->gz_text, fw->gz_text_len, (void*) &text); + if (rc < 0) + return rc;
Re: bnx2 dirver's firmware images
On Tuesday 18 September 2007 19:45, Michael Chan wrote: > We can compress all the different sections of the firmware. Currently, > we only compress the biggest chunks and the rest are uncompressed. > These zeros should compress to almost nothing. But I agree that the > firmware is still big. You don't need to store and fetch zeros at all. You *know* that they are zeros, right? So do this: - REG_WR_IND(bp, offset, fw->bss[j]); + REG_WR_IND(bp, offset, 0); -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 16:09, Linus Torvalds wrote: > On Mon, 10 Sep 2007, Denys Vlasenko wrote: > > static inline int > > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha) > > { > > int return_status = QLA_SUCCESS; > > unsigned long loop_timeout ; > > scsi_qla_host_t *pha = to_qla_parent(ha); > > > > /* wait for 5 min at the max for loop to be ready */ > > loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ); > > > > while ((!atomic_read(&pha->loop_down_timer) && > > atomic_read(&pha->loop_state) == LOOP_DOWN) || > > atomic_read(&pha->loop_state) != LOOP_READY) { > > if (atomic_read(&pha->loop_state) == LOOP_DEAD) { > ... > > Is above correct or buggy? Correct, because msleep is a barrier. > > Is it obvious? No. > > It's *buggy*. But it has nothing to do with any msleep() in the loop, or > anything else. > > And more importantly, it would be equally buggy even *with* a "volatile" > atomic_read(). I am not saying that this code is okay, this isn't the point. (The code is in fact awful for several more reasons). My point is that people are confused as to what atomic_read() exactly means, and this is bad. Same for cpu_relax(). First one says "read", and second one doesn't say "barrier". This is real code from current kernel which demonstrates this: "I don't know that cpu_relax() is a barrier already": drivers/kvm/kvm_main.c while (atomic_read(&completed) != needed) { cpu_relax(); barrier(); } "I think that atomic_read() is a read from memory and therefore I don't need a barrier": arch/x86_64/kernel/crash.c msecs = 1000; /* Wait at most a second for the other cpus to stop */ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { mdelay(1); msecs--; } Since neither camp seems to give up, I am proposing renaming them to something less confusing, and make everybody happy. cpu_relax_barrier() atomic_value(&x) atomic_fetch(&x) I'm not native English speaker, do these sound better? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 15:51, Arjan van de Ven wrote: > On Mon, 10 Sep 2007 11:56:29 +0100 > Denys Vlasenko <[EMAIL PROTECTED]> wrote: > > > > > Well, if you insist on having it again: > > > > Waiting for atomic value to be zero: > > > > while (atomic_read(&x)) > > continue; > > > > and this I would say is buggy code all the way. > > Not from a pure C level semantics, but from a "busy waiting is buggy" > semantics level and a "I'm inventing my own locking" semantics level. After inspecting arch/*, I cannot agree with you. Otherwise almost all major architectures use "conceptually buggy busy-waiting": arch/alpha arch/i386 arch/ia64 arch/m32r arch/mips arch/parisc arch/powerpc arch/sh arch/sparc64 arch/um arch/x86_64 All of the above contain busy-waiting on atomic_read. Including these loops without barriers: arch/mips/kernel/smtc.c while (atomic_read(&idle_hook_initialized) < 1000) ; arch/mips/sgi-ip27/ip27-nmi.c while (atomic_read(&nmied_cpus) != num_online_cpus()); [Well maybe num_online_cpus() is a barrier, I didn't check] arch/sh/kernel/smp.c if (wait) while (atomic_read(&smp_fn_call.finished) != (nr_cpus - 1)); Bugs? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 14:38, Denys Vlasenko wrote: > You are basically trying to educate me how to use atomic properly. > You don't need to do it, as I am (currently) not a driver author. > > I am saying that people who are already using atomic_read() > (and who unfortunately did not read your explanation above) > will still sometimes use atomic_read() as a way to read atomic value > *from memory*, and will create nasty heisenbugs for you to debug. static inline int qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha) { int return_status = QLA_SUCCESS; unsigned long loop_timeout ; scsi_qla_host_t *pha = to_qla_parent(ha); /* wait for 5 min at the max for loop to be ready */ loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ); while ((!atomic_read(&pha->loop_down_timer) && atomic_read(&pha->loop_state) == LOOP_DOWN) || atomic_read(&pha->loop_state) != LOOP_READY) { if (atomic_read(&pha->loop_state) == LOOP_DEAD) { return_status = QLA_FUNCTION_FAILED; break; } msleep(1000); if (time_after_eq(jiffies, loop_timeout)) { return_status = QLA_FUNCTION_FAILED; break; } } return (return_status); } Is above correct or buggy? Correct, because msleep is a barrier. Is it obvious? No. static void qla2x00_rst_aen(scsi_qla_host_t *ha) { if (ha->flags.online && !ha->flags.reset_active && !atomic_read(&ha->loop_down_timer) && !(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) { do { clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags); /* * Issue marker command only when we are going to start * the I/O. */ ha->marker_needed = 1; } while (!atomic_read(&ha->loop_down_timer) && (test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags))); } } Is above correct? I honestly don't know. Correct, because set_bit is a barrier on _all _memory_? Will it break if set_bit will be changed to be a barrier only on its operand? Probably yes. drivers/kvm/kvm_main.c while (atomic_read(&completed) != needed) { cpu_relax(); barrier(); } Obviously author did not know that cpu_relax is already a barrier. See why I think driver authors will be confused? arch/x86_64/kernel/crash.c static void nmi_shootdown_cpus(void) { ... msecs = 1000; /* Wait at most a second for the other cpus to stop */ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { mdelay(1); msecs--; } ... } Is mdelay(1) a barrier? Yes, because it is a function on x86_64. Absolutely the same code will be buggy on an arch where mdelay(1) == udelay(1000), and udelay is implemented as inline busy-wait. arch/sparc64/kernel/smp.c /* Wait for response */ while (atomic_read(&data.finished) != cpus) cpu_relax(); ...later in the same file... while (atomic_read(&smp_capture_registry) != ncpus) rmb(); I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply rmb()? (No it doesn't). Which of those two while loops needs correcting? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Monday 10 September 2007 13:22, Kyle Moffett wrote: > On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote: > > On Sunday 09 September 2007 19:18, Arjan van de Ven wrote: > >> On Sun, 9 Sep 2007 19:02:54 +0100 > >> Denys Vlasenko <[EMAIL PROTECTED]> wrote: > >> > >>> Why is all this fixation on "volatile"? I don't think people want > >>> "volatile" keyword per se, they want atomic_read(&x) to _always_ > >>> compile into an memory-accessing instruction, not register access. > >> > >> and ... why is that? is there any valid, non-buggy code sequence > >> that makes that a reasonable requirement? > > > > Well, if you insist on having it again: > > > > Waiting for atomic value to be zero: > > > > while (atomic_read(&x)) > > continue; > > > > gcc may happily convert it into: > > > > reg = atomic_read(&x); > > while (reg) > > continue; > > Bzzt. Even if you fixed gcc to actually convert it to a busy loop on > a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that > does the conversion, it may be that the CPU does the caching of the > memory value. GCC has no mechanism to do cache-flushes or memory- > barriers except through our custom inline assembly. CPU can cache the value all right, but it cannot use that cached value *forever*, it has to react to invalidate cycles on the shared bus and re-fetch new data. IOW: atomic_read(&x) which compiles down to memory accessor will work properly. > the CPU. Thirdly, on a large system it may take some arbitrarily > large amount of time for cache-propagation to update the value of the > variable in your local CPU cache. Yes, but "arbitrarily large amount of time" is actually measured in nanoseconds here. Let's say 1000ns max for hundreds of CPUs? > Also, you > probably want a cpu_relax() in there somewhere to avoid overheating > the CPU. Yes, but 1. CPU shouldn't overheat (in a sense that it gets damaged), it will only use more power than needed. 2. cpu_relax() just throttles down my CPU, so it's performance optimization only. Wait, it isn't, it's a barrier too. Wow, "cpu_relax" is a barrier? How am I supposed to know that without reading lkml flamewars and/or header files? Let's try reading headers. asm-x86_64/processor.h: #define cpu_relax() rep_nop() So, is it a barrier? No clue yet. /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */ static inline void rep_nop(void) { __asm__ __volatile__("rep;nop": : :"memory"); } Comment explicitly says that it is "a good thing" (doesn't say that it is mandatory) and says NOTHING about barriers! Barrier-ness is not mentioned and is hidden in "memory" clobber. Do you think it's obvious enough for average driver writer? I think not, especially that it's unlikely for him to even start suspecting that it is a memory barrier based on the "cpu_relax" name. > You simply CANNOT use an atomic_t as your sole synchronizing > primitive, it doesn't work! You virtually ALWAYS want to use an > atomic_t in the following types of situations: > > (A) As an object refcount. The value is never read except as part of > an atomic_dec_return(). Why aren't you using "struct kref"? > > (B) As an atomic value counter (number of processes, for example). > Just "reading" the value is racy anyways, if you want to enforce a > limit or something then use atomic_inc_return(), check the result, > and use atomic_dec() if it's too big. If you just want to return the > statistics then you are going to be instantaneous-point-in-time anyways. > > (C) As an optimization value (statistics-like, but exact accuracy > isn't important). > > Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like > completions, mutexes, semaphores, spinlocks, krefs, etc. It's not > useful for synchronization, only for keeping track of simple integer > RMW values. Note that atomic_read() and atomic_set() aren't very > useful RMW primitives (read-nomodify-nowrite and read-set-zero- > write). Code which assumes anything else is probably buggy in other > ways too. You are basically trying to educate me how to use atomic properly. You don't need to do it, as I am (currently) not a driver author. I am saying that people who are already using atomic_read() (and who unfortunately did not read your explanation above) will still sometimes use atomic_read() as a way to read atomic value *from memory*, and will create nasty heisenbugs for you to debug. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Sunday 09 September 2007 19:18, Arjan van de Ven wrote: > On Sun, 9 Sep 2007 19:02:54 +0100 > Denys Vlasenko <[EMAIL PROTECTED]> wrote: > > > Why is all this fixation on "volatile"? I don't think > > people want "volatile" keyword per se, they want atomic_read(&x) to > > _always_ compile into an memory-accessing instruction, not register > > access. > > and ... why is that? > is there any valid, non-buggy code sequence that makes that a > reasonable requirement? Well, if you insist on having it again: Waiting for atomic value to be zero: while (atomic_read(&x)) continue; gcc may happily convert it into: reg = atomic_read(&x); while (reg) continue; Expecting every driver writer to remember that atomic_read is not in fact a "read from memory" is naive. That won't happen. Face it, majority of driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;) The name of the macro is saying that it's a read. We are confusing users here. It's doubly confusing that cpy_relax(), which says _nothing_ about barriers in its name, is actually a barrier you need to insert here. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Friday 17 August 2007 17:48, Linus Torvalds wrote: > > On Fri, 17 Aug 2007, Nick Piggin wrote: > > > > That's not obviously just taste to me. Not when the primitive has many > > (perhaps, the majority) of uses that do not require said barriers. And > > this is not solely about the code generation (which, as Paul says, is > > relatively minor even on x86). I prefer people to think explicitly > > about barriers in their lockless code. > > Indeed. > > I think the important issues are: > > - "volatile" itself is simply a badly/weakly defined issue. The semantics >of it as far as the compiler is concerned are really not very good, and >in practice tends to boil down to "I will generate so bad code that >nobody can accuse me of optimizing anything away". > > - "volatile" - regardless of how well or badly defined it is - is purely >a compiler thing. It has absolutely no meaning for the CPU itself, so >it at no point implies any CPU barriers. As a result, even if the >compiler generates crap code and doesn't re-order anything, there's >nothing that says what the CPU will do. > > - in other words, the *only* possible meaning for "volatile" is a purely >single-CPU meaning. And if you only have a single CPU involved in the >process, the "volatile" is by definition pointless (because even >without a volatile, the compiler is required to make the C code appear >consistent as far as a single CPU is concerned). > > So, let's take the example *buggy* code where we use "volatile" to wait > for other CPU's: > > atomic_set(&var, 0); > while (!atomic_read(&var)) > /* nothing */; > > > which generates an endless loop if we don't have atomic_read() imply > volatile. > > The point here is that it's buggy whether the volatile is there or not! > Exactly because the user expects multi-processing behaviour, but > "volatile" doesn't actually give any real guarantees about it. Another CPU > may have done: > > external_ptr = kmalloc(..); > /* Setup is now complete, inform the waiter */ > atomic_inc(&var); > > but the fact is, since the other CPU isn't serialized in any way, the > "while-loop" (even in the presense of "volatile") doesn't actually work > right! Whatever the "atomic_read()" was waiting for may not have > completed, because we have no barriers! Why is all this fixation on "volatile"? I don't think people want "volatile" keyword per se, they want atomic_read(&x) to _always_ compile into an memory-accessing instruction, not register access. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Friday 24 August 2007 18:06, Christoph Lameter wrote: > On Fri, 24 Aug 2007, Satyam Sharma wrote: > > But if people do seem to have a mixed / confused notion of atomicity > > and barriers, and if there's consensus, then as I'd said earlier, I > > have no issues in going with the consensus (eg. having API variants). > > Linus would be more difficult to convince, however, I suspect :-) > > The confusion may be the result of us having barrier semantics in > atomic_read. If we take that out then we may avoid future confusions. I think better name may help. Nuke atomic_read() altogether. n = atomic_value(x);// doesnt hint as strongly at reading as "atomic_read" n = atomic_fetch(x);// yes, we _do_ touch RAM n = atomic_read_uncached(x); // or this How does that sound? -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Friday 24 August 2007 18:15, Christoph Lameter wrote: > On Fri, 24 Aug 2007, Denys Vlasenko wrote: > > On Thursday 16 August 2007 00:22, Paul Mackerras wrote: > > > Satyam Sharma writes: > > > In the kernel we use atomic variables in precisely those situations > > > where a variable is potentially accessed concurrently by multiple > > > CPUs, and where each CPU needs to see updates done by other CPUs in a > > > timely fashion. That is what they are for. Therefore the compiler > > > must not cache values of atomic variables in registers; each > > > atomic_read must result in a load and each atomic_set must result in a > > > store. Anything else will just lead to subtle bugs. > > > > Amen. > > A "timely" fashion? One cannot rely on something like that when coding. > The visibility of updates is insured by barriers and not by some fuzzy > notion of "timeliness". But here you do have some notion of time: while (atomic_read(&x)) continue; "continue when other CPU(s) decrement it down to zero". If "read" includes an insn which accesses RAM, you will see "new" value sometime after other CPU decrements it. "Sometime after" is on the order of nanoseconds here. It is a valid concept of time, right? The whole confusion is about whether atomic_read implies "read from RAM" or not. I am in a camp which thinks it does. You are in an opposite one. We just need a less ambiguous name. What about this: /** * atomic_read - read atomic variable * @v: pointer of type atomic_t * * Atomically reads the value of @v. * No compiler barrier implied. */ #define atomic_read(v) ((v)->counter) +/** + * atomic_read_uncached - read atomic variable from memory + * @v: pointer of type atomic_t + * + * Atomically reads the value of @v. This is guaranteed to emit an insn + * which accesses memory, atomically. No ordering guarantees! + */ +#define atomic_read_uncached(v) asm_or_volatile_ptr_magic(v) I was thinking of s/atomic_read/atomic_get/ too, but it implies "taking" atomic a-la get_cpu()... -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Friday 24 August 2007 13:12, Kenn Humborg wrote: > > On Thursday 16 August 2007 01:39, Satyam Sharma wrote: > > > static inline void wait_for_init_deassert(atomic_t *deassert) > > > { > > > - while (!atomic_read(deassert)); > > > + while (!atomic_read(deassert)) > > > + cpu_relax(); > > > return; > > > } > > > > For less-than-briliant people like me, it's totally non-obvious that > > cpu_relax() is needed for correctness here, not just to make P4 happy. > > > > IOW: "atomic_read" name quite unambiguously means "I will read > > this variable from main memory". Which is not true and creates > > potential for confusion and bugs. > > To me, "atomic_read" means a read which is synchronized with other > changes to the variable (using the atomic_XXX functions) in such > a way that I will always only see the "before" or "after" > state of the variable - never an intermediate state while a > modification is happening. It doesn't imply that I have to > see the "after" state immediately after another thread modifies > it. So you are ok with compiler propagating n1 to n2 here: n1 += atomic_read(x); other_variable++; n2 += atomic_read(x); without accessing x second time. What's the point? Any sane coder will say that explicitly anyway: tmp = atomic_read(x); n1 += tmp; other_variable++; n2 += tmp; if only for the sake of code readability. Because first code is definitely hinting that it reads RAM twice, and it's actively *bad* for code readability when in fact it's not the case! Locking, compiler and CPU barriers are complicated enough already, please don't make them even harder to understand. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thursday 16 August 2007 00:22, Paul Mackerras wrote: > Satyam Sharma writes: > In the kernel we use atomic variables in precisely those situations > where a variable is potentially accessed concurrently by multiple > CPUs, and where each CPU needs to see updates done by other CPUs in a > timely fashion. That is what they are for. Therefore the compiler > must not cache values of atomic variables in registers; each > atomic_read must result in a load and each atomic_set must result in a > store. Anything else will just lead to subtle bugs. Amen. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Saturday 18 August 2007 05:13, Linus Torvalds wrote: > On Sat, 18 Aug 2007, Satyam Sharma wrote: > > No code does (or would do, or should do): > > > > x.counter++; > > > > on an "atomic_t x;" anyway. > > That's just an example of a general problem. > > No, you don't use "x.counter++". But you *do* use > > if (atomic_read(&x) <= 1) > > and loading into a register is stupid and pointless, when you could just > do it as a regular memory-operand to the cmp instruction. It doesn't mean that (volatile int*) cast is bad, it means that current gcc is bad (or "not good enough"). IOW: instead of avoiding volatile cast, it's better to fix the compiler. > And as far as the compiler is concerned, the problem is the 100% same: > combining operations with the volatile memop. > > The fact is, a compiler that thinks that > > movl mem,reg > cmpl $val,reg > > is any better than > > cmpl $val,mem > > is just not a very good compiler. Linus, in all honesty gcc has many more cases of suboptimal code, case of "volatile" is just one of many. Off the top of my head: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417 unsigned v; void f(unsigned A) { v = ((unsigned long long)A) * 365384439 >> (27+32); } gcc-4.1.1 -S -Os -fomit-frame-pointer t.c f: movl$365384439, %eax mull4(%esp) movl%edx, %eax <= ? shrl$27, %eax movl%eax, v ret Why is it moving %edx to %eax? gcc-4.2.1 -S -Os -fomit-frame-pointer t.c f: movl$365384439, %eax mull4(%esp) movl%edx, %eax <= ? xorl%edx, %edx <= ??! shrl$27, %eax movl%eax, v ret Progress... Now we also zero out %edx afterwards for no apparent reason. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Thursday 16 August 2007 01:39, Satyam Sharma wrote: > > static inline void wait_for_init_deassert(atomic_t *deassert) > { > - while (!atomic_read(deassert)); > + while (!atomic_read(deassert)) > + cpu_relax(); > return; > } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. IOW: "atomic_read" name quite unambiguously means "I will read this variable from main memory". Which is not true and creates potential for confusion and bugs. -- vda - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html