[PATCH v2] net: make getname() functions return length rather than use int* parameter

2018-02-12 Thread Denys Vlasenko
Changes since v1:
Added changes in these files:
drivers/infiniband/hw/usnic/usnic_transport.c
drivers/staging/lustre/lnet/lnet/lib-socket.c
drivers/target/iscsi/iscsi_target_login.c
drivers/vhost/net.c
fs/dlm/lowcomms.c
fs/ocfs2/cluster/tcp.c
security/tomoyo/network.c


Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

textdata bss  dec hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko 
CC: David S. Miller 
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-blueto...@vger.kernel.org
CC: linux-decnet-u...@lists.sourceforge.net
CC: linux-wirel...@vger.kernel.org
CC: linux-r...@vger.kernel.org
CC: linux-s...@vger.kernel.org
CC: linux-...@vger.kernel.org
CC: linux-...@vger.kernel.org
---
 drivers/infiniband/hw/usnic/usnic_transport.c |  5 ++--
 drivers/isdn/mISDN/socket.c   |  5 ++--
 drivers/net/ppp/pppoe.c   |  6 ++---
 drivers/net/ppp/pptp.c|  6 ++---
 drivers/scsi/iscsi_tcp.c  | 14 +--
 drivers/soc/qcom/qmi_interface.c  |  3 +--
 drivers/staging/ipx/af_ipx.c  |  6 ++---
 drivers/staging/irda/net/af_irda.c|  8 +++---
 drivers/staging/lustre/lnet/lnet/lib-socket.c |  7 +++---
 drivers/target/iscsi/iscsi_target_login.c | 18 +++---
 drivers/vhost/net.c   |  7 +++---
 fs/dlm/lowcomms.c |  7 +++---
 fs/ocfs2/cluster/tcp.c|  6 ++---
 include/linux/net.h   |  8 +++---
 include/net/inet_common.h |  2 +-
 include/net/ipv6.h|  2 +-
 include/net/sock.h|  2 +-
 net/appletalk/ddp.c   |  5 ++--
 net/atm/pvc.c |  5 ++--
 net/atm/svc.c |  5 ++--
 net/ax25/af_ax25.c|  4 +--
 net/bluetooth/hci_sock.c  |  4 +--
 net/bluetooth/l2cap_sock.c|  5 ++--
 net/bluetooth/rfcomm/sock.c   |  5 ++--
 net/bluetooth/sco.c   |  5 ++--
 net/can/raw.c |  6 ++---
 net/core/sock.c   |  5 ++--
 net/decnet/af_decnet.c|  6 ++---
 net/ipv4/af_inet.c|  5 ++--
 net/ipv6/af_inet6.c   |  5 ++--
 net/iucv/af_iucv.c|  5 ++--
 net/l2tp/l2tp_ip.c|  5 ++--
 net/l2tp/l2tp_ip6.c   |  5 ++--
 net/l2tp/l2tp_ppp.c   |  5 ++--
 net/llc/af_llc.c  |  5 ++--
 net/netlink/af_netlink.c  |  5 ++--
 net/netrom/af_netrom.c|  9 ---
 net/nfc/llcp_sock.c   |  5 ++--
 net/packet/af_packet.c| 10 +++-
 net/phonet/socket.c   |  5 ++--
 net/qrtr/qrtr.c   |  5 ++--
 net/rds/af_rds.c  |  5 ++--
 net/rds/tcp.c |  7 ++
 net/rose/af_rose.c|  5 ++--
 net/sctp/ipv6.c   |  8 +++---
 net/smc/af_smc.c  | 11 -
 net/socket.c  | 35 +--
 net/sunrpc/clnt.c |  6 ++---
 net/sunrpc/svcsock.c  | 13 ++
 net/sunrpc/xprtsock.c |  3 +--
 net/tipc/socket.c |  5 ++--
 net/unix/af_unix.c| 10 
 net/vmw_vsock/af_vsock.c  |  4 +--
 net/x25/af_x25.c  |  4 +--
 security/tomoyo/network.c |  5 ++--
 55 files changed, 159 insertions(+), 203 deletio

Re: [PATCH] net: make getname() functions return length rather than use int* parameter

2018-02-12 Thread Denys Vlasenko

On 02/12/2018 06:47 PM, David Miller wrote:

From: Denys Vlasenko 
Date: Mon, 12 Feb 2018 15:15:18 +0100


Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

 textdata bss  dec hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko 


Please do an allmodconfig build, there are still some conversions you
missed:

security/tomoyo/network.c: In function ‘tomoyo_socket_listen_permission’:
security/tomoyo/network.c:658:19: warning: passing argument 3 of 
‘sock->ops->getname’ makes integer from pointer without a cast 
[-Wint-conversion]
 &addr, &addr_len, 0);
^
security/tomoyo/network.c:658:19: note: expected ‘int’ but argument is of type 
‘int *’
security/tomoyo/network.c:657:21: error: too many arguments to function 
‘sock->ops->getname’
const int error = sock->ops->getname(sock, (struct sockaddr *)
  ^~~~
fs/dlm/lowcomms.c: In function ‘lowcomms_error_report’:
fs/dlm/lowcomms.c:495:6: error: too many arguments to function 
‘kernel_getpeername’
   kernel_getpeername(con->sock, (struct sockaddr *)&saddr, &buflen)) {
   ^~
fs/dlm/lowcomms.c: In function ‘tcp_accept_from_sock’:
fs/dlm/lowcomms.c:761:7: warning: passing argument 3 of ‘newsock->ops->getname’ 
makes integer from pointer without a cast [-Wint-conversion]
&len, 2)) {
^
fs/dlm/lowcomms.c:761:7: note: expected ‘int’ but argument is of type ‘int *’
fs/dlm/lowcomms.c:760:6: error: too many arguments to function 
‘newsock->ops->getname’
   if (newsock->ops->getname(newsock, (struct sockaddr *)&peeraddr,
   ^~~


Sorry. Will send updated patch.
 


[PATCH] net: make getname() functions return length rather than use int* parameter

2018-02-12 Thread Denys Vlasenko
Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

textdata bss  dec hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko 
CC: David S. Miller 
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-blueto...@vger.kernel.org
CC: linux-decnet-u...@lists.sourceforge.net
CC: linux-wirel...@vger.kernel.org
CC: linux-r...@vger.kernel.org
CC: linux-s...@vger.kernel.org
CC: linux-...@vger.kernel.org
CC: linux-...@vger.kernel.org
---
 drivers/isdn/mISDN/socket.c|  5 ++---
 drivers/net/ppp/pppoe.c|  6 ++
 drivers/net/ppp/pptp.c |  6 ++
 drivers/scsi/iscsi_tcp.c   | 14 +++---
 drivers/soc/qcom/qmi_interface.c   |  3 +--
 drivers/staging/ipx/af_ipx.c   |  6 ++
 drivers/staging/irda/net/af_irda.c |  8 +++-
 include/linux/net.h|  8 +++-
 include/net/inet_common.h  |  2 +-
 include/net/ipv6.h |  2 +-
 include/net/sock.h |  2 +-
 net/appletalk/ddp.c|  5 ++---
 net/atm/pvc.c  |  5 ++---
 net/atm/svc.c  |  5 ++---
 net/ax25/af_ax25.c |  4 ++--
 net/bluetooth/hci_sock.c   |  4 ++--
 net/bluetooth/l2cap_sock.c |  5 ++---
 net/bluetooth/rfcomm/sock.c|  5 ++---
 net/bluetooth/sco.c|  5 ++---
 net/can/raw.c  |  6 ++
 net/core/sock.c|  5 +++--
 net/decnet/af_decnet.c |  6 ++
 net/ipv4/af_inet.c |  5 ++---
 net/ipv6/af_inet6.c|  5 ++---
 net/iucv/af_iucv.c |  5 ++---
 net/l2tp/l2tp_ip.c |  5 ++---
 net/l2tp/l2tp_ip6.c|  5 ++---
 net/l2tp/l2tp_ppp.c|  5 ++---
 net/llc/af_llc.c   |  5 ++---
 net/netlink/af_netlink.c   |  5 ++---
 net/netrom/af_netrom.c |  9 +
 net/nfc/llcp_sock.c|  5 ++---
 net/packet/af_packet.c | 10 --
 net/phonet/socket.c|  5 ++---
 net/qrtr/qrtr.c|  5 ++---
 net/rds/af_rds.c   |  5 ++---
 net/rds/tcp.c  |  7 ++-
 net/rose/af_rose.c |  5 ++---
 net/sctp/ipv6.c|  8 
 net/smc/af_smc.c   |  7 +++
 net/socket.c   | 35 +--
 net/sunrpc/clnt.c  |  6 +++---
 net/sunrpc/svcsock.c   | 13 -
 net/sunrpc/xprtsock.c  |  3 +--
 net/tipc/socket.c  |  5 ++---
 net/unix/af_unix.c | 10 +-
 net/vmw_vsock/af_vsock.c   |  4 ++--
 net/x25/af_x25.c   |  4 ++--
 48 files changed, 132 insertions(+), 171 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c5603d1a07d6..1f8f489b4167 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -560,7 +560,7 @@ data_sock_bind(struct socket *sock, struct sockaddr *addr, 
int addr_len)
 
 static int
 data_sock_getname(struct socket *sock, struct sockaddr *addr,
- int *addr_len, int peer)
+ int peer)
 {
struct sockaddr_mISDN   *maddr = (struct sockaddr_mISDN *) addr;
struct sock *sk = sock->sk;
@@ -570,14 +570,13 @@ data_sock_getname(struct socket *sock, struct sockaddr 
*addr,
 
lock_sock(sk);
 
-   *addr_len = sizeof(*maddr);
maddr->family = AF_ISDN;
maddr->dev = _pms(sk)->dev->id;
maddr->channel = _pms(sk)->ch.nr;
maddr->sapi = _pms(sk)->ch.addr & 0xff;
maddr->tei = (_pms(sk)->ch.addr >> 8) & 0xff;
release_sock(sk);
-   return 0;
+   return sizeof(*maddr);
 }
 
 static const struct proto_ops data_sock_ops = {
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 5aa59f41bf8c

bug in "PCI: Support INTx masking on ConnectX-4 with firmware x.14.1100+"?

2017-07-10 Thread Denys Vlasenko
+   /* Reading from resource space should be 32b aligned */
+   fw_maj_min = ioread32be(fw_ver);
+   fw_sub_min = ioread32be(fw_ver + 1);
+   fw_major = fw_maj_min & 0x;
+   fw_minor = fw_maj_min >> 16;
+   fw_subminor = fw_sub_min & 0x;

Maybe second read should be ioread32be(fw_ver + 4)?


Re: [PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-21 Thread Denys Vlasenko

On 06/21/2017 09:24 PM, Marcelo Ricardo Leitner wrote:

On Wed, Jun 21, 2017 at 07:03:27PM +0200, Denys Vlasenko wrote:

This function compiles to 147 bytes of machine code. 13 callsites.

I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
SCTP events are not happening on every packet.
They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
Does not look performance critical.

Signed-off-by: Denys Vlasenko 
CC: Vlad Yasevich 
CC: Neil Horman 
CC: David Miller 
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org


Acked-by: Marcelo Ricardo Leitner 

Just wondering, are you conducting further research on this topic?
Because we probably could use such review on SCTP stack.


Here is the list of sctp inlines which has any (however small)
benefit when deinlined.

filename:lineno:inline_name:lines_of_source_code:saved_bytes_by_deinlining:size_of_code_of_deinlined_function

include/net/sctp/command.h:228:sctp_add_cmd_sf:7:8306:38
net/sctp/ulpevent.c:91:sctp_ulpevent_set_owner:13:1616:147
include/net/sctp/sctp.h:414:sctp_skb_set_owner_r:10:934:75
net/sctp/input.c:840:sctp_hash_key:13:896:359
net/sctp/input.c:823:sctp_hash_obj:13:704:409
include/net/sctp/checksum.h:60:sctp_compute_cksum:13:595:85
net/sctp/input.c:800:sctp_hash_cmp:18:320:124
net/sctp/socket.c:117:sctp_wspace:19:256:76
include/net/sctp/sctp.h:272:sctp_max_rto:7:204:68
net/sctp/socket.c:173:sctp_verify_addr:15:192:72
include/net/sctp/checksum.h:46:sctp_csum_update:4:147:21
include/net/sctp/sctp.h:519:param_type2af:8:134:35
include/net/sctp/sctp.h:399:sctp_list_dequeue:7:123:59
include/net/sctp/sctp.h:596:sctp_transport_dst_check:4:120:60
include/net/sctp/sctp.h:435:sctp_frag_point:12:65:65
net/sctp/outqueue.c:82:sctp_outq_dequeue_data:10:64:87
include/net/sctp/command.h:243:sctp_next_cmd:4:64:37
include/net/sctp/sm.h:347:sctp_data_size:6:19:19

For .c files, the patches are trivial and I have them auto-generated,
I'll send them to you privately to save you the mechanical work.

Unfortunately, for inlines in .h files this does not work
(a human is needed to decide where to more the function).

Of course, not every deinlining makes sense.


[PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-21 Thread Denys Vlasenko
This function compiles to 147 bytes of machine code. 13 callsites.

I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
SCTP events are not happening on every packet.
They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
Does not look performance critical.

Signed-off-by: Denys Vlasenko 
CC: Vlad Yasevich 
CC: Neil Horman 
CC: David Miller 
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
Changed since v1:
* reindented function argument list

 net/sctp/ulpevent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index ec2b3e0..bc3f495 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -88,8 +88,8 @@ int sctp_ulpevent_is_notification(const struct sctp_ulpevent 
*event)
 /* Hold the association in case the msg_name needs read out of
  * the association.
  */
-static inline void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
-  const struct sctp_association *asoc)
+static void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
+   const struct sctp_association *asoc)
 {
struct sctp_chunk *chunk = event->chunk;
struct sk_buff *skb;
-- 
1.8.3.1



[PATCH] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-21 Thread Denys Vlasenko
This function compiles to 147 bytes of machine code. 13 callsites.

I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
SCTP events are not happening on every packet.
They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
Does not look performance critical.

Signed-off-by: Denys Vlasenko 
CC: Vlad Yasevich 
CC: Neil Horman 
CC: David Miller 
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 net/sctp/ulpevent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index ec2b3e0..049aa5a 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -88,7 +88,7 @@ int sctp_ulpevent_is_notification(const struct sctp_ulpevent 
*event)
 /* Hold the association in case the msg_name needs read out of
  * the association.
  */
-static inline void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
+static void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
   const struct sctp_association *asoc)
 {
struct sctp_chunk *chunk = event->chunk;
-- 
1.8.3.1



[PATCH] liquidio: stop using huge static buffer, save 4096k in .data

2017-06-19 Thread Denys Vlasenko
Only compile-tested - I don't have the hardware.

>From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
anyway.

Signed-off-by: Denys Vlasenko 
CC: Felix Manlunas 
CC: Prasad Kanneganti 
CC: Derek Chickles 
CC: David Miller 
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 6 +-
 drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c | 4 ++--
 drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h | 2 +-
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index 53f38d0..e08f760 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -724,13 +724,11 @@ static int octeon_console_read(struct octeon_device *oct, 
u32 console_num,
 }
 
 #define FBUF_SIZE  (4 * 1024 * 1024)
-u8 fbuf[FBUF_SIZE];
 
 int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 size_t size)
 {
int ret = 0;
-   u8 *p = fbuf;
u32 crc32_result;
u64 load_addr;
u32 image_len;
@@ -805,10 +803,8 @@ int octeon_download_firmware(struct octeon_device *oct, 
const u8 *data,
else
size = FBUF_SIZE;
 
-   memcpy(p, data, size);
-
/* download the image */
-   octeon_pci_write_core_mem(oct, load_addr, p, (u32)size);
+   octeon_pci_write_core_mem(oct, load_addr, data, 
(u32)size);
 
data += size;
rem -= (u32)size;
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c
index 5cd96e7..4c85ae6 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c
@@ -167,10 +167,10 @@ octeon_pci_read_core_mem(struct octeon_device *oct,
 void
 octeon_pci_write_core_mem(struct octeon_device *oct,
  u64 coreaddr,
- u8 *buf,
+ const u8 *buf,
  u32 len)
 {
-   __octeon_pci_rw_core_mem(oct, coreaddr, buf, len, 0);
+   __octeon_pci_rw_core_mem(oct, coreaddr, (u8 *)buf, len, 0);
 }
 
 u64 octeon_read_device_mem64(struct octeon_device *oct, u64 coreaddr)
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h 
b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h
index bae2fdd..47a3ff5 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h
@@ -66,7 +66,7 @@ octeon_pci_read_core_mem(struct octeon_device *oct,
 void
 octeon_pci_write_core_mem(struct octeon_device *oct,
  u64 coreaddr,
- u8 *buf,
+ const u8 *buf,
  u32 len);
 
 #endif
-- 
2.9.2



Huge static buffer in liquidio

2016-09-24 Thread Denys Vlasenko
Hello,

I would like to discuss this commit:

commit d3d7e6c65f75de18ced10a98595a847f9f95f0ce
Author: Raghu Vatsavayi 
Date:   Tue Jun 21 22:53:07 2016 -0700

liquidio: Firmware image download

This patch has firmware image related changes for: firmware
release upon failure, support latest firmware version and
firmware download in 4MB chunks.


Here is a part of it:


+u8 fbuf[4 * 1024 * 1024];
 ^^^ what?? [also, why it is not static?]
+
 int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 size_t size)
 {
int ret = 0;
-   u8 *p;
-   u8 *buffer;
+   u8 *p = fbuf;


Don't you think that using 4 megabytes of static buffer
*just for the firmware upload* is not a good practice?

Further down the patch it's obvious that the buffer is not even
needed, because the firmware is already in memory, the "data"
and "size" parameters of this function point to it.

The meat of the change of the FW download is this (deleted
some debug messages code):

-   buffer = kmemdup(data, size, GFP_KERNEL);
-   if (!buffer)
-   return -ENOMEM;
-
-   p = buffer + sizeof(struct octeon_firmware_file_header);
+   data += sizeof(struct octeon_firmware_file_header);

/* load all images */
for (i = 0; i < be32_to_cpu(h->num_images); i++) {
load_addr = be64_to_cpu(h->desc[i].addr);
image_len = be32_to_cpu(h->desc[i].len);

-   /* download the image */
-   octeon_pci_write_core_mem(oct, load_addr, p, image_len);
+   /* Write in 4MB chunks*/
+   rem = image_len;

-   p += image_len;
+   while (rem) {
+   if (rem < (4 * 1024 * 1024))
+   size = rem;
+   else
+   size = 4 * 1024 * 1024;
+
+   memcpy(p, data, size);
+
+   /* download the image */
+   octeon_pci_write_core_mem(oct, load_addr, p, (u32)size);
+
+   data += size;
+   rem -= (u32)size;
+   load_addr += size;
+   }
}
+   dev_info(&oct->pci_dev->dev, "Writing boot command: %s\n",
+h->bootcmd);

/* Invoke the bootcmd */
ret = octeon_console_send_cmd(oct, h->bootcmd, 50);

-done_downloading:
-   kfree(buffer);


octeon_pci_write_core_mem() takes spinlock around copy op,
I take this was the reason for this change: reduce
IRQ-disabled time.

Do you actually need an intermediate fbuf[] buffer here?
(IOW: can't you just write data to PCI from memory area pointed
by "data" ptr?)

If there is indeed a reason for intermediate buffer,
why did you drop the approach of having a temporary kmalloc'ed
buffer and switches to a static (and *huge*) buffer?

In my opinion, 4 Mbyte buffer is an overkill in any case.
A buffer of ~4..16 Kbyte one would work just fine - it's not like
you need to squeeze last 0.5% of speed when you upload firmware.


[PATCH] e1000e: prevent division by zero if TIMINCA is zero

2016-05-06 Thread Denys Vlasenko
Users report that under VMWare, er32(TIMINCA) returns zero.
This causes division by zero at init time as follows:

 ==>incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
/* latch SYSTIMH on read of SYSTIML */
systim_next = (cycle_t)er32(SYSTIML);
systim_next |= (cycle_t)er32(SYSTIMH) << 32;

time_delta = systim_next - systim;
temp = time_delta;
 >  rem = do_div(temp, incvalue);

This change makes kernel survive this, and users report that
NIC does work after this change.

Since on real hardware incvalue is never zero, this should not affect
real hardware use case.

Signed-off-by: Denys Vlasenko 
CC: Jeff Kirsher 
CC: "Ruinskiy, Dima" 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
CC: LKML 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 269087c..0626935 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4315,7 +4315,8 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
 
time_delta = systim_next - systim;
temp = time_delta;
-   rem = do_div(temp, incvalue);
+   /* VMWare users have seen incvalue of zero, don't div / 
0 */
+   rem = incvalue ? do_div(temp, incvalue) : (time_delta 
!= 0);
 
systim = systim_next;
 
-- 
1.8.1.4



Re: [PATCH 1/3] e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64

2016-04-20 Thread Denys Vlasenko
On 04/19/2016 10:57 PM, Jeff Kirsher wrote:
> On Tue, 2016-04-19 at 14:34 +0200, Denys Vlasenko wrote:
>> "incvalue" variable holds a result of "er32(TIMINCA) &
>> E1000_TIMINCA_INCVALUE_MASK"
>> and used in "do_div(temp, incvalue)" as a divisor.
>>
>> Thus, "u64 incvalue" declaration is probably a mistake.
>> Even though it seems to be a harmless one, let's fix it.
>>
>> Signed-off-by: Denys Vlasenko 
>> CC: Jeff Kirsher 
>> CC: Jesse Brandeburg 
>> CC: Shannon Nelson 
>> CC: Carolyn Wyborny 
>> CC: Don Skidmore 
>> CC: Bruce Allan 
>> CC: John Ronciak 
>> CC: Mitch Williams 
>> CC: David S. Miller 
>> CC: LKML 
>> CC: netdev@vger.kernel.org
>> ---
>>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> First of all, trimmed down the recipient list since almost all of the
> reviewers you added have nothing to do with e1000e.

I took the list here, MAINTAINERS:

INTEL ETHERNET DRIVERS
M:  Jeff Kirsher 
R:  Jesse Brandeburg 
R:  Shannon Nelson 
R:  Carolyn Wyborny 
R:  Don Skidmore 
R:  Bruce Allan 
R:  John Ronciak 
R:  Mitch Williams 

No more specific information who's e1000e reviewer.
Sorry.



[PATCH 2/3] e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check

2016-04-19 Thread Denys Vlasenko
If two consecutive reads of the counter are the same, it is also not an 
overflow.
"systimel_1 < systimel_2" should be "systimel_1 <= systimel_2".

Before the patch, we could perform an *erroneous* correction:

Let's say that systimel_1 == systimel_2 == 0x.
"systimel_1 < systimel_2" is false, we think it's an overflow,
we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented,
and use "(systimeh << 32) + systimel_2" value which is 2^32 too large.

Signed-off-by: Denys Vlasenko 
CC: Jeff Kirsher 
CC: Jesse Brandeburg 
CC: Shannon Nelson 
CC: Carolyn Wyborny 
CC: Don Skidmore 
CC: Bruce Allan 
CC: John Ronciak 
CC: Mitch Williams 
CC: David S. Miller 
CC: LKML 
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 967311b..99d0e6e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4287,7 +4287,7 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
systimeh = er32(SYSTIMH);
systimel_2 = er32(SYSTIML);
/* Check for overflow. If there was no overflow, use the values */
-   if (systimel_1 < systimel_2) {
+   if (systimel_1 <= systimel_2) {
systim = (cycle_t)systimel_1;
systim |= (cycle_t)systimeh << 32;
} else {
-- 
1.8.1.4



[PATCH 1/3] e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64

2016-04-19 Thread Denys Vlasenko
"incvalue" variable holds a result of "er32(TIMINCA) & 
E1000_TIMINCA_INCVALUE_MASK"
and used in "do_div(temp, incvalue)" as a divisor.

Thus, "u64 incvalue" declaration is probably a mistake.
Even though it seems to be a harmless one, let's fix it.

Signed-off-by: Denys Vlasenko 
CC: Jeff Kirsher 
CC: Jesse Brandeburg 
CC: Shannon Nelson 
CC: Carolyn Wyborny 
CC: Don Skidmore 
CC: Bruce Allan 
CC: John Ronciak 
CC: Mitch Williams 
CC: David S. Miller 
CC: LKML 
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9b4ec13..967311b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4300,7 +4300,8 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
}
 
if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
-   u64 incvalue, time_delta, rem, temp;
+   u64 time_delta, rem, temp;
+   u32 incvalue;
int i;
 
/* errata for 82574/82583 possible bad bits read from SYSTIMH/L
-- 
1.8.1.4



[PATCH 3/3] e1000e: e1000e_cyclecounter_read(): do overflow check only if needed

2016-04-19 Thread Denys Vlasenko
SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0]

er32(SYSTIML) are probably moderately expensive (they are pci bus reads).
Can we avoid one of them? Yes, we can.

If the SYSTIML value we see is smaller than 0xff00, the overflow
into SYSTIMH would require at least two increments.

We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order.

Even if one increment happens between them, the overflow into SYSTIMH
is impossible, and we can avoid doing another er32(SYSTIML) read
and overflow check.

Signed-off-by: Denys Vlasenko 
CC: Jeff Kirsher 
CC: Jesse Brandeburg 
CC: Shannon Nelson 
CC: Carolyn Wyborny 
CC: Don Skidmore 
CC: Bruce Allan 
CC: John Ronciak 
CC: Mitch Williams 
CC: David S. Miller 
CC: LKML 
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 99d0e6e..6f17f89 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4275,7 +4275,7 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter,
 cc);
struct e1000_hw *hw = &adapter->hw;
-   u32 systimel_1, systimel_2, systimeh;
+   u32 systimel, systimeh;
cycle_t systim, systim_next;
/* SYSTIMH latching upon SYSTIML read does not work well.
 * This means that if SYSTIML overflows after we read it but before
@@ -4283,21 +4283,21 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
 * will experience a huge non linear increment in the systime value
 * to fix that we test for overflow and if true, we re-read systime.
 */
-   systimel_1 = er32(SYSTIML);
+   systimel = er32(SYSTIML);
systimeh = er32(SYSTIMH);
-   systimel_2 = er32(SYSTIML);
-   /* Check for overflow. If there was no overflow, use the values */
-   if (systimel_1 <= systimel_2) {
-   systim = (cycle_t)systimel_1;
-   systim |= (cycle_t)systimeh << 32;
-   } else {
-   /* There was an overflow, read again SYSTIMH, and use
-* systimel_2
-*/
-   systimeh = er32(SYSTIMH);
-   systim = (cycle_t)systimel_2;
-   systim |= (cycle_t)systimeh << 32;
+   /* Is systimel is so large that overflow is possible? */
+   if (systimel >= (u32)0x - E1000_TIMINCA_INCVALUE_MASK) {
+   u32 systimel_2 = er32(SYSTIML);
+   if (systimel > systimel_2) {
+   /* There was an overflow, read again SYSTIMH, and use
+* systimel_2
+*/
+   systimeh = er32(SYSTIMH);
+   systimel = systimel_2;
+   }
}
+   systim = (cycle_t)systimel;
+   systim |= (cycle_t)systimeh << 32;
 
if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
u64 time_delta, rem, temp;
-- 
1.8.1.4



[PATCH] drivers/net/ethernet/jme.c: Deinline jme_reset_mac_processor, save 2816 bytes

2016-04-08 Thread Denys Vlasenko
This function compiles to 895 bytes of machine code.

Clearly, this isn't a time-critical function.
For one, it has a number of udelay(1) calls.

Signed-off-by: Denys Vlasenko 
CC: David S. Miller 
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/jme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index 3ddf657..711cb19 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -222,7 +222,7 @@ jme_clear_ghc_reset(struct jme_adapter *jme)
jwrite32f(jme, JME_GHC, jme->reg_ghc);
 }
 
-static inline void
+static void
 jme_reset_mac_processor(struct jme_adapter *jme)
 {
static const u32 mask[WAKEUP_FRAME_MASK_DWNR] = {0, 0, 0, 0};
-- 
2.1.0



[PATCH] net: force inlining of netif_tx_start/stop_queue, sock_hold, __sock_put

2016-04-08 Thread Denys Vlasenko
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Arguably, gcc should do better, but gcc people aren't willing
to invest time into it, asking to use __always_inline instead.

With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.

netif_tx_stop_queue: 207 copies, 590 calls:
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 80 8f e0 01 00 00 01 lock orb $0x1,0x1e0(%rdi)
5d  pop%rbp
c3  retq

netif_tx_start_queue: 47 copies, 111 calls
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 80 a7 e0 01 00 00 fe lock andb $0xfe,0x1e0(%rdi)
5d  pop%rbp
c3  retq

sock_hold: 39 copies, 124 calls
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 ff 87 80 00 00 00lock incl 0x80(%rdi)
5d  pop%rbp
c3  retq

__sock_put: 6 copies, 13 calls
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 ff 8f 80 00 00 00lock decl 0x80(%rdi)
5d  pop%rbp
c3  retq

This patch fixes this via s/inline/__always_inline/.

Code size decrease after the patch is ~2.5k:

text  data  bss   dec hex filename
56719876  56364551 36196352 149280779 8e5d80b vmlinux_before
56717440  56364551 36196352 149278343 8e5ce87 vmlinux

Signed-off-by: Denys Vlasenko 
CC: David S. Miller 
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: netfilter-de...@vger.kernel.org
---
 include/linux/netdevice.h | 4 ++--
 include/net/sock.h| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb0d5d0..f924ddc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2801,7 +2801,7 @@ static inline void netif_tx_schedule_all(struct 
net_device *dev)
netif_schedule_queue(netdev_get_tx_queue(dev, i));
 }
 
-static inline void netif_tx_start_queue(struct netdev_queue *dev_queue)
+static __always_inline void netif_tx_start_queue(struct netdev_queue 
*dev_queue)
 {
clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
@@ -2851,7 +2851,7 @@ static inline void netif_tx_wake_all_queues(struct 
net_device *dev)
}
 }
 
-static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
+static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
 {
set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e0..fd15eb1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -564,7 +564,7 @@ static inline bool __sk_del_node_init(struct sock *sk)
modifications.
  */
 
-static inline void sock_hold(struct sock *sk)
+static __always_inline void sock_hold(struct sock *sk)
 {
atomic_inc(&sk->sk_refcnt);
 }
@@ -572,7 +572,7 @@ static inline void sock_hold(struct sock *sk)
 /* Ungrab socket in the context, which assumes that socket refcnt
cannot hit zero, f.e. it is true in context of any socketcall.
  */
-static inline void __sock_put(struct sock *sk)
+static __always_inline void __sock_put(struct sock *sk)
 {
atomic_dec(&sk->sk_refcnt);
 }
-- 
1.8.1.4



Re: [PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements

2015-09-28 Thread Denys Vlasenko
On 09/28/2015 05:32 PM, David Laight wrote:
> From: Eric Dumazet
>> Sent: 28 September 2015 15:27
>> On Mon, 2015-09-28 at 14:12 +, David Laight wrote:
>>> From: Neil Horman
>>>> Sent: 28 September 2015 14:51
>>>> On Mon, Sep 28, 2015 at 02:34:04PM +0200, Denys Vlasenko wrote:
>>>>> Seemingly innocuous sctp_trans_state_to_prio_map[] array
>>>>> is way bigger than it looks, since
>>>>> "[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" !
>>>>>
>>>>> This patch replaces it with switch() statement.
>>>
>>> What about just adding 1 (and masking) before indexing the array?
>>> That might require a static inline function with a local static array.
>>>
>>> Or define the array as (say) [16] and just mask the state before using
>>> it as an index?
>>
>> Just let the compiler do its job, instead of obfuscating source.
>>
>> Compilers can transform a switch into an (optimal) table if it is really
>> a gain.
> 
> The compiler can choose between a jump table and nested ifs for a switch
> statement. I've never seen it convert one into a data array index.

I don't know why people are fixated on a lookup table here.

For just four possible values, the amount of generated code
is less than one Icache cacheline.

Instruction cachelines are efficiently prefetched and branches
are predicted on all modern CPUs.
Possible data access for lookup table can not be prefetched
as efficiently.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements

2015-09-28 Thread Denys Vlasenko
Seemingly innocuous sctp_trans_state_to_prio_map[] array
is way bigger than it looks, since
"[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" !

This patch replaces it with switch() statement.

Signed-off-by: Denys Vlasenko 
CC: Vlad Yasevich 
CC: Neil Horman 
CC: Marcelo Ricardo Leitner 
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---

Changes since v1: tweaked comments

 net/sctp/associola.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 197c3f5..dae51ac 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1208,20 +1208,22 @@ void sctp_assoc_update(struct sctp_association *asoc,
  *   within this document.
  *
  * Our basic strategy is to round-robin transports in priorities
- * according to sctp_state_prio_map[] e.g., if no such
+ * according to sctp_trans_score() e.g., if no such
  * transport with state SCTP_ACTIVE exists, round-robin through
  * SCTP_UNKNOWN, etc. You get the picture.
  */
-static const u8 sctp_trans_state_to_prio_map[] = {
-   [SCTP_ACTIVE]   = 3,/* best case */
-   [SCTP_UNKNOWN]  = 2,
-   [SCTP_PF]   = 1,
-   [SCTP_INACTIVE] = 0,/* worst case */
-};
-
 static u8 sctp_trans_score(const struct sctp_transport *trans)
 {
-   return sctp_trans_state_to_prio_map[trans->state];
+   switch (trans->state) {
+   case SCTP_ACTIVE:
+   return 3;   /* best case */
+   case SCTP_UNKNOWN:
+   return 2;
+   case SCTP_PF:
+   return 1;
+   default: /* case SCTP_INACTIVE */
+   return 0;   /* worst case */
+   }
 }
 
 static struct sctp_transport *sctp_trans_elect_tie(struct sctp_transport 
*trans1,
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] jhash: Deinline jhash and jhash2

2015-07-19 Thread Denys Vlasenko
This patch deinlines jhash and jhash2.

It also removes rhashtable_jhash2(key, length, seed)
because it was merely calling jhash2(key, length, seed).

With this .config: http://busybox.net/~vda/kernel_config,
after deinlining these functions have sizes and callsite counts
as follows:

jhash: 297 bytes, 111 calls
jhash2: 205 bytes, 136 calls

Total size decrease is about 33,000 bytes:

text data  bss   dec hex filename
90663567 17221960 36659200 144544727 89d93d7 vmlinux5
90630370 17221864 36659200 144511434 89d11ca vmlinux.after

Signed-off-by: Denys Vlasenko 
CC: Thomas Graf 
CC: David Miller 
CC: Tom Herbert 
CC: Alexander Duyck 
CC: Jozsef Kadlecsik 
CC: Herbert Xu 
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
Changes in v2: created a new source file, jhash.c
Changes in v3: do not deinline __jhash_nwords;
  use EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL

 include/linux/jhash.h |  89 +--
 lib/Makefile  |   2 +-
 lib/jhash.c   | 113 ++
 lib/rhashtable.c  |  13 +++---
 4 files changed, 123 insertions(+), 94 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 348c6f4..847c1aa 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -57,93 +57,8 @@
 /* An arbitrary initial parameter */
 #define JHASH_INITVAL  0xdeadbeef
 
-/* jhash - hash an arbitrary key
- * @k: sequence of bytes as key
- * @length: the length of the key
- * @initval: the previous hash, or an arbitray value
- *
- * The generic version, hashes an arbitrary sequence of bytes.
- * No alignment or length assumptions are made about the input key.
- *
- * Returns the hash value of the key. The result depends on endianness.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-   u32 a, b, c;
-   const u8 *k = key;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + length + initval;
-
-   /* All but the last block: affect some 32 bits of (a,b,c) */
-   while (length > 12) {
-   a += __get_unaligned_cpu32(k);
-   b += __get_unaligned_cpu32(k + 4);
-   c += __get_unaligned_cpu32(k + 8);
-   __jhash_mix(a, b, c);
-   length -= 12;
-   k += 12;
-   }
-   /* Last block: affect all 32 bits of (c) */
-   /* All the case statements fall through */
-   switch (length) {
-   case 12: c += (u32)k[11]<<24;
-   case 11: c += (u32)k[10]<<16;
-   case 10: c += (u32)k[9]<<8;
-   case 9:  c += k[8];
-   case 8:  b += (u32)k[7]<<24;
-   case 7:  b += (u32)k[6]<<16;
-   case 6:  b += (u32)k[5]<<8;
-   case 5:  b += k[4];
-   case 4:  a += (u32)k[3]<<24;
-   case 3:  a += (u32)k[2]<<16;
-   case 2:  a += (u32)k[1]<<8;
-   case 1:  a += k[0];
-__jhash_final(a, b, c);
-   case 0: /* Nothing left to add */
-   break;
-   }
-
-   return c;
-}
-
-/* jhash2 - hash an array of u32's
- * @k: the key which must be an array of u32's
- * @length: the number of u32's in the key
- * @initval: the previous hash, or an arbitray value
- *
- * Returns the hash value of the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-   u32 a, b, c;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + (length<<2) + initval;
-
-   /* Handle most of the key */
-   while (length > 3) {
-   a += k[0];
-   b += k[1];
-   c += k[2];
-   __jhash_mix(a, b, c);
-   length -= 3;
-   k += 3;
-   }
-
-   /* Handle the last 3 u32's: all the case statements fall through */
-   switch (length) {
-   case 3: c += k[2];
-   case 2: b += k[1];
-   case 1: a += k[0];
-   __jhash_final(a, b, c);
-   case 0: /* Nothing left to add */
-   break;
-   }
-
-   return c;
-}
-
+u32 jhash(const void *key, u32 length, u32 initval);
+u32 jhash2(const u32 *k, u32 length, u32 initval);
 
 /* __jhash_nwords - hash exactly 3, 2 or 1 word(s) */
 static inline u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
diff --git a/lib/Makefile b/lib/Makefile
index 6897b52..978be53 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -26,7 +26,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o 
debug_locks.o random32.o \
 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
-percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
+percpu-refcount.o percpu_ida.o jhash.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_S

Re: [PATCH v2] jhash: Deinline jhash, jhash2 and __jhash_nwords

2015-07-19 Thread Denys Vlasenko
On 07/16/2015 08:17 PM, David Miller wrote:
> From: Tom Herbert 
> Date: Thu, 16 Jul 2015 08:43:25 -0700
> 
>> On Thu, Jul 16, 2015 at 5:40 AM, Denys Vlasenko  wrote:
>>> This patch deinlines jhash, jhash2 and __jhash_nwords.
>>>
>>> It also removes rhashtable_jhash2(key, length, seed)
>>> because it was merely calling jhash2(key, length, seed).
>>>
>>> With this .config: http://busybox.net/~vda/kernel_config,
>>> after deinlining these functions have sizes and callsite counts
>>> as follows:
>>>
>>> __jhash_nwords: 72 bytes, 75 calls
>>> jhash: 297 bytes, 111 calls
>>> jhash2: 205 bytes, 136 calls
>>>
>> jhash is used in several places in the critical data path. Does the
>> decrease in text size justify performance impact of not inlining it?
> 
> Tom took the words right out of my mouth.
> 
> Denys, you keep making deinlining changes like this all the time, like
> a robot.  But I never see you make any effort to look into the performance
> nor code generation ramifications of your changes.

The performance impact of the call/ret pair on modern x86
CPUs is only 5 cycles. To make a real difference in execution
speed, the function has to be less than 100 bytes of code.

jhash and jhash2, at more than 200 bytes of code, are definitely
far too large for inlining. Here is the smaller of two, jhash2:

:
   8d 84 b2 ef be ad delea-0x21524111(%rdx,%rsi,4),%eax
   55  push   %rbp
   89 c1   mov%eax,%ecx
   48 89 e5mov%rsp,%rbp
   89 c2   mov%eax,%edx
   eb 62   jmp81a0d204 
   03 47 08add0x8(%rdi),%eax
   03 17   add(%rdi),%edx
   83 ee 03sub$0x3,%esi
   03 4f 04add0x4(%rdi),%ecx
   48 83 c7 0c add$0xc,%rdi
   41 89 c0mov%eax,%r8d
   29 c2   sub%eax,%edx
   41 c1 c8 1c ror$0x1c,%r8d
   01 c8   add%ecx,%eax
   44 31 c2xor%r8d,%edx
   41 89 d0mov%edx,%r8d
   29 d1   sub%edx,%ecx
   01 c2   add%eax,%edx
   41 c1 c8 1a ror$0x1a,%r8d
   41 31 c8xor%ecx,%r8d
   44 89 c1mov%r8d,%ecx
   44 29 c0sub%r8d,%eax
   41 01 d0add%edx,%r8d
   c1 c9 18ror$0x18,%ecx
   31 c1   xor%eax,%ecx
   89 c8   mov%ecx,%eax
   29 ca   sub%ecx,%edx
   46 8d 0c 01 lea(%rcx,%r8,1),%r9d
   c1 c8 10ror$0x10,%eax
   31 d0   xor%edx,%eax
   89 c1   mov%eax,%ecx
   41 29 c0sub%eax,%r8d
   42 8d 14 08 lea(%rax,%r9,1),%edx
   c1 c9 0dror$0xd,%ecx
   44 31 c1xor%r8d,%ecx
   89 c8   mov%ecx,%eax
   41 29 c9sub%ecx,%r9d
   01 d1   add%edx,%ecx
   c1 c8 1cror$0x1c,%eax
   44 31 c8xor%r9d,%eax
   83 fe 03cmp$0x3,%esi
   77 99   ja 81a0d1a2 
   83 fe 02cmp$0x2,%esi
   74 0e   je 81a0d21c 
   83 fe 03cmp$0x3,%esi
   74 06   je 81a0d219 
   ff ce   dec%esi
   75 45   jne81a0d25c 
   eb 06   jmp81a0d21f 
   03 47 08add0x8(%rdi),%eax
   03 4f 04add0x4(%rdi),%ecx
   89 ce   mov%ecx,%esi
   03 17   add(%rdi),%edx
   31 c8   xor%ecx,%eax
   c1 ce 12ror$0x12,%esi
   29 f0   sub%esi,%eax
   89 c6   mov%eax,%esi
   31 c2   xor%eax,%edx
   c1 ce 15ror$0x15,%esi
   29 f2   sub%esi,%edx
   89 d6   mov%edx,%esi
   31 d1   xor%edx,%ecx
   c1 ce 07ror$0x7,%esi
   29 f1   sub%esi,%ecx
   89 ce   mov%ecx,%esi
   31 c8   xor%ecx,%eax
   c1 ce 10ror$0x10,%esi
   29 f0   sub%esi,%eax
   89 c6   mov%eax,%esi
   31 c2   xor%eax,%ed

[PATCH v2] jhash: Deinline jhash, jhash2 and __jhash_nwords

2015-07-16 Thread Denys Vlasenko
This patch deinlines jhash, jhash2 and __jhash_nwords.

It also removes rhashtable_jhash2(key, length, seed)
because it was merely calling jhash2(key, length, seed).

With this .config: http://busybox.net/~vda/kernel_config,
after deinlining these functions have sizes and callsite counts
as follows:

__jhash_nwords: 72 bytes, 75 calls
jhash: 297 bytes, 111 calls
jhash2: 205 bytes, 136 calls

Total size decrease is about 38,000 bytes:

text data  bss   dec hex filename
90663567 17221960 36659200 144544727 89d93d7 vmlinux5
90625577 17221864 36659200 144506641 89cff11 vmlinux.after

Signed-off-by: Denys Vlasenko 
CC: Thomas Graf 
CC: Alexander Duyck 
CC: Jozsef Kadlecsik 
CC: Herbert Xu 
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
Changes in v2: created a new source file, jhash.c

 include/linux/jhash.h | 123 +
 lib/Makefile  |   2 +-
 lib/jhash.c   | 149 ++
 lib/rhashtable.c  |  13 +++--
 4 files changed, 160 insertions(+), 127 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 348c6f4..0b3f55d 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -31,131 +31,14 @@
 /* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
 #define jhash_mask(n)   (jhash_size(n)-1)
 
-/* __jhash_mix -- mix 3 32-bit values reversibly. */
-#define __jhash_mix(a, b, c)   \
-{  \
-   a -= c;  a ^= rol32(c, 4);  c += b; \
-   b -= a;  b ^= rol32(a, 6);  a += c; \
-   c -= b;  c ^= rol32(b, 8);  b += a; \
-   a -= c;  a ^= rol32(c, 16); c += b; \
-   b -= a;  b ^= rol32(a, 19); a += c; \
-   c -= b;  c ^= rol32(b, 4);  b += a; \
-}
-
-/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
-#define __jhash_final(a, b, c) \
-{  \
-   c ^= b; c -= rol32(b, 14);  \
-   a ^= c; a -= rol32(c, 11);  \
-   b ^= a; b -= rol32(a, 25);  \
-   c ^= b; c -= rol32(b, 16);  \
-   a ^= c; a -= rol32(c, 4);   \
-   b ^= a; b -= rol32(a, 14);  \
-   c ^= b; c -= rol32(b, 24);  \
-}
-
 /* An arbitrary initial parameter */
 #define JHASH_INITVAL  0xdeadbeef
 
-/* jhash - hash an arbitrary key
- * @k: sequence of bytes as key
- * @length: the length of the key
- * @initval: the previous hash, or an arbitray value
- *
- * The generic version, hashes an arbitrary sequence of bytes.
- * No alignment or length assumptions are made about the input key.
- *
- * Returns the hash value of the key. The result depends on endianness.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-   u32 a, b, c;
-   const u8 *k = key;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + length + initval;
-
-   /* All but the last block: affect some 32 bits of (a,b,c) */
-   while (length > 12) {
-   a += __get_unaligned_cpu32(k);
-   b += __get_unaligned_cpu32(k + 4);
-   c += __get_unaligned_cpu32(k + 8);
-   __jhash_mix(a, b, c);
-   length -= 12;
-   k += 12;
-   }
-   /* Last block: affect all 32 bits of (c) */
-   /* All the case statements fall through */
-   switch (length) {
-   case 12: c += (u32)k[11]<<24;
-   case 11: c += (u32)k[10]<<16;
-   case 10: c += (u32)k[9]<<8;
-   case 9:  c += k[8];
-   case 8:  b += (u32)k[7]<<24;
-   case 7:  b += (u32)k[6]<<16;
-   case 6:  b += (u32)k[5]<<8;
-   case 5:  b += k[4];
-   case 4:  a += (u32)k[3]<<24;
-   case 3:  a += (u32)k[2]<<16;
-   case 2:  a += (u32)k[1]<<8;
-   case 1:  a += k[0];
-__jhash_final(a, b, c);
-   case 0: /* Nothing left to add */
-   break;
-   }
-
-   return c;
-}
-
-/* jhash2 - hash an array of u32's
- * @k: the key which must be an array of u32's
- * @length: the number of u32's in the key
- * @initval: the previous hash, or an arbitray value
- *
- * Returns the hash value of the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-   u32 a, b, c;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + (length<<2) + initval;
-
-   /* Handle most of the key */
-   while (length > 3) {
-   a += k[0];
-   b += k[1];
-   c += k[2];
-   __jhash_mix(a, b, c);
-   length -= 3;
-   k += 3;
-   }
-
-   /* Handle the last 3 u32's: all the case statements fall through */
-   switch

[PATCH] mac80211: Deinline rate_control_rate_init, rate_control_rate_update

2015-07-15 Thread Denys Vlasenko
With this .config: http://busybox.net/~vda/kernel_config,
after deinlining these functions have sizes and callsite counts
as follows:

rate_control_rate_init: 554 bytes, 8 calls
rate_control_rate_update: 1596 bytes, 5 calls

Total size reduction: about 11 kbytes.

Signed-off-by: Denys Vlasenko 
CC: John Linville 
CC: Michal Kazior 
CC: Johannes Berg 
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 net/mac80211/rate.c | 59 
 net/mac80211/rate.h | 60 +++--
 2 files changed, 62 insertions(+), 57 deletions(-)

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index fda33f9..03687d2 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -29,6 +29,65 @@ module_param(ieee80211_default_rc_algo, charp, 0644);
 MODULE_PARM_DESC(ieee80211_default_rc_algo,
 "Default rate control algorithm for mac80211 to use");
 
+void rate_control_rate_init(struct sta_info *sta)
+{
+   struct ieee80211_local *local = sta->sdata->local;
+   struct rate_control_ref *ref = sta->rate_ctrl;
+   struct ieee80211_sta *ista = &sta->sta;
+   void *priv_sta = sta->rate_ctrl_priv;
+   struct ieee80211_supported_band *sband;
+   struct ieee80211_chanctx_conf *chanctx_conf;
+
+   ieee80211_sta_set_rx_nss(sta);
+
+   if (!ref)
+   return;
+
+   rcu_read_lock();
+
+   chanctx_conf = rcu_dereference(sta->sdata->vif.chanctx_conf);
+   if (WARN_ON(!chanctx_conf)) {
+   rcu_read_unlock();
+   return;
+   }
+
+   sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band];
+
+   spin_lock_bh(&sta->rate_ctrl_lock);
+   ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista,
+   priv_sta);
+   spin_unlock_bh(&sta->rate_ctrl_lock);
+   rcu_read_unlock();
+   set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
+}
+
+void rate_control_rate_update(struct ieee80211_local *local,
+   struct ieee80211_supported_band *sband,
+   struct sta_info *sta, u32 changed)
+{
+   struct rate_control_ref *ref = local->rate_ctrl;
+   struct ieee80211_sta *ista = &sta->sta;
+   void *priv_sta = sta->rate_ctrl_priv;
+   struct ieee80211_chanctx_conf *chanctx_conf;
+
+   if (ref && ref->ops->rate_update) {
+   rcu_read_lock();
+
+   chanctx_conf = rcu_dereference(sta->sdata->vif.chanctx_conf);
+   if (WARN_ON(!chanctx_conf)) {
+   rcu_read_unlock();
+   return;
+   }
+
+   spin_lock_bh(&sta->rate_ctrl_lock);
+   ref->ops->rate_update(ref->priv, sband, &chanctx_conf->def,
+ ista, priv_sta, changed);
+   spin_unlock_bh(&sta->rate_ctrl_lock);
+   rcu_read_unlock();
+   }
+   drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
+}
+
 int ieee80211_rate_control_register(const struct rate_control_ops *ops)
 {
struct rate_control_alg *alg;
diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
index 25c9be5..624fe5b 100644
--- a/net/mac80211/rate.h
+++ b/net/mac80211/rate.h
@@ -71,64 +71,10 @@ rate_control_tx_status_noskb(struct ieee80211_local *local,
spin_unlock_bh(&sta->rate_ctrl_lock);
 }
 
-static inline void rate_control_rate_init(struct sta_info *sta)
-{
-   struct ieee80211_local *local = sta->sdata->local;
-   struct rate_control_ref *ref = sta->rate_ctrl;
-   struct ieee80211_sta *ista = &sta->sta;
-   void *priv_sta = sta->rate_ctrl_priv;
-   struct ieee80211_supported_band *sband;
-   struct ieee80211_chanctx_conf *chanctx_conf;
-
-   ieee80211_sta_set_rx_nss(sta);
-
-   if (!ref)
-   return;
-
-   rcu_read_lock();
-
-   chanctx_conf = rcu_dereference(sta->sdata->vif.chanctx_conf);
-   if (WARN_ON(!chanctx_conf)) {
-   rcu_read_unlock();
-   return;
-   }
-
-   sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band];
-
-   spin_lock_bh(&sta->rate_ctrl_lock);
-   ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista,
-   priv_sta);
-   spin_unlock_bh(&sta->rate_ctrl_lock);
-   rcu_read_unlock();
-   set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
-}
-
-static inline void rate_control_rate_update(struct ieee80211_local *local,
+void rate_control_rate_init(struct sta_info *sta);
+void rate_control_rate_update(struct ieee80211_local *local,
struct ieee80211_supported_band *sband,
-  

[PATCH] mac80211: Deinline drv_sta_state

2015-07-15 Thread Denys Vlasenko
With this .config: http://busybox.net/~vda/kernel_config,
after deinlining the function size is 3132 bytes and there are
7 callsites.

Total size reduction: about 20 kbytes.

Signed-off-by: Denys Vlasenko 
CC: John Linville 
CC: Michal Kazior 
Cc: Johannes Berg 
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 net/mac80211/Makefile |  1 +
 net/mac80211/driver-ops.c | 41 +
 net/mac80211/driver-ops.h | 29 ++---
 3 files changed, 44 insertions(+), 27 deletions(-)
 create mode 100644 net/mac80211/driver-ops.c

diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 3275f01..783e891 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_MAC80211) += mac80211.o
 # mac80211 objects
 mac80211-y := \
main.o status.o \
+   driver-ops.o \
sta_info.o \
wep.o \
wpa.o \
diff --git a/net/mac80211/driver-ops.c b/net/mac80211/driver-ops.c
new file mode 100644
index 000..267c3b1
--- /dev/null
+++ b/net/mac80211/driver-ops.c
@@ -0,0 +1,41 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include "ieee80211_i.h"
+#include "trace.h"
+#include "driver-ops.h"
+
+__must_check
+int drv_sta_state(struct ieee80211_local *local,
+ struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ enum ieee80211_sta_state old_state,
+ enum ieee80211_sta_state new_state)
+{
+   int ret = 0;
+
+   might_sleep();
+
+   sdata = get_bss_sdata(sdata);
+   if (!check_sdata_in_driver(sdata))
+   return -EIO;
+
+   trace_drv_sta_state(local, sdata, &sta->sta, old_state, new_state);
+   if (local->ops->sta_state) {
+   ret = local->ops->sta_state(&local->hw, &sdata->vif, &sta->sta,
+   old_state, new_state);
+   } else if (old_state == IEEE80211_STA_AUTH &&
+  new_state == IEEE80211_STA_ASSOC) {
+   ret = drv_sta_add(local, sdata, &sta->sta);
+   if (ret == 0)
+   sta->uploaded = true;
+   } else if (old_state == IEEE80211_STA_ASSOC &&
+  new_state == IEEE80211_STA_AUTH) {
+   drv_sta_remove(local, sdata, &sta->sta);
+   }
+   trace_drv_return_int(local, ret);
+   return ret;
+}
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h
index 32a2e70..02d9133 100644
--- a/net/mac80211/driver-ops.h
+++ b/net/mac80211/driver-ops.h
@@ -573,37 +573,12 @@ static inline void drv_sta_pre_rcu_remove(struct 
ieee80211_local *local,
trace_drv_return_void(local);
 }
 
-static inline __must_check
+__must_check
 int drv_sta_state(struct ieee80211_local *local,
  struct ieee80211_sub_if_data *sdata,
  struct sta_info *sta,
  enum ieee80211_sta_state old_state,
- enum ieee80211_sta_state new_state)
-{
-   int ret = 0;
-
-   might_sleep();
-
-   sdata = get_bss_sdata(sdata);
-   if (!check_sdata_in_driver(sdata))
-   return -EIO;
-
-   trace_drv_sta_state(local, sdata, &sta->sta, old_state, new_state);
-   if (local->ops->sta_state) {
-   ret = local->ops->sta_state(&local->hw, &sdata->vif, &sta->sta,
-   old_state, new_state);
-   } else if (old_state == IEEE80211_STA_AUTH &&
-  new_state == IEEE80211_STA_ASSOC) {
-   ret = drv_sta_add(local, sdata, &sta->sta);
-   if (ret == 0)
-   sta->uploaded = true;
-   } else if (old_state == IEEE80211_STA_ASSOC &&
-  new_state == IEEE80211_STA_AUTH) {
-   drv_sta_remove(local, sdata, &sta->sta);
-   }
-   trace_drv_return_int(local, ret);
-   return ret;
-}
+ enum ieee80211_sta_state new_state);
 
 static inline void drv_sta_rc_update(struct ieee80211_local *local,
 struct ieee80211_sub_if_data *sdata,
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iwlwifi: Deinline iwl_{read,write}{8,32}

2015-07-14 Thread Denys Vlasenko
On Tue, Jul 14, 2015 at 2:38 PM, Sergei Shtylyov
 wrote:
>> +#define IWL_READ_WRITE(static_inline) \
>> +static_inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) \
>> +{ \
>> +   trace_iwlwifi_dev_iowrite8(trans->dev, ofs, val); \
>> +   iwl_trans_write8(trans, ofs, val); \
>> +} \
> [...]
>>
>> +#if !defined(CONFIG_IWLWIFI_DEVICE_TRACING)
>> +IWL_READ_WRITE(static inline)
>
>Not static_inline?

Yes. Here we are putting two words, "static inline", in front of every
function definition.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] iwlwifi: Deinline iwl_{read,write}{8,32}

2015-07-14 Thread Denys Vlasenko
With CONFIG_IWLWIFI_DEVICE_TRACING=y, these functions are rather large,
too big for inlining.

With this .config: http://busybox.net/~vda/kernel_config,
after uninlining these functions have sizes and callsite counts
as follows:

iwl_read32  475 bytes, 51 callsites
iwl_write32 477 bytes, 90 callsites
iwl_write8  493 bytes, 3 callsites

Reduction in size is about 74,000 bytes:

text data  bss   dec hex filename
90758147 17226024 36659200 144643371 89f152b vmlinux0
90687995 17221928 36659200 144569123 89df323 vmlinux.after

Signed-off-by: Denys Vlasenko 
CC: Johannes Berg 
CC: Emmanuel Grumbach 
Cc: "John W. Linville" 
Cc: Intel Linux Wireless 
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 drivers/net/wireless/iwlwifi/iwl-io.c |  7 +++
 drivers/net/wireless/iwlwifi/iwl-io.h | 39 +--
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-io.c 
b/drivers/net/wireless/iwlwifi/iwl-io.c
index 27c66e4..aed121e 100644
--- a/drivers/net/wireless/iwlwifi/iwl-io.c
+++ b/drivers/net/wireless/iwlwifi/iwl-io.c
@@ -38,6 +38,13 @@
 
 #define IWL_POLL_INTERVAL 10   /* microseconds */
 
+#if defined(CONFIG_IWLWIFI_DEVICE_TRACING)
+IWL_READ_WRITE( /*not inlined*/ )
+IWL_EXPORT_SYMBOL(iwl_write8);
+IWL_EXPORT_SYMBOL(iwl_write32);
+IWL_EXPORT_SYMBOL(iwl_read32);
+#endif
+
 int iwl_poll_bit(struct iwl_trans *trans, u32 addr,
 u32 bits, u32 mask, int timeout)
 {
diff --git a/drivers/net/wireless/iwlwifi/iwl-io.h 
b/drivers/net/wireless/iwlwifi/iwl-io.h
index 705d12c..3c9d2a8 100644
--- a/drivers/net/wireless/iwlwifi/iwl-io.h
+++ b/drivers/net/wireless/iwlwifi/iwl-io.h
@@ -32,24 +32,31 @@
 #include "iwl-devtrace.h"
 #include "iwl-trans.h"
 
-static inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val)
-{
-   trace_iwlwifi_dev_iowrite8(trans->dev, ofs, val);
-   iwl_trans_write8(trans, ofs, val);
-}
-
-static inline void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val)
-{
-   trace_iwlwifi_dev_iowrite32(trans->dev, ofs, val);
-   iwl_trans_write32(trans, ofs, val);
+#define IWL_READ_WRITE(static_inline) \
+static_inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) \
+{ \
+   trace_iwlwifi_dev_iowrite8(trans->dev, ofs, val); \
+   iwl_trans_write8(trans, ofs, val); \
+} \
+static_inline void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val) \
+{ \
+   trace_iwlwifi_dev_iowrite32(trans->dev, ofs, val); \
+   iwl_trans_write32(trans, ofs, val); \
+} \
+static_inline u32 iwl_read32(struct iwl_trans *trans, u32 ofs) \
+{ \
+   u32 val = iwl_trans_read32(trans, ofs); \
+   trace_iwlwifi_dev_ioread32(trans->dev, ofs, val); \
+   return val; \
 }
 
-static inline u32 iwl_read32(struct iwl_trans *trans, u32 ofs)
-{
-   u32 val = iwl_trans_read32(trans, ofs);
-   trace_iwlwifi_dev_ioread32(trans->dev, ofs, val);
-   return val;
-}
+#if !defined(CONFIG_IWLWIFI_DEVICE_TRACING)
+IWL_READ_WRITE(static inline)
+#else
+void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val);
+void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val);
+u32 iwl_read32(struct iwl_trans *trans, u32 ofs);
+#endif
 
 static inline void iwl_set_bit(struct iwl_trans *trans, u32 reg, u32 mask)
 {
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 3c59x: Fix shared IRQ handling

2015-07-07 Thread Denys Vlasenko
As its first order of business, boomerang_interrupt() checks whether
the device really has any pending interrupts. If it does not,
it does nothing and returns, but it still returns IRQ_HANDLED.

This is wrong: interrupt was not handled, IRQ handlers of other
devices sharing this IRQ line need to be called.

vortex_interrupt() has it right: it returns IRQ_NONE in this case
via IRQ_RETVAL(0).

Do the same in boomerang_interrupt().

Signed-off-by: Denys Vlasenko 
CC: David S. Miller 
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/3com/3c59x.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c 
b/drivers/net/ethernet/3com/3c59x.c
index 41095eb..c11d6fc 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -2382,6 +2384,7 @@ boomerang_interrupt(int irq, void *dev_id)
void __iomem *ioaddr;
int status;
int work_done = max_interrupt_work;
+   int handled = 0;
 
ioaddr = vp->ioaddr;
 
@@ -2400,6 +2403,7 @@ boomerang_interrupt(int irq, void *dev_id)
 
if ((status & IntLatch) == 0)
goto handler_exit;  /* No interrupt: shared IRQs 
can cause this */
+   handled = 1;
 
if (status == 0x) { /* h/w no longer present (hotplug)? */
if (vortex_debug > 1)
@@ -2501,7 +2505,7 @@ boomerang_interrupt(int irq, void *dev_id)
 handler_exit:
vp->handling_irq = 0;
spin_unlock(&vp->lock);
-   return IRQ_HANDLED;
+   return IRQ_RETVAL(handled);
 }
 
 static int vortex_rx(struct net_device *dev)
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT] Networking

2015-04-29 Thread Denys Vlasenko
On Wed, Apr 1, 2015 at 9:48 PM, David Miller  wrote:
> D.S. Ljungmark (1):
>   ipv6: Don't reduce hop limit for an interface

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6fd99094de2b83d1d4c8457f2c83483b2828e75a

I was testing this change and apparently it doesn't close the hole.

The python script I use to send RAs:

#!/usr/bin/env python
import sys
import time
import scapy.all
from scapy.layers.inet6 import *
ip = IPv6()
# ip.dst = 'ff02::1'
ip.dst = sys.argv[1]
icmp = ICMPv6ND_RA()
icmp.chlim = 1
for x in range(10):
send(ip/icmp)
time.sleep(1)

# ./ipv6-hop-limit.py fe80::21e:37ff:fed0:5006
.
Sent 1 packets.
...<10 times>...
Sent 1 packets.

After I do this, on the targeted machine I check hop_limits:

# for f in /proc/sys/net/ipv6/conf/*/hop_limit; do echo -n $f:; cat $f; done
/proc/sys/net/ipv6/conf/all/hop_limit:64
/proc/sys/net/ipv6/conf/default/hop_limit:64
/proc/sys/net/ipv6/conf/enp0s25/hop_limit:1  <=== THIS
/proc/sys/net/ipv6/conf/lo/hop_limit:64
/proc/sys/net/ipv6/conf/wlp3s0/hop_limit:64

As you see, the interface which received RAs still lowered
its hop_limit to 1. I take it means that the bug is still present
(right? I'm not a network guy...).

I triple-checked that I do run the kernel with the fix.
Further investigation shows that the code touched by the fix
is not even reached, hop_limit is changed elsewhere.

I'm willing to test additional patches.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-17 Thread Denys Vlasenko
On 04/17/2015 07:42 PM, Eric Dumazet wrote:
> On Fri, 2015-04-17 at 19:05 +0200, Denys Vlasenko wrote:
>> How do you expect one to find excessively large inlines,
>> if not on allyesconfig build?
> 
> Tuning kernel sources based on allyesconfig build _size_ only is
> terrible. We could build an interpreter based kernel and maybe reduce
> its size by 50% who knows...
> 
> You are saying that all inline should be removed, since it is obvious
> kernel size _will_ be smaller.

I am not saying that. That would be stupid.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netns: remove BUG_ONs from net_generic()

2015-04-17 Thread Denys Vlasenko
This inline has ~500 callsites.

On 04/14/2015 08:37 PM, David Miller wrote:
> That BUG_ON() was added 7 years ago, and I don't remember it ever
> triggering or helping us diagnose something, so just remove it and
> keep the function inlined.

On x86 allyesconfig build:

text data  bss   dec hex filename
82447071 22255384 20627456 125329911 77861f7 vmlinux4
82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime

Signed-off-by: Denys Vlasenko 
CC: Eric W. Biederman 
CC: David S. Miller 
CC: Jan Engelhardt 
CC: Jiri Pirko 
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
---
 include/net/netns/generic.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h
index 0931618..70e1585 100644
--- a/include/net/netns/generic.h
+++ b/include/net/netns/generic.h
@@ -38,11 +38,9 @@ static inline void *net_generic(const struct net *net, int 
id)
 
rcu_read_lock();
ng = rcu_dereference(net->gen);
-   BUG_ON(id == 0 || id > ng->len);
ptr = ng->ptr[id - 1];
rcu_read_unlock();
 
-   BUG_ON(!ptr);
return ptr;
 }
 #endif
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-17 Thread Denys Vlasenko
On 04/16/2015 02:38 PM, Eric Dumazet wrote:
> On Thu, 2015-04-16 at 13:14 +0200, Denys Vlasenko wrote:
> 
>> However, without BUG_ONs, function is still a bit big
>> on PREEMPT configs.
> 
> Only on allyesconfig builds, that nobody use but to prove some points
> about code size.

How do you expect one to find excessively large inlines,
if not on allyesconfig build?

Only by using allyesconfig, I can measure how many calls
are there in the kernel. (grepping source is utterly unreliable
due to nested inlines and macros).

For the record: I am not using the _full_ allyesconfig,
I do disable some debugging options which clearly aren't
ever enabled on production systems. E.g. in my config:

# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_KASAN is not set

etc.

> If you look at net_generic(), it is mostly used from code that is
> normally in 3 modules. How many people really load them ?
> 
> net/tipc : 91 call sites
> net/sunrpc : 57
> fs/nfsd & fs/lockd : 183
> 
> Then few remaining uses in tunnels.

Grepping is far from reliable. The above missed more than half
of all calls. I disassembed vmlinux after deinlining, there are
nearly 500 calls of net_generic().

> As we suggested, please just remove the BUG_ON().

Going to send the patch in a minute.
-- 
vda

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread Denys Vlasenko
Hi David, Eric,

As you may have surmised, this patch wasn't a result of me looking
at networking code; rather, it is a result of semi-automated search
for huge inlines.

The last step of this process would be the submission of patches.
I am expecting a range of responses from maintainers: enthusiastic,
"no reply", "go away, I don't care about code size",
and everything in between.

I can't invest a large amount of effort trying to push
every deinlining patch through. If someone, for example, would
flat out refuse to fix a huge inline in his driver, so be it.

I will be happy to run at least a few iterations over a patch
if maintainers do see a merit in deinlining, but have some
reservations wrt performance.

Your response falls into the latter case (you aren't refusing the patch
outright, but want it to be changed in some way).

It would help if you tell me how I should change the patches.

(I also hope to have a semi-generic way of addressing
performance concerns for future deinlining
patch submissions.)


On 04/14/2015 08:37 PM, David Miller wrote:
> From: Denys Vlasenko 
> Date: Tue, 14 Apr 2015 14:25:11 +0200
> 
>> On x86 allyesconfig build:
>> The function compiles to 130 bytes of machine code.
>> It has 493 callsites.
>> Total reduction of vmlinux size: 27906 bytes.
>>
>>text  data  bss   dec hex filename
>> 82447071 22255384 20627456 125329911 77861f7 vmlinux4
>> 82419165 22255384 20627456 125302005 777f4f5 vmlinux5
>>
>> Signed-off-by: Denys Vlasenko 
> 
> That BUG_ON() was added 7 years ago, and I don't remember it ever
> triggering or helping us diagnose something, so just remove it and
> keep the function inlined.

There are two BUG_ONs. I'll remove both of them in v2.
Let me know if you want something else.

However, without BUG_ONs, function is still a bit big
on PREEMPT configs.

Among almost 500 users, many probably aren't hot paths.

How about having an inlined version, say, "fast_net_generic()",
and a deinlined one, and using one or another as appropriate.
Is this ok with you?


On 04/14/2015 03:19 PM, Eric Dumazet wrote:
> On Tue, 2015-04-14 at 14:25 +0200, Denys Vlasenko wrote:
>> On x86 allyesconfig build:
>> The function compiles to 130 bytes of machine code.
>> It has 493 callsites.
>> Total reduction of vmlinux size: 27906 bytes.
>>
>>text   data  bss   dec hex filename
>> 82447071 22255384 20627456 125329911 77861f7 vmlinux4
>> 82419165 22255384 20627456 125302005 777f4f5 vmlinux5
>
> This sounds a big hammer to me.
>
> These savings probably comes from the BUG_ON() that could simply be
> removed.
> The second one for sure has no purpose. First one looks defensive.
>
> For a typical (non allyesconfig) kernel, net_generic() would translate
> to :
>
> return net->gen[id - 1]
>
> Tunnels need this in fast path, so I presume we could introduce
> net_generic_rcu() to keep this stuff inlined where it matters.
>
> static inline void *net_generic_rcu(const struct net *net, int id)
> {
>   struct net_generic *ng = rcu_dereference(net->gen);
>
>   return ng->ptr[id - 1];
> }

I looked at the code and I'm not feeling confident enough
to find places where replacing net_generic() with
net_generic_rcu() is okay.

Would it be okay if I add net_generic_rcu() as you suggest,
but leave it to you (network people) to switch to it where
appropriate?

-- 
vda
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ip[6]_tables.c: remove some inlines

2007-12-31 Thread Denys Vlasenko
On Monday 31 December 2007 17:00, Patrick McHardy wrote:
> Denys Vlasenko wrote:
> > ip[6]_tables.c seem to abuse inline.
> >
> > This patch removes inlines except those which are used
> > by packet matching code and thus are performance-critical.
> >   
> 
> Some people also consider the ruleset replacement path performance
> critical, but overall I agree with your patch. I'm travelling currently
> though so it will take a few days until I'll apply it.
> 
> Do you have some numbers that show the actual difference these
> changes make?

Before:

$ size */*/*/ip*tables*.o
   textdata bss dec hex filename
   6402 500  1669181b06 net/ipv4/netfilter/ip_tables.o
   7130 500  1676461dde net/ipv6/netfilter/ip6_tables.o

After:

$ size */*/*/ip*tables*.o
   textdata bss dec hex filename
   6307 500  1668231aa7 net/ipv4/netfilter/ip_tables.o
   7010 500  1675261d66 net/ipv6/netfilter/ip6_tables.o

--
vda
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ip[6]_tables.c: remove some inlines

2007-12-31 Thread Denys Vlasenko
On Friday 28 December 2007 15:30, Patrick McHardy wrote:
> >> This clashes with my pending patches, which I'll push upstream
> >> today. I also spent some time resyncing ip_tables and ip6_tables
> >> so a diff of both (with some sed'ing) shows only the actual
> >> differences, so please update ip6_tables as well.
> >
> > I would be happy to rediff the patch.
> >
> > Which tree should I sync against in order to not collide
> > with your changes?
> 
> The net-2.6.25.git tree contains all my changes.

ip[6]_tables.c seem to abuse inline.

This patch removes inlines except those which are used
by packet matching code and thus are performance-critical.

Please take this patch into netfilter queue.

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
--
vda
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index f5b66ec..982b7f9 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -75,6 +75,7 @@ do {\
Hence the start of any table is given by get_table() below.  */
 
 /* Returns whether matches rule or not. */
+/* Performance critical - called for every packet */
 static inline bool
 ip_packet_match(const struct iphdr *ip,
 		const char *indev,
@@ -153,7 +154,7 @@ ip_packet_match(const struct iphdr *ip,
 	return true;
 }
 
-static inline bool
+static bool
 ip_checkentry(const struct ipt_ip *ip)
 {
 	if (ip->flags & ~IPT_F_MASK) {
@@ -183,8 +184,9 @@ ipt_error(struct sk_buff *skb,
 	return NF_DROP;
 }
 
-static inline
-bool do_match(struct ipt_entry_match *m,
+/* Performance critical - called for every packet */
+static inline bool
+do_match(struct ipt_entry_match *m,
 	  const struct sk_buff *skb,
 	  const struct net_device *in,
 	  const struct net_device *out,
@@ -199,6 +201,7 @@ bool do_match(struct ipt_entry_match *m,
 		return false;
 }
 
+/* Performance critical */
 static inline struct ipt_entry *
 get_entry(void *base, unsigned int offset)
 {
@@ -206,6 +209,7 @@ get_entry(void *base, unsigned int offset)
 }
 
 /* All zeroes == unconditional rule. */
+/* Mildly perf critical (only if packet tracing is on) */
 static inline int
 unconditional(const struct ipt_ip *ip)
 {
@@ -221,7 +225,7 @@ unconditional(const struct ipt_ip *ip)
 
 #if defined(CONFIG_NETFILTER_XT_TARGET_TRACE) || \
 defined(CONFIG_NETFILTER_XT_TARGET_TRACE_MODULE)
-static const char *hooknames[] = {
+static const char *const hooknames[] = {
 	[NF_INET_PRE_ROUTING]		= "PREROUTING",
 	[NF_INET_LOCAL_IN]		= "INPUT",
 	[NF_INET_FORWARD]		= "FORWARD",
@@ -235,7 +239,7 @@ enum nf_ip_trace_comments {
 	NF_IP_TRACE_COMMENT_POLICY,
 };
 
-static const char *comments[] = {
+static const char *const comments[] = {
 	[NF_IP_TRACE_COMMENT_RULE]	= "rule",
 	[NF_IP_TRACE_COMMENT_RETURN]	= "return",
 	[NF_IP_TRACE_COMMENT_POLICY]	= "policy",
@@ -251,6 +255,7 @@ static struct nf_loginfo trace_loginfo = {
 	},
 };
 
+/* Mildly perf critical (only if packet tracing is on) */
 static inline int
 get_chainname_rulenum(struct ipt_entry *s, struct ipt_entry *e,
 		  char *hookname, char **chainname,
@@ -567,7 +572,7 @@ mark_source_chains(struct xt_table_info *newinfo,
 	return 1;
 }
 
-static inline int
+static int
 cleanup_match(struct ipt_entry_match *m, unsigned int *i)
 {
 	if (i && (*i)-- == 0)
@@ -579,7 +584,7 @@ cleanup_match(struct ipt_entry_match *m, unsigned int *i)
 	return 0;
 }
 
-static inline int
+static int
 check_entry(struct ipt_entry *e, const char *name)
 {
 	struct ipt_entry_target *t;
@@ -600,7 +605,8 @@ check_entry(struct ipt_entry *e, const char *name)
 	return 0;
 }
 
-static inline int check_match(struct ipt_entry_match *m, const char *name,
+static int
+check_match(struct ipt_entry_match *m, const char *name,
 			  const struct ipt_ip *ip,
 			  unsigned int hookmask, unsigned int *i)
 {
@@ -623,7 +629,7 @@ static inline int check_match(struct ipt_entry_match *m, const char *name,
 	return ret;
 }
 
-static inline int
+static int
 find_check_match(struct ipt_entry_match *m,
 		 const char *name,
 		 const struct ipt_ip *ip,
@@ -652,7 +658,7 @@ err:
 	return ret;
 }
 
-static inline int check_target(struct ipt_entry *e, const char *name)
+static int check_target(struct ipt_entry *e, const char *name)
 {
 	struct ipt_entry_target *t;
 	struct xt_target *target;
@@ -673,7 +679,7 @@ static inline int check_target(struct ipt_entry *e, const char *name)
 	return ret;
 }
 
-static inline int
+static int
 find_check_entry(struct ipt_entry *e, const char *name, unsigned int size,
 		 unsigned int *i)
 {
@@ -717,7 +723,7 @@ find_check_entry(struct ipt_entry *e, const char *name, unsigned int size,
 	return ret;
 }
 
-static inline int
+static int
 check_entry_size_and_hooks(struct ipt_entry *e,
 			   struct xt_table_info *newinfo,
 			   unsigned char *base,
@@ -760,7 +766,7 

Re: [PATCH] net/ipv4/netfilter/ip_tables.c: remove some inlines

2007-12-26 Thread Denys Vlasenko
On Monday 17 December 2007 14:47, Patrick McHardy wrote:
> Please CC netfilter-devel on netfilter patches.
>
> Denys Vlasenko wrote:
> > Hi Patrick, Harald,
> >
> > I was working on unrelated problem and noticed that ip_tables.c
> > seem to abuse inline. I prepared a patch which removes inlines
> > except those which are used by packet matching code
> > (and thus are really performance-critical).
> > I added comments explaining that remaining inlines are
> > performance critical.
> >
> > Result as reported by size:
> >
> >textdata bss dec hex filename
> > -  6451 380  8869191b07 ip_tables.o
> > +  6339 348  7267591a67 ip_tables.o
> >
> > Please take this patch into netfilter queue.
>
> This clashes with my pending patches, which I'll push upstream
> today. I also spent some time resyncing ip_tables and ip6_tables
> so a diff of both (with some sed'ing) shows only the actual
> differences, so please update ip6_tables as well.

I would be happy to rediff the patch.

Which tree should I sync against in order to not collide
with your changes?
--
vda
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net/ipv4/netfilter/ip_tables.c: remove some inlines

2007-12-16 Thread Denys Vlasenko
Hi Patrick, Harald,

I was working on unrelated problem and noticed that ip_tables.c
seem to abuse inline. I prepared a patch which removes inlines
except those which are used by packet matching code
(and thus are really performance-critical).
I added comments explaining that remaining inlines are
performance critical.

Result as reported by size:

   textdata bss dec hex filename
-  6451 380  8869191b07 ip_tables.o
+  6339 348  7267591a67 ip_tables.o

Please take this patch into netfilter queue.

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
--
vda
diff -urpN linux-2.6.org/net/ipv4/netfilter/ip_tables.c linux-2.6.ipt/net/ipv4/netfilter/ip_tables.c
--- linux-2.6.org/net/ipv4/netfilter/ip_tables.c	2007-12-14 10:46:37.0 -0800
+++ linux-2.6.ipt/net/ipv4/netfilter/ip_tables.c	2007-12-16 12:37:46.0 -0800
@@ -74,6 +74,7 @@ do {\
Hence the start of any table is given by get_table() below.  */
 
 /* Returns whether matches rule or not. */
+/* Performance critical - called for every packet */
 static inline int
 ip_packet_match(const struct iphdr *ip,
 		const char *indev,
@@ -152,7 +153,7 @@ ip_packet_match(const struct iphdr *ip,
 	return 1;
 }
 
-static inline bool
+static bool
 ip_checkentry(const struct ipt_ip *ip)
 {
 	if (ip->flags & ~IPT_F_MASK) {
@@ -182,6 +183,7 @@ ipt_error(struct sk_buff *skb,
 	return NF_DROP;
 }
 
+/* Performance critical - called for every packet */
 static inline
 bool do_match(struct ipt_entry_match *m,
 	  const struct sk_buff *skb,
@@ -198,6 +200,7 @@ bool do_match(struct ipt_entry_match *m,
 		return false;
 }
 
+/* Performance critical */
 static inline struct ipt_entry *
 get_entry(void *base, unsigned int offset)
 {
@@ -205,6 +208,7 @@ get_entry(void *base, unsigned int offse
 }
 
 /* All zeroes == unconditional rule. */
+/* Mildly perf critical (only if packet tracing is on) */
 static inline int
 unconditional(const struct ipt_ip *ip)
 {
@@ -219,7 +223,7 @@ unconditional(const struct ipt_ip *ip)
 
 #if defined(CONFIG_NETFILTER_XT_TARGET_TRACE) || \
 defined(CONFIG_NETFILTER_XT_TARGET_TRACE_MODULE)
-static const char *hooknames[] = {
+static const char *const hooknames[] = {
 	[NF_IP_PRE_ROUTING]		= "PREROUTING",
 	[NF_IP_LOCAL_IN]		= "INPUT",
 	[NF_IP_FORWARD]			= "FORWARD",
@@ -233,7 +237,7 @@ enum nf_ip_trace_comments {
 	NF_IP_TRACE_COMMENT_POLICY,
 };
 
-static const char *comments[] = {
+static const char *const comments[] = {
 	[NF_IP_TRACE_COMMENT_RULE]	= "rule",
 	[NF_IP_TRACE_COMMENT_RETURN]	= "return",
 	[NF_IP_TRACE_COMMENT_POLICY]	= "policy",
@@ -249,6 +253,7 @@ static struct nf_loginfo trace_loginfo =
 	},
 };
 
+/* Mildly perf critical (only if packet tracing is on) */
 static inline int
 get_chainname_rulenum(struct ipt_entry *s, struct ipt_entry *e,
 		  char *hookname, char **chainname,
@@ -567,7 +572,7 @@ mark_source_chains(struct xt_table_info 
 	return 1;
 }
 
-static inline int
+static int
 cleanup_match(struct ipt_entry_match *m, unsigned int *i)
 {
 	if (i && (*i)-- == 0)
@@ -579,7 +584,7 @@ cleanup_match(struct ipt_entry_match *m,
 	return 0;
 }
 
-static inline int
+static int
 check_entry(struct ipt_entry *e, const char *name)
 {
 	struct ipt_entry_target *t;
@@ -599,7 +604,7 @@ check_entry(struct ipt_entry *e, const c
 	return 0;
 }
 
-static inline int check_match(struct ipt_entry_match *m, const char *name,
+static int check_match(struct ipt_entry_match *m, const char *name,
 const struct ipt_ip *ip, unsigned int hookmask,
 unsigned int *i)
 {
@@ -622,7 +627,7 @@ static inline int check_match(struct ipt
 	return ret;
 }
 
-static inline int
+static int
 find_check_match(struct ipt_entry_match *m,
 	const char *name,
 	const struct ipt_ip *ip,
@@ -651,7 +656,7 @@ err:
 	return ret;
 }
 
-static inline int check_target(struct ipt_entry *e, const char *name)
+static int check_target(struct ipt_entry *e, const char *name)
 {
 	struct ipt_entry_target *t;
 	struct xt_target *target;
@@ -672,7 +677,7 @@ static inline int check_target(struct ip
 	return ret;
 }
 
-static inline int
+static int
 find_check_entry(struct ipt_entry *e, const char *name, unsigned int size,
 	unsigned int *i)
 {
@@ -716,7 +721,7 @@ find_check_entry(struct ipt_entry *e, co
 	return ret;
 }
 
-static inline int
+static int
 check_entry_size_and_hooks(struct ipt_entry *e,
 			   struct xt_table_info *newinfo,
 			   unsigned char *base,
@@ -759,7 +764,7 @@ check_entry_size_and_hooks(struct ipt_en
 	return 0;
 }
 
-static inline int
+static int
 cleanup_entry(struct ipt_entry *e, unsigned int *i)
 {
 	struct ipt_entry_target *t;
@@ -1293,7 +1298,7 @@ __do_replace(const char *name, unsigned 
 	get_counters(oldinfo, counters);
 	/* Decrease module usage counts and free resource */
 	loc_cpu_old_entry = oldinfo->entries[raw_smp_processor_id()];
-	IPT_ENTRY_ITERATE(loc_cpu_old_entry, oldi

Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Wednesday 14 November 2007 00:27, Adrian Bunk wrote:
> You missed the following in my email:
> "we slowly scare them away due to the many bug reports without any
>  reaction."
>
> The problem is that bug reports take time. If you go away from easy
> things like compile errors then even things like describing what does
> no longer work, ideally producing a scenario where you can reproduce it
> and verifying whether it was present in previous kernels can easily take
> many hours that are spent before the initial bug report.
>
> If the bug report then gets ignored we discourage the person who sent
> the bug report to do any work related to the kernel again.

Cannot agree more. I am in a similar position right now.
My patch to aic7xxx driver was ubmitted four times
with not much reaction from scsi guys.

Finally they replied and asked to rediff it against their
git tree. I did that and sent patches back. No reply since then.

And mind you, the patch is not trying to do anything
complex, it mostly moves code around, removes 'inline',
adds 'const'. What should I think about it?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 11:57, Gabriel C wrote:
> > The main problem is finding experienced developers who spend time on
> > looking into bug reports.
>
> There are already. IMO the problem is the development model.
>
> There are tons new features in each new kernel release and 'tons new bugs'
> which are not fixed during the release cycle nor in the .XX stable kernels.
>
> Maybe after XX kernel releases there should be one just with bug-fixes
> _without_ any new features , eg: cleaning bugs from bugzilla , know
> regressions , cleaning up code , removing broken drivers and the like.

Won't work. You cannot force people to work on things they don't
find interesting, long-term.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 10:56, Adrian Bunk wrote:
> On Tue, Nov 13, 2007 at 12:13:56PM -0500, Theodore Tso wrote:
> > On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
> > > Btw, I used to test every -mm kernel. But since I've switched distros
> > > (gentoo->ubuntu)
> > > and I have less time, I feel it's harder to test -rc or -mm kernels (I
> > > know this isn't a lkml problem
> > > but more a distro problem, but I would love having an ubuntu blessed
> > > repo with current dev kernel
> > > for the latest stable ubuntu release).
> >
> > There are two parts to this.  One is a Ubuntu development kernel which
> > we can give to large numbers of people to expand our testing pool.
> > But if we don't do a better job of responding to bug reports that
> > would be generated by expanded testing this won't necessarily help us.
> >...
>
> The main problem aren't missing testers [1] - we already have relatively
> experienced people testing kernels and/or reporting bugs, and we slowly
> scare them away due to the many bug reports without any reaction.
>
> The main problem is finding experienced developers who spend time on
> looking into bug reports.
>
> Getting many relatively unexperienced users (who need more guidance for
> debugging issues) as additional testers is therefore IMHO not
> necessarily a good idea.

And where experienced developrs are coming from?
They are not born with Linux kernel skills.
They grow up from within user base.

Bigger user base -> more developers (eventually)
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 07:08, Mark Lord wrote:
> Ingo Molnar wrote:
> ..
>
> > This is all QA-101 that _cannot be argued against on a rational basis_,
> > it's just that these sorts of things have been largely ignored for
> > years, in favor of the all-too-easy "open source means many eyeballs and
> > that is our QA" answer, which is a _good_ answer but by far not the most
> > intelligent answer! Today "many eyeballs" is simply not good enough and
> > nature (and other OS projects) will route us around if we dont change.
>
> ..
>
> QA-101 and "many eyeballs" are not at all in opposition.
> The latter is how we find out about bugs on uncommon hardware,
> and the former is what we need to track them and overall quality.
>
> A HUGE problem I have with current "efforts", is that once someone
> reports a bug, the onus seems to be 99% on the *reporter* to find
> the exact line of code or commit.  Ghad what a repressive method.

This is the only method that scales.

Developer has only 24 hours in each day, and sometimes he needs to eat,
sleep, and maybe even pay attention to e.g. his kids.

But bug reporters are much more numerous and they have more
hours in one day combined.

BUT - it means that developers should try to increase user base,
not scare users away.

> And if the "developer" who broke the damn thing, or who at least
> "claims" to be supporting that code, cannot "reproduce" the bug,
> they drop it completely.

Developer should let reporter know that reporter needs to help
a bit here. Sometimes a bit of hand holding is needed, but it
pays off because you breed more qualified testers/bug reporters.

> Contrast that flawed approach with how Linus does things..
> he thinks through the symptoms, matches them to the code,
> and figures out what the few possibilities might be,
> and feeds back some trial balloon patches for the bug reporter to try.
>
> MUCH better.
>
> And remember, *I'm* an old-time Linux kernel developer.. just think about
> the people reporting bugs who haven't been around here since 1992..

Yes. Developers should not grow more and more unhelpful
and arrogant towards their users just because inexperienced
users send incomplete/poorly written bug reports.
They need to provide help, not humiliate/ignore.

I think we agree here.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net-2.6.24 breaks powerpc mysteriously

2007-10-13 Thread Denys Vlasenko
On Friday 12 October 2007 05:09, Paul Mackerras wrote:
> > I supposed a hacky fix is to add __KERNEL__ ifdef protection around
> > zlib_inflate_blob() and those troublesome includes.  A nicer fix is
> 
> That would do, but I don't see why zlib_inflate_blob had to be added
> to inflate.c rather than being in a new file under lib/zlib_inflate.

Done.

> If nothing else, that bloats configs that currently use inflate.c (in
> the kernel) but don't need zlib_inflate_blob() - which would be all
> the existing uses. :)

I did it.

However, in general I prefer to have a better linker, one which
is capable of eliminate unused fuctions/data on the per-object basis,
not per .o file.

Patches to implement that are sent to Sam Ravnborg.
Hopefully they will make it to mainline.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net-2.6.24 breaks powerpc mysteriously

2007-10-13 Thread Denys Vlasenko
On Friday 12 October 2007 03:45, David Miller wrote:
> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Thu, 11 Oct 2007 19:22:33 -0700
> 
> > With net-2.6.24 (pulled yesterday) applied:
> > 
> > g5:/usr/src/25> ml arch/powerpc/boot/inflate.o
> >   Using ARCH=powerpc CROSS_COMPILE=
> >   CHK include/linux/version.h
> >   CHK include/linux/utsrelease.h
> >   CALLscripts/checksyscalls.sh
> >   BOOTCC  arch/powerpc/boot/inflate.o
> > arch/powerpc/boot/inflate.c:920:19: errno.h: No such file or directory
> > arch/powerpc/boot/inflate.c:921:18: slab.h: No such file or directory
> > arch/powerpc/boot/inflate.c:922:21: vmalloc.h: No such file or directory
> > arch/powerpc/boot/inflate.c: In function `zlib_inflate_blob':
> > arch/powerpc/boot/inflate.c:928: error: syntax error before '*' token
> 
> The only thing we touched in zlib is in the patch below.
> 
> I suspect the lib/zlib_inflate/inflate.c changes, I had no idea that
> some pieces of code try to use this into userspace.
> 
> I supposed a hacky fix is to add __KERNEL__ ifdef protection around
> zlib_inflate_blob() and those troublesome includes.  A nicer fix is
> probably to change the zlib_inflate_blob() interface to pass in
> pointers to alloc() and free() routines instead of calling kernel ones
> directly.
> 
> Denys?
> 
> commit 8336793baf962163c9fab5a3f39614295fdbab27
> Author: Denys Vlasenko <[EMAIL PROTECTED]>
> Date:   Sun Sep 30 17:56:49 2007 -0700
> 
> [ZLIB]: Move bnx2 driver gzip unpacker into zlib.
> 
> Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
> Acked-by: Michael Chan <[EMAIL PROTECTED]>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

Please find updated patch in attachment.
Compile-tested.
--
vda
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2761,48 +2761,6 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(&bp->phy_lock);
 }
 
-/* To be moved to generic lib/ */
-static int
-bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len)
-{
-	struct z_stream_s *strm;
-	int rc;
-
-	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
-	 * is stripped */
-
-	rc = -ENOMEM;
-	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
-	if (strm == NULL)
-		goto gunzip_nomem2;
-	strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (strm->workspace == NULL)
-		goto gunzip_nomem3;
-
-	strm->next_in = zbuf;
-	strm->avail_in = len;
-	strm->next_out = gunzip_buf;
-	strm->avail_out = sz;
-
-	rc = zlib_inflateInit2(strm, -MAX_WBITS);
-	if (rc == Z_OK) {
-		rc = zlib_inflate(strm, Z_FINISH);
-		/* after Z_FINISH, only Z_STREAM_END is "we unpacked it all" */
-		if (rc == Z_STREAM_END)
-			rc = sz - strm->avail_out;
-		else
-			rc = -EINVAL;
-		zlib_inflateEnd(strm);
-	} else
-		rc = -EINVAL;
-
-	kfree(strm->workspace);
-gunzip_nomem3:
-	kfree(strm);
-gunzip_nomem2:
-	return rc;
-}
-
 static void
 load_rv2p_fw(struct bnx2 *bp, u32 *rv2p_code, u32 rv2p_code_len,
 	u32 rv2p_proc)
@@ -2858,7 +2816,7 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw)
 		text = vmalloc(FW_BUF_SIZE);
 		if (!text)
 			return -ENOMEM;
-		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len);
+		rc = zlib_inflate_blob(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len);
 		if (rc < 0) {
 			vfree(text);
 			return rc;
@@ -2935,14 +2893,14 @@ bnx2_init_cpus(struct bnx2 *bp)
 	text = vmalloc(FW_BUF_SIZE);
 	if (!text)
 		return -ENOMEM;
-	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
+	rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
 	if (rc < 0) {
 		vfree(text);
 		goto init_cpu_err;
 	}
 	load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1);
 
-	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2));
+	rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2));
 	if (rc < 0) {
 		vfree(text);
 		goto init_cpu_err;
--- a/include/linux/zlib.h
+++ b/include/linux/zlib.h
@@ -82,7 +82,7 @@
 struct internal_state;
 
 typedef struct z_stream_s {
-Byte*next_in;   /* next input byte */
+const Byte *next_in;   /* next input byte */
 uInt avail_in;  /* number of bytes available at next_in */
 uLongtotal_in;  /* total nb of input bytes read so far */
 
@@ -699,4 +699,8 @@ extern int zlib_inflateInit2(z_streamp strm, int  windowBits);
 struct internal_state {int dummy;}; /* hack for buggy compilers */
 #endif
 
+/* Utility function: initialize zlib, unpack binary blob, clean up zlib,
+ * return len or negative error code. */
+extern int zlib_inflate_blob(void *dst, unsigned dst_sz, const void *src, unsigned src_sz);
+
 #endif /* _ZLIB_H */
--- a/lib/zlib_inflate/i

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote:
> 
> On Fri, 21 Sep 2007, Denys Vlasenko wrote:
> 
> > On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
> >> On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
> >>
> >>> I plan to use gzip compression on following drivers' firmware,
> >>> if patches will be accepted:
> >>>
> >>>textdata bss dec hex filename
> >>>   17653  109968 240  127861   1f375 drivers/net/acenic.o
> >>>6628  120448   4  127080   1f068 drivers/net/dgrs.o
> >>>  ^^
> >>
> >> Should this be redone to use the existing firmware loading framework to
> >> load the firmware instead?
> >
> > Not in every case.
> >
> > For example, bnx2 maintainer says that driver and
> > firmware are closely tied for his driver. IOW: you upgrade kernel
> > and your NIC is not working anymore.
>
> Firmware may come with a kernel. We have a "install modules", we can also 
> add "install firmware".

Install where? I boot my machine over NFS, and it has no hard drive.

> > Another argument is to make kernel be able to bring up NICs
> > without needing firmware images in initramfs/initrd/hard drive.
> 
> It is not possible to bring up things like FC or WiFi without firmware, 
> what special is in classic NICs?

Nothing.

It is just not (yet?) decreed from The Very Top that all and every
firmware image should be loaded using request_firmware().

Also people may want to gzip something else than firmware.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 21:13, Andi Kleen wrote:
> Denys Vlasenko <[EMAIL PROTECTED]> writes:
> > 
> > I plan to use gzip compression on following drivers' firmware,
> > if patches will be accepted:
> > 
> >textdata bss dec hex filename
> >   17653  109968 240  127861   1f375 drivers/net/acenic.o
> >6628  120448   4  127080   1f068 drivers/net/dgrs.o
> >  ^^
> 
> Just change the makefiles to always install gzip'ed modules
> modutils knows how to unzip them on the fly.

But I compile net/* into bzImage. I like netbooting :)
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
> On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
> 
> > I plan to use gzip compression on following drivers' firmware,
> > if patches will be accepted:
> > 
> >textdata bss dec hex filename
> >   17653  109968 240  127861   1f375 drivers/net/acenic.o
> >6628  120448   4  127080   1f068 drivers/net/dgrs.o
> >  ^^
> 
> Should this be redone to use the existing firmware loading framework to
> load the firmware instead?

Not in every case.

For example, bnx2 maintainer says that driver and
firmware are closely tied for his driver. IOW: you upgrade kernel
and your NIC is not working anymore.

Another argument is to make kernel be able to bring up NICs
without needing firmware images in initramfs/initrd/hard drive.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 18:49, David Miller wrote:
> From: Denys Vlasenko <[EMAIL PROTECTED]>
> Date: Fri, 21 Sep 2007 18:03:55 +0100
> 
> > Do patches look ok to you?
> 
> I'm travelling so I haven't looked closely yet :-)
> 
> Michael can take a look and I'll try to do so as well
> tonight.

Good.

I plan to use gzip compression on following drivers' firmware,
if patches will be accepted:

   textdata bss dec hex filename
  17653  109968 240  127861   1f375 drivers/net/acenic.o
   6628  120448   4  127080   1f068 drivers/net/dgrs.o
 ^^

--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 17:14, David Miller wrote:
> From: Denys Vlasenko <[EMAIL PROTECTED]>
> Date: Fri, 21 Sep 2007 12:01:24 +0100
> 
> > Hi Jeff,
> 
> BNX2 and TG3 patches goes through Michael Chan and myself,
> and I usually merge them in instead of Jeff.

Didn't know that, sorry.

Do patches look ok to you?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] bnx2: move gzip unpacker to zlib

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 12:01, Denys Vlasenko wrote:
> I will move this code
> out of the driver and into zlib in follow-on patch.

No, I won't. I accidentally attached both patches to first email,
you can find it there. Sorry.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
Hi Jeff,

This patch modifies gzip unpacking code in bnx2 driver so that
it does not depend on bnx2 internals. I will move this code
out of the driver and into zlib in follow-on patch.

It can be useful in other drivers which need to store firmwares
or any other relatively big binary blobs - fonts, cursor bitmaps,
whatever.

Patch is run tested by Michael Chan (driver author).

Michael, can you add your ACK?

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
--
vda
diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c
--- linux-2.6.23-rc6/drivers/net/bnx2.c	2007-09-14 00:08:11.0 +0100
+++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c	2007-09-21 11:44:03.0 +0100
@@ -52,6 +52,8 @@
 #include "bnx2_fw.h"
 #include "bnx2_fw2.h"
 
+#define FW_BUF_SIZE		0x8000
+
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
 #define DRV_MODULE_VERSION	"1.6.4"
@@ -2767,89 +2769,45 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(&bp->phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len)
 {
-	if ((bp->gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	int rc;
 
-	if ((bp->strm = kmalloc(sizeof(*bp->strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped */
 
-	bp->strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp->strm->workspace == NULL)
+	rc = -ENOMEM;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm->workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm->next_in = zbuf;
+	strm->avail_in = len;
+	strm->next_out = gunzip_buf;
+	strm->avail_out = sz;
+
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		/* after Z_FINISH, only Z_STREAM_END is "we unpacked it all" */
+		if (rc == Z_STREAM_END)
+			rc = sz - strm->avail_out;
+		else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm->workspace);
 gunzip_nomem3:
-	kfree(bp->strm);
-	bp->strm = NULL;
-
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp->gunzip_buf);
-	bp->gunzip_buf = NULL;
-
-gunzip_nomem1:
-	printk(KERN_ERR PFX "%s: Cannot allocate firmware buffer for "
-			"uncompression.\n", bp->dev->name);
-	return -ENOMEM;
-}
-
-static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp->strm->workspace);
-
-	kfree(bp->strm);
-	bp->strm = NULL;
-
-	if (bp->gunzip_buf) {
-		vfree(bp->gunzip_buf);
-		bp->gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3] & FNAME)
-		while ((zbuf[n++] != 0) && (n < len));
-
-	bp->strm->next_in = zbuf + n;
-	bp->strm->avail_in = len - n;
-	bp->strm->next_out = bp->gunzip_buf;
-	bp->strm->avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp->strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp->strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp->strm->avail_out;
-	*outbuf = bp->gunzip_buf;
-
-	if ((rc != Z_OK) && (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX "%s: Firmware decompression error: %s\n",
-		   bp->dev->name, bp->strm->msg);
-
-	zlib_inflateEnd(bp->strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
 	return rc;
 }
 
@@ -2902,22 +2860,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_
 	/* Load the Text area. */
 	offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base);
 	if (fw->gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw->gz_text, fw->gz_text_len, &text,
- &text_len);
-		if (rc)
-			return rc;
-
-		fw->text = text;
-	}
-	if (fw->gz_text) {
+		u32 *text;
 		int j;
 
+		text = vmalloc(FW_BUF_SIZE);
+		if (!text)
+			return -ENOMEM;
+		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len);
+		if (rc < 0) {
+			vfree(text);
+			return rc;
+		}
 		for (j = 0; j < (fw->text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(fw->text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
 	}
+		vfree(text);
 	}
 
 	/* Load the Data area. */
@@ -2979,27 +2936,27 @@ bnx2_init_cpus(struct bnx2 *bp)
 {
 	struct cpu_reg cpu_reg;
 	struct fw_info *fw;
-	int rc = 0;
+	int rc;
 	void *text;
-	u32 text_len;
-
-

Re: bnx2 dirver's firmware images

2007-09-20 Thread Denys Vlasenko
On Wednesday 19 September 2007 22:43, Michael Chan wrote:
> On Wed, 2007-09-19 at 21:29 +0100, Denys Vlasenko wrote:
> 
> > Are you saying that you successfully run-tested it?
> 
> I've only reviewed the code.  Let's resolve these issues first before
> testing the code.

Please test these two patches.
I updated them according to your comments.
--
vda
diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c
--- linux-2.6.23-rc6/drivers/net/bnx2.c	2007-09-14 00:08:11.0 +0100
+++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c	2007-09-20 15:47:06.0 +0100
@@ -52,6 +52,8 @@
 #include "bnx2_fw.h"
 #include "bnx2_fw2.h"
 
+#define FW_BUF_SIZE		0x8000
+
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
 #define DRV_MODULE_VERSION	"1.6.4"
@@ -2767,89 +2769,44 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(&bp->phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len, void **outbuf)
 {
-	if ((bp->gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	int rc;
 
-	if ((bp->strm = kmalloc(sizeof(*bp->strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped */
 
-	bp->strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp->strm->workspace == NULL)
+	rc = -ENOMEM;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm->workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm->next_in = zbuf;
+	strm->avail_in = len;
+	strm->next_out = gunzip_buf;
+	strm->avail_out = sz;
+
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		if (rc == Z_OK)
+			rc = sz - strm->avail_out;
+		else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm->workspace);
 gunzip_nomem3:
-	kfree(bp->strm);
-	bp->strm = NULL;
-
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp->gunzip_buf);
-	bp->gunzip_buf = NULL;
-
-gunzip_nomem1:
-	printk(KERN_ERR PFX "%s: Cannot allocate firmware buffer for "
-			"uncompression.\n", bp->dev->name);
-	return -ENOMEM;
-}
-
-static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp->strm->workspace);
-
-	kfree(bp->strm);
-	bp->strm = NULL;
-
-	if (bp->gunzip_buf) {
-		vfree(bp->gunzip_buf);
-		bp->gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3] & FNAME)
-		while ((zbuf[n++] != 0) && (n < len));
-
-	bp->strm->next_in = zbuf + n;
-	bp->strm->avail_in = len - n;
-	bp->strm->next_out = bp->gunzip_buf;
-	bp->strm->avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp->strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp->strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp->strm->avail_out;
-	*outbuf = bp->gunzip_buf;
-
-	if ((rc != Z_OK) && (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX "%s: Firmware decompression error: %s\n",
-		   bp->dev->name, bp->strm->msg);
-
-	zlib_inflateEnd(bp->strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
 	return rc;
 }
 
@@ -2902,22 +2859,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_
 	/* Load the Text area. */
 	offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base);
 	if (fw->gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw->gz_text, fw->gz_text_len, &text,
- &text_len);
-		if (rc)
-			return rc;
-
-		fw->text = text;
-	}
-	if (fw->gz_text) {
+		u32 *text;
 		int j;
 
+		text = vmalloc(FW_BUF_SIZE);
+		if (!text)
+			return -ENOMEM;
+		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw->gz_text, fw->gz_text_len);
+		if (rc < 0) {
+			vfree(text);
+			return rc;
+		}
 		for (j = 0; j < (fw->text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(fw->text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
 	}
+		vfree(text);
 	}
 
 	/* Load the Data area. */
@@ -2979,27 +2935,27 @@ bnx2_init_cpus(struct bnx2 *bp)
 {
 	struct cpu_reg cpu_reg;
 	struct fw_info *fw;
-	int rc = 0;
+	int rc;
 	void *text;
-	u32 text_len;
-
-	if ((rc = bnx2_gunzip_init(bp)) != 0)
-		return rc;
 
 	/* Initialize the RV2P processor. */
-	rc = bnx2_gunzip(bp, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1), &t

Re: bnx2 dirver's firmware images

2007-09-19 Thread Denys Vlasenko
On Wednesday 19 September 2007 22:00, Michael Chan wrote:
> On Wed, 2007-09-19 at 09:30 +0100, Denys Vlasenko wrote:
> +   /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
> +* is stripped, 32-bit unpacked size (LE) is prepended instead */
> +   sz = *zbuf++;
> +   sz = (sz << 8) + *zbuf++;
> +   sz = (sz << 8) + *zbuf++;
> +   sz = (sz << 8) + *zbuf++;
> 
> I don't have a problem with removing the gzip header.  It doesn't
> contain very useful information other than a valid header for sanity
> check.  But I don't think we need to arbitrarily add the unpacked size
> in front of the gzipped data.  The driver knows the size (e.g. the size
> of RAM on the chip) and should pass it to the call.  The driver should
> also allocate the memory for the unpacked data instead of allocating the
> memory inside the call and freeing it by the caller.  For example, the
> driver may need to use pci_alloc_consistent() if the firmware is to be
> DMA'ed to the chip.
> 
> Other than that, everything else looks fine.  Thanks.

Are you saying that you successfully run-tested it?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bnx2 dirver's firmware images

2007-09-19 Thread Denys Vlasenko
On Tuesday 18 September 2007 21:05, Michael Chan wrote:
> The bnx2 firmware changes quite frequently.  A new driver quite often
> requires new firmware to work correctly.  Splitting them up makes things
> difficult for the user.

sounds reasonable.

I see that bnx2 has support for unpacking gzipped binary blobs,
and it it the only such net driver (maybe the only such driver
in the entire tree, I didn't check).

This can be very useful for all other firmware images in other drivers.

Last night I prepared a patch which basically separates unpacking
function from bnx2-related code. Can you run-test attached patch?

Meanwhile I will prepare follow-on patch which actually moves this
function out of the driver and into lib/*.

Size difference:

# size */bn*.o
   textdata bss dec hex filename
  54884   816896424  142997   22e95 net/bnx2.o
  55276   818236424  143523   230a3 net.org/bnx2.o

Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
--
vda
--- linux-2.6.23-rc6.org/drivers/net/bnx2.c	Tue Sep 11 22:33:54 2007
+++ linux-2.6.23-rc6.gunzip/drivers/net/bnx2.c	Wed Sep 19 00:01:19 2007
@@ -2767,93 +2767,61 @@
 	spin_unlock_bh(&bp->phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(u8 *zbuf, int len, void **outbuf)
 {
-	if ((bp->gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	void *gunzip_buf;
+	int rc;
+	int sz;
 
-	if ((bp->strm = kmalloc(sizeof(*bp->strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped, 32-bit unpacked size (LE) is prepended instead */
+	sz = *zbuf++;
+	sz = (sz << 8) + *zbuf++;
+	sz = (sz << 8) + *zbuf++;
+	sz = (sz << 8) + *zbuf++;
 
-	bp->strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp->strm->workspace == NULL)
+	rc = -ENOMEM;
+	gunzip_buf = vmalloc(sz);
+	if (gunzip_buf == NULL)
+		goto gunzip_nomem1;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm->workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm->workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm->next_in = zbuf;
+	strm->avail_in = len;
+	strm->next_out = gunzip_buf;
+	strm->avail_out = sz;
 
-gunzip_nomem3:
-	kfree(bp->strm);
-	bp->strm = NULL;
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		if (rc == Z_OK) {
+			rc = sz - strm->avail_out;
+			*outbuf = gunzip_buf;
+		} else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm->workspace);
+gunzip_nomem3:
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp->gunzip_buf);
-	bp->gunzip_buf = NULL;
-
+	if (rc != Z_OK)
+		vfree(gunzip_buf);
 gunzip_nomem1:
-	printk(KERN_ERR PFX "%s: Cannot allocate firmware buffer for "
-			"uncompression.\n", bp->dev->name);
-	return -ENOMEM;
+	return rc; /* returns Z_OK (0) if successful */
 }
 
 static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp->strm->workspace);
-
-	kfree(bp->strm);
-	bp->strm = NULL;
-
-	if (bp->gunzip_buf) {
-		vfree(bp->gunzip_buf);
-		bp->gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3] & FNAME)
-		while ((zbuf[n++] != 0) && (n < len));
-
-	bp->strm->next_in = zbuf + n;
-	bp->strm->avail_in = len - n;
-	bp->strm->next_out = bp->gunzip_buf;
-	bp->strm->avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp->strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp->strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp->strm->avail_out;
-	*outbuf = bp->gunzip_buf;
-
-	if ((rc != Z_OK) && (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX "%s: Firmware decompression error: %s\n",
-		   bp->dev->name, bp->strm->msg);
-
-	zlib_inflateEnd(bp->strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
-	return rc;
-}
-
-static void
 load_rv2p_fw(struct bnx2 *bp, u32 *rv2p_code, u32 rv2p_code_len,
 	u32 rv2p_proc)
 {
@@ -2902,22 +2870,16 @@
 	/* Load the Text area. */
 	offset = cpu_reg->spad_base + (fw->text_addr - cpu_reg->mips_view_base);
 	if (fw->gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw->gz_text, fw->gz_text_len, &text,
- &text_len);
-		if (rc)
-			return rc;
-
-		fw->text = text;
-	}
-	if (fw->gz_text) {
+		u32 *text;
 		int j;
 
+		rc = bnx2_gunzip(fw->gz_text, fw->gz_text_len, (void*) &text);
+		if (rc < 0)
+			return rc;
 	

Re: bnx2 dirver's firmware images

2007-09-18 Thread Denys Vlasenko
On Tuesday 18 September 2007 19:45, Michael Chan wrote:
> We can compress all the different sections of the firmware.  Currently,
> we only compress the biggest chunks and the rest are uncompressed.

> These zeros should compress to almost nothing.  But I agree that the
> firmware is still big.

You don't need to store and fetch zeros at all. You *know* that
they are zeros, right? So do this:

- REG_WR_IND(bp, offset, fw->bss[j]);
+ REG_WR_IND(bp, offset, 0);

--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 16:09, Linus Torvalds wrote:
> On Mon, 10 Sep 2007, Denys Vlasenko wrote:
> > static inline int
> > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
> > {
> > int  return_status = QLA_SUCCESS;
> > unsigned long loop_timeout ;
> > scsi_qla_host_t *pha = to_qla_parent(ha);
> > 
> > /* wait for 5 min at the max for loop to be ready */
> > loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
> > 
> > while ((!atomic_read(&pha->loop_down_timer) &&
> > atomic_read(&pha->loop_state) == LOOP_DOWN) ||
> > atomic_read(&pha->loop_state) != LOOP_READY) {
> > if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
> ...
> > Is above correct or buggy? Correct, because msleep is a barrier.
> > Is it obvious? No.
> 
> It's *buggy*. But it has nothing to do with any msleep() in the loop, or 
> anything else.
> 
> And more importantly, it would be equally buggy even *with* a "volatile" 
> atomic_read().

I am not saying that this code is okay, this isn't the point.
(The code is in fact awful for several more reasons).

My point is that people are confused as to what atomic_read()
exactly means, and this is bad. Same for cpu_relax().
First one says "read", and second one doesn't say "barrier".

This is real code from current kernel which demonstrates this:

"I don't know that cpu_relax() is a barrier already":

drivers/kvm/kvm_main.c
        while (atomic_read(&completed) != needed) {
                cpu_relax();
                barrier();
        }

"I think that atomic_read() is a read from memory and therefore
I don't need a barrier":

arch/x86_64/kernel/crash.c
        msecs = 1000; /* Wait at most a second for the other cpus to stop */
        while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
                mdelay(1);
                msecs--;
        }

Since neither camp seems to give up, I am proposing renaming
them to something less confusing, and make everybody happy.

cpu_relax_barrier()
atomic_value(&x)
atomic_fetch(&x)

I'm not native English speaker, do these sound better?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
> On Mon, 10 Sep 2007 11:56:29 +0100
> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Well, if you insist on having it again:
> > 
> > Waiting for atomic value to be zero:
> > 
> > while (atomic_read(&x))
> > continue;
> > 
> 
> and this I would say is buggy code all the way.
>
> Not from a pure C level semantics, but from a "busy waiting is buggy"
> semantics level and a "I'm inventing my own locking" semantics level.

After inspecting arch/*, I cannot agree with you.
Otherwise almost all major architectures use
"conceptually buggy busy-waiting":

arch/alpha
arch/i386
arch/ia64
arch/m32r
arch/mips
arch/parisc
arch/powerpc
arch/sh
arch/sparc64
arch/um
arch/x86_64

All of the above contain busy-waiting on atomic_read.

Including these loops without barriers:

arch/mips/kernel/smtc.c
while (atomic_read(&idle_hook_initialized) < 1000)
;
arch/mips/sgi-ip27/ip27-nmi.c
while (atomic_read(&nmied_cpus) != num_online_cpus());

[Well maybe num_online_cpus() is a barrier, I didn't check]

arch/sh/kernel/smp.c
if (wait)
while (atomic_read(&smp_fn_call.finished) != (nr_cpus - 1));

Bugs?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 14:38, Denys Vlasenko wrote:
> You are basically trying to educate me how to use atomic properly.
> You don't need to do it, as I am (currently) not a driver author.
> 
> I am saying that people who are already using atomic_read()
> (and who unfortunately did not read your explanation above)
> will still sometimes use atomic_read() as a way to read atomic value
> *from memory*, and will create nasty heisenbugs for you to debug.

static inline int
qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
{
int  return_status = QLA_SUCCESS;
unsigned long loop_timeout ;
scsi_qla_host_t *pha = to_qla_parent(ha);

/* wait for 5 min at the max for loop to be ready */
loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);

while ((!atomic_read(&pha->loop_down_timer) &&
atomic_read(&pha->loop_state) == LOOP_DOWN) ||
atomic_read(&pha->loop_state) != LOOP_READY) {
if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
return_status = QLA_FUNCTION_FAILED;
break;
}
msleep(1000);
if (time_after_eq(jiffies, loop_timeout)) {
return_status = QLA_FUNCTION_FAILED;
break;
}
}
return (return_status);
}

Is above correct or buggy? Correct, because msleep is a barrier.
Is it obvious? No.

static void
qla2x00_rst_aen(scsi_qla_host_t *ha)
{
if (ha->flags.online && !ha->flags.reset_active &&
!atomic_read(&ha->loop_down_timer) &&
!(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) {
do {
clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags);

/*
 * Issue marker command only when we are going to start
 * the I/O.
 */
ha->marker_needed = 1;
} while (!atomic_read(&ha->loop_down_timer) &&
(test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags)));
}
}

Is above correct? I honestly don't know. Correct, because set_bit is
a barrier on _all _memory_? Will it break if set_bit will be changed
to be a barrier only on its operand? Probably yes.

drivers/kvm/kvm_main.c

while (atomic_read(&completed) != needed) {
cpu_relax();
barrier();
}

Obviously author did not know that cpu_relax is already a barrier.
See why I think driver authors will be confused?

arch/x86_64/kernel/crash.c

static void nmi_shootdown_cpus(void)
{
...
msecs = 1000; /* Wait at most a second for the other cpus to stop */
while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
mdelay(1);
msecs--;
}
...
}

Is mdelay(1) a barrier? Yes, because it is a function on x86_64.
Absolutely the same code will be buggy on an arch where
mdelay(1) == udelay(1000), and udelay is implemented
as inline busy-wait.

arch/sparc64/kernel/smp.c

/* Wait for response */
while (atomic_read(&data.finished) != cpus)
cpu_relax();
...later in the same file...
while (atomic_read(&smp_capture_registry) != ncpus)
rmb();

I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply rmb()?
(No it doesn't). Which of those two while loops needs correcting?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 13:22, Kyle Moffett wrote:
> On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
> > On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
> >> On Sun, 9 Sep 2007 19:02:54 +0100
> >> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> >>
> >>> Why is all this fixation on "volatile"? I don't think people want  
> >>> "volatile" keyword per se, they want atomic_read(&x) to _always_  
> >>> compile into an memory-accessing instruction, not register access.
> >>
> >> and ... why is that?  is there any valid, non-buggy code sequence  
> >> that makes that a reasonable requirement?
> >
> > Well, if you insist on having it again:
> >
> > Waiting for atomic value to be zero:
> >
> > while (atomic_read(&x))
> > continue;
> >
> > gcc may happily convert it into:
> >
> > reg = atomic_read(&x);
> > while (reg)
> > continue;
> 
> Bzzt.  Even if you fixed gcc to actually convert it to a busy loop on  
> a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that  
> does the conversion, it may be that the CPU does the caching of the  
> memory value.  GCC has no mechanism to do cache-flushes or memory- 
> barriers except through our custom inline assembly.

CPU can cache the value all right, but it cannot use that cached value
*forever*, it has to react to invalidate cycles on the shared bus
and re-fetch new data.

IOW: atomic_read(&x) which compiles down to memory accessor
will work properly.

> the CPU.  Thirdly, on a large system it may take some arbitrarily  
> large amount of time for cache-propagation to update the value of the  
> variable in your local CPU cache.

Yes, but "arbitrarily large amount of time" is actually measured
in nanoseconds here. Let's say 1000ns max for hundreds of CPUs?

> Also, you   
> probably want a cpu_relax() in there somewhere to avoid overheating  
> the CPU.

Yes, but 
1. CPU shouldn't overheat (in a sense that it gets damaged),
   it will only use more power than needed.
2. cpu_relax() just throttles down my CPU, so it's performance
   optimization only. Wait, it isn't, it's a barrier too.
   Wow, "cpu_relax" is a barrier? How am I supposed to know
   that without reading lkml flamewars and/or header files?

Let's try reading headers. asm-x86_64/processor.h:

#define cpu_relax()   rep_nop()

So, is it a barrier? No clue yet.

/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
__asm__ __volatile__("rep;nop": : :"memory");
}

Comment explicitly says that it is "a good thing" (doesn't say
that it is mandatory) and says NOTHING about barriers!

Barrier-ness is not mentioned and is hidden in "memory" clobber.

Do you think it's obvious enough for average driver writer?
I think not, especially that it's unlikely for him to even start
suspecting that it is a memory barrier based on the "cpu_relax"
name.

> You simply CANNOT use an atomic_t as your sole synchronizing
> primitive, it doesn't work!  You virtually ALWAYS want to use an  
> atomic_t in the following types of situations:
> 
> (A) As an object refcount.  The value is never read except as part of  
> an atomic_dec_return().  Why aren't you using "struct kref"?
> 
> (B) As an atomic value counter (number of processes, for example).   
> Just "reading" the value is racy anyways, if you want to enforce a  
> limit or something then use atomic_inc_return(), check the result,  
> and use atomic_dec() if it's too big.  If you just want to return the  
> statistics then you are going to be instantaneous-point-in-time anyways.
> 
> (C) As an optimization value (statistics-like, but exact accuracy  
> isn't important).
> 
> Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like  
> completions, mutexes, semaphores, spinlocks, krefs, etc.  It's not  
> useful for synchronization, only for keeping track of simple integer  
> RMW values.  Note that atomic_read() and atomic_set() aren't very  
> useful RMW primitives (read-nomodify-nowrite and read-set-zero- 
> write).  Code which assumes anything else is probably buggy in other  
> ways too.

You are basically trying to educate me how to use atomic properly.
You don't need to do it, as I am (currently) not a driver author.

I am saying that people who are already using atomic_read()
(and who unfortunately did not read your explanation above)
will still sometimes use atomic_read() as a way to read atomic value
*from memory*, and will create nasty heisenbugs for you to debug.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
> On Sun, 9 Sep 2007 19:02:54 +0100
> Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> 
> > Why is all this fixation on "volatile"? I don't think
> > people want "volatile" keyword per se, they want atomic_read(&x) to
> > _always_ compile into an memory-accessing instruction, not register
> > access.
> 
> and ... why is that?
> is there any valid, non-buggy code sequence that makes that a
> reasonable requirement?

Well, if you insist on having it again:

Waiting for atomic value to be zero:

while (atomic_read(&x))
continue;

gcc may happily convert it into:

reg = atomic_read(&x);
while (reg)
continue;

Expecting every driver writer to remember that atomic_read is not in fact
a "read from memory" is naive. That won't happen. Face it, majority of
driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;)
The name of the macro is saying that it's a read.
We are confusing users here.

It's doubly confusing that cpy_relax(), which says _nothing_ about barriers
in its name, is actually a barrier you need to insert here.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-09 Thread Denys Vlasenko
On Friday 17 August 2007 17:48, Linus Torvalds wrote:
> 
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> > 
> > That's not obviously just taste to me. Not when the primitive has many
> > (perhaps, the majority) of uses that do not require said barriers. And
> > this is not solely about the code generation (which, as Paul says, is
> > relatively minor even on x86). I prefer people to think explicitly
> > about barriers in their lockless code.
> 
> Indeed.
> 
> I think the important issues are:
> 
>  - "volatile" itself is simply a badly/weakly defined issue. The semantics 
>of it as far as the compiler is concerned are really not very good, and 
>in practice tends to boil down to "I will generate so bad code that 
>nobody can accuse me of optimizing anything away".
> 
>  - "volatile" - regardless of how well or badly defined it is - is purely 
>a compiler thing. It has absolutely no meaning for the CPU itself, so 
>it at no point implies any CPU barriers. As a result, even if the 
>compiler generates crap code and doesn't re-order anything, there's 
>nothing that says what the CPU will do.
> 
>  - in other words, the *only* possible meaning for "volatile" is a purely 
>single-CPU meaning. And if you only have a single CPU involved in the 
>process, the "volatile" is by definition pointless (because even 
>without a volatile, the compiler is required to make the C code appear 
>consistent as far as a single CPU is concerned).
> 
> So, let's take the example *buggy* code where we use "volatile" to wait 
> for other CPU's:
> 
>   atomic_set(&var, 0);
>   while (!atomic_read(&var))
>   /* nothing */;
> 
> 
> which generates an endless loop if we don't have atomic_read() imply 
> volatile.
> 
> The point here is that it's buggy whether the volatile is there or not! 
> Exactly because the user expects multi-processing behaviour, but 
> "volatile" doesn't actually give any real guarantees about it. Another CPU 
> may have done:
> 
>   external_ptr = kmalloc(..);
>   /* Setup is now complete, inform the waiter */
>   atomic_inc(&var);
> 
> but the fact is, since the other CPU isn't serialized in any way, the 
> "while-loop" (even in the presense of "volatile") doesn't actually work 
> right! Whatever the "atomic_read()" was waiting for may not have 
> completed, because we have no barriers!

Why is all this fixation on "volatile"? I don't think
people want "volatile" keyword per se, they want atomic_read(&x) to
_always_ compile into an memory-accessing instruction, not register access.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:06, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Satyam Sharma wrote:
> > But if people do seem to have a mixed / confused notion of atomicity
> > and barriers, and if there's consensus, then as I'd said earlier, I
> > have no issues in going with the consensus (eg. having API variants).
> > Linus would be more difficult to convince, however, I suspect :-)
>
> The confusion may be the result of us having barrier semantics in
> atomic_read. If we take that out then we may avoid future confusions.

I think better name may help. Nuke atomic_read() altogether.

n = atomic_value(x);// doesnt hint as strongly at reading as "atomic_read"
n = atomic_fetch(x);// yes, we _do_ touch RAM
n = atomic_read_uncached(x); // or this

How does that sound?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:15, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Denys Vlasenko wrote:
> > On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > > Satyam Sharma writes:
> > > In the kernel we use atomic variables in precisely those situations
> > > where a variable is potentially accessed concurrently by multiple
> > > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > > timely fashion.  That is what they are for.  Therefore the compiler
> > > must not cache values of atomic variables in registers; each
> > > atomic_read must result in a load and each atomic_set must result in a
> > > store.  Anything else will just lead to subtle bugs.
> >
> > Amen.
>
> A "timely" fashion? One cannot rely on something like that when coding.
> The visibility of updates is insured by barriers and not by some fuzzy
> notion of "timeliness".

But here you do have some notion of time:

while (atomic_read(&x))
continue;

"continue when other CPU(s) decrement it down to zero".
If "read" includes an insn which accesses RAM, you will
see "new" value sometime after other CPU decrements it.
"Sometime after" is on the order of nanoseconds here.
It is a valid concept of time, right?

The whole confusion is about whether atomic_read implies
"read from RAM" or not. I am in a camp which thinks it does.
You are in an opposite one.

We just need a less ambiguous name.

What about this:

/**
 * atomic_read - read atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically reads the value of @v.
 * No compiler barrier implied.
 */
#define atomic_read(v)  ((v)->counter)

+/**
+ * atomic_read_uncached - read atomic variable from memory
+ * @v: pointer of type atomic_t
+ *
+ * Atomically reads the value of @v. This is guaranteed to emit an insn
+ * which accesses memory, atomically. No ordering guarantees!
+ */
+#define atomic_read_uncached(v)  asm_or_volatile_ptr_magic(v)

I was thinking of s/atomic_read/atomic_get/ too, but it implies "taking"
atomic a-la get_cpu()...
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 13:12, Kenn Humborg wrote:
> > On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> > >  static inline void wait_for_init_deassert(atomic_t *deassert)
> > >  {
> > > - while (!atomic_read(deassert));
> > > + while (!atomic_read(deassert))
> > > + cpu_relax();
> > >   return;
> > >  }
> >
> > For less-than-briliant people like me, it's totally non-obvious that
> > cpu_relax() is needed for correctness here, not just to make P4 happy.
> >
> > IOW: "atomic_read" name quite unambiguously means "I will read
> > this variable from main memory". Which is not true and creates
> > potential for confusion and bugs.
>
> To me, "atomic_read" means a read which is synchronized with other
> changes to the variable (using the atomic_XXX functions) in such
> a way that I will always only see the "before" or "after"
> state of the variable - never an intermediate state while a
> modification is happening.  It doesn't imply that I have to
> see the "after" state immediately after another thread modifies
> it.

So you are ok with compiler propagating n1 to n2 here:

n1 += atomic_read(x);
other_variable++;
n2 += atomic_read(x);

without accessing x second time. What's the point? Any sane coder
will say that explicitly anyway:

tmp = atomic_read(x);
n1 += tmp;
other_variable++;
n2 += tmp;

if only for the sake of code readability. Because first code
is definitely hinting that it reads RAM twice, and it's actively *bad*
for code readability when in fact it's not the case!

Locking, compiler and CPU barriers are complicated enough already,
please don't make them even harder to understand.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> Satyam Sharma writes:
> In the kernel we use atomic variables in precisely those situations
> where a variable is potentially accessed concurrently by multiple
> CPUs, and where each CPU needs to see updates done by other CPUs in a
> timely fashion.  That is what they are for.  Therefore the compiler
> must not cache values of atomic variables in registers; each
> atomic_read must result in a load and each atomic_set must result in a
> store.  Anything else will just lead to subtle bugs.

Amen.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Saturday 18 August 2007 05:13, Linus Torvalds wrote:
> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> > No code does (or would do, or should do):
> >
> > x.counter++;
> >
> > on an "atomic_t x;" anyway.
>
> That's just an example of a general problem.
>
> No, you don't use "x.counter++". But you *do* use
>
>   if (atomic_read(&x) <= 1)
>
> and loading into a register is stupid and pointless, when you could just
> do it as a regular memory-operand to the cmp instruction.

It doesn't mean that (volatile int*) cast is bad, it means that current gcc
is bad (or "not good enough"). IOW: instead of avoiding volatile cast,
it's better to fix the compiler.

> And as far as the compiler is concerned, the problem is the 100% same:
> combining operations with the volatile memop.
>
> The fact is, a compiler that thinks that
>
>   movl mem,reg
>   cmpl $val,reg
>
> is any better than
>
>   cmpl $val,mem
>
> is just not a very good compiler.

Linus, in all honesty gcc has many more cases of suboptimal code,
case of "volatile" is just one of many.

Off the top of my head:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

unsigned v;
void f(unsigned A) { v = ((unsigned long long)A) * 365384439 >> (27+32); }

gcc-4.1.1 -S -Os -fomit-frame-pointer t.c

f:
movl$365384439, %eax
mull4(%esp)
movl%edx, %eax <= ?
shrl$27, %eax
movl%eax, v
ret

Why is it moving %edx to %eax?

gcc-4.2.1 -S -Os -fomit-frame-pointer t.c

f:
movl$365384439, %eax
mull4(%esp)
movl%edx, %eax <= ?
xorl%edx, %edx <= ??!
shrl$27, %eax
movl%eax, v
ret

Progress... Now we also zero out %edx afterwards for no apparent reason.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
>
>  static inline void wait_for_init_deassert(atomic_t *deassert)
>  {
> - while (!atomic_read(deassert));
> + while (!atomic_read(deassert))
> + cpu_relax();
>   return;
>  }

For less-than-briliant people like me, it's totally non-obvious that
cpu_relax() is needed for correctness here, not just to make P4 happy.

IOW: "atomic_read" name quite unambiguously means "I will read
this variable from main memory". Which is not true and creates
potential for confusion and bugs.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html