Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Evgeniy Polyakov
On Wed, Aug 23, 2006 at 02:43:50AM +0200, Jari Sundell ([EMAIL PROTECTED]) 
wrote:
> Actually, I didn't miss that, it is an orthogonal issue. A timespec
> timeout parameter for the syscall does not imply the use of timespec
> in any timer event, etc. Nor is there any timespec timer in kqueue's
> struct kevent, which is the only (interface related) thing that will
> be exposed.

void * in structure exported to userspace is forbidden.
long in syscall requires wrapper in per-arch code (although that
workaround _is_ there, it does not mean that broken interface should 
be used).
poll uses milliseconds - it is perfectly ok.
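
To illustrate the objection to pointer and long fields in structures shared with userspace, here is a minimal sketch. Both structs are hypothetical and are not taken from the kevent patches; the sizes in the comments assume common Linux ABIs (ILP32 and LP64).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct: its layout follows the ABI of whoever compiles it.
 * Typically 8 bytes for 32-bit userspace and 16 bytes for 64-bit. */
struct abi_dependent_event {
    void *user_ptr;
    long  timeout;
};

/* Hypothetical struct carrying the same information in fixed-width
 * fields, so 32-bit and 64-bit callers agree on size and offsets. */
struct fixed_width_event {
    uint64_t user_ptr;    /* pointer carried as a 64-bit integer */
    uint32_t timeout_ms;
    uint32_t reserved;    /* explicit padding instead of implicit */
};

/* Compile-time check: the fixed-width layout is 16 bytes everywhere. */
static_assert(sizeof(struct fixed_width_event) == 16,
              "size must not depend on the ABI");
```

A 32-bit program running on a 64-bit kernel would disagree with the kernel about the first layout but not about the second, which is the situation a per-arch wrapper or translation layer otherwise has to paper over.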

> Rakshasa

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Evgeniy Polyakov
On Tue, Aug 22, 2006 at 04:46:19PM -0700, Ulrich Drepper ([EMAIL PROTECTED]) 
wrote:
> DaveM says there are example programs for the current interfaces.  I
> must admit I haven't seen those either.  So if possible, point the world
> to them again.  If you do that now I'll review everything and write up
> my recommendations re the interface before Monday.

Attached typical usage for inode and timer events.
Network AIO was implemented as separated syscalls.

> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
> 



-- 
Evgeniy Polyakov
#include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 

#define _syscall4(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4) \
type name (type1 arg1, type2 arg2, type3 arg3, type4 arg4) \
{\
return syscall(__NR_##name, arg1, arg2, arg3, arg4);\
}

#define _syscall5(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4, \
  type5,arg5) \
type name (type1 arg1,type2 arg2,type3 arg3,type4 arg4,type5 arg5) \
{\
return syscall(__NR_##name, arg1, arg2, arg3, arg4, arg5);\
}

#define _syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4, \
  type5,arg5,type6,arg6) \
type name (type1 arg1,type2 arg2,type3 arg3,type4 arg4,type5 arg5, type6 arg6) \
{\
return syscall(__NR_##name, arg1, arg2, arg3, arg4, arg5, arg6);\
}

_syscall4(int, kevent_ctl, int, arg1, unsigned int, argv2, unsigned int, argv3, 
void *, argv4);
_syscall6(int, kevent_get_events, int, arg1, unsigned int, argv2, unsigned int, 
argv3, unsigned int, argv4, void *, argv5, unsigned, arg6);

#define ulog(f, a...) fprintf(stderr, f, ##a)
#define ulog_err(f, a...) ulog(f ": %s [%d].\n", ##a, strerror(errno), errno)

static void usage(char *p)
{
ulog("Usage: %s -t type -e event -o oneshot -p path -n wait_num -h\n", 
p);
}

static int get_id(int type, char *path)
{
int ret = -1;

switch (type) {
case KEVENT_TIMER:
ret = 3000;
break;
case KEVENT_INODE:
ret = open(path, O_RDONLY);
break;
}

return ret;
}

int main(int argc, char *argv[])
{
int ch, fd, err, type, event, oneshot, i, num, wait_num;
char *path;
char buf[4096];
struct ukevent *uk;
struct timeval tm1, tm2;

path = NULL;
type = event = -1;
oneshot = 0;
wait_num = 10;

while ((ch = getopt(argc, argv, "p:t:e:o:n:h")) > 0) {
switch (ch) {
case 'n':
wait_num = atoi(optarg);
break;
case 'p':
path = optarg;
break;
case 't':
type = atoi(optarg);
break;
case 'e':
event = atoi(optarg);
break;
case 'o':
oneshot = atoi(optarg);
break;
default:
usage(argv[0]);
return -1;
}
}

if (event == -1 || type == -1 || (type == KEVENT_INODE && !path)) {
ulog("You need at least -t -e parameters and -p for inode notifications.\n");
usage(argv[0]);
return -1;
}

fd = kevent_ctl(0, KEVENT_CTL_INIT, 1, NULL);
if (fd == -1) {
ulog_err("Failed create kevent control block");
return -1;
}

memset(buf, 0, sizeof(buf));

gettimeofday(&tm1, NULL);

num = 1;
for (i=0; ievent = event;
uk->type = type;
if (oneshot)
uk->req_flags |= KEVENT_REQ_ONESHOT;
uk->user[0] = i;
uk->id.raw[0] = get_id(uk->type, path);

err = kevent_ctl(fd, KEVENT_CTL_ADD, 1, uk);
if (err < 0) {
ulog_err("Failed to perform control operation: type=%d, event=%d, oneshot=%d", type, event, oneshot);
close(fd);
return err;
}
ulog("%s: err: %d.\n", __func__, err);
if (err) {
ulog("%d: ret_flags: 0x%x, ret_data: %u %d.\n", i, 
uk->ret_flags, uk->ret_data[0], (int)uk->ret_data[1]);
}
}

gettimeofday(&tm2, NULL);

ulog("%08ld.%08ld: Load: diff=%ld usecs.\n", 
tm2.tv_sec, tm2.tv_usec,
((tm2.tv_sec - tm1.tv_sec)*1000000 + (tm2.tv_usec - tm1.tv_usec))/num);

while (1) {
  

Re: [PATCH] locking bug in fib_semantics.c

2006-08-22 Thread Jarek Poplawski
On Tue, Aug 22, 2006 at 12:35:56PM +0200, Jarek Poplawski wrote:
... 
> Hello,
> I've found it at last but on that occasion I've got some
> doubt according to rcu_read_lock and rcu_call treatment:
...

Actually there is one more doubt (a bug really, but
not a very probable one): the proc file is read without any
locking in fib_hash.c, so if somebody uses programs
which do that often, he could have problems while
adding or deleting a route at the wrong time. If this
is ever changed, fz_nent should also be ++/--
under lock, I think. 

Jarek P.
 
PS: linux-2.6.18-rc4


Re: The Proposed Linux kevent API (was: Re: [take12 0/3] kevent: Generic event handling mechanism.)

2006-08-22 Thread Evgeniy Polyakov
On Tue, Aug 22, 2006 at 06:36:07PM -0700, Nicholas Miell ([EMAIL PROTECTED]) 
wrote:
> == The Proposed Linux kevent API == 
> 
> The proposed Linux kevent API is a new unified event handling
> interface, similar in spirit to Windows completion ports and Solaris
> completion ports and similar in fact to the FreeBSD/OS X kqueue
> interface.
> 
> Using a single kernel call, a thread can wait for all possible event
> types that the kernel can generate, instead of past interfaces that
> only allow you to wait for specific subsets of events (e.g. POSIX
> sigevent completions are limited only to AIO completion, timer expiry,
> and the arrival of new messages to a message queue, while epoll_wait
> is just a more efficient method of doing a traditional Unix select or
> poll).
> 
> Instead of evolving the struct sigevent notification methods to allow
> you to continue using standard POSIX interfaces like lio_listio(),
> mq_notify() or timer_create() while queuing completion notifications
> to a kevent completion queue (much the way the Solaris port API is
> designed, or the API proposed by Ulrich Drepper in "The
> Need for Asynchronous, Zero-Copy Network I/O" as found at
> http://people.redhat.com/drepper/newni.pdf ), kevent chooses to
> follow the FreeBSD route and introduce an entirely new and
> incompatible method of requesting and reporting event notifications
> (while also managing to be incompatible with FreeBSD's kqueue).
> 
> This is done through the introduction of two new syscalls and a
> variety of supporting datatypes. The first function, kevent_ctl(), is
> used to create and manipulate kevent queues, while the second,
> kevent_get_events(), is used to wait for new events.
> 
> 
> They operate as follows:
> 
> int kevent_ctl(int fd, unsigned int cmd, unsigned int num, void *arg);
> 
> fd is the file descriptor referring to the kevent queue to
> manipulate. It is ignored if the cmd parameter is KEVENT_CTL_INIT.
> 
> cmd is the requested operation. It can be one of the following:
> 
>   KEVENT_CTL_INIT - create a new kevent queue and return its file
>   descriptor. The fd, num, and arg parameters are ignored.
> 
>   KEVENT_CTL_ADD, KEVENT_CTL_MODIFY, KEVENT_CTL_REMOVE - add new,
>   modify existing, or remove existing event notification
>   requests.
> 
> num is the number of struct ukevent in the array pointed to by arg
> 
> arg is an array of struct ukevent. Why it is of type void* and not 
>   struct ukevent* is a mystery.
> 
> When called, kevent_ctl will carry out the operation specified in the
> cmd parameter.
> 
> 
> int kevent_get_events(int ctl_fd, unsigned int min_nr,
>   unsigned int max_nr, unsigned int timeout,
>   void *buf, unsigned flags)
> 
> ctl_fd is the file descriptor referring to the kevent queue.
> 
> min_nr is the minimum number of completed events that
>kevent_get_events will block waiting for.
> 
> max_nr is the number of struct ukevent in buf.
> 
> timeout is the number of milliseconds to wait before returning less
>   than min_nr events. If this is -1, I *think* it'll wait
>   indefinitely, but I'm not sure that msecs_to_jiffies(-1) ends
>   up being MAX_SCHEDULE_TIMEOUT

You forgot the case of a non-blocking file descriptor.
Here is the comment from the code:

 * In nonblocking mode it returns as many events as possible, but not more than @max_nr.
 * In blocking mode it waits until timeout or if at least @min_nr events are ready.
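
The rule in that comment can be modelled in a few lines of userspace C. This is only a sketch for illustration: the function name and its parameters are hypothetical and nothing here is taken from the kevent code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the quoted comment; not kevent code.
 * ready:     events already completed when the decision is made
 * timed_out: the timeout has expired (only relevant when blocking)
 * Returns the number of events to copy out, or -1 for "keep waiting". */
static int events_to_return(unsigned int ready, unsigned int min_nr,
                            unsigned int max_nr, bool nonblocking,
                            bool timed_out)
{
    unsigned int n;

    assert(min_nr <= max_nr);            /* buffer must fit min_nr events */
    n = ready < max_nr ? ready : max_nr; /* never exceed the buffer */

    if (nonblocking)
        return (int)n;  /* as many as possible, capped at max_nr */
    if (ready >= min_nr || timed_out)
        return (int)n;  /* enough events arrived, or waiting gave up */
    return -1;          /* blocking mode: keep sleeping */
}
```

With 3 ready events, min_nr = 5 and max_nr = 10, the nonblocking case hands back 3 at once, while the blocking case keeps waiting until either two more events arrive or the timeout fires.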

> buf is a pointer to an array of struct ukevent. Why it is of type void*
> and not struct ukevent* is a mystery.
> 
> flags is unused.
> 
> When called, kevent_get_events will wait timeout milliseconds for at
> least min_nr completed events, copying completed struct ukevents to
> buf and deleting any KEVENT_REQ_ONESHOT event requests.
> 
> 
> The bulk of the interface is entirely done through the ukevent struct.
> It is used to add event requests, modify existing event requests,
> specify which event requests to remove, and return completed events.
> 
> struct ukevent contains the following members:
> 
> struct kevent_id id
>This is described as containing the "socket number, file
>descriptor and so on", which I take to mean it contains an fd,
>however for some mysterious reason struct kevent_id contains
>__u32 raw[2] and (for KEVENT_POLL events) the actual fd is
>placed in raw[0] and raw[1] is never mentioned except to
>faithfully copy it around.
> 
>For KEVENT_TIMER events, raw[0] contains a relative time in
>milliseconds and raw[1] is still not used.
> 
>Why the struct member is called "raw" remains a mystery.

If you had followed the previous patchsets, you would have found that there
were network AIO, fs IO and fs-inotify-like notifications.
Some of them use those fields.
I chose two u32 numbers so they could be "union"ed with a pointer, like the user data is.
That pointer should be obtained through Ulrich's dma_allo

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Albert Cahalan

Ulrich Drepper writes:


> I so far also haven't taken the time to look exactly at the
> interface. I plan to do it asap since this is IMO our big chance
> to get it right. I want to have a unifying interface which can
> handle all the different events we need and which come up today
> and tomorrow.  We have to be able to handle not only file
> descriptors and AIO but also timers, signals, message queues
> (OK, they are file descriptors but let's make it official),
> futexes.  I'm probably missing the one or the other thing now.


Yeah, you're missing one. I must warn you, it's tasteless.
You forgot ptrace events. (everybody now: EEEW!)

The wait-related functions in general are interesting.
People like to use a general event mechanism to deal with
threads exiting. Seriously, it would really help with
porting code from that other OS.

How about SysV semaphores?


Re: The Proposed Linux kevent API

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 20:47 -0700, Nicholas Miell wrote:
> On Tue, 2006-08-22 at 20:31 -0700, David Miller wrote:
> > From: Nicholas Miell <[EMAIL PROTECTED]>
> > Date: Tue, 22 Aug 2006 18:36:07 -0700
> > 
> > > Dear DaveM,
> > > 
> > >   Go fuck yourself.
> > 
> > I guess this is the bit that's supposed to make me take you seriously
> > :-)
> 
> Of course. ^_^
> 

Note that when I made this suggestion, I was not literally instructing
you to perform sexual acts upon yourself, especially if such a thing
would be illegal in your jurisdiction (although, IIRC, you moved to
Seattle recently and I'm pretty sure we allow that kind of thing here,
but we don't generally talk about it in public). So, my apologies to
you, Dave, for making such metaphorical instructions.

However, your choice to characterize my technical criticism as "rants"
and "complaints" and your continuous variations on "let's see you do
something better" as if it were a valid response to my objections did
get on my nerves and made it very hard for me to take you seriously. 

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: The Proposed Linux kevent API

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 20:31 -0700, David Miller wrote:
> From: Nicholas Miell <[EMAIL PROTECTED]>
> Date: Tue, 22 Aug 2006 18:36:07 -0700
> 
> > Dear DaveM,
> > 
> > Go fuck yourself.
> 
> I guess this is the bit that's supposed to make me take you seriously
> :-)

Of course. ^_^

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: The Proposed Linux kevent API

2006-08-22 Thread David Miller
From: Nicholas Miell <[EMAIL PROTECTED]>
Date: Tue, 22 Aug 2006 18:36:07 -0700

> Dear DaveM,
> 
>   Go fuck yourself.

I guess this is the bit that's supposed to make me take you seriously
:-)


Re: The Proposed Linux kevent API

2006-08-22 Thread Howard Chu

Nicholas Miell wrote:

> Having looked all this over to figure out what it actually does, I can
> make the following comments:
>
> - there's a distinct lack of any sort of commenting beyond brief
> descriptions of what the occasional function is supposed to do
>
> - the kevent interface is all the horror of the BSD kqueue interface,
> but with no compatibility with the BSD kqueue interface.
>
> - lots of parameters from userspace go unsanitized, although I'm not
> sure if this will actually cause problems. At the very least, there
> should be checks for unknown flags and use of reserved fields, lest
> somebody start using them for their own purposes and then their app
> breaks when a newer version of the kernel starts using them itself.
  



Which reminds me, why go through the trouble of copying the structs back 
and forth between userspace  and kernel space? Why not map the struct 
array and leave it in place, as I proposed back here?
http://groups.google.com/group/linux.kernel/browse_frm/thread/57847cfedb61bdd5/8d02afa60a8f83af?lnk=gst&q=equeue&rnum=1#8d02afa60a8f83af


--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/



Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 16:46 -0700, Ulrich Drepper wrote:
> I so far also haven't taken the time to look exactly at the interface.
> I plan to do it asap since this is IMO our big chance to get it right.
> I want to have a unifying interface which can handle all the different
> events we need and which come up today and tomorrow.  We have to be able
> to handle not only file descriptors and AIO but also timers, signals,
> message queues (OK, they are file descriptors but let's make it
> official), futexes.  I'm probably missing the one or the other thing now.

Are you sure about signals? I thought about that, but they generally
fall into two categories: signals that have to be signals (i.e. SIGILL,
SIGABRT, SIGFPE, SIGSEGV, etc.) and signals that should be replaced by a
queued event notification (SIGALRM, SIGRTMIN-SIGRTMAX).

Of course, that leaves things like SIGTERM, SIGINT, SIGQUIT, etc. so,
uh, nevermind then. Signal redirection to event queues is definitely
needed.

> DaveM says there are example programs for the current interfaces.  I
> must admit I haven't seen those either.  So if possible, point the world
> to them again.  If you do that now I'll review everything and write up
> my recommendations re the interface before Monday.

There's a handful of little test apps at
http://tservice.net.ru/~s0mbre/archive/kevent/ , but they don't work
with the current iteration of the interface. I don't know if there are
others somewhere else.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



The Proposed Linux kevent API (was: Re: [take12 0/3] kevent: Generic event handling mechanism.)

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 16:06 -0700, David Miller wrote:
> With the time you spent writing this long email alone you could have
> worked on either documenting Evgeniy's interfaces or trying to write
> test applications against kevent to validate how useful the interfaces
> are and if there are any problems with them.
> 
> You choose to rant and complain instead of participate.
> 
> Therefore, many of us cannot take you seriously. 

== The Proposed Linux kevent API == 

The proposed Linux kevent API is a new unified event handling
interface, similar in spirit to Windows completion ports and Solaris
completion ports and similar in fact to the FreeBSD/OS X kqueue
interface.

Using a single kernel call, a thread can wait for all possible event
types that the kernel can generate, instead of past interfaces that
only allow you to wait for specific subsets of events (e.g. POSIX
sigevent completions are limited only to AIO completion, timer expiry,
and the arrival of new messages to a message queue, while epoll_wait
is just a more efficient method of doing a traditional Unix select or
poll).

Instead of evolving the struct sigevent notification methods to allow
you to continue using standard POSIX interfaces like lio_listio(),
mq_notify() or timer_create() while queuing completion notifications
to a kevent completion queue (much the way the Solaris port API is
designed, or the API proposed by Ulrich Drepper in "The
Need for Asynchronous, Zero-Copy Network I/O" as found at
http://people.redhat.com/drepper/newni.pdf ), kevent chooses to
follow the FreeBSD route and introduce an entirely new and
incompatible method of requesting and reporting event notifications
(while also managing to be incompatible with FreeBSD's kqueue).

This is done through the introduction of two new syscalls and a
variety of supporting datatypes. The first function, kevent_ctl(), is
used to create and manipulate kevent queues, while the second,
kevent_get_events(), is used to wait for new events.


They operate as follows:

int kevent_ctl(int fd, unsigned int cmd, unsigned int num, void *arg);

fd is the file descriptor referring to the kevent queue to
manipulate. It is ignored if the cmd parameter is KEVENT_CTL_INIT.

cmd is the requested operation. It can be one of the following:

KEVENT_CTL_INIT - create a new kevent queue and return its file
descriptor. The fd, num, and arg parameters are ignored.

KEVENT_CTL_ADD, KEVENT_CTL_MODIFY, KEVENT_CTL_REMOVE - add new,
modify existing, or remove existing event notification
requests.

num is the number of struct ukevent in the array pointed to by arg

arg is an array of struct ukevent. Why it is of type void* and not 
struct ukevent* is a mystery.

When called, kevent_ctl will carry out the operation specified in the
cmd parameter.


int kevent_get_events(int ctl_fd, unsigned int min_nr,
unsigned int max_nr, unsigned int timeout,
void *buf, unsigned flags)

ctl_fd is the file descriptor referring to the kevent queue.

min_nr is the minimum number of completed events that
   kevent_get_events will block waiting for.

max_nr is the number of struct ukevent in buf.

timeout is the number of milliseconds to wait before returning less
than min_nr events. If this is -1, I *think* it'll wait
indefinitely, but I'm not sure that msecs_to_jiffies(-1) ends
up being MAX_SCHEDULE_TIMEOUT
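
One part of the -1 question can be checked without reading the kernel source: because timeout is declared unsigned int, a caller's -1 reaches the syscall as UINT_MAX. The sketch below shows only that conversion. Both helper names are hypothetical, and what msecs_to_jiffies() then does with the value is not modelled here.

```c
#include <limits.h>

/* The timeout parameter is unsigned int, so -1 is converted to
 * UINT_MAX by the usual C integer conversion rules. */
static unsigned int timeout_as_seen_by_kernel(int user_value)
{
    return (unsigned int)user_value;
}

/* Whole days covered by a millisecond count (hypothetical helper). */
static unsigned int ms_to_days(unsigned int ms)
{
    return ms / (1000u * 60u * 60u * 24u);
}
```

With a 32-bit unsigned int, UINT_MAX milliseconds comes to a little over 49 days, so even if the value were taken literally rather than as "wait forever", the wait would be very long but finite.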

buf is a pointer to an array of struct ukevent. Why it is of type void*
and not struct ukevent* is a mystery.

flags is unused.

When called, kevent_get_events will wait timeout milliseconds for at
least min_nr completed events, copying completed struct ukevents to
buf and deleting any KEVENT_REQ_ONESHOT event requests.


The bulk of the interface is entirely done through the ukevent struct.
It is used to add event requests, modify existing event requests,
specify which event requests to remove, and return completed events.

struct ukevent contains the following members:

struct kevent_id id
   This is described as containing the "socket number, file
   descriptor and so on", which I take to mean it contains an fd,
   however for some mysterious reason struct kevent_id contains
   __u32 raw[2] and (for KEVENT_POLL events) the actual fd is
   placed in raw[0] and raw[1] is never mentioned except to
   faithfully copy it around.

   For KEVENT_TIMER events, raw[0] contains a relative time in
   milliseconds and raw[1] is still not used.

   Why the struct member is called "raw" remains a mystery.

__u32 type
  The actual event type, either KEVENT_POLL for fd polling or
  KEVENT_TIMER for timers.

__u32 event
  For events of type KEVENT_POLL, event contains the polling flags
  of interest (i.e. POLLIN, POLLPRI, POLLOUT, POLLERR, POLLHUP,
  POLLNVAL).

  For events of type KEVENT_TIMER, event is ignored.

__u32 req_flags
  Per-event 

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Jari Sundell

On 8/23/06, David Miller <[EMAIL PROTECTED]> wrote:

> > There are system calls that take timespec, so I assume the magic is
> > already available for handling the timeout argument of kevent.
>
> System calls are one thing, they can be translated for these
> kinds of situations.  But this doesn't help, and nothing at
> all can be done, for datastructures exposed to userspace via
> mmap()'d buffers, which is what kevent will be doing.
>
> This is what Alexey is trying to explain to you.


Actually, I didn't miss that, it is an orthogonal issue. A timespec
timeout parameter for the syscall does not imply the use of timespec
in any timer event, etc. Nor is there any timespec timer in kqueue's
struct kevent, which is the only (interface related) thing that will
be exposed.

Rakshasa


ProxyARP and IPSec

2006-08-22 Thread H. Peter Anvin

Hello all,

I am having a puzzlement combining ProxyARP and IPsec.  Specifically, I 
want to take a single address from a local LAN and extend it via IPsec 
to another site.


Unfortunately IPsec tunnels, unlike all other tunnels, don't have 
pseudo-devices associated with them.  I understand this to come from a 
desire for uniformity with the other modes of IPsec, but this is one of 
many cases where it causes problems.


Specifically, Linux will not ProxyARP for an address unless it has a 
route for it, *and* that route either has a DNAT marking or points to a 
different interface than the input interface:


net/ipv4/arp.c:

	} else if (IN_DEV_FORWARD(in_dev)) {
		if ((rt->rt_flags&RTCF_DNAT) ||
		    (addr_type == RTN_UNICAST  && rt->u.dst.dev != dev &&
		     (arp_fwd_proxy(in_dev, rt) || pneigh_lookup(&arp_tbl, &tip, dev, 0)))) {


However, since IPsec tunnels don't have interfaces associated with them, 
the route to the other side of the IPsec tunnel will point to the same 
interface (there is, elsewhere, a security policy associated with the 
address), and this selection will fail.
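
To make the failure concrete, the quoted condition can be flattened into a small predicate. This is a sketch only: the struct and function names are hypothetical, and each flag stands in for one sub-expression of the arp.c test rather than calling any kernel function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the sub-expressions in the arp.c test. */
struct proxy_arp_inputs {
    bool route_is_dnat;           /* rt->rt_flags & RTCF_DNAT */
    bool target_is_unicast;       /* addr_type == RTN_UNICAST */
    bool route_dev_differs;       /* rt->u.dst.dev != dev */
    bool proxy_enabled_or_pneigh; /* arp_fwd_proxy() || pneigh_lookup() */
};

/* Same boolean structure as the quoted condition:
 * DNAT route, or (unicast and different device and proxying allowed). */
static bool would_proxy_arp(const struct proxy_arp_inputs *in)
{
    return in->route_is_dnat ||
           (in->target_is_unicast && in->route_dev_differs &&
            in->proxy_enabled_or_pneigh);
}
```

For an address reached through an IPsec tunnel, route_dev_differs is false because the route points back out the interface the ARP request arrived on, so the predicate is false unless the route carries the DNAT flag.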


Does anyone know of a trick around this issue?

-hpa


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread David Miller
From: "Jari Sundell" <[EMAIL PROTECTED]>
Date: Wed, 23 Aug 2006 02:28:32 +0200

> There are system calls that take timespec, so I assume the magic is
> already available for handling the timeout argument of kevent.

System calls are one thing, they can be translated for these
kinds of situations.  But this doesn't help, and nothing at
all can be done, for datastructures exposed to userspace via
mmap()'d buffers, which is what kevent will be doing.

This is what Alexey is trying to explain to you.


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Jari Sundell

On 8/23/06, Alexey Kuznetsov <[EMAIL PROTECTED]> wrote:

> Let me explain, as a person who did this mistake and deeply
> regrets about this.
>
> F.e. in this case you just cannot use kevents in 32bit application
> on x86_64, unless you add the whole translation layer inside kevent core.
> Even when you deal with plain syscall, translation is a big pain,
> but when you use mmapped buffer, it can be simply impossible.
>
> F.e. my mistake was "unsigned long" in struct tpacket_hdr in linux/if_packet.h.
> It makes use of mmapped packet socket essentially impossible by 32bit
> applications on 64bit archs.


There are system calls that take timespec, so I assume the magic is
already available for handling the timeout argument of kevent.
Although I'm not entirely sure about the kqueue timer interface, there
isn't any reason timespec would need to be written to the mmaped
buffer for the rest.

AFAICS, only struct ukevent is visible to the user, same would go for
kqueue's struct kevent.

Rakshasa


[PATCH] bridge-netfilter: don't overwrite memory outside of skb

2006-08-22 Thread Stephen Hemminger
The bridge netfilter code needs to check for space at the
front of the skb before overwriting; otherwise if skb from
device doesn't have headroom, then it will cause random
memory corruption.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- linux-2.6.17.9.orig/include/linux/netfilter_bridge.h	2006-08-21 11:39:58.0 -0700
+++ linux-2.6.17.9/include/linux/netfilter_bridge.h	2006-08-21 11:40:26.0 -0700
@@ -47,18 +47,26 @@
 #define BRNF_BRIDGED   0x08
 #define BRNF_NF_BRIDGE_PREROUTING  0x10
 
-
 /* Only used in br_forward.c */
-static inline
-void nf_bridge_maybe_copy_header(struct sk_buff *skb)
+static inline int nf_bridge_maybe_copy_header(struct sk_buff *skb)
 {
+   int err;
+
if (skb->nf_bridge) {
if (skb->protocol == __constant_htons(ETH_P_8021Q)) {
+   err = skb_cow(skb, 18);
+   if (err)
+   return err;
memcpy(skb->data - 18, skb->nf_bridge->data, 18);
skb_push(skb, 4);
-   } else
+   } else {
+   err = skb_cow(skb, 16);
+   if (err)
+   return err;
memcpy(skb->data - 16, skb->nf_bridge->data, 16);
+   }
}
+   return 0;
 }
 
 /* This is called by the IP fragmenting code and it ensures there is
--- linux-2.6.17.9.orig/net/bridge/br_forward.c	2006-08-18 09:26:24.0 -0700
+++ linux-2.6.17.9/net/bridge/br_forward.c	2006-08-21 11:40:26.0 -0700
@@ -43,11 +43,15 @@
else {
 #ifdef CONFIG_BRIDGE_NETFILTER
/* ip_refrag calls ip_fragment, doesn't copy the MAC header. */
-   nf_bridge_maybe_copy_header(skb);
+   if (nf_bridge_maybe_copy_header(skb))
+   kfree_skb(skb);
+   else
 #endif
-   skb_push(skb, ETH_HLEN);
+   {
+   skb_push(skb, ETH_HLEN);
 
-   dev_queue_xmit(skb);
+   dev_queue_xmit(skb);
+   }
}
 
return 0;


[PATCH 3/4] bridge-netfilter: simplify nf_bridge_pad

2006-08-22 Thread shemminger
Do some simple optimization on the nf_bridge_pad() function
and don't use magic constants. Eliminate a double call and
the #ifdef'd code for CONFIG_BRIDGE_NETFILTER.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- br-nf.orig/include/linux/netfilter_bridge.h	2006-08-22 16:46:47.0 -0700
+++ br-nf/include/linux/netfilter_bridge.h	2006-08-22 16:52:25.0 -0700
@@ -5,9 +5,8 @@
  */
 
 #include 
-#if defined(__KERNEL__) && defined(CONFIG_BRIDGE_NETFILTER)
 #include 
-#endif
+#include 
 
 /* Bridge Hooks */
 /* After promisc drops, checksum checks. */
@@ -57,16 +56,10 @@
 
 /* This is called by the IP fragmenting code and it ensures there is
  * enough room for the encapsulating header (if there is one). */
-static inline
-int nf_bridge_pad(struct sk_buff *skb)
+static inline int nf_bridge_pad(const struct sk_buff *skb)
 {
-   if (skb->protocol == __constant_htons(ETH_P_IP))
-   return 0;
-   if (skb->nf_bridge) {
-   if (skb->protocol == __constant_htons(ETH_P_8021Q))
-   return 4;
-   }
-   return 0;
+   return (skb->nf_bridge && skb->protocol == htons(ETH_P_8021Q))
+   ? VLAN_HLEN : 0;
 }
 
 struct bridge_skb_cb {
@@ -78,6 +71,7 @@
 extern int brnf_deferred_hooks;
 #else
 #define nf_bridge_maybe_copy_header(skb)   (0)
+#define nf_bridge_pad(skb) (0)
 #endif /* CONFIG_BRIDGE_NETFILTER */
 
 #endif /* __KERNEL__ */
--- br-nf.orig/net/ipv4/ip_output.c 2006-08-22 16:43:41.0 -0700
+++ br-nf/net/ipv4/ip_output.c  2006-08-22 16:51:22.0 -0700
@@ -425,7 +425,7 @@
int ptr;
struct net_device *dev;
struct sk_buff *skb2;
-   unsigned int mtu, hlen, left, len, ll_rs;
+   unsigned int mtu, hlen, left, len, ll_rs, pad;
int offset;
__be16 not_last_frag;
struct rtable *rt = (struct rtable*)skb->dst;
@@ -554,14 +554,13 @@
left = skb->len - hlen; /* Space per frame */
ptr = raw + hlen;   /* Where to start from */
 
-#ifdef CONFIG_BRIDGE_NETFILTER
/* for bridged IP traffic encapsulated inside f.e. a vlan header,
-* we need to make room for the encapsulating header */
-   ll_rs = LL_RESERVED_SPACE_EXTRA(rt->u.dst.dev, nf_bridge_pad(skb));
-   mtu -= nf_bridge_pad(skb);
-#else
-   ll_rs = LL_RESERVED_SPACE(rt->u.dst.dev);
-#endif
+* we need to make room for the encapsulating header
+*/
+   pad = nf_bridge_pad(skb);
+   ll_rs = LL_RESERVED_SPACE_EXTRA(rt->u.dst.dev, pad);
+   mtu -= pad;
+
/*
 *  Fragment the datagram.
 */

--




[PATCH 4/4] bridge-netfilter: debug message fixes

2006-08-22 Thread shemminger
If CONFIG_NETFILTER_DEBUG is enabled, it shouldn't change the
actions of the filtering. The message about skb->dst being NULL
is commonly triggered by dhclient, so it is useless. Make sure all
messages end in newline.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- br-nf.orig/net/bridge/br_netfilter.c	2006-08-22 16:48:56.0 -0700
+++ br-nf/net/bridge/br_netfilter.c 2006-08-22 17:07:28.0 -0700
@@ -718,16 +718,6 @@
else
pf = PF_INET6;
 
-#ifdef CONFIG_NETFILTER_DEBUG
-   /* Sometimes we get packets with NULL ->dst here (for example,
-* running a dhcp client daemon triggers this). This should now
-* be fixed, but let's keep the check around. */
-   if (skb->dst == NULL) {
-   printk(KERN_CRIT "br_netfilter: skb->dst == NULL.");
-   return NF_ACCEPT;
-   }
-#endif
-
nf_bridge = skb->nf_bridge;
nf_bridge->physoutdev = skb->dev;
realindev = nf_bridge->physindev;
@@ -809,7 +799,7 @@
 * keep the check just to be sure... */
if (skb->mac.raw < skb->head || skb->mac.raw + ETH_HLEN > skb->data) {
printk(KERN_CRIT "br_netfilter: Argh!! br_nf_post_routing: "
-  "bad mac.raw pointer.");
+  "bad mac.raw pointer.\n");
goto print_error;
}
 #endif
@@ -827,7 +817,7 @@
 
 #ifdef CONFIG_NETFILTER_DEBUG
if (skb->dst == NULL) {
-   printk(KERN_CRIT "br_netfilter: skb->dst == NULL.");
+   printk(KERN_INFO "br_netfilter post_routing: skb->dst == NULL\n");
goto print_error;
}
 #endif
@@ -864,6 +854,7 @@
}
printk(" head:%p, raw:%p, data:%p\n", skb->head, skb->mac.raw,
   skb->data);
+   dump_stack();
return NF_ACCEPT;
 #endif
 }

--




[PATCH 2/4] bridge-netfilter: code rearrangement for clarity

2006-08-22 Thread shemminger
Cleanup and rearrangement for better style and clarity:
Split the function nf_bridge_maybe_copy_header into two pieces.
Move the copy portion out of line.
Use Ethernet header size macros.
Use the header file to handle CONFIG_BRIDGE_NETFILTER differences.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- br-nf.orig/include/linux/netfilter_bridge.h 2006-08-22 16:45:05.0 -0700
+++ br-nf/include/linux/netfilter_bridge.h  2006-08-22 16:46:47.0 -0700
@@ -47,26 +47,12 @@
 
 
 /* Only used in br_forward.c */
-static inline
-int nf_bridge_maybe_copy_header(struct sk_buff *skb)
+extern int nf_bridge_copy_header(struct sk_buff *skb);
+static inline int nf_bridge_maybe_copy_header(struct sk_buff *skb)
 {
-   int err;
-
-   if (skb->nf_bridge) {
-   if (skb->protocol == __constant_htons(ETH_P_8021Q)) {
-   err = skb_cow(skb, 18);
-   if (err)
-   return err;
-   memcpy(skb->data - 18, skb->nf_bridge->data, 18);
-   skb_push(skb, 4);
-   } else {
-   err = skb_cow(skb, 16);
-   if (err)
-   return err;
-   memcpy(skb->data - 16, skb->nf_bridge->data, 16);
-   }
-   }
-   return 0;
+   if (skb->nf_bridge)
+   return nf_bridge_copy_header(skb);
+   return 0;
 }
 
 /* This is called by the IP fragmenting code and it ensures there is
@@ -90,6 +76,8 @@
 };
 
 extern int brnf_deferred_hooks;
+#else
+#define nf_bridge_maybe_copy_header(skb)   (0)
 #endif /* CONFIG_BRIDGE_NETFILTER */
 
 #endif /* __KERNEL__ */
--- br-nf.orig/net/bridge/br_forward.c  2006-08-22 16:44:04.0 -0700
+++ br-nf/net/bridge/br_forward.c   2006-08-22 16:48:12.0 -0700
@@ -38,13 +38,10 @@
if (packet_length(skb) > skb->dev->mtu && !skb_is_gso(skb))
kfree_skb(skb);
else {
-#ifdef CONFIG_BRIDGE_NETFILTER
/* ip_refrag calls ip_fragment, doesn't copy the MAC header. */
if (nf_bridge_maybe_copy_header(skb))
kfree_skb(skb);
-   else
-#endif
-   {
+   else {
skb_push(skb, ETH_HLEN);
 
dev_queue_xmit(skb);
--- br-nf.orig/net/bridge/br_netfilter.c 2006-08-22 16:44:04.0 -0700
+++ br-nf/net/bridge/br_netfilter.c 2006-08-22 16:48:56.0 -0700
@@ -127,14 +127,37 @@
 
 static inline void nf_bridge_save_header(struct sk_buff *skb)
 {
-int header_size = 16;
+int header_size = ETH_HLEN;
 
if (skb->protocol == htons(ETH_P_8021Q))
-   header_size = 18;
+   header_size += VLAN_HLEN;
 
memcpy(skb->nf_bridge->data, skb->data - header_size, header_size);
 }
 
+/*
+ * When forwarding bridge frames, we save a copy of the original
+ * header before processing.
+ */
+int nf_bridge_copy_header(struct sk_buff *skb)
+{
+   int err;
+int header_size = ETH_HLEN;
+
+   if (skb->protocol == htons(ETH_P_8021Q))
+   header_size += VLAN_HLEN;
+
+   err = skb_cow(skb, header_size);
+   if (err)
+   return err;
+
+   memcpy(skb->data - header_size, skb->nf_bridge->data, header_size);
+
+   if (skb->protocol == htons(ETH_P_8021Q))
+   __skb_push(skb, VLAN_HLEN);
+   return 0;
+}
+
 /* PF_BRIDGE/PRE_ROUTING */
 /* Undo the changes made for ip6tables PREROUTING and continue the
  * bridge PRE_ROUTING hook. */

--
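The header-size rule shared by nf_bridge_save_header and
nf_bridge_copy_header in the patch above can be sketched in user space.
This is only an illustration, not the kernel code: saved_header_size is a
hypothetical helper, the constants are copied by hand from the kernel
headers, and the EtherType is taken in host byte order so the sketch does
not need htons. Since ETH_HLEN is 14, the non-VLAN size differs from the
literal 16 used before the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in <linux/if_ether.h> and <linux/if_vlan.h>. */
#define ETH_HLEN    14      /* destination MAC + source MAC + EtherType */
#define VLAN_HLEN   4       /* 802.1Q tag */
#define ETH_P_8021Q 0x8100  /* EtherType announcing an 802.1Q tag */

/*
 * Hypothetical helper: number of link-layer header bytes to save or
 * restore for a frame with the given EtherType (host byte order).
 */
static inline size_t saved_header_size(uint16_t ethertype)
{
	size_t header_size = ETH_HLEN;

	if (ethertype == ETH_P_8021Q)
		header_size += VLAN_HLEN;
	return header_size;
}
```

With these values a plain IPv4 frame (EtherType 0x0800) needs 14 bytes
and a VLAN-tagged frame needs 18.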




[PATCH 0/4] bridge-netfilter fixes

2006-08-22 Thread shemminger
This set of patches fixes issues with the bridge netfilter code.
The first patch is for 2.6.18 and fixes a crash. The later patches
can be deferred; they just clean up style, usage, etc.

--




[PATCH 1/4] bridge-netfilter: memory corruption fix

2006-08-22 Thread shemminger
The bridge-netfilter code will overwrite memory if there is not enough
headroom in the skb to save the header.  This first showed up when using
Xen with the sky2 driver, which doesn't allocate the extra space.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- br-nf.orig/include/linux/netfilter_bridge.h 2006-08-22 16:43:41.0 -0700
+++ br-nf/include/linux/netfilter_bridge.h  2006-08-22 16:45:05.0 -0700
@@ -48,15 +48,25 @@
 
 /* Only used in br_forward.c */
 static inline
-void nf_bridge_maybe_copy_header(struct sk_buff *skb)
+int nf_bridge_maybe_copy_header(struct sk_buff *skb)
 {
+   int err;
+
if (skb->nf_bridge) {
if (skb->protocol == __constant_htons(ETH_P_8021Q)) {
+   err = skb_cow(skb, 18);
+   if (err)
+   return err;
memcpy(skb->data - 18, skb->nf_bridge->data, 18);
skb_push(skb, 4);
-   } else
+   } else {
+   err = skb_cow(skb, 16);
+   if (err)
+   return err;
memcpy(skb->data - 16, skb->nf_bridge->data, 16);
+   }
}
+   return 0;
 }
 
 /* This is called by the IP fragmenting code and it ensures there is
--- br-nf.orig/net/bridge/br_forward.c  2006-08-22 16:43:41.0 -0700
+++ br-nf/net/bridge/br_forward.c   2006-08-22 16:44:04.0 -0700
@@ -40,11 +40,15 @@
else {
 #ifdef CONFIG_BRIDGE_NETFILTER
/* ip_refrag calls ip_fragment, doesn't copy the MAC header. */
-   nf_bridge_maybe_copy_header(skb);
+   if (nf_bridge_maybe_copy_header(skb))
+   kfree_skb(skb);
+   else
 #endif
-   skb_push(skb, ETH_HLEN);
+   {
+   skb_push(skb, ETH_HLEN);
 
-   dev_queue_xmit(skb);
+   dev_queue_xmit(skb);
+   }
}
 
return 0;

--
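The failure mode fixed above, copying a header in front of the payload
without checking that the room exists, can be modelled outside the kernel.
The sketch below is an assumption-laden stand-in: struct buf,
buf_headroom and restore_header are hypothetical names, and where the
patch calls skb_cow() (which may also reallocate to create the headroom)
this version simply refuses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two sk_buff pointers that matter here. */
struct buf {
	unsigned char *head;   /* start of the allocation */
	unsigned char *data;   /* start of the current payload */
};

/* Bytes available in front of the payload. */
static inline size_t buf_headroom(const struct buf *b)
{
	return (size_t)(b->data - b->head);
}

/*
 * Copy a saved header directly in front of the payload.
 * Returns 0 on success and -1 when the headroom is too small, in which
 * case nothing is written. The unpatched code did the memcpy
 * unconditionally, which is what corrupted memory.
 */
static inline int restore_header(struct buf *b, const unsigned char *saved,
				 size_t len)
{
	if (buf_headroom(b) < len)
		return -1;
	memcpy(b->data - len, saved, len);
	return 0;
}
```

As in the patched br_forward.c, the caller is expected to drop the packet
when the call fails rather than transmit it.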




[PATCH] bcm43xx-softmac - set correct value in mac_suspended for ifdown/ifup sequence

2006-08-22 Thread Larry Finger

John,

Please apply this to wireless-2.6.

Michael - bcm43xx-d80211 probably needs this as well.

Larry

---

When bcm43xx-softmac is given an ifdown/ifup sequence, the value for bcm->mac_suspended ends up 
wrong, which leads to a large number of assert(bcm->mac_suspended>=0) messages. This one-line patch 
fixes this problem.


Signed-Off-By: Larry Finger <[EMAIL PROTECTED]>

index b095f3c..f532f3c 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3484,6 +3484,7 @@ int bcm43xx_select_wireless_core(struct
bcm43xx_macfilter_clear(bcm, BCM43xx_MACFILTER_ASSOC);
bcm43xx_macfilter_set(bcm, BCM43xx_MACFILTER_SELF, (u8 *)(bcm->net_dev->dev_addr));
bcm43xx_security_init(bcm);
+   bcm->mac_suspended = 1;
ieee80211softmac_start(bcm->net_dev);

/* Let's go! Be careful after enabling the IRQs.


Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Indan Zupancic
On Wed, August 23, 2006 1:32, Alexey Kuznetsov said:

>> Isn't a socket freed until all skb are handled? In which case the limit
>> on the number of open files limits the total memory usage? (Same as with
>> streaming sockets?)
>
> Alas. Number of closed sockets is not limited. Actually, it is limited
> by sk_max_ack_backlog*max_files, which is a lot.

Hmm... So setting sk_max_ack_backlog to 1 makes it limited by max_files,
which should make the worst case the same as for streaming sockets, right?

> The problem is specific for unconnected datagram sockets
> (predicate unix_peer(other) != sk)

Doesn't that mean that both sockets are connected to each other? I mean,
if only this socket connects to the other the above check isn't true.

Greetings,

Indan




Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Ulrich Drepper
I so far also haven't taken the time to look exactly at the interface.
I plan to do it asap since this is IMO our big chance to get it right.
I want to have a unifying interface which can handle all the different
events we need and which come up today and tomorrow.  We have to be able
to handle not only file descriptors and AIO but also timers, signals,
message queues (OK, they are file descriptors but let's make it
official), futexes.  I'm probably missing the one or the other thing now.

DaveM says there are example programs for the current interfaces.  I
must admit I haven't seen those either.  So if possible, point the world
to them again.  If you do that now I'll review everything and write up
my recommendations re the interface before Monday.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Andrew Morton
On Tue, 22 Aug 2006 15:17:47 -0700 (PDT)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Tue, 22 Aug 2006 15:01:44 -0700
> 
> > If there _is_ something wrong with kqueue then let us identify those
> > weaknesses and then diverge.
> 
> Evgeniy already enumerated this, both on his web site and in the
> current thread.

<Looks at http://tservice.net.ru/~s0mbre/, fails.  Looks in changelogs, also fails>

Best I can find is
http://tservice.net.ru/~s0mbre/blog/devel/kevent/index.html, and that
doesn't cover these things.

At some stage we're going to need to tell Linus (for example) what we've
done and why we did it.  I don't know how to do that.



Re: [PATCH] IP1000A: IC Plus update 2006-08-22

2006-08-22 Thread Francois Romieu
Jesse Huang <[EMAIL PROTECTED]> :
> Dear All:
> I have regenerated this patch from:
> git://git.kernel.org/pub/scm/linux/kernel/git/penberg/netdev-ipg-2.6.git
> 
> And submitted those modifications as one patch.

The suggestion was probably to submit the whole driver as one patch
to akpm for wider testing when it is ready (it still is a bit rough
imho). Unrelated changes make more sense in incremental, isolated
patches as you used to submit before.

I have done some surgery to apply your previous patchset with its
former descriptive commit messages, plus your recent coding-style
changes and a few more.

You'll find it either in branch 'netdev-ipg' at:
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git
or as a series of patches at:
http://www.fr.zoreil.com/linux/2.6.x/2.6.18-rc4/ip1000

The series of patches comes straight from the (now old) git tree.
It applies correctly against 2.6.18-git-of-the-day.

The result should not be too far from penberg git + your all-in-one
patch but I have not checked it yet. I'd appreciate it if you could
review it.

-- 
Ueimor


Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Alexey Kuznetsov
Hello!

> Isn't a socket freed until all skb are handled? In which case the limit
> on the number of open files limits the total memory usage? (Same as with
> streaming sockets?)

Alas. Number of closed sockets is not limited. Actually, it is limited
by sk_max_ack_backlog*max_files, which is a lot.

The problem is specific for unconnected datagram sockets
(predicate unix_peer(other) != sk)


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Randy.Dunlap
On Tue, 22 Aug 2006 15:58:12 -0700 Nicholas Miell wrote:

> On Tue, 2006-08-22 at 14:37 -0700, Randy.Dunlap wrote:
> > On Tue, 22 Aug 2006 14:13:02 -0700 Nicholas Miell wrote:
> > 
> > > On Wed, 2006-08-23 at 00:16 +0400, Evgeniy Polyakov wrote:
> > > > On Tue, Aug 22, 2006 at 12:57:38PM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> > > > > On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> > > > > Of course, since you already know how all this stuff is supposed to
> > > > > work, you could maybe write it down somewhere?
> > > > 
> > > > I will write documentation, but as you can see some interfaces are
> > > > changed.
> > > 
> > > Thanks; rapidly changing interfaces need good documentation even more
> > > than stable interfaces simply because reverse engineering the intended
> > > API from a changing implementation becomes even more difficult.
> > 
> > OK, I don't quite get it.
> > Can you be precise about what you would like?
> > 
> > a.  good documentation
> > b.  a POSIX API
> > c.  a Windows-compatible API
> > d.  other?
> > 
> > and we won't make you use any of this code.
> 
> I want something that I can be confident won't be replaced again in two
> years because nobody noticed problems with the old API design or they're
> just feeling very NIH with their snazzy new feature.
> 
> Maybe then we won't end up with another in the { signal/sigaction,
> waitpid/wait4, select/pselect, poll/ppoll, msgrcv, mq_receive,
> io_getevents, aio_suspend/aio_return, epoll_wait, inotify read,
> kevent_get_events } collection -- or do you like having a maze of
> twisted interfaces, all subtly different and none supporting the
> complete feature set?
> 
> Good documentation giving enough detail to judge the design and an API
> that fits with the current POSIX API (at least, the parts that everybody
> agrees don't suck) goes a long way toward assuaging my fears that this
> won't just be another waste of effort, doomed to be replaced by the Next
> Great Thing (We Really Mean It This Time!) in unified event loop API
> design or whatever other interface somebody happens to be working on.
> 
> ---

OK, thank you for elaborating.

I suppose that I am more {cynical, sarcastic,
practical, pragmatic}.  I don't have a crystal ball for 2 years
out and I don't know anyone who does.

IMO we do the best that we can given some human constraints
and probably some marketplace constraints (like ship something
instead of playing with it for 5 years before shipping it).


> This is made extraordinarily difficult by the fact kernel people don't
> even agree themselves on what APIs should look like anyway and Linus
> won't take a stand on the issue -- people with influence are
> simultaneously arguing things like:
> 
> - ioctls are bad because they aren't typesafe and you should use
> syscalls instead because they are typesafe
> 
> - ioctls are good, because they're much easier to add than syscalls,
> type safety can be supplied by the library wrapper, and syscalls are a
> (relatively) scarce resource, harder to wire up in the first place, and
> are more difficult to make optional or remove entirely if you decide
> they were a stupid idea.

Yes, I was recently part of that argument in Ottawa.

> - multiplexors are bad because they're too complex or not typesafe
> 
> - multiplexors are good because they save syscall slots or ioctl numbers
> and the library wrapper provides the typesafety anyway.

Multiplexors have already lost AFAIK.  Unless someone changes their
mind.  Which happens and will continue to happen.

> - instead of syscalls or ioctls, you should create a whole new
> filesystem that has a bunch of magic files that you read from and write
> to in order to talk to the kernel

Yep.  Some people like that one.  Not everyone.

> - filesystem interfaces are bad, because they take more effort to
> write than a syscall or a ioctl and nobody seems to know how to maintain
> and evolve a filesystem-based ABI or make them easy to use outside of a
> fragile shell script (see: sysfs)

Ack.

> - that everything in those custom filesystems should be ASCII strings and
> nobody needs an actual grammar describing how to parse them, we can just
> break userspace whenever we feel like it

sysfs requires one value per file.  Little parsing required.
But I don't know how to capture atomic values from N files with sysfs.

> - that everything in those custom filesystems should be C structs, and
> screw the shell scripts

Hm, I don't recall that one.

> - new filesystem metadata should be exposed by:
>   - xattrs
>   - ioctls
>   - new syscalls
>   or
>   - named streams/forks/not-xattrs
>   and three out of four of these suggestions are completely wrong for
>   some critical reason
> 
> - meanwhile, the networking folks are doing everything via AF_NETLINK
> sockets instead of syscalls or ioctl or whatever, I guess because the
> network stack is what's most familiar to them

I sympathize with you on that 

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Alexey Kuznetsov
Hello!

> >No way - timespec uses long.
> 
> I must have missed that discussion. Please enlighten me in what regard
> using an opaque type with lower resolution is preferable to a type
> defined in POSIX for this sort of purpose.

Let me explain, as a person who made this mistake and deeply
regrets it.

F.e. in this case you just cannot use kevents in a 32bit application
on x86_64, unless you add a whole translation layer inside the kevent core.
Even when you deal with a plain syscall, translation is a big pain,
but when you use an mmapped buffer, it can be simply impossible.

F.e. my mistake was "unsigned long" in struct tpacket_hdr in linux/if_packet.h.
It makes the mmapped packet socket essentially unusable by 32bit
applications on 64bit archs.

Alexey
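The layout problem described above can be shown with two small structs.
This is a hedged sketch rather than the real struct tpacket_hdr:
event_long and event_fixed are hypothetical names chosen for
illustration. The first follows the ABI's idea of "long"; the second
uses fixed-width fields and explicit padding, so a 32-bit process and a
64-bit kernel would see the same bytes in a shared mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header using "long". Its size and the offset of len
 * depend on the ABI: with a 4-byte long the struct is 8 bytes, with an
 * 8-byte long it is 16.
 */
struct event_long {
	unsigned long status;
	unsigned int  len;
};

/*
 * Same information with fixed-width fields. The explicit pad keeps the
 * size at 16 even on ABIs that align uint64_t to only 4 bytes.
 */
struct event_fixed {
	uint64_t status;
	uint32_t len;
	uint32_t pad;
};

/* Checked at compile time, whatever the target ABI is. */
_Static_assert(sizeof(struct event_fixed) == 16, "layout must not vary");
_Static_assert(offsetof(struct event_fixed, len) == 8, "layout must not vary");

/* Size of the ABI-dependent variant, for comparison at run time. */
static inline size_t event_long_size(void)
{
	return sizeof(struct event_long);
}
```

Building the same file with -m32 and -m64 is one way to see
event_long_size() change while event_fixed stays put.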


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread David Miller
From: Nicholas Miell <[EMAIL PROTECTED]>
Date: Tue, 22 Aug 2006 15:58:12 -0700

> Honestly, somebody with enough clout to make it stick needs to write out
> a spec describing what new kernel interfaces should look like and how
> they should fit in with existing interfaces.

With the time you spent writing this long email alone you could have
worked on either documenting Evgeniy's interfaces or trying to write
test applications against kevent to validate how useful the interfaces
are and if there are any problems with them.

You choose to rant and complain instead of participate.

Therefore, many of us cannot take you seriously.



Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 14:25 -0700, David Miller wrote:
> From: Nicholas Miell <[EMAIL PROTECTED]>
> Date: Tue, 22 Aug 2006 14:13:40 -0700
> 
> > And how is the quality of the work to be judged if the work isn't
> > commented, documented and explained, especially the userland-visible
> > parts that *cannot* *ever* *be* *changed* *or* *removed* once they're in
> > a stable kernel release?
> 
> Are you even willing to look at the collection of example applications
> Evgeniy wrote against this API?
> 
> That is the true test of a set of interfaces, what happens when you
> try to actually use them in real programs.
> 
> Everything else is fluff, including standards and "documentation".
> 
> He even bothered to benchmark things, and post associated graphs and
> performance analysis during the course of development.

I wasn't aware that any of these existed, he didn't mention them in this
patch series. Having now looked, all I've managed to find are a series
of simple example apps that no longer work because of API changes.

Also, if you've been paying attention, you'll note that I've never
criticized the performance or quality of the underlying kevent
implementation -- as best I can tell, aside from some lockdep complaints
(which, afaik, are the result of lockdep's limitations rather than
problems with kevent), the internals of kevent are excellent.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 14:37 -0700, Randy.Dunlap wrote:
> On Tue, 22 Aug 2006 14:13:02 -0700 Nicholas Miell wrote:
> 
> > On Wed, 2006-08-23 at 00:16 +0400, Evgeniy Polyakov wrote:
> > > On Tue, Aug 22, 2006 at 12:57:38PM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> > > > On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> > > > Of course, since you already know how all this stuff is supposed to
> > > > work, you could maybe write it down somewhere?
> > > 
> > > I will write documentation, but as you can see some interfaces are
> > > changed.
> > 
> > Thanks; rapidly changing interfaces need good documentation even more
> > than stable interfaces simply because reverse engineering the intended
> > API from a changing implementation becomes even more difficult.
> 
> OK, I don't quite get it.
> Can you be precise about what you would like?
> 
> a.  good documentation
> b.  a POSIX API
> c.  a Windows-compatible API
> d.  other?
> 
> and we won't make you use any of this code.

I want something that I can be confident won't be replaced again in two
years because nobody noticed problems with the old API design or they're
just feeling very NIH with their snazzy new feature.

Maybe then we won't end up with another in the { signal/sigaction,
waitpid/wait4, select/pselect, poll/ppoll, msgrcv, mq_receive,
io_getevents, aio_suspend/aio_return, epoll_wait, inotify read,
kevent_get_events } collection -- or do you like having a maze of
twisted interfaces, all subtly different and none supporting the
complete feature set?

Good documentation giving enough detail to judge the design and an API
that fits with the current POSIX API (at least, the parts that everybody
agrees don't suck) goes a long way toward assuaging my fears that this
won't just be another waste of effort, doomed to be replaced by the Next
Great Thing (We Really Mean It This Time!) in unified event loop API
design or whatever other interface somebody happens to be working on.

---

This is made extraordinarily difficult by the fact kernel people don't
even agree themselves on what APIs should look like anyway and Linus
won't take a stand on the issue -- people with influence are
simultaneously arguing things like:

- ioctls are bad because they aren't typesafe and you should use
syscalls instead because they are typesafe

- ioctls are good, because they're much easier to add than syscalls,
type safety can be supplied by the library wrapper, and syscalls are a
(relatively) scarce resource, harder to wire up in the first place, and
are more difficult to make optional or remove entirely if you decide
they were a stupid idea.

- multiplexors are bad because they're too complex or not typesafe

- multiplexors are good because they save syscall slots or ioctl numbers
and the library wrapper provides the typesafety anyway.

- instead of syscalls or ioctls, you should create a whole new
filesystem that has a bunch of magic files that you read from and write
to in order to talk to the kernel

- filesystem interfaces are bad, because they take more effort to
write than a syscall or a ioctl and nobody seems to know how to maintain
and evolve a filesystem-based ABI or make them easy to use outside of a
fragile shell script (see: sysfs)

- that everything in those custom filesystems should be ASCII strings and
nobody needs an actual grammar describing how to parse them, we can just
break userspace whenever we feel like it

- that everything in those custom filesystems should be C structs, and
screw the shell scripts

- new filesystem metadata should be exposed by:
  - xattrs
  - ioctls
  - new syscalls
  or
  - named streams/forks/not-xattrs
  and three out of four of these suggestions are completely wrong for
  some critical reason

- meanwhile, the networking folks are doing everything via AF_NETLINK
sockets instead of syscalls or ioctl or whatever, I guess because the
network stack is what's most familiar to them

- and there's the usual arguments about typedefs versus bare struct
names, #defines versus enums, returning 0 on success vs. 0 on failure,
and lots of other piddly stupid stuff that somebody just needs to say
"this is how it's done and no arguing" about.

Honestly, somebody with enough clout to make it stick needs to write out
a spec describing what new kernel interfaces should look like and how
they should fit in with existing interfaces.

It'd probably make Evgeniy's life easier if you could just point at the
interface guidelines and say "you did this wrong" instead of random
people telling him to change his design and random other people telling
him to change it back.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Indan Zupancic
On Wed, August 23, 2006 0:34, Alexey Kuznetsov said:
>> > It is the only protection of committing infinite amount of memory to a
>> > socket.
>>
>> Doesn't the "if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)" check in
>> sock_alloc_send_pskb() limit things already?
>
> Unfortunately, it does not. You can open a socket, send
> something to a selected victim, close it, and repeat this
> until receiver accumulates enough of skbs to kill the system.

Well, it seems the devil is in the details, as usual.

Isn't a socket freed until all skb are handled? In which case the limit
on the number of open files limits the total memory usage? (Same as with
streaming sockets?)

Greetings,

Indan




Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Jari Sundell

On 8/22/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> Word "polling" really confuses me here, but now I understand you.
> Such an approach actually has unresolved issues - consider for
> example a situation when all provided events are ready immediately - what
> should be returned (as far as I recall they are always added into kqueue in
> BSDs before they start to be checked, so old events will be returned
> first)? And currently ready events can be read through the mapped buffer
> without any syscall at all.
> And a Linux syscall is much cheaper than BSD's one.
> Considering those issues (especially the mapped buffer), it really is not
> worth the interface complexity.


There's no reason I can see that kqueue's kevent should not be able to
check an mmaped buffer as in your implementation, after having passed
any filter changes to the kernel.

I'm not sure if I read you correctly, but the situation where all
events are ready immediately is not a problem. Only the delta is
passed with the kevent call, so old events will still be first in the
queue. And as long as the user doesn't randomize the order of the
changelist and passes the changelist with each kevent call, the
resulting order in which changes are received will be no different
from using individual system calls.

If there's some very specific reason the user needs to retain the
order in which events happen in the interval between adding it to the
changelist and calling kevent, he may decide to call kevent
immediately without asking for any events.


> First of all, there are completely different types.
> Design of the in-kernel part is very different too.


The question I'm asking is not whether kqueue can fit this
implementation, but rather if it is possible to make the
implementation fit kqueue. I can't really see any fundamental
differences, merely implementation details. Maybe I'm just unfamiliar
with the requirements.


> > BSD's kqueue:
> >
> > struct kevent {
> >  uintptr_t ident;    /* identifier for this event */
> >  short     filter;   /* filter for event */
> >  u_short   flags;    /* action flags for kqueue */
> >  u_int     fflags;   /* filter flag value */
> >  intptr_t  data;     /* filter data value */
> >  void      *udata;   /* opaque user data identifier */
> > };
>
> From your description there is a serious problem with arches which
> support different widths of the pointer. I do not have sources of any BSD
> right now, but if it is really like you've described, it can not be used
> in Linux at all.


Are you referring to udata or data? I'll assume the latter as the
former is more of a restriction on user-space. intptr_t is required to
be safely convertible to a void*, so I don't see what the problem
would be.
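The conversion guarantee mentioned above can be checked directly. A
minimal sketch, assuming a hypothetical struct conn as the per-event
user state: C99 says a valid void * converted to intptr_t and back
compares equal to the original. That guarantee holds within a single
ABI; it does not make a 32-bit process and a 64-bit kernel agree on how
wide the field is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state an application wants back per event. */
struct conn {
	int fd;
};

/* Pack a pointer into an integer field such as kevent's data member. */
static inline intptr_t to_cookie(struct conn *c)
{
	return (intptr_t)(void *)c;
}

/* Recover the pointer; valid only in the address space that packed it. */
static inline struct conn *from_cookie(intptr_t v)
{
	return (struct conn *)(void *)v;
}
```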


> No way - timespec uses long.


I must have missed that discussion. Please enlighten me in what regard
using an opaque type with lower resolution is preferable to a type
defined in POSIX for this sort of purpose. Considering the extra code
I need to write to properly handle having just ms resolution, it
better be something fundamentally broken. ;)

Rakshasa
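The "extra code" for millisecond resolution amounts to a conversion like
the one below. This is a sketch under stated assumptions, not part of
any proposed interface: timespec_to_ms is a hypothetical helper, the
input is assumed to be non-negative and normalized (tv_nsec below one
second), and the result is rounded up so a wait is never shorter than
requested and clamped to INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/*
 * Convert a timespec to whole milliseconds for a poll()-style timeout.
 * Rounds up, and returns INT_MAX when the value does not fit.
 */
static inline int timespec_to_ms(const struct timespec *ts)
{
	long long ms;

	/* Clamp before multiplying so the arithmetic cannot overflow. */
	if (ts->tv_sec > INT_MAX / 1000)
		return INT_MAX;

	ms = (long long)ts->tv_sec * 1000
	     + (ts->tv_nsec + 999999L) / 1000000L;
	return ms > INT_MAX ? INT_MAX : (int)ms;
}
```

A request for one nanosecond becomes a one-millisecond wait, which is
the loss of resolution being objected to.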


[RFT] sky2: transmit complete alternative

2006-08-22 Thread Stephen Hemminger
Does the following get rid of the hang?

Recode the transmit completion handling to avoid races between the hardware
status report mechanism and the interrupt handler. Rather than relying on
the index value in the status ring, read the chip register and clean up
all completed transmits.

Shrink the transmit lock window to allow more parallelism.
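The completion scheme described above, walking the ring from the last
reclaimed slot up to an index read from the chip, can be sketched in
user space as follows. Everything here is illustrative: struct tx_ring
and tx_complete are hypothetical, TX_RING_SIZE is a made-up small value,
and freeing a buffer is reduced to clearing a flag.

```c
#include <assert.h>

#define TX_RING_SIZE 8   /* sketch value only, not the driver's */

struct tx_ring {
	unsigned cons;                /* next slot to reclaim */
	int      busy[TX_RING_SIZE];  /* 1 while a slot still holds a buffer */
};

/*
 * Reclaim every slot from ring->cons up to, but not including, done.
 * In the driver, done would be an index read from a chip register.
 * Returns the number of slots reclaimed; zero when nothing is new.
 */
static inline unsigned tx_complete(struct tx_ring *ring, unsigned done)
{
	unsigned put, reclaimed = 0;

	assert(done < TX_RING_SIZE);
	for (put = ring->cons; put != done; put = (put + 1) % TX_RING_SIZE) {
		ring->busy[put] = 0;   /* stands in for unmap + free */
		reclaimed++;
	}
	ring->cons = put;   /* the patch takes tx_lock only around this update */
	return reclaimed;
}
```

Calling it twice with the same index is harmless, which matches the
early return added to sky2_tx_complete for done == tx_cons.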

--- sky2.orig/drivers/net/sky2.c 2006-08-22 13:45:17.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-22 14:01:52.0 -0700
@@ -135,6 +135,7 @@
 static const unsigned txqaddr[] = { Q_XA1, Q_XA2 };
 static const unsigned rxqaddr[] = { Q_R1, Q_R2 };
 static const u32 portirq_msk[] = { Y2_IS_PORT_1, Y2_IS_PORT_2 };
+static const u16 report_idx[] = { STAT_TXA1_RIDX, STAT_TXA2_RIDX };
 
 /* This driver supports yukon2 chipset only */
 static const char *yukon2_name[] = {
@@ -1189,7 +1190,6 @@
struct sky2_tx_le *le = NULL;
struct tx_ring_info *re;
unsigned i, len;
-   int avail;
dma_addr_t mapping;
u32 addr64;
u16 mss;
@@ -1332,12 +1332,8 @@
re->idx = sky2->tx_prod;
le->ctrl |= EOP;
 
-   avail = tx_avail(sky2);
-   if (mss != 0 || avail < TX_MIN_PENDING) {
-   le->ctrl |= FRC_STAT;
-   if (avail <= MAX_SKB_TX_LE)
-   netif_stop_queue(dev);
-   }
+   if (tx_avail(sky2) <= MAX_SKB_TX_LE)
+   netif_stop_queue(dev);
 
sky2_put_idx(hw, txqaddr[sky2->port], sky2->tx_prod);
 
@@ -1361,12 +1357,10 @@
u16 nxt, put;
unsigned i;
 
-   BUG_ON(done >= TX_RING_SIZE);
-
-   if (unlikely(netif_msg_tx_done(sky2)))
-   printk(KERN_DEBUG "%s: tx done, up to %u\n",
-  dev->name, done);
+   if (done == sky2->tx_cons)
+   return;
 
+   BUG_ON(done >= TX_RING_SIZE);
for (put = sky2->tx_cons; put != done; put = nxt) {
struct tx_ring_info *re = sky2->tx_ring + put;
struct sk_buff *skb = re->skb;
@@ -1391,20 +1385,26 @@
   PCI_DMA_TODEVICE);
}
 
+   if (unlikely(netif_msg_tx_done(sky2)))
+   printk(KERN_DEBUG "%s: tx done, slot %u\n",
+  dev->name, put);
+
dev_kfree_skb(skb);
}
 
+   spin_lock(&sky2->tx_lock);
sky2->tx_cons = put;
if (tx_avail(sky2) > MAX_SKB_TX_LE + 4)
netif_wake_queue(dev);
+   spin_unlock(&sky2->tx_lock);
 }
 
 /* Cleanup all untransmitted buffers, assume transmitter not running */
 static void sky2_tx_clean(struct sky2_port *sky2)
 {
-   spin_lock_bh(&sky2->tx_lock);
+   local_bh_disable();
sky2_tx_complete(sky2, sky2->tx_prod);
-   spin_unlock_bh(&sky2->tx_lock);
+   local_bh_enable();
 }
 
 /* Network shutdown */
@@ -1732,7 +1732,7 @@
if (netif_msg_timer(sky2))
printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
 
-   report = sky2_read16(hw, sky2->port == 0 ? STAT_TXA1_RIDX : STAT_TXA2_RIDX);
+   report = sky2_read16(hw, report_idx[sky2->port]);
done = sky2_read16(hw, Q_ADDR(txq, Q_DONE));
 
printk(KERN_DEBUG PFX "%s: transmit ring %u .. %u report=%u done=%u\n",
@@ -1747,9 +1747,7 @@
} else if (report != sky2->tx_cons) {
printk(KERN_INFO PFX "status report lost?\n");
 
-   spin_lock_bh(&sky2->tx_lock);
sky2_tx_complete(sky2, report);
-   spin_unlock_bh(&sky2->tx_lock);
} else {
printk(KERN_INFO PFX "hardware hung? flushing\n");
 
@@ -1919,15 +1917,14 @@
goto resubmit;
 }
 
-/* Transmit complete */
-static inline void sky2_tx_done(struct net_device *dev, u16 last)
+/* Transmit completion handling */
+static void sky2_tx(struct sky2_hw *hw, unsigned port)
 {
-   struct sky2_port *sky2 = netdev_priv(dev);
+   struct net_device *dev = hw->dev[port];
 
-   if (netif_running(dev)) {
-   spin_lock(&sky2->tx_lock);
-   sky2_tx_complete(sky2, last);
-   spin_unlock(&sky2->tx_lock);
+   if (dev && netif_running(dev)) {
+   u16 last = sky2_read16(hw, report_idx[port]);
+   sky2_tx_complete(netdev_priv(dev), last);
}
 }
 
@@ -1939,6 +1936,10 @@
unsigned buf_write[2] = { 0, 0 };
u16 hwidx = sky2_read16(hw, STAT_PUT_IDX);
 
+   sky2_tx(hw, 0);
+   if (hw->ports > 1)
+   sky2_tx(hw, 1);
+
rmb();
 
while (hw->st_idx != hwidx) {
@@ -2004,13 +2005,7 @@
break;
 
case OP_TXINDEXLE:
-   /* TX index reports status for both ports */
-   BUILD_BUG_ON(TX_RING_SIZE > 0x1000);
-   sky2_tx_done(hw->dev[0], status & 0xfff);
-   if (hw->dev[1])
-   sky2_tx_done(hw->dev[1],
-   

Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Alexey Kuznetsov
Hello!

> > It is the only protection against committing an infinite amount of
> > memory to a socket.
> 
> Doesn't the "if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)" check in
> sock_alloc_send_pskb() limit things already?

Unfortunately, it does not. You can open a socket, send
something to a selected victim, close it, and repeat this
until the receiver accumulates enough skbs to kill the system.

Alexey


Re: [PATCH] wireless-dev: relax sysfs permissions

2006-08-22 Thread Greg KH
On Tue, Aug 22, 2006 at 04:47:40PM +0200, Jiri Benc wrote:
> On Wed, 16 Aug 2006 15:49:45 +0200, Johannes Berg wrote:
> > The sysfs attributes add_iface and remove_iface both check for
> > CAP_NET_ADMIN whenever something is written. Hence, permissions for the
> > files should be relaxed so that someone who is not root but happens to
> > have CAP_NET_ADMIN can do things.
> 
> I'm not sure about this. Greg, what's the policy here?

I don't know, it's not a normal sysfs thing to rely on capability
checks, almost everything that I know of uses the permission bits on the
files.  But I don't have a problem with making the permissions on the
file open, yet restricting things to CAP_NET_ADMIN, if that preserves
the proper functionality.

thanks,

greg k-h


Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Indan Zupancic
On Tue, August 22, 2006 22:39, Alexey Kuznetsov said:
> Feel free to do this correctly. :-)
> Deleting "wrong" code rarely helps.
>
> It is the only protection against committing an infinite amount of memory to a socket.

Doesn't the "if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)" check in
sock_alloc_send_pskb() limit things already? Or don't unix datagram sockets
have a limited sendbuffer? Ouch, that complicates things.

But wait a moment, I'm running a kernel now with the patch applied and some
testcode shows that everything seems to behave as expected, with send()
returning EAGAIN after a while, and poll giving POLLOUT when more data can
be sent.

What am I missing?
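
The test code itself was not posted. A minimal sketch of such a check, assuming Linux and a connected AF_UNIX datagram socketpair, might look like the following; the struct and function names are hypothetical. Note that it only exercises the send buffer limit on a connected pair. It does not reproduce the case of many unconnected senders targeting one receiver that is raised elsewhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct backpressure_result {
	int queued;		 /* datagrams accepted before send() refused */
	int pollout_when_full;	 /* POLLOUT reported right after the refusal? */
	int pollout_after_drain; /* POLLOUT reported once the queue was emptied? */
};

static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

static int writable_now(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };

	return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLOUT) != 0;
}

/* Fill one end of a datagram socketpair until send() returns EAGAIN,
 * then drain the other end and ask poll() whether writing is possible
 * again. Returns 0 on success, -1 on any unexpected failure. */
int probe_dgram_backpressure(struct backpressure_result *res)
{
	char payload[1024], sink[1024];
	int sv[2], rc = -1;

	assert(res != NULL);
	memset(res, 0, sizeof(*res));
	memset(payload, 'x', sizeof(payload));

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return -1;
	if (set_nonblocking(sv[0]) != 0 || set_nonblocking(sv[1]) != 0)
		goto out;

	for (;;) {
		if (send(sv[0], payload, sizeof(payload), 0) < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;	/* the expected backpressure */
			goto out;	/* some other failure */
		}
		if (++res->queued >= 100000)
			goto out;	/* defensive cap: never saw backpressure */
	}
	res->pollout_when_full = writable_now(sv[0], 0);

	/* Empty the receive queue; recv() fails with EAGAIN when done. */
	while (recv(sv[1], sink, sizeof(sink), 0) > 0)
		;
	res->pollout_after_drain = writable_now(sv[0], 1000);
	rc = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

Whether pollout_when_full comes back set depends on the kernel version, so the sketch records it without drawing a conclusion from it.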

Indan




Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 22 Aug 2006 15:01:44 -0700

> If there _is_ something wrong with kqueue then let us identify those
> weaknesses and then diverge.

Evgeniy already enumerated this, both on his web site and in the
current thread.

Unlike some people seem to imply, Evgeniy did research all the other
implementations of event queueing out there, including kqueue.
He took the best of that survey, adding some of his own ideas,
and that's what kevent is.  It's not like he's some kind of
charlatan and made arbitrary decisions in his design without any
regard for what's out there already.

Again, the proof is in the pudding, he wrote applications against his
interfaces and tested them.  That's what people need to really do if
they want to judge his interface, try to write programs against it and
report back any problems they run into.


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Andrew Morton
On Tue, 22 Aug 2006 14:37:47 -0700
"Randy.Dunlap" <[EMAIL PROTECTED]> wrote:

> On Tue, 22 Aug 2006 14:13:02 -0700 Nicholas Miell wrote:
> 
> > On Wed, 2006-08-23 at 00:16 +0400, Evgeniy Polyakov wrote:
> > > > On Tue, Aug 22, 2006 at 12:57:38PM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> > > > On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> > > > Of course, since you already know how all this stuff is supposed to
> > > > work, you could maybe write it down somewhere?
> > > 
> > > > I will write documentation, but as you can see some interfaces are
> > > changed.
> > 
> > Thanks; rapidly changing interfaces need good documentation even more
> > than stable interfaces simply because reverse engineering the intended
> > API from a changing implementation becomes even more difficult.
> 
> OK, I don't quite get it.
> Can you be precise about what you would like?
> 
> a.  good documentation
> b.  a POSIX API
> c.  a Windows-compatible API
> d.  other?
> 
> and we won't make you use any of this code.
> 

Today seems to be beat-up-Nick day?

This is a major, major new addition to the kernel API.  It's a big deal. 
Getting it documented prior to committing ourselves is a useful part of the
review process.  It certainly can't hurt, and it might help.  It is a
little too soon to spend too much time on that though.  (It's actually
_better_ if someone other than the developer writes the documentation,
too).


And the "why not emulate kqueue" question strikes me as an excellent one. 
Presumably a lot of developer thought and in-field experience has gone into
kqueue.  It would benefit us to use that knowledge as much as we can.

I mean, if there's nothing wrong with kqueue then let's minimise app
developer pain and copy it exactly.  If there _is_ something wrong with
kqueue then let us identify those weaknesses and then diverge.  Doing
something which looks the same and works the same and does the same thing
but has a different API doesn't benefit anyone.


Re: [PATCH 2.6.16.19 0/2] LARTC: trace control for netem

2006-08-22 Thread Stephen Hemminger
On Tue, 22 Aug 2006 16:32:33 +0200
Rainer Baumann <[EMAIL PROTECTED]> wrote:

> This is the revised trace extension to the network emulator netem.
> This extension provides emulation control based on pregenerated traces.
> 
> We first submitted this patch on 2nd of August, in the mean time 
> we integrated the comments from Stephen and fixed the listed things.
> 
> Cheers,
> Rainer

Please put patches inline, commenting is easier.

The biggest problem with this is architectural. I don't like having kernel
tightly bound to a user level control process.  If the kernel needs to keep
track of the pid of the process, the API is wrong. What about multiple instances
with multiple devices?

A better way would be to just let the user process keep filling with something
like netlink, configfs/debugfs or even character device. Just block the process
(let it hang on write), until a buffer of trace data is used up. If the process
dies or doesn't give more data then either reuse last flow or stop flowing.
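
A minimal sketch of the user-side half of such a design, assuming the kernel exposes some descriptor (character device, debugfs file, or similar; hypothetical here) whose write() blocks until the previous buffer of trace data has been used up. The names write_all and feed_trace and the 4000-byte chunk size are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer. On a blocking descriptor write() simply
 * sleeps until the consumer has made room. Returns 0 or -1 (errno set). */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Copy trace data from src_fd to dst_fd chunk by chunk until EOF.
 * Returns the number of bytes fed, or -1 on error. */
long feed_trace(int src_fd, int dst_fd)
{
	char chunk[4000];
	long total = 0;

	for (;;) {
		ssize_t n = read(src_fd, chunk, sizeof(chunk));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return total;	/* trace file exhausted */
		if (write_all(dst_fd, chunk, (size_t)n) != 0)
			return -1;
		total += n;
	}
}
```

With this shape the kernel never needs to know the feeder's pid: if the process dies, the descriptor closes and the qdisc can fall back to its default action.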

Please do the diff from the proper base (see Documentation/SubmittingPatches).
You have nested one too many directories.

> diff -u'rNF^function' olinux/linux-2.6.16.19/include/linux/pkt_sched.h otlinux/linux-2.6.16.19/include/linux/pkt_sched.h
> --- olinux/linux-2.6.16.19/include/linux/pkt_sched.h  2006-05-31 02:31:44.0 +0200
> +++ otlinux/linux-2.6.16.19/include/linux/pkt_sched.h 2006-08-22 11:03:11.0 +0200
> @@ -430,11 +430,15 @@
>   TCA_NETEM_DELAY_DIST,
>   TCA_NETEM_REORDER,
>   TCA_NETEM_CORRUPT,
> + TCA_NETEM_TRACE,
> + TCA_NETEM_DATA,
> + TCA_NETEM_STATS,
>   __TCA_NETEM_MAX,
>  };
>  
>  #define TCA_NETEM_MAX (__TCA_NETEM_MAX - 1)
> -
> +#define DATA_PACKAGE 4000
> +#define MAX_FLOWS 4
>  struct tc_netem_qopt
>  {
>   __u32   latency;/* added delay (us) */
> @@ -445,6 +449,40 @@
>   __u32   jitter; /* random jitter in latency (us) */
>  };
>  
> +struct tc_netem_stats
> +{
> + int packetcount;
> + int packetok;
> + int normaldelay;
> + int drops;
> + int dupl;
> + int corrupt;
> + int noValidData;
> + int uninitialized;
> + int bufferunderrun;
> + int bufferinuseempty;
> + int noemptybuffer;
> + int readbehindbuffer;
> + int buffer1_reloads;
> + int buffer2_reloads;
> + int tobuffer1_switch;
> + int tobuffer2_switch;
> + int switch_to_emptybuffer1;
> + int switch_to_emptybuffer2; 
> +};   
> +struct tc_netem_data
> +{
> + char buf[DATA_PACKAGE];
> + int fpid;
> + int validData;

lower case structure tags please.
don't create a "blob" interface.
why not variable length? since netlink has Type Length Data
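
To illustrate the variable-length alternative: netlink attributes are a 16-bit length (header included), a 16-bit type, then the payload padded to a 4-byte boundary. The standalone sketch below mimics that layout without using any kernel or libnl API; tlv_put and the TLV_* constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_ALIGN  4u	/* payloads are padded to 4 bytes */
#define TLV_HDRLEN 4u	/* 16-bit length + 16-bit type */

/* Append one attribute at offset 'used' in buf.
 * Returns the new used length, or 0 if the attribute does not fit. */
size_t tlv_put(unsigned char *buf, size_t cap, size_t used,
	       uint16_t type, const void *data, size_t len)
{
	size_t total = TLV_HDRLEN + len;	/* stored length, unpadded */
	size_t padded = (total + TLV_ALIGN - 1) & ~(size_t)(TLV_ALIGN - 1);
	uint16_t hdr[2];

	assert(buf != NULL);
	if (total > UINT16_MAX || used > cap || padded > cap - used)
		return 0;
	hdr[0] = (uint16_t)total;	/* host byte order, as netlink uses */
	hdr[1] = type;
	memcpy(buf + used, hdr, sizeof(hdr));
	if (len > 0)
		memcpy(buf + used + TLV_HDRLEN, data, len);
	memset(buf + used + total, 0, padded - total);	/* zero the padding */
	return used + padded;
}
```

A trace chunk of any size then travels as one attribute, and only the bytes actually present are copied, instead of a fixed DATA_PACKAGE-sized blob.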


> +};
> +struct tc_netem_trace
> +{
> +  __u32   fpid;   /* pid of flowseedprocess*/
> +  __u32   def;/* default action 0=no delay, 1=drop*/
> +  __u32   ticks;  /* number of ticks corresponding to 1us*/
> +};
> +
>  struct tc_netem_corr
>  {
>   __u32   delay_corr; /* delay correlation */
> diff -u'rNF^function' olinux/linux-2.6.16.19/include/net/flowseed.h otlinux/linux-2.6.16.19/include/net/flowseed.h
> --- olinux/linux-2.6.16.19/include/net/flowseed.h 1970-01-01 01:00:00.0 +0100
> +++ otlinux/linux-2.6.16.19/include/net/flowseed.h2006-08-22 11:03:33.0 +0200
> @@ -0,0 +1,65 @@
> +/* flowseedprocfs.h header file for the netem trace enhancement
> + */
> +
> +#ifndef _FLOWSEEDPROCFS_H
> +#define _FLOWSEEDPROCFS_H
> +#include 
> +
> +/* must be divisible by 4 (=#pkts)*/
> +#define DATA_PACKAGE 4000
> +
> +/* maximal amount of parallel flows */
> +#define MAX_FLOWS 4
> +
> +/* struct per flow - kernel */
> +typedef struct _flowbuffer {
> +char * buffer1;
> +char * buffer2;
> +char * buffer_in_use;   // buffer that is used by consumer
> +char * offsetpos;   // pointer to actual pos in the buffer in use
> +char * buffer1_empty;   // *buffer1 if buffer is empty, NULL else
> +char * buffer2_empty;   // *buffer2 if buffer is empty, NULL else
> +int flowid; // NIST Net flow id [array index]
> +int upid;   // pid of the user process corresponding to this flowbuffer
> +int validDataB1;// 1 if Data in buffer1 is valid, 0 if tracefile reached end and rubbish is in B1
> +int validDataB2;// 1 if Data in buffer2 is valid, 0 if tracefile reached end and rubbish is in B2
> +} flowbuffer;

Kernel style is to not use C++ style comments.
Are the buffers really characters, or are you just using
'char *' as an opaque pointer?

> +
> +typedef struct _strdelay {
> + u_int8_t head;
> + int delay;
> +} strdelay;
> +

Kernel style is to avoid typedefs and to use u8 instead of u_int8_t.
The choice of name 'strdelay' implies something related to strings
of characters in C.

> +struct proc_stats {
Why not a more decscript

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Randy.Dunlap
On Tue, 22 Aug 2006 14:13:02 -0700 Nicholas Miell wrote:

> On Wed, 2006-08-23 at 00:16 +0400, Evgeniy Polyakov wrote:
> > > On Tue, Aug 22, 2006 at 12:57:38PM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> > > On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> > > Of course, since you already know how all this stuff is supposed to
> > > work, you could maybe write it down somewhere?
> > 
> > > I will write documentation, but as you can see some interfaces are
> > changed.
> 
> Thanks; rapidly changing interfaces need good documentation even more
> than stable interfaces simply because reverse engineering the intended
> API from a changing implementation becomes even more difficult.

OK, I don't quite get it.
Can you be precise about what you would like?

a.  good documentation
b.  a POSIX API
c.  a Windows-compatible API
d.  other?

and we won't make you use any of this code.

---
~Randy


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread David Miller
From: Nicholas Miell <[EMAIL PROTECTED]>
Date: Tue, 22 Aug 2006 14:13:40 -0700

> And how is the quality of the work to be judged if the work isn't
> commented, documented and explained, especially the userland-visible
> parts that *cannot* *ever* *be* *changed* *or* *removed* once they're in
> a stable kernel release?

Are you even willing to look at the collection of example applications
Evgeniy wrote against this API?

That is the true test of a set of interfaces, what happens when you
try to actually use them in real programs.

Everything else is fluff, including standards and "documentation".

He even bothered to benchmark things, and post associated graphs and
performance analysis during the course of development.


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Wed, 2006-08-23 at 00:16 +0400, Evgeniy Polyakov wrote:
> On Tue, Aug 22, 2006 at 12:57:38PM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> > On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> > Of course, since you already know how all this stuff is supposed to
> > work, you could maybe write it down somewhere?
> 
> > I will write documentation, but as you can see some interfaces are
> changed.

Thanks; rapidly changing interfaces need good documentation even more
than stable interfaces simply because reverse engineering the intended
API from a changing implementation becomes even more difficult.

> > > > > I will ask just one question, do _you_ propose anything here?
> > > > 
> > > > struct sigevent sigev = {
> > > > .sigev_notify = SIGEV_KEVENT,
> > > > .sigev_kevent_fd = kev_fd,
> > > > .sigev_value.sival_ptr = &MyCookie
> > > > };
> > > > 
> > > > struct itimerspec its = {
> > > > .it_value = { ... },
> > > > .it_interval = { ... }
> > > > };
> > > > 
> > > > struct timespec timeout = { .. };
> > > > 
> > > > struct ukevent events[max];
> > > > 
> > > > timer_t timer;
> > > > 
> > > > timer_create(CLOCK_MONOTONIC, &sigev, &timer);
> > > > timer_settime(timer, 0, &its, NULL);
> > > > 
> > > > /* ... */
> > > > 
> > > > kevent_get_events(kev_fd, min, max, &timeout, events, 0);
> > > > 
> > > > 
> > > > 
> > > > Which isn't all that different from what Ulrich Drepper suggested and
> > > > Solaris does right now. (timer_create would probably end up calling
> > > > kevent_ctl itself, but it obviously can't do that unless kevents
> > > > actually support real interval timers).
> > > 
> > > > Ugh, rtsignals... Their problems forced me to not implement
> > > > "interrupt"-like mechanism for kevents in addition to dequeueing.
> > > > 
> > > > Anyway, it seems you did not read the whole thread, homepage, lwn and
> > > > userspace examples, so you do not understand what kevents are.
> > > 
> > > They are userspace requests which are returned back when they are ready.
> > > It means that userspace must provide something to kernel and ask it to
> > > notify when that "something" is ready. For example it can provide a
> > > timeout value and ask kernel to fire a timer with it and inform
> > > userspace when timeout has expired.
> > > It does not matter what timer is used there - feel free to use
> > > high-resolution one, usual timer, busyloop or anything else. Main issue 
> > > that userspace request must be completed.
> > > 
> > > What you are trying to do is to put kevents under POSIX API.
> > > That means that those kevents can not be read using
> > > > kevent_get_events(), basically because there are no user-known kevents,
> > > > i.e. user has not requested timer, so it should not receive its
> > > notifications (otherwise it will receive everything requested by other
> > > threads and other issues, i.e. how to differentiate timer request made
> > > by timer_create(), which is not supposed to be caught by
> > > kevent_get_events()). 
> > 
> > I have no idea what you're trying to say here. I've created a timer,
> > > specified which kevent queue I want its expiry notification delivered
> > to, and armed it. Where have I not specified enough information to
> > request the reception of timer notifications?
> 
> You can do it with kevent timer notifications. Easily.
> I've even attached simple program for that.

You forgot to attach the program.

> > Also, differentiating timers made by timer_create() that aren't supposed
> > to deliver events via kevent_get_events() is easy -- their .sigev_notify
> > isn't SIGEV_KEVENT.
> 
> What should be returned to user? 
> What should be placed into user's data, into id? 

The cookie I passed in -- in this example, it was &MyCookie.
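
For comparison, standard POSIX timers already hand a cookie back in this way. The sketch below uses ordinary SIGEV_SIGNAL delivery with sigtimedwait() as a stand-in, since SIGEV_KEVENT and sigev_kevent_fd exist only as a proposal in this thread. The helper name wait_for_timer_cookie is hypothetical, and the sketch assumes a single-threaded caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Arm a one-shot CLOCK_MONOTONIC timer that carries 'cookie', wait for
 * it to expire, and return the pointer delivered with the expiry.
 * Returns NULL on failure. */
void *wait_for_timer_cookie(void *cookie, long delay_ns)
{
	struct timespec timeout = { .tv_sec = 5, .tv_nsec = 0 };
	struct sigevent sigev = {0};
	struct itimerspec its = {0};
	sigset_t set, old;
	siginfo_t info;
	timer_t timer;
	void *result = NULL;

	assert(delay_ns > 0);	/* an all-zero it_value would disarm the timer */

	/* Block the signal so the expiry is queued and fetched synchronously. */
	sigemptyset(&set);
	sigaddset(&set, SIGRTMIN);
	if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
		return NULL;

	sigev.sigev_notify = SIGEV_SIGNAL;
	sigev.sigev_signo = SIGRTMIN;
	sigev.sigev_value.sival_ptr = cookie;	/* comes back in si_value */
	if (timer_create(CLOCK_MONOTONIC, &sigev, &timer) != 0)
		goto restore;

	its.it_value.tv_sec = delay_ns / 1000000000L;
	its.it_value.tv_nsec = delay_ns % 1000000000L;
	if (timer_settime(timer, 0, &its, NULL) == 0 &&
	    sigtimedwait(&set, &info, &timeout) == SIGRTMIN &&
	    info.si_code == SI_TIMER)
		result = info.si_value.sival_ptr;
	timer_delete(timer);
restore:
	sigprocmask(SIG_SETMASK, &old, NULL);
	return result;
}
```

On older glibc versions this needs -lrt at link time. The proposal above would replace the signal with an entry returned by kevent_get_events(), carrying the same sival_ptr.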

> How user can determine that given event fires after which
> initial value?

I don't know what this means.

> Finally, if you think that kevents should use different API for
> different events, think about complicated userspace code which must know
> tons of syscalls for the same task.

I don't think cramming everything together into the same syscall is any
better. In fact, a series of discrete, easy-to-understand function calls
is a hell of a lot easier to deal with than a single call that takes an
array of large multi-purpose structures, especially when most of those
function calls have standard specified behavior.

In fact, I doubt anything will *ever* use kevents directly -- it's
either going to be something like libevent which wraps this stuff
portably or the app's own portability layer or GLib's event loop or
something else that abstracts away the fact that nobody can agree on
what the primitives for a unified event loop should be. There's nothing
like another layer of indirection to solve your problems.

> > > You could implement POSIX timer _fully_ on top of kevents, i.e. both
> > > create and read, for example network AIO is implemented in that way -
> > > there is a system calls aio_send()/

Re: [PATCH] Add wireless statics to bcm43xx-d80211

2006-08-22 Thread Larry Finger

Jiri Benc wrote:

On Mon, 14 Aug 2006 08:29:08 -0500, Larry Finger wrote:
This patch implements wireless statistics for bcm43xx using the d80211 stack. It
also sets a framework for the implementation in other drivers that use the
d80211 code. The component parts have been circulated on the netdev mailing
list, and all suggested changes have been incorporated. The specific changes are
as follows:


Please, separate the d80211 part and the bcm43xx-d80211 part into
two patches.


OK. I split it into parts for the initial RFC submission, then combined them after I thought the 
comments were all in. Sorry.



--- a/include/net/d80211.h
+++ b/include/net/d80211.h
@@ -205,6 +205,9 @@ struct ieee80211_rx_status {
 int channel;
 int phymode;
 int ssi;
+   int maxssi;


Why is maxssi here? Can it really change between received frames?


No it cannot change between frames; however, the max value can be very different for different 
drivers using d80211. On the bcm43xx, it is 60; whereas 100 seems to be a better value for the 
rt2x00 chips. Adding it here seemed like a good way to handle this situation. Do you suggest 
something else?



[...]
--- a/net/d80211/ieee80211.c
+++ b/net/d80211/ieee80211.c
@@ -3174,6 +3174,9 @@ ieee80211_rx_h_sta_process(struct ieee80
sta->rx_fragments++;
sta->rx_bytes += rx->skb->len;
sta->last_rssi = rx->u.rx.status->ssi;
+   sta->last_signal = rx->u.rx.status->signal;
+   sta->last_noise = rx->u.rx.status->noise;
+   sta->max_rssi = rx->u.rx.status->maxssi;


Again, I see no reason why max_rssi should be in sta structure.


Again to pass the differing values for different drivers.


[...]
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -337,6 +337,9 @@ struct ieee80211_local {
struct net_device *apdev; /* wlan#ap - management frames (hostapd) */
int open_count;
int monitors;
+   int link_quality;
+   int noise;
+   struct iw_statistics wstats;


Why are these three variables in ieee80211_local? They are not used
anywhere.


You are right about the first two; however, wstats is used in the new
routine ieee80211_get_wireless_stats.


[...]
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -1580,6 +1580,16 @@ static int ieee80211_ioctl_giwrange(stru
range->min_frag = 256;
range->max_frag = 2346;
 
+	range->max_qual.qual = 100;

+   range->max_qual.level = 152;  /* set floor at -104 dBm (152 - 256) */


I would suggest using -110 dBm as a floor (to be compatible with RCPI
definition, see mail from Simon Barber describing it). Or is there any
particular reason for -104 dBm?


It is the value previously used in the softmac version of bcm43xx. A value of -110 would obviously 
be better.
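
The arithmetic behind the "(152 - 256)" comment, as a small sketch: a negative dBm value is stored in an unsigned 8-bit field by adding 256. The helper names are hypothetical, and the sketch only covers negative dBm values, which is the case discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a negative dBm value into an unsigned 8-bit level field. */
uint8_t dbm_to_level(int dbm)
{
	assert(dbm >= -255 && dbm < 0);
	return (uint8_t)(dbm + 256);
}

/* Decode a level field that is known to hold a negative dBm value. */
int level_to_dbm(uint8_t level)
{
	return (int)level - 256;
}
```

So the current floor of -104 dBm is stored as 152, and the suggested -110 dBm floor would be stored as 146.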



[...]
--- a/net/d80211/sta_info.h
+++ b/net/d80211/sta_info.h
@@ -82,6 +82,9 @@ struct sta_info {
unsigned long rx_dropped; /* number of dropped MPDUs from this STA */
 
 	int last_rssi; /* RSSI of last received frame from this STA */

+   int last_signal; /* signal of last received frame from this STA */
+   int last_noise; /* noise of last received frame from this STA */


Add these two variables also to sysfs, please.


Will be done.

Larry


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 13:36 -0700, David Miller wrote:
> From: Nicholas Miell <[EMAIL PROTECTED]>
> Date: Tue, 22 Aug 2006 13:00:23 -0700
> 
> > I'm not the one proposing the new (potentially wrong) interface. The
> > onus isn't on me.
> 
> You can't demand a volunteer to do work, period.
> 
> If it matters to you, you have the option of doing the work.
> Otherwise you can't complain.

So if a volunteer does bad work, I'm obligated to accept it just because
I haven't done better?

Alternately, if a volunteer does bad work, must it be merged into the
kernel because there isn't a better implementation? (I believe that
was tried at least once with devfs.)

And how is the quality of the work to be judged if the work isn't
commented, documented and explained, especially the userland-visible
parts that *cannot* *ever* *be* *changed* *or* *removed* once they're in
a stable kernel release?

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Alexey Kuznetsov
Hello!

> Either this, or it should be implemented correctly, which means poll needs
> to be fixed to also check for max_dgram_qlen,

Feel free to do this correctly. :-)
Deleting "wrong" code rarely helps.

It is the only protection against committing an infinite amount of memory to a socket.

Alexey


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread David Miller
From: Nicholas Miell <[EMAIL PROTECTED]>
Date: Tue, 22 Aug 2006 13:00:23 -0700

> I'm not the one proposing the new (potentially wrong) interface. The
> onus isn't on me.

You can't demand a volunteer to do work, period.

If it matters to you, you have the option of doing the work.
Otherwise you can't complain.


Re: [Lksctp-developers] [PATCH 3/5][SCTP]: Remove multiple levels of msecs to jiffies conversions.

2006-08-22 Thread David Miller
From: Vlad Yasevich <[EMAIL PROTECTED]>
Date: Tue, 22 Aug 2006 09:44:37 -0400

> Again, the variables exposed by the user interface have been and remain in 
> milliseconds.

I misread the patch, sorry :)

Thanks for clearing this up, I'll apply the patch.


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Evgeniy Polyakov
On Tue, Aug 22, 2006 at 12:57:38PM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> > On Tue, Aug 22, 2006 at 02:29:48AM -0700, Nicholas Miell ([EMAIL PROTECTED]) wrote:
> > > > > Is any of this documented anywhere? I'd think that any new userspace
> > > > > interfaces should have man pages explaining their use and some example
> > > > > code before getting merged into the kernel to shake out any interface
> > > > > problems.
> > > > 
> > > > There are two excellent articles on lwn.net
> > > 
> > > Google knows of one and it doesn't actually explain how to use kevents.
> > 
> > http://lwn.net/Articles/192964/
> > http://lwn.net/Articles/172844/
> > 
> > In the thread there were enough links to homepage where you can find
> > several examples of how to use kevents (and timers among others) with
> > old interfaces and new ones.
> > 
> 
> Oh, I found both of those. Neither of them told me what values I could
> use in a struct kevent_user_control or what they meant or what any of
> the fields in a struct ukevent or struct kevent_id meant or what I'm
> supposed to pass in kevent_get_event's "void* buf", or many other things
> that I don't remember now. 

Well, I think LWN has a very good explanation of what all the parameters
mean, but it is possible that there are some blank spots.
Nothing prevents you from looking at the userspace examples; the link to
them has been posted many times.

> In short, I'm stuck trying to reverse engineer from the source what the
> API is supposed to be (which might not even be what is actually
> implemented due to the as of yet unfound bug).
> 
> Of course, since you already know how all this stuff is supposed to
> work, you could maybe write it down somewhere?

I will write documentation, but as you can see some interfaces are
changed.

> 
> > > > I will ask just one question, do _you_ propose anything here?
> > > >  
> > > 
> > > struct sigevent sigev = {
> > >   .sigev_notify = SIGEV_KEVENT,
> > >   .sigev_kevent_fd = kev_fd,
> > >   .sigev_value.sival_ptr = &MyCookie
> > > };
> > > 
> > > struct itimerspec its = {
> > >   .it_value = { ... },
> > >   .it_interval = { ... }
> > > };
> > > 
> > > struct timespec timeout = { .. };
> > > 
> > > struct ukevent events[max];
> > > 
> > > timer_t timer;
> > > 
> > > timer_create(CLOCK_MONOTONIC, &sigev, &timer);
> > > timer_settime(timer, 0, &its, NULL);
> > > 
> > > /* ... */
> > > 
> > > kevent_get_events(kev_fd, min, max, &timeout, events, 0);
> > > 
> > > 
> > > 
> > > Which isn't all that different from what Ulrich Drepper suggested and
> > > Solaris does right now. (timer_create would probably end up calling
> > > kevent_ctl itself, but it obviously can't do that unless kevents
> > > actually support real interval timers).
> > 
> > Ugh, rtsignals... Their problems forced me to not implement
> > "interrupt"-like mechanism for kevents in addition to dequeueing.
> > 
> > Anyway, it seems you did not read the whole thread, homepage, lwn and
> > userspace examples, so you do not understand what kevents are.
> > 
> > They are userspace requests which are returned back when they are ready.
> > It means that userspace must provide something to kernel and ask it to
> > notify when that "something" is ready. For example it can provide a
> > timeout value and ask kernel to fire a timer with it and inform
> > userspace when timeout has expired.
> > It does not matter what timer is used there - feel free to use
> > high-resolution one, usual timer, busyloop or anything else. Main issue 
> > that userspace request must be completed.
> > 
> > What you are trying to do is to put kevents under POSIX API.
> > That means that those kevents can not be read using
> > kevent_get_events(), basically because there are no user-known kevents,
> > i.e. user has not requested timer, so it should not receive its
> > notifications (otherwise it will receive everything requested by other
> > threads and other issues, i.e. how to differentiate timer request made
> > by timer_create(), which is not supposed to be caught by
> > kevent_get_events()).
> > 
> 
> I have no idea what you're trying to say here. I've created a timer,
> specified which kevent queue I want its expiry notification delivered
> to, and armed it. Where have I not specified enough information to
> request the reception of timer notifications?

You can do it with kevent timer notifications. Easily.
I've even attached simple program for that.

> Also, differentiating timers made by timer_create() that aren't supposed
> to deliver events via kevent_get_events() is easy -- their .sigev_notify
> isn't SIGEV_KEVENT.

What should be returned to the user? What should be placed into the user's
data, into the id? How can the user determine which initial value a given
event fired for?
Finally, if you think that kevents should use different API for
different events, think about complicated userspace code which must know
tons of syscalls for the same task.

> > Yo

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 10:59 -0400, James Morris wrote:
> On Tue, 22 Aug 2006, Nicholas Miell wrote:
> 
> > In this brave new world of always stable kernel development, the time a
> > new interface has for public testing before a new kernel release is
> > drastically shorter than the old unstable development series, and if
> > nobody is documenting how this stuff is supposed to work and
> > demonstrating how it will be used, then mistakes are bound to slip
> > through.
> 
> Feel free to provide the documentation.  Perhaps, even as much as you've 
> written so far in these emails would be enough.
> 

I'm not the one proposing the new (potentially wrong) interface. The
onus isn't on me.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Nicholas Miell
On Tue, 2006-08-22 at 14:03 +0400, Evgeniy Polyakov wrote:
> On Tue, Aug 22, 2006 at 02:29:48AM -0700, Nicholas Miell ([EMAIL PROTECTED]) 
> wrote:
> > > > Is any of this documented anywhere? I'd think that any new userspace
> > > > interfaces should have man pages explaining their use and some example
> > > > code before getting merged into the kernel to shake out any interface
> > > > problems.
> > > 
> > > There are two excellent articles on lwn.net
> > 
> > Google knows of one and it doesn't actually explain how to use kevents.
> 
> http://lwn.net/Articles/192964/
> http://lwn.net/Articles/172844/
> 
> In the thread there were enough links to homepage where you can find
> several examples of how to use kevents (and timers among others) with
> old interfaces and new ones.
> 

Oh, I found both of those. Neither of them told me what values I could
use in a struct kevent_user_control or what they meant or what any of
the fields in a struct ukevent or struct kevent_id meant or what I'm
supposed to pass in kevent_get_event's "void* buf", or many other things
that I don't remember now. 

In short, I'm stuck trying to reverse engineer from the source what the
API is supposed to be (which might not even be what is actually
implemented due to the as of yet unfound bug).

Of course, since you already know how all this stuff is supposed to
work, you could maybe write it down somewhere?


> > > I will ask just one question, do _you_ propose anything here?
> > >  
> > 
> > struct sigevent sigev = {
> > .sigev_notify = SIGEV_KEVENT,
> > .sigev_kevent_fd = kev_fd,
> > .sigev_value.sival_ptr = &MyCookie
> > };
> > 
> > struct itimerspec its = {
> > .it_value = { ... },
> > .it_interval = { ... }
> > };
> > 
> > struct timespec timeout = { .. };
> > 
> > struct ukevent events[max];
> > 
> > timer_t timer;
> > 
> > timer_create(CLOCK_MONOTONIC, &sigev, &timer);
> > timer_settime(timer, 0, &its, NULL);
> > 
> > /* ... */
> > 
> > kevent_get_events(kev_fd, min, max, &timeout, events, 0);
> > 
> > 
> > 
> > Which isn't all that different from what Ulrich Drepper suggested and
> > Solaris does right now. (timer_create would probably end up calling
> > kevent_ctl itself, but it obviously can't do that unless kevents
> > actually support real interval timers).
> 
> Ugh, rtsignals... Their's problems forced me to not implement
> "interrupt"-like mechanism for kevents in addition to dequeueing.
> 
> Anyway, it seems you did not read the whole thread, homepage, lwn and
> userpsace examples, so you do not understand what kevents are.
> 
> They are userspace requests which are returned back when they are ready.
> It means that userspace must provide something to kernel and ask it to
> notify when that "something" is ready. For example it can provide a
> timeout value and ask kernel to fire a timer with it and inform
> userspace when timeout has expired.
> It does not matter what timer is used there - feel free to use
> high-resolution one, usual timer, busyloop or anything else. Main issue 
> that userspace request must be completed.
> 
> What you are trying to do is to put kevents under POSIX API.
> That means that those kevents can not be read using
> kevent_get_events(), basicaly because there are no user-known kevents,
> i.e. user has not requested timer, so it should not receive it's
> notifications (otherwise it will receive everything requested by other
> threads and other issues, i.e. how to differentiate timer request made
> by timer_create(), which is not supposed to be caught by
> kevent_get_events()).
> 

I have no idea what you're trying to say here. I've created a timer,
specified which kevent queue I want its expiry notification delivered
to, and armed it. Where have I not specified enough information to
request the reception of timer notifications?

Also, differentiating timers made by timer_create() that aren't supposed
to deliver events via kevent_get_events() is easy -- their .sigev_notify
isn't SIGEV_KEVENT.

> You could implement POSIX timer _fully_ on top of kevents, i.e. both
> create and read, for example network AIO is implemented in that way -
> there is a system calls aio_send()/aio_recv() and aio_sendfile() which
> create kevent internally and then get it's readiness notifications over
> provided callback, process data and finally remove kevent,
> so POSIX timers could create timer kevent, wait until it is ready, in
> completeness callback it would call signal delivering mechanism...
> 

Yes, but that would be stupid. The kernel already has a fully functional
POSIX timer implementation, so throwing it out to reimplement it using
kevents would be a waste of effort, especially considering that your
kevent timers can't fully express a POSIX interval timer.

Now, if there were some way for me to ask that an interval timer queue
its expiry notices into a kevent queue, that would combine the best of
both worlds.

> But there are no reading mechanism in POSIX timers (I mean not reading
> p

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Evgeniy Polyakov
On Tue, Aug 22, 2006 at 09:14:30PM +0200, Jari Sundell ([EMAIL PROTECTED]) 
wrote:
> Changing kevents are done with a separate system call from polling
> afaics, thus every change requires a context switch. This in contrast
> to BSD's kqueue which allows user-space to pass the changes when
> kevent (polling) is called.
> 
> It may also choose to update the filters immediately with the same call.

The word "polling" really confused me here, but now I understand you.
Such an approach actually has unresolved issues - consider, for
example, a situation where all provided events are ready immediately - what
should be returned (as far as I recall, in the BSDs they are always added
to the kqueue before being checked, so old events will be returned
first)? And currently ready events can be read through the mapped buffer
without any syscall at all.
And a Linux syscall is much cheaper than a BSD one.
Considering those issues (especially the mapped buffer), it is really not
worth the interface complexity.

> >> Maybe this is a topic that will singe my fur, but what is wrong with the
> >> kqueue API? Will I really have to implement support for yet another event
> >> API in my program.
> >
> >Why did I not implemented it like Solaris did?
> >Or FreeBSD did?
> >It was designed with features mention on AIO homepage in mind, but not
> >to be compatible with some other implementation.
> >And why should it be?
> 
> If it can be, why should it not be? At least, if you reinvent the
> wheel its advantages should be obvious.
> 
> Considering that kqueue is available on more popular OSes like darwin
> it would ease portability greatly if there was a shared event API.
> That is, unless you think there's something fundamentally wrong with
> their design.

First of all, the types are completely different.
The design of the in-kernel part is very different too.

> Your interface:
> 
> +asmlinkage long sys_kevent_get_events(int ctl_fd, unsigned int min, unsigned int max,
> +   unsigned int timeout, void __user *buf, unsigned flags);
> +asmlinkage long sys_kevent_ctl(int ctl_fd, unsigned int cmd, unsigned int num, void __user *buf);
> 
> BSD's kqueue:
> 
> struct kevent {
>  uintptr_t ident;    /* identifier for this event */
>  short     filter;   /* filter for event */
>  u_short   flags;    /* action flags for kqueue */
>  u_int     fflags;   /* filter flag value */
>  intptr_t  data;     /* filter data value */
>  void      *udata;   /* opaque user data identifier */
> };


From your description, there is a serious problem on arches which
support different pointer widths. I do not have the sources of any BSD
at hand right now, but if it is really as you've described, it cannot be
used in Linux at all.
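
To illustrate the pointer-width concern, here is a minimal sketch that is
not part of the thread. The first struct repeats the layout quoted above;
the second is a hypothetical fixed-width variant (the name
fixed_width_event and its field types are assumptions made up for this
example, not the kevent or kqueue ABI). The point is only that the first
layout changes size between 32-bit and 64-bit userspace, while the second
does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as quoted above: three members follow the pointer width, so the
 * struct is typically 20 bytes on ILP32 and 32 bytes on LP64. */
struct bsd_style_kevent {
    uintptr_t      ident;   /* identifier for this event */
    short          filter;  /* filter for event */
    unsigned short flags;   /* action flags */
    unsigned int   fflags;  /* filter flag value */
    intptr_t       data;    /* filter data value */
    void          *udata;   /* opaque user data identifier */
};

/* Hypothetical fixed-width variant (illustration only): every member has
 * the same size regardless of the pointer width of the caller. */
struct fixed_width_event {
    uint64_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    int64_t  data;
    uint64_t udata;         /* user cookie carried as an integer */
};

/* Holds for both 32-bit and 64-bit builds on common ABIs. */
_Static_assert(sizeof(struct fixed_width_event) == 32,
               "fixed-width layout must not depend on pointer size");
```

With a layout like the second one, a 64-bit kernel would not need a
translation layer for 32-bit callers, which appears to be the property
being asked for here.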

> int kevent(int kq, const struct kevent *changelist, int nchanges,
> struct kevent *eventlist, int nevents, const struct timespec
> *timeout);
> 
> The only thing missing in BSD's kevent is the min/max parameters, the
> various filters in kevent_get_events either have equivalent filters or
> could be added as extensions. (I didn't look too carefully through
> them)
> 
> On the other hand, your API lacks the ability to pass changes when
> polling, as mentioned above. It would be preferable if the timeout
> parameter was either timespec or timeval.

No way - timespec uses long.
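
One way the two positions could be reconciled in userspace is a small
wrapper that accepts a struct timespec and converts it to the unsigned int
timeout the proposed syscall takes. The sketch below is an assumption, not
something from the patchset: the helper name timespec_to_msecs is made up,
and it assumes the syscall timeout is in milliseconds, as with poll.

```c
#include <limits.h>
#include <time.h>

/* Hypothetical helper: convert a timespec to whole milliseconds, rounding
 * up so a short non-zero timeout never becomes zero, and clamping to the
 * range of unsigned int. Negative inputs are treated as zero. */
unsigned int timespec_to_msecs(const struct timespec *ts)
{
    unsigned long long ms;

    if (ts->tv_sec < 0 || ts->tv_nsec < 0)
        return 0;
    if ((unsigned long long)ts->tv_sec > UINT_MAX / 1000ULL)
        return UINT_MAX;

    ms = (unsigned long long)ts->tv_sec * 1000ULL;
    ms += ((unsigned long long)ts->tv_nsec + 999999ULL) / 1000000ULL;

    return ms > UINT_MAX ? UINT_MAX : (unsigned int)ms;
}
```

A library wrapper could then expose a timespec-based call to applications
while the kernel interface keeps a fixed-width integer.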

> Rakshasa

-- 
Evgeniy Polyakov


Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Indan Zupancic
Hello,

Here's a patch to get rid of max_dgram_qlen proc option. All it does is
slow down unix datagram packet sending, without giving the program any
control over it.

Applying it decreases code size, simplifies the code and makes poll
behaviour more logical for connected datagram sockets in regard to
POLLOUT. With the current code poll can say the socket is writable,
because it only checks the buffers, while sendmsg will fail because too
many packets are queued up already. In practice this means that a slow
reader will cause a non-blocking sender to hog the CPU.

Either this, or it should be implemented correctly, which means poll needs
to be fixed to also check for max_dgram_qlen, and maybe an ioctl should be
added to change the variable per socket.
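
The mismatch described above can be modelled in a few lines of userspace
C. This is only a sketch under assumptions: the struct and function names
are hypothetical, and the buffer check is reduced to "some send buffer
space is free", which is simpler than what the kernel does. The queue
length comparison mirrors the check that the patch below removes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the receiving peer as seen by a datagram sender. */
struct dgram_peer_model {
    size_t       sndbuf_free;  /* bytes of send buffer space left */
    unsigned int queued;       /* datagrams in the peer's receive queue */
    unsigned int max_backlog;  /* stands in for max_dgram_qlen */
};

/* Models current poll behaviour: only the buffers are checked. */
bool poll_writable_buffers_only(const struct dgram_peer_model *p)
{
    return p->sndbuf_free > 0;
}

/* Models the sendmsg check: fail or block when the queue is too long. */
bool send_would_block(const struct dgram_peer_model *p)
{
    return p->queued > p->max_backlog;
}

/* Models the alternative fix: poll also honours the queue limit. */
bool poll_writable_with_qlen(const struct dgram_peer_model *p)
{
    return p->sndbuf_free > 0 && !send_would_block(p);
}
```

With free buffer space but more than max_backlog datagrams queued, the
first function reports the socket as writable while the send would still
fail with EAGAIN, which is the busy loop described above.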

Greetings,

Indan


diff --git a/net/unix/Makefile b/net/unix/Makefile
--- a/net/unix/Makefile Sun Aug 06 19:00:05 2006 +
+++ b/net/unix/Makefile Tue Aug 22 21:06:09 2006 +0200
@@ -5,4 +5,3 @@ obj-$(CONFIG_UNIX)  += unix.o
 obj-$(CONFIG_UNIX) += unix.o

 unix-y := af_unix.o garbage.o
-unix-$(CONFIG_SYSCTL)  += sysctl_net_unix.o
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
--- a/net/unix/af_unix.cSun Aug 06 19:00:05 2006 +
+++ b/net/unix/af_unix.cTue Aug 22 21:05:39 2006 +0200
@@ -117,8 +117,6 @@
 #include 
 #include 

-int sysctl_unix_max_dgram_qlen = 10;
-
 struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 DEFINE_SPINLOCK(unix_table_lock);
 static atomic_t unix_nr_socks = ATOMIC_INIT(0);
@@ -586,7 +584,6 @@ static struct sock * unix_create1(struct
&af_unix_sk_receive_queue_lock_key);

sk->sk_write_space  = unix_write_space;
-   sk->sk_max_ack_backlog  = sysctl_unix_max_dgram_qlen;
sk->sk_destruct = unix_sock_destructor;
u = unix_sk(sk);
u->dentry = NULL;
@@ -1379,23 +1376,6 @@ restart:
goto out_unlock;
}

-   if (unix_peer(other) != sk &&
-   (skb_queue_len(&other->sk_receive_queue) >
-other->sk_max_ack_backlog)) {
-   if (!timeo) {
-   err = -EAGAIN;
-   goto out_unlock;
-   }
-
-   timeo = unix_wait_for_peer(other, timeo);
-
-   err = sock_intr_errno(timeo);
-   if (signal_pending(current))
-   goto out_free;
-
-   goto restart;
-   }
-
skb_queue_tail(&other->sk_receive_queue, skb);
unix_state_runlock(other);
other->sk_data_ready(other, len);
@@ -2076,7 +2056,6 @@ static int __init af_unix_init(void)
 #ifdef CONFIG_PROC_FS
proc_net_fops_create("unix", 0, &unix_seq_fops);
 #endif
-   unix_sysctl_register();
 out:
return rc;
 }
@@ -2084,7 +2063,6 @@ static void __exit af_unix_exit(void)
 static void __exit af_unix_exit(void)
 {
sock_unregister(PF_UNIX);
-   unix_sysctl_unregister();
proc_net_remove("unix");
proto_unregister(&unix_proto);
 }
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
deleted file mode 100644
--- a/net/unix/sysctl_net_unix.cSun Aug 06 19:00:05 2006 +
+++ /dev/null   Thu Jan 01 00:00:00 1970 +
@@ -1,60 +0,0 @@
-/*
- * NET4:   Sysctl interface to net af_unix subsystem.
- *
- * Authors:Mike Shaver.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include 
-#include 
-
-#include 
-
-static ctl_table unix_table[] = {
-   {
-   .ctl_name   = NET_UNIX_MAX_DGRAM_QLEN,
-   .procname   = "max_dgram_qlen",
-   .data   = &sysctl_unix_max_dgram_qlen,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = &proc_dointvec
-   },
-   { .ctl_name = 0 }
-};
-
-static ctl_table unix_net_table[] = {
-   {
-   .ctl_name   = NET_UNIX,
-   .procname   = "unix",
-   .mode   = 0555,
-   .child  = unix_table
-   },
-   { .ctl_name = 0 }
-};
-
-static ctl_table unix_root_table[] = {
-   {
-   .ctl_name   = CTL_NET,
-   .procname   = "net",
-   .mode   = 0555,
-   .child  = unix_net_table
-   },
-   { .ctl_name = 0 }
-};
-
-static struct ctl_table_header * unix_sysctl_header;
-
-void unix_sysctl_register(void)
-{
-   unix_sysctl_header = register_sysctl_table(unix_root_table, 0);
-}
-
-void unix_sysctl_unregister(void)
-{
-   unregister_sysctl_table(unix_sysctl_header);
-}
-





Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Jari Sundell

On 8/22/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

>> Not to mention the name used causes (at least me) some confusion with BSD's
>> kqueue implementation. Skimming over the patches it actually looks somewhat
>> like kqueue with the more interesting features removed, like the ability to
>> pass the filter changes simultaneously with polling.
>
> I do not understand, what do you mean?
> It is obviously allowed to poll and change kevents at the same time.


Changing kevents are done with a separate system call from polling
afaics, thus every change requires a context switch. This in contrast
to BSD's kqueue which allows user-space to pass the changes when
kevent (polling) is called.

It may also choose to update the filters immediately with the same call.


>> Maybe this is a topic that will singe my fur, but what is wrong with the
>> kqueue API? Will I really have to implement support for yet another event
>> API in my program.
>
> Why did I not implemented it like Solaris did?
> Or FreeBSD did?
> It was designed with features mention on AIO homepage in mind, but not
> to be compatible with some other implementation.
> And why should it be?


If it can be, why should it not be? At least, if you reinvent the
wheel its advantages should be obvious.

Considering that kqueue is available on more popular OSes like darwin
it would ease portability greatly if there was a shared event API.
That is, unless you think there's something fundamentally wrong with
their design.

Your interface:

+asmlinkage long sys_kevent_get_events(int ctl_fd, unsigned int min, unsigned int max,
+   unsigned int timeout, void __user *buf, unsigned flags);
+asmlinkage long sys_kevent_ctl(int ctl_fd, unsigned int cmd, unsigned int num, void __user *buf);

BSD's kqueue:

struct kevent {
 uintptr_t ident;    /* identifier for this event */
 short     filter;   /* filter for event */
 u_short   flags;    /* action flags for kqueue */
 u_int     fflags;   /* filter flag value */
 intptr_t  data;     /* filter data value */
 void      *udata;   /* opaque user data identifier */
};

int kevent(int kq, const struct kevent *changelist, int nchanges,
struct kevent *eventlist, int nevents, const struct timespec
*timeout);

The only thing missing in BSD's kevent is the min/max parameters, the
various filters in kevent_get_events either have equivalent filters or
could be added as extensions. (I didn't look too carefully through
them)

On the other hand, your API lacks the ability to pass changes when
polling, as mentioned above. It would be preferable if the timeout
parameter was either timespec or timeval.

Rakshasa


Re: [PATCH 16/18] d80211: get rid of MICHAEL_MIC_HWACCEL define

2006-08-22 Thread Jiri Benc
On Mon, 21 Aug 2006 09:41:23 +0200, Johannes Berg wrote:
> The symbol MICHAEL_MIC_HWACCEL is always defined and hence
> all the ifdefs using it are useless. This patch removes it.
> 
> [...]
> @@ -87,9 +84,7 @@ ieee80211_tx_h_michael_mic_add(struct ie
>   u16 fc;
>   struct sk_buff *skb = tx->skb;
>   int authenticator;
> -#if defined(CONFIG_HOSTAPD_WPA_TESTING) || defined(MICHAEL_MIC_HWACCEL)
>   int wpa_test = 0;
> -#endif

When you're touching this, could you #ifdef out wpa_test when
CONFIG_HOSTAPD_WPA_TESTING is not defined? This could be a part of this
patch.

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [patch 4/5] d80211/bcm43xx: fix build for ARM

2006-08-22 Thread Michael Buesch
On Tuesday 22 August 2006 19:34, David Kimdon wrote:
> ARM targets support udelay(N) where N <= 2000.
> Use mdelay() when N >= 2000.
> 
> Signed-off-by: David Kimdon <[EMAIL PROTECTED]>
> 
> Index: wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_power.c
> ===
> --- wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_power.c
> +++ wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_power.c
> @@ -291,7 +291,7 @@ int bcm43xx_pctl_set_crystal(struct bcm4
>   err = bcm43xx_pci_write_config32(bcm, BCM43xx_PCTL_OUT, out);
>   if (err)
>   goto err_pci;
> - udelay(5000);
> + mdelay(5);

I am going to convert this to msleep anyway. (Patch is almost done)
So please drop this hunk.

>   } else {
>   if (bcm->current_core->rev < 5)
>   return 0;
> Index: wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_radio.c
> ===
> --- wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_radio.c
> +++ wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_radio.c
> @@ -1687,7 +1687,7 @@ int bcm43xx_radio_selectchannel(struct b
>   radio->channel = channel;
>   //XXX: Using the longer of 2 timeouts (8000 vs 2000 usecs). Specs states
>   // that 2000 usecs might suffice.
> - udelay(8000);
> + mdelay(8);
>  
>   return 0;
>  }

Well, yeah. Please resubmit this patch with only this hunk.
I don't like that long delay here. I am searching for a good
solution, but I think we should live with it for now.
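
For reference, the rule from the patch description (udelay() only for
short delays, mdelay() from 2000 usecs upwards) can be sketched as a small
helper. This is a userspace model, not driver code: stub_udelay() and
stub_mdelay() are hypothetical stand-ins that only record what was
requested, and MAX_UDELAY_US and delay_usecs() are names made up for this
example.

```c
/* Threshold taken from the patch description; the name is an assumption. */
#define MAX_UDELAY_US 2000UL

/* Totals recorded by the stubs so the behaviour can be inspected. */
unsigned long stub_udelay_total_us;
unsigned long stub_mdelay_total_ms;

/* Stand-ins for the kernel's udelay()/mdelay(); they do not busy-wait. */
void stub_udelay(unsigned long usecs) { stub_udelay_total_us += usecs; }
void stub_mdelay(unsigned long msecs) { stub_mdelay_total_ms += msecs; }

/* Delay for usecs microseconds without passing a large value to udelay:
 * whole milliseconds go to mdelay, the remainder (below 1000) to udelay. */
void delay_usecs(unsigned long usecs)
{
    if (usecs >= MAX_UDELAY_US) {
        stub_mdelay(usecs / 1000);
        usecs %= 1000;
    }
    if (usecs)
        stub_udelay(usecs);
}
```

In process context a sleeping delay avoids the busy-wait altogether, which
is what the planned msleep conversion mentioned above is about.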

-- 
Greetings Michael.


Re: [PATCH 12/18] d80211: fix some sparse warnings

2006-08-22 Thread Jiri Benc
On Mon, 21 Aug 2006 09:41:19 +0200, Johannes Berg wrote:
> This patch fixes some warnings from sparse in d80211. Also
> fixes indentation in places near where the changes were.

I can't say I like those nearby indentation fixes at this time, but
I can live with it :-)

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [PATCH 01/18] d80211: LED triggers

2006-08-22 Thread Jiri Benc
On Tue, 22 Aug 2006 09:54:02 -0700, Jouni Malinen wrote:
> Is someone using these or planning on using them? I have been open to
> just removing all code due to lack of active use.

I would apply the patch - it is a useful feature and it can be compiled
out.

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [PATCH 07/18] d80211: get rid of sta_aid in favour of keeping track of TIM

2006-08-22 Thread Jiri Benc
On Mon, 21 Aug 2006 09:41:14 +0200, Johannes Berg wrote:
> I think this is not correct if a STA is removed for which packets
> are buffered, but if it is still wrong then that case was never
> correct to start with if the hw has a set_tim callback.

You're right, good catch.

Just minor things:

> [...]
> - if (num_bits) {
> + if (have_bits) {
>   /* Find largest even number N1 so that bits numbered 1 through
>* (N1 x 8) - 1 in the bitmap are 0 and number N2 so that bits
>* (N2 + 1) x 8 through 2007 are 0. */
>   n1 = 0;
> - for (i = 0; i < sizeof(bitmap); i++) {
> - if (bitmap[i]) {
> + /* 251 = max size of tim bitmap in beacon */
> + for (i = 0; i < 251; i++) {

Please, use a constant here.
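
A minimal sketch of what a named constant could look like, together with
the N1/N2 search described in the quoted comment. TIM_BITMAP_MAX_LEN and
tim_bitmap_bounds() are hypothetical names chosen for this example, not
identifiers from the d80211 tree; the value 251 is the one from the patch
(2008 bits / 8).

```c
#include <stddef.h>
#include <stdint.h>

/* Max size in bytes of the TIM bitmap in a beacon (AIDs 0..2007). */
#define TIM_BITMAP_MAX_LEN 251

/* Find N1, the index of the first non-zero byte rounded down to an even
 * number, and N2, the index of the last non-zero byte. Both are 0 when
 * the bitmap is empty. */
void tim_bitmap_bounds(const uint8_t bitmap[TIM_BITMAP_MAX_LEN],
                       size_t *n1, size_t *n2)
{
    size_t i;

    *n1 = 0;
    for (i = 0; i < TIM_BITMAP_MAX_LEN; i++) {
        if (bitmap[i]) {
            *n1 = i & ~(size_t)1;   /* largest even number <= i */
            break;
        }
    }

    *n2 = *n1;
    for (i = TIM_BITMAP_MAX_LEN; i-- > *n1; ) {
        if (bitmap[i]) {
            *n2 = i;
            break;
        }
    }
}
```

Only bytes N1 through N2 of the bitmap then need to be copied into the
partial virtual bitmap of the TIM element.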

> [...]
> @@ -211,13 +213,10 @@ struct ieee80211_if_ap {
>   u8 *generic_elem;
>   size_t generic_elem_len;
>  
> - /* TODO: sta_aid could be replaced by 2008-bit large bitfield of
> -  * that could be used in TIM element generation. This would also
> -  * make TIM element generation a bit faster. */
> - /* AID mapping to station data. NULL, if AID is free. AID is in the
> -  * range 1..2007 and sta_aid[i] corresponds to AID i+1. */
> - struct sta_info *sta_aid[MAX_AID_TABLE_SIZE];
> - int max_aid; /* largest aid currently in use */
> + /* yes, this looks ugly, but guarantees that we can later use
> +  * bitmap_empty :)
> +  * NB: don't ever use set_bit, use bss_tim_set/bss_tim_clear! */
> + u8 tim[sizeof(unsigned long)*BITS_TO_LONGS(MAX_AID_TABLE_SIZE+1)];

Hm, adding spaces here would extend the line above 80 characters... But
this way it doesn't look good. What to do here? I'd prefer leaving the
line a little over 80 chars in this case. What do you think?

> [...]
> --- wireless-dev.orig/net/d80211/sta_info.c 2006-08-20 14:56:17.418192788 +0200
> +++ wireless-dev/net/d80211/sta_info.c 2006-08-20 14:56:20.588192788 +0200
> @@ -424,13 +424,6 @@ void sta_info_remove_aid_ptr(struct sta_
>   sdata = IEEE80211_DEV_TO_SUB_IF(sta->dev);
>   if (sta->aid <= 0 || !sdata->bss)
>   return;
> -
> - sdata->bss->sta_aid[sta->aid - 1] = NULL;
> - if (sta->aid == sdata->bss->max_aid) {
> - while (sdata->bss->max_aid > 0 &&
> -!sdata->bss->sta_aid[sdata->bss->max_aid - 1])
> - sdata->bss->max_aid--;
> - }
>  }

Why are you not calling bss_tim_clear here? Am I missing something?

Also, adding hw->set_tim call here should fix the problem you described
at the beginning of the mail.

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [take12 3/3] kevent: Timer notifications.

2006-08-22 Thread Evgeniy Polyakov
On Mon, Aug 21, 2006 at 04:25:49PM +0200, Thomas Gleixner ([EMAIL PROTECTED]) 
wrote:
> > Not everymachine has them 
> 
> Every machine has hrtimers - not necessarily with high resolution timer
> support, but the core code is there in any case and it is designed to
> provide fine grained timers. 
> 
> In case of high resolution time support one would expect that the "fine
> grained" timer event is actually fine grained.

OK, I should rephrase: currently not every machine has support in the
kernel. Obviously each machine has a clock which runs faster than
jiffies.
And as a side note - kevents were created half a year ago - there were
no hrtimers in the kernel at that time. By the way, does the kernel
already have a high-resolution clock engine in?

> > and getting into account possibility that
> > userspace can be scheduled away, it will be overkill.
> 
> If you think out your argument then everything which is fine grained or
> high responsive should be removed from userspace access for the very
> same reason. Please look at the existing users of the hrtimer subsystem
> - all of them are exposed to userspace.

Taking into account that a system call takes more than 100 nsec, and that
one must create a kevent and then read it (with at least three
reschedulings - after two syscalls and a wakeup), it is not exactly the
best way to obtain nanosecond resolution. Even one usec is a good
resolution for userspace, and I can create an interface through kevents,
but let's be realistic - if we still cannot agree on other issues, should
we do it right now? I would like the kevent core's issues to be resolved
and everyone to be happy with it before adding new kevent users.

If everyone says "yes, replace usual timers with high-resolution ones",
then ok, I will schedule it for the next patchset.

>   tglx
> 

-- 
Evgeniy Polyakov


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Evgeniy Polyakov
On Tue, Aug 22, 2006 at 06:57:05PM +0200, Jari Sundell ([EMAIL PROTECTED]) 
wrote:
> On 8/22/06, Nicholas Miell <[EMAIL PROTECTED]> wrote:
> >
> >
> >OK, so with literally a dozen different interfaces to queue events to
> >userspace, all of which are apparently inadequate and in need of
> >replacement by kevent, don't you want to slow down a bit and make sure
> >that the kevent API is correct before it becomes permanent and then just
> >has to be replaced *again* ?
> >
> 
> Not to mention the name used causes (at least me) some confusion with BSD's
> kqueue implementation. Skimming over the patches it actually looks somewhat
> like kqueue with the more interesting features removed, like the ability to
> pass the filter changes simultaneously with polling.

I do not understand, what do you mean?
It is obviously allowed to poll and change kevents at the same time.

> Maybe this is a topic that will singe my fur, but what is wrong with the
> kqueue API? Will I really have to implement support for yet another event
> API in my program.

Why did I not implement it like Solaris did?
Or FreeBSD did?
It was designed with the features mentioned on the AIO homepage in mind,
not to be compatible with some other implementation.
And why should it be?

> Rakshasa

-- 
Evgeniy Polyakov


[patch 4/5] d80211/bcm43xx: fix build for ARM

2006-08-22 Thread David Kimdon
ARM targets support udelay(N) where N <= 2000.
Use mdelay() when N >= 2000.

Signed-off-by: David Kimdon <[EMAIL PROTECTED]>

Index: wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_power.c
===
--- wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_power.c
+++ wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_power.c
@@ -291,7 +291,7 @@ int bcm43xx_pctl_set_crystal(struct bcm4
err = bcm43xx_pci_write_config32(bcm, BCM43xx_PCTL_OUT, out);
if (err)
goto err_pci;
-   udelay(5000);
+   mdelay(5);
} else {
if (bcm->current_core->rev < 5)
return 0;
Index: wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_radio.c
===
--- wireless-dev.orig/drivers/net/wireless/d80211/bcm43xx/bcm43xx_radio.c
+++ wireless-dev/drivers/net/wireless/d80211/bcm43xx/bcm43xx_radio.c
@@ -1687,7 +1687,7 @@ int bcm43xx_radio_selectchannel(struct b
radio->channel = channel;
//XXX: Using the longer of 2 timeouts (8000 vs 2000 usecs). Specs states
// that 2000 usecs might suffice.
-   udelay(8000);
+   mdelay(8);
 
return 0;
 }

--


[patch 5/5] d80211: add ioctl to stop data frame tx

2006-08-22 Thread David Kimdon
This ioctl is used when radar is detected on a channel.  Data frames must stop
but management frames must be allowed to continue for some time to communicate
the channel switch to stations.

Signed-off-by: David Kimdon <[EMAIL PROTECTED]>

Index: linux-2.6.16/net/d80211/hostapd_ioctl.h
===
--- linux-2.6.16.orig/net/d80211/hostapd_ioctl.h
+++ linux-2.6.16/net/d80211/hostapd_ioctl.h
@@ -93,6 +93,7 @@ enum {
PRISM2_PARAM_SPECTRUM_MGMT = 1044,
PRISM2_PARAM_USER_SPACE_MLME = 1045,
PRISM2_PARAM_MGMT_IF = 1046,
+   PRISM2_PARAM_STOP_DATA_FRAME_TX = 1047,
/* NOTE: Please try to coordinate with other active development
 * branches before allocating new param numbers so that each new param
 * will be unique within all branches and the allocated number will not
Index: linux-2.6.16/net/d80211/ieee80211.c
===
--- linux-2.6.16.orig/net/d80211/ieee80211.c
+++ linux-2.6.16/net/d80211/ieee80211.c
@@ -1240,6 +1240,15 @@ static int ieee80211_tx(struct net_devic
return 0;
}
 
+   if (unlikely(local->stop_data_frame_tx)) {
+   struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
+   u16 fc = le16_to_cpu(hdr->frame_control);
+   if ((fc & IEEE80211_FCTL_FTYPE) == IEEE80211_FTYPE_DATA) {
+   dev_kfree_skb(skb);
+   return 0;
+   }
+   }
+
__ieee80211_tx_prepare(&tx, skb, dev, control);
sta = tx.sta;
tx.u.tx.mgmt_interface = mgmt;
Index: linux-2.6.16/net/d80211/ieee80211_i.h
===
--- linux-2.6.16.orig/net/d80211/ieee80211_i.h
+++ linux-2.6.16/net/d80211/ieee80211_i.h
@@ -532,6 +532,8 @@ struct ieee80211_local {
* (1 << MODE_*) */
 
int user_space_mlme;
+   int stop_data_frame_tx; /* Set to 1 to stop transmission
+* of data frames. */
 };
 
 enum ieee80211_link_state_t {
Index: linux-2.6.16/net/d80211/ieee80211_ioctl.c
===
--- linux-2.6.16.orig/net/d80211/ieee80211_ioctl.c
+++ linux-2.6.16/net/d80211/ieee80211_ioctl.c
@@ -1300,6 +1300,14 @@ static int ieee80211_ioctl_set_radio_ena
 return ieee80211_hw_config(dev);
 }
 
+static int ieee80211_ioctl_set_stop_data_frame_tx(struct net_device *dev,
+ int val)
+{
+   struct ieee80211_local *local = dev->ieee80211_ptr;
+local->stop_data_frame_tx = val;
+return 0;
+}
+
 static int
 ieee80211_ioctl_set_tx_queue_params(struct net_device *dev,
struct prism2_hostapd_param *param)
@@ -2612,6 +2620,9 @@ static int ieee80211_ioctl_prism2_param(
case PRISM2_PARAM_USER_SPACE_MLME:
local->user_space_mlme = value;
break;
+   case PRISM2_PARAM_STOP_DATA_FRAME_TX:
+ret = ieee80211_ioctl_set_stop_data_frame_tx(dev, value);
+   break;
default:
ret = -EOPNOTSUPP;
break;
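
The frame filter added to ieee80211_tx() above can be modelled outside
the kernel as follows. This is a sketch, not code from the patch: the
function name should_drop_frame() is hypothetical, and the FCTL_/FTYPE_
macros are local copies of the IEEE 802.11 frame control type mask and
type values, assumed to match the IEEE80211_FCTL_FTYPE and
IEEE80211_FTYPE_DATA constants used in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Frame control: bits 2-3 carry the frame type. */
#define FCTL_FTYPE 0x000c
#define FTYPE_MGMT 0x0000
#define FTYPE_DATA 0x0008

/* Return true when a frame must be dropped: only data frames, and only
 * while data frame transmission is stopped. The frame control value is
 * expected in host byte order. */
bool should_drop_frame(uint16_t frame_control, bool stop_data_frame_tx)
{
    if (!stop_data_frame_tx)
        return false;
    return (frame_control & FCTL_FTYPE) == FTYPE_DATA;
}
```

Management frames (for example a beacon, frame control 0x0080) still pass
while the flag is set, so the channel switch can be announced to stations.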

--


[patch 1/5] d80211: allow for large scan results

2006-08-22 Thread David Kimdon
Fix a problem where incomplete scan results could be returned if the
environment includes a large number of devices.  Do not truncate the
scan results and allow a result to contain more than IW_SCAN_MAX_DATA
bytes.

Signed-off-by: David Kimdon <[EMAIL PROTECTED]>

Index: wireless-dev/net/d80211/ieee80211_sta.c
===
--- wireless-dev.orig/net/d80211/ieee80211_sta.c
+++ wireless-dev/net/d80211/ieee80211_sta.c
@@ -2753,6 +2753,10 @@ int ieee80211_sta_scan_results(struct ne
spin_lock_bh(&local->sta_bss_lock);
list_for_each(ptr, &local->sta_bss_list) {
bss = list_entry(ptr, struct ieee80211_sta_bss, list);
+   if (buf + len - current_ev <= IW_EV_ADDR_LEN) {
+   spin_unlock_bh(&local->sta_bss_lock);
+   return -E2BIG;
+   }
current_ev = ieee80211_sta_scan_result(dev, bss, current_ev,
   end_buf);
}
Index: wireless-dev/net/d80211/ieee80211_ioctl.c
===
--- wireless-dev.orig/net/d80211/ieee80211_ioctl.c
+++ wireless-dev/net/d80211/ieee80211_ioctl.c
@@ -1998,7 +1998,7 @@ static int ieee80211_ioctl_giwscan(struc
struct ieee80211_local *local = dev->ieee80211_ptr;
if (local->sta_scanning)
return -EAGAIN;
-   res = ieee80211_sta_scan_results(dev, extra, IW_SCAN_MAX_DATA);
+   res = ieee80211_sta_scan_results(dev, extra, data->length);
if (res >= 0) {
data->length = res;
return 0;
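
For callers, the practical effect of returning -E2BIG is that the buffer has to be grown and the request retried. A minimal sketch of that retry loop follows, assuming a hypothetical mock_get_scan() in place of the real SIOCGIWSCAN ioctl; the function names, sizes and the cap are illustrative and not taken from the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scan-results ioctl: pretends the kernel
 * holds `needed` bytes of results and fails with -E2BIG when the
 * caller's buffer is too small. */
static int mock_get_scan(size_t needed, char *buf, size_t len)
{
	(void)buf;
	if (len < needed)
		return -E2BIG;
	return (int)needed;	/* bytes of results "written" */
}

/* Double the buffer until the results fit or a hard cap is reached.
 * Returns the result length, or a negative errno value. */
static int scan_with_retry(size_t needed, size_t start, size_t cap)
{
	size_t len = start;

	for (;;) {
		char *buf = malloc(len);
		int res;

		if (!buf)
			return -ENOMEM;
		res = mock_get_scan(needed, buf, len);
		free(buf);
		if (res != -E2BIG)
			return res;
		if (len >= cap)
			return -E2BIG;	/* give up instead of growing forever */
		len *= 2;
	}
}
```

The cap is a design choice of the sketch: without it, a misbehaving driver could make the caller allocate without bound.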

--


[patch 0/5] d80211 patches

2006-08-22 Thread David Kimdon
Hi,

Here are some patches for d80211.

Thanks,

David

--


[patch 3/5] d80211: fix interface removal

2006-08-22 Thread David Kimdon
Calls to ieee80211_if_remove() should use the ieee80211 interface types.
Convert interface type from hostapd to ieee80211 format.

Signed-off-by: David Kimdon <[EMAIL PROTECTED]>

Index: wireless-dev/net/d80211/ieee80211_ioctl.c
===
--- wireless-dev.orig/net/d80211/ieee80211_ioctl.c
+++ wireless-dev/net/d80211/ieee80211_ioctl.c
@@ -1076,14 +1076,21 @@ static int ieee80211_ioctl_add_if(struct
 static int ieee80211_ioctl_remove_if(struct net_device *dev,
 struct prism2_hostapd_param *param)
 {
-   if (param->u.if_info.type != HOSTAP_IF_WDS &&
-   param->u.if_info.type != HOSTAP_IF_VLAN &&
-   param->u.if_info.type != HOSTAP_IF_BSS &&
-   param->u.if_info.type != HOSTAP_IF_STA) {
-return -EINVAL;
+   unsigned int type;
+
+   if (param->u.if_info.type == HOSTAP_IF_WDS) {
+   type = IEEE80211_IF_TYPE_WDS;
+   } else if (param->u.if_info.type == HOSTAP_IF_VLAN) {
+   type = IEEE80211_IF_TYPE_VLAN;
+   } else if (param->u.if_info.type == HOSTAP_IF_BSS) {
+   type = IEEE80211_IF_TYPE_AP;
+   } else if (param->u.if_info.type == HOSTAP_IF_STA) {
+   type = IEEE80211_IF_TYPE_STA;
+   } else {
+return -EINVAL;
}
-   return ieee80211_if_remove(dev, param->u.if_info.name,
-  param->u.if_info.type);
+
+   return ieee80211_if_remove(dev, param->u.if_info.name, type);
 }
 
 

--


[patch 2/5] d80211: fix multiple device support

2006-08-22 Thread David Kimdon
Fix interpretation of dev_alloc_name() return value.  dev_alloc_name()
returns the number of the unit assigned or a negative errno code.

Signed-off-by: David Kimdon <[EMAIL PROTECTED]>

Index: wireless-dev/net/d80211/ieee80211_iface.c
===
--- wireless-dev.orig/net/d80211/ieee80211_iface.c
+++ wireless-dev/net/d80211/ieee80211_iface.c
@@ -64,7 +64,7 @@ int ieee80211_if_add(struct net_device *
} while (i < 1);
} else if (format) {
ret = dev_alloc_name(ndev, name);
-   if (ret)
+   if (ret < 0)
goto fail;
} else {
snprintf(ndev->name, IFNAMSIZ, "%s", name);
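
To see why the old test broke multiple devices, here is a small user-space model. It assumes a hypothetical mock_alloc_name() that behaves as described above (returns the assigned unit number, or a negative errno value); it is not the kernel function, and the error code chosen for "no free unit" is only illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the allocator: fills in the first free unit
 * for a "wlan%d"-style format and returns that unit number, or a
 * negative errno value when no unit is free. */
static int mock_alloc_name(char *out, size_t outlen, const char *fmt,
			   const int *in_use, int max_units)
{
	for (int unit = 0; unit < max_units; unit++) {
		if (!in_use[unit]) {
			snprintf(out, outlen, fmt, unit);
			return unit;
		}
	}
	return -ENFILE;
}

/* Old test: any nonzero return is treated as failure. */
static int old_check_fails(int ret) { return ret != 0; }

/* New test: only negative returns are failures. */
static int new_check_fails(int ret) { return ret < 0; }
```

With unit 0 already taken, the allocator returns 1: the old check reports failure for a perfectly good "wlan1", while the new check accepts it.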

--


[PATCH] pcnet32: break in 2.6.18-rc1 identified

2006-08-22 Thread Don Fry
A change I made for 2.6.17 and another for 2.6.18 do not work on older
pcnet32 chips, which I do not have access to.  If the chip is a 79C970 or
79C965, do not try to suspend it or check the link status.
I have tested with a 79C970A, 79C971, 79C972, 79C973, 79C975, 79C976,
and 79C978.

Please apply to 2.6.18.

Signed-off-by:  Don Fry <[EMAIL PROTECTED]>

--- linux-2.6.18-rc3-git1/drivers/net/orig.pcnet32.c	Tue Aug  1 14:47:07 2006
+++ linux-2.6.18-rc3-git1/drivers/net/pcnet32.c Thu Aug  3 08:36:26 2006
@@ -202,6 +202,8 @@ static int homepna[MAX_UNITS];
 #define CSR15  15
 #define PCNET32_MC_FILTER  8
 
+#define PCNET32_79C970A	0x2621
+
 /* The PCNET32 Rx and Tx ring descriptors. */
 struct pcnet32_rx_head {
u32 base;
@@ -289,6 +291,7 @@ struct pcnet32_private {
 
/* each bit indicates an available PHY */
u32 phymask;
+   unsigned short  chip_version;   /* which variant this is */
 };
 
 static int pcnet32_probe_pci(struct pci_dev *, const struct pci_device_id *);
@@ -724,9 +727,11 @@ static u32 pcnet32_get_link(struct net_d
spin_lock_irqsave(&lp->lock, flags);
if (lp->mii) {
r = mii_link_ok(&lp->mii_if);
-   } else {
+   } else if (lp->chip_version >= PCNET32_79C970A) {
ulong ioaddr = dev->base_addr;  /* card base I/O address */
r = (lp->a.read_bcr(ioaddr, 4) != 0xc0);
+   } else {/* can not detect link on really old chips */
+   r = 1;
}
spin_unlock_irqrestore(&lp->lock, flags);
 
@@ -1091,6 +1096,10 @@ static int pcnet32_suspend(struct net_de
ulong ioaddr = dev->base_addr;
int ticks;
 
+   /* really old chips have to be stopped. */
+   if (lp->chip_version < PCNET32_79C970A)
+   return 0;
+
/* set SUSPEND (SPND) - CSR5 bit 0 */
csr5 = a->read_csr(ioaddr, CSR5);
a->write_csr(ioaddr, CSR5, csr5 | CSR5_SUSPEND);
@@ -1529,6 +1538,7 @@ pcnet32_probe1(unsigned long ioaddr, int
lp->mii_if.reg_num_mask = 0x1f;
lp->dxsuflo = dxsuflo;
lp->mii = mii;
+   lp->chip_version = chip_version;
lp->msg_enable = pcnet32_debug;
if ((cards_found >= MAX_UNITS)
|| (options[cards_found] > sizeof(options_mapping)))
@@ -1839,10 +1849,7 @@ static int pcnet32_open(struct net_devic
val |= 2;
} else if (lp->options & PCNET32_PORT_ASEL) {
/* workaround of xSeries250, turn on for 79C975 only */
-   i = ((lp->a.read_csr(ioaddr, 88) |
- (lp->a.
-  read_csr(ioaddr, 89) << 16)) >> 12) & 0x;
-   if (i == 0x2627)
+   if (lp->chip_version == 0x2627)
val |= 3;
}
lp->a.write_bcr(ioaddr, 9, val);
@@ -1986,9 +1993,11 @@ static int pcnet32_open(struct net_devic
 
netif_start_queue(dev);
 
-   /* Print the link status and start the watchdog */
-   pcnet32_check_media(dev, 1);
-   mod_timer(&(lp->watchdog_timer), PCNET32_WATCHDOG_TIMEOUT);
+   if (lp->chip_version >= PCNET32_79C970A) {
+   /* Print the link status and start the watchdog */
+   pcnet32_check_media(dev, 1);
+   mod_timer(&(lp->watchdog_timer), PCNET32_WATCHDOG_TIMEOUT);
+   }
 
i = 0;
while (i++ < 100)
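
As an illustration of the version gating, the sketch below derives a chip version from the two ID registers the same way the removed lines in pcnet32_open() did (CSR88 low word, CSR89 high word, shifted right by 12). The 16-bit mask is an assumption, since the constant is cut off in the archived copy, and the helper names are made up for this example.

```c
#include <stdint.h>

#define SKETCH_79C970A 0x2621	/* same value the patch gives PCNET32_79C970A */

/* Combine the two 16-bit ID registers and extract the part number.
 * The 0xffff mask is an assumption made for this sketch. */
static unsigned int chip_version_from_csrs(uint16_t csr88, uint16_t csr89)
{
	uint32_t id = (uint32_t)csr88 | ((uint32_t)csr89 << 16);

	return (id >> 12) & 0xffff;
}

/* Mirrors the checks added by the patch: chips older than the 79C970A
 * are neither suspended nor polled for link status. */
static int can_suspend_and_check_link(unsigned int chip_version)
{
	return chip_version >= SKETCH_79C970A;
}
```

Storing the version once in the private struct, as the patch does, avoids re-reading the ID registers every time one of these decisions has to be made.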

-- 
Don Fry
[EMAIL PROTECTED]


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Jari Sundell

Not to mention that the name causes (at least for me) some confusion with
BSD's kqueue implementation. Skimming over the patches, it actually
looks somewhat like kqueue with the more interesting features removed,
like the ability to pass the filter changes simultaneously with
polling.

Maybe this is a topic that will singe my fur, but what is wrong with
the kqueue API? Will I really have to implement support for yet
another event API in my program?

Rakshasa


Re: [take12 3/3] kevent: Timer notifications.

2006-08-22 Thread Thomas Gleixner
On Mon, 2006-08-21 at 15:18 +0400, Evgeniy Polyakov wrote:
> On Mon, Aug 21, 2006 at 12:12:39PM +0100, Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> > On Mon, Aug 21, 2006 at 02:19:49PM +0400, Evgeniy Polyakov wrote:
> > > 
> > > 
> > > Timer notifications.
> > > 
> > > Timer notifications can be used for fine grained per-process time 
> > > management, since interval timers are very inconvenient to use, 
> > > and they are limited.
>
> > Shouldn't this at least use a hrtimer?
> 
> Not every machine has them 

Every machine has hrtimers - not necessarily with high resolution timer
support, but the core code is there in any case and it is designed to
provide fine grained timers. 

In case of high resolution time support one would expect that the "fine
grained" timer event is actually fine grained.

> and taking into account the possibility that
> userspace can be scheduled away, it will be overkill.

If you think your argument through, then everything which is fine grained
or highly responsive should be removed from userspace access for the very
same reason. Please look at the existing users of the hrtimer subsystem
- all of them are exposed to userspace.
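
A quick way to see what a userspace-visible timer delivers is to measure a short sleep. The sketch below uses the POSIX clock_nanosleep() and clock_gettime() calls; whether the sleep is backed by hrtimers, and how fine grained the result is, depends on the kernel and hardware, so this is only a measuring aid under that assumption. The helper name is made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <time.h>

/* Sleep for `ns` nanoseconds on CLOCK_MONOTONIC and return how long the
 * sleep really took, in nanoseconds, or -1 on error or interruption. */
static long long timed_sleep_ns(long ns)
{
	struct timespec req = {
		.tv_sec = ns / 1000000000L,
		.tv_nsec = ns % 1000000000L,
	};
	struct timespec t0, t1;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1;
	if (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1;
	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

The difference between the returned value and the requested interval includes both timer granularity and scheduling delay, which are the two effects being argued about above.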

tglx




Re: [PATCH 01/18] d80211: LED triggers

2006-08-22 Thread Jouni Malinen
On Mon, Aug 21, 2006 at 09:41:08AM +0200, Johannes Berg wrote:
> This patch makes d80211 export LED triggers for rx/tx and introduces
> functions to allow device drivers to query the trigger names for setting
> default triggers. It also cleans up the Makefile LED related stuff.

Is someone using these or planning on using them? I have been open to
just removing all code due to lack of active use.

-- 
Jouni Malinen                                            PGP id EFC895FA


Re: [PATCH 08/18] d80211: clean up exports

2006-08-22 Thread Jouni Malinen
On Mon, Aug 21, 2006 at 09:41:15AM +0200, Johannes Berg wrote:
> This puts all EXPORT_SYMBOL() macros along with the function being exported,
> and changes some exports that are only relevant to rate control modules
> to be GPL-only because the rate control modules need to be built against
> the internal ieee80211_i.h header.

Moving the EXPORT_SYMBOL definitions sounds good, but I would like to
keep changes between EXPORT_SYMBOL and EXPORT_SYMBOL_GPL separate from
this kind of cleanup. In addition, I'm not personally a huge fan of the
EXPORT_SYMBOL_GPL in the first place since I believe the GPL should
cover this without additional changes in the source code. In other
words, I would prefer that the EXPORT_SYMBOL would not be changed to
EXPORT_SYMBOL_GPL here.

-- 
Jouni Malinen                                            PGP id EFC895FA


Re: [PATCH 4/4] IP100A: Solve host error problem in low performance embedded system when continune down and up.

2006-08-22 Thread Randy.Dunlap
On Tue, 22 Aug 2006 14:31:32 -0400 Jesse Huang wrote:

> From: Jesse Huang <[EMAIL PROTECTED]>
> 
> Change Logs:
>    - Solve host error problem in low performance embedded 
>      system when the interface is continuously brought down and up.
> 
> Signed-off-by: Jesse Huang <[EMAIL PROTECTED]>
> 
> ---
> 
>  sundance.c |   30 +-
>  1 files changed, 25 insertions(+), 5 deletions(-)

Full path/file names above and below, please.

> a88c635933a981dd4fca87e5b8ca9426c5c98013
> diff --git a/sundance.c b/sundance.c
> index 424aebd..de55e0f 100755
> --- a/sundance.c
> +++ b/sundance.c
> @@ -1647,6 +1647,14 @@ static int netdev_close(struct net_devic
>   struct sk_buff *skb;
>   int i;
>  
> + /* Wait and kill tasklet */
> + tasklet_kill(&np->rx_tasklet);
> + tasklet_kill(&np->tx_tasklet);
> +   np->cur_tx = 0;
> +   np->dirty_tx = 0;

Use same indentation/whitespace as surrounding code.
(tabs, not spaces)

> + np->cur_task = 0;
> + np->last_tx = 0;
> +
>   netif_stop_queue(dev);
>  
>   if (netif_msg_ifdown(np)) {
> @@ -1667,9 +1675,20 @@ static int netdev_close(struct net_devic
>   /* Stop the chip's Tx and Rx processes. */
>   iowrite16(TxDisable | RxDisable | StatsDisable, ioaddr + MACCtrl1);
>  
> - /* Wait and kill tasklet */
> - tasklet_kill(&np->rx_tasklet);
> - tasklet_kill(&np->tx_tasklet);
> +for (i = 2000; i > 0; i--) {
> +  if ((ioread32(ioaddr + DMACtrl) &0xC000) == 0)
> + break;
> +  mdelay(1);
> +}
> +
> +iowrite16(GlobalReset | DMAReset | FIFOReset |NetworkReset, ioaddr +ASICCtrl + 2);
> +
> +for (i = 2000; i > 0; i--)
> +{
> +  if ((ioread16(ioaddr + ASICCtrl +2) &ResetBusy) == 0)
> + break;
> +  mdelay(1);
> +}

Same comment about indentation/whitespace.

---
~Randy


Re: [RFC] add nl80211

2006-08-22 Thread Johannes Berg
On Tue, 2006-08-22 at 15:52 +0200, Johannes Berg wrote:
> + /* make it complain if wiphy is set and is different or invalid */
> + if (info->attrs[NL80211_ATTR_IFINDEX]) {
> + wiphy = nla_get_u32(info->attrs[NL80211_ATTR_IFINDEX]);

That's what you get for writing code too quickly ;)
Both of these should be NL80211_ATTR_WIPHY of course!

johannes


Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread James Morris
On Tue, 22 Aug 2006, Nicholas Miell wrote:

> In this brave new world of always stable kernel development, the time a
> new interface has for public testing before a new kernel release is
> drastically shorter than the old unstable development series, and if
> nobody is documenting how this stuff is supposed to work and
> demonstrating how it will be used, then mistakes are bound to slip
> through.

Feel free to provide the documentation.  Perhaps, even as much as you've 
written so far in these emails would be enough.



- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH] wireless-dev: relax sysfs permissions

2006-08-22 Thread Jiri Benc
On Wed, 16 Aug 2006 15:49:45 +0200, Johannes Berg wrote:
> The sysfs attributes add_iface and remove_iface both check for
> CAP_NET_ADMIN whenever something is written. Hence, permissions for the
> files should be relaxed so that someone who is not root but happens to
> have CAP_NET_ADMIN can do things.

I'm not sure about this. Greg, what's the policy here?

Note that there is also another way for adding and removing interfaces -
via nl80211 netlink interface (not finished yet but will go in shortly).

> 
> Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>
> 
> --- wireless-dev.orig/net/d80211/ieee80211_sysfs.c	2006-08-16 15:45:41.0 +0200
> +++ wireless-dev/net/d80211/ieee80211_sysfs.c	2006-08-16 15:46:05.0 +0200
> @@ -195,8 +195,8 @@
>  __IEEE80211_LOCAL_SHOW(rate_ctrl_alg);
>  
>  static struct class_device_attribute ieee80211_class_dev_attrs[] = {
> - __ATTR(add_iface, S_IWUSR, NULL, store_add_iface),
> - __ATTR(remove_iface, S_IWUSR, NULL, store_remove_iface),
> + __ATTR(add_iface, S_IWUGO, NULL, store_add_iface),
> + __ATTR(remove_iface, S_IWUGO, NULL, store_remove_iface),
>   __ATTR(channel, S_IRUGO, ieee80211_local_show_channel, NULL),
>   __ATTR(frequency, S_IRUGO, ieee80211_local_show_frequency, NULL),
>   __ATTR(radar_detect, S_IRUGO, ieee80211_local_show_radar_detect, NULL),
> 
> 

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


[PATCH 2.6.16.19 2/2] LARTC: trace control for netem: kernelspace

2006-08-22 Thread Rainer Baumann
Trace Control for Netem: Emulate network properties such as
long-range dependence and self-similarity of cross-traffic.

The delay, drop, duplication and corruption values are read in user
space and sent to kernel space via procfs.
The kernel controls when new values are sent by means of SIGSTOP and
SIGCONT signals.
So that packet action values are always ready to apply, two buffers
hold these values.
Packet action values can be read from one buffer while the other buffer
is refilled with new values.
When the buffer being read is empty, the kernel switches to the other
buffer and sends a SIGCONT signal to request new packet action values.
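
The two-buffer scheme described above can be sketched in user space as follows. This is not the code from the patch: the struct, the function names and the tiny buffer length are all hypothetical, and a counter stands in for the SIGCONT that asks user space for more values.

```c
#include <stddef.h>

#define SKETCH_BUF_LEN 4	/* tiny on purpose; real buffers would be larger */

struct sketch_dbuf {
	unsigned int vals[2][SKETCH_BUF_LEN];
	size_t fill[2];		/* number of valid values in each buffer */
	size_t pos;		/* read position in the active buffer */
	int active;		/* index of the buffer being read */
	int refill_requests;	/* stands in for the SIGCONT sent to user space */
};

/* Writer side: load new values into the buffer that is not being read. */
static void dbuf_refill(struct sketch_dbuf *d, const unsigned int *src, size_t n)
{
	int idle = !d->active;

	if (n > SKETCH_BUF_LEN)
		n = SKETCH_BUF_LEN;
	for (size_t i = 0; i < n; i++)
		d->vals[idle][i] = src[i];
	d->fill[idle] = n;
}

/* Reader side: fetch the next value. Returns 0 on success, or -1 when
 * the active buffer is drained and the other one has not been refilled. */
static int dbuf_next(struct sketch_dbuf *d, unsigned int *out)
{
	if (d->pos >= d->fill[d->active]) {
		d->fill[d->active] = 0;
		d->active = !d->active;	/* switch to the other buffer */
		d->pos = 0;
		d->refill_requests++;	/* ask for the drained buffer to be refilled */
		if (d->fill[d->active] == 0)
			return -1;
	}
	*out = d->vals[d->active][d->pos++];
	return 0;
}
```

The sketch has no locking; in a real qdisc the reader and writer run in different contexts, so the switch and the refill would need to be serialized.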

Having applied the delay value to a packet, the packet gets processed by
the original netem functions.

Signed-off-by: Rainer Baumann <[EMAIL PROTECTED]>

---

Patch for linux kernel 2.6.16.19: http://tcn.hypert.net/tcnKernel.patch






[PATCH 2.6.16.19 1/2] LARTC: trace control for netem: userspace

2006-08-22 Thread Rainer Baumann
Trace Control for Netem: Emulate network properties such as
long-range dependence and self-similarity of cross-traffic.

The directory tc/netem was split into two parts, one containing the
original distributions and the other the tools to generate trace files
as well as the program responsible for reading the delay values from the
trace file and sending them to the kernel (called flowseed).
If the trace option is set, netem starts the flowseed process and
initializes the kernel. To be able to kill the flowseed process in case
the command was faulty, the PID of the flowseed process is passed to the
netem kernel module. If the kernel receives packet delay data from an
unregistered PID, that process is killed. The flowseed process does not
send data to the kernel until the registration is completed.

Signed-off-by: Rainer Baumann <[EMAIL PROTECTED]>

---

Patch for iproute2-2.6.16-060323: http://tcn.hypert.net/tcnIproute.patch






[PATCH 2.6.16.19 0/2] LARTC: trace control for netem

2006-08-22 Thread Rainer Baumann
This is the revised trace extension to the network emulator netem.
This extension provides emulation control based on pregenerated traces.

We first submitted this patch on the 2nd of August; in the meantime
we have integrated Stephen's comments and fixed the issues he listed.

Cheers,
Rainer


-------- Original Message --------
Subject:Re: [PATCH 2.6.16.19 0/2] LARTC: trace control for netem
Date:   Wed, 2 Aug 2006 11:19:21 -0700
From:   Stephen Hemminger <[EMAIL PROTECTED]>
To: Rainer Baumann <[EMAIL PROTECTED]>
CC: netdev@VGER.KERNEL.ORG, [EMAIL PROTECTED]
References: <[EMAIL PROTECTED]>

> On Wed, 02 Aug 2006 19:21:27 +0200
> > Rainer Baumann <[EMAIL PROTECTED]> wrote:
> >
> >   
>> >> Hi,
>> >>
>> >> We developed an extension to the network emulator netem, that provides
>> >> emulation of long term network properties such as long-range dependence
>> >> and self-similarity of cross-traffic. It is not possible to emulate
>> >> these properties with the  statistical tables for the packet delay
>> >> values used by the original netem.
>> >>
>> >> We read the values for the packet delay, drop, loss and corruption from
>> >> a pre-generated trace file. This trace file is obtained by monitoring
>> >> network traffic and writing all actions to a trace file. During the
>> >> emulation the packets get processed according to the values in such a trace
>> >> file. Detailed information is available on our
>> >> website: http://tcn.hypert.net
>> >>
>> >> A new option (trace) has been added to the netem command. If the trace
>> >> option is used, the values for packet delay etc. are read from a trace
>> >> file, afterwards the packets are processed by the normal netem functions.
>> >> The packet action values are readout from the trace file in user space
>> >> and sent to kernel space via procfs.
>> >>
>> >> The evaluation results show similar behavior for our enhancement and the
>> >> original netem with respect to packet delay precision and packet loss at
>> >> high load (e.g. 80'000 packets per second).
>> >> It is possible to add, change or delete multiple netem qdiscs on-the-fly
>> >> (original netem qdiscs and trace qdiscs mixed).
>> >>
>> >> We are looking forward to any comments, feedback and suggestions!
>> >>
>> >> Thanks,
>> >> Rainer
>> >> 
> >
> > I like the idea and want to get it incorporated.
> >
> > Major things that need fixing:
> > * Don't extend size of tc_netem_qopt instead use a new netlink
> >   payload.
> > + add type to TCA_NETEM_ enum
> > + new structure containing the payload
> >   This allows for binary compatibility.
> >
> > * Don't use proc for an interface to netem features. Use netlink.
> >   Either add a new command (or option) to the iproute2 commands
> >   to handle flow table, or add a new payload.
> >
> >
> > Minor stuff:
> > * the bzero macro in netem is a BSDism, just use memset
> > * bad indentation and style issues.
> > * minor whitespace damage in several places in patch
> >
> >   
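
The first major point and the memset remark can be illustrated with a small sketch. Everything in it is hypothetical: the struct layouts, the enum entries and the field names are stand-ins chosen for the example, not the real tc_netem_qopt or TCA_NETEM_ definitions. The idea is only that the existing options struct keeps its size while the trace settings travel in a separate, newly typed payload.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the existing options struct; its layout is left alone.
 * Field names are illustrative only. */
struct sketch_netem_qopt {
	uint32_t latency;
	uint32_t limit;
	uint32_t loss;
};

/* Hypothetical attribute types: existing ones first, the new one appended
 * so that old values keep their numbers. */
enum {
	SKETCH_NETEM_UNSPEC,
	SKETCH_NETEM_CORR,
	SKETCH_NETEM_TRACE,	/* new, hypothetical */
};

/* Hypothetical payload carried by the new attribute. */
struct sketch_netem_trace {
	uint32_t flow_pid;
	uint32_t buf_len;
};

static void sketch_trace_init(struct sketch_netem_trace *t,
			      uint32_t pid, uint32_t len)
{
	memset(t, 0, sizeof(*t));	/* memset rather than the BSD bzero */
	t->flow_pid = pid;
	t->buf_len = len;
}
```

An older tool that knows nothing about the new attribute type still sends and receives the unchanged options struct, which is what preserves binary compatibility.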




Re: [take9 2/2] kevent: poll/select() notifications. Timer notifications.

2006-08-22 Thread Davide Libenzi

On Wed, 16 Aug 2006, Christoph Hellwig wrote:


> On Mon, Aug 14, 2006 at 10:21:36AM +0400, Evgeniy Polyakov wrote:
> >
> > poll/select() notifications. Timer notifications.
> >
> > This patch includes generic poll/select and timer notifications.
> >
> > kevent_poll works similarly to epoll and has the same issues (callback
> > is invoked not from internal state machine of the caller, but through
> > process awake).
>
> I'm not a big fan of duplicating code over and over.  kevent is a candidate
> for a generic event delivery mechanism, which is a _very_ good thing.  But
> starting that system by duplicating existing functionality is not very nice.
>
> What speaks against a patch that replaces the epoll core by something that
> builds on kevent while still supporting the epoll interface as a compatibility
> shim?


Sorry, I'm catching up with a huge post-vacation backlog, so I didn't have 
the time to look at the source code. But, if kevent performance is the same or 
better, and the external epoll interface is fully supported, then I think 
the shim layer idea is a good one. Provided the shim is smaller than 
eventpoll.c :)




- Davide




Re: [2.6.19 PATCH 3/7] ehea: queue management

2006-08-22 Thread Jan-Bernd Themann
Hi,

On Tuesday 22 August 2006 16:01, Arnd Bergmann wrote:
> > +   u64 rpage = 0;
> > +   int ret;
> > +   int cnt = 0;
> > +   void *vpage = NULL;
> > +
> > +   ret = hw_queue_ctor(hw_queue, nr_pages, EHEA_PAGESIZE, wqe_size);
> > +   if (ret)
> > +   return ret;
> > +
> > +   for (cnt = 0; cnt < nr_pages; cnt++) {
> > +   vpage = hw_qpageit_get_inc(hw_queue);
> > +   if (!vpage) {
> > +   ehea_error("hw_qpageit_get_inc failed");
> > +   goto qp_alloc_register_exit0;
> > +   }
> > +   rpage = virt_to_abs(vpage);
> 
> As someone mentioned before, the initialization to 0 or NULL
> is pointless here, as the variables are always assigned before
> they are used. There are a number of other places in your
> code that do similar things, you should probably go through
> these and remove the initializers.
> 
> If you indeed need something to be initialized, it is good practice
> to do the initialization as late as possible, e.g.
> 
>   int foo;
>   ...
>   foo = 0;
>   do_foo(foo);
> 
> to make it clear that you have a reason to initialize it.
> 
>   Arnd <><
> 

Agreed. We started to remove some but apparently not all.
We'll go through the code and remove them where possible.



Re: [PATCH] Add wireless statics to bcm43xx-d80211

2006-08-22 Thread Jiri Benc
On Mon, 14 Aug 2006 08:29:08 -0500, Larry Finger wrote:
> This patch implements wireless statistics for bcm43xx using the d80211 stack.
> It also sets a framework for the implementation in other drivers that use the
> d80211 code. The component parts have been circulated on the netdev mailing
> list, and all suggested changes have been incorporated. The specific changes
> are as follows:

Please, separate the d80211 part and the bcm43xx-d80211 part into
two patches.

> --- a/include/net/d80211.h
> +++ b/include/net/d80211.h
> @@ -205,6 +205,9 @@ struct ieee80211_rx_status {
>  int channel;
>  int phymode;
>  int ssi;
> + int maxssi;

Why is maxssi here? Can it really change between received frames?

> [...]
> --- a/net/d80211/ieee80211.c
> +++ b/net/d80211/ieee80211.c
> @@ -3174,6 +3174,9 @@ ieee80211_rx_h_sta_process(struct ieee80
>   sta->rx_fragments++;
>   sta->rx_bytes += rx->skb->len;
>   sta->last_rssi = rx->u.rx.status->ssi;
> + sta->last_signal = rx->u.rx.status->signal;
> + sta->last_noise = rx->u.rx.status->noise;
> + sta->max_rssi = rx->u.rx.status->maxssi;

Again, I see no reason why max_rssi should be in sta structure.

> [...]
> --- a/net/d80211/ieee80211_i.h
> +++ b/net/d80211/ieee80211_i.h
> @@ -337,6 +337,9 @@ struct ieee80211_local {
>   struct net_device *apdev; /* wlan#ap - management frames (hostapd) */
>   int open_count;
>   int monitors;
> + int link_quality;
> + int noise;
> + struct iw_statistics wstats;

Why are these three variables in ieee80211_local? They are not used
anywhere.

> [...]
> --- a/net/d80211/ieee80211_ioctl.c
> +++ b/net/d80211/ieee80211_ioctl.c
> @@ -1580,6 +1580,16 @@ static int ieee80211_ioctl_giwrange(stru
>   range->min_frag = 256;
>   range->max_frag = 2346;
>  
> + range->max_qual.qual = 100;
> + range->max_qual.level = 152;  /* set floor at -104 dBm (152 - 256) */

I would suggest using -110 dBm as a floor (to be compatible with RCPI
definition, see mail from Simon Barber describing it). Or is there any
particular reason for -104 dBm?
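
The arithmetic behind the two floors, as a small sketch. It assumes the convention implied by the comment in the patch ("152 - 256"): a negative dBm value is stored in an 8-bit level field as dbm + 256. The helper names are made up for the example.

```c
#include <stdint.h>

/* Store a dBm value in the range [-256, -1] as an 8-bit level. */
static uint8_t dbm_to_level(int dbm)
{
	return (uint8_t)(dbm + 256);
}

/* Recover the dBm value from the stored 8-bit level. */
static int level_to_dbm(uint8_t level)
{
	return (int)level - 256;
}
```

Under that convention the patch's 152 corresponds to -104 dBm, and a -110 dBm floor would be stored as 146.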

> [...]
> --- a/net/d80211/sta_info.h
> +++ b/net/d80211/sta_info.h
> @@ -82,6 +82,9 @@ struct sta_info {
>   unsigned long rx_dropped; /* number of dropped MPDUs from this STA */
>  
>   int last_rssi; /* RSSI of last received frame from this STA */
> + int last_signal; /* signal of last received frame from this STA */
> + int last_noise; /* noise of last received frame from this STA */

Add these two variables also to sysfs, please.

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


[RFC] make d80211 use nl80211

2006-08-22 Thread Johannes Berg
This patch makes d80211 partially configurable using the
infrastructure that nl80211 provides. So far, it allows
packet injection and adding/removing virtual interfaces.

Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>

--- wireless-dev.orig/net/d80211/Kconfig	2006-08-22 15:47:46.0 +0200
+++ wireless-dev/net/d80211/Kconfig 2006-08-22 15:47:48.0 +0200
@@ -3,6 +3,7 @@ config D80211
select CRYPTO
select CRYPTO_ARC4
select CRYPTO_AES
+   select NETLINK_80211
---help---
This option enables the hardware independent IEEE 802.11
networking stack.
--- wireless-dev.orig/net/d80211/Makefile	2006-08-22 15:47:44.0 +0200
+++ wireless-dev/net/d80211/Makefile2006-08-22 15:47:48.0 +0200
@@ -8,6 +8,7 @@ obj-$(CONFIG_D80211) += 80211.o rate_con
sta_info.o \
wep.o \
wpa.o \
+   ieee80211_cfg.o \
ieee80211_scan.o \
ieee80211_sta.o \
ieee80211_dev.o \
--- wireless-dev.orig/net/d80211/ieee80211.c	2006-08-22 15:47:46.0 +0200
+++ wireless-dev/net/d80211/ieee80211.c 2006-08-22 15:47:48.0 +0200
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,6 +33,7 @@
 #include "wme.h"
 #include "aes_ccm.h"
 #include "ieee80211_led.h"
+#include "ieee80211_cfg.h"
 
 /* See IEEE 802.1H for LLC/SNAP encapsulation/decapsulation */
 /* Ethernet-II snap header (RFC1042 for most EtherTypes) */
@@ -348,6 +350,16 @@ ieee80211_tx_h_rate_ctrl(struct ieee8021
 {
struct rate_control_extra extra;
 
+   /* FIXME
+   if (tx->dev == tx->local->mdev &&
+   (inject rate set)) {
+   a
+   tx->u.tx.rate = ...
+   etc etc
+   return TXRX_CONTINUE;
+   }
+   */
+
memset(&extra, 0, sizeof(extra));
extra.mgmt_data = tx->sdata &&
tx->sdata->type == IEEE80211_IF_TYPE_MGMT;
@@ -753,6 +765,13 @@ ieee80211_tx_h_misc(struct ieee80211_txr
u16 dur;
struct ieee80211_tx_control *control = tx->u.tx.control;
 
+   /* FIXME
+   if (tx->dev == tx->local->mdev) {
+   set up retry limit, ...
+   based on injection parameters
+   }
+   */
+
if (!is_multicast_ether_addr(hdr->addr1)) {
if (tx->skb->len + FCS_LEN > tx->local->rts_threshold &&
tx->local->rts_threshold < IEEE80211_MAX_RTS_THRESHOLD) {
@@ -878,6 +897,9 @@ ieee80211_tx_h_check_assoc(struct ieee80
 #endif /* CONFIG_D80211_VERBOSE_DEBUG */
u32 sta_flags;
 
+   if (unlikely(tx->dev == tx->local->mdev))
+   return TXRX_CONTINUE;
+
if (unlikely(tx->local->sta_scanning != 0) &&
((tx->fc & IEEE80211_FCTL_FTYPE) != IEEE80211_FTYPE_MGMT ||
 (tx->fc & IEEE80211_FCTL_STYPE) != IEEE80211_STYPE_PROBE_REQ))
@@ -981,6 +1003,12 @@ static void purge_old_ps_buffers(struct 
 static inline ieee80211_txrx_result
 ieee80211_tx_h_multicast_ps_buf(struct ieee80211_txrx_data *tx)
 {
+   /* FIXME
+   if (unlikely(tx->dev == tx->local->mdev &&
+   (inject flags) & NL80211_FLAG_NOBUFFER))
+   return TXRX_CONTINUE;
+   */
+
/* broadcast/multicast frame */
/* If any of the associated stations is in power save mode,
 * the frame is buffered to be sent after DTIM beacon frame */
@@ -1408,11 +1436,12 @@ static int ieee80211_master_start_xmit(s
 
control.ifindex = odev->ifindex;
control.type = osdata->type;
-   control.req_tx_status = pkt_data->req_tx_status;
-   control.do_not_encrypt = pkt_data->do_not_encrypt;
+   control.req_tx_status = !!(pkt_data->flags & NL80211_FLAG_TXSTATUS);
+   control.do_not_encrypt = !(pkt_data->flags & NL80211_FLAG_ENCRYPT);
control.pkt_type =
-   pkt_data->pkt_probe_resp ? PKT_PROBE_RESP : PKT_NORMAL;
-   control.requeue = pkt_data->requeue;
+   (pkt_data->internal_flags & TX_FLAG_PROBERESP) ?
+   PKT_PROBE_RESP : PKT_NORMAL;
+   control.requeue = !!(pkt_data->internal_flags & TX_FLAG_REQUEUE);
control.queue = pkt_data->queue;
 
ret = ieee80211_tx(odev, skb, &control,
@@ -1588,8 +1617,10 @@ static int ieee80211_subif_start_xmit(st
pkt_data = (struct ieee80211_tx_packet_data *)skb->cb;
memset(pkt_data, 0, sizeof(struct ieee80211_tx_packet_data));
pkt_data->ifindex = sdata->dev->ifindex;
-   pkt_data->mgmt_iface = (sdata->type == IEEE80211_IF_TYPE_MGMT);
-   pkt_data->do_not_encrypt = no_encrypt;
+   if (sdata->type == IEEE80211_IF_TYPE_MGMT)
+   pkt_data->internal_flags |= TX_FLAG_INJECTED;
+   if (!no_encrypt)
+   pkt_data->flags |= NL80211_FLAG_ENCRYPT;
 
skb->dev = sdata->master;
sdata->stats.tx_packets++;
@@ -1640,11 +1671,12 @@ ieee80211_mgmt_start_xmit(struct sk_buff
pkt_data = (struct ieee80211_

[RFC] add nl80211

2006-08-22 Thread Johannes Berg
This patch adds nl80211, a netlink based configuration
system for wireless hardware.

It currently features a few helper commands and commands to
add and remove virtual interfaces and to inject packets.
Support for nl80211 in d80211 is in a follow-up patch.

It requires the patches in
http://marc.theaimsgroup.com/?l=linux-netdev&m=115625436628696&w=2
and
http://marc.theaimsgroup.com/?l=linux-netdev&m=115625168405439&w=2

(the latter doesn't apply cleanly against wireless-dev, but you can
safely ignore the pieces that don't, at least for wireless testing :) )

Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>

--- /dev/null   1970-01-01 00:00:00.0 +
+++ wireless-dev/include/net/nl80211.h  2006-08-22 15:47:47.0 +0200
@@ -0,0 +1,79 @@
+#ifndef __NET_NL80211_H
+#define __NET_NL80211_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * 802.11 netlink in-kernel interface
+ *
+ * Copyright 2006 Johannes Berg <[EMAIL PROTECTED]>
+ */
+
+/**
+ * struct nl80211_ops - backend description for wireless configuration
+ *
+ * This struct is registered by fullmac card drivers and/or wireless stacks
+ * in order to handle configuration requests on their interfaces.
+ *
+ * The priv pointer passed to each call is the pointer that was
+ * registered in nl80211_register_driver().
+ *
+ * All callbacks except where otherwise noted should return 0
+ * on success or a negative error code.
+ *
+ * @list_interfaces: for each interface belonging to the wiphy identified
+ *  by the priv pointer, call the one() function with the
+ *  given data and the ifindex. This callback is required.
+ *
+ * @inject_packet: inject the given frame with the NL80211_FLAG_*
+ *flags onto the given queue.
+ *
+ * @add_virtual_intf: create a new virtual interface with the given name
+ *
+ * @del_virtual_intf: remove the virtual interface determined by ifindex.
+ */
+struct nl80211_ops {
+   int (*list_interfaces)(void *priv, void *data,
+  int (*one)(void *data, int ifindex));
+   int (*inject_packet)(void *priv, void *frame, int framelen,
+u32 flags, int queue);
+
+   int (*add_virtual_intf)(void *priv, char *name);
+   int (*del_virtual_intf)(void *priv, int ifindex);
+
+   /* more things to be added...
+*
+* for a (*configure)(...) call I'd probably guess that the
+* best bet would be to have one call that returns all
+* possible options, one that sets them based on the
+* struct genl_info *info, and one for that optimised
+* set-at-once thing.
+*/
+};
+
+/*
+ * register a given method structure with the nl80211 system
+ * and associate the 'priv' pointer with it.
+ * NOTE: for proper operation, this priv pointer MUST also be
+ * assigned to each &struct net_device's @ieee80211_ptr member!
+ */
+extern int nl80211_register(struct nl80211_ops *ops, void *priv);
+/*
+ * unregister a device with the given priv pointer.
+ * After this call, no more requests can be made with this priv
+ * pointer, but the call may sleep to wait for an outstanding
+ * request that is being handled.
+ */
+extern void nl80211_unregister(void *priv);
+
+/* helper functions */
+extern void *nl80211hdr_put(struct sk_buff *skb, u32 pid,
+   u32 seq, int flags, u8 cmd);
+extern void *nl80211msg_new(struct sk_buff **skb, u32 pid,
+   u32 seq, int flags, u8 cmd);
+
+#endif /* __NET_NL80211_H */
--- wireless-dev.orig/net/Kconfig   2006-08-22 15:47:32.0 +0200
+++ wireless-dev/net/Kconfig2006-08-22 15:47:47.0 +0200
@@ -250,6 +250,9 @@ source "net/ieee80211/Kconfig"
 config WIRELESS_EXT
bool
 
+config NETLINK_80211
+   tristate
+
 endif   # if NET
 endmenu # Networking
 
--- wireless-dev.orig/net/Makefile  2006-08-22 15:47:32.0 +0200
+++ wireless-dev/net/Makefile   2006-08-22 15:47:47.0 +0200
@@ -44,6 +44,7 @@ obj-$(CONFIG_ECONET)  += econet/
 obj-$(CONFIG_VLAN_8021Q)   += 8021q/
 obj-$(CONFIG_IP_DCCP)  += dccp/
 obj-$(CONFIG_IP_SCTP)  += sctp/
+obj-$(CONFIG_NETLINK_80211)+= wireless/
 obj-$(CONFIG_D80211)   += d80211/
 obj-$(CONFIG_IEEE80211)+= ieee80211/
 obj-$(CONFIG_TIPC) += tipc/
--- /dev/null   1970-01-01 00:00:00.0 +
+++ wireless-dev/net/wireless/Makefile  2006-08-22 15:47:47.0 +0200
@@ -0,0 +1,4 @@
+obj-$(CONFIG_NETLINK_80211) += cfg80211.o
+
+cfg80211-objs := \
+   nl80211.o
--- /dev/null   1970-01-01 00:00:00.0 +
+++ wireless-dev/net/wireless/nl80211.c 2006-08-22 15:47:47.0 +0200
@@ -0,0 +1,515 @@
+/*
+ * This is the new netlink-based wireless configuration interface.
+ *
+ * Copyright 2006 Johannes Berg <[EMAIL PROTECTED]>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Joh

Re: [Lksctp-developers] [PATCH 3/5][SCTP]: Remove multiple levels of msecs to jiffies conversions.

2006-08-22 Thread Vlad Yasevich
On Tue August 22 2006 03:22, David Miller wrote:
> From: Sridhar Samudrala <[EMAIL PROTECTED]>
> Date: Fri, 18 Aug 2006 11:22:37 -0700
>
> > [SCTP]: Remove multiple levels of msecs to jiffies conversions.
> >
> > The SCTP sysctl entries are displayed in milliseconds, but stored
> > internally in jiffies. This results in multiple levels of msecs to
> > jiffies conversion and as a result produces a truncation error. This
> > patch makes things consistent in that we store and display defaults
> > in milliseconds and only convert once for use by association.
> > This patch also adds some sane min/max values so that we don't go off
> > the deep end.
> >
> > Signed-off-by: Vladislav Yasevich <[EMAIL PROTECTED]>
> > Signed-off-by: Sridhar Samudrala <[EMAIL PROTECTED]>
>
> This cannot be done.
>
> The sysctl values have a fixed format, like any other portion
> of the API exposed to userspace.
>
> So you cannot arbitrarily change the units to/from milliseconds
> and jiffies.

Dave

We are not changing the units exposed to the user.  We have used and
are still using milliseconds to communicate with the user.  What this patch
removes is the initial conversion of user milliseconds to kernel-internal
jiffies that are stored in the SCTP global variables.  SCTP uses these
globals to initialize per-socket variables that are in milliseconds, so we
ended up doing a useless conversion back to ms to initialize these socket
variables.  During this conversion, we ended up with a truncation error that
this patch corrects.

Again, the variables exposed by the user interface have been and remain in 
milliseconds.
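
A minimal userspace sketch of the round trip, for illustration only. The
tick rate (SKETCH_HZ = 100) and the round-up conversion are assumptions,
and the sketch_* helpers are hypothetical stand-ins, not the kernel's
msecs_to_jiffies()/jiffies_to_msecs(). The exact error depends on HZ and on
the rounding direction; the point is only that ms -> jiffies -> ms does not
always give back the value the user wrote.

```c
#include <assert.h>

/* Assumed tick rate; real kernels pick HZ at build time. */
#define SKETCH_HZ 100u
#define SKETCH_MSEC_PER_TICK (1000u / SKETCH_HZ)

/* Hypothetical stand-in: convert ms to ticks, rounding up to whole ticks. */
static unsigned long sketch_msecs_to_jiffies(unsigned int ms)
{
	return (ms + SKETCH_MSEC_PER_TICK - 1) / SKETCH_MSEC_PER_TICK;
}

/* Hypothetical stand-in: convert ticks back to ms. */
static unsigned int sketch_jiffies_to_msecs(unsigned long j)
{
	return (unsigned int)(j * SKETCH_MSEC_PER_TICK);
}

/* The double conversion the patch removes: ms -> jiffies -> ms. */
static unsigned int sketch_round_trip(unsigned int ms)
{
	return sketch_jiffies_to_msecs(sketch_msecs_to_jiffies(ms));
}

void sketch_round_trip_selfcheck(void)
{
	/* A multiple of the tick length survives unchanged. */
	assert(sketch_round_trip(1000) == 1000);
	/* Anything else comes back as a different number of ms. */
	assert(sketch_round_trip(1005) == 1010);
}
```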

-vlad


Re: 800+ byte inlines in include/net/pkt_act.h

2006-08-22 Thread jamal
On Mon, 2006-21-08 at 16:38 -0700, David Miller wrote:
> From: jamal <[EMAIL PROTECTED]>
> Date: Mon, 21 Aug 2006 08:26:00 -0400
> 
> > As per last discussion, either Patrick McHardy or myself are going
> > to work on it - at some point. Please be patient. The other
> > alternative is: you fix it and send patches.
> 
> I'm working on it right now.  This code is really gross and needs
> to be fixed immediately.
> 
> What I'll do is define a "struct tcf_common" and have the generic
> interfaces take that as well as a "struct tcf_hashinfo *" parameter to
> deal with the individual hash tables.
> 

Sounds reasonable. It may actually be close to what Patrick and I had in
discussion (I can't find my notes), i.e. hashinfo would contain
table{size, index, mask, lock, and pointer to table}.
After staring at the code for a minute, I think the challenges you may
face are in the conversions of: tcf_ {dump_walker(), del_walker() and
generic_walker()}
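
A rough userspace sketch of that split, under stated assumptions: the
sketch_* names and field layout are made up for illustration and are not
the structures that ended up in the tree. A real version would also carry
the lock in hashinfo; it is left out here so the example stays
self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Fields every action would share (hypothetical subset). */
struct sketch_tcf_common {
	struct sketch_tcf_common *next;
	unsigned int index;
};

/* Per-action-type table description handed to the generic helpers. */
struct sketch_tcf_hashinfo {
	struct sketch_tcf_common **htab; /* pointer to the bucket array */
	unsigned int hmask;              /* table size - 1, size a power of two */
};

static unsigned int sketch_hash(unsigned int index, unsigned int hmask)
{
	return index & hmask;
}

/* Generic insert: push onto the head of the bucket chain. */
static void sketch_insert(struct sketch_tcf_hashinfo *hi,
			  struct sketch_tcf_common *p)
{
	unsigned int h = sketch_hash(p->index, hi->hmask);

	p->next = hi->htab[h];
	hi->htab[h] = p;
}

/* Generic lookup: walk one bucket chain, return NULL if not found. */
static struct sketch_tcf_common *
sketch_lookup(const struct sketch_tcf_hashinfo *hi, unsigned int index)
{
	struct sketch_tcf_common *p;

	for (p = hi->htab[sketch_hash(index, hi->hmask)]; p; p = p->next)
		if (p->index == index)
			return p;
	return NULL;
}

void sketch_hashinfo_selfcheck(void)
{
	struct sketch_tcf_common *tab[4] = { NULL, NULL, NULL, NULL };
	struct sketch_tcf_hashinfo hi = { tab, 3 };
	struct sketch_tcf_common a = { NULL, 5 }, b = { NULL, 9 };

	sketch_insert(&hi, &a);
	sketch_insert(&hi, &b); /* 5 and 9 share bucket 1 */
	assert(sketch_lookup(&hi, 5) == &a);
	assert(sketch_lookup(&hi, 9) == &b);
	assert(sketch_lookup(&hi, 1) == NULL);
}
```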

Thanks for taking this up Dave. And if you get it started and get
distracted somewhere, I could take it over.

> We define all of this templated stuff then don't even use it in
> act_police.c, we just duplicate everything!

act_police deviates from the generic layout; the intent is to allow for
that. The desire was/is usability: whoever uses the generic layout
(read: joe-netfilter) could quickly write a single page of code to do
something powerful (like gact for example). It is turning out that code
augmentation is not such a practical idea in the kernel.

cheers,
jamal

> Absolutely unbelievable.


[2.6.19 PATCH 7/7] ehea: Makefile & Kconfig

2006-08-22 Thread Jan-Bernd Themann
Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> 


 drivers/net/Kconfig  |9 +
 drivers/net/Makefile |1 +
 2 files changed, 10 insertions(+)



diff -Nurp -X dontdiff linux-2.6.18-rc4-git1/drivers/net/Kconfig patched_kernel/drivers/net/Kconfig
--- linux-2.6.18-rc4-git1/drivers/net/Kconfig   2006-08-06 11:20:11.0 -0700
+++ patched_kernel/drivers/net/Kconfig  2006-08-22 06:00:49.545435280 -0700
@@ -2277,6 +2277,15 @@ config CHELSIO_T1
   To compile this driver as a module, choose M here: the module
   will be called cxgb.
 
+config EHEA
+   tristate "eHEA Ethernet support"
+   depends on IBMEBUS
+   ---help---
+ This driver supports the IBM pSeries eHEA ethernet adapter.
+
+ To compile the driver as a module, choose M here. The module
+ will be called ehea.
+
 config IXGB
tristate "Intel(R) PRO/10GbE support"
depends on PCI
diff -Nurp -X dontdiff linux-2.6.18-rc4-git1/drivers/net/Makefile patched_kernel/drivers/net/Makefile
--- linux-2.6.18-rc4-git1/drivers/net/Makefile  2006-08-06 11:20:11.0 -0700
+++ patched_kernel/drivers/net/Makefile 2006-08-22 05:53:59.254861851 -0700
@@ -10,6 +10,7 @@ obj-$(CONFIG_E1000) += e1000/
 obj-$(CONFIG_IBM_EMAC) += ibm_emac/
 obj-$(CONFIG_IXGB) += ixgb/
 obj-$(CONFIG_CHELSIO_T1) += chelsio/
+obj-$(CONFIG_EHEA) += ehea/
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_GIANFAR) += gianfar_driver.o
 


[RESEND 1/2] [NETLINK]: Improve string attribute validation

2006-08-22 Thread Thomas Graf
Searching for '\0' with strnchr() doesn't really work :-)

Introduces a new attribute type NLA_NUL_STRING to support NUL
terminated strings. Attributes of this kind are required to carry
a terminating NUL within the maximum specified in the policy.

The `old' NLA_STRING which is not required to be NUL terminated
is extended to provide means to specify a maximum length of the
string.

Aims at easing the pain with using nla_strlcpy() on temporary
buffers.
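
As an illustration of the NUL search only, here is a small userspace
sketch. It mirrors the first step of the NLA_NUL_STRING case (look for a
NUL within min(attrlen, maxlen + 1) bytes) and deliberately leaves out the
fall-through length check. The function name, the plain buffer arguments,
and the -1 return value are assumptions; the patch itself works on struct
nlattr and returns -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 0 if a NUL byte is found within the first
 * min(attrlen, maxlen + 1) bytes of payload, -1 otherwise.
 * maxlen == 0 means "no limit configured": scan the whole payload.
 */
static int sketch_validate_nul_string(const char *payload, size_t attrlen,
				      size_t maxlen)
{
	size_t scan = attrlen;

	if (maxlen != 0 && maxlen + 1 < scan)
		scan = maxlen + 1;

	if (scan == 0 || memchr(payload, '\0', scan) == NULL)
		return -1;
	return 0;
}

void sketch_validate_selfcheck(void)
{
	/* "eth0" plus its terminator, limit of 15 characters: accepted. */
	assert(sketch_validate_nul_string("eth0", 5, 15) == 0);
	/* Terminator not part of the payload: rejected. */
	assert(sketch_validate_nul_string("eth0", 4, 15) == -1);
	/* No NUL within the first maxlen + 1 bytes: rejected. */
	assert(sketch_validate_nul_string("toolongname", 12, 4) == -1);
	/* Empty payload: rejected. */
	assert(sketch_validate_nul_string("", 0, 4) == -1);
}
```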

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.19.git/include/net/netlink.h
===
--- net-2.6.19.git.orig/include/net/netlink.h
+++ net-2.6.19.git/include/net/netlink.h
@@ -167,6 +167,7 @@ enum {
NLA_FLAG,
NLA_MSECS,
NLA_NESTED,
+   NLA_NUL_STRING,
__NLA_TYPE_MAX,
 };
 
@@ -175,21 +176,27 @@ enum {
 /**
  * struct nla_policy - attribute validation policy
  * @type: Type of attribute or NLA_UNSPEC
- * @minlen: Minimal length of payload required to be available
+ * @len: Type specific length of payload
  *
  * Policies are defined as arrays of this struct, the array must be
  * accessible by attribute type up to the highest identifier to be expected.
  *
+ * Meaning of `len' field:
+ *NLA_STRING   Maximum length of string
+ *NLA_NUL_STRING   Maximum length of string (excluding NUL)
+ *NLA_FLAG Unused
+ *All otherExact length of attribute payload
+ *
  * Example:
  * static struct nla_policy my_policy[ATTR_MAX+1] __read_mostly = {
  * [ATTR_FOO] = { .type = NLA_U16 },
- * [ATTR_BAR] = { .type = NLA_STRING },
- * [ATTR_BAZ] = { .minlen = sizeof(struct mystruct) },
+ * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
+ * [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
  * };
  */
 struct nla_policy {
u16 type;
-   u16 minlen;
+   u16 len;
 };
 
 /**
Index: net-2.6.19.git/net/netlink/attr.c
===
--- net-2.6.19.git.orig/net/netlink/attr.c
+++ net-2.6.19.git/net/netlink/attr.c
@@ -20,7 +20,6 @@ static u16 nla_attr_minlen[NLA_TYPE_MAX+
[NLA_U16]   = sizeof(u16),
[NLA_U32]   = sizeof(u32),
[NLA_U64]   = sizeof(u64),
-   [NLA_STRING]= 1,
[NLA_NESTED]= NLA_HDRLEN,
 };
 
@@ -28,7 +27,7 @@ static int validate_nla(struct nlattr *n
struct nla_policy *policy)
 {
struct nla_policy *pt;
-   int minlen = 0;
+   int minlen = 0, attrlen = nla_len(nla);
 
if (nla->nla_type <= 0 || nla->nla_type > maxtype)
return 0;
@@ -37,16 +36,46 @@ static int validate_nla(struct nlattr *n
 
BUG_ON(pt->type > NLA_TYPE_MAX);
 
-   if (pt->minlen)
-   minlen = pt->minlen;
-   else if (pt->type != NLA_UNSPEC)
-   minlen = nla_attr_minlen[pt->type];
+   switch (pt->type) {
+   case NLA_FLAG:
+   if (attrlen > 0)
+   return -ERANGE;
+   break;
+
+   case NLA_NUL_STRING:
+   if (pt->len)
+   minlen = min_t(int, attrlen, pt->len + 1);
+   else
+   minlen = attrlen;
+
+   if (!minlen || memchr(nla_data(nla), '\0', minlen) == NULL)
+   return -EINVAL;
+   /* fall through */
+
+   case NLA_STRING:
+   if (attrlen < 1)
+   return -ERANGE;
 
-   if (pt->type == NLA_FLAG && nla_len(nla) > 0)
-   return -ERANGE;
+   if (pt->len) {
+   char *buf = nla_data(nla);
 
-   if (nla_len(nla) < minlen)
-   return -ERANGE;
+   if (buf[attrlen - 1] == '\0')
+   attrlen--;
+
+   if (attrlen > pt->len)
+   return -ERANGE;
+   }
+   break;
+
+   default:
+   if (pt->len)
+   minlen = pt->len;
+   else if (pt->type != NLA_UNSPEC)
+   minlen = nla_attr_minlen[pt->type];
+
+   if (attrlen < minlen)
+   return -ERANGE;
+   }
 
return 0;
 }


[2.6.19 PATCH 6/7] ehea: eHEA Makefile

2006-08-22 Thread Jan-Bernd Themann
Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> 


 drivers/net/ehea/Makefile |7 +++
 1 file changed, 7 insertions(+)



--- linux-2.6.18-rc4-git1-orig/drivers/net/ehea/Makefile 1969-12-31 16:00:00.0 -0800
+++ kernel/drivers/net/ehea/Makefile2006-08-22 06:05:26.965093280 -0700
@@ -0,0 +1,7 @@
+#
+# Makefile for the eHEA ethernet device driver for IBM eServer System p
+#
+
+ehea-y = ehea_main.o ehea_phyp.o ehea_qmr.o ehea_ethtool.o
+obj-$(CONFIG_EHEA) += ehea.o
+


[2.6.19 PATCH 5/7] ehea: main header files

2006-08-22 Thread Jan-Bernd Themann
Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> 


 drivers/net/ehea/ehea.h|  437 +
 drivers/net/ehea/ehea_hw.h |  290 +
 2 files changed, 727 insertions(+)



--- linux-2.6.18-rc4-git1-orig/drivers/net/ehea/ehea.h  1969-12-31 16:00:00.0 -0800
+++ kernel/drivers/net/ehea/ehea.h  2006-08-22 06:05:29.284374423 -0700
@@ -0,0 +1,437 @@
+/*
+ *  linux/drivers/net/ehea/ehea.h
+ *
+ *  eHEA ethernet device driver for IBM eServer System p
+ *
+ *  (C) Copyright IBM Corp. 2006
+ *
+ *  Authors:
+ *   Christoph Raisch <[EMAIL PROTECTED]>
+ *   Jan-Bernd Themann <[EMAIL PROTECTED]>
+ *   Thomas Klein <[EMAIL PROTECTED]>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __EHEA_H__
+#define __EHEA_H__
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define DRV_NAME   "ehea"
+#define DRV_VERSION "EHEA_0019"
+
+#define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER)
+
+#define EHEA_MAX_ENTRIES_RQ1 32767
+#define EHEA_MAX_ENTRIES_RQ2 16383
+#define EHEA_MAX_ENTRIES_RQ3 16383
+#define EHEA_MAX_ENTRIES_SQ  32767
+#define EHEA_MIN_ENTRIES_QP  127
+
+#define EHEA_NUM_TX_QP 1
+
+#ifdef EHEA_SMALL_QUEUES
+#define EHEA_MAX_CQE_COUNT  1023
+#define EHEA_DEF_ENTRIES_SQ 1023
+#define EHEA_DEF_ENTRIES_RQ1 4095
+#define EHEA_DEF_ENTRIES_RQ2 1023
+#define EHEA_DEF_ENTRIES_RQ3 1023
+#define EHEA_SWQE_REFILL_TH  100
+#else
+#define EHEA_MAX_CQE_COUNT 32000
+#define EHEA_DEF_ENTRIES_SQ 16000
+#define EHEA_DEF_ENTRIES_RQ1 32080
+#define EHEA_DEF_ENTRIES_RQ2 4020
+#define EHEA_DEF_ENTRIES_RQ3 4020
+#define EHEA_SWQE_REFILL_TH 1000
+#endif
+
+#define EHEA_MAX_ENTRIES_EQ 20
+
+#define EHEA_SG_SQ  2
+#define EHEA_SG_RQ1 1
+#define EHEA_SG_RQ2 0
+#define EHEA_SG_RQ3 0
+
+#define EHEA_MAX_PACKET_SIZE 9022   /* for jumbo frames */
+#define EHEA_RQ2_PKT_SIZE   1522
+#define EHEA_LL_PKT_SIZE 256   /* low latency */
+
+#define EHEA_POLL_MAX_RWQE  1000
+
+/* Send completion signaling */
+#define EHEA_SIG_IV_LONG   4
+
+/* Protection Domain Identifier */
+#define EHEA_PD_ID 0xaabcdeff
+
+#define EHEA_RQ2_THRESHOLD 1
+#define EHEA_RQ3_THRESHOLD 9   /* use RQ3 threshold of 1522 bytes */
+
+#define EHEA_SPEED_10G 10000
+#define EHEA_SPEED_1G   1000
+#define EHEA_SPEED_100M  100
+#define EHEA_SPEED_10M    10
+
+/* Broadcast/Multicast registration types */
+#define EHEA_BCMC_SCOPE_ALL     0x08
+#define EHEA_BCMC_SCOPE_SINGLE  0x00
+#define EHEA_BCMC_MULTICAST     0x04
+#define EHEA_BCMC_BROADCAST     0x00
+#define EHEA_BCMC_UNTAGGED      0x02
+#define EHEA_BCMC_TAGGED        0x00
+#define EHEA_BCMC_VLANID_ALL    0x01
+#define EHEA_BCMC_VLANID_SINGLE 0x00
+
+/* Use this define to kmallocate pHYP control blocks */
+#define H_CB_ALIGNMENT 4096
+
+#define EHEA_CACHE_LINE  128
+
+/* Memory Regions */
+#define EHEA_MR_MAX_TX_PAGES   20
+#define EHEA_MR_TX_DATA_PN  3
+#define EHEA_MR_ACC_CTRL   0x00800000
+#define EHEA_RWQES_PER_MR_RQ2  10
+#define EHEA_RWQES_PER_MR_RQ3  10
+
+#define EHEA_WATCH_DOG_TIMEOUT 10*HZ
+
+
+void ehea_set_ethtool_ops(struct net_device *netdev);
+
+/* utility functions */
+
+#define ehea_info(fmt, args...) \
+   printk(KERN_INFO DRV_NAME ": " fmt "\n", ## args)
+
+#define ehea_error(fmt, args...) \
+   printk(KERN_ERR DRV_NAME ": Error in %s: " fmt "\n", __func__, ## args)
+
+#ifdef DEBUG
+#define ehea_debug(fmt, args...) \
+   printk(KERN_DEBUG DRV_NAME ": " fmt, ## args)
+#else
+#define ehea_debug(fmt, args...) do {} while (0)
+#endif
+
+void ehea_dump(void *adr, int len, char *msg);
+
+#define EHEA_BMASK(pos, length) (((pos) << 16) + (length))
+
+#define EHEA_BMASK_IBM(from, to) (((63 - to) << 16) + ((to) - (from) + 1))
+
+#define EHEA_BMASK_SHIFTPOS(mask) (((mask) >> 16) & 0xffff)
+
+#define EHEA_BMASK_MASK(mask) \
+   (0xffffffffffffffffULL >> ((64 - (mask)) & 0xffff))
+
+#define EHEA_BMASK_SET(mask, value) \
+((EHEA_BMASK_MASK(mask) & ((u64)(value))) << EHEA_BMASK_SHIFTPOS(mask))
+
+#define EHEA_BMASK_GET(mask, value) \
+(EHEA_BMASK_MASK(mask) & (((u64)(value)) >> EHEA_BMASK_SHIFTPOS(mask)))
+
+/*
+ * Generic ehea page
+ */
+struct ehea_page {

[2.6.19 PATCH 3/7] ehea: queue management

2006-08-22 Thread Jan-Bernd Themann
Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> 


 drivers/net/ehea/ehea_qmr.c |  634 
 drivers/net/ehea/ehea_qmr.h |  367 +
 2 files changed, 1001 insertions(+)



--- linux-2.6.18-rc4-git1-orig/drivers/net/ehea/ehea_qmr.c  1969-12-31 16:00:00.0 -0800
+++ kernel/drivers/net/ehea/ehea_qmr.c  2006-08-22 06:05:29.120372939 -0700
@@ -0,0 +1,634 @@
+/*
+ *  linux/drivers/net/ehea/ehea_qmr.c
+ *
+ *  eHEA ethernet device driver for IBM eServer System p
+ *
+ *  (C) Copyright IBM Corp. 2006
+ *
+ *  Authors:
+ *   Christoph Raisch <[EMAIL PROTECTED]>
+ *   Jan-Bernd Themann <[EMAIL PROTECTED]>
+ *   Thomas Klein <[EMAIL PROTECTED]>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include "ehea.h"
+#include "ehea_phyp.h"
+#include "ehea_qmr.h"
+
+static void *hw_qpageit_get_inc(struct hw_queue *queue)
+{
+   void *retvalue = hw_qeit_get(queue);
+
+   queue->current_q_offset += queue->pagesize;
+   if (queue->current_q_offset > queue->queue_length) {
+   queue->current_q_offset -= queue->pagesize;
+   retvalue = NULL;
+   } else if (((u64) retvalue) & (EHEA_PAGESIZE-1)) {
+   ehea_error("not on pageboundary");
+   retvalue = NULL;
+   }
+   return retvalue;
+}
+
+static int hw_queue_ctor(struct hw_queue *queue, const u32 nr_of_pages,
+ const u32 pagesize, const u32 qe_size)
+{
+   int pages_per_kpage = PAGE_SIZE / pagesize;
+   int i;
+
+   if ((pagesize > PAGE_SIZE) || (!pages_per_kpage)) {
+   ehea_error("pagesize conflict! kernel pagesize=%d, "
+  "ehea pagesize=%d", (int)PAGE_SIZE, (int)pagesize);
+   return -EINVAL;
+   }
+
+   queue->queue_length = nr_of_pages * pagesize;
+   queue->queue_pages = kcalloc(nr_of_pages, sizeof(void*), GFP_KERNEL);
+   if (!queue->queue_pages) {
+   ehea_error("no mem for queue_pages");
+   return -ENOMEM;
+   }
+
+   /*
+* allocate pages for queue:
+* outer loop allocates whole kernel pages (page aligned) and
+* inner loop divides a kernel page into smaller hea queue pages
+*/
+   i = 0;
+   while (i < nr_of_pages) {
+   int k;
+   u8 *kpage = (u8*)get_zeroed_page(GFP_KERNEL);
+   if (!kpage)
+   goto hw_queue_ctor_exit0;
+   for (k = 0; k < pages_per_kpage && i < nr_of_pages; k++) {
+   (queue->queue_pages)[i] = (struct ehea_page *)kpage;
+   kpage += pagesize;
+   i++;
+   }
+   }
+
+   queue->current_q_offset = 0;
+   queue->qe_size = qe_size;
+   queue->pagesize = pagesize;
+   queue->toggle_state = 1;
+
+   return 0;
+
+hw_queue_ctor_exit0:
+   for (i = 0; i < nr_of_pages; i += pages_per_kpage) {
+   if (!(queue->queue_pages)[i])
+   break;
+   free_page((unsigned long)(queue->queue_pages)[i]);
+   }
+   return -ENOMEM;
+}
+
+static void hw_queue_dtor(struct hw_queue *queue)
+{
+   int pages_per_kpage;
+   int i;
+   int nr_pages;
+
+   if (!queue || !queue->queue_pages)
+   return;
+
+   /* dereference queue only after the NULL check above */
+   pages_per_kpage = PAGE_SIZE / queue->pagesize;
+   nr_pages = queue->queue_length / queue->pagesize;
+
+   for (i = 0; i < nr_pages; i += pages_per_kpage)
+   free_page((unsigned long)(queue->queue_pages)[i]);
+
+   kfree(queue->queue_pages);
+}
+
+struct ehea_cq *ehea_create_cq(struct ehea_adapter *adapter,
+  int nr_of_cqe, u64 eq_handle, u32 cq_token)
+{
+   struct ehea_cq *cq = NULL;
+   struct h_epa epa;
+
+   u64 *cq_handle_ref;
+   u32 act_nr_of_entries;
+   u32 act_pages;
+   u64 hret;
+   int ret;
+   u32 counter;
+   void *vpage = NULL;
+   u64 rpage = 0;
+
+   cq = kzalloc(sizeof(*cq), GFP_KERNEL);
+   if (!cq) {
+   ehea_error("no mem for cq");
+   goto create_cq_exit0;
+   }
+
+   cq->attr.max_nr_of_cqes = nr_of_cqe;
+   cq->attr.cq_token = cq_token;
+   cq->attr.eq_handle = eq_handle;
+
+   cq->adapter = adapter;
+
+

[2.6.19 PATCH 2/7] ehea: pHYP interface

2006-08-22 Thread Jan-Bernd Themann
Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> 


 drivers/net/ehea/ehea_hcall.h |   51 ++
 drivers/net/ehea/ehea_phyp.c  |  834 ++
 drivers/net/ehea/ehea_phyp.h  |  479 
 3 files changed, 1364 insertions(+)



--- linux-2.6.18-rc4-git1-orig/drivers/net/ehea/ehea_phyp.c 1969-12-31 16:00:00.0 -0800
+++ kernel/drivers/net/ehea/ehea_phyp.c 2006-08-22 06:05:28.920371128 -0700
@@ -0,0 +1,834 @@
+/*
+ *  linux/drivers/net/ehea/ehea_phyp.c
+ *
+ *  eHEA ethernet device driver for IBM eServer System p
+ *
+ *  (C) Copyright IBM Corp. 2006
+ *
+ *  Authors:
+ *   Christoph Raisch <[EMAIL PROTECTED]>
+ *   Jan-Bernd Themann <[EMAIL PROTECTED]>
+ *   Thomas Klein <[EMAIL PROTECTED]>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include "ehea_phyp.h"
+
+
+static inline u16 get_order_of_qentries(u16 queue_entries)
+{
+   u8 ld = 1;  /*  logarithmus dualis */
+   while (((1U << ld) - 1) < queue_entries)
+   ld++;
+   return ld - 1;
+}
+
+
+/* Defines for H_CALL H_ALLOC_RESOURCE */
+#define H_ALL_RES_TYPE_QP1
+#define H_ALL_RES_TYPE_CQ2
+#define H_ALL_RES_TYPE_EQ3
+#define H_ALL_RES_TYPE_MR5
+#define H_ALL_RES_TYPE_MW6
+
+static long ehea_hcall_9arg_9ret(unsigned long opcode,
+unsigned long arg1, unsigned long arg2,
+unsigned long arg3, unsigned long arg4,
+unsigned long arg5, unsigned long arg6,
+unsigned long arg7, unsigned long arg8,
+unsigned long arg9, unsigned long *out1,
+unsigned long *out2,unsigned long *out3,
+unsigned long *out4,unsigned long *out5,
+unsigned long *out6,unsigned long *out7,
+unsigned long *out8,unsigned long *out9)
+{
+   long hret = H_HARDWARE;
+   int i, sleep_msecs;
+
+   for (i = 0; i < 5; i++) {
+   hret = plpar_hcall_9arg_9ret(opcode,arg1, arg2, arg3, arg4,
+arg5, arg6, arg7, arg8, arg9, out1,
+out2, out3, out4, out5, out6, out7,
+out8, out9);
+   if (H_IS_LONG_BUSY(hret)) {
+   sleep_msecs = get_longbusy_msecs(hret);
+   msleep_interruptible(sleep_msecs);
+   continue;
+   }
+
+   if (hret < H_SUCCESS)
+   ehea_error("op=%lx hret=%lx "
+  "i1=%lx i2=%lx i3=%lx i4=%lx i5=%lx i6=%lx "
+  "i7=%lx i8=%lx i9=%lx "
+  "o1=%lx o2=%lx o3=%lx o4=%lx o5=%lx o6=%lx "
+  "o7=%lx o8=%lx o9=%lx",
+  opcode, hret, arg1, arg2, arg3, arg4, arg5,
+  arg6, arg7, arg8, arg9, *out1, *out2, *out3,
+  *out4, *out5, *out6, *out7, *out8, *out9);
+   return hret;
+   }
+   return H_BUSY;
+}
+
+u64 ehea_h_query_ehea_qp(const u64 hcp_adapter_handle, const u8 qp_category,
+const u64 qp_handle, const u64 sel_mask, void *cb_addr)
+{
+   u64 dummy;
+
+   if ((((u64)cb_addr) & (PAGE_SIZE - 1)) != 0) {
+   ehea_error("not on pageboundary");
+   return H_PARAMETER;
+   }
+
+   return ehea_hcall_9arg_9ret(H_QUERY_HEA_QP,
+   hcp_adapter_handle, /* R4 */
+   qp_category,/* R5 */
+   qp_handle,  /* R6 */
+   sel_mask,   /* R7 */
+   virt_to_abs(cb_addr),   /* R8 */
+   0, 0, 0, 0, /* R9-R12 */
+   &dummy, /* R4 */
+   &dummy, /* R5 */
+   &dummy,   

[2.6.19 PATCH 4/7] ehea: ethtool interface

2006-08-22 Thread Jan-Bernd Themann
Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> 


 drivers/net/ehea/ehea_ethtool.c |  244 
 1 file changed, 244 insertions(+)



--- linux-2.6.18-rc4-git1-orig/drivers/net/ehea/ehea_ethtool.c  1969-12-31 16:00:00.0 -0800
+++ kernel/drivers/net/ehea/ehea_ethtool.c  2006-08-22 06:05:29.197373636 -0700
@@ -0,0 +1,244 @@
+/*
+ *  linux/drivers/net/ehea/ehea_ethtool.c
+ *
+ *  eHEA ethernet device driver for IBM eServer System p
+ *
+ *  (C) Copyright IBM Corp. 2006
+ *
+ *  Authors:
+ *   Christoph Raisch <[EMAIL PROTECTED]>
+ *   Jan-Bernd Themann <[EMAIL PROTECTED]>
+ *   Thomas Klein <[EMAIL PROTECTED]>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include "ehea.h"
+#include "ehea_phyp.h"
+
+
+static int netdev_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+   u64 hret;
+   struct ehea_port *port = netdev_priv(dev);
+   struct ehea_adapter *adapter = port->adapter;
+   struct hcp_ehea_port_cb4 *cb4;
+
+   cb4 = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
+   if (!cb4) {
+   ehea_error("no mem for cb4");
+   return -ENOMEM;
+   }
+
+   hret = ehea_h_query_ehea_port(adapter->handle, port->logical_port_id,
+ H_PORT_CB4, H_PORT_CB4_ALL, cb4);
+   if (hret != H_SUCCESS) {
+   ehea_error("query_ehea_port failed");
+   kfree(cb4);
+   return -EIO;
+   }
+
+   if (netif_msg_hw(port))
+   ehea_dump(cb4, sizeof(*cb4), "netdev_get_settings");
+
+   if (netif_carrier_ok(dev)) {
+   switch(cb4->port_speed){
+   case H_PORT_SPEED_10M_H:
+   cmd->speed = SPEED_10;
+   cmd->duplex = DUPLEX_HALF;
+   break;
+   case H_PORT_SPEED_10M_F:
+   cmd->speed = SPEED_10;
+   cmd->duplex = DUPLEX_FULL;
+   break;
+   case H_PORT_SPEED_100M_H:
+   cmd->speed = SPEED_100;
+   cmd->duplex = DUPLEX_HALF;
+   break;
+   case H_PORT_SPEED_100M_F:
+   cmd->speed = SPEED_100;
+   cmd->duplex = DUPLEX_FULL;
+   break;
+   case H_PORT_SPEED_1G_F:
+   cmd->speed = SPEED_1000;
+   cmd->duplex = DUPLEX_FULL;
+   break;
+   case H_PORT_SPEED_10G_F:
+   cmd->speed = SPEED_10000;
+   cmd->duplex = DUPLEX_FULL;
+   break;
+   }
+   } else {
+   cmd->speed = -1;
+   cmd->duplex = -1;
+   }
+
+   cmd->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_1000baseT_Full
+  | SUPPORTED_100baseT_Full |  SUPPORTED_100baseT_Half
+  | SUPPORTED_10baseT_Full | SUPPORTED_10baseT_Half
+  | SUPPORTED_Autoneg | SUPPORTED_FIBRE);
+
+   cmd->advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_Autoneg
+| ADVERTISED_FIBRE);
+
+   cmd->port = PORT_FIBRE;
+   cmd->autoneg = AUTONEG_ENABLE;
+
+   kfree(cb4);
+   return 0;
+}
+
+static void netdev_get_drvinfo(struct net_device *dev,
+  struct ethtool_drvinfo *info)
+{
+   strlcpy(info->driver, DRV_NAME, sizeof(info->driver) - 1);
+   strlcpy(info->version, DRV_VERSION, sizeof(info->version) - 1);
+}
+
+static u32 netdev_get_msglevel(struct net_device *dev)
+{
+   struct ehea_port *port = netdev_priv(dev);
+   return port->msg_enable;
+}
+
+static void netdev_set_msglevel(struct net_device *dev, u32 value)
+{
+   struct ehea_port *port = netdev_priv(dev);
+   port->msg_enable = value;
+}
+
+static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
+   {"poll_max_processed"},
+   {"queue_stopped"},
+   {"min_swqe_avail"},
+   {"poll_receive_err"},
+   {"pkt_send"},
+   {"pkt_xmit"},
+   {"send_tasklet"},
+   {"ehea_poll"},
+   {"nwqe"},
+   {"swqe_available_0"},
+   {"sig_comp_iv"},
+   {"rxo"},
+   {"rx64"},
+   {"rx65"},
+   {"rx1

[2.6.19 PATCH 0/7] ehea: IBM eHEA Ethernet Device Driver

2006-08-22 Thread Jan-Bernd Themann
Hi,

this is our current version of the IBM eHEA Ethernet Device Driver.
Thanks for the quick and helpful comments so far. Further comments
are highly appreciated.

Things we are currently working on:
- Implementation of promiscuous mode support


Thanks,
Jan-Bernd

Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
Changelog-by:  Jan-Bernd Themann <[EMAIL PROTECTED]>

Differences to patch set http://www.spinics.net/lists/netdev/msg12326.html

Changelog:

- Error recovery
- improvements according to mailing list comments


 drivers/net/Kconfig |9 
 drivers/net/Makefile|1 
 drivers/net/ehea/Makefile   |7 
 drivers/net/ehea/ehea.h |  437 ++
 drivers/net/ehea/ehea_ethtool.c |  244 +++
 drivers/net/ehea/ehea_hcall.h   |   51 
 drivers/net/ehea/ehea_hw.h  |  290 
 drivers/net/ehea/ehea_main.c| 2636 
 drivers/net/ehea/ehea_phyp.c|  834 
 drivers/net/ehea/ehea_phyp.h|  479 +++
 drivers/net/ehea/ehea_qmr.c |  634 +
 drivers/net/ehea/ehea_qmr.h |  367 +
 12 files changed, 5989 insertions(+)


[PATCH 1/2] [NETLINK]: Improve string attribute validation

2006-08-22 Thread Thomas Graf
Introduces a new attribute type NLA_NUL_STRING to support NUL
terminated strings. Attributes of this kind must carry a
terminating NUL within the maximum length specified in the policy.

The `old' NLA_STRING, which is not required to be NUL terminated,
is extended to provide a means of specifying a maximum length for
the string.

This aims to ease the pain of using nla_strlcpy() on temporary
buffers.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.19.git/include/net/netlink.h
===
--- net-2.6.19.git.orig/include/net/netlink.h
+++ net-2.6.19.git/include/net/netlink.h
@@ -167,6 +167,7 @@ enum {
NLA_FLAG,
NLA_MSECS,
NLA_NESTED,
+   NLA_NUL_STRING,
__NLA_TYPE_MAX,
 };
 
@@ -175,21 +176,27 @@ enum {
 /**
  * struct nla_policy - attribute validation policy
  * @type: Type of attribute or NLA_UNSPEC
- * @minlen: Minimal length of payload required to be available
+ * @len: Type specific length of payload
  *
  * Policies are defined as arrays of this struct, the array must be
  * accessible by attribute type up to the highest identifier to be expected.
  *
+ * Meaning of `len' field:
+ *    NLA_STRING           Maximum length of string
+ *    NLA_NUL_STRING       Maximum length of string (excluding NUL)
+ *    NLA_FLAG             Unused
+ *    All other            Minimum length of attribute payload
+ *
  * Example:
  * static struct nla_policy my_policy[ATTR_MAX+1] __read_mostly = {
  * [ATTR_FOO] = { .type = NLA_U16 },
- * [ATTR_BAR] = { .type = NLA_STRING },
- * [ATTR_BAZ] = { .minlen = sizeof(struct mystruct) },
+ * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
+ * [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
  * };
  */
 struct nla_policy {
u16 type;
-   u16 minlen;
+   u16 len;
 };
 
 /**
Index: net-2.6.19.git/net/netlink/attr.c
===
--- net-2.6.19.git.orig/net/netlink/attr.c
+++ net-2.6.19.git/net/netlink/attr.c
@@ -20,7 +20,6 @@ static u16 nla_attr_minlen[NLA_TYPE_MAX+
[NLA_U16]   = sizeof(u16),
[NLA_U32]   = sizeof(u32),
[NLA_U64]   = sizeof(u64),
-   [NLA_STRING]= 1,
[NLA_NESTED]= NLA_HDRLEN,
 };
 
@@ -28,7 +27,7 @@ static int validate_nla(struct nlattr *n
struct nla_policy *policy)
 {
struct nla_policy *pt;
-   int minlen = 0;
+   int minlen = 0, attrlen = nla_len(nla);
 
if (nla->nla_type <= 0 || nla->nla_type > maxtype)
return 0;
@@ -37,16 +36,46 @@ static int validate_nla(struct nlattr *n
 
BUG_ON(pt->type > NLA_TYPE_MAX);
 
-   if (pt->minlen)
-   minlen = pt->minlen;
-   else if (pt->type != NLA_UNSPEC)
-   minlen = nla_attr_minlen[pt->type];
+   switch (pt->type) {
+   case NLA_FLAG:
+   if (attrlen > 0)
+   return -ERANGE;
+   break;
+
+   case NLA_NUL_STRING:
+   if (pt->len)
+   minlen = min_t(int, attrlen, pt->len + 1);
+   else
+   minlen = attrlen;
+
+   if (!minlen || strnchr(nla_data(nla), minlen, '\0') == NULL)
+   return -EINVAL;
+   /* fall through */
+
+   case NLA_STRING:
+   if (attrlen < 1)
+   return -ERANGE;
 
-   if (pt->type == NLA_FLAG && nla_len(nla) > 0)
-   return -ERANGE;
+   if (pt->len) {
+   char *buf = nla_data(nla);
 
-   if (nla_len(nla) < minlen)
-   return -ERANGE;
+   if (buf[attrlen - 1] == '\0')
+   attrlen--;
+
+   if (attrlen > pt->len)
+   return -ERANGE;
+   }
+   break;
+
+   default:
+   if (pt->len)
+   minlen = pt->len;
+   else if (pt->type != NLA_UNSPEC)
+   minlen = nla_attr_minlen[pt->type];
+
+   if (attrlen < minlen)
+   return -ERANGE;
+   }
 
return 0;
 }



[PATCHSET] Validation for netlink string attributes

2006-08-22 Thread Thomas Graf
Validation of netlink string attributes was weak, forcing everyone
to use nla_strlcpy() to copy the attribute into a temporary buffer.

This patchset implements length validation checks for existing
NLA_STRING attributes and adds a new type NLA_NUL_STRING for
NUL terminated strings.
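
The two checks described above can be sketched in plain userspace C.
This is only an illustration of the rules, not the kernel code: the
names check_string_attr(), STR_PLAIN and STR_NUL_TERMINATED are
hypothetical, and the kernel's distinct error codes are collapsed
into a single -1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for NLA_STRING and NLA_NUL_STRING. */
enum str_kind { STR_PLAIN, STR_NUL_TERMINATED };

/*
 * Return 0 if the payload is acceptable, -1 otherwise.
 * max_len == 0 means that no maximum length is configured.
 */
static int check_string_attr(enum str_kind kind, const char *data,
                             size_t attrlen, size_t max_len)
{
    /* An empty payload is never a valid string attribute. */
    if (attrlen < 1)
        return -1;

    if (kind == STR_NUL_TERMINATED) {
        /* A NUL must appear within the first max_len + 1 bytes. */
        size_t scan = attrlen;

        if (max_len && scan > max_len + 1)
            scan = max_len + 1;
        if (memchr(data, '\0', scan) == NULL)
            return -1;
    }

    if (max_len) {
        /* A trailing NUL does not count towards the maximum. */
        if (data[attrlen - 1] == '\0')
            attrlen--;
        if (attrlen > max_len)
            return -1;
    }

    return 0;
}
```

Once a NUL terminated attribute has passed such a check, its payload
can be read as a C string in place, which is what removes the need
for a temporary buffer.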



[PATCH][REPOST] WAN: fix C101 card carrier handling

2006-08-22 Thread Krzysztof Halasa
Hi,

One of my recent changes broke C101 carrier handling; this patch
fixes it. It also fixes an old TX underrun checking bug.

2.6.18 material. Please apply.
Thanks.

Signed-off-by: Krzysztof Halasa <[EMAIL PROTECTED]>

diff --git a/drivers/net/wan/c101.c b/drivers/net/wan/c101.c
index 435e91e..6b63b35 100644
--- a/drivers/net/wan/c101.c
+++ b/drivers/net/wan/c101.c
@@ -118,7 +118,7 @@ #include "hd6457x.c"
 
 static inline void set_carrier(port_t *port)
 {
-   if (!sca_in(MSCI1_OFFSET + ST3, port) & ST3_DCD)
+   if (!(sca_in(MSCI1_OFFSET + ST3, port) & ST3_DCD))
netif_carrier_on(port_to_dev(port));
else
netif_carrier_off(port_to_dev(port));
@@ -127,10 +127,10 @@ static inline void set_carrier(port_t *p
 
 static void sca_msci_intr(port_t *port)
 {
-   u8 stat = sca_in(MSCI1_OFFSET + ST1, port); /* read MSCI ST1 status */
+   u8 stat = sca_in(MSCI0_OFFSET + ST1, port); /* read MSCI ST1 status */
 
-   /* Reset MSCI TX underrun status bit */
-   sca_out(stat & ST1_UDRN, MSCI0_OFFSET + ST1, port);
+   /* Reset MSCI TX underrun and CDCD (ignored) status bit */
+   sca_out(stat & (ST1_UDRN | ST1_CDCD), MSCI0_OFFSET + ST1, port);
 
if (stat & ST1_UDRN) {
struct net_device_stats *stats = hdlc_stats(port_to_dev(port));
@@ -138,6 +138,7 @@ static void sca_msci_intr(port_t *port)
stats->tx_fifo_errors++;
}
 
+   stat = sca_in(MSCI1_OFFSET + ST1, port); /* read MSCI1 ST1 status */
/* Reset MSCI CDCD status bit - uses ch#2 DCD input */
sca_out(stat & ST1_CDCD, MSCI1_OFFSET + ST1, port);
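
The set_carrier() hunk comes down to operator precedence: in C, `!`
binds tighter than `&`, so the unparenthesised test negates the whole
register value before masking. A minimal standalone sketch, assuming
a made-up mask DCD_BIT (not the real ST3_DCD value) and hypothetical
helper names:

```c
#include <stdint.h>

#define DCD_BIT 0x08u /* hypothetical status bit, for illustration only */

/* Buggy form: parsed as (!status) & DCD_BIT. */
static int dcd_clear_buggy(uint8_t status)
{
    /* !status is 0 or 1, so ANDing it with 0x08 always yields 0. */
    return !status & DCD_BIT;
}

/* Fixed form: mask the bit first, then negate the result. */
static int dcd_clear_fixed(uint8_t status)
{
    return !(status & DCD_BIT);
}
```

With a mask that does not include bit 0, the buggy form can never be
true, so the netif_carrier_on() branch would never be taken.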
 


[PATCH 2/2] [NETLINK]: Make use of NLA_STRING/NLA_NUL_STRING attribute validation

2006-08-22 Thread Thomas Graf
Converts existing NLA_STRING attributes to use the new
validation features, saving a couple of temporary buffers.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.19.git/net/core/rtnetlink.c
===
--- net-2.6.19.git.orig/net/core/rtnetlink.c
+++ net-2.6.19.git/net/core/rtnetlink.c
@@ -369,8 +369,8 @@ static int rtnl_dump_ifinfo(struct sk_bu
 }
 
 static struct nla_policy ifla_policy[IFLA_MAX+1] __read_mostly = {
-   [IFLA_IFNAME]   = { .type = NLA_STRING },
-   [IFLA_MAP]  = { .minlen = sizeof(struct rtnl_link_ifmap) },
+   [IFLA_IFNAME]   = { .type = NLA_STRING, .len = IFNAMSIZ-1 },
+   [IFLA_MAP]  = { .len = sizeof(struct rtnl_link_ifmap) },
[IFLA_MTU]  = { .type = NLA_U32 },
[IFLA_TXQLEN]   = { .type = NLA_U32 },
[IFLA_WEIGHT]   = { .type = NLA_U32 },
@@ -390,9 +390,8 @@ static int rtnl_setlink(struct sk_buff *
if (err < 0)
goto errout;
 
-   if (tb[IFLA_IFNAME] &&
-   nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ) >= IFNAMSIZ)
-   return -EINVAL;
+   if (tb[IFLA_IFNAME])
+   nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
 
err = -EINVAL;
ifm = nlmsg_data(nlh);
Index: net-2.6.19.git/net/netlink/genetlink.c
===
--- net-2.6.19.git.orig/net/netlink/genetlink.c
+++ net-2.6.19.git/net/netlink/genetlink.c
@@ -455,7 +455,8 @@ static struct sk_buff *ctrl_build_msg(st
 
 static struct nla_policy ctrl_policy[CTRL_ATTR_MAX+1] __read_mostly = {
[CTRL_ATTR_FAMILY_ID]   = { .type = NLA_U16 },
-   [CTRL_ATTR_FAMILY_NAME] = { .type = NLA_STRING },
+   [CTRL_ATTR_FAMILY_NAME] = { .type = NLA_NUL_STRING,
+   .len = GENL_NAMSIZ - 1 },
 };
 
 static int ctrl_getfamily(struct sk_buff *skb, struct genl_info *info)
@@ -470,12 +471,9 @@ static int ctrl_getfamily(struct sk_buff
}
 
if (info->attrs[CTRL_ATTR_FAMILY_NAME]) {
-   char name[GENL_NAMSIZ];
-
-   if (nla_strlcpy(name, info->attrs[CTRL_ATTR_FAMILY_NAME],
-   GENL_NAMSIZ) >= GENL_NAMSIZ)
-   goto errout;
+   char *name;
 
+   name = nla_data(info->attrs[CTRL_ATTR_FAMILY_NAME]);
res = genl_family_find_byname(name);
}
 
Index: net-2.6.19.git/net/core/fib_rules.c
===
--- net-2.6.19.git.orig/net/core/fib_rules.c
+++ net-2.6.19.git/net/core/fib_rules.c
@@ -161,9 +161,6 @@ int fib_nl_newrule(struct sk_buff *skb, 
if (err < 0)
goto errout;
 
-   if (tb[FRA_IFNAME] && nla_len(tb[FRA_IFNAME]) > IFNAMSIZ)
-   goto errout;
-
rule = kzalloc(ops->rule_size, GFP_KERNEL);
if (rule == NULL) {
err = -ENOMEM;
@@ -177,10 +174,7 @@ int fib_nl_newrule(struct sk_buff *skb, 
struct net_device *dev;
 
rule->ifindex = -1;
-   if (nla_strlcpy(rule->ifname, tb[FRA_IFNAME],
-   IFNAMSIZ) >= IFNAMSIZ)
-   goto errout_free;
-
+   nla_strlcpy(rule->ifname, tb[FRA_IFNAME], IFNAMSIZ);
dev = __dev_get_by_name(rule->ifname);
if (dev)
rule->ifindex = dev->ifindex;
Index: net-2.6.19.git/net/decnet/dn_rules.c
===
--- net-2.6.19.git.orig/net/decnet/dn_rules.c
+++ net-2.6.19.git/net/decnet/dn_rules.c
@@ -111,7 +111,7 @@ errout:
 }
 
 static struct nla_policy dn_fib_rule_policy[FRA_MAX+1] __read_mostly = {
-   [FRA_IFNAME]= { .type = NLA_STRING },
+   [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 },
[FRA_PRIORITY]  = { .type = NLA_U32 },
[FRA_SRC]   = { .type = NLA_U16 },
[FRA_DST]   = { .type = NLA_U16 },
Index: net-2.6.19.git/net/ipv4/devinet.c
===
--- net-2.6.19.git.orig/net/ipv4/devinet.c
+++ net-2.6.19.git/net/ipv4/devinet.c
@@ -85,7 +85,7 @@ static struct nla_policy ifa_ipv4_policy
[IFA_ADDRESS]   = { .type = NLA_U32 },
[IFA_BROADCAST] = { .type = NLA_U32 },
[IFA_ANYCAST]   = { .type = NLA_U32 },
-   [IFA_LABEL] = { .type = NLA_STRING },
+   [IFA_LABEL] = { .type = NLA_STRING, .len = IFNAMSIZ - 1 },
 };
 
 static void rtmsg_ifa(int event, struct in_ifaddr *, struct nlmsghdr *, u32);
Index: net-2.6.19.git/net/ipv4/fib_rules.c
===
--- net-2.6.19.git.orig/net/ipv4/fib_rules.c
+++ net-2.6.19.git/net/ipv4/fib_rules.c
@@ -178,7 +178,7 @@ static struct fib_table *fib_empty_table
 }
 
 static s

Re: [PATCH] kevent_user: remove non-chardev interface

2006-08-22 Thread Evgeniy Polyakov
On Tue, Aug 22, 2006 at 01:27:31PM +0100, Christoph Hellwig ([EMAIL PROTECTED]) 
wrote:
> On Tue, Aug 22, 2006 at 04:17:10PM +0400, Evgeniy Polyakov wrote:
> > I personally do not have objections against it, but it introduces
> > additional complexities - one needs to open /dev/kevent and then perform
> > syscalls on top of the returned file descriptor.
> 
> it disallows
> 
> int fd = sys_kevent_ctl(, KEVENT_CTL_INIT, , );
> 
> in favour of only
> 
> int fd = open("/dev/kevent", O_SOMETHING);
> 
> which doesn't seem like a problem, especially as I really badly hope
> no one will use the syscalls but some library instead.

Yep, the open/kevent_ctl difference above is exactly what I'm talking
about. I still have a system which has ioctl() based kevent setup, and
it works - I really do not want to raise another flamewar about which
approach is better. If no one complains by tomorrow, I will commit
it.

> In addition to that I'm researching whether there's a better way to
> implement the other functionality instead of the two syscalls.  But I'd
> rather let code speak, so wait for some patches from me on that.

There were implementations with pure ioctl() and with one syscall for
all operations (with a control block embedded in it); all were rejected
in favour of two syscalls, so I'm waiting for your patches.

-- 
Evgeniy Polyakov

